At no point do I guarantee usable results, even if you follow the instructions. All I can say is that it worked for me. I also don’t claim that none of this will cause unintended data leakage, and I don’t imply that it complies with local laws, guidelines, TOS agreements, or anything like that. Even if you do this perfectly, it may still catch on fire or yell obscenities at grandma, but that’s pretty unlikely.
This guide is designed to help you create a bot that archives RSS items to an email address. It doesn’t do any real “web scraping”, but it can be a great way to keep updated about gigs, and as you read through you’ll see other ways to use it to archive almost anything. This simple method has a few main drawbacks. RSS feeds are usually truncated, which means you may have to click through to read the full content. Since this article is aimed at job hunting, the other big drawback is that the feed won’t give you contact information like email addresses or phone numbers. In the near future I’ll write up another article on shimming in the tools that will help us extract that information. Also, since we’re basically copying text from the RSS feed, we won’t be notified if the content changes or when it goes away. This article is intended for research use, so you can get into playing with RSS feeds, and is not intended for private or production use. If you are using it “for profit” you will likely be violating the TOS, and you generally should not rely on free services, as they may change how they work without notification.
This is a beginner-level automation project. It was the first stage of a “bot” I wrote quite a long time ago. The finished bot included a lot more: a service called Yahoo Pipes (which is gone now) and lots of regex. The end result was a bot that generated cover letters and automatically applied for jobs by email, attaching a PDF version of my current resume generated from what was on LinkedIn. As part of this series I’ll try to recreate the whole thing. This part (part 1) requires very little technical knowledge and, as you’ll see, no code needs to be written at all at this first stage. You need an email account for your bot, a basic understanding of search filters, and an account at IFTTT. Essentially, IFTTT is the motor for this. It works by getting permission to access your accounts, and you tell it to do things based on events. You may not want IFTTT to have unfettered access to your personal accounts, so a dedicated “robot” email account may be the better choice. I recommend Gmail because of its “other” account features, which we will be taking advantage of down the road.
We’ll start with the Gigs section on Craigslist. We’ll be using search operators to reduce the junkiness of the results. As you’ll see, blacklisting certain terms only offers an improvement, definitely not a final result. I’ve already set up a search string that excludes common words or pairings that generally look spammy, so they won’t be included in our results. Again, this is the “blacklisting” approach, which means removing stuff we think is bad. A better approach is to combine it with whitelisting (only including results that meet a certain requirement), which will get us usable results. Here’s the link for the search with blacklisting, and here’s a link for the search combined with whitelisting. The second one adds lawn and grass, so the results only include the less spammy posts that also mention lawn or grass. It should default to the most recent city you used.
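You don’t need any code for this step, but if you’d rather generate the search string than type it, here is a minimal Python sketch. It assumes the search box accepts a leading minus to exclude a term, a pipe for “either of these”, and double quotes around phrases; check that against your own search before relying on it. The function name build_query and the example terms are made up for illustration.

```python
def build_query(include_any, exclude):
    """Combine whitelist and blacklist terms into one search string.

    Assumed operator syntax (verify on the site you are searching):
      a | b    -> match either term
      -term    -> drop results containing the term
      "a b"    -> treat the words as one phrase
    """
    def quote(term):
        # Multi-word terms are wrapped in quotes so they stay together.
        return f'"{term}"' if " " in term else term

    parts = []
    if include_any:
        # Whitelist: at least one of these must appear.
        parts.append("(" + " | ".join(quote(t) for t in include_any) + ")")
    # Blacklist: none of these may appear.
    parts.extend("-" + quote(t) for t in exclude)
    return " ".join(parts)


# Hypothetical terms, only to show the shape of the result.
print(build_query(["lawn", "grass"], ["survey", "work from home"]))
```

Keeping the whitelist and blacklist as two separate lists makes it easy to grow the blacklist as you spot new spammy phrases.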
Work on that search a bit, unless you’re looking for lawn cutting jobs nearby. The next step is to make it notify you. In this case we’ll be using CL’s RSS feature and IFTTT to send us an email.
Across the bottom of the page, you should see something like this:
Click the orange RSS icon on your own search results page on Craigslist. That will take you to a page called an RSS feed. It’s not the prettiest, but you only need to look at it long enough to verify your results. Here’s my generated link. If you look closely at the URL and you’re into scripting, you’ll notice it could be fun to automate the creation of contextual searches by modifying the URL. That’s another rabbit hole we’ll be exploring later.
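As a small taste of that rabbit hole, here is a hedged sketch of swapping the search terms inside a feed URL with Python’s standard library. The example.org address and the parameter names query and format are assumptions for illustration only; copy the real ones from your own generated link.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query(search_url, query, feed_format="rss"):
    """Return search_url with its search terms replaced and a feed format set.

    The parameter names "query" and "format" are assumptions; look at your
    own generated link to see what the site really uses.
    """
    parts = urlsplit(search_url)
    params = dict(parse_qsl(parts.query))  # keep any other parameters as-is
    params["query"] = query
    params["format"] = feed_format
    return urlunsplit(parts._replace(query=urlencode(params)))


# Placeholder URL, not a real search link.
base = "https://example.org/search/ggg?query=old&sort=date"
print(with_query(base, "lawn -survey"))
```

Parsing the URL instead of gluing strings together means spaces, quotes, and pipes in the search string get encoded for you.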
On to IFTTT. Set up your account. As I said earlier, you may want to use a dedicated bot account, so if you haven’t yet created an email account for your bot, go ahead and do that now. You can use your personal email/Gmail account instead, but again, I don’t recommend it. So log into IFTTT…
Click your name/username on the top right.
Click the “+” (plus) sign next to the word “This”.
Type “feed” in the search bar, then click Feed.
Paste the Craigslist RSS URL we created earlier and submit.
Click “+that”.
Choose Email, then send yourself an email. You can leave the default settings.
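If you’re curious what a “new feed item” trigger boils down to, the steps above can be sketched in a few lines of Python. This is my rough mental model, not how IFTTT is actually implemented. It assumes a plain RSS 2.0 layout with item, title, and link elements (real feeds may use XML namespaces and need extra handling), and the sample feed is invented.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; a real one would be downloaded from your RSS URL.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>gigs</title>
  <item><title>Mow my lawn</title><link>https://example.org/1</link></item>
  <item><title>Cut grass Saturday</title><link>https://example.org/2</link></item>
</channel></rss>"""


def new_items(feed_xml, seen_links):
    """Return items whose link has not been seen yet, and remember them."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append({"title": item.findtext("title", default=""),
                          "link": link})
    return fresh


seen = set()
for post in new_items(SAMPLE_FEED, seen):
    # This is where the "send yourself an email" action would happen.
    print(f"Would email: {post['title']} ({post['link']})")
```

Running new_items a second time with the same seen set returns nothing, which is the whole trick: you only get notified once per post.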
You’re pretty much done at that point, but there are some improvements we can make.
Find our free private WordPress hosting guide and email the posts to that WordPress page. If it’s publicly accessible, you may be violating the TOS.
Forward it to your phone via text message (or sign in to the email account) so you get notified immediately about new jobs that fit what you’re looking for.
Use some web scraping techniques to grab the full text of the post and extract contact information. We’ll see that this has LOTS of caveats. It can be quite time consuming and may require regular maintenance whenever one of your tens or hundreds of source sites changes its formatting.
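To give a feel for the extraction part, here is a minimal regex sketch that pulls email addresses and phone numbers out of a block of text. The patterns are deliberately simple assumptions: the phone pattern only covers US-style ten-digit numbers, and the email pattern will miss obfuscated addresses like “name at example dot com”. The sample text is made up.

```python
import re

# Simplified patterns; real posts will need more forgiving rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")


def extract_contacts(text):
    """Return the email addresses and phone numbers found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


# Made-up post body for illustration.
sample = "Need help Saturday. Call (555) 123-4567 or email jobs@example.com."
print(extract_contacts(sample))
```

This only works once you already have the full post text, which the truncated RSS feed usually won’t give you.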
As we work through the automated job robot, next we’ll look into integrating other job-board-type sites, starting with a list of easier ones that can email you. We’ll take those emails and run them through some processing/filtering to make sure they actually fit *you*, and then we’ll look at integrating more complex sites, like job boards without RSS feeds.
Lastly, we could make a scoring system, so that when a perfect job gets posted you get an alert with extra enthusiasm. We can look into generating a (kind of) unique cover letter. That gets deep: it can be as unique as you want, but you’ll see that as it gets more unique it gets exponentially more complex, which is why those automatically generated emails usually look like crap. Ours will be better. And we’ll look into automatically attaching a fresh copy of our resume. Once the whole thing is complete, if you get it just right, you might be able to simply sit back and wait for calls from employers who would like to meet you.
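The scoring idea can start out very small. Here is a sketch, assuming a hand-picked table of keyword weights and an arbitrary threshold for the “extra enthusiasm” alert. The words, weights, threshold, and function names are all placeholders I made up; tune them to what you’re actually looking for.

```python
import re

# Placeholder weights: positive for words you want, negative for red flags.
WEIGHTS = {"lawn": 3, "grass": 2, "paid": 1, "unpaid": -5}


def score_post(text, weights=WEIGHTS):
    """Add up the weight of every known word in the post."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(weights.get(word, 0) for word in words)


def alert_subject(title, score, threshold=4):
    """Build an email subject line, louder when the score is high."""
    prefix = "!!! GREAT MATCH" if score >= threshold else "New gig"
    return f"{prefix} ({score}): {title}"


# Made-up post title for illustration.
title = "Lawn and grass cutting, paid"
print(alert_subject(title, score_post(title)))
```

A word that appears twice counts twice here, which is crude but good enough to sort the inbox.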
Any images are owned by their respective companies. The Gmail logo was taken from the Gmail Wikipedia page, the Craigslist image was a screenshot of the craigslist.org page, and the IFTTT logo was taken from a screenshot of the IFTTT page. Craigslist results images are from Craigslist.