Scraping Websites for Fun and Profit
Posted on December 4th, 2007 in Automation, Coding, Traffic Generation
Last week Nickycakes wrote an auto scraper, auto friend adder, auto message sender, and auto comment poster for a major social networking site in under a day. The software works flawlessly and could potentially bring in thousands of dollars from spam comment posts with affiliate links if the user were so inclined. But that’s not the point of this post.
Scraping content from other websites is one of the best ways to set up fully automated, content-rich websites. What types of websites? Local business directories can be created in under an hour by scraping contact information from a yellow-pages-style site. Weather data can be scraped from numerous sites, reformatted, and turned into a widget of some sort. Wikipedia can be easily scraped for a blurb of relevant info on pretty much any topic. eBay can be scraped for bid information. The possibilities are pretty much endless, but one thing is for sure: there’s nothing better than making fully automated websites that update themselves with new content and pull in money without any real work after they’ve been created.
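To make the directory idea concrete, here is a minimal sketch of pulling names and phone numbers out of a listing page. It uses Python's standard library rather than PHP, and the markup in SAMPLE_HTML (the "listing", "name", and "phone" class names) is made up for illustration; a real yellow-pages-style site will use its own structure, so the tag and class checks would need adjusting.

```python
from html.parser import HTMLParser

# Hypothetical markup: real directory sites will use different tags and classes.
SAMPLE_HTML = """
<div class="listing"><span class="name">Joe's Pizza</span><span class="phone">555-0100</span></div>
<div class="listing"><span class="name">Ace Plumbing</span><span class="phone">555-0199</span></div>
"""


class ListingParser(HTMLParser):
    """Collects one dict per <div class="listing"> with its name and phone."""

    def __init__(self):
        super().__init__()
        self.listings = []
        self._field = None  # which field the current text belongs to, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "listing":
            self.listings.append({})
        elif tag == "span" and css_class in ("name", "phone") and self.listings:
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            current = self.listings[-1]
            # Text can arrive in more than one chunk, so append rather than overwrite.
            current[self._field] = current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def parse_listings(html):
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.listings


if __name__ == "__main__":
    for entry in parse_listings(SAMPLE_HTML):
        print(entry["name"], entry["phone"])
```

The same pattern (watch for a start tag, collect text until the end tag) carries over to weather tables or auction listings; only the tag checks change.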
Nickycakes has been building content scrapers of different kinds for a while using PHP, but wanted to step up his game and learn how to make scrapers that properly handle logging into websites, storing cookies, and actually looking like a web browser. All that stuff can be easily accomplished with PHP/cURL. So a couple of weeks ago, the Cakes ordered a book off Amazon called Webbots, Spiders, and Screen Scrapers. You can probably find it at Borders or another bookstore as well.
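The three pieces mentioned here (logging in, storing cookies, looking like a browser) can be sketched in a few lines. This is not the PHP/cURL code from the book; it is a rough equivalent using Python's standard library. The User-Agent string and the "username"/"password" field names are placeholders, since every site names its login fields differently.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Placeholder User-Agent: an assumption for illustration only.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleScraper/0.1"


def make_session():
    """Return an opener that stores cookies and sends a browser-like User-Agent."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", USER_AGENT)]
    return opener, jar


def build_login_request(url, username, password):
    """Build a POST request for a login form.

    The field names "username" and "password" are hypothetical; check the
    real form to see what it actually submits.
    """
    body = urllib.parse.urlencode({"username": username, "password": password})
    # Passing data makes urllib send this request as a POST.
    return urllib.request.Request(url, data=body.encode("utf-8"))
```

Calling opener.open(request) with a request built this way would send the form and keep any cookies the site sets in the jar, so later requests through the same opener stay logged in.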
In that book was enough information to write scripts for logging on, submitting forms, parsing pretty much anything you could imagine, etc. Writing a social networking scraper/adder/commenter was a breeze.
So if you don’t want to buy a book, there are quite a few resources online to help you get started. Here are a few:
PHP CURL manual
Smaxor’s newbie curl form submitting tutorial
Using libcurl with PHP
Make sure you grab this toolbar, which will help you dissect any webpage’s forms quickly and easily:
Webmaster Toolbar
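If you would rather dissect a form in code than with a toolbar, the sketch below lists each form's action, method, and named input fields. It is written in Python's standard library rather than PHP, and the sample form (including the "token" hidden field) is invented for illustration.

```python
from html.parser import HTMLParser

# Made-up login form, used only to demonstrate the parser.
SAMPLE_FORM = """
<form action="/login" method="POST">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="token" value="abc123">
  <input type="submit" value="Log in">
</form>
"""


class FormInspector(HTMLParser):
    """Records the action, method, and named <input> fields of each form."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attributes.get("action") or "",
                "method": (attributes.get("method") or "get").lower(),
                "fields": {},
            })
            self._in_form = True
        elif tag == "input" and self._in_form and attributes.get("name"):
            # Inputs without a name are not submitted, so they are skipped.
            self.forms[-1]["fields"][attributes["name"]] = attributes.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def inspect_forms(html):
    inspector = FormInspector()
    inspector.feed(html)
    inspector.close()
    return inspector.forms


if __name__ == "__main__":
    for form in inspect_forms(SAMPLE_FORM):
        print(form["method"], form["action"], form["fields"])
```

Hidden fields are the part that is easy to miss by eye, which is the main reason to list every input before writing a script that submits the form.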
So the real question is: Anyone know an affiliate network that doesn’t mind spammy social networking traffic? ;D

December 4th, 2007 at 10:11 pm
Nice book find, I’m gonna have to add it to my xmas gift list (for myself). Any plans on selling some of these scrapers, or making custom scrapers for sale?
December 5th, 2007 at 9:14 am
If you have something you want, you could just hit me up about it. I’m not opposed to the idea by any stretch.
December 5th, 2007 at 10:30 pm
Great book recommendation. Bought it yesterday and should have it by Friday =)
BTW nickycakes, what’s your usual hourly rate for building a customized script? Or do you charge on a per-job basis?
December 5th, 2007 at 10:31 pm
One more thing, doesn’t Webmaster Toolbar have the same features as Firebug?
December 5th, 2007 at 10:47 pm
Per job. Feel free to hit me up via my contact page any time. And I’ve never used Firebug, so I would suggest trying both to see which is better. I’m happy with Webmaster Toolbar, though =)
December 9th, 2007 at 11:25 pm
I just finished reading this book, great read.
I highly recommend it.
December 22nd, 2007 at 8:53 pm
[…] Scraping Websites for Fun and Profit Part 2 Posted by nickycakes Published in Automation, Coding. If you have not read Part 1, please take a moment to do so: Scraping Websites for Fun and Profit Part 1 […]