Posted on April 2nd, 2008 in Coding
If you’re a webmaster or coder like Nickycakes, you rely quite heavily on transferring files back and forth from your hard drive to your webhosting server. With the most rudimentary of ftp programs, the process goes something like this: 1) write code 2) upload it to the server with ftp 3) test it 4) make changes 5) upload to server again, etc. This can be quite annoying, so what Nickycakes first did was set up a webserver on his home machine to test everything first before uploading it to save some time. That works OK, but hosting a test server can be annoying, and you still eventually have to ftp everything to the server, and you run into issues with different versions of files in different places, etc. Short story: BIG HEADACHE.
The next step up from just your basic ftp program in terms of productivity would be a more advanced ftp client like SmartFTP. The main reason Nickycakes likes SmartFTP and programs like it is the ability to right-click and click edit on a file on the remote server, allowing you to edit the file locally and then upload when you’ve finished making the changes. There are, of course, drawbacks to this, the main one being that you can’t edit more than one file at a time, which is often required once you start working on big projects. Again: BIG HEADACHE.
A few weeks ago, a buddy on IRC mentioned a program he uses to map a remote FTP directory to a local drive in Windows. It wasn’t until yesterday that Nickycakes had had enough of FTP clients and looked into this solution. The Cakes tried two different programs for this, which do pretty much the exact same thing: Webdrive and SFTPDrive.
They were pretty much identical in functionality, but had a few small…availability issues. Webdrive is produced by a company in Nickycakes’ home town of Annapolis, MD, but it has a much shorter free trial period and is less…available…on *ahem* piratebay… So the winner ended up being SFTPDrive, even though they’re both about the same.
Just since yesterday this has changed a few things about the way Nickycakes operates when designing and fixing his stuff online. First off, things are much faster. Creating and launching new php code has had a few major steps removed from the process, which is awesome. But the biggest improvement has been the ability to seamlessly switch from computer to computer to do coding work. Before, Nicky had to set up shared folders on the main local computer with all the files he was working on if he wanted to switch to a different pc to do some work (for example going out to a cafe with the laptop). Now, it’s as easy as installing PSPad (Cakes’ editor of choice) and SFTPDrive, and letting the fun begin.
That’s about it. Hope this saves you a ton of time. Keep it real.
Published by nickycakes //
For a recent project, Nickycakes had to code a Wikipedia scraper. Here’s a simplified version of the function for you to use if you want. This code requires a few library files, which are included in LIB_http.zip.
Enjoy:
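include('LIB_http.php');
include('LIB_parse.php');

// Returns the plain text of a Wikipedia article, paragraphs separated by blank lines.
function wikiscrape($topic){
    // Wikipedia page titles use underscores in place of spaces
    $target = "http://en.wikipedia.org/wiki/".rawurlencode(str_replace(" ","_",$topic));
    $results = http_get($target,"");
    $paragraphs = parse_array($results['FILE'],"<p>","</p>",EXCL);
    $final = ""; // start empty so the first append doesn't hit an undefined variable
    foreach($paragraphs as $paragraph){
        $paragraph = strip_tags($paragraph);
        // remove bracketed footnote markers such as [1], one marker at a time
        $paragraph = preg_replace("/\[[^\]]*\]/","",$paragraph);
        if (trim($paragraph) !== ""){
            $final = $final . $paragraph . "\n\n";
        }
    }
    return $final;
}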
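If you want to see what the cleanup step in that loop does without hitting Wikipedia, here's a minimal offline sketch. The helper name clean_paragraph() and the sample HTML are made up for illustration, and it assumes the bracket pattern is only there to drop footnote markers like [1].

```php
<?php
// Hypothetical helper (not part of LIB_parse.php): strips tags and
// bracketed footnote markers such as [1] from one paragraph of HTML.
function clean_paragraph($html)
{
    $text = strip_tags($html);
    // [^\]]* stops at the first closing bracket, so each marker is
    // removed separately and the text between two markers survives.
    $text = preg_replace('/\[[^\]]*\]/', '', $text);
    return trim($text);
}

// Made-up sample standing in for one <p> block from an article.
$sample = '<p><b>PHP</b> is a scripting language.[1] It runs on servers.[2]</p>';
echo clean_paragraph($sample), "\n";
// prints: PHP is a scripting language. It runs on servers.
```

A greedy pattern like \[.*\] would instead delete everything from the first [ to the last ] in the paragraph, which is why this sketch matches one marker at a time.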
Published by nickycakes //
Posted on December 22nd, 2007 in Automation, Coding
If you have not read Part 1, please take a moment to do so:
Scraping Websites for Fun and Profit Part 1
A few weeks ago, Nickycakes wrote about getting your feet wet with website scraping. If you’re interested in learning how to use php to grab content from other sites automatically, you should check out this book. It basically has everything you need to get started.
Anyway, the author of said book has published a set of library files for php that make scraping and parsing anything on the web fairly painless. You can download the entire set of library files here:
http://www.nickycakes.com/files/LIB_http.zip
There are a bunch of files inside, but you will probably only be using a couple of them for most tasks: LIB_http.php and LIB_parse.php. You can include the functions from these libraries by putting them in the same directory as your php script and adding the line include("LIB_http.php"); to your script. Inside each of the files is a description of the functions they include. If you’re writing a scraper, LIB_http.php will have to be edited a little bit to make your script look like a browser and not a php script.
Here’s a description of the most useful functions in these files for scraping websites:
LIB_http.php
- http_get($target, $ref)
You give it $target (the url you want to grab) and $ref (where you want the website to think you came from) and it will return an array with 3 elements: FILE, STATUS, and ERROR. $return_array['FILE'] will have the contents of the webpage, $return_array['STATUS'] will have the curl status of the transfer, and $return_array['ERROR'] will have the curl error status. Example:
$target = "http://www.google.com";
$ref = "http://www.yahoo.com";
$google_frontpage = http_get($target, $ref);
echo($google_frontpage['FILE']);
This displays the Google front page.
- http_post_form($target, $ref, $data_array)
Submit a form with POST method. Same $target and $ref information as above. $data_array should include the information you’re submitting. Example:
$target = "https://login.facebook.com/login.php";
$ref = "http://www.facebook.com";
$data_array = array();
$data_array['email']="your@email.com";
$data_array['pass']="password";
$results = http_post_form($target,$ref,$data_array);
echo($results['FILE']);
Congrats…now you’re logged into Facebook. (You may have to run the script twice initially as curl sets up your cookie file.)
- http_get_form($target,$ref,$data_array)
Works the same way as post, but does it with GET method.
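If you're curious what a function like http_get has to do under the hood, here's a rough sketch using php's curl extension directly. This is not the code from LIB_http.php: the function names, the user agent string, and the cookies.txt default are all assumptions made for illustration. It just shows the pieces mentioned above (referer, browser-like user agent, cookie file) and returns an array shaped like FILE / STATUS / ERROR.

```php
<?php
// Hypothetical helper: builds curl options that make a request look
// like it comes from a browser. All values here are illustrative.
function browser_like_options($target, $ref, $cookie_file)
{
    return array(
        CURLOPT_URL            => $target,
        CURLOPT_REFERER        => $ref,          // where the site thinks you came from
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0', // example string only
        CURLOPT_COOKIEJAR      => $cookie_file,  // cookies are written here when the handle is freed
        CURLOPT_COOKIEFILE     => $cookie_file,  // saved cookies are sent on later requests
        CURLOPT_FOLLOWLOCATION => true,          // follow redirects
        CURLOPT_RETURNTRANSFER => true,          // return the page instead of printing it
        CURLOPT_TIMEOUT        => 30,            // give up after 30 seconds
    );
}

// Hypothetical stand-in for a GET helper: fetches $target and returns
// FILE (page contents), STATUS (transfer info), ERROR (curl error text).
function sketch_http_get($target, $ref, $cookie_file = 'cookies.txt')
{
    $ch = curl_init();
    curl_setopt_array($ch, browser_like_options($target, $ref, $cookie_file));
    $result = array();
    $result['FILE']   = curl_exec($ch);
    $result['STATUS'] = curl_getinfo($ch);
    $result['ERROR']  = curl_error($ch);
    // $ch is released when the function returns, which also saves the cookie jar.
    return $result;
}
```

Calling sketch_http_get("http://www.example.com", "") would fetch the page and hand back the three-element array. A POST version would add CURLOPT_POST set to true and CURLOPT_POSTFIELDS set to something like http_build_query($data_array).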
LIB_parse.php
Ok, so it’s probably better for you to just open this one up and read the comments. There are a few simple functions in here that should let you easily parse any website without knowing how to use Regular Expressions. Regular Expression functions in php are hard to learn for a newbie, and a carelessly written pattern can slow your programs down, so you can skip writing your own for most website parsing anyway.
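To give a feel for delimiter-based parsing, here's a minimal sketch that pulls out the text between two markers using only plain string functions. The name text_between() is hypothetical and is not one of the library's functions; the sample page is made up.

```php
<?php
// Hypothetical helper: returns the text between the first $start and
// the next $stop (case-insensitive), or false if either is missing.
function text_between($haystack, $start, $stop)
{
    $from = stripos($haystack, $start);
    if ($from === false) {
        return false;
    }
    $from += strlen($start);                 // skip past the opening delimiter
    $to = stripos($haystack, $stop, $from);  // search only after the opening delimiter
    if ($to === false) {
        return false;
    }
    return substr($haystack, $from, $to - $from);
}

// Made-up page for illustration.
$page = '<html><head><title>Example Domain</title></head><body></body></html>';
echo text_between($page, '<title>', '</title>'), "\n";
// prints: Example Domain
```

You hand it the page and the two strings that surround the bit you want, and never have to write a pattern yourself.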
If you are going to be scraping websites that require you to submit form information, you will want to download and install Web Developer Toolbar for Firefox to help you figure out the form field information in a hurry without viewing the page source.
Hope this helps.
Published by nickycakes //
Posted on December 5th, 2007 in Cloaking, Coding
One thing about google that you most likely know already, but may not, is that they penalize you for having affiliate links on your page. If you use adwords, this equates to a lower quality score which in turn makes your cost per click more expensive. For normal ranking in google search, this means a lower position in the SERPs (search engine result pages).
One thing about spammers is, they like to scrape email addresses off webpages. Put your email addy on the front page of your blog, and wait a month, and feel the love as you get hundreds of ads for penis pumps and viagra clogging your inbox.
Fortunately for you, both of these birds can be killed with one stone. Just convert the link to unicode, throw it on your site, and it will be all but invisible to nearly all scrapers and web spiders. Now you’re probably asking yourself, “But Nickycakes, how do we convert a link to unicode?” You do it with this tool, duh:
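If you'd rather roll your own, a converter like that can be sketched in a few lines of php. This assumes "converting to unicode" means rewriting each character as an HTML numeric character reference (so "a" becomes "&#97;"); the function name and the sample address are made up for illustration.

```php
<?php
// Hypothetical helper: turns every character into an HTML numeric
// character reference. Assumes plain ASCII input, which covers
// typical email addresses and URLs.
function to_numeric_entities($text)
{
    $out = '';
    for ($i = 0, $n = strlen($text); $i < $n; $i++) {
        $out .= '&#' . ord($text[$i]) . ';';
    }
    return $out;
}

// Made-up address for illustration.
$email = 'me@example.com';
echo '<a href="' . to_numeric_entities('mailto:' . $email) . '">'
    . to_numeric_entities($email) . "</a>\n";
```

The browser decodes the references when it renders the page, so visitors still see and click a normal link, while anything that just searches the raw page source for an @ sign or an affiliate url comes up empty.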
Published by nickycakes //
Last week Nickycakes wrote an auto scraper, auto friend adder, auto message sender, and auto comment poster for a major social networking site in under a day. The software works flawlessly and could potentially bring in thousands of dollars from spam comment posts with affiliate links if the user were so inclined. But that’s not the point of this post.
Scraping content from other websites is one of the best ways to set up fully automated content rich websites. What types of websites? Local business directories can be created in under an hour by scraping contact information from a yellow pages style site. Weather data can be scraped from numerous sites to be reformatted and turned into a widget of some sort. Wikipedia can be easily scraped for a blurb of relevant info on pretty much any topic. Ebay can be scraped for bid information. The possibilities are pretty much endless, but one thing is for sure. There’s nothing better than making fully automated websites that update themselves with new content and pull in money without any real work after they’ve been created.
Nickycakes has been building content scrapers of different kinds for a while using php, but wanted to step up his game and learn how to make scrapers that properly handle logging into websites, storing cookies, and actually looking like a web browser. All that stuff can be easily accomplished with php/curl. So a couple weeks ago, the Cakes ordered a book off Amazon called Webbots, Spiders, and Screen Scrapers. You can probably find it at Borders or something as well.
In that book was enough information to write scripts for logging on, submitting forms, parsing pretty much anything you could imagine, etc. Writing a social networking scraper/adder/commenter was a breeze.
So if you don’t want to buy a book, there are quite a few resources online to help you get started. Here are a few:
PHP CURL manual
Smaxor’s newbie curl form submitting tutorial
Using libcurl with PHP
Make sure you grab this toolbar which will help you dissect any webpage’s forms quick and easy like:
Webmaster Toolbar
So the real question is: Anyone know an affiliate network that doesn’t mind spammy social networking traffic? ;D
Published by nickycakes //