Posted on February 1st, 2008 in Automation
Ok folks, here are some of Nickycakes’ favorite Firefox plugins for doing internet marketing, web design, coding, and affiliate-type stuff.
Search Status - http://www.quirk.biz/searchstatus/
This plugin does a few cool things. It shows you the Google PageRank and Alexa rank of the current page, and by right-clicking the icon it adds to the bottom of your browser, you can have it highlight nofollow links, which is sometimes invaluable on the fly. It's especially useful when you're link building and looking for good places to drop your bombs.
Web Developer - http://chrispederick.com/work/web-developer/
Man, when Nickycakes started using this plugin, things changed, man… things changed. Want to see detailed information about every form on a page? No prob. Want to look at/clear/whatever cookies for the current page only? No prob. View divs on a page by mousing over them? No prob. Disable images, view meta tags, view JavaScript, edit the page HTML, validate pretty much anything, whatever. No prob. This tool is especially useful for writing scraper scripts, because its form display shows you each form's field names and where the form submits to.
Firebug - http://www.getfirebug.com/
Awesome for showing exactly how fast your page is loading and which requests are slowing it down. It also inspects and live-edits HTML and CSS and debugs JavaScript, plus probably a bunch of other functions the Cakes hasn’t bothered looking at.
RefControl - http://www.stardrifter.org/refcontrol/
Fake the referer URL your browser sends. Fun for the whole family.
Adsense Notifier - https://addons.mozilla.org/en-US/firefox/addon/500
Saves you a trip to the AdSense website.
Published by nickycakes //

Today, Nickycakes is releasing Linky. Linky lets you input an unlimited number of anchor texts and URLs and generates random interlinking between all of the URLs. No links are reciprocal, there are no self-links, and no link is displayed twice on the same page. Each page gets an equal number of incoming and outgoing links.
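One way to meet all of those rules at once is a shuffled "ring": put the pages in random order and have each page link to the next k pages around the circle. The sketch below is hypothetical (build_link_graph() and its parameters are made up here, not taken from Linky) and assumes the URLs are unique. Each page gets exactly k outgoing and k incoming links, and requiring 2k to be less than the number of pages guarantees no link is ever returned.

```php
<?php
// Hypothetical sketch, not Linky's actual code.
// Returns [url => [outgoing urls]] where every page has exactly $k outgoing
// and $k incoming links, no self-links, no duplicates, and no reciprocal pairs.
// Assumes every URL in $urls is unique.
function build_link_graph(array $urls, int $k): array {
    $n = count($urls);
    // i -> i+d and (i+d) -> i would need d + d' == n, impossible when 2k < n
    if ($k < 1 || 2 * $k >= $n) {
        throw new InvalidArgumentException("need k >= 1 and 2k < number of urls");
    }
    shuffle($urls); // random ring order gives a random-looking link pattern
    $links = [];
    for ($i = 0; $i < $n; $i++) {
        $links[$urls[$i]] = [];
        for ($d = 1; $d <= $k; $d++) {
            // link to the k pages that follow this one around the ring
            $links[$urls[$i]][] = $urls[($i + $d) % $n];
        }
    }
    return $links;
}
```

The trade-off of this design is that the pattern is a fixed ring under the shuffle, so it's random-looking rather than fully random; a real tool could use a different generator and then check the same constraints.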
Once the script has generated a database of the links, you put a short snippet of code on each of your pages that automatically detects which page it’s on and displays the proper outgoing links.
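A display snippet like that only has to work out the current page's URL and look it up. Here's a minimal sketch, assuming the links are held in an array mapping each page URL to its outgoing URLs and a second array mapping URLs to anchor text; render_links() and current_url() are hypothetical names, not Linky's code, and current_url() assumes the standard $_SERVER variables a web server provides.

```php
<?php
// Hypothetical display snippet, not Linky's actual code.
// $graph maps page URL => list of outgoing URLs; $anchors maps URL => anchor text.
function render_links(array $graph, array $anchors, string $currentUrl): string {
    if (!isset($graph[$currentUrl])) {
        return ''; // this page isn't in the link database
    }
    $html = '';
    foreach ($graph[$currentUrl] as $url) {
        $text = $anchors[$url] ?? $url; // fall back to the bare URL
        $html .= '<a href="' . htmlspecialchars($url) . '">' . htmlspecialchars($text) . "</a>\n";
    }
    return $html;
}

// Rebuild the requested URL from the server variables (defaults for CLI runs).
function current_url(): string {
    $https = !empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off';
    return ($https ? 'https' : 'http') . '://'
        . ($_SERVER['HTTP_HOST'] ?? 'localhost')
        . ($_SERVER['REQUEST_URI'] ?? '/');
}
```

On each page you would then call echo render_links($graph, $anchors, current_url()); wherever the links should appear. Note the lookup is an exact string match, so the stored URLs have to be written the same way the server reports them.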
Anyway, there’s a readme file in the zip. Hope someone finds it useful. Nickycakes sure did. If it doesn’t work properly, just shoot the Cakes an email.
Download: linky.zip
Almost exactly one year ago, Eli from BluehatSEO made a post on MadLib sites. The basic idea behind a MadLib site is this: you write a script that randomly (or not so randomly) generates pages from a database of information, inserting different pieces of information into a template for each post to make them unique. For example: Today I saw *girlsname* go into *storename* and buy *productname*. You can generate thousands and thousands of unique pages this way with very little effort, and if you do a decent job, get them indexed by Google without a duplicate content penalty.
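The core of such a script is just "find each placeholder, swap in a random value." Here's a minimal sketch, assuming the *placeholder* notation from the example and a data array that maps each placeholder name to a list of candidate values; madlib() is a hypothetical name, and this is not the format Bluehat or Datapresser actually uses.

```php
<?php
// Hypothetical sketch: replaces each *name* slot with a random value from $data['name'].
// Slots with no matching data are left as-is so they're easy to spot.
function madlib(string $template, array $data): string {
    return preg_replace_callback('/\*(\w+)\*/', function ($m) use ($data) {
        $key = $m[1];
        if (empty($data[$key])) {
            return $m[0]; // no values for this slot: keep the placeholder
        }
        return $data[$key][array_rand($data[$key])];
    }, $template);
}

$template = "Today I saw *girlsname* go into *storename* and buy *productname*.";
$data = [
    'girlsname'   => ['Alice', 'Maria', 'Jen'],
    'storename'   => ['Target', 'Best Buy'],
    'productname' => ['a toaster', 'a laptop'],
];
echo madlib($template, $data), "\n"; // a different sentence on each run
```

Each slot is filled independently, so the number of distinct pages grows with the product of the list sizes (3 × 2 × 2 = 12 combinations here).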
Well, the problem is, not everyone has the ability to code their own madlib system, and not everyone owns large databases to use for the process. That’s where Datapresser comes in.
About 6 months ago, Rob at Seocracy started working on Datapresser. It’s a comprehensive madlib blog-generating package. All you have to do is enter your madlib text and title in their formatting system, drawing on their huge databases of information, and it will generate thousands of unique content pages in WordPress format for you to slap on your WordPress blog and let ‘er go.
Saturday morning, Rob launched Datapresser.com at a modest $45 a month, with 60 signups available. The 60 signups were gone in under 30 minutes. $2700 in 30 minutes ain’t bad, to be honest. If you weren’t able to get an account on Saturday, don’t worry. Rob is working on getting more servers up to handle more users, so there will still be a chance to get your spot.
Anyway, Nickycakes is extremely grateful to have been given an account to check out Datapresser and see what it’s all about. At first blush, it’s great. The interface is clean, which the Cakes loves. The help file is…..well…helpful. And it works damn quick. There are 3 output formats for the data: 2 for WordPress and a plain-text output for use with pretty much anything.
If you don’t understand the scope of a tool like this, think about it a little harder. With Datapresser, or any madlib script, you can automatically generate thousands of unique articles for ANY niche you can think of. Go to your favorite affiliate network, open up the offers, and pick one. Want to make a site with thousands of keyword-rich content pages related to that offer? Want to do it in under an hour? No prob!
So anyway, if you’re interested, keep checking Seocracy and Datapresser for updates on when the next batch of signups will be available.
And thanks again Rob. Great tool.
For a recent project, Nickycakes had to code a Wikipedia scraper. Here’s a simplified version of the function for you to use if you want. This code requires two library files, LIB_http.php and LIB_parse.php, both included in LIB_http.zip.
Enjoy:
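<?php
require_once('LIB_http.php');   // http_get()
require_once('LIB_parse.php');  // parse_array(), EXCL

// Returns the plain-text paragraphs of an English Wikipedia article.
function wikiscrape($topic){
    // Wikipedia article URLs use underscores instead of spaces
    $target = "https://en.wikipedia.org/wiki/" . urlencode(str_replace(" ", "_", $topic));
    $results = http_get($target, "");
    $final = "";
    if ($results['ERROR']) {
        return $final;  // download failed; the curl error is in $results['ERROR']
    }
    // grab every <p>...</p> block on the page
    $paragraphs = parse_array($results['FILE'], "<p>", "</p>", EXCL);
    foreach ($paragraphs as $paragraph){
        $paragraph = strip_tags($paragraph);
        // drop citation markers like [1]; [^\]]* stops at the first "]",
        // so the text between two markers isn't deleted along with them
        $paragraph = preg_replace('/\[[^\]]*\]/', "", $paragraph);
        if (trim($paragraph) !== ""){
            $final .= $paragraph . "\n\n";
        }
    }
    return $final;
}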
Posted on December 22nd, 2007 in Automation, Coding
If you have not read Part 1, please take a moment to do so:
Scraping Websites for Fun and Profit Part 1
A few weeks ago, Nickycakes wrote about getting your feet wet with website scraping. If you’re interested in learning how to use PHP to grab content from other sites automatically, you should check out this book. It has basically everything you need to get started.
Anyway, the author of said book has published a set of library files for php that make scraping and parsing anything on the web fairly painless. You can download the entire set of library files here:
http://www.nickycakes.com/files/LIB_http.zip
There are a bunch of files inside, but for most tasks you will probably only use two of them: LIB_http.php and LIB_parse.php. To use their functions, put the files in the same directory as your PHP script and add the line include("LIB_http.php"); to your script (and likewise for LIB_parse.php). Inside each file is a description of the functions it contains. If you’re writing a scraper, you’ll have to edit LIB_http.php a little so that your script identifies itself to websites (through the user-agent string it sends) as a browser rather than a PHP script.
Here’s a description of the most useful functions in these files for scraping websites:
LIB_http.php
- http_get($target, $ref)
You give it $target (the URL you want to grab) and $ref (where you want the website to think you came from), and it returns an array with three elements: FILE, STATUS, and ERROR. $return_array['FILE'] holds the contents of the web page, $return_array['STATUS'] holds the curl status of the transfer, and $return_array['ERROR'] holds the curl error status. Example:
$target = "http://www.google.com";
$ref = "http://www.yahoo.com";
$google_frontpage = http_get($target, $ref);
echo($google_frontpage['FILE']);
This displays the Google front page.
- http_post_form($target, $ref, $data_array)
Submits a form using the POST method. $target and $ref work the same as above. $data_array should be an associative array mapping each form field’s name to the value you’re submitting. Example:
$target = "https://login.facebook.com/login.php";
$ref = "http://www.facebook.com";
$data_array['email'] = "your@email.com";
$data_array['pass'] = "password";
$results = http_post_form($target, $ref, $data_array);
echo($results['FILE']);
Congrats… now you’re logged into Facebook. (You may have to run the script twice at first, while curl sets up your cookie file.)
- http_get_form($target,$ref,$data_array)
Works the same way as http_post_form, but submits the form with the GET method.
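For readers who'd rather skip the library, here's a minimal sketch of roughly what those three calls do, written directly against PHP's curl extension. The function names (my_http(), my_http_get(), my_http_get_form(), my_http_post_form(), form_get_url()), the Firefox user-agent string, and the cookie-file location are all assumptions for illustration. This is not LIB_http's actual code, and here STATUS is simply the array returned by curl_getinfo().

```php
<?php
// Hypothetical stand-ins for http_get / http_get_form / http_post_form,
// written directly against PHP's curl extension. Not LIB_http's actual code.
define('USER_AGENT', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0'); // example browser UA
define('COOKIE_FILE', sys_get_temp_dir() . '/scraper_cookies.txt'); // assumed location

// Shared request helper; options in $extra override the defaults below.
function my_http(string $target, string $ref, array $extra = []): array {
    $ch = curl_init($target);
    curl_setopt_array($ch, $extra + [
        CURLOPT_RETURNTRANSFER => true,        // return the page instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects
        CURLOPT_REFERER        => $ref,        // where the site thinks you came from
        CURLOPT_USERAGENT      => USER_AGENT,  // look like a browser, not a PHP script
        CURLOPT_COOKIEFILE     => COOKIE_FILE, // send cookies saved by earlier runs
        CURLOPT_COOKIEJAR      => COOKIE_FILE, // save cookies when the handle is freed
        CURLOPT_TIMEOUT        => 25,
    ]);
    $file = curl_exec($ch);
    return [
        'FILE'   => ($file === false) ? '' : $file,
        'STATUS' => curl_getinfo($ch),  // transfer info: HTTP code, timing, final URL
        'ERROR'  => curl_error($ch),    // empty string on success
    ];
}

function my_http_get(string $target, string $ref): array {
    return my_http($target, $ref);
}

// A GET form submission is just the fields url-encoded onto the URL.
function form_get_url(string $target, array $data): string {
    $query = http_build_query($data);
    if ($query === '') {
        return $target;
    }
    $sep = (strpos($target, '?') === false) ? '?' : '&';
    return $target . $sep . $query;
}

function my_http_get_form(string $target, string $ref, array $data): array {
    return my_http(form_get_url($target, $data), $ref);
}

// A POST form submission sends the same url-encoded fields in the request body.
function my_http_post_form(string $target, string $ref, array $data): array {
    return my_http($target, $ref, [
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => http_build_query($data),
    ]);
}
```

The cookie options are what make a login stick between requests: curl reads cookies from the file before each request and writes them back when the handle is freed, which may be why a login script can need a second run before the cookie file exists. For example, form_get_url('http://www.example.com/search', ['q' => 'php scraping', 'page' => 2]) gives http://www.example.com/search?q=php+scraping&page=2.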
LIB_parse.php
Ok, so it’s probably better for you to just open this one up and read the comments. There are a few simple functions in here that let you parse most websites without knowing how to write regular expressions. PHP’s regular expression functions are hard for a newbie to learn, and a sloppy pattern can be slow or match far more than you meant, so these helpers are an easier place to start when parsing websites.
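To show how far plain string functions get you, here's a minimal sketch of a "text between two markers" helper built only on strpos() and substr(). text_between() is a hypothetical name written in the spirit of LIB_parse, not its actual code; it matches case-sensitively and returns only the first match.

```php
<?php
// Hypothetical helper, not LIB_parse's actual code: returns the text between the
// first occurrence of $start and the next $stop, or '' if either is missing.
function text_between(string $string, string $start, string $stop): string {
    $begin = strpos($string, $start);
    if ($begin === false) {
        return '';
    }
    $begin += strlen($start);               // skip past the opening marker
    $end = strpos($string, $stop, $begin);  // look for the closing marker after it
    if ($end === false) {
        return '';
    }
    return substr($string, $begin, $end - $begin);
}

$html = '<html><head><title>My Page</title></head></html>';
echo text_between($html, '<title>', '</title>'), "\n"; // prints My Page
```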
If you are going to be scraping websites that require you to submit form information, you will want to download and install the Web Developer Toolbar for Firefox to help you figure out the form field information in a hurry without viewing the page source.
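If you'd rather pull the same form details out inside your script, PHP's built-in DOM extension can list them. This is a minimal sketch; list_forms() is a hypothetical helper, and it groups fields by tag type (inputs, then selects, then textareas) rather than strict page order.

```php
<?php
// Hypothetical helper: lists each form's method, action, and named fields
// (with default values), roughly what the toolbar's form view shows.
function list_forms(string $html): array {
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; don't spew warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    $forms = [];
    foreach ($doc->getElementsByTagName('form') as $form) {
        $fields = [];
        foreach (['input', 'select', 'textarea'] as $tag) {
            foreach ($form->getElementsByTagName($tag) as $el) {
                $name = $el->getAttribute('name');
                if ($name !== '') {           // unnamed fields aren't submitted
                    $fields[$name] = $el->getAttribute('value');
                }
            }
        }
        $forms[] = [
            'method' => strtoupper($form->getAttribute('method') ?: 'GET'), // HTML default is GET
            'action' => $form->getAttribute('action'),
            'fields' => $fields,
        ];
    }
    return $forms;
}
```

Hidden fields show up in the results too, which matters because many forms reject submissions that leave them out.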
Hope this helps.