For a recent project, Nickycakes had to code a Wikipedia scraper. Here’s a simplified version of the function for you to use if you want. This code requires a few library files, which are included in LIB_http.zip.
Enjoy:
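// Library files from LIB_http.zip
include('LIB_http.php');
include('LIB_parse.php');

function wikiscrape($topic){
    // Wikipedia article titles use underscores in place of spaces
    $target = "http://en.wikipedia.org/wiki/".rawurlencode(str_replace(" ", "_", $topic));
    $results = http_get($target,"");
    $paragraphs = parse_array($results['FILE'],"<p>","</p>",EXCL);
    // Initialize the output so PHP does not warn about an undefined variable
    $final = "";
    foreach($paragraphs as $paragraph){
        $paragraph = strip_tags($paragraph);
        // Remove bracketed markers such as [1] or [citation needed], one pair at a time
        $paragraph = preg_replace('/\[[^\]]*\]/',"",$paragraph);
        if (trim($paragraph) !== ""){
            $final .= $paragraph . "\n\n";
        }
    }
    return $final;
}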
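If you want to try the cleanup step without downloading the library files, here is a minimal sketch that uses only built-in PHP functions. The helper name clean_paragraphs and the sample HTML are made up for illustration and are not part of LIB_http.zip. It also matches <p> tags that carry attributes, which is an assumption about what you would want from real pages.

```php
<?php
// Hypothetical helper, not part of LIB_parse.php: pulls the <p> blocks
// out of an HTML string and returns them as cleaned plain text.
function clean_paragraphs($html) {
    // Match <p> with or without attributes, non-greedy, across newlines
    preg_match_all('/<p\b[^>]*>(.*?)<\/p>/is', $html, $matches);
    $final = "";
    foreach ($matches[1] as $paragraph) {
        $paragraph = strip_tags($paragraph);
        // Drop bracketed markers such as [1] or [citation needed]
        $paragraph = preg_replace('/\[[^\]]*\]/', '', $paragraph);
        $paragraph = trim($paragraph);
        if ($paragraph !== "") {
            $final .= $paragraph . "\n\n";
        }
    }
    return $final;
}

// Made-up sample input, including an empty paragraph that gets skipped
$sample = '<p class="intro">PHP is a <b>scripting</b> language.[1]</p>'
        . '<p></p>'
        . '<p>It runs on servers.[citation needed]</p>';
echo clean_paragraphs($sample);
```

The bracket pattern only removes one bracketed marker at a time, so text sitting between two footnote markers in the same paragraph is left alone.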
Published by nickycakes //
Wikipedia sucks. Well, for bringing relevant, mostly accurate content, it’s awesome, but who wants to compete with that? Search for nearly anything on Google and you will find a Wikipedia article at or near the top of the results. Wikipedia is the bane of many a webmaster’s existence, making it nearly impossible to rank at the #1 spot in many Google searches. The crafty, however, can use Wikipedia’s popularity to their advantage. People have been doing this almost as long as Wikipedia has been around, and they’ve brought about numerous changes to the way Wikipedia does things. Wikipedia has been adding rel="nofollow" to its outbound links for a while now, meaning that websites don’t get direct credit in Google for any links from the site. The moderators have been cracking down really hard on spam links as well, and you’ll often find a spammy link removed within an hour. So what can you do to get Wikipedia to work to your advantage?
Link Spam
Ok, nobody hates getting spammed more than Nickycakes. However, Nickycakes also loves to be hated, and the bottom line is, a link from a popular Wikipedia article can bring you a metric fuckton of traffic if it slips under the radar and doesn’t get edited out. Don’t be flagrant and deface 100 Wikipedia pages in an hour or anything, or you’ll just get them all removed by some mod. If you have a blog or something, maybe find a related Wikipedia article once a day or every couple of days, and add a link to one of your pages in the “External Links” section. Make it inconspicuous, format it like the rest of the links on the page, and hope it sticks. Your link is more likely to be removed from popular articles, so you don’t want to put your stuff on the page for China or something. Alternatively, you can just make your own Wikipedia article if you can’t find a related one, and then add links to that article from other related articles. This is all a bit basic and spammy, but it WILL get you traffic if you do it right.
So should you post with your home IP? Should you make an account? Well, you can post anonymously, but if you keep using the same IP, you will get banned. Fire up a proxy, or switch wireless networks, or use public computers at your college, or whatever, and you’ll have more chances to have your links stick without being edited out.
Generating Real Backlinks
Ok, so as was mentioned earlier, Wikipedia automatically nofollows all outbound links, so you don’t get credit from Google. Some other search engines reportedly give credit for nofollowed links, so it may help your rankings on those, for what it’s worth. But you will eventually get natural backlinks from having your site around Wikipedia if the links stay up long enough and are relevant. There are plenty of Wikipedia scraper sites out there which will happily pull your link and drop it on their site, and generally, if enough people visit the Wikipedia page, some of them will use your link on their own websites as an accepted related article.
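For reference, a nofollowed link is just an anchor tag with nofollow in its rel attribute. Here is a rough sketch for checking which links in a chunk of HTML carry it, using PHP's built-in DOMDocument. The function name link_follow_status and the sample markup are assumptions for illustration, not anything taken from Wikipedia's actual pages.

```php
<?php
// Hypothetical helper: maps each link's href to true if it is nofollowed.
function link_follow_status($html) {
    $doc = new DOMDocument();
    // Suppress parser warnings from imperfect real-world HTML
    @$doc->loadHTML($html);
    $status = array();
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href === '') {
            continue; // skip anchors without a target
        }
        // rel can hold several space-separated tokens, e.g. "external nofollow"
        $rel = preg_split('/\s+/', strtolower(trim($a->getAttribute('rel'))), -1, PREG_SPLIT_NO_EMPTY);
        $status[$href] = in_array('nofollow', $rel, true);
    }
    return $status;
}

// Made-up sample markup: one nofollowed external link, one plain internal link
$links = '<a rel="nofollow" class="external" href="http://example.com/">Example</a> '
       . '<a href="/wiki/PHP">PHP</a>';
print_r(link_follow_status($links));
```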
Ruin Someone’s Google Rankings
Ok, so this is really cruddy, but works pretty well. Basically you find a Google keyword that someone you hate is ranking well for, say at #1. You find the Wikipedia page for the related topic, spiff it up, and add links to that page from other Wikipedia pages, which will increase its PageRank. With a little luck, it will rise to the #1 rank in Google for that keyword. This idea was discussed a while back on SEOmoz, and the Cakes found it hilarious.
Be a Trusted Moderator
If you want to get serious about messing with Wikipedia, you will have to get a trusted account. This means scouring Wikipedia for vandalism and adding correct information to other articles, in the hope of getting noticed and being given one or more of Wikipedia’s worthless “awards”. If you’re too lazy to do this the hard way, just spam Wikipedia with a proxy and then come along with your normal account and edit out the spam until you build yourself some respect. Holding a trusted account will allow you to make edits in your favor and be much more likely to get away with it.
Be Careful
If you overuse this, you will get banned from Wikipedia. They have a big list of banned sites, and aren’t afraid to add yours. It has been rumored that you can get penalized by Google for being on Wikipedia’s spam blacklist, but you should be ok if you don’t overuse it.