Wikipedia Scraper

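For a recent project, Nickycakes had to code a Wikipedia scraper. Here's a simplified version of the function, free for you to use if you want it. It downloads an English Wikipedia article, keeps the text of its paragraphs, and strips out HTML tags and bracketed markers such as citation numbers. It depends on two library files, LIB_http.php and LIB_parse.php, both included in LIB_http.zip.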

Enjoy:

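<?php
// LIB_http.php provides http_get(); LIB_parse.php provides parse_array().
// require_once stops with an error if a library file is missing.
require_once('LIB_http.php');
require_once('LIB_parse.php');

// Fetch the English Wikipedia article for $topic and return its paragraph text.
function wikiscrape($topic){
   // MediaWiki titles use underscores for spaces; rawurlencode() escapes the rest.
   $target = "https://en.wikipedia.org/wiki/".rawurlencode(str_replace(' ', '_', $topic));
   $results = http_get($target,"");
   // Collect every <p>...</p> block from the downloaded page.
   $paragraphs = parse_array($results['FILE'],"<p>","</p>",EXCL);
   $final = "";
   foreach($paragraphs as $paragraph){
     $paragraph = strip_tags($paragraph);
     // Remove bracketed markers such as [1]. [^\]]* stops at the first "]",
     // whereas a greedy .* would delete everything between the first "[" and the last "]".
     $paragraph = preg_replace('/\[[^\]]*\]/',"",$paragraph);
     // Skip paragraphs that are empty or whitespace-only after cleaning.
     if (trim($paragraph) !== ""){
       $final = $final . $paragraph . "\n\n";
     }
   }
   return $final;
}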
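The function above needs LIB_http.php, LIB_parse.php and a live network connection. If you want to try the URL-building and text-cleaning steps on their own, here is a minimal, self-contained sketch that swaps parse_array() for PHP's built-in preg_match_all(). The names wiki_url() and extract_paragraphs() are hypothetical, and a regular expression is only a rough stand-in for a real HTML parser, so treat this as an illustration rather than a drop-in replacement.

```php
<?php
// Hypothetical, dependency-free sketch of wikiscrape()'s URL and cleaning steps.
// Assumption: a regex over <p>...</p> is good enough for the HTML you feed it.

// Build an article URL; MediaWiki titles use underscores in place of spaces.
function wiki_url(string $topic): string
{
    return 'https://en.wikipedia.org/wiki/' . rawurlencode(str_replace(' ', '_', $topic));
}

// Return the plain text of each non-empty <p> element, separated by blank lines.
function extract_paragraphs(string $html): string
{
    // \b keeps <pre> from matching; the s flag lets "." span newlines; *? is non-greedy.
    preg_match_all('#<p\b[^>]*>(.*?)</p>#si', $html, $matches);

    $paragraphs = [];
    foreach ($matches[1] as $inner) {
        $text = strip_tags($inner);
        // Drop bracketed markers such as [1] or [citation needed], one pair at a time.
        $text = preg_replace('/\[[^\]]*\]/', '', $text);
        // Decode entities such as &amp; and trim surrounding whitespace.
        $text = trim(html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8'));
        if ($text !== '') {
            $paragraphs[] = $text;
        }
    }
    return implode("\n\n", $paragraphs);
}

// Usage with a made-up HTML fragment (no network access needed).
$html = '<p>PHP is a <b>scripting</b> language.<sup>[1]</sup></p>'
      . '<p>  </p><pre>not a paragraph</pre>'
      . '<p>Fish &amp; chips[citation needed] are popular.</p>';

echo wiki_url('Web scraping'), "\n";
echo extract_paragraphs($html), "\n";
```

Matching `[^\]]*` instead of `.*` means each bracket pair is removed separately, so text sitting between two citation markers survives.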
