PHP Web Scraping for Munin

Someone else maintains a web page containing figures that I would like to graph in Munin. The HTML on that page is a bit ugly and looks like this:

<TR ALIGN = "Center">
<TD ALIGN = "Left" WIDTH = "49%" BGCOLOR = "#ECECEC">
<B>Building</B>
</TD>
<TD ALIGN = "Center" WIDTH = "9%" BGCOLOR = "#A0F0A0">
<B>8</B>
</TD>
<TD ALIGN = "Center" WIDTH = "7%" BGCOLOR = "#ECECEC">
22
</TD>
<TD ALIGN = "Center" WIDTH = "6%" BGCOLOR = "#E0E0E0">
14
</TD>
<TD ALIGN = "Center" WIDTH = "29%" BGCOLOR = "#D0F0D0">
&nbsp;
</TD>
</TR>

So I read this article about Web Scraping in PHP and created the following as my Munin plugin:

#!/usr/bin/php
<?php

// Fetch the page; stop with an error if the download fails.
$html = file_get_contents("http://www.somewebsite.com/thewebpage.html");
if ($html === false) {
    fwrite(STDERR, "Could not fetch the page\n");
    exit(1);
}

// Capture groups: 1 = name, 2 to 4 = the three numeric cells, 5 = last cell.
preg_match_all(
    '/<TR ALIGN = "Center">.*?<TD ALIGN = "Left" WIDTH = "49%" BGCOLOR = "#ECECEC">.*?<B>(.*?)<\/B>.*?<\/TD>.*?<TD ALIGN = "Center" WIDTH = "9%" BGCOLOR = "#A0F0A0">.*?<B>(.*?)<\/B>.*?<\/TD>.*?<TD ALIGN = "Center" WIDTH = "7%" BGCOLOR = "#ECECEC">(.*?)<\/TD>.*?<TD ALIGN = "Center" WIDTH = "6%" BGCOLOR = "#E0E0E0">(.*?)<\/TD>.*?<TD ALIGN = "Center" WIDTH = "29%" BGCOLOR = "#D0F0D0">(.*?)<\/TD>.*?<\/TR>/s',
    $html,
    $posts, // will contain one entry per matching table row
    PREG_SET_ORDER // groups the captures row by row
);

if ((count($argv) > 1) && ($argv[1] == 'config'))
{
    print("graph_title SCC Usage\n");
    print("graph_category SCC\n");
    print("graph_vlabel Computers in Use\n");
    $count = 0;
    foreach ($posts as $post) {
        $scc_name = trim($post[1]);
        // Munin field names should not start with a digit, so prefix the counter.
        print('scc' . $count . '.label ' . $scc_name . "\n");
        $count++;
    }
    //      print("total.label Total\n");
    exit();
}
$total = 0;
$count = 0;
foreach ($posts as $post) {
    $scc_name = trim($post[1]);
    $scc_free = trim($post[2]);
    $scc_max = trim($post[3]);
    $scc_used = trim($post[4]);
    $scc_comment = trim($post[5]);
    // Emit the number of computers in use for this row.
    //        echo $scc_name." ".$scc_used."\n";
    print('scc' . $count . '.value ' . $scc_used . "\n");
    $total = $total + (int) $scc_used;
    $count++;
}
//print('total.value ' .$total. "\n");
?>
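
A quick way to check the parsing without hitting the live page is to run a pattern against a saved copy of one row. The sketch below is only an illustration: munin_field_name() and $sample are made-up names, and the looser pattern that ignores the cell attributes is my own assumption rather than the one used in the plugin. It also assumes Munin wants field names built from letters, digits and underscores that do not start with a digit, and that the real page uses plain ASCII double quotes.

```php
<?php
// Hypothetical helper: turn a row label into a Munin-safe field name.
function munin_field_name(string $label): string
{
    // Lowercase, then replace anything outside [a-z0-9_] with an underscore.
    $name = preg_replace('/[^a-z0-9_]/', '_', strtolower(trim($label)));
    // Prefix names that are empty or start with a digit.
    if ($name === '' || preg_match('/^[0-9]/', $name)) {
        $name = '_' . $name;
    }
    return $name;
}

// A saved copy of one table row (sample data, not fetched from the site).
$sample = <<<'HTML'
<TR ALIGN = "Center">
<TD ALIGN = "Left" WIDTH = "49%" BGCOLOR = "#ECECEC">
<B>Building</B>
</TD>
<TD ALIGN = "Center" WIDTH = "9%" BGCOLOR = "#A0F0A0">
<B>8</B>
</TD>
<TD ALIGN = "Center" WIDTH = "7%" BGCOLOR = "#ECECEC">
22
</TD>
<TD ALIGN = "Center" WIDTH = "6%" BGCOLOR = "#E0E0E0">
14
</TD>
<TD ALIGN = "Center" WIDTH = "29%" BGCOLOR = "#D0F0D0">
&nbsp;
</TD>
</TR>
HTML;

// One cell: any TD attributes, optional <B> wrapper, captured content.
$cell = '<TD[^>]*>\s*(?:<B>)?(.*?)(?:<\/B>)?\s*<\/TD>';
// A row is five such cells inside a TR (assumed layout, as in the sample).
$pattern = '/<TR[^>]*>\s*' . str_repeat($cell . '\s*', 5) . '<\/TR>/is';

preg_match_all($pattern, $sample, $rows, PREG_SET_ORDER);

foreach ($rows as $row) {
    $field = munin_field_name($row[1]); // first cell: name
    $used = (int) trim($row[4]);        // fourth cell: computers in use
    print($field . '.value ' . $used . "\n");
}
// prints: building.value 14
```

Naming fields after the row label rather than a counter keeps the graph history attached to the right row if the order of rows on the page changes, but it does depend on the labels staying unique.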
