WordPress Tips: Building a Spider Crawl Tracking System by Hand

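Every time I wanted to look at the spider crawl records for my blog, it gave me a headache. It was too much trouble: you have to log in over FTP, find the log, load it into a spreadsheet yourself, and then filter out the different spiders one by one. So I had long been looking for a way to record how often spiders crawl the site. WordPress's powerful plugin ecosystem can certainly do this, but with too many plugins installed the site becomes painfully slow, so I wanted to do it in code instead. And I actually found a way: it not only does what I wanted, it also works better than a plugin.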

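As you can see, one click takes you to my spider crawl system. Pretty powerful, right? Special thanks to 千丝海阁 for providing such a great method. His blog: 千丝海阁

(Screenshot: the spider crawl system)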

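Here is how to set it up. It is very simple and takes only two steps: first, copy and paste a block of code; second, create a new page.

That's all there is to it.

Here is the code first. Please keep the author's comment block: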

<?php
/*
File purpose: automatically analyze search engine spider crawl logs in WordPress
Usage: http://seofangfa.com/wordpress-study/wordpress-spider.html
Original author: 千丝海阁
Modified by: 方法@http://seofangfa.com
Modified on: 2015.7.8
*/
// Spider log analysis: start
make_log_file();
function make_log_file(){
    // Log file name (a relative path, so it resolves against the current working directory)
    $filename = 'mylogs.txt';
    // Skip rc-ajax comment requests and wp-cron requests
    if(strstr($_SERVER["REQUEST_URI"],"rc-ajax") == false
    && strstr($_SERVER["REQUEST_URI"],"wp-cron.php") == false ) {
        // Timestamp in mdHis format, shifted forward 8 hours (UTC+8 when PHP runs in UTC)
        $word = date('mdHis',$_SERVER['REQUEST_TIME'] + 3600*8) . " ";
        // Requested page
        $word .= $_SERVER["REQUEST_URI"] ." ";
        // Protocol
        $word .= $_SERVER['SERVER_PROTOCOL'] ." ";
        // Method, POST or GET
        $word .= $_SERVER['REQUEST_METHOD'] . " ";
        //$word .= $_SERVER['HTTP_ACCEPT'] . " ";
        // Browser info
        $word .= getbrowser(). " ";
        // Query string
        $word .= "[". $_SERVER['QUERY_STRING'] . "] ";
        // Referer (not every request sends one)
        $word .= (isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '') . " ";
        // Client IP
        $word .= getIP() . " ";
        $word .= "\n";
        $fh = fopen($filename, "a");
        fwrite($fh, $word);
        fclose($fh);
    }
}
// Get the client IP address (stock code found online).
// HTTP_CLIENT_IP and HTTP_X_FORWARDED_FOR come from request headers, so a client can spoof them.
function getIP()
{
    if (getenv('HTTP_CLIENT_IP'))
    {
        $ip = getenv('HTTP_CLIENT_IP');
    }
    else if (getenv('HTTP_X_FORWARDED_FOR'))
    {
        $ip = getenv('HTTP_X_FORWARDED_FOR');
    }
    else if (getenv('REMOTE_ADDR'))
    {
        $ip = getenv('REMOTE_ADDR');
    }
    else
    {
        $ip = $_SERVER['REMOTE_ADDR'];
    }
    return $ip;
}
// Get browser info; mobile and tablet user agents are not handled yet.
// Note: a user agent that names both a browser and a bot (some crawlers include a Chrome token)
// is logged as the browser, so that bot's name will not appear in the log.
function getbrowser()
{
    $Agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    $browser = '';
    $browserver = '';
    // ereg() was removed in PHP 7; strpos() does the same case-sensitive substring check.
    if(strpos($Agent, 'Mozilla') !== false && strpos($Agent, 'Chrome') !== false)
    {
        $temp = explode('(', $Agent);
        $Part = $temp[2];
        $temp = explode('/', $Part);
        $browserver = $temp[1];
        $temp = explode(' ', $browserver);
        $browserver = $temp[0];
        $browser = 'Chrome';
    }
    if(strpos($Agent, 'Mozilla') !== false && strpos($Agent, 'Firefox') !== false)
    {
        $temp = explode('(', $Agent);
        $Part = $temp[1];
        $temp = explode('/', $Part);
        $browserver = $temp[2];
        $temp = explode(' ', $browserver);
        $browserver = $temp[0];
        $browser = 'Firefox';
    }
    if(strpos($Agent, 'Mozilla') !== false && strpos($Agent, 'Opera') !== false)
    {
        $temp = explode('(', $Agent);
        $Part = $temp[1];
        $temp = explode(')', $Part);
        $browserver = $temp[1];
        $temp = explode(' ', $browserver);
        $browserver = $temp[2];
        $browser = 'Opera';
    }
    if(strpos($Agent, 'Mozilla') !== false && strpos($Agent, 'MSIE') !== false)
    {
        $temp = explode('(', $Agent);
        $Part = $temp[1];
        $temp = explode(';', $Part);
        $Part = $temp[1];
        $temp = explode(' ', $Part);
        $browserver = $temp[2];
        $browser = 'Internet Explorer';
    }
    if($browser != '')
    {
        $browseinfo = $browser.' '.$browserver;
    }
    else
    {
        // Unrecognized agents (including most spiders) are logged with the raw user agent string
        $browseinfo = $Agent;
    }
    return $browseinfo;
}
function get_spider_log($atts) {
    extract(shortcode_atts(array(
        'text' => 'yes'),$atts));
    // Read the log over HTTP, as the original does (requires allow_url_fopen);
    // fall back to an empty string if the file cannot be read.
    $contents = @file_get_contents(site_url() ."/mylogs.txt");
    if($contents === false) {
        $contents = "";
    }
    $str = "";
    // Use the same 8-hour shift as make_log_file() so the date prefix matches the log
    $showtime = date("md", time() + 3600*8);
    if($text == "yes") {
        $str.= "Today's spider crawl log:";
        $str.= "<div style='background-color:#33A1C9;color:white;text-align:center;'>The following spiders are common in China.</div>";
    }
    $mytmp = array();
    // google
    $google = 0;
    if($text == "yes")
        $str.= "<a href=http://www.google.com/bot.html target=_blank>Google Spider</a>: ";
    $mytmp = show_spider_result($showtime,$contents,"Googlebot\/",$text);
    $google += $mytmp[0];
    $str.= $mytmp[1];
    $mytmp = show_spider_result($showtime,$contents,"Googlebot-Image\/",$text);
    $google += $mytmp[0];
    $str.= $mytmp[1];
    $mytmp = show_spider_result($showtime,$contents,"Googlebot-Mobile\/",$text);
    $google += $mytmp[0];
    $str.= $mytmp[1];
    $mytmp = show_spider_result($showtime,$contents,"Feedfetcher-Google",$text);
    $google += $mytmp[0];
    $str.= $mytmp[1];
    // baidu
    $baidu = 0;
    if($text == "yes")
        $str.= "<br><a href=http://www.baidu.com/search/spider.html target=_blank>Baidu Spider</a>: ";
    $mytmp = show_spider_result($showtime,$contents,"Baiduspider\/",$text);
    $baidu += $mytmp[0];
    $str.= $mytmp[1];
    $mytmp = show_spider_result($showtime,$contents,"Baiduspider-image",$text);
    $baidu += $mytmp[0];
    $str.= $mytmp[1];
    // bing
    $bing = 0;
    if($text == "yes")
        $str.= "<br><a href=http://www.bing.com/bingbot.htm target=_blank>bingbot Spider</a>: ";
    $mytmp = show_spider_result($showtime,$contents,"bingbot\/",$text);
    $bing += $mytmp[0];
    $str.= $mytmp[1];
    $mytmp = show_spider_result($showtime,$contents,"msnbot-media\/",$text);
    $bing += $mytmp[0];
    $str.= $mytmp[1];
    // sogou
    $sogou = 0;
    if($text == "yes")
        $str.= "<br><a href=http://www.sogou.com/docs/help/webmasters.htm#07 target=_blank>Sogou Spider</a>: ";
    $mytmp = show_spider_result($showtime,$contents,"Sogou web spider\/",$text);
    $sogou += $mytmp[0];
    $str.= $mytmp[1];
    // soso
    $soso = 0;
    if($text == "yes")
        $str.= "<br><a href=http://help.soso.com/webspider.htm target=_blank>Soso Spider</a>: ";
    $mytmp = show_spider_result($showtime,$contents,"Sosospider\/",$text);
    $soso += $mytmp[0];
    $str.= $mytmp[1];
    if($text == "yes")
        $str.= "<div style='background-color:#FA8072;color:white;text-align:center;'>The following are junk spiders; their crawling can be blocked.</div>";
    // jike
    $else = 0;
    if($text == "yes")
        $str.= "<a href=http://shoulu.jike.com/spider.html target=_blank>Jike Spider</a>: ";
    $mytmp = show_spider_result($showtime,$contents,"JikeSpider",$text);
    $else += $mytmp[0];
    $str.= $mytmp[1];
    // easou
    if($text == "yes")
        $str.= "<br><a href=http://www.easou.com/search/spider.html target=_blank>Easou Spider</a>: ";
    $mytmp = show_spider_result($showtime,$contents,"EasouSpider",$text);
    $else += $mytmp[0];
    $str.= $mytmp[1];
    // yisou
    if($text == "yes")
        $str.= "<br>YisouSpider:";
    $mytmp = show_spider_result($showtime,$contents,"YisouSpider",$text);
    $else += $mytmp[0];
    $str.= $mytmp[1];
    // yandex
    if($text == "yes")
        $str.= "<br><a href=http://yandex.com/bots target=_blank>YandexBot Spider</a>: ";
    $mytmp = show_spider_result($showtime,$contents,"YandexBot\/",$text);
    $else += $mytmp[0];
    $str.= $mytmp[1];
    // mail.ru
    if($text == "yes")
        $str.= "<br><a href=http://go.mail.ru/help/robots target=_blank>Mail.RU Spider</a>: ";
    $mytmp = show_spider_result($showtime,$contents,"Mail.RU_Bot\/",$text);
    $else += $mytmp[0];
    $str.= $mytmp[1];
    // acoon
    if($text == "yes")
        $str.= "<br><a href=http://www.acoon.de/robot.asp target=_blank>AcoonBot Spider</a>: ";
    $mytmp = show_spider_result($showtime,$contents,"AcoonBot\/",$text);
    $else += $mytmp[0];
    $str.= $mytmp[1];
    // exabot
    if($text == "yes")
        $str.= "<br><a href=http://www.exabot.com/go/robot target=_blank>Exabot Spider</a>: ";
    $mytmp = show_spider_result($showtime,$contents,"Exabot\/",$text);
    $else += $mytmp[0];
    $str.= $mytmp[1];
    // spbot
    if($text == "yes")
        $str.= "<br><a href=http://www.seoprofiler.com/bot target=_blank>spbot Spider</a>: ";
    $mytmp = show_spider_result($showtime,$contents,"spbot\/",$text);
    $else += $mytmp[0];
    $str.= $mytmp[1];
    $str.= draw_canvas($google,$baidu,$bing,$sogou,$soso,$else);
    return $str;
}
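// Count today's log lines for one spider. Returns array(hit count, HTML summary).
function show_spider_result($time,$contents,$str,$text){
    $count = array(0, "");
    // Anchor the date prefix to the start of a line (m flag) so that, for example,
    // "0708" cannot match inside another day's timestamp such as "1007080000".
    $count[0] = preg_match_all("/^".$time."\d*\s\/\S*\s.*".$str."/m",$contents,$mymatches);
    if($text == "yes") {
        // Strip the regex escape "\/" from the spider name before displaying it
        $str = str_replace('\\/', '', $str);
        $count[1].= "<br> Spider type => ".$str.": crawl count = ".$count[0];
        if($count[0] > 0) {
            // Characters 4-9 of the last match are the His part of the mdHis timestamp
            $tmp = substr($mymatches[0][$count[0]-1],4,6);
            $tmp = substr($tmp,0,2) .":" . substr($tmp,2,2) .":" .substr($tmp,4,2) ;
            $count[1].= " Last crawl time: ". $tmp;
        }
    }
    return $count;
}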
// Build the summary box and pie chart for the per-engine totals.
function draw_canvas($google,$baidu,$bing,$sogou,$soso,$else){
    $tmp = $google + $baidu + $bing + $sogou + $soso + $else;
    if($tmp == 0) {
        return "<br><br>Not enough data to draw the chart.<br><br>";
    }
    $google2 = $google*100/$tmp;
    $baidu2 = $baidu*100/$tmp;
    $bing2 = $bing*100/$tmp;
    $sogou2 = $sogou*100/$tmp;
    $soso2 = $soso*100/$tmp;
    $else2 = $else*100/$tmp;
    $str = "<br><div style='border-top: 1px solid #e6e6e6;'><br>
<div style='float:left;width:150px;border-width:1px;border-style:groove;padding:15px;'><b>Spider crawl chart:</b><br>";
    // Same 8-hour shift as the log timestamps
    $str.= "Date: " . date("Y-m-d", time() + 3600*8);
    $str.= "<br>Total spider crawls: ". $tmp . "<br>";
    $str.= "<li><span style='color:#33A1C9;'>google: ". $google ." crawls (". intval($google2) ."%)</span></li>";
    $str.= "<li><span style='color:#0033ff;'>baidu: ". $baidu ." crawls (". intval($baidu2) ."%)</span></li>";
    $str.= "<li><span style='color:#872657;'>bing: ". $bing ." crawls (". intval($bing2) ."%)</span></li>";
    $str.= "<li><span style='color:#FF9912;'>sogou: ". $sogou ." crawls (". intval($sogou2) ."%)</span></li>";
    $str.= "<li><span style='color:#FF6347;'>soso: ". $soso ." crawls (". intval($soso2) ."%)</span></li>";
    // "else" takes whatever percentage remains so the rounded figures add up to 100
    $str.= "<li><span style='color:#55aa00;'>else: ". $else ." crawls (". (100 - intval($google2) - intval($baidu2) - intval($bing2) - intval($sogou2) - intval($soso2)) ."%)</span></li></div>";
    // Note: Google has deprecated the Image Charts endpoint used here, so the pie chart image may no longer load.
    $str.="<img src = 'http://chart.apis.google.com/chart?cht=p3&chco=33A1C9,0033ff,872657,FF9912,FF6347,55aa00&chd=t:".$google2 .",".$baidu2.",".$bing2.",".$sogou2.",".$soso2.",".$else2."&chs=400x200&chl=google|baidu|bing|sogou|soso|else' /></div><br>";
    return $str;
}
add_shortcode('spiderlogs','get_spider_log');
// Spider log analysis: end
?>
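To make the counting step easier to follow, here is a small standalone sketch of how a log line is matched. It is not part of the code to paste. The sample log lines, the IP addresses, and the function name count_spider_hits are made up for illustration, and the pattern is anchored to the start of each line, which is a slightly stricter variant of the matching idea used in show_spider_result.

```php
<?php
// Hypothetical sample of mylogs.txt in the format written by make_log_file():
// mdHis URI PROTOCOL METHOD browser [query] referer IP
// Paths, times, and IPs below are invented for this sketch.
$sample_log = implode("\n", array(
    '0708093015 /hello-world/ HTTP/1.1 GET Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) []  192.0.2.1 ',
    '0708101502 /about/ HTTP/1.1 GET Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html) []  192.0.2.2 ',
    '0708114520 /feed/ HTTP/1.1 GET Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) []  192.0.2.3 ',
    '0707235959 /old/ HTTP/1.1 GET Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) []  192.0.2.4 ',
)) . "\n";

// Count the lines for one day ($day is "md", e.g. "0708") that mention $spider
// (a regex fragment such as 'Googlebot\/'), and return array(count, last hit time).
function count_spider_hits($day, $contents, $spider) {
    // ^ with the m flag ties the date to the start of a line;
    // "." does not cross newlines, so each match stays within one log line.
    $pattern = '/^' . $day . '\d*\s\/\S*\s.*' . $spider . '/m';
    $hits = preg_match_all($pattern, $contents, $matches);
    $last = '';
    if ($hits > 0) {
        // Characters 4-9 of the last matching line are the "His" part of "mdHis"
        $stamp = substr($matches[0][$hits - 1], 4, 6);
        $last = substr($stamp, 0, 2) . ':' . substr($stamp, 2, 2) . ':' . substr($stamp, 4, 2);
    }
    return array($hits, $last);
}

$google = count_spider_hits('0708', $sample_log, 'Googlebot\/');
$baidu  = count_spider_hits('0708', $sample_log, 'Baiduspider\/');

echo "Googlebot: {$google[0]} hits, last at {$google[1]}\n";
echo "Baiduspider: {$baidu[0]} hits, last at {$baidu[1]}\n";
```

The line dated 0707 is skipped because only lines beginning with the requested day are counted, which is how the page ends up showing just the current day's crawls.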

 

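The code is fairly long. I have uploaded the full version to the shared files of the QQ group; anyone who needs it can join the group and download it: 522603693

OK. Once you have copied the code, paste it into your blog's functions.php file, right after the final ?> (the last question mark). Remember: after it, not before. If your functions.php does not end with ?>, leave out the opening <?php line of the pasted code, because nesting <?php inside PHP code is a syntax error. I have marked the spot in a screenshot:

(Screenshot: the functions.php file)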

 

Step two: create a new page, for example one named myblog, and paste the shortcode [spiderlogs] into its content:

(Screenshot: creating a new page)

 

Once that is done, just publish the page. You can then link to it from the bottom of your home page by editing footer.php:

(Screenshot: the edit in footer.php)
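The screenshot is not reproduced here, so the exact footer change is unknown. As one possible variation, the shortcode output could be rendered directly in footer.php instead of linking to the page. The sketch below assumes this approach; the helper name spider_summary_html is hypothetical, while do_shortcode() is WordPress's function for rendering shortcode text from PHP.

```php
<?php
// Hypothetical helper for footer.php; the name is made up for this sketch.
function spider_summary_html() {
    // do_shortcode() only exists once WordPress is loaded, so return nothing
    // when this file is run on its own.
    if (!function_exists('do_shortcode')) {
        return '';
    }
    // text="no" makes get_spider_log skip the per-spider list,
    // leaving only the summary box produced by draw_canvas.
    return do_shortcode('[spiderlogs text="no"]');
}

echo spider_summary_html();
```

Rendering the shortcode in the footer reads and scans the whole log on every page view, so linking to the dedicated page is likely the lighter option on a busy site.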

 

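And that's it. Not hard, right?

This is the 北京SEO blog, www.d-taro.com (息心's personal QQ: 369442071), an SEO and marketing learning platform that values fundamentals, discusses techniques openly, and sharpens thinking. Here you will find solid, practical material on online marketing: SEO, traffic generation, copywriting, marketing models, and more, from basic to advanced. The thorny problems you run into may well be solved here.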

 


Reposting of this article is welcome: 北京SEO » WordPress Tips: Building a Spider Crawl Tracking System by Hand

Comments (3)

  1. 图王下载站: Not for me; I only like 帝国.
  2. 一元营销: This really is good, and I mean that honestly!
  3. zengda: I'll look into it and learn from it.