Commonly used remote-scraping functions in PHP

The function

/**
 * Fetch the content of a remote URL.
 * @param string $url
 * @return string|false the response body, or false on failure
 */
function get_url_content($url) {
    // The function name must be passed as a quoted string.
    if (function_exists('curl_init')) {
        $ch = curl_init();
        $timeout = 5; // seconds
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

        $file_contents = curl_exec($ch);
        curl_close($ch);
    } else {
        // Fall back to file_get_contents() when the curl extension is missing.
        $file_contents = file_get_contents($url);
    }

    return $file_contents;
}

Usage

$url = 'http://www.111cn.net';
$a = get_url_content($url);
echo $a;
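
Both curl_exec() and file_get_contents() return false on failure, so get_url_content() can return false as well, and callers may want to check for that before echoing the result. Below is a minimal sketch of such a check. The helper name fetch_or_default() is an assumption made for illustration and is not part of the original code; it takes the fetch function as a parameter so it can be tried without network access.

```php
<?php
/**
 * Hypothetical helper (not from the original article): call a fetch
 * function and fall back to a default value when it reports failure.
 *
 * @param callable $fetch   function taking a URL, returning string|false
 * @param string   $url
 * @param string   $default value to use when the fetch fails
 * @return string
 */
function fetch_or_default(callable $fetch, $url, $default = '')
{
    $result = $fetch($url);
    // Strict comparison: an empty page ('') is a valid result,
    // only boolean false signals failure.
    if ($result === false) {
        return $default;
    }
    return $result;
}
```

With the function from this article it could be called as fetch_or_default('get_url_content', 'http://www.111cn.net', 'fetch failed').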

The above is only a simple example. For real use, you can refer to the scraping program I wrote myself, described below.

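1, Fetch the target page's data;
2, Extract the relevant content;
3, Write it to a database / generate an HTML file.
Let's try it step by step!
Fetching the target page data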
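1, Decide on the address of the page to fetch and its URL pattern. The URL used here is: /index.html?pageconfig=catalog_byproducttype&intProductTypeID=1&strStartChar=A&intResultsPage=1&tr=59
The listing is paginated, and the pattern shows that only the intResultsPage parameter has to change to move between pages. That is, the URL has the form:

/index.html?pageconfig=catalog_byproducttype&intProductTypeID=1&strStartChar=A&intResultsPage=NUMBER&tr=59

NUMBER (shown in red in the original) stands for the current page number. Changing that value is all it takes.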
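
Paging can be sketched as a small function that fills in the intResultsPage value. The function name build_page_url() is an assumption for illustration, and the host www.abcam.cn is taken from the full URL used in this article's code examples.

```php
<?php
/**
 * Hypothetical helper: build the listing URL for a given page number.
 * Only intResultsPage changes between pages; the other parameters are
 * copied from the URL shown in the article.
 */
function build_page_url($page)
{
    $params = array(
        'pageconfig'       => 'catalog_byproducttype',
        'intProductTypeID' => 1,
        'strStartChar'     => 'A',
        'intResultsPage'   => (int) $page,
        'tr'               => 59,
    );
    // Pass '&' explicitly so the result does not depend on the
    // arg_separator.output ini setting.
    return 'http://www.abcam.cn/index.html?' . http_build_query($params, '', '&');
}

// Print the URLs of the first three pages.
for ($page = 1; $page <= 3; $page++) {
    echo build_page_url($page), "\n";
}
```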

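2, Fetch the page content. PHP has functions for this, of course, and any of the following will do:

file_get_contents() reads an entire file into a string. It is like file(), except that file() returns an array of lines while file_get_contents() returns a single string. It is the preferred way to read the contents of a file into a string, and it uses memory mapping to improve performance where the operating system supports it. Syntax: file_get_contents(path, include_path, context, start, max_length)

curl: for details, see the official documentation: http://cn.php.net/curl

fopen() opens a file or URL and returns FALSE if the open fails. Syntax: fopen(filename, mode, include_path, context)

We use the first one here. They all work in much the same way, so anyone interested can try the others.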
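
To make the difference concrete, here is a minimal sketch that reads the same data with file_get_contents() and with fopen(). It reads a temporary local file instead of a URL so that it runs without network access, and the helper names are made up for illustration.

```php
<?php
/** Hypothetical helper: read everything in one call. */
function read_with_file_get_contents($path)
{
    return file_get_contents($path);
}

/** Hypothetical helper: read the same data through a stream handle. */
function read_with_fopen($path)
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        return false; // fopen() returns false when the open fails
    }
    $data = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192); // read in chunks of up to 8 KB
        if ($chunk === false) {
            break;
        }
        $data .= $chunk;
    }
    fclose($fp);
    return $data;
}

// Demo on a temporary file; a URL works the same way when
// allow_url_fopen is enabled.
$tmp = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($tmp, "line 1\nline 2\n");

// Both approaches should yield the same string.
var_dump(read_with_file_get_contents($tmp) === read_with_fopen($tmp));
// file() returns an array of lines instead of one string.
var_dump(count(file($tmp)));
unlink($tmp);
```

file_get_contents() is the shorter choice when the whole response is needed at once, while the fopen() loop is useful when the data should be processed chunk by chunk.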

<?php
$oldcontent = file_get_contents("http://www.abcam.cn/index.html?pageconfig=catalog_byproducttype&intProductTypeID=1&strStartChar=A&intResultsPage=2&tr=59");
echo $oldcontent;
?>

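Run the PHP script and the code above displays the entire page. Because the original page uses absolute paths, it looks exactly the same as the original.
The next step is to extract the content. There are many ways to do this; the one introduced today is fairly simple: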

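<?php
$oldcontent = file_get_contents("http://www.abcam.cn/index.html?pageconfig=catalog_byproducttype&intProductTypeID=1&strStartChar=A&intResultsPage=2&tr=59");
// Start and end markers that delimit the fragment we want.
$pfirst = '<table border="0" cellspacing="0" cellpadding="0"> <tr> <th style="padding-left: 0px;"><p style="font-size:12px"><strong>Code</strong></p></th>';
$plast = 'Goat polyclonal';
$b = strpos($oldcontent, $pfirst);
// Search for the end marker only after the start marker.
$c = ($b === false) ? false : strpos($oldcontent, $plast, $b);
if ($b === false || $c === false) {
    die("Markers not found");
}
// The third argument of substr() is a length, not an end offset.
echo substr($oldcontent, $b, $c - $b);
?>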

The output is exactly the result we need!
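
The same start-marker / end-marker idea can be wrapped in a reusable function. This is a sketch only: the name extract_between() and the sample HTML string are assumptions for illustration, not part of the original program.

```php
<?php
/**
 * Hypothetical helper: return the part of $html that starts at $start
 * and ends just before $end, or false if either marker is missing.
 */
function extract_between($html, $start, $end)
{
    $from = strpos($html, $start);
    if ($from === false) {
        return false;
    }
    // Look for the end marker only after the start marker.
    $to = strpos($html, $end, $from + strlen($start));
    if ($to === false) {
        return false;
    }
    // substr() takes a length, so subtract the start offset.
    return substr($html, $from, $to - $from);
}

// Made-up sample input; prints the <table>...</table> part.
$html = '<p>intro</p><table><tr><td>A1</td></tr></table>Goat polyclonal';
echo extract_between($html, '<table>', 'Goat polyclonal');
```

Returning false when a marker is missing lets the caller distinguish "page layout changed" from an empty match.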
Writing to a database and writing to a file are both fairly simple. Here we write to a file:

<?php
// Same URL as in the previous example.
$oldcontent = file_get_contents("http://www.abcam.cn/index.html?pageconfig=catalog_byproducttype&intProductTypeID=1&strStartChar=A&intResultsPage=2&tr=59");
$pfirst = '<table border="0" cellspacing="0" cellpadding="0"> <tr> <th style="padding-left: 0px;"><p style="font-size:12px"><strong>Code</strong></p></th>';
$plast = 'Goat polyclonal';
$b = strpos($oldcontent, $pfirst);
$c = ($b === false) ? false : strpos($oldcontent, $plast, $b);
if ($b === false || $c === false) {
    die("Markers not found");
}
$a = substr($oldcontent, $b, $c - $b); // third argument is a length
$file = date('YmdHis') . ".html";
// file_put_contents() opens, writes and closes the file itself,
// so no separate fopen()/fclose() is needed.
if (file_put_contents($file, $a) === false) {
    die("File " . $file . " can not be written");
}
echo "success";
?>
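
One thing to watch when saving several pages in a loop: date('YmdHis') has one-second resolution, so two pages saved within the same second would get the same file name. A minimal sketch of one way around this, assuming a hypothetical helper save_page_fragment() that adds the page number to the file name:

```php
<?php
/**
 * Hypothetical helper: save one page's fragment under $dir and return
 * the path written, or false on failure. The page number is part of
 * the file name so pages saved within the same second do not collide.
 */
function save_page_fragment($dir, $page, $html)
{
    $file = rtrim($dir, '/') . '/' . date('YmdHis') . '_p' . (int) $page . '.html';
    // LOCK_EX takes an exclusive lock on the file while writing.
    if (file_put_contents($file, $html, LOCK_EX) === false) {
        return false;
    }
    return $file;
}

// Usage: save two made-up fragments into the system temp directory.
$path1 = save_page_fragment(sys_get_temp_dir(), 1, '<table>page 1</table>');
$path2 = save_page_fragment(sys_get_temp_dir(), 2, '<table>page 2</table>');
echo $path1, "\n", $path2, "\n";
```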

OK, back to work. That's all for today's extraction; next time I'll talk about extracting content with regular expressions.

Time: 2024-09-19 23:58:04
