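Problem description
I want to build a data scraper that pulls the draw results from the China Sports Lottery site. I don't know where to start; could someone walk me through it in detail?
Solutions
Solution 2:
You can fetch the China Sports Lottery site's pages over HTTP, then run regular expressions against the returned HTML to pick out the draw results...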
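A minimal sketch of the "match the HTML with regular expressions" step, split into two smaller patterns: one that finds each result row and one that pulls the digits out of that row's nested table. It assumes the markup looks like the draw table quoted later in this thread; the `DrawParser` class, its `Parse` method, and both patterns are illustrative names and guesses, not anything the site or the other posters provide, and the HTTP request is left out so the parsing can be tried on a plain string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DrawParser
{
    // Stage 1 (assumed row layout): <tr><td><font>game</font></td><td>issue</td><td><table>...</table>
    private static readonly Regex RowPattern = new Regex(
        @"<tr[^>]*>\s*<td[^>]*>\s*<font>(?<game>[^<]+)</font>\s*</td>\s*" +
        @"<td[^>]*>(?<issue>\d+)</td>\s*<td[^>]*>\s*<table[^>]*>(?<cells>[\s\S]*?)</table>",
        RegexOptions.IgnoreCase);

    // Stage 2: cells of the nested table that contain only digits (spacer cells are empty).
    private static readonly Regex NumberPattern = new Regex(
        @"<td[^>]*>(\d+)</td>", RegexOptions.IgnoreCase);

    public static List<(string Game, string Issue, string Numbers)> Parse(string html)
    {
        var results = new List<(string Game, string Issue, string Numbers)>();
        foreach (Match row in RowPattern.Matches(html))
        {
            // Run the second pattern only inside the nested table captured by the first.
            IEnumerable<string> numbers = NumberPattern
                .Matches(row.Groups["cells"].Value)
                .Cast<Match>()
                .Select(m => m.Groups[1].Value);

            results.Add((
                row.Groups["game"].Value.Trim(),
                row.Groups["issue"].Value,
                string.Join(" ", numbers)));
        }
        return results;
    }
}
```

Calling `DrawParser.Parse(html)` on the downloaded page text would give one tuple per matched row. Splitting the work this way keeps each pattern short enough to debug on its own, at the cost of a second pass over every row.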
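Solution 3:
// Requires: using System; using System.IO; using System.Net; using System.Text; using System.Text.RegularExpressions;
protected void Button1_Click(object sender, EventArgs e)
{
    WebRequest wc = HttpWebRequest.Create("http://www.cznd.gov.cn/node/jrgxq_qnyw/2013-7-5/137512575342148320.html");
    wc.ContentType = "application/x-www-form-urlencoded;charset=gb2312";
    using (WebResponse wq = wc.GetResponse())
    {
        using (Stream s = wq.GetResponseStream())
        {
            // The page is decoded as gb2312.
            using (StreamReader sr = new StreamReader(s, Encoding.GetEncoding("gb2312")))
            {
                string html = sr.ReadToEnd();
                // Capture the first <p> inside the <td> whose class starts with NewsContent1.
                Match m = Regex.Match(html, @"(?i)<td[^>]*?class=(['""]?)NewsContent1[^>]*?>\s*?<p[^>]*?>\s*?([\s\S]*?)</p>");
                string result = m.Groups[2].Value;
                Console.Write(result);
                Console.ReadLine();
            }
        }
    }
}
An example I came across a few days ago.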
Solution 4:
</div>
<TABLE width="366" align="center" cellpadding="0" cellspacing="0" style="color:#4a4a48;">
<TR bgcolor="#ececec" align="center"><TD width="54" height="24">玩法</TD><TD width="50">期号</TD><TD width="166">开奖号</TD><TD width="32"><FONT style="font-size:13px;">详情</FONT></TD><TD width="32"><FONT style="font-size:13px;">历史</FONT></TD><TD width="32"><FONT style="font-size:13px;">图表</FONT></TD></TR>
<TR align="center"><TD height="40"><FONT>大乐透</FONT></TD><TD>13082</TD><TD align="left"><TABLE width='159' height='21' align='left' cellpadding='0' cellspacing='0' style='color:#ffffff;font-weight:bold;font-family:宋体;'><TR align='center'><TD width='21' background='/images/20055.gif' style='color:#ffffff'>03</TD><TD width='2'></TD><TD width='21' background='/images/20055.gif' style='color:#ffffff'>09</TD><TD width='2'></TD><TD width='21' background='/images/20055.gif' style='color:#ffffff'>25</TD><TD width='2'></TD><TD width='21' background='/images/20055.gif' style='color:#ffffff'>26</TD><TD width='2'></TD><TD width='21' background='/images/20055.gif' style='color:#ffffff'>33</TD><TD width='2'></TD><TD width='21' background='/images/20056.gif' style='color:#ffffff'>03</TD><TD width='2'></TD><TD width='21' background='/images/20056.gif' style='color:#ffffff'>12</TD></TR></TABLE></TD>
<TD><A href='/news/11010219.shtml' target='_blank'><IMG src='/images/20014.gif' border='0'/></A></TD><TD><A href='/lottery/dlt/History.aspx' target="_blank"><IMG src="/images/20016.gif" border="0"/></A></TD><TD><A href='http://data.lottery.gov.cn/chart_tc2/chart.shtml?LotID=23529&ChartID=20001&StatType=0&MinIssue=2012026&MaxIssue=2012125&IssueTop=100&tab=0' target="_blank"><IMG src="/images/20017.gif" border="0"/></A></TD></TR>
<TR><TD colspan='7' height='1' background='/images/20022.gif'></TD></TR>
<TR align="center"><TD height="40"><FONT>排列3</FONT></TD><TD>13191</TD><TD align="left"><TABLE width='67' height='21' align='left' cellpadding='0' cellspacing='0' style='color:#000000;font-weight:bold;font-family:宋体;'><TR align='center'><TD width='21' background='/images/20057.gif' style='color:#ffffff'>4</TD><TD width='2'></TD><TD width='21' background='/images/20057.gif' style='color:#ffffff'>7</TD><TD width='2'></TD><TD width='21' background='/images/20057.gif' style='color:#ffffff'>3</TD></TR></TABLE></TD>
<TD><A href='/news/11010220.shtml' target='_blank'><IMG src='/images/20014.gif' border='0'/></A></TD><TD><A href='/lottery/pls/History.aspx' target="_blank"><IMG src="/images/20016.gif" border="0"/></A></TD><TD><A href='http://data.lottery.gov.cn/chart_tc2/chart.shtml?LotID=33&ChartID=20001&StatType=0&MinIssue=2012263&MaxIssue=2012292&IssueTop=30&tab=0' target="_blank"><IMG src="/images/20017.gif" border="0"/></A></TD></TR>
<TR><TD colspan='7' height='1' background='/images/20022.gif'></TD></TR>
<TR align="center"><TD height="40"><FONT>排列5</FONT></TD><TD>13191</TD><TD align="left"><TABLE width='113' height='21' align='left' cellpadding='0' cellspacing='0' style='color:#000000;font-weight:bold;font-family:宋体;'><TR align='center'><TD width='21' background='/images/20057.gif' style='color:#ffffff'>4</TD><TD width='2'></TD><TD width='21' background='/images/20057.gif' style='color:#ffffff'>7</TD><TD width='2'></TD><TD width='21' background='/images/20057.gif' style='color:#ffffff'>3</TD><TD width='2'></TD><TD width='21' background='/images/20057.gif' style='color:#ffffff'>4</TD><TD width='2'></TD><TD width='21' background='/images/20057.gif' style='color:#ffffff'>4</TD></TR></TABLE></TD>
<TD><A href='/news/11010221.shtml' target='_blank'><IMG src='/images/20014.gif' border='0'/></A></TD><TD><A href='/lottery/plw/History.aspx' target="_blank"><IMG src="/images/20016.gif" border="0"/></A></TD><TD><A href='http://data.lottery.gov.cn/chart_tc2/chart.shtml?LotID=35&ChartID=20001&StatType=0&MinIssue=&MaxIssue=&IssueTop=30' target="_blank"><IMG src="/images/20017.gif" border="0"/></A></TD></TR>
<TR><TD colspan='7' height='1' background='/images/20022.gif'></TD></TR>
<TR align="center"><TD height="40"><FONT>22选5</FONT></TD><TD>13172</TD><TD align="left"><TABLE width='113' height='21' align='left' cellpadding='0' cellspacing='0' style='color:#ffffff;font-weight:bold;font-family:宋体;'><TR align='center'><TD width='21' background='/images/20055.gif' style='color:#ffffff'>08</TD><TD width='2'></TD><TD width='21' background='/images/20055.gif' style='color:#ffffff'>09</TD><TD width='2'></TD><TD width='21' background='/images/20055.gif' style='color:#ffffff'>14</TD><TD width='2'></TD><TD width='21' background='/images/20055.gif' style='color:#ffffff'>15</TD><TD width='2'></TD><TD width='21' background='/images/20055.gif' style='color:#ffffff'>19</TD></TR></TABLE></TD>
<TD><A href='/news/11009537.shtml' target='_blank'><IMG src='/images/20014.gif' border='0'/></A></TD><TD><A href='/lottery/eexw/History.aspx' target="_blank"><IMG src="/images/20016.gif" border="0"/></A></TD><TD><A href='http://data.lottery.gov.cn/chart_tc2/chart.shtml?LotID=23525&ChartID=20001&StatType=0&MinIssue=&MaxIssue=&IssueTop=30' target="_blank"><IMG src="/images/20017.gif" border="0"/></A></TD></TR>
<TR><TD colspan='7' height='1' background='/images/20022.gif'></TD></TR>
<TR align="center"><TD height="40"><FONT>7星彩</FONT></TD><TD>13082</TD><TD align="left"><TABLE width='159' height='21' align='left' cellpadding='0' cellspacing='0' style='color:#ffffff;font-weight:bold;font-family:宋体;'><TR align='center'><TD width='21' background='/images/20055.gif' style='color:#ffffff'>6</TD><TD width='2'></TD><TD width='21' background='/images/20055.gif' style='color:#ffffff'>0</TD><TD width='2'></TD><TD width='21' background='/images/20055.gif' style='color:#ffffff'>8</TD><TD width='2'></TD><TD width='21' background='/images/20055.gif' style='color:#ffffff'>0</TD><TD width='2'></TD><TD width='21' background='/images/20055.gif' style='color:#ffffff'>1</TD><TD width='2'></TD><TD width='21' background='/images/20055.gif' style='color:#ffffff'>4</TD><TD width='2'></TD><TD width='21' background='/images/20055.gif' style='color:#ffffff'>8</TD></TR></TABLE></TD>
<TD><A href='/news/11010180.shtml' target='_blank'><IMG src='/images/20014.gif' border='0'/></A></TD><TD><A href='/lottery/qxc/History.aspx' target="_blank"><IMG src="/images/20016.gif" border="0"/></A></TD><TD><A href='http://data.lottery.gov.cn/chart_tc2/chart.shtml?LotID=10022&ChartID=20001&StatType=0&MinIssue=&MaxIssue=&IssueTop=30' target="_blank"><IMG src="/images/20017.gif" border="0"/></A></TD></TR>
</tr><tr>
<TR><TD colspan="6" height="31" background="/images/20078.gif"><TABLE width="360" align="center" cellpadding="0" cellspacing="0" border="0"><TR><TD colspan="2" height="2"></TD></TR><TR><TD width="65"></TD><TD width="295">超级大乐透<span id="LabelDLT" class="FontPool">1.71亿元</span> 派奖<span id="LabelQXC" class="FontPool">500万元</span></TD></TR></TABLE></TD></TR>
</TABLE>
<SCRIPT type="text/javascript">var _bdhmProtocol=(("https:"==document.location.protocol)?"https://":"http://");document.write(unescape("%3Cscript src='"+_bdhmProtocol+"hm.baidu.com/h.js%3F8929ffae85e1c07a7ded061329fbf441' type='text/javascript'%3E%3C/script%3E"));</SCRIPT>
</form></BODY></HTML>
How do I write a regex to extract the draw data from this HTML?
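Solution 5:
The main page actually uses an iframe that points to a separate address, so you only need to fetch the content of that address (the one requested in the code). Sample code:
// Requires: using System.IO; using System.Linq; using System.Net; using System.Text; using System.Text.RegularExpressions;
WebRequest wc = HttpWebRequest.Create("http://www.lottery.gov.cn/lottery/draws/Global.aspx");
wc.ContentType = "application/x-www-form-urlencoded;charset=gb2312";
using (WebResponse wq = wc.GetResponse())
{
    using (Stream s = wq.GetResponseStream())
    {
        using (StreamReader sr = new StreamReader(s, Encoding.GetEncoding("utf-8")))
        {
            string html = sr.ReadToEnd();
            // Group 2 = game name, group 3 = issue number, named group Num = each digit cell of the nested table.
            string pattern = @"(?i)<tr((?!.*?bgcolor)[^>]*?)>\s*?<td[^>]*?>\s*?<font>([^>]*?)</font>\s*?</td>\s*?<td[^>]*?>([^<>]*?)</td>\s*?<td[^>]*?>\s*?<table[^>]*?>[\s\S]*?(<td[^>]*?>((?<Num>\d+)|\s*?)</td>)*?\s*?</tr>\s*?[\s\S]*?</table>";
            // 玩法 = game, 期号 = issue number, 开奖号 = winning numbers
            var result = Regex.Matches(html, pattern).OfType<Match>().Select(a => new
            {
                玩法 = a.Groups[2].Value,
                期号 = a.Groups[3].Value,
                开奖号 = string.Join("", a.Groups["Num"].Captures.OfType<Capture>().Select(b => b.Value))
            });
            /*
            + [0] { 玩法 = "大乐透", 期号 = "13082", 开奖号 = "03092526330312" } <Anonymous Type>
            + [1] { 玩法 = "排列3", 期号 = "13191", 开奖号 = "473" } <Anonymous Type>
            + [2] { 玩法 = "排列5", 期号 = "13191", 开奖号 = "47344" } <Anonymous Type>
            + [3] { 玩法 = "22选5", 期号 = "13172", 开奖号 = "0809141519" } <Anonymous Type>
            + [4] { 玩法 = "7星彩", 期号 = "13082", 开奖号 = "6080148" } <Anonymous Type>
            */
        }
    }
}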
Solution 6:
Scraping data mostly comes down to analyzing the HTML. You can use HtmlAgilityPack for that; see http://www.cnblogs.com/wangchuang/archive/2013/03/11/2953638.html
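A rough sketch of what the HtmlAgilityPack route might look like for the draw table quoted earlier in the thread. HtmlAgilityPack is a third-party NuGet package, so it has to be installed separately. The `HapDrawParser` class and its `Parse` method are hypothetical names, and the row-detection rule (a result row is any `<tr>` whose third cell holds a nested `<table>`) is an assumption read off that pasted markup, not something the site documents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, not part of the .NET base class library

public static class HapDrawParser
{
    public static List<string> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parser: copes with unclosed or mixed-case tags

        var lines = new List<string>();
        foreach (HtmlNode row in doc.DocumentNode.Descendants("tr"))
        {
            // Direct <td> children only, so cells of the nested number table are not counted.
            List<HtmlNode> cells = row.Elements("td").ToList();
            if (cells.Count < 3) continue;

            // Assumption: only result rows carry a nested <table> in their third cell.
            HtmlNode inner = cells[2].Descendants("table").FirstOrDefault();
            if (inner == null) continue;

            // Spacer cells are empty, so keep only cells that contain text.
            IEnumerable<string> numbers = inner.Descendants("td")
                .Select(td => td.InnerText.Trim())
                .Where(text => text.Length > 0);

            lines.Add($"{cells[0].InnerText.Trim()} {cells[1].InnerText.Trim()} {string.Join(" ", numbers)}");
        }
        return lines;
    }
}
```

Compared with a single large regular expression, walking the parsed tree tends to survive small markup changes (extra attributes, different whitespace) without edits.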
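Solution 7:
I've already fetched the content of that address, but I still can't get the regex to work.
Solution 8:
I didn't write one myself; I'm currently using ET. You could give it a try.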
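Solution 9:
If you don't want to use regular expressions, you'll need to learn XPath. I'd recommend using XPath when parsing the HTML; it works much better than the other parsing approaches.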
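To give a feel for the XPath expressions involved, here is a small sketch that uses only the .NET base class library (`System.Xml.Linq` plus the `System.Xml.XPath` extension methods). It assumes the input is a well-formed, lowercase XML fragment shaped like the draw table. The real page is not (uppercase tags, unescaped `&` in links), so in practice a tolerant HTML parser would have to build the tree first. `XPathDrawParser` and `Parse` are made-up names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // XPathSelectElements / XPathSelectElement extension methods

public static class XPathDrawParser
{
    // Assumption: xhtml is well-formed XML with lowercase element names.
    public static List<string> Parse(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);
        var lines = new List<string>();

        // Result rows: any <tr> whose third <td> contains a nested <table>.
        foreach (XElement row in doc.XPathSelectElements("//tr[td[3]/table]"))
        {
            string game = row.XPathSelectElement("td[1]").Value.Trim();
            string issue = row.XPathSelectElement("td[2]").Value.Trim();

            // Cells of the nested table that contain non-whitespace text (spacer cells are empty).
            IEnumerable<string> numbers = row
                .XPathSelectElements("td[3]/table//td[normalize-space()]")
                .Select(td => td.Value.Trim());

            lines.Add($"{game} {issue} {string.Join(" ", numbers)}");
        }
        return lines;
    }
}
```

The three expressions carry all the logic: one picks the rows, two read fixed columns, and one collects the non-empty number cells, which is the main readability gain over a single long regular expression.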