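Problem description
For example, how would I fetch the content of Baidu News ("www.news.baidu.com")? So far I have only written the first half, which downloads the page source of Baidu News. How do I then use regular expressions to extract the news list and the article content? Could someone help write it, ideally with a little explanation? I have only just started learning and am completely lost. Thanks. The first half:
Solution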
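Solution 2:
First, work out the HTML structure of the site you want to scrape. The regular expression has to be written against that HTML. Then use the regular expression to filter out the results you want.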
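Solution 3:
@"<a\b[^<>]*?href=['""](?<url>[^'""]*)"
Use this to match a tags, read the address from m.Groups["url"], and then use WebRequest to fetch the content at that address. The links on Baidu News all point to third-party sites, so you will also have to write a dedicated content extractor (regex) for each specific site.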
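As a rough illustration of the link-matching step, here is a minimal sketch. The class and method names (`LinkExtractor`, `ExtractUrls`) are made up for this example and do not come from the thread. The pattern is the one from this answer, written with `\b` after `a` so that only `<a ...>` tags match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class LinkExtractor
{
    // <a\b       : an opening "a" tag (the \b rejects <abbr>, <area>, ...)
    // [^<>]*?    : any other attributes before href, staying inside the tag
    // (?<url>..) : named group holding the href value up to the closing quote
    private static readonly Regex AnchorRegex = new Regex(
        @"<a\b[^<>]*?href=['""](?<url>[^'""]*)",
        RegexOptions.IgnoreCase);

    // Returns every href value found in the HTML, in document order.
    public static List<string> ExtractUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match m in AnchorRegex.Matches(html))
        {
            urls.Add(m.Groups["url"].Value);
        }
        return urls;
    }
}
```

Each returned address can then be downloaded in turn. Relative links would first need to be resolved against the page address, for example with `new Uri(baseUri, relativeUrl)`.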
Solution 4:
It is HTML.
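Solution 5:
I have done this before. Congratulations, you can already fetch the raw page. Next, treat it as a string and use regular expressions to filter out the useful content. You do have to find the pattern, that is, look at which markup delimits the list you want, such as TABLE or DT. After that it is mostly a matter of matching with regular expressions.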
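To make the "find the delimiter, then match" idea concrete, here is a small sketch. It assumes the list items sit inside `<dt>` elements. The real page may use different markup, and the names `ListExtractor` and `ExtractItems` are invented for this example.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; assumes each list entry is a <dt> element.
public static class ListExtractor
{
    // Captures the inner HTML of each <dt>...</dt>. Singleline lets "." span
    // line breaks, and the lazy ".*?" stops at the first closing tag.
    private static readonly Regex ItemRegex = new Regex(
        @"<dt\b[^>]*>(?<item>.*?)</dt>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches any remaining tag inside an item so it can be stripped.
    private static readonly Regex TagRegex = new Regex(@"<[^>]+>");

    // Returns the plain text of every <dt> item, in document order.
    public static List<string> ExtractItems(string html)
    {
        var items = new List<string>();
        foreach (Match m in ItemRegex.Matches(html))
        {
            string text = TagRegex.Replace(m.Groups["item"].Value, "");
            items.Add(WebUtility.HtmlDecode(text).Trim());
        }
        return items;
    }
}
```

If the list is built from table rows or `<li>` elements instead, only the tag name in `ItemRegex` should need to change.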
Solution 6:
// GetHTML(url, encoding) and MD5.JiaMi(text) are the poster's own helpers; they are not shown in this post.
// Requires: using System; using System.Text; using System.Text.RegularExpressions;

// Matches one {"title":...,"url":...,"subtitle":...,"time":...} entry embedded in the page.
// The quotes are doubled because this is a verbatim string; the lazy ".+?" keeps each
// group inside its own field instead of running on to the last entry.
private static readonly Regex ItemRegex = new Regex(
    @"\{""title"":""(?<title>.+?)"",""url"":""(?<lianjie>.+?)"",""subtitle"":""(?<xianshiming>.+?)"",""time"":""(?<shijian>.+?)""\}");

protected void Page_Load(object sender, EventArgs e)
{
    //Response.Clear();
    //Response.StatusCode = 301;
    //Response.Status = "301 Moved Permanently";
    ////Response.AddHeader("Location", "http://" + Request.Url.Authority + "/ALLcatalog.aspx");
    //Response.AddHeader("Location", "http://" + Request.Url.Authority + "/Default.aspx");
    //Response.End();

    // Fall back to an empty string when the "type" parameter is missing.
    string type = (Request.QueryString["type"] ?? "").ToLower();
    switch (type)
    {
        case "meirong":
            Label1.Text = "美容系列"; // "Beauty series"
            // Sina makeup classroom
            Label2.Text = BuildList("http://eladies.sina.com.cn/beauty/makeup/index.shtml", type);
            break;
        case "chaoliu":
            Label1.Text = "明星潮流"; // "Celebrity trends"
            Label2.Text = BuildList("http://eladies.sina.com.cn/fa/jietou/index.shtml", type);
            break;
        case "dapei":
            Label1.Text = "流行搭配"; // "Popular outfits"
            Label2.Text = BuildList("http://eladies.sina.com.cn/fa/zhuangban/index.shtml", type);
            break;
    }
}

// Downloads one index page and renders every matched entry as an <li> link.
private string BuildList(string pageUrl, string type)
{
    string htmlCode = GetHTML(pageUrl, "gb2312");
    StringBuilder sOut = new StringBuilder();
    foreach (Match m in ItemRegex.Matches(htmlCode))
    {
        sOut.Append("<li><a target=_blank href=article.aspx?type=" + type
            + "&id=" + MD5.JiaMi(m.Groups["lianjie"].Value) + ">"
            + m.Groups["title"].Value + "</a>"
            + m.Groups["shijian"].Value.Substring(0, 6) + "</li>");
    }
    return sOut.ToString();
}
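The code above calls a `GetHTML` helper that the post does not show. The following is only a guess at what such a helper might look like, assuming it downloads the raw bytes and decodes them with the named encoding. The `PageFetcher` class and its `Decode` method are hypothetical.

```csharp
using System.Net;
using System.Text;

// Hypothetical stand-in for the GetHTML helper used in the post; the real one is not shown.
public static class PageFetcher
{
    // Downloads the page as bytes and decodes it with the given encoding name,
    // e.g. "gb2312" for the Sina pages or "utf-8" for most modern sites.
    public static string GetHTML(string url, string encodingName)
    {
        using (var client = new WebClient())
        {
            byte[] raw = client.DownloadData(url);
            return Decode(raw, encodingName);
        }
    }

    // Kept separate so the decoding step can be checked without network access.
    public static string Decode(byte[] raw, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(raw);
    }
}
```

Decoding explicitly matters here because a page served as gb2312 turns into garbled text if it is read as UTF-8. On .NET Core and later, legacy code pages such as gb2312 are only available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`. On the .NET Framework, which this WebForms code targets, they work out of the box.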