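Problem description
Our school assigned a big project: build a simple browser with Java sockets. We have to fetch the HTML ourselves and parse the HTML tags ourselves. Along the way I need to send the HTTP header with the GET method and then read the response. I have a few questions:
1. Some sites are HTTP/1.0 and some are HTTP/1.1. How should I set the request header to get a correct response?
2. When I try to connect to Google, I get back 302 Found. The address I entered is www.google.com, and the response header contains Location: www.google.co.uk. When I then enter www.google.co.uk directly, I still get the same 302 Found.
This problem has been bothering me for days. How can I solve it? Waiting online!! Thanks. Happy New Year! :)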
Solutions
Solution 2:
Here is a piece of code I have been using for a while. It has been used in many projects without problems; I hope it helps:

    package lhbd;

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.Vector;

    import org.apache.commons.httpclient.Cookie;
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.NameValuePair;
    import org.apache.commons.httpclient.methods.GetMethod;
    import org.apache.commons.httpclient.methods.PostMethod;

    public class YouDengLu {
        // KaixinPro is the poster's own class (not shown here); getUser() supplies {email, password} pairs.
        KaixinPro kai = new KaixinPro();
        private static final String LOGON_SITE = "www.kaixin001.com"; // host name only, without the scheme
        private static final int LOGON_PORT = 80;
        public Vector v_user = kai.getUser();
        int hh = 0;

        public static void main(String[] args) {
            YouDengLu you = new YouDengLu();
            try {
                you.denglu(3);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        // Fetches a page without logging in and prints it line by line.
        public static void getInfo(int num) throws Exception {
            URL url = new URL("http://www.kaixin001.com/home/?uid=26994759"); // http://www.youku.com works too
            // The charset below should match the encoding used by the page.
            BufferedReader read = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
            String tempString;
            while ((tempString = read.readLine()) != null) {
                System.out.println(tempString);
            }
            read.close();
        }

        // Logs in with a POST, then fetches pages using the session cookies.
        public void denglu(int num) throws Exception {
            HttpClient client = new HttpClient();
            if (hh == v_user.size()) {
                return;
            }
            client.getHostConfiguration().setHost(LOGON_SITE, LOGON_PORT);

            // Login page
            PostMethod post = new PostMethod("http://www.kaixin001.com/login/login.php");
            // User-Agent is a request header, not a form field.
            post.setRequestHeader("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");
            NameValuePair url = new NameValuePair("url", "/home/");
            String[] arr = (String[]) v_user.get(hh);
            String usname = arr[0];
            String pw = arr[1];
            System.out.println(usname + ":" + pw);
            NameValuePair username = new NameValuePair("email", usname);
            NameValuePair password = new NameValuePair("password", pw);
            post.setRequestBody(new NameValuePair[] { url, username, password });
            client.executeMethod(post);
            System.out.println("****************************** login ******************************");
            // The cookies are stored in client.getState(); HttpClient sends them on later requests.
            Cookie[] cookies = client.getState().getCookies();
            post.releaseConnection();

            System.out.println("****************************** page redirect ******************************");
            String newUrl = "http://www.kaixin001.com/home/";
            System.out.println("==========Cookies============");
            for (int j = 0; j < cookies.length; j++) {
                // The original printed ++j here, which skipped every other cookie.
                System.out.println((j + 1) + ":" + cookies[j]);
            }

            GetMethod get = new GetMethod(newUrl);
            client.executeMethod(get);
            String responseString = get.getResponseBodyAsString(); // content of the home page after login
            // System.out.println(responseString);
            get.releaseConnection();

            String slave = "http://www.kaixin001.com/home/?uid=26994759";
            System.out.println(slave);
            get = new GetMethod(slave);
            client.executeMethod(get);
            BufferedReader read = new BufferedReader(new InputStreamReader(get.getResponseBodyAsStream(), "utf-8"));
            String tempString;
            while ((tempString = read.readLine()) != null) {
                System.out.println("..." + tempString);
            }
            read.close();
            get.releaseConnection();
        }
    }
Solution 3:
Apache's HttpClient already has page fetching built in. Search online for how to use it; it is very simple.
Solution 4:
I forgot to mention: to use HttpClient you must import the relevant packages, which can be downloaded online.
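Solution 5:
Thanks for the help, everyone! We are only allowed to connect to the server with a socket and fetch pages through an HTTP GET request, and all the HTTP work has to be done by ourselves. We cannot use any HTTP package. All the tags in the downloaded HTML must be parsed by ourselves. What I don't understand right now is the Location mechanism: how do I get from 302 to 200 OK, and how do I jump to the correct address? How can this be done? Thanks, everyone.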
Solution 6:
HTTP/1.0 302 Found
Location: http://www.google.co.uk/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=ade7b3eab8a5e663:FF=0:TM=1294030900:LM=1294030900:S=wP7CbBiaExnXE-5y; expires=Wed, 02-Jan-2013 05:01:40 GMT; path=/; domain=.google.com
Set-Cookie: NID=42=Un_KBN7EY4opN1V_AsE8m8ch9mwfJZMU1szXXPaZ7zbDtmTEzWFyibHHbun0oQWqdfGWKSYhr9xD4Ui91nOV9DlpX3Gok1E0K5oQ5INXb09nb2nb8PK1-kqtuMmKdyoi; expires=Tue, 05-Jul-2011 05:01:40 GMT; path=/; domain=.google.com; HttpOnly
Date: Mon, 03 Jan 2011 05:01:40 GMT
Server: gws
Content-Length: 221
X-XSS-Protection: 1; mode=block

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8"><TITLE>302 Moved</TITLE></HEAD><BODY><H1>302 Moved</H1>The document has moved</BODY></HTML>
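To act on a response like the one above, a socket-only client first has to split the response head into a status code and header fields. The following is a minimal Java sketch of that step, not a complete parser. The class name ResponseHead and its methods are made up for illustration; header names are lower-cased for lookup, and a repeated header such as Set-Cookie keeps only its last value here.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: splits a raw HTTP response head into status code and headers. */
final class ResponseHead {
    final String version;                                        // e.g. "HTTP/1.0"
    final int status;                                            // e.g. 302
    final Map<String, String> headers = new LinkedHashMap<>();  // keys are lower-cased

    private ResponseHead(String version, int status) {
        this.version = version;
        this.status = status;
    }

    /** Parses everything up to the first blank line; accepts CRLF or bare LF line endings. */
    static ResponseHead parse(String raw) {
        String[] lines = raw.split("\r?\n");
        String[] statusLine = lines[0].split(" ", 3);            // version, code, reason phrase
        ResponseHead head = new ResponseHead(statusLine[0], Integer.parseInt(statusLine[1]));
        for (int i = 1; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                continue;                                        // skip malformed lines
            }
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = lines[i].substring(colon + 1).trim();
            head.headers.put(name, value);                       // simplification: last value wins
        }
        return head;
    }

    /** True for a 3xx redirect status that also carries a Location header. */
    boolean isRedirect() {
        boolean code = status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
        return code && headers.containsKey("location");
    }
}
```

For the response shown above, parse(...) would give status 302 and headers.get("location") equal to http://www.google.co.uk/, which is the address the next request should go to.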
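Solution 7:
When I enter www.google.com, this is the page I get, with Location: http://www.google.co.uk/. But when I enter http://www.google.co.uk/ directly, I still get the same page. I don't understand this at all, and I have been stuck on it for a long time. I hope you can help me.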
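Solution 8:

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.net.Socket;

    // The class name is arbitrary; the original post showed only the main method.
    public class SocketGet {
        public static void main(String[] args) throws Exception {
            Socket s = new Socket("www.google.com.hk", 80);
            BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(s.getOutputStream()));
            // HTTP lines end with CRLF, so write "\r\n" rather than the platform-dependent newLine().
            bw.write("GET / HTTP/1.1\r\n");
            bw.write("Host: www.google.com.hk\r\n");
            // Ask the server to close the connection so the read loop below reaches end of stream.
            bw.write("Connection: close\r\n");
            bw.write("\r\n");
            bw.flush();
            BufferedReader br = new BufferedReader(new InputStreamReader(s.getInputStream()));
            String str;
            while ((str = br.readLine()) != null) {
                System.out.println(str);
            }
            bw.close();
            br.close();
            s.close();
        }
    }

This is the response header. google.com returns 302 because the request is being temporarily redirected.

HTTP/1.1 200 OK
Date: Mon, 03 Jan 2011 05:23:09 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1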
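Following a redirect by hand means: read the status code, and if it is a 3xx with a Location header, open a new socket to the host named in that Location and send a new request whose Host header names that same host. One possible cause of getting the same 302 again (a guess, since the asker's full code is not shown) is that the second request still connects to, or still names, the old host. Below is a minimal Java sketch under these assumptions. RedirectFollower, nextUrl, isRedirect and MAX_HOPS are hypothetical names; the start address must be a full URL such as http://www.google.com/; only plain http is handled, so a redirect to https makes it stop with an error; and the body is returned as received, without decoding chunked transfer encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: follows HTTP redirects over plain sockets. */
final class RedirectFollower {
    static final int MAX_HOPS = 5; // arbitrary limit to avoid redirect loops

    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303 || status == 307 || status == 308;
    }

    /** Resolves a Location value (absolute or relative) against the URL just requested. */
    static URI nextUrl(URI current, String location) {
        return current.resolve(location.trim());
    }

    static String fetch(String start) throws IOException {
        URI url = URI.create(start);
        for (int hop = 0; hop <= MAX_HOPS; hop++) {
            if (!"http".equalsIgnoreCase(url.getScheme())) {
                throw new IOException("this sketch only handles plain http: " + url);
            }
            int port = url.getPort() == -1 ? 80 : url.getPort();
            String path = (url.getRawPath() == null || url.getRawPath().isEmpty()) ? "/" : url.getRawPath();
            if (url.getRawQuery() != null) {
                path += "?" + url.getRawQuery();
            }
            // Connect to the host of the current URL, not the host of the first request.
            try (Socket socket = new Socket(url.getHost(), port)) {
                String request = "GET " + path + " HTTP/1.1\r\n"
                        + "Host: " + url.getHost() + "\r\n"
                        + "Connection: close\r\n\r\n";
                OutputStream out = socket.getOutputStream();
                out.write(request.getBytes(StandardCharsets.US_ASCII));
                out.flush();

                BufferedReader in = new BufferedReader(
                        new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
                String statusLine = in.readLine();
                if (statusLine == null) {
                    throw new IOException("empty response from " + url);
                }
                int status = Integer.parseInt(statusLine.split(" ")[1]);
                String location = null;
                String line;
                while ((line = in.readLine()) != null && !line.isEmpty()) {
                    if (line.regionMatches(true, 0, "Location:", 0, 9)) {
                        location = line.substring(9).trim();
                    }
                }
                if (isRedirect(status) && location != null) {
                    url = nextUrl(url, location); // try again at the new address
                    continue;
                }
                StringBuilder body = new StringBuilder();
                while ((line = in.readLine()) != null) {
                    body.append(line).append('\n');
                }
                return body.toString();
            }
        }
        throw new IOException("too many redirects");
    }
}
```

A call such as RedirectFollower.fetch("http://www.google.com/") would then keep requesting the address in each Location header until it gets a non-redirect status or hits the hop limit.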
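Solution 9:
You set the header to HTTP/1.1 here. But some pages are HTTP/1.0, so how do I tell them apart? I have tried some pages that can be accessed when I set HTTP/1.0 but will not open at all when I set HTTP/1.1!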
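The client chooses the version in its own request line, so there is no need to detect which version a site "is". What changes when the request says HTTP/1.1 is that the Host header becomes mandatory, the connection stays open by default (so a loop that reads until end of stream can appear to hang unless the request sends Connection: close or the client honours Content-Length), and the server may send the body with Transfer-Encoding: chunked. Any of these might explain a page that works with HTTP/1.0 but seems not to open with HTTP/1.1, though that is a guess without seeing the failing requests. Below is a minimal Java sketch of decoding a chunked body; ChunkedDecoder and its methods are hypothetical names, and the stream is assumed to be positioned just after the blank line that ends the response headers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: decodes a body sent with Transfer-Encoding: chunked. */
final class ChunkedDecoder {

    /** Reads one line terminated by LF (CR is dropped) and returns it without the line ending. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                sb.append((char) b);
            }
        }
        return sb.toString();
    }

    /** Reads chunks until the zero-length chunk and returns the concatenated payload. */
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');            // ignore optional chunk extensions
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int size = Integer.parseInt(sizeLine.trim(), 16); // chunk size is hexadecimal
            if (size == 0) {
                break;                                   // last chunk
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n == -1) {
                    throw new IOException("stream ended inside a chunk");
                }
                off += n;
            }
            body.write(chunk, 0, size);
            readLine(in);                                // consume the CRLF after the chunk data
        }
        // Skip optional trailer headers up to the final blank line (or end of stream).
        while (!readLine(in).isEmpty()) {
            // nothing to do
        }
        return body.toByteArray();
    }
}
```

The decoder works on bytes rather than on a Reader because chunk sizes count bytes, not characters; the charset from Content-Type would be applied to the returned array afterwards.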
Solution 10:
What if I delete HTTP/1.1? Would that work?
Solution 11:

    Socket socket = new Socket(domin, portNo);
    // Socket socket = new Socket(ipaddress, portNo);
    PrintWriter out = new PrintWriter(socket.getOutputStream());
    out.write("GET " + httpPage + "\r\n");
    out.write("Host: " + domin + "\r\n");
    out.write("Content-Type: text/html\r\n");
    out.write("Connection: Keep-Alive\r\n");
    out.write("\r\n");
    out.flush();
    BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
    String line = in.readLine();

This is the header I wrote. Could everyone check whether there is anything wrong with it? Thanks!!
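Checked against the HTTP/1.1 rules, a few things in the header above stand out. The request line has no version after the path, so a server may treat it as an old HTTP/0.9-style request or reject it. Content-Type describes a request body, and a GET has none, so it can be dropped. Connection: Keep-Alive keeps the socket open, so code that reads until end of stream will block. A minimal Java sketch of a corrected request follows; GetRequest and build are hypothetical names, and the parameters mirror the asker's domin, portNo and httpPage.

```java
/** Hypothetical sketch: builds a minimal HTTP/1.1 GET request as a string. */
final class GetRequest {

    /**
     * @param host server name, also used for the Host header
     * @param port server port; appended to Host only when it is not 80
     * @param path request target such as "/" or "/index.html?x=1"; empty means "/"
     */
    static String build(String host, int port, String path) {
        String target = (path == null || path.isEmpty()) ? "/" : path;
        String hostHeader = (port == 80) ? host : host + ":" + port;
        return "GET " + target + " HTTP/1.1\r\n"     // method, space, target, space, version
                + "Host: " + hostHeader + "\r\n"     // required by HTTP/1.1
                + "Connection: close\r\n"            // server closes the socket after the response
                + "\r\n";                            // blank line ends the header section
    }
}
```

The result can be written to the socket with out.write(GetRequest.build(domin, portNo, httpPage)) followed by out.flush(), after which the status line and headers are read as before.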