Problem description
- Problem fetching an HTTPS page with Java
import java.util.HashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public static void getDocument() throws Exception {
    Map<String, String> headMap = new HashMap<String, String>();
    // Request headers copied from Baidu Browser (this version returns a 404 page)
    headMap.put("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
    headMap.put("Referer", "https://www.vc.cn/");
    headMap.put("Cookie", "Hm_lvt_a857c86b2e41abb55f29bf3e06d43818=1432969970,1432977344; _oauth-client-demo_session=BAh7B0kiD3Nlc3Npb25faWQGOgZFRkkiJTIzNDNlNzhjNWVlYzNiMzhiNzBjODg1MzQzYTk1N2Y5BjsAVEkiEF9jc3JmX3Rva2VuBjsARkkiMUkyRjNEQmFNZ2t4MjNzNGYydjVpa0swS1pMVWM1T21YeGUwM0M1VE9qMlk9BjsARg%3D%3D--2135b5dbf22455ae1c7e31d75e41dc49e77e8539; Hm_lpvt_a857c86b2e41abb55f29bf3e06d43818=1432977344");
    headMap.put("User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 BIDUBrowser/6.x Safari/537.36");
    // Alternative: request headers copied from Firefox (also returns 404)
    // headMap.put("Host", "www.vc.cn");
    // headMap.put("User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:38.0) Gecko/20100101 Firefox/38.0");
    // headMap.put("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    // headMap.put("Accept-Language", "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3");
    // headMap.put("Cookie", "Hm_lvt_a857c86b2e41abb55f29bf3e06d43818=1432969970,1432977344");
    // headMap.put("Connection", "keep-alive");

    // HttpsUtil is not part of the JDK or Jsoup; its source is not shown in the question
    String str = HttpsUtil.doGet("https://www.vc.cn/users/5227/startups/5358", "", headMap, "gbk", 5000, 5000);
    if (str != null) {
        Document doc = Jsoup.parse(str);
        System.out.println(doc);
    }
}

Right now I am emulating Baidu Browser's request headers (this returns a 404 page); the commented-out lines are my emulation of Firefox's request headers (which also returns 404). Why is this? Please advise: what am I missing? (If I request the home page https://www.vc.cn/ I can fetch it fine. In the browser these two pages need no redirect and can be opened directly. Please help, I'm at my wit's end.)
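Since HttpsUtil is not shown, it is unclear whether doGet exposes the HTTP status code at all; if it only returns the body, a redirect or a real 404 status is hard to tell apart from a normal page. Below is a minimal sketch using only the JDK's HttpURLConnection (an assumption: the original helper may use a different HTTP client). It turns off automatic redirects and reports the status code and any Location header. The class, method and field names are hypothetical. Note that HttpURLConnection silently ignores a few restricted headers, such as Host, unless the sun.net.http.allowRestrictedHeaders system property is set.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: like a doGet that returns only the body, but also
// reports the status code and the Location header of the response.
class StatusAwareGet {

    static final class Result {
        final int status;       // e.g. 200, 302, 404
        final String location;  // Location header for redirects, otherwise null
        final String body;      // response body decoded with the given charset

        Result(int status, String location, String body) {
            this.status = status;
            this.location = location;
            this.body = body;
        }
    }

    static Result get(String url, Map<String, String> headers, String charset,
                      int connectTimeoutMs, int readTimeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setInstanceFollowRedirects(false); // show 3xx responses instead of following them
        conn.setConnectTimeout(connectTimeoutMs);
        conn.setReadTimeout(readTimeoutMs);
        for (Map.Entry<String, String> h : headers.entrySet()) {
            conn.setRequestProperty(h.getKey(), h.getValue());
        }
        try {
            int status = conn.getResponseCode();
            // For 4xx/5xx responses the body is only available from getErrorStream()
            InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            String body = "";
            if (in != null) {
                try (InputStream is = in) {
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = is.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    body = new String(buf.toByteArray(), Charset.forName(charset));
                }
            }
            return new Result(status, conn.getHeaderField("Location"), body);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:38.0) Gecko/20100101 Firefox/38.0");
        headers.put("Referer", "https://www.vc.cn/");
        try {
            // Charset is an assumption; the question passes "gbk"
            Result r = get("https://www.vc.cn/users/5227/startups/5358", headers, "UTF-8", 5000, 5000);
            System.out.println("status=" + r.status + " location=" + r.location);
        } catch (IOException e) {
            System.out.println("request failed: " + e);
        }
    }
}
```

Calling get with the same headMap and printing status and location would show whether the server answers 404 directly or first redirects somewhere else, for example to a login page if the session cookie is no longer accepted.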
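Solution
In general there is no need to ask about this kind of problem: use Fiddler to compare what the browser sends with what your program sends, and you will be able to solve it yourself.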
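For Fiddler to show the program's own requests, the Java traffic has to go through Fiddler's proxy. A minimal sketch, assuming Fiddler Classic is listening on its default address 127.0.0.1:8888 and the request is made with HttpURLConnection; the class and method names are hypothetical. For HTTPS, Fiddler can only show the decrypted headers if HTTPS decryption is enabled in Fiddler and the JVM trusts Fiddler's root certificate (for example, imported into the truststore with keytool); otherwise the TLS handshake is expected to fail.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

// Hypothetical helper for sending a request through a locally running Fiddler,
// so the program's request headers can be compared with the browser's.
class FiddlerProxy {
    // Assumption: Fiddler Classic listening on its default port 8888 on this machine
    static final String FIDDLER_HOST = "127.0.0.1";
    static final int FIDDLER_PORT = 8888;

    static Proxy fiddler() {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(FIDDLER_HOST, FIDDLER_PORT));
    }

    // Creates the connection object only; nothing is sent until it is used
    static HttpURLConnection openViaFiddler(String url) throws IOException {
        return (HttpURLConnection) URI.create(url).toURL().openConnection(fiddler());
    }

    public static void main(String[] args) {
        try {
            HttpURLConnection conn = openViaFiddler("https://www.vc.cn/users/5227/startups/5358");
            conn.setRequestProperty("User-Agent",
                    "Mozilla/5.0 (Windows NT 6.1; rv:38.0) Gecko/20100101 Firefox/38.0");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            System.out.println("status=" + conn.getResponseCode());
            conn.disconnect();
        } catch (IOException e) {
            // e.g. Fiddler is not running, or the JVM does not trust Fiddler's certificate
            System.out.println("request failed: " + e);
        }
    }
}
```

If HttpsUtil is built on HttpURLConnection, starting the JVM with -Dhttps.proxyHost=127.0.0.1 -Dhttps.proxyPort=8888 should have the same effect without code changes; other HTTP clients may ignore these properties.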
Solution 2:
Directly compare the request headers your program sends with the ones the browser sends, and look at where they differ.
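Once both sets of headers have been captured (for example, copied out of Fiddler), the comparison itself can be mechanized. A small sketch with hypothetical class and method names: it compares two header maps case-insensitively, since HTTP header names are case-insensitive, and lists headers the program is missing, headers whose values differ, and extra headers only the program sends.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: compares the request headers a browser sent with the
// ones a program sent (for example, both copied out of Fiddler).
class HeaderDiff {

    // HTTP header names are case-insensitive, so compare them that way
    static Map<String, String> normalize(Map<String, String> headers) {
        Map<String, String> out = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        out.putAll(headers);
        return out;
    }

    static List<String> diff(Map<String, String> browser, Map<String, String> program) {
        Map<String, String> b = normalize(browser);
        Map<String, String> p = normalize(program);
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, String> e : b.entrySet()) {
            String mine = p.get(e.getKey());
            if (mine == null) {
                report.add("missing: " + e.getKey());   // browser sends it, program does not
            } else if (!mine.equals(e.getValue())) {
                report.add("differs: " + e.getKey());   // both send it, with different values
            }
        }
        for (String name : p.keySet()) {
            if (!b.containsKey(name)) {
                report.add("extra: " + name);           // only the program sends it
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Illustrative values only
        Map<String, String> browser = new HashMap<>();
        browser.put("Accept", "text/html");
        browser.put("Accept-Language", "zh-CN,zh;q=0.8");
        browser.put("Cookie", "a=1");
        browser.put("User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:38.0) Gecko/20100101 Firefox/38.0");

        Map<String, String> program = new HashMap<>();
        program.put("accept", "text/html");
        program.put("Cookie", "a=1");
        program.put("User-Agent", "Java/1.8.0");
        program.put("Pragma", "no-cache");

        System.out.println(diff(browser, program));
        // prints [missing: Accept-Language, differs: User-Agent, extra: Pragma]
    }
}
```

Some values, such as Cookie, will legitimately change between sessions, so the report is only a starting point for finding which header the server actually checks.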
Solution 3:
Use Fiddler to compare against the headers the browser sends. Forgive me for stating the obvious...