Connection keeps failing when scraping Xueqiu data with Java: am I being blocked, or are my parameters wrong?

Problem description

I have tried many combinations of parameters and always get the same error: "Server returned HTTP response code: 400 for URL". I don't know whether Xueqiu imposes some restriction. I compared my request with the one the browser sends and configured it the same way, but it still fails. I also tried the same thing with jsoup and got the same error. Thanks in advance!

Java code:

    package com.test;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLConnection;
    import java.nio.charset.Charset;

    import org.json.JSONException;
    import org.json.JSONObject;

    public class test {

        private static String readAll(Reader rd) throws IOException {
            StringBuilder sb = new StringBuilder();
            int cp;
            while ((cp = rd.read()) != -1) {
                sb.append((char) cp);
            }
            return sb.toString();
        }

        public static JSONObject readJsonFromUrl(String url) throws IOException, JSONException {
            URL u = new URL(url);
            URLConnection uc = (HttpURLConnection) u.openConnection();
            uc.setRequestProperty("X-Requested-With", "XMLHttpRequest");
            uc.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36 LBBROWSER");
            // give it 15 seconds to respond
            uc.setReadTimeout(15 * 1000);
            uc.connect();
            InputStream is = uc.getInputStream();
            try {
                BufferedReader rd = new BufferedReader(new InputStreamReader(is, Charset.forName("UTF-8")));
                String jsonText = readAll(rd);
                JSONObject json = new JSONObject(jsonText);
                return json;
            } finally {
                is.close();
            }
        }

        public static void main(String[] args) throws IOException, JSONException {
            // set the proxy
            System.getProperties().setProperty("proxySet", "true");
            System.getProperties().setProperty("http.proxyHost", "cn-proxy.xxx.com");
            System.getProperties().setProperty("http.proxyPort", "80");
            JSONObject json = readJsonFromUrl("http://xueqiu.com/stock/cata/stocklist.json?page=1&size=90&order=desc&orderby=name&exchange=CN&industry=%E5%9B%9E%E8%B4%AD&flag=1&_=1417428721184");
            System.out.println(json.toString());
        }
    }

Exception:

    Exception in thread "main" java.io.IOException: Server returned HTTP response code: 400 for URL: http://xueqiu.com/stock/cata/stocklist.json?page=1&size=90&order=desc&orderby=name&exchange=CN&industry=%E5%9B%9E%E8%B4%AD&flag=1&_=1417428721184
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1241)
        at com.test.test.readJsonFromUrl(test.java:37)
        at com.test.test.main(test.java:54)

This URL returns JSON. The content looks roughly like this (open the link to see the full response):

    {"count":{"count":19.0},"success":"true","stocks":[{"symbol":"SZ395032","code":"395032","name":"债券回购","pettm":"","volume":"395277740","hasexist":"false","marketcapital":"0.0","current":"0.0","percent":"0.0","change":"0.0","high":"0.0","low":"0.0","high52w":"0.0","low52w":"0.0","trading_date":"","trading_days":"","actual_date":"","actual_days":"","net_profit":"","net_profit_day":"","net_profit_yield":"","net_cost":"","net_cost_day":"","net_cost_yield":""},{"symbol":"SH204001","code":"204001","name":"GC001","pettm":"","volume":"370904900","hasexist":"false","marketcapital":"0.0","current":"5.025","percent":"139.29","change":"2.925","high":"7.0","low":"4.0","high52w":"50.5","low52w":"0.1","trading_date":"","trading_days":"","actual_date":"","actual_days":"","net_profit":"","net_profit_day":"","net_profit_yield":"","net_cost":"","net_cost_day":"","net_cost_yield":""}]}

The browser's request:

    Remote Address: 146.56.234.217:80
    Request URL: http://xueqiu.com/stock/cata/stocklist.json?page=1&size=90&order=desc&orderby=name&exchange=CN&industry=%E5%9B%9E%E8%B4%AD&flag=1&_=1417428721184
    Request Method: GET
    Status Code: 200 OK

    Request Headers:
    Accept: application/json, text/javascript, */*; q=0.01
    Accept-Encoding: gzip,deflate,sdch
    Accept-Language: zh-CN,zh;q=0.8
    Cache-Control: max-age=0
    Cookie: bid=2a0ffaa0c8c292e9752b4f52fa2e1a8e_i2zlirov; snbim_minify=true; last_account=35159618%40qq.com; xq_a_token=iXFl2kLorOVMsEDZ78hkeg; xq_r_token=Hs8CSFgNGhjhS6App0McWe; __utmt=1; Hm_lvt_1db88642e346389874251b5a1eded6e3=1417060711,1417146659,1417410467; Hm_lpvt_1db88642e346389874251b5a1eded6e3=1417428721; __utma=1.1126280283.1417060711.1417420694.1417428549.11; __utmb=1.2.10.1417428549; __utmc=1; __utmz=1.1417410467.8.2.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)
    Host: xueqiu.com
    Proxy-Connection: keep-alive
    Referer: http://xueqiu.com/hq
    User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36 LBBROWSER
    X-Requested-With: XMLHttpRequest
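The stack trace shows the exception coming from `HttpURLConnection.getInputStream()`, which throws an `IOException` for any 4xx/5xx status, so whatever explanation the server may have put in the body of the 400 response is never seen. A minimal diagnostic sketch, using only the JDK: read the status code first and, for error statuses, read the body from `getErrorStream()` instead. The class name `ErrorBodyReader` and its methods are hypothetical, not part of the question's code, and the sketch returns plain text rather than an org.json `JSONObject`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper: report the HTTP status and body even when
// the server answers with an error code such as 400.
final class ErrorBodyReader {

    private ErrorBodyReader() {}

    // Reads the whole stream, closes it, and decodes it as UTF-8 (Java 9+ for readAllBytes).
    static String readUtf8(InputStream in) throws IOException {
        try (InputStream body = in) {
            return new String(body.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Returns "<status> <body>". getResponseCode() does not throw on 4xx/5xx,
    // and getErrorStream() exposes the error body that getInputStream() hides.
    static String statusAndBody(HttpURLConnection conn) throws IOException {
        int status = conn.getResponseCode();
        InputStream in = (status >= 400) ? conn.getErrorStream() : conn.getInputStream();
        String body = (in == null) ? "" : readUtf8(in);
        return status + " " + body;
    }
}
```

While debugging, the lines `uc.connect(); InputStream is = uc.getInputStream();` in `readJsonFromUrl` could be replaced by `System.out.println(ErrorBodyReader.statusAndBody((HttpURLConnection) uc));` to print what the server actually says about the rejected request.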

Solution

Try HtmlUnit instead: set up a user and have it emulate a browser.
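A lighter alternative to HtmlUnit, sketched here with the JDK only, rests on an assumption drawn from the headers above: the browser sends a Cookie header (including `xq_a_token` and `xq_r_token`) plus `Accept` and `Referer`, while the Java code sends only `User-Agent` and `X-Requested-With`. If the 400 is caused by the missing cookies, loading the homepage first with a `CookieManager` installed lets `HttpURLConnection` store whatever cookies the server sets and send them back with the JSON request. The class name `XueqiuSession`, the homepage URL, and the premise that the homepage sets the cookies the API checks are all assumptions, not confirmed behaviour of Xueqiu.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: behave like a browser session by keeping cookies
// between requests. Assumes the homepage sets the cookies the API checks.
final class XueqiuSession {

    private XueqiuSession() {}

    // Installs a JVM-wide in-memory cookie store; HttpURLConnection consults
    // CookieHandler.getDefault() to save Set-Cookie headers and resend cookies.
    static void installCookieStore() {
        if (!(CookieHandler.getDefault() instanceof CookieManager)) {
            CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
        }
    }

    // Copies the browser's headers from the question, except Cookie (supplied
    // by the cookie store) and Accept-Encoding (HttpURLConnection does not
    // decompress gzip automatically).
    static void applyBrowserHeaders(HttpURLConnection conn, String referer) {
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36 LBBROWSER");
        conn.setRequestProperty("Accept", "application/json, text/javascript, */*; q=0.01");
        conn.setRequestProperty("Accept-Language", "zh-CN,zh;q=0.8");
        conn.setRequestProperty("X-Requested-With", "XMLHttpRequest");
        conn.setRequestProperty("Referer", referer);
    }

    // GETs a URL and returns the body as UTF-8 text; getInputStream() still
    // throws IOException on 4xx/5xx, as in the question.
    static String get(String url, String referer) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        applyBrowserHeaders(conn, referer);
        conn.setConnectTimeout(15_000);
        conn.setReadTimeout(15_000);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Step 1: load the homepage so the server can set its cookies.
    // Step 2: call the JSON endpoint; the stored cookies are sent automatically.
    static String fetchJson(String homeUrl, String referer, String apiUrl) throws IOException {
        installCookieStore();
        get(homeUrl, homeUrl);
        return get(apiUrl, referer);
    }
}
```

For the question's case this would be called as `XueqiuSession.fetchJson("http://xueqiu.com/", "http://xueqiu.com/hq", url)` with `url` being the stocklist.json address, and the returned string passed to `new JSONObject(...)` as before. If the server still answers 400, the cookies it wants may only be set by JavaScript or after logging in, which is where a JavaScript-capable client such as HtmlUnit helps.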

Time: 2024-11-02 07:38:36

Articles related to "Connection keeps failing when scraping Xueqiu data with Java: am I being blocked, or are my parameters wrong?"

Problem scraping web page data with Java

Problem description: There is an error at the spot marked with a red line. Solution: I haven't seen it written that way; here is what I usually write: HttpClientBuilder httpClientBuilder = HttpClientBuilder.create(); CloseableHttpClient closeableHttpClient = httpClientBuilder.build(); HttpGet httpGet = new HttpGet(url); Solution 2: org.apache.http.i

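Help! Scraping web page data with Java

Problem description: I want to scrape the bicycle information on https://www.bicing.cat/es/formmap. How should I go about it? Any advice appreciated! Solution 2: Request this URL, then read out the InputStream you get back and check whether it contains the data you want: new InputStreamReader(((HttpURLConnection) (new URL("https://www.bicing.cat/es/formmap")).openConnection()).getInputStream()); …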

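html: How to scrape hidden audio links from a web page with Java

Problem description: I want to use Java to download every audio file on the Lizhi FM site, but when I look directly at the page source I cannot find the audio link for the page http://www.lizhi.fm/#/25734/20075765977745926. With Google Chrome's Developer Tools, however, I can find the link http://cdn.lizhi.fm/audio/2015/05/16/20075765977745926_hd.mp3. I'm not sure how to proceed; could someone help me figure out what is going on? Solution: Search this page for data-url. …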
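The answer only says to search the page for `data-url`. Assuming the audio links appear in the HTML as `data-url="..."` attributes (the answer does not show the actual markup), a regular expression over the downloaded HTML is enough to collect them; a real HTML parser such as jsoup would be more robust against unusual markup. `DataUrlExtractor` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the values of data-url="..." attributes.
final class DataUrlExtractor {

    // Matches data-url="..." or data-url='...' (same quote on both sides)
    // and captures the value in group 2.
    private static final Pattern DATA_URL =
            Pattern.compile("\\bdata-url\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    private DataUrlExtractor() {}

    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = DATA_URL.matcher(html);
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }
}
```

The input is the page's HTML as fetched by the scraper, and each returned value can then be downloaded. Since the question notes the link is missing from the raw page source, the attribute may only exist in HTML that the page loads later; the Developer Tools network tab shows which request actually delivers it.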

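Scraping data: jsoup scraping works in my local development environment but fails after deploying to Alibaba Cloud

Problem description: When I scrape web page data with jsoup, everything works in my local development environment, but after deploying to Alibaba Cloud the connection times out. The odd thing is that I scrape 4 or 5 URLs and only one of them times out. Could an expert help? Solution: Is anyone here? Is anyone here? Is anyone here?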

Problem scraping an https page with Java

Problem description: public static void getDocument() throws Exception{ Map<String,String> headMap=new HashMap<String,String>(); headMap.put("Accept","text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8&qu

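Batch-scraping web page data with HtmlAgilityPack

Original article: Batch-scraping web page data with HtmlAgilityPack. Handling login: some page data can only be extracted after logging in, so ieHTTPHeaders is used here to capture the data submitted during login. Scraping the page: HtmlAgilityPack.HtmlDocument htmlDoc; if (!string.IsNullOrEmpty(登录URL)) { htmlDoc = htmlWeb.Load(登录URL, 提交的用户验证信息,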

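Import: when reading data from Excel in Java, let the user choose the Excel file instead of hard-coding the file path!

Problem description: The project compares data between a standalone edition and a networked edition. Data exported from the standalone edition (an Excel sheet in a fixed format) has to be imported into the networked edition for comparison. When importing, the networked edition should prompt the user to pick the Excel file, rather than having the spreadsheet's directory hard-coded in the Java code. Thanks for your help! Solution: The library most commonly used for reading Excel in Java is POI. See the article on uploading, parsing, validating and storing Excel (2003/2007) files in Java, or search for "java poi"; there are plenty of articles.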

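4 common ways to fetch network data in PHP

This section is titled "fsockopen, curl and file_get_contents" and summarizes network input and output with these three approaches. fsockopen has already been covered at length, so we now turn to the others. First, a quick list of some common ways to fetch network data. 1. Fetch content with a GET request using file_get_contents: $url = 'http://localhost/test2.php'; $html = file_get_contents($url); echo $ht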

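Java code hangs at new HTable when reading data from HBase; experts, feel free to add me on QQ

Problem description: When reading data from HBase in Java, the code reaches new HTable and then stops responding. Solution: Anyone there? Experts, please take a look.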