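HttpClient Study Notes
(1) The main classes that model HTTP concepts
Class | Description |
---|---|
HttpClient | The client that executes requests |
HttpGet | Represents a request method; similar classes are HttpHead, HttpPost, HttpPut, HttpDelete, HttpTrace, and HttpOptions |
HttpResponse | Represents the response to a request. It holds the status line (protocol version and status code) and the headers. Each Header wraps one header field and can be split into HeaderElement items; both can be read with iterators |
HttpEntity | Represents the message entity that carries the transferred content, i.e. the body. It can appear in both requests and responses. Among requests, only POST and PUT carry an entity; responses normally carry one, except in special cases with no content (for example, responses to HEAD, or 204 No Content). By origin, entities come in three kinds: streamed, read once from a stream; wrapping, built from another entity; self-contained, held in memory and readable repeatedly. |
URIBuilder | Utility class for building a URI: it sets the scheme, host, and path, plus any query parameters |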
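To make the URI components in the last row concrete, here is a minimal sketch that assembles a URI from scheme, host, path, and query. It uses the JDK's `java.net.URI` rather than HttpClient's URIBuilder so that it runs without extra dependencies; the class name `UriPartsDemo`, the host `www.example.com`, and the query values are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriPartsDemo {

    // Assembles a URI from its parts; the query string is passed as one piece.
    public static URI build(String scheme, String host, String path, String query) {
        try {
            // Arguments: scheme, authority, path, query, fragment (none here).
            return new URI(scheme, host, path, query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("invalid URI parts", e);
        }
    }

    public static void main(String[] args) {
        URI uri = build("http", "www.example.com", "/search", "q=httpclient&page=2");
        System.out.println(uri);            // the assembled URI
        System.out.println(uri.getHost());  // the host component read back
        System.out.println(uri.getQuery()); // the query component read back
    }
}
```

URIBuilder covers the same parts with setter-style calls, and the resulting URI is what gets passed to a request class such as HttpGet.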
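(2) The main methods of HttpEntity
Method | Description |
---|---|
getContentType | Returns the Content-Type header |
getContentLength | Returns the content length in bytes (negative if unknown) |
getContent | Returns the content as an InputStream |
- HttpEntity entity = response.getEntity();
- System.out.println(entity.getContentType());
- System.out.println(entity.getContentLength());
- InputStream in = entity.getContent(); // obtain the input stream directly; it can be read only once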
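(3) The stream obtained directly from an HttpEntity is a streamed one
- It can be read only once. To read the content several times, buffer it by wrapping the streamed entity in a BufferedHttpEntity.
- BufferedHttpEntity bufEntity = new BufferedHttpEntity(entity); // the constructor reads the content into an in-memory buffer, so it can be read repeatedly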
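The difference between a one-shot stream and a buffered copy can be sketched with JDK classes alone. This is an illustration of the idea, not BufferedHttpEntity's actual implementation; the class name `BufferedReadDemo` and the sample text are assumptions, and a `ByteArrayInputStream` stands in for the stream returned by `entity.getContent()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BufferedReadDemo {

    // Drains the stream to its end and returns whatever was still left in it.
    public static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the one-shot stream of a streamed entity.
        InputStream streamed =
                new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));

        byte[] first = readAll(streamed);  // consumes the stream
        byte[] second = readAll(streamed); // nothing is left for a second pass
        System.out.println(first.length + " bytes, then " + second.length + " bytes");

        // Buffering: keep the bytes and hand a fresh stream to every reader.
        System.out.println(new String(readAll(new ByteArrayInputStream(first)), StandardCharsets.UTF_8));
        System.out.println(new String(readAll(new ByteArrayInputStream(first)), StandardCharsets.UTF_8));
    }
}
```

The trade-off is memory: a buffered copy holds the whole body, so for large responses reading the streamed entity once is usually the better choice.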
(4) An HttpEntity can also be attached to POST and PUT requests
- It carries the content sent with the request. The content can be a file or a set of form parameters.
- File file = new File("out.txt");
- FileEntity fileEntity = new FileEntity(file, ContentType.create("text/plain", "UTF-8")); // file content as the body
- List<NameValuePair> formparams = new ArrayList<NameValuePair>();
- formparams.add(new BasicNameValuePair("param1", "value1"));
- formparams.add(new BasicNameValuePair("param2", "value2"));
- UrlEncodedFormEntity formEntity = new UrlEncodedFormEntity(formparams, "UTF-8"); // form fields as the body
- HttpPost post = new HttpPost("http://www.baidu.com");
- post.setEntity(fileEntity); // sends the file; use post.setEntity(formEntity) to submit the form instead
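For the form case, it helps to see what an `application/x-www-form-urlencoded` body looks like on the wire. The sketch below builds such a body with the JDK's `URLEncoder`; it is an illustration of the format and not UrlEncodedFormEntity's own code. The class name `FormBodyDemo` and the second value, chosen to contain a space and an ampersand, are assumptions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBodyDemo {

    // Joins name=value pairs with '&', percent-encoding each name and value.
    public static String encodeForm(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException(ex);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("param1", "value1");
        params.put("param2", "value 2&more");
        // prints param1=value1&param2=value+2%26more
        System.out.println(encodeForm(params));
    }
}
```

Because `&` and `=` separate the fields, they have to be escaped inside names and values, which is why the form entity takes a charset argument.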
(5) The most convenient way to process a response is a ResponseHandler, which converts the entity into whatever type you need
- ResponseHandler<byte[]> handler = new ResponseHandler<byte[]>() {
- @Override
- public byte[] handleResponse(
- HttpResponse response) throws ClientProtocolException, IOException {
- HttpEntity entity = response.getEntity();
- if (entity != null) {
- return EntityUtils.toByteArray(entity); // read the whole body into a byte array
- } else {
- return null; // the response has no body
- }
- }
- };
- byte[] response = httpclient.execute(httpget, handler);
- // BasicResponseHandler returns the body as a String and throws HttpResponseException for status codes >= 300
- ResponseHandler<String> handler1 = new BasicResponseHandler();
- String response1 = httpclient.execute(httpget, handler1);
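(6) HTTP parameters (HttpParams) can be set when making a request
As with HttpContext, parameters can be set on the HttpClient, where they apply to every request the client executes, or on an individual request, where they apply to that request only.
Parameter | Description |
---|---|
CoreProtocolPNames.PROTOCOL_VERSION='http.protocol.version' | Protocol version |
CoreProtocolPNames.HTTP_ELEMENT_CHARSET='http.protocol.element-charset' | Charset used to encode protocol elements |
CoreProtocolPNames.HTTP_CONTENT_CHARSET='http.protocol.content-charset' | Charset used to encode content |
CoreProtocolPNames.USER_AGENT='http.useragent' | User-Agent string; useful when writing crawlers |
CoreProtocolPNames.STRICT_TRANSFER_ENCODING='http.protocol.strict-transfer-encoding' | ... |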
CoreProtocolPNames.USE_EXPECT_CONTINUE='http.protocol.expect-continue' | ... |
CoreProtocolPNames.WAIT_FOR_CONTINUE='http.protocol.wait-for-continue' | ... |
- httpclient.getParams().setParameter(CoreProtocolPNames.PROTOCOL_VERSION,
- HttpVersion.HTTP_1_0); // Default to HTTP 1.0
- httpclient.getParams().setParameter(CoreProtocolPNames.HTTP_CONTENT_CHARSET,
- "UTF-8");
- HttpGet httpget = new HttpGet("http://www.google.com.hk/");
- httpget.getParams().setParameter(CoreProtocolPNames.PROTOCOL_VERSION,
- HttpVersion.HTTP_1_1); // Use HTTP 1.1 for this request only
- httpget.getParams().setParameter(CoreProtocolPNames.USE_EXPECT_CONTINUE,
- Boolean.FALSE);
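The two scopes used above (client-wide defaults and per-request overrides) can be modeled as a layered lookup. The sketch below is a simplified model built on plain maps, not HttpClient's own parameter classes; the class name `LayeredParamsDemo` and the method `resolve` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class LayeredParamsDemo {

    // Looks in the request-level map first, then falls back to the client-level map.
    public static Object resolve(String name,
                                 Map<String, Object> requestParams,
                                 Map<String, Object> clientParams) {
        if (requestParams.containsKey(name)) {
            return requestParams.get(name);
        }
        return clientParams.get(name);
    }

    public static void main(String[] args) {
        Map<String, Object> clientParams = new HashMap<>();
        clientParams.put("http.protocol.version", "HTTP/1.0");
        clientParams.put("http.protocol.content-charset", "UTF-8");

        Map<String, Object> requestParams = new HashMap<>();
        requestParams.put("http.protocol.version", "HTTP/1.1"); // this request only

        // The request-level value wins for the version.
        System.out.println(resolve("http.protocol.version", requestParams, clientParams));
        // The charset is not set on the request, so the client-level value is used.
        System.out.println(resolve("http.protocol.content-charset", requestParams, clientParams));
    }
}
```

This matches the snippet above: the client defaults to HTTP 1.0, while the single HttpGet that sets its own version is sent as HTTP 1.1.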
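(7) HttpClient manages connections for you
None of the code above configures the connection itself. The parameters below, also set through HttpParams, control it.
Parameter | Description |
---|---|
CoreConnectionPNames.SO_TIMEOUT='http.socket.timeout' | Maximum time to wait for data, in milliseconds; that is, the longest allowed gap between two consecutive data packets |
CoreConnectionPNames.TCP_NODELAY='http.tcp.nodelay' | Boolean; true disables Nagle's algorithm. Nagle's algorithm reduces the number of packets sent by coalescing small writes into larger ones, which saves bandwidth but adds latency |
CoreConnectionPNames.SOCKET_BUFFER_SIZE='http.socket.buffer-size' | Size of the buffers used for sending and receiving data |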
CoreConnectionPNames.SO_LINGER='http.socket.linger' | ... |
CoreConnectionPNames.CONNECTION_TIMEOUT='http.connection.timeout' | Timeout for establishing a connection, in milliseconds |
CoreConnectionPNames.STALE_CONNECTION_CHECK='http.connection.stalecheck' | ... |
CoreConnectionPNames.MAX_LINE_LENGTH='http.connection.max-line-length' | Maximum length of a line |
CoreConnectionPNames.MAX_HEADER_COUNT='http.connection.max-header-count' | Maximum number of headers |
ConnConnectionPNames.MAX_STATUS_LINE_GARBAGE='http.connection.max-status-line-garbage' | ... |
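Two of these parameters correspond to options on the underlying socket. As a rough illustration, the sketch below sets the socket timeout and TCP_NODELAY directly on a `java.net.Socket` and reads them back; no connection is opened. The class name `SocketOptionsDemo`, the method `applyAndDescribe`, and the 5000 ms value are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

public class SocketOptionsDemo {

    // Sets the options on an unconnected socket, reads them back, and closes it.
    public static String applyAndDescribe(int soTimeoutMs, boolean tcpNoDelay) {
        try (Socket socket = new Socket()) {
            socket.setSoTimeout(soTimeoutMs); // what http.socket.timeout controls
            socket.setTcpNoDelay(tcpNoDelay); // true turns Nagle's algorithm off
            // A connect timeout is given at connect time instead, for example:
            // socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            return "soTimeout=" + socket.getSoTimeout()
                    + " tcpNoDelay=" + socket.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(applyAndDescribe(5000, true));
    }
}
```

With HttpClient you would normally leave the socket alone and set the parameters from the table instead; the sketch only shows what they mean one layer down.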
(8) In real applications it is better to take connections from a connection pool, which manages them for you.
- BasicClientConnectionManager man = new BasicClientConnectionManager(); // the most basic connection manager; it maintains only one connection at a time
- System.out.println(httpclient.getConnectionManager().getClass()); // prints: class org.apache.http.impl.conn.BasicClientConnectionManager
- // The code below uses PoolingClientConnectionManager, a pool that supports multi-threaded use
- if (httpConnManager == null)
- {
- SchemeRegistry schemeRegistry = new SchemeRegistry();
- schemeRegistry.register(
- new Scheme("http", 80, PlainSocketFactory.getSocketFactory()));
- httpConnManager = new PoolingClientConnectionManager(schemeRegistry);
- httpConnManager.setMaxTotal(20); // upper bound on connections across all routes
- httpConnManager.setDefaultMaxPerRoute(10); // per-route limit, kept below the total so that it can take effect
- }
- HttpParams params = new BasicHttpParams();
- params.setParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, CONNECTION_TIME); // CONNECTION_TIME: connect timeout in milliseconds, defined elsewhere
- HttpClient httpClient = new DefaultHttpClient(httpConnManager, params);
- HttpGet httpGet = new HttpGet(urlAddr);
- HttpResponse response;
- try {
- response = httpClient.execute(httpGet);
- } catch (ClientProtocolException e) {
- log.error(e.getMessage()); // HTTP protocol error
- return null;
- } catch (IOException e) {
- log.error(e.getMessage()); // connection problem or aborted request
- return null;
- }
(9) Requests that need a proxy: set the proxy host
- HttpHost proxy = new HttpHost("127.0.0.1", 8080, "http");
- DefaultHttpClient httpclient = new DefaultHttpClient();
- httpclient.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy);
- HttpHost target = new HttpHost("issues.apache.org", 443, "https");
- HttpGet req = new HttpGet("/");
- System.out.println("executing request to " + target + " via " + proxy);
- HttpResponse rsp = httpclient.execute(target, req);
- HttpEntity entity = rsp.getEntity();
(10) Requests that require login authentication
- httpclient.getCredentialsProvider().setCredentials(
- new AuthScope("localhost", 443),
- new UsernamePasswordCredentials("username", "password"));
- HttpGet httpget = new HttpGet("https://localhost/protected");
- System.out.println("executing request" + httpget.getRequestLine());
- HttpResponse response = httpclient.execute(httpget);
- HttpEntity entity = response.getEntity();
- System.out.println("----------------------------------------");
- System.out.println(response.getStatusLine());
- if (entity != null) {
- System.out.println("Response content length: " + entity.getContentLength());
- }
- EntityUtils.consume(entity);