Getting the Response Content Type (Content-Type) with HttpClient
Every response body has a content type, which is the Content-Type header.
You can see it in the response headers with Firefox's Firebug.
Of course, we can also read it through the HttpClient API:
HttpEntity's getContentType().getValue() returns the response content type.
package com.open1111.httpclient.chap02;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class Demo2 {

    public static void main(String[] args) throws Exception {
        CloseableHttpClient httpClient = HttpClients.createDefault(); // create the HttpClient instance
        HttpGet httpGet = new HttpGet("http://www.java1234.com"); // create the HttpGet instance
        httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0"); // set the User-Agent request header
        CloseableHttpResponse response = httpClient.execute(httpGet); // execute the HTTP GET request
        HttpEntity entity = response.getEntity(); // get the response entity
        System.out.println("Content-Type:" + entity.getContentType().getValue());
        // System.out.println("Page content:" + EntityUtils.toString(entity, "utf-8")); // get the page content
        response.close(); // close the response
        httpClient.close(); // close the HttpClient
    }
}
Running it prints:
Content-Type:text/html
Ordinary web pages are text/html, but some also declare a character encoding.
For example, requesting www.tuicool.com prints:
Content-Type:text/html; charset=utf-8
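As a side note, if you want the declared charset rather than the raw header string, org.apache.http.entity.ContentType (from HttpCore 4.x) can parse the header for you. Below is a minimal sketch along those lines; the class name Demo2Charset is made up for this example:

package com.open1111.httpclient.chap02;

import java.nio.charset.Charset;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.entity.ContentType;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class Demo2Charset {

    public static void main(String[] args) throws Exception {
        CloseableHttpClient httpClient = HttpClients.createDefault();
        HttpGet httpGet = new HttpGet("http://www.tuicool.com");
        httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0");
        CloseableHttpResponse response = httpClient.execute(httpGet);
        HttpEntity entity = response.getEntity();
        // parse the Content-Type header; falls back to text/plain (ISO-8859-1) if the header is missing
        ContentType contentType = ContentType.getOrDefault(entity);
        Charset charset = contentType.getCharset(); // null when no charset=... parameter is present
        System.out.println("MIME type: " + contentType.getMimeType());
        System.out.println("Charset: " + (charset != null ? charset.name() : "not declared"));
        response.close();
        httpClient.close();
    }
}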
If we request a JS file instead, for example http://www.java1234.com/static/js/jQuery.js,
running it prints:
Content-Type:application/javascript
If we request a binary file, for example http://central.maven.org/maven2/HTTPClient/HTTPClient/0.3-3/HTTPClient-0.3-3.jar,
running it prints:
Content-Type:application/java-archive
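When the Content-Type indicates a binary payload like this, it makes more sense to write the raw bytes to disk than to convert them to a String. A minimal sketch under that assumption (the class name Demo2Download and the output file name are made up for this example):

package com.open1111.httpclient.chap02;

import java.io.FileOutputStream;

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class Demo2Download {

    public static void main(String[] args) throws Exception {
        CloseableHttpClient httpClient = HttpClients.createDefault();
        HttpGet httpGet = new HttpGet("http://central.maven.org/maven2/HTTPClient/HTTPClient/0.3-3/HTTPClient-0.3-3.jar");
        CloseableHttpResponse response = httpClient.execute(httpGet);
        HttpEntity entity = response.getEntity();
        Header contentTypeHeader = entity.getContentType(); // may be null if the server sends no Content-Type
        String contentType = contentTypeHeader != null ? contentTypeHeader.getValue() : "application/octet-stream";
        if (!contentType.startsWith("text/")) {
            // binary payload: save the raw bytes instead of reading them as text
            byte[] bytes = EntityUtils.toByteArray(entity);
            try (FileOutputStream out = new FileOutputStream("HTTPClient-0.3-3.jar")) {
                out.write(bytes);
            }
            System.out.println("Saved " + bytes.length + " bytes (" + contentType + ")");
        }
        response.close();
        httpClient.close();
    }
}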
Of course, there are many more Content-Type values. So what is this good for in a crawler? When crawling, we can use the Content-Type to pick out the pages we actually want to parse, and to filter out the responses we want to skip; a sketch of such a filter follows below.
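For example, a crawler can check the Content-Type before parsing and skip anything that is not text/html. A minimal sketch of that idea (the class name Demo2Filter and the URL list are just illustrative):

package com.open1111.httpclient.chap02;

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class Demo2Filter {

    public static void main(String[] args) throws Exception {
        String[] urls = {
            "http://www.java1234.com",
            "http://www.java1234.com/static/js/jQuery.js"
        };
        CloseableHttpClient httpClient = HttpClients.createDefault();
        for (String url : urls) {
            HttpGet httpGet = new HttpGet(url);
            httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0");
            try (CloseableHttpResponse response = httpClient.execute(httpGet)) {
                HttpEntity entity = response.getEntity();
                Header contentType = entity.getContentType();
                // only parse responses whose Content-Type starts with text/html; skip everything else
                if (contentType != null && contentType.getValue().startsWith("text/html")) {
                    String html = EntityUtils.toString(entity, "utf-8");
                    System.out.println(url + " -> crawl (" + html.length() + " chars)");
                } else {
                    System.out.println(url + " -> skip (" + (contentType == null ? "unknown" : contentType.getValue()) + ")");
                    EntityUtils.consume(entity); // release the connection without parsing the body
                }
            }
        }
        httpClient.close();
    }
}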