How to crawl a page after its AJAX content has loaded, using Java

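An ordinary crawler only receives the HTML that the server sends back, so it cannot see data that JavaScript loads afterwards. To get that data you can use PhantomJS or HtmlUnit, which run the page's scripts.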
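
To make the limitation concrete, here is a minimal sketch that uses only the JDK (Java 11 or later). It serves a small page from a local test server and fetches it with a plain HTTP client. The class name StaticFetchDemo, the page content and the "price" element are all made up for illustration; none of them come from the original answer.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class StaticFetchDemo {

    // Hypothetical page: the <div> is empty until the script fills it in.
    static final String PAGE =
        "<html><body><div id=\"price\"></div>"
        + "<script>document.getElementById('price').textContent = 6 * 7;</script>"
        + "</body></html>";

    // Serves PAGE on a free local port, fetches it once with a plain
    // HTTP client (no JavaScript engine), and returns the response body.
    static String fetchRaw() {
        HttpServer server = null;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                byte[] body = PAGE.getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().add("Content-Type", "text/html; charset=utf-8");
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();

            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            return response.body();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("fetch failed", e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        String html = fetchRaw();
        // The script source is in the body, but nothing has executed it.
        System.out.println(html.contains("<div id=\"price\"></div>")
            ? "placeholder is still empty: the script never ran"
            : "placeholder was filled");
    }
}
```

The fetched body still contains the empty placeholder, because nothing in a plain HTTP client executes the script. A tool such as HtmlUnit closes that gap by running the script before you read the DOM.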

Sample code is attached below (note that it uses HtmlUnit, not PhantomJS):

package cn.wang.utils;

import java.util.Random;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.CookieManager;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitUtils {

    static WebClient webClient = null;
    static Random random = new Random();

    static {
        // 1. Create the client, emulating Chrome
        webClient = new WebClient(BrowserVersion.CHROME);

        // 2. Configure it
        // Enable JavaScript
        webClient.getOptions().setJavaScriptEnabled(true);
        // Disable CSS processing
        webClient.getOptions().setCssEnabled(false);
        // Follow redirects
        webClient.getOptions().setRedirectEnabled(true);
        // Connection timeout: 15 s here. A value of 0 means wait indefinitely
        webClient.getOptions().setTimeout(1000 * 15);
        // Enable cookie handling
        webClient.setCookieManager(new CookieManager());
        // Let HtmlUnit resynchronize asynchronous AJAX calls where it can
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        // Do not throw an exception when a script on the page fails
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        // Browser-like request headers
        webClient.addRequestHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        webClient.addRequestHeader("Accept-Encoding", "gzip, deflate");
        webClient.addRequestHeader("Accept-Language", "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2");
        webClient.addRequestHeader("Connection", "keep-alive");
        webClient.addRequestHeader("Upgrade-Insecure-Requests", "1");
    }

    // Loads the URL, lets its JavaScript run, and returns the resulting DOM as XML (null on failure)
    public static String runJs(String url) {
        try {
            // Pick a random User-Agent; Constant.useragents is a String[] defined elsewhere in the project (not shown)
            webClient.addRequestHeader("User-Agent", Constant.useragents[random.nextInt(Constant.useragents.length)]);
            // 3. Fetch the page
            HtmlPage page = webClient.getPage(url);
            // Wait up to 10 s (value in ms) for background JavaScript to finish; this has to come after getPage
            webClient.waitForBackgroundJavaScript(1000 * 10);
            return page.asXml();
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }

    public static void main(String[] args) {
        // Only a PhantomJS driver reads this property; HtmlUnit ignores it
        System.setProperty("phantomjs.binary.path", "D:\\works\\tool\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe");
        try {
            System.out.println(runJs("http://www.gou.hk/"));
        } finally {
            // Close the shared client once, after all pages have been fetched
            webClient.close();
        }
    }
}
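
The sample reads its User-Agent strings from Constant.useragents, but the Constant class is not shown in the answer. The sketch below is only a guess at what such a class might look like. The three User-Agent strings are illustrative, and the randomUserAgent helper and the main method are hypothetical additions, not part of the original project.

```java
import java.util.Random;

// Hypothetical stand-in for the Constant class that the sample references.
final class Constant {

    // Illustrative User-Agent strings; replace them with the ones your crawler should send.
    static final String[] useragents = {
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
        "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0"
    };

    private Constant() {
        // Constants holder, not meant to be instantiated
    }

    // Hypothetical helper: returns one entry of useragents chosen with the given Random.
    static String randomUserAgent(Random random) {
        return useragents[random.nextInt(useragents.length)];
    }

    public static void main(String[] args) {
        // Prints one of the strings above; which one varies from run to run
        System.out.println(randomUserAgent(new Random()));
    }
}
```

To use it with the sample, the class would need to sit in the same package (cn.wang.utils), since the sample refers to Constant without an import.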
