How to crawl dynamic content with Seaflower
Seaflower is the world's first DOM crawl server.
There are several ways to crawl dynamic content on a page:
1. Crawl after waiting for a short time
Send "wait-time xx" to Seaflower to make it wait xx seconds before it crawls the page content.
2. Crawl by executing JavaScript functions
Send a command like this: EXEC test()
This executes the test() function in the current page.
3. Crawl by emulating a button click
Use a command like this: EXEC getNodeByXPath(xpath).onclick()
where xpath is the absolute XPath of the target node, e.g. EXEC getNodeByXPath('/html/body/input[1]').onclick()
This has the same effect as clicking the first button on the page. Note that getNodeByXPath is a function provided by Seaflower.
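To make the XPath lookup more concrete, here is a minimal sketch of how a helper of this kind could be written with the standard DOM XPath API (document.evaluate). It is not Seaflower's actual implementation: the names getNodeByXPathSketch and clickByXPathSketch are hypothetical, and the doc parameter exists only so the sketch can be tried outside a browser.

```javascript
// Hypothetical stand-in for Seaflower's getNodeByXPath; the real helper may differ.
// Numeric value of XPathResult.FIRST_ORDERED_NODE_TYPE in the DOM XPath API.
const FIRST_ORDERED_NODE_TYPE = 9;

function getNodeByXPathSketch(xpath, doc = globalThis.document) {
  // document.evaluate(expression, contextNode, namespaceResolver, resultType, result)
  const result = doc.evaluate(xpath, doc, null, FIRST_ORDERED_NODE_TYPE, null);
  // singleNodeValue is null when no node matches the expression.
  return result.singleNodeValue;
}

function clickByXPathSketch(xpath, doc = globalThis.document) {
  const node = getNodeByXPathSketch(xpath, doc);
  // Only call onclick when a node was found and it has a handler assigned.
  if (node === null || typeof node.onclick !== 'function') {
    return false;
  }
  node.onclick();
  return true;
}
```

Calling onclick() directly only runs a handler assigned through the onclick attribute or property. Listeners registered with addEventListener are not triggered this way, and the call fails if the node has no onclick handler at all, which is why the sketch checks first.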
4. Send specific HTTP headers, then crawl
You can use commands like these:
http-header user-agent: good
http-header referer: http://www.google.com
and so on.
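The four techniques above can be combined in a single crawl. The sketch below only assembles the command lines in the formats shown on this page. The function name buildSeaflowerCommands, the option names, and the order of the commands (headers first, then the wait, then the EXEC calls) are assumptions made for illustration. How the commands are delivered to Seaflower is not covered here.

```javascript
// Hypothetical helper: builds Seaflower command lines as plain strings.
// It does not connect to Seaflower or send anything.
function buildSeaflowerCommands({ headers = {}, waitSeconds = null, exec = [] } = {}) {
  const commands = [];
  // One "http-header name: value" line per header, in insertion order.
  for (const [name, value] of Object.entries(headers)) {
    commands.push(`http-header ${name}: ${value}`);
  }
  // Optional "wait-time xx" line, where xx is a number of seconds.
  if (waitSeconds !== null) {
    commands.push(`wait-time ${waitSeconds}`);
  }
  // One "EXEC expression" line per JavaScript expression to run in the page.
  for (const expression of exec) {
    commands.push(`EXEC ${expression}`);
  }
  return commands;
}

const commands = buildSeaflowerCommands({
  headers: { 'user-agent': 'good', referer: 'http://www.google.com' },
  waitSeconds: 5,
  exec: ["getNodeByXPath('/html/body/input[1]').onclick()"],
});
console.log(commands.join('\n'));
```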
Let's crawl the hidden web with Seaflower!
(C) 2024 ZHUATANG.COM, All rights reserved
update: 2013-06-07