Spider 0.0.2.20 just changes the supported version of Firefox to 3.6 to match the updated versions for Shiretoko and trunk.
Bob Clary’s ramblings about Mozilla, Web Development, everything and nothing…
Mohamed says:
Hello,
Is there a way to handle (make use of) the popups and popunders that open while Spider crawls web pages?
Is it correct to assume that the document available to the callback functions in the hook script won’t contain the elements in the pop-up windows?
Thank you.
June 23, 2009, 12:11 pm
bc says:
You can use a “windowwatcher”, as I did in the dialog closer. See http://bclary.com/projects/spider/spider/chrome/content/spider/dialog-closer.js for an example.
You can find other examples at http://mxr.mozilla.org/mozilla-central/search?string=nsIWindowWatcher&find=js&findi=&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central
June 23, 2009, 12:45 pm
Brad says:
Is there a way to make Spider enter a username and password when it either goes to a specific URL or detects username and password fields? Is there a way to make Spider ignore a specific link, e.g. the Logout link on a page it’s crawling?
July 8, 2009, 9:35 am
bc says:
Brad,
Unfortunately not at the moment, but I consider the inability to be a bug. I’ll try to fix it in the next few days.
Bob
July 8, 2009, 11:20 pm
Mohamed says:
When I try to spider http://www.hotmail.com, it does not pick up any links at all. Why is it so special? (It is the same even after I log in to Hotmail.)
Platform: Windows XP, FF 3.5.2, Spider 0.0.2.20
Thanks,
Mohamed
September 3, 2009, 2:08 pm
Mohamed says:
This is the message on the console:
Spider: Start: -url "http://www.hotmail.com" -depth 1 -timeout 120 -wait 5 -csserrors -httpresponses
Spider: Begin loading http://www.hotmail.com
Spider: Finish loading http://www.hotmail.com
Spider: Current Url: http://www.hotmail.com, Referer: null, Depth: 0
Spider: HTTP Response: originalURI: http://www.hotmail.com/ URI: http://www.hotmail.com/ referer: undefined status: 302 status text: found content-type: text/html; charset=utf-8 succeeded: false
Spider: HTTP Response: originalURI: http://www.hotmail.com/ URI: http://login.live.com/login.srf?wa=wsignin1.0&rpsnv=11&ct=1252011915&rver=5.5.4177.0&wp=MBI&wreply=http:%2F%2Fmail.live.com%2Fdefault.aspx&lc=1033&id=64855&mkt=en-US referer: undefined status: 200 status text: ok content-type: text/html; charset=iso-8859-1 succeeded: true
Spider: stopped… loaded 1 pages
September 3, 2009, 2:11 pm
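In the log above, the first response for http://www.hotmail.com/ has status 302, and the second one ends at a different URI on login.live.com. When checking many such logs, a small filter can flag these cases. A minimal bash/awk sketch, assuming the console output has been saved to a file and that each `Spider: HTTP Response:` line keeps the `key: value` layout shown above with no spaces inside URIs; `summarize_responses` is a hypothetical helper, not part of Spider:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Spider): summarize "Spider: HTTP Response:"
# lines from a saved console log. For each response it prints the status and
# the original URI, adds "-> final URI" when the two differ, and tags 3xx
# statuses as redirects. Assumes the "key: value" layout shown in the log
# and that URIs contain no spaces.
# Usage: summarize_responses console.log   (or pipe the log on stdin)
summarize_responses() {
  awk '/Spider: HTTP Response:/ {
    orig = ""; uri = ""; status = ""
    # Walk the whitespace-separated fields and grab the value after each key.
    for (i = 1; i < NF; i++) {
      if ($i == "originalURI:") orig = $(i + 1)
      else if ($i == "URI:") uri = $(i + 1)
      else if ($i == "status:") status = $(i + 1)
    }
    line = status " " orig
    if (orig != uri) line = line " -> " uri
    if (status ~ /^3/) line = line "  [redirect]"
    print line
  }' "$@"
}
```

For the log above this should print a `302 ... [redirect]` line for http://www.hotmail.com/ followed by a `200` line showing the hop to login.live.com.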
bc says:
I think the problem is that the original URL is redirected, but Spider is looking at the original response, which contains no links. I’ll have to handle redirected pages better. Sorry for the delay in responding; I somehow missed the comments in my feeds.
September 6, 2009, 6:36 am
Mohamed says:
When I invoke Spider from the command line in chrome mode and pass parameters, the window resizes itself to a smaller size. If you don’t pass any parameters, this does not happen.
For example, if you issue the command from your help page:
firefox -P test -chrome "chrome://spider/content/spider.xul?url%3Dhttp%253A%252F%252Fbclary.com%252F%26domain%3Dbclary.com%26depth%3D2%26timeout%3D120%26waittime%3D5%26autostart%3Don%26restrict%3Don"
you can see the window resize itself. If you just run “firefox -P test -chrome chrome://spider/content/open.xul”, no resizing occurs.
In automated environments this resizing causes problems (for example, some web pages with newer rich media content don’t load properly when the window is smaller).
Is there a remedy for this?
Thanks,
Mohamed
September 10, 2009, 9:12 pm
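The chrome URL in the spider.xul command above appears to be percent-encoded twice: the `url` value is encoded, and then the whole query string is encoded again (so `:` in the target URL ends up as `%253A`). A minimal bash sketch that reproduces that exact string, assuming this double-encoding scheme is what spider.xul expects (it is inferred from the help example, not from Spider's source) and that all inputs are ASCII; `urlencode` and `spider_chrome_url` are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Percent-encode every byte except unreserved characters (A-Z a-z 0-9 . _ ~ -).
# Assumes ASCII input; multibyte characters are not handled byte by byte.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      # "'c" makes printf use the character's numeric code.
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: build a spider.xul chrome URL in the double-encoded
# form seen in the help example. Parameter names are copied from that example.
spider_chrome_url() {
  local url="$1" domain="$2" depth="${3:-2}" timeout="${4:-120}" waittime="${5:-5}"
  # First pass: encode the individual values.
  local query="url=$(urlencode "$url")&domain=$(urlencode "$domain")&depth=${depth}&timeout=${timeout}&waittime=${waittime}&autostart=on&restrict=on"
  # Second pass: encode the whole query string.
  printf 'chrome://spider/content/spider.xul?%s\n' "$(urlencode "$query")"
}

spider_chrome_url "http://bclary.com/" bclary.com 2
# To launch: firefox -P test -chrome "$(spider_chrome_url "http://bclary.com/" bclary.com 2)"
```

With these arguments the function prints the same chrome URL as the help example, which makes it easier to vary the target site without hand-encoding it.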
bc says:
Try the command-line arguments as described in the help:
firefox -P test -spider -url http://bclary.com/ -domain bclary.com -depth 2 -timeout 120 -wait 5 -start
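For unattended runs, the same invocation can be kept in a small script so the target site is a variable rather than part of the command. A minimal bash sketch; `build_spider_cmd` and the `SPIDER_CMD` array are hypothetical names, and only the flags from the help example above are used, with its values as defaults:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the Spider command line from the help example.
# Only flags that appear in that example are used; defaults mirror it.
build_spider_cmd() {
  local profile="$1" url="$2" domain="$3"
  local depth="${4:-2}" timeout_s="${5:-120}" wait_s="${6:-5}"
  # Keep the command in an array so arguments containing shell
  # metacharacters (e.g. "&" in a URL) are passed through intact.
  SPIDER_CMD=(firefox -P "$profile" -spider
              -url "$url" -domain "$domain"
              -depth "$depth" -timeout "$timeout_s" -wait "$wait_s"
              -start)
}

build_spider_cmd test "http://bclary.com/" bclary.com
# Dry run: print the command with shell quoting applied.
printf '%q ' "${SPIDER_CMD[@]}"
echo
# To actually launch Firefox instead:
# "${SPIDER_CMD[@]}"
```

Storing the arguments in an array rather than a single string avoids re-splitting and re-quoting problems when the URL contains characters such as `&` or `?`.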
September 10, 2009, 11:41 pm
Mohamed says:
Thanks. That fixes the resizing issue. The drawback with this method is that I could not find a way to pass parameters to the user hook function.
With the chrome application, I could pass parameters as a query string via open.xul.
September 11, 2009, 12:07 pm
bc says:
I’ll look into why the resizing is occurring and see what I can do. I’ll also think about adding a new command line argument to pass name/value pairs to the Spider.
September 11, 2009, 1:24 pm