Spider 0.0.2.20 just changes the supported version of Firefox to 3.6 to match the updated versions for Shiretoko and trunk.
11 thoughts on “YASU – Spider 0.0.2.20”
Comments are closed.
Bob Clary's ramblings about Mozilla, everything and nothing
Hello,
Is there a way to handle (make use of) popups (and popunders) that open while the extension spiders through web pages?
Is it correct to assume that the document available to callback functions in the hook script won’t contain the elements in the popup windows?
Thank you.
You can use a window watcher (nsIWindowWatcher), as I did in the dialog closer. See http://bclary.com/projects/spider/spider/chrome/content/spider/dialog-closer.js for an example.
You can find other examples at http://mxr.mozilla.org/mozilla-central/search?string=nsIWindowWatcher&find=js&findi=&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central
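As a rough illustration of the window watcher approach, here is a minimal sketch that is not taken from dialog-closer.js. The names createPopupObserver, watchPopups and onPopup are made up for this example. It assumes the code runs in privileged chrome scope, where Components is available; outside of that it only builds the observer. A newly opened window has not loaded its document yet when the notification arrives, so the sketch waits for the window's load event before handing it to the callback.

```javascript
// Hypothetical helper: builds an nsIObserver-style object that reports
// newly opened top-level windows (popups included) to onPopup.
function createPopupObserver(onPopup) {
  return {
    observe: function (subject, topic, data) {
      if (topic !== "domwindowopened") {
        return; // ignore "domwindowclosed" and anything else
      }
      var win = subject;
      // In older Gecko versions the subject may need to be queried for
      // nsIDOMWindow before window methods are usable on it.
      if (typeof Components !== "undefined" &&
          typeof win.QueryInterface === "function") {
        win.QueryInterface(Components.interfaces.nsIDOMWindow);
      }
      // The popup's document is not ready yet; wait for it to load.
      win.addEventListener("load", function onLoad() {
        win.removeEventListener("load", onLoad, false);
        onPopup(win);
      }, false);
    }
  };
}

// Hypothetical helper: registers the observer with the window watcher
// when running in chrome scope, and returns the observer either way.
function watchPopups(onPopup) {
  var observer = createPopupObserver(onPopup);
  if (typeof Components === "undefined") {
    return observer; // not privileged code; nothing to register with
  }
  var ww = Components.classes["@mozilla.org/embedcomp/window-watcher;1"]
    .getService(Components.interfaces.nsIWindowWatcher);
  ww.registerNotification(observer);
  return observer;
}
```

A hook script could then call something like watchPopups(function (win) { /* inspect win.document */ }) before starting the crawl, and call unregisterNotification on the window watcher when it is done.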
Is there a way to make Spider enter a username and password when it either goes to a specific URL or detects the presence of username and password fields? Is there a way to make Spider ignore a specific link, e.g. the Logout link on a page it’s crawling?
Brad,
Unfortunately not at the moment, but I consider the inability to be a bug. I’ll try to fix it in the next few days.
Bob
When I tried to spider http://www.hotmail.com, it did not pick up any links at all. What is special about this site? (The result is the same even after I log in to Hotmail.)
Platform: Windows XP, FF 3.5.2, Spider 0.0.2.20
Thanks
Mohamed
This is the message on the console:
Spider: Start: -url “http://www.hotmail.com” -depth 1 -timeout 120 -wait 5 -csserrors -httpresponses
Spider: Begin loading http://www.hotmail.com
Spider: Finish loading http://www.hotmail.com
Spider: Current Url: http://www.hotmail.com, Referer: null, Depth: 0
Spider: HTTP Response: originalURI: http://www.hotmail.com/ URI: http://www.hotmail.com/ referer: undefined status: 302 status text: found content-type: text/html; charset=utf-8 succeeded: false
Spider: HTTP Response: originalURI: http://www.hotmail.com/ URI: http://login.live.com/login.srf?wa=wsignin1.0&rpsnv=11&ct=1252011915&rver=5.5.4177.0&wp=MBI&wreply=http:%2F%2Fmail.live.com%2Fdefault.aspx&lc=1033&id=64855&mkt=en-US referer: undefined status: 200 status text: ok content-type: text/html; charset=iso-8859-1 succeeded: true
Spider: stopped… loaded 1 pages
I think the problem is that the original URL is redirected, but Spider is seeing the original response, where there are no links available. I’ll have to handle redirected pages better. Sorry for the delay in responding; I missed the comments in my feeds somehow.
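To make the redirect visible in the console output above, here is a small sketch that parses the "HTTP Response" lines and keeps only the last response logged for each original URI. It is only a way of reading the log, not how Spider handles redirects internally. The helper names parseHttpResponse and finalResponses are made up for the example, and the line format is assumed to be exactly what the log shows.

```javascript
// Matches the "Spider: HTTP Response:" lines as they appear in the log.
var RESPONSE_RE =
  /^Spider: HTTP Response: originalURI: (\S+) URI: (\S+) referer: (\S+) status: (\d+)/;

// Hypothetical helper: turn one log line into an object, or null if the
// line is not an HTTP Response line.
function parseHttpResponse(line) {
  var m = RESPONSE_RE.exec(line);
  if (!m) {
    return null;
  }
  return {
    originalURI: m[1],
    uri: m[2],
    status: Number(m[4]),
    redirected: m[1] !== m[2] // final URI differs from the requested one
  };
}

// Hypothetical helper: keep the last response seen for each original URI,
// which is the one reached after any redirects.
function finalResponses(lines) {
  var byOriginal = {};
  lines.forEach(function (line) {
    var r = parseHttpResponse(line);
    if (r) {
      byOriginal[r.originalURI] = r;
    }
  });
  return byOriginal;
}
```

Fed the two response lines from the log, the entry for http://www.hotmail.com/ ends up being the login.live.com response with status 200 and redirected set to true, which is the page whose links one would expect to be collected.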
When I invoke Spider from the command line in chrome mode and pass parameters, the window resizes itself to a smaller size. If no parameters are passed, this does not happen.
For example, if you issue the command (the example you have provided in your help page)
firefox -P test -chrome “chrome://spider/content/spider.xul?url%3Dhttp%253A%252F%252Fbclary.com%252F%26domain%3Dbclary.com%26depth%3D2%26timeout%3D120%26waittime%3D5%26autostart%3Don%26restrict%3Don”
you can see the window resizes itself. If you just call “firefox -P test -chrome chrome://spider/content/open.xul” no resizing occurs.
When used in automated environments, this resizing causes problems (Some web pages with new rich media content, for example, don’t load properly when the window size is smaller).
Is there a remedy for this?
Thanks
Mohamed
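For anyone building that chrome URL in a script: the query string in the example command is percent-encoded twice, once per value and once more as a whole. The sketch below reproduces that encoding. The function name buildSpiderChromeUrl is made up for illustration, and the parameter names are simply the ones visible in the example command.

```javascript
// Hypothetical helper: builds a chrome URL in the same double-encoded
// form as the example command above.
function buildSpiderChromeUrl(params) {
  var pairs = [];
  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      // First pass: encode each name and value individually.
      pairs.push(encodeURIComponent(name) + "=" +
                 encodeURIComponent(String(params[name])));
    }
  }
  // Second pass: encode the joined query string as a whole.
  return "chrome://spider/content/spider.xul?" +
         encodeURIComponent(pairs.join("&"));
}

var exampleUrl = buildSpiderChromeUrl({
  url: "http://bclary.com/",
  domain: "bclary.com",
  depth: 2,
  timeout: 120,
  waittime: 5,
  autostart: "on",
  restrict: "on"
});
```

With these values, exampleUrl is the same string as the chrome URL in the command above.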
Try the command-line arguments as described in the help:
firefox -P test -spider -url http://bclary.com/ -domain bclary.com -depth 2 -timeout 120 -wait 5 -start
Thanks. That fixes the resizing issue. The drawback with this method is that I could not find a way to pass any parameter to the user hook function.
With the chrome application, I could pass parameters as a query string via open.xul.
I’ll look into why the resizing is occurring and see what I can do. I’ll also think about adding a new command line argument to pass name/value pairs to the Spider.
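Until something like that exists, a hook script that has access to the double-encoded query string could recover its own name/value pairs along these lines. This is only a sketch: parseSpiderQuery is a made-up name, and how (or whether) the query string is exposed to the hook script is an assumption, not something Spider documents here.

```javascript
// Hypothetical helper: decodes a double-encoded query string such as
// "?url%3Dhttp%253A%252F%252Fbclary.com%252F%26depth%3D2" into an object.
function parseSpiderQuery(search) {
  // Undo the outer encoding to get back "name=value&name=value".
  var query = decodeURIComponent(search.replace(/^\?/, ""));
  var result = {};
  query.split("&").forEach(function (pair) {
    if (!pair) {
      return; // skip empty segments
    }
    var i = pair.indexOf("=");
    var name = i < 0 ? pair : pair.slice(0, i);
    var value = i < 0 ? "" : pair.slice(i + 1);
    // Undo the inner encoding of each name and value.
    result[decodeURIComponent(name)] = decodeURIComponent(value);
  });
  return result;
}
```

All values come back as strings, so a numeric option such as depth would still need to be converted by the caller.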