YASU – Spider 0.0.2.20

Spider 0.0.2.20 only changes the supported version of Firefox to 3.6, matching the updated versions for Shiretoko and trunk.


11 Responses to YASU – Spider 0.0.2.20

  1. Mohamed says:

    Hello,

    Is there a way to handle (make use of) popups and popunders that open while the plugin spiders through web pages?

    Is it correct to assume that the document available to callback functions in the hook script won’t contain the elements in the popup windows?

    Thank you.

  2. bc says:

    You can use a window watcher (nsIWindowWatcher), as I did in the dialog closer. See http://bclary.com/projects/spider/spider/chrome/content/spider/dialog-closer.js for an example.

    You can find other examples at http://mxr.mozilla.org/mozilla-central/search?string=nsIWindowWatcher&find=js&findi=&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central

  3. Brad says:

    Is there a way to make Spider enter a username and password when it either goes to a specific URL or detects the presence of username and password fields? Is there a way to make Spider ignore a specific link, such as the Logout link on a page it’s crawling?

  4. bc says:

    Brad,

    Unfortunately not at the moment, but I consider the inability to be a bug. I’ll try to fix it in the next few days.

    Bob

  5. Mohamed says:

    When I try to spider through http://www.hotmail.com, it does not pick up any links at all. Why is this site special? Even after I log into Hotmail, the result is the same.

    Platform: Windows XP, FF 3.5.2, Spider 0.0.2.20

    Thanks
    Mohamed

  6. Mohamed says:

    This is the message on the console:

    Spider: Start: -url "http://www.hotmail.com" -depth 1 -timeout 120 -wait 5 -csserrors -httpresponses
    Spider: Begin loading http://www.hotmail.com
    Spider: Finish loading http://www.hotmail.com
    Spider: Current Url: http://www.hotmail.com, Referer: null, Depth: 0
    Spider: HTTP Response: originalURI: http://www.hotmail.com/ URI: http://www.hotmail.com/ referer: undefined status: 302 status text: found content-type: text/html; charset=utf-8 succeeded: false
    Spider: HTTP Response: originalURI: http://www.hotmail.com/ URI: http://login.live.com/login.srf?wa=wsignin1.0&rpsnv=11&ct=1252011915&rver=5.5.4177.0&wp=MBI&wreply=http:%2F%2Fmail.live.com%2Fdefault.aspx&lc=1033&id=64855&mkt=en-US referer: undefined status: 200 status text: ok content-type: text/html; charset=iso-8859-1 succeeded: true
    Spider: stopped… loaded 1 pages

  7. bc says:

    I think the problem is that the original URL is redirected, but Spider sees the original response, which has no links. I’ll have to handle redirected pages better. Sorry for the delay in responding; I somehow missed the comments in my feeds.

  8. Mohamed says:

    When I invoke Spider from the command line in chrome mode and pass parameters, the window resizes itself to a smaller size. If I don’t pass any parameters, this does not happen.

    For example, if you issue this command (the example from your help page):
    firefox -P test -chrome "chrome://spider/content/spider.xul?url%3Dhttp%253A%252F%252Fbclary.com%252F%26domain%3Dbclary.com%26depth%3D2%26timeout%3D120%26waittime%3D5%26autostart%3Don%26restrict%3Don"

    you can see the window resize itself. If you just call "firefox -P test -chrome chrome://spider/content/open.xul", no resizing occurs.

    In automated environments, this resizing causes problems. Some web pages with rich media content, for example, don’t load properly when the window is smaller.

    Is there a remedy for this?

    Thanks
    Mohamed

  9. bc says:

    Try the command-line arguments described in the help:

    firefox -P test -spider -url http://bclary.com/ -domain bclary.com -depth 2 -timeout 120 -wait 5 -start

  10. Mohamed says:

    Thanks. That fixes the resizing issue. The drawback with this method is that I could not find a way to pass any parameter to the user hook function.

    With the chrome application, I could pass parameters as a query string via open.xul.

  11. bc says:

    I’ll look into why the resizing is occurring and see what I can do. I’ll also think about adding a new command line argument to pass name/value pairs to the Spider.
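
Comment 2 suggests using a window watcher to catch popups. A minimal sketch of that idea follows. It is not the code from dialog-closer.js. The names `makePopupObserver`, `watchPopups`, and the `onOpen` callback are hypothetical, and the sketch assumes it runs in privileged chrome code where `Components` is available.

```javascript
// Build an nsIObserver-style object that reports windows once they load.
// onOpen is a hypothetical callback supplied by the caller.
function makePopupObserver(onOpen) {
  return {
    observe: function (subject, topic, data) {
      // nsIWindowWatcher sends "domwindowopened" and "domwindowclosed".
      if (topic !== "domwindowopened") {
        return;
      }
      // In chrome code the subject arrives as nsISupports; older Firefox
      // versions may need subject.QueryInterface(nsIDOMWindow) first.
      var win = subject;
      // The window has usually not loaded its document yet, so wait for load.
      win.addEventListener("load", function () {
        onOpen(win);
      }, false);
    }
  };
}

// Register the observer with the window watcher service.
// Returns false when Components is unavailable (outside Firefox chrome).
function watchPopups(observer) {
  if (typeof Components === "undefined") {
    return false;
  }
  var watcher = Components.classes["@mozilla.org/embedcomp/window-watcher;1"]
    .getService(Components.interfaces.nsIWindowWatcher);
  watcher.registerNotification(observer);
  return true;
}
```

A caller could pass a callback that inspects `win.document` of each new window. Whether this fits into a Spider user hook script is an assumption that the thread does not confirm.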
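
Comment 7 attributes the Hotmail result to a redirect. As a rough aid for spotting that case in console output, the sketch below parses the `Spider: HTTP Response:` lines shown in comment 6 and flags redirected responses. The function names are hypothetical, the line format is assumed to match that log exactly, and the code does not change how Spider itself handles redirects.

```javascript
// Parse one "Spider: HTTP Response:" console line into its fields.
// Returns null for lines that are not HTTP response lines.
function parseHttpResponse(line) {
  var pattern = /HTTP Response: originalURI: (\S+) URI: (\S+) referer: (\S+) status: (\d+)/;
  var match = pattern.exec(line);
  if (!match) {
    return null;
  }
  return {
    originalURI: match[1],
    uri: match[2],
    referer: match[3],
    status: parseInt(match[4], 10)
  };
}

// Treat a response as redirected if it has a 3xx status or if the final
// URI differs from the URI that was originally requested.
function isRedirected(response) {
  var is3xx = response.status >= 300 && response.status < 400;
  return is3xx || response.originalURI !== response.uri;
}
```

Applied to the two response lines in comment 6, both are flagged: the first because of its 302 status, the second because its URI (login.live.com) differs from the original URI.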
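
The chrome URL in comment 8 carries its parameters in a double-encoded query string: each value is URL-encoded, and then the whole query string is encoded again. The sketch below reproduces that encoding for scripts that generate such URLs. `buildSpiderChromeUrl` is a hypothetical helper, not part of Spider. The parameter names come from the example command, and whether extra name/value pairs reach a user hook is not confirmed beyond the remark in comment 10.

```javascript
// Build the spider.xul chrome URL from an ordered list of [name, value] pairs.
// Values are encoded once, then the joined query string is encoded again,
// which yields the double-encoded form used in the example command.
function buildSpiderChromeUrl(params) {
  var pairs = [];
  for (var i = 0; i < params.length; i++) {
    pairs.push(params[i][0] + "=" + encodeURIComponent(params[i][1]));
  }
  return "chrome://spider/content/spider.xul?" +
    encodeURIComponent(pairs.join("&"));
}

// The parameters from the example command in comment 8.
var url = buildSpiderChromeUrl([
  ["url", "http://bclary.com/"],
  ["domain", "bclary.com"],
  ["depth", "2"],
  ["timeout", "120"],
  ["waittime", "5"],
  ["autostart", "on"],
  ["restrict", "on"]
]);
```

An array of pairs is used instead of an object so that the parameter order matches the original command.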
