YASU – Spider

This Spider update only changes the supported version of Firefox to 3.6, to match the updated version numbers for Shiretoko and trunk.

11 thoughts on “YASU – Spider”

  1. Hello,

    Is there a way to handle (that is, make use of) the popups and pop-unders that open while the plugin spiders through web pages?

    Is it correct to assume that the document available to the callback functions in the hook script won’t contain the elements in the popup windows?

    Thank you.

  2. Is there a way to make Spider enter a username and password when it either goes to a specific URL or detects the presence of username and password fields? Is there also a way to make Spider ignore a specific link, e.g. the Logout link on a page it’s crawling?

  3. Brad,

    Unfortunately not at the moment, but I consider the inability to be a bug. I’ll try to fix it in the next few days.


  4. When I try to spider http://www.hotmail.com, it does not pick up any links at all. What makes this site special? (It behaves the same way even after I log in to Hotmail.)

    Platform: Windows XP, FF 3.5.2, Spider


  5. This is the message on the console:

    Spider: Start: -url "http://www.hotmail.com" -depth 1 -timeout 120 -wait 5 -csserrors -httpresponses
    Spider: Begin loading http://www.hotmail.com
    Spider: Finish loading http://www.hotmail.com
    Spider: Current Url: http://www.hotmail.com, Referer: null, Depth: 0
    Spider: HTTP Response: originalURI: http://www.hotmail.com/ URI: http://www.hotmail.com/ referer: undefined status: 302 status text: found content-type: text/html; charset=utf-8 succeeded: false
    Spider: HTTP Response: originalURI: http://www.hotmail.com/ URI: http://login.live.com/login.srf?wa=wsignin1.0&rpsnv=11&ct=1252011915&rver=5.5.4177.0&wp=MBI&wreply=http:%2F%2Fmail.live.com%2Fdefault.aspx&lc=1033&id=64855&mkt=en-US referer: undefined status: 200 status text: ok content-type: text/html; charset=iso-8859-1 succeeded: true
    Spider: stopped… loaded 1 pages

  6. I think the problem is that the original URL is redirected, but Spider is only seeing the original response (the 302), which contains no links. I’ll have to handle redirected pages better. Sorry for the delay in responding; I somehow missed the comments in my feeds.
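
The redirect described above can be spotted in the console output from comment 5: the first response has status 302 and the second has a final URI that differs from the original one. Below is a minimal sketch in Python that picks such pairs out of a log. It assumes the log lines follow the layout of the sample lines in this thread; the regular expression and the name `find_redirects` are illustrative and are not part of Spider.

```python
import re

# Pattern inferred from the sample console lines quoted in this thread.
# The real Spider log format may differ; treat this as an assumption.
RESPONSE_RE = re.compile(
    r"HTTP Response: originalURI: (?P<original>\S+) URI: (?P<final>\S+) "
    r".*?status: (?P<status>\d{3})"
)


def find_redirects(log_lines):
    """Return (original, final) URI pairs where the final URI differs."""
    redirects = []
    for line in log_lines:
        match = RESPONSE_RE.search(line)
        if match is None:
            continue  # not an HTTP response line
        original = match.group("original")
        final = match.group("final")
        if original != final:
            redirects.append((original, final))
    return redirects


# The two response lines from comment 5, copied as they appear there.
SAMPLE_LOG = [
    "Spider: HTTP Response: originalURI: http://www.hotmail.com/ "
    "URI: http://www.hotmail.com/ referer: undefined status: 302 "
    "status text: found content-type: text/html; charset=utf-8 succeeded: false",
    "Spider: HTTP Response: originalURI: http://www.hotmail.com/ "
    "URI: http://login.live.com/login.srf?wa=wsignin1.0&rpsnv=11&ct=1252011915"
    "&rver=5.5.4177.0&wp=MBI&wreply=http:%2F%2Fmail.live.com%2Fdefault.aspx"
    "&lc=1033&id=64855&mkt=en-US referer: undefined status: 200 "
    "status text: ok content-type: text/html; charset=iso-8859-1 succeeded: true",
]

redirects = find_redirects(SAMPLE_LOG)
for original, final in redirects:
    print(f"{original} was redirected to {final}")
```

Running this over a log captured with -httpresponses shows which start URLs ended up somewhere else, which is a quick way to check whether a page with no links was affected by the same problem.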

  7. When I invoke Spider from the command line in chrome mode and pass parameters, the window resizes itself to a smaller dimension. If you don’t pass any parameters, this does not happen.

    For example, if you issue this command (the example from your help page):
    firefox -P test -chrome "chrome://spider/content/spider.xul?url%3Dhttp%253A%252F%252Fbclary.com%252F%26domain%3Dbclary.com%26depth%3D2%26timeout%3D120%26waittime%3D5%26autostart%3Don%26restrict%3Don"

    you can see the window resize itself. If you just call “firefox -P test -chrome chrome://spider/content/open.xul”, no resizing occurs.

    In automated environments this resizing causes problems. For example, some web pages with new rich media content don’t load properly when the window is smaller.

    Is there a remedy for this?
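
The query string in the command above is percent-encoded twice: each value is encoded first, and then the whole name=value string is encoded again. Below is a minimal sketch in Python of how such a URL could be generated. The function name `build_spider_chrome_url` is hypothetical, and the double-encoding rule is inferred from the single example command in this comment rather than from Spider's documentation.

```python
from urllib.parse import quote

# Base chrome URL taken from the example command in this comment.
SPIDER_XUL = "chrome://spider/content/spider.xul"


def build_spider_chrome_url(params):
    """Build a chrome URL with a doubly percent-encoded query string.

    Assumption: the encoding scheme is inferred from the one example
    command in this thread, not from Spider's documentation.
    """
    # First pass: encode each value and join the pairs as name=value&...
    query = "&".join(
        f"{name}={quote(str(value), safe='')}" for name, value in params.items()
    )
    # Second pass: encode the whole query string, including '=', '&' and '%'.
    return f"{SPIDER_XUL}?{quote(query, safe='')}"


# Parameter names and values copied from the example command.
url = build_spider_chrome_url({
    "url": "http://bclary.com/",
    "domain": "bclary.com",
    "depth": 2,
    "timeout": 120,
    "waittime": 5,
    "autostart": "on",
    "restrict": "on",
})
print(f'firefox -P test -chrome "{url}"')
```

Generating the URL this way avoids encoding mistakes when the start URL or other options change between automated runs.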


  8. Thanks. That fixes the resizing issue. The drawback of this method is that I could not find a way to pass any parameters to the user hook function.

    With the chrome application, I could pass parameters as a query string via open.xul.

  9. I’ll look into why the resizing is occurring and see what I can do. I’ll also think about adding a new command line argument to pass name/value pairs to the Spider.
