Mozilla Spider News

Bob Clary


The latest news on updates to the Mozilla Spider

Download Spider

May 1, 2017

Released Spider

December 30, 2015

Bug 1235936 - Sisyphus - Bughunter - release Spider

  • Update supported application versions to match add-on validator requirements.

  • Do not use a string as an argument to setTimeout, to meet an add-on validator requirement.

  • Handle a missing typeName for console strings in the console listener.

  • Do not force timeouts in the WebProgressListener after a Location change error if the request is null.

  • Sign spider-

October 8, 2015

Bug 1221124 - Sisyphus - Bughunter - Spider - Ensure Progress Listener does not dereference null mCurrentUrl.

  • Ensure Progress Listener does not dereference null mCurrentUrl

August 5, 2015

Bug 1190890 - Sisyphus - Replace HTTP observer with WebProgressListener to determine page loads.

  • Replaced HTTP observer with WebProgressListener

  • Removed FormPersist

  • Updated script compilation handling

  • Removed support for HTML and XUL versions.

  • Removed signed.applets.codebase_principal_support

  • Removed support for loading Spider from querystring
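To illustrate the switch from an HTTP observer to a WebProgressListener, here is a minimal sketch that treats a page load as finished when a state change carries both STATE_STOP and STATE_IS_NETWORK. The flag values are copied from nsIWebProgressListener as local constants so the sketch runs outside Gecko. createPageLoadListener and its onPageLoaded callback are hypothetical names, not Spider's code.

```javascript
// Local copies of nsIWebProgressListener state flags. Inside Gecko these
// values come from Components.interfaces.nsIWebProgressListener.
const STATE_START = 0x00000001;
const STATE_STOP = 0x00000010;
const STATE_IS_DOCUMENT = 0x00020000;
const STATE_IS_NETWORK = 0x00040000;

// Hypothetical listener: reports a page load once a state change carries
// both STATE_STOP and STATE_IS_NETWORK, i.e. all network activity for the
// load has stopped. STATE_START and STATE_IS_DOCUMENT are listed only so
// other notifications can be told apart.
function createPageLoadListener(onPageLoaded) {
  return {
    onStateChange(aWebProgress, aRequest, aStateFlags, aStatus) {
      const stopped = (aStateFlags & STATE_STOP) !== 0;
      const isNetwork = (aStateFlags & STATE_IS_NETWORK) !== 0;
      if (stopped && isNetwork) {
        onPageLoaded(aStatus);
      }
    },
    // The remaining nsIWebProgressListener methods are no-ops here.
    onProgressChange() {},
    onLocationChange() {},
    onStatusChange() {},
    onSecurityChange() {},
  };
}
```

Inside Gecko, such a listener would also need a QueryInterface implementation and would be attached through the browser's addProgressListener() method; those details are omitted here.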

August 29, 2012

  • Changed status line to display buildId rather than build date

  • Removed all checks for

May 22, 2012

  • Change Gecko compatibility to support Firefox 15

  • Enable optional spidering of file:// URLs using the -fileurls option.

November 9, 2011

  • Change Gecko compatibility to support Firefox 11

October 28, 2011

Spider was an update which I failed to push to the web site but which was available from Sorry that I wasn't more timely in announcing the update. Spider included significant changes:

  • Load handlers were changed from using useCapture true to false. I believe this is a better approach than the one I have used in the past, but if you see any issues with pages failing to fire the load event, please contact me.

  • CSpider no longer uses direct CallWrappers to detect page time outs. Instead, time out detection is now handled by the CPageLoader.

  • The CPageLoader's initialization has been changed to include explicit references to:

    • the content element used to load the web page;

    • the callback function used to handle the load event;

    • the callback function used to handle page time outs;

    • the time out interval;

    • and the HTTP responses observer.
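A hypothetical sketch of a loader that receives those collaborators explicitly and owns time out detection itself. The property names, the src-based load and the handleLoad() entry point are assumptions for illustration, not Spider's actual CPageLoader.

```javascript
// Hypothetical page loader; every collaborator is passed in explicitly.
function CPageLoader(aContent, aOnLoad, aOnTimeout, aTimeoutInterval, aResponseObserver) {
  this.mContent = aContent;                   // element used to load the page
  this.mOnLoad = aOnLoad;                     // called with the load event
  this.mOnTimeout = aOnTimeout;               // called with the url that timed out
  this.mTimeoutInterval = aTimeoutInterval;   // milliseconds
  this.mResponseObserver = aResponseObserver; // per-loader, not a singleton
  this.mTimer = null;
  this.mUrl = null;
}

CPageLoader.prototype.load = function (aUrl) {
  this.cancelTimer();
  this.mUrl = aUrl;
  const self = this;
  // The loader, not the spider, detects time outs.
  this.mTimer = setTimeout(function () {
    self.mTimer = null;
    self.mOnTimeout(self.mUrl);
  }, this.mTimeoutInterval);
  // Assumption: the content element is iframe-like and loads via src.
  this.mContent.src = aUrl;
};

// Called by whatever load listener is attached to the content element.
CPageLoader.prototype.handleLoad = function (aEvent) {
  if (this.mTimer === null) {
    return; // not loading, or already timed out: ignore a late load event
  }
  this.cancelTimer();
  this.mOnLoad(aEvent);
};

CPageLoader.prototype.cancelTimer = function () {
  if (this.mTimer !== null) {
    clearTimeout(this.mTimer);
    this.mTimer = null;
  }
};
```

Because the response observer is passed in rather than looked up as a singleton, a userhook could construct a second loader with its own observer.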

The change to remove the singleton nature of the CHTTPResponseObserver was made to allow additional CHTTPResponseObservers to be used in userhook functions. I found this very helpful when investigating how a two-digit rv: value in Firefox 10 might affect web sites' user agent detection of Firefox.
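As an illustration of the rv: problem, consider sniffing code that assumes a single-digit major version. The two functions below are hypothetical examples, not code taken from any particular site: the first fails to recognize a Firefox 10 style user agent string, while the second accepts any number of digits.

```javascript
// Fragile: expects "rv:" followed by exactly one digit, a dot and a digit,
// so "rv:10.0" does not match at all.
function naiveGeckoVersion(ua) {
  const match = /rv:(\d\.\d)/.exec(ua);
  return match ? parseFloat(match[1]) : null;
}

// More tolerant: accepts any number of digits in the major and minor parts.
function geckoMajorVersion(ua) {
  const match = /rv:(\d+)\.(\d+)/.exec(ua);
  return match ? parseInt(match[1], 10) : null;
}
```

A site that treated a null result from the naive version as "not Gecko" could serve Firefox 10 different content than Firefox 9.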

August 24, 2011

Spider includes changes to:

  • Change Gecko compatibility to support Firefox 9

  • Fix Makefiles to properly rebuild spider.jar

March 28, 2009

Spider relaxes the checking of access to UniversalXPConnect privileges using which allows the HTML version of Spider to operate correctly in Firefox and other Gecko browsers. However, there remain some cross-browser compatibility issues which prevent Internet Explorer, Opera and Safari from running the HTML version of Spider. Future updates may address cross-browser compatibility.

March 17, 2009

Spider bumps Firefox support to version 3.6

December 18, 2008

Spider adds .torrent to the list of excluded file extensions.
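A minimal sketch of extension-based exclusion, assuming the check is applied to the URL's path rather than to the whole string. Matching only the path may help avoid the kind of false positive the August 3, 2004 entry mentions for .com. The list holds only extensions named in this log (.torrent and .xls); isExcludedUrl is a hypothetical name.

```javascript
// Only extensions mentioned in this change log; a real list is longer.
const EXCLUDED_EXTENSIONS = [".torrent", ".xls"];

// Hypothetical check: true if the link should not be loaded.
function isExcludedUrl(href) {
  let pathname;
  try {
    // Look at the path only, so host names such as example.com or
    // querystring values never match an extension.
    pathname = new URL(href).pathname.toLowerCase();
  } catch (e) {
    return true; // design choice: skip links that cannot be parsed
  }
  return EXCLUDED_EXTENSIONS.some((ext) => pathname.endsWith(ext));
}
```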

October 21, 2008

Spider added Fennec (Firefox mobile development builds) to compatible applications for Spider.

June 29, 2008

Spider added Minefield (Firefox 3.1 development builds) to compatible applications for Spider.

March 24, 2008

Spider removes the use of mozIJSSubScriptLoader.

February 8, 2008

Spider fixed a bug introduced in regarding the HTTP responses. Spider makes a change to how the page timeout handling is performed. If there is a user defined page timeout handler, it is now called prior to the load being cancelled. This allows the timeout handler to have access to the DOM of the page in the event the timeout was due to a userhook script timing out rather than the actual page loading timing out.
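The new ordering can be sketched as follows, assuming a loader object that exposes the page's document, a cancel() method and a log() method. handlePageTimeout and these loader members are hypothetical; the point is only that the user handler runs while the DOM is still available, and that cancellation happens even if the handler throws.

```javascript
// Hypothetical timeout path: user handler first, then cancel the load.
function handlePageTimeout(aLoader, aUserTimeoutHook) {
  if (typeof aUserTimeoutHook === "function") {
    try {
      // The page's DOM is still reachable here because nothing has been
      // cancelled yet.
      aUserTimeoutHook(aLoader.document);
    } catch (ex) {
      // A failing hook must not prevent the load from being cancelled.
      aLoader.log("user timeout hook: " + ex);
    }
  }
  aLoader.cancel();
}
```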

January 30, 2008

Spider no longer automatically dumps HTTP responses from the content it retrieves. The new command line argument


can be used to enable this feature if desired.

September 10, 2007

Spider removes the auto update functionality due to bug 378216. You will need to update Spider manually until such time as I have a secure method for automatic updates.

July 31, 2007

Spider reverts the changes made to script-loader.js on the Gecko 1.8 branch, where scope.eval(...) will still be used. mozIJSSubScriptLoader will continue to be used on the trunk, where eval is dying the death of a thousand cuts.

June 11, 2007

Spider replaces the use of scope.eval(...) in script-loader.js with mozIJSSubScriptLoader in order to work around a change on the Gecko trunk where eval is being killed by the death of a thousand cuts.

June 6, 2007

Spider is the same as Spider The version change is to force an update.

April 28, 2007

Spider adds uri as a synonym for the url command line parameter.

March 9, 2007

Spider includes a fix for determining and requesting enhanced privileges in the HTML and remote XUL versions.

November 27, 2006

Spider includes the following changes:

  • Added chrome: URLs to the allowed protocols. This allows the loading and scripting of the browser chrome via userhook functions.

  • User Hook error messages now include the name of the userhook function where the error occurred.

  • Spider-generated messages such as “Begin loading...” now include the prefix Spider: to indicate the source of the message.

November 5, 2006

Spider adds an nsIHttpChannel response observer to CPageLoader.prototype.load to capture the response codes of the HTTP requests made when loading the next page. The responses are dumped to stdout and are available as properties of the gSpider.mCurrentUrl object. See CSpider's help for more information on mCurrentUrl.

The CUrl has been modified to include additional information about the HTTP responses to the requests for pages. See CSpider for more details.
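A sketch of a response observer that records status codes, assuming the http-on-examine-response observer topic and the URI, responseStatus and responseStatusText attributes of nsIHttpChannel. In Gecko, observe() receives a subject that must first be queried for nsIHttpChannel, and the object would be registered with the observer service; here it is handed a channel-like object directly so the sketch runs standalone. CResponseRecorder is a hypothetical name.

```javascript
// Hypothetical recorder of HTTP responses seen while loading a page.
function CResponseRecorder() {
  this.responses = [];
}

// aChannel stands in for a subject already queried for nsIHttpChannel.
CResponseRecorder.prototype.observe = function (aChannel, aTopic, aData) {
  if (aTopic !== "http-on-examine-response") {
    return; // ignore requests and other topics
  }
  this.responses.push({
    url: aChannel.URI.spec,
    status: aChannel.responseStatus,
    statusText: aChannel.responseStatusText,
  });
};
```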

October 19, 2006

Spider removes a debugging alert which was accidentally left in the code. Sorry.

October 15, 2006

Spider adds support for command line parameters and fixes the update configuration so that you will be able to use Firefox's add-ons update facility to update Spider in later versions.

September 24, 2006

Spider adds additional multimedia file types to be excluded from downloads to prevent the download dialogs from blocking execution.

August 28, 2006

Spider modifies the dialog closing logic in dialog-closer.js to work around issues with notifications, race conditions and multiple simultaneous dialogs being visible; fixes the broken link to Help; and makes file:// a supported scheme.

August 1, 2006

Spider modifies collectGarbage() in utils.js to fall back on a method by Igor Bukanov to programmatically force a gc without requiring venkman or extra privileges.

July 31, 2006

Spider includes the ability to automatically close alerts, prompts, confirms and other common dialogs through the use of calls to registerDialogCloser() and unregisterDialogCloser() in userhook functions.

For an example, see closedialoghooks.js where registerDialogCloser() is called in the userhook userOnBeforePage() and unregisterDialogCloser() is called in the userhook userOnAfterPage().
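A userhook file following that pattern might look like the sketch below. Inside Spider, registerDialogCloser() and unregisterDialogCloser() are provided by Spider itself; the counting stand-ins here are hypothetical and exist only so the sketch runs on its own. They must be removed before use, since they would shadow the real functions.

```javascript
// Stand-ins ONLY for running this sketch outside Spider; delete them in a
// real userhook file so Spider's own functions are used.
let dialogCloserRegistrations = 0;
function registerDialogCloser() { dialogCloserRegistrations++; }
function unregisterDialogCloser() { dialogCloserRegistrations--; }

// Start closing alerts, prompts and confirms before each page loads...
function userOnBeforePage() {
  registerDialogCloser();
}

// ...and stop once the page has been handled.
function userOnAfterPage() {
  unregisterDialogCloser();
}
```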

June 28, 2006

Spider includes minor changes to remove the global gXMLHttpRequest and instead use the new loadScript() function to load the user hook functions.

June 4, 2006

Spider tweaks loadScript(aScriptUrl[, aScope]) to allow javascript url injection for cross-domain pages.

June 1, 2006

Spider introduces a new function loadScript(aScriptUrl[, aScope]) available for use by user hook functions to load external script. See script-loader.js.

May 17, 2006

Spider makes a minor change to how it requests enhanced privileges when it is run using the chrome URL chrome://spider/content rather than as a chrome application.

This is a very minor update and you don't need to update unless you use Spider via the chrome url without running it from the command line as a chrome application.

May 8, 2006

Spider has been updated to version to fix an issue when load events do not fire on #document nodes in remote XUL applications.

                 diff -ru /cygdrive/z/cygwin/home/bclary/cvs-projects/ ./chrome/content/spider/spider.js
                 --- /cygdrive/z/cygwin/home/bclary/cvs-projects/	2006-05-04 02:05:25.000000000 -0400
                 +++ ./chrome/content/spider/spider.js	2006-05-08 19:03:58.182907200 -0400
                 @@ -38,11 +38,19 @@
                 function loadHandler(evt)
                 -  dlog('loadHandler: bubbles=' + evt.bubbles + ', currentTarget=' + evt.currentTarget + ', eventPhase=' + evt.eventPhase + ', target=' + + ', originalTarget=' + evt.originalTarget + ', type=' + evt.type);
                 -  if (Event.BUBBLING_PHASE == evt.eventPhase)
                 +  try
                 -//    evt.preventDefault();
                 -//    evt.stopPropagation();
                 +    dlog('loadHandler: timeStamp=' + evt.timeStamp + 
                 +         ', bubbles=' + evt.bubbles + 
                 +         ', currentTarget=' + evt.currentTarget + 
                 +         ', eventPhase=' + evt.eventPhase + 
                 +         ', target=' + + 
                 +         ', originalTarget=' + evt.originalTarget + 
                 +         ', type=' + evt.type);
                 +  }
                 +  catch(ex)
                 +  {
                 +    dlog('loadHandler: ' + ex + '');
                 @@ -66,7 +74,8 @@
                 function init(evt, querystring)
                 -  dlog('init: bubbles=' + evt.bubbles + 
                 +  dlog('init: timeStamp=' + evt.timeStamp + 
                 +       ', bubbles=' + evt.bubbles + 
                 ', currentTarget=' + evt.currentTarget + 
                 ', eventPhase=' + evt.eventPhase + 
                 ', target=' + + 
                 @@ -715,13 +724,25 @@
                 this.content = document.getElementById('contentSpider');
                 this.onload = 
                 function(evt) {
                 -      dlog('CPageLoader.onload: phase ' + evt.eventPhase + ', target ' + + ', originalTarget ' + evt.originalTarget);
                 +      dlog('CPageLoader.onload: phase ' + evt.eventPhase + 
                 +           ', target ' + + 
                 +           ', originalTarget ' + evt.originalTarget);
                 // prevent duplicate load events in XUL
                 -      if ( != '#document')
                 +      try
                 +      {
                 +        if ( != '#document' &&
                 + != 'xul:browser')
                 +        {
                 +          dlog('CPageLoader.onload: ignore non document target ' + 
                 +  ;
                 +          return;
                 +        }
                 +      }
                 +      catch(ex)
                 -        dlog('CPageLoader.onload: ignore non document targets');
                 -        return;
                 +        dlog('CpageLoader.onload: ' + ex);
                 +//        return;
                 if (gPageLoader.onload)
                 Only in ./chrome/content/spider: spider.js~
                 diff -ru /cygdrive/z/cygwin/home/bclary/cvs-projects/ ./chrome/content/spider/version.js
                 --- /cygdrive/z/cygwin/home/bclary/cvs-projects/	2006-05-03 19:19:07.000000000 -0400
                 +++ ./chrome/content/spider/version.js	2006-05-08 19:09:50.679772800 -0400
                 @@ -5,8 +5,8 @@
                 div with id="version" to output the version string.  
                 -var _version = '';
                 -var _date = 'May 3, 2006'
                 +var _version = '';
                 +var _date = 'May 8, 2006'
                 function sayVersion()
                 Only in ./chrome/content/spider: version.js~
                 diff -ru /cygdrive/z/cygwin/home/bclary/cvs-projects/ ./install.rdf
                 --- /cygdrive/z/cygwin/home/bclary/cvs-projects/	2006-05-03 19:19:24.000000000 -0400
                 +++ ./install.rdf	2006-05-08 19:10:16.817356800 -0400
                 @@ -6,7 +6,7 @@
                 -		<em:version></em:version>
                 +		<em:version></em:version>
                 <em:description>A Mozilla-based Web Spider</em:description>
                 <em:creator>Bob Clary</em:creator>

May 3, 2006

Spider has been updated to version to fix an issue in trunk builds where Spider was unable to locate links for depths greater than 0 when run as a chrome XUL application.

             --- /cygdrive/z/cygwin/home/bclary/cvs-projects/	2005-09-05 21:23:42.000000000 -0400
             +++ spider/chrome/content/spider/CSpider.js	2006-05-03 22:21:00.224795200 -0400
             @@ -525,6 +525,8 @@
             // since it appears the onload event fires multiple times
             // for a xul:iframe but not for an html:iframe. 
             // go figure.
             +    // See
             +    // Fixed on trunk
             dlog('CSpider.onLoadPage called with state ' + this.mState +
             '. links not added to page stack');
             @@ -532,11 +534,21 @@
             var i;
             var links  = this.mDocument.links;
             -  if (links)
             +  if (!links)
             +  {
             +    dlog('CSpider_onLoadPage: no document.links found in document');
             +  }
             +  else
             var length = links.length;
             var href;
             +    if (length == 0)
             +    {
             +      dlog('CSpider_onLoadPage: document.links.length == 0');
             +    }
             for (i = 0; i < length; ++i)
             href = links[i].href;
             diff -ru /cygdrive/z/cygwin/home/bclary/cvs-projects/ spider/chrome/content/spider/spider.js
             --- /cygdrive/z/cygwin/home/bclary/cvs-projects/	2006-04-21 02:16:59.000000000 -0400
             +++ spider/chrome/content/spider/spider.js	2006-05-04 02:05:25.156416000 -0400
             @@ -46,7 +46,7 @@
             -this.addEventListener('load', loadHandler, false);
             +this.addEventListener('load', loadHandler, true);
             var gOutput;
             var gSpider;
             @@ -66,13 +66,16 @@
             function init(evt, querystring)
             -  dlog('init: bubbles=' + evt.bubbles + ', currentTarget=' + evt.currentTarget + ', eventPhase=' + evt.eventPhase + ', target=' + + ', originalTarget=' + ', type=' + evt.type);
             -  // Thunderbird fires init when phase is BUBBLING and AT_TARGET instead of 
             -  // just AT_TARGET ?
             +  dlog('init: bubbles=' + evt.bubbles + 
             +       ', currentTarget=' + evt.currentTarget + 
             +       ', eventPhase=' + evt.eventPhase + 
             +       ', target=' + + 
             +       ', originalTarget=' + 
             +       ', type=' + evt.type);
             if (Event.AT_TARGET != evt.eventPhase)
             +    // work around
             @@ -712,8 +715,14 @@
             this.content = document.getElementById('contentSpider');
             this.onload = 
             function(evt) {
             +      dlog('CPageLoader.onload: phase ' + evt.eventPhase + ', target ' + + ', originalTarget ' + evt.originalTarget);
             // prevent duplicate load events in XUL
             +      if ( != '#document')
             +      {
             +        dlog('CPageLoader.onload: ignore non document targets');
             +        return;
             +      }
             if (gPageLoader.onload)
             @@ -728,6 +737,11 @@
             dlog('CPageLoader_loadPage: url: ' + url + ', referer: ' + referer);
             +  if (!referer)
             +  {
             +    referer = '';
             +  }
             var nodeName = this.content.nodeName.toLowerCase();
             this.content.addEventListener('load', this.onload, true);
             diff -ru /cygdrive/z/cygwin/home/bclary/cvs-projects/ spider/chrome/content/spider/version.js
             --- /cygdrive/z/cygwin/home/bclary/cvs-projects/	2006-04-20 18:39:24.000000000 -0400
             +++ spider/chrome/content/spider/version.js	2006-05-03 19:19:07.793489600 -0400
             @@ -5,8 +5,8 @@
             div with id="version" to output the version string.  
             -var _version = '';
             -var _date = 'April 20, 2006'
             +var _version = '';
             +var _date = 'May 3, 2006'
             function sayVersion()
             diff -ru /cygdrive/z/cygwin/home/bclary/cvs-projects/ spider/install.rdf
             --- /cygdrive/z/cygwin/home/bclary/cvs-projects/	2006-04-20 18:39:37.000000000 -0400
             +++ spider/install.rdf	2006-05-03 19:19:24.227120000 -0400
             @@ -6,7 +6,7 @@
             -		<em:version></em:version>
             +		<em:version></em:version>
             <em:description>A Mozilla-based Web Spider</em:description>
             <em:creator>Bob Clary</em:creator>

April 21, 2006

Spider has been updated to version to improve behavior when run in Mozilla Thunderbird by preventing the init onload handler from firing except when the load event is AT_TARGET.

April 20, 2006

Spider has been updated to include Thunderbird as a supported application.

April 15, 2006

Spider has been updated to run in any version of Firefox from 1.0.x through 3.0 and to use updateURL. The gap in version numbers is due to several internal-only releases of Spider while this site was resurrected.

December 14, 2005

Spider contains an improved version of goQuitApplication which properly shuts down the browser when running on Mac OS X.

September 5, 2005

Spider now uses setTimeout to call goQuitApplication when autoquit is specified. This prevents goQuitApplication from terminating the spider before the necessary onStop handlers are called.
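A minimal sketch of the deferred quit, with a stand-in for goQuitApplication() so it runs outside Spider. finishRun and its callback are hypothetical; the point is that a quit scheduled with setTimeout cannot fire until the current call stack, including any onStop handlers it invokes, has finished.

```javascript
// Stand-in for Spider's goQuitApplication() so the sketch runs on its own.
const events = [];
function goQuitApplication() {
  events.push("quit");
}

// Hypothetical end-of-run routine. The quit is scheduled first, but because
// it goes through setTimeout it cannot run until this function and its
// callers have returned, so aOnStop() still gets to run.
function finishRun(aAutoQuit, aOnStop) {
  if (aAutoQuit) {
    setTimeout(goQuitApplication, 0);
  }
  aOnStop();
}
```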

August 29, 2005

Spider contains the following changes:

  • Turned on xpcnativewrappers=yes

    XPCNativeWrappers allow safe access to content from chrome. xpcnativewrappers=yes has been added to the chrome registry for Spider to improve security. User hook functions which access content must use the appropriate techniques.

  • Improved error handling for loading User hook functions

August 3, 2005

Spider contains the following changes:

  • The Console Listener has been updated to attempt to prevent XPConnect errors during shutdown. In addition, spider.js now unregisters the console listener before attempting to perform automatic shutdowns.

Spider contains the following changes:

  • The url form variable is now considered to be double encoded when starting Spider from the command line. The Generate Spider URL button also double encodes the url form variable. This allows the url parameter to contain a URL with a querystring.

  • The Error Output checkboxes are now read whenever a new page is loaded, allowing you to turn error reporting on and off as the Spider runs.

  • Firefox's nsScriptError.cpp hardcoded “JavaScript Error” and “JavaScript Warning” strings are now removed from messages and the error's type and flags are used to determine the error message.

  • Broken links were fixed in the help documentation and external links were changed to open new windows.
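Why double encoding helps with the url form variable can be shown with a sketch that assumes the query string is decoded once as a whole before it is split into name=value pairs. Under that assumption a singly encoded url parameter loses everything after its first &, while a doubly encoded one survives. encodeSpiderUrlParam and readUrlParam are hypothetical helpers, not Spider's code.

```javascript
// Hypothetical helper: encode the target URL twice so that it survives one
// extra decoding pass intact.
function encodeSpiderUrlParam(aUrl) {
  return encodeURIComponent(encodeURIComponent(aUrl));
}

// Hypothetical reader. Assumption: the whole query string is decoded once
// before it is split into name=value pairs, so the url value needs a
// second decode of its own.
function readUrlParam(aQuery) {
  const onceDecoded = decodeURIComponent(aQuery.replace(/^\?/, ""));
  for (const pair of onceDecoded.split("&")) {
    const eq = pair.indexOf("=");
    if (eq !== -1 && pair.slice(0, eq) === "url") {
      return decodeURIComponent(pair.slice(eq + 1));
    }
  }
  return null;
}
```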

July 20, 2005

Removed xpcnativewrappers=yes from the chrome manifest until safe vs. unsafe content access from chrome is resolved.

July 16, 2005

Spider is now a Firefox extension which can be installed and uninstalled using the Extension Manager. It can still be installed in Mozilla Application Suite as well. To note this change, the version has been bumped to 0.0.1.x. Future plans include adding update capability as well as integration into Firefox, Mozilla Application Suite and SeaMonkey.

Uninstalling Spider  Firefox only: You must uninstall any version prior to before installing version or later.

The easy way.  Uninstall Firefox, then delete spider.jar from the chrome directory in your Firefox installation directory.

The hard way. 

  • Exit Firefox.

  • Find your installation directory of Firefox.

  • Delete spider.jar and chrome.rdf from your chrome directory.

  • Edit installed-chrome.txt to remove references to spider.

The directory layout of has changed from:



July 12, 2005

  • The gConsoleListener has been modified so that when a gConsoleListener.onConsoleMessage handler has been defined by a userhook function, the onConsoleMessage function is responsible for outputting the message.

  • Meta tags which indicate authoring tools are now processed and reported by the example user hook script “combined script with no simulated mouseover events”. This example illustrates how problematic authoring tools can be discovered while scanning web sites.

July 8, 2005

Released Spider

  1. The Save button has been replaced by a Generate Spider URL button which will open a new window containing a URL which can be used to load Spider with the specified parameter values. Unlike in earlier versions, the URL can be easily copied and pasted into your favorite editor.

?, 2005

Released Spider

  • Added .xls to the list of blocked file extensions.

June 28, 2005

Released Spider

  • Changed CSpider to load pages in a FIFO manner rather than a LIFO manner. This will allow Spider to load pages in sorted order rather than reverse sorted order.
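The difference between the two orders comes down to which end of the pending list is taken next. In this minimal sketch (crawlOrder is a hypothetical name), taking from the end with pop() reverses the discovered order, while taking from the front with shift() preserves it.

```javascript
// Hypothetical crawl loop over links discovered in document order.
function crawlOrder(aLinks, aMode) {
  const pending = aLinks.slice(); // copy so the caller's array is untouched
  const visited = [];
  while (pending.length > 0) {
    // LIFO takes the most recently added link; FIFO takes the oldest.
    visited.push(aMode === "lifo" ? pending.pop() : pending.shift());
  }
  return visited;
}
```

Array.prototype.shift() is linear in the array length; for very large queues an index pointer into the array would avoid that cost.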

June 19, 2005

Released Spider

  • Reverted CSpider.onLoadPage to not bail out when it is called while CSpider is in a paused state. This is necessary for the moment to allow Spider to drive other web applications via user hooks.

  • Introduced a new entry point open.xul which works around the unresizable window created in recent trunk Firefox builds when Spider is run as a chrome application.

  • Introduced the use of xul:browser for page loading in the XUL based version of Spider. This allows Spider to send proper HTTP referers when loading pages from a site.

  • Updated Spider help.

June 17, 2005

Spider - I broke CSpider.onLoadPage in This should fix it. Sorry. :-(

Spider - stop checking for hook signals if Spider has stopped, make paused a valid entry state for CSpider.onLoadPage

May 18, 2005

Spider - Added the ability to automatically quit the application when it has finished its run. This allows the use of the spider in command line test environments which can kick off the spider, run tests, then exit.

May 16, 2005

Spider - Minor updates to fix potential JavaScript Errors when attempting to force page layout.

May 11, 2005

Replaced CFormData with FormPersist.js. Now correctly encodes and decodes saved urls for use in chrome contexts. Updated links to DevEdge articles to point to new mirror on

August 27, 2004

Added script language version detection to combinedhooks.js to help search for JavaScript Version Incompatibilities.

August 3, 2004

CSpider.js - removed .com as a bad extension since it blocked paths of the form

Updated spider-help.js and user.js with information on useful preferences.

spider.js - tweaked network error reporting.

August 1, 2004

Added Network Error messages if XUL Error pages are enabled.

Modified user hook examples to improve naming conventions and to remove race conditions in mouse event hooks.

July 29, 2004

Improvement to eventhooks.js and combinedhooks.js to account for some untrappable exceptions which could cause some events to be overcounted thus pausing the spider. These are external scripts and not part of the Spider code.

July 28, 2004 - Version

Fixed bug where CSpider constructor always set mRespectRobotRules to true.

Moved bad extension checking and robot exclusion testing to loadPage to reduce the apparent hang caused by synchronous XMLHttpRequests to retrieve robots.txt for pages which contain links to many different domains.
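A sketch of robots.txt rules cached per origin, so each site's robots.txt is requested at most once no matter how many of its links a page contains. The fetch function is an assumed synchronous callback returning the file's text or null, and the parser is deliberately minimal: it honors only Disallow prefixes in groups addressed to User-agent: * and ignores Allow lines and wildcards. CRobotsCache is a hypothetical name.

```javascript
// Hypothetical per-origin cache of robots.txt rules.
// aFetchRobotsTxt(origin) is assumed to return the file's text or null.
function CRobotsCache(aFetchRobotsTxt) {
  this.mFetch = aFetchRobotsTxt;
  this.mRules = new Map(); // origin -> array of disallowed path prefixes
}

// Minimal parse: collect Disallow prefixes from groups for "User-agent: *".
CRobotsCache.prototype.parse = function (aText) {
  const disallowed = [];
  let applies = false;
  let inAgentLines = false;
  for (const rawLine of (aText || "").split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, "").trim();
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      // A User-agent line after rule lines starts a new group.
      if (!inAgentLines) {
        applies = false;
        inAgentLines = true;
      }
      if (value === "*") applies = true;
    } else {
      inAgentLines = false;
      if (field === "disallow" && applies && value !== "") {
        disallowed.push(value);
      }
    }
  }
  return disallowed;
};

// Fetches and parses robots.txt only the first time an origin is seen.
CRobotsCache.prototype.isAllowed = function (aHref) {
  const url = new URL(aHref);
  if (!this.mRules.has(url.origin)) {
    this.mRules.set(url.origin, this.parse(this.mFetch(url.origin)));
  }
  const path = url.pathname + url.search;
  return !this.mRules.get(url.origin).some((prefix) => path.startsWith(prefix));
};
```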

July 26, 2004 - Version

Added option to respect robots.txt rules.

Added ability to turn on internal debugging messages.

Moved dlog debugging logger to utils.js, enabled use of cdump if available in order to send debugging messages to console as well as STDOUT.

Fixed a problem in tests/eventhooks.js and tests/combinedhooks.js that caused pendingEvents not to be cleared if dispatchEvent threw an exception.
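A sketch of that kind of bookkeeping fix, assuming a global pendingEvents counter that is incremented before each dispatch and normally decremented by the event's handler. dispatchCounted is a hypothetical name; tests/eventhooks.js and tests/combinedhooks.js are external scripts whose code is not reproduced here.

```javascript
// Hypothetical counter of events dispatched but not yet seen by a handler.
// A handler would normally decrement it when the event arrives.
let pendingEvents = 0;

// Count the event as pending, but roll the count back if dispatchEvent
// throws, since no handler will ever decrement it in that case and the
// spider would keep waiting.
function dispatchCounted(aTarget, aEvent) {
  pendingEvents++;
  try {
    aTarget.dispatchEvent(aEvent);
  } catch (ex) {
    pendingEvents--;
    return false;
  }
  return true;
}
```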

July 21, 2004 - Version

Improved the placement of messages to better signal event handler invocation.

Added ability to capture console messages in a user hook function.

Changed dlog debugging logger in CSpider.js to allow gDebug to be changed outside of CSpider.js.

July 19, 2004 - Version

Introduced "Wait for User Hook" and global variable gPageCompleted to allow the user hook function userOnAfterPage() to control page transitions.

Fixed a race condition between gSpider.mOnPause and user hooks that contain alerts.

July 16, 2004 - Version

Changed logging to use new function cdump() which sends messages to the JavaScript Console as well as STDOUT.
