Mozilla Firefox based Spiders
Web Spider applications have a multitude of possible uses, ranging from search indexing and web site quality assurance testing to testing browser implementations. This article introduces an improved Mozilla-based Web Spider application framework which can be adapted for many different uses.
- What's New
- Install Spider
- CSpider.js - A New and Improved Spider Framework
- Spider - A Mozilla Firefox Application based upon CSpider.js
- Real World Applications of Spider
- Links and Stuff
Spider eliminates the limitations of the earlier version by separating
and implementing a basic framework for implementing spider applications
in Mozilla for HTML, Remote XUL and Chrome applications.
The Help page contains instructions on how to configure
your Mozilla browser to give the Spider application the permissions
it needs to access the internal APIs it relies on. In order
to run Spider as either an HTML application or Remote XUL application,
you need to enable
signed.applets.codebase_principal_support to allow extended
privileges, and browser.dom.window.dump.enabled to allow
dump() output to STDOUT.
Spider can be run as a Chrome Application. Spider is limited to operating in Mozilla-based browsers due to the use of internal Mozilla APIs to enable cross-domain access and the ability to capture error logging. You can review all of the source code by browsing the http://hg.mozilla.org/automation/sisyphus/file/tip/spider/ directory.
Note that when run as a Chrome application, Spider can prevent frame-busting code from replacing the top level browser window; however, when run as either an HTML or Remote XUL application, Spider can be stopped by frame-busting code.
As a Chrome Application
To install Spider into Mozilla as a Chrome application,
follow the link to spider-0.1.0.3-sm+tb+fx+an+fn.xpi, download it to your
local machine, then open the file in your copy of Mozilla. This
will install Spider as a Chrome application in Mozilla. You will need to
close and restart Mozilla before using Spider. You can invoke
Spider as a chrome application by entering its chrome URL
in Mozilla's URL bar or from the command line, as described in the Help.
Extending Spider at run-time
Spider uses XMLHttpRequest to optionally load external scripts which can provide additional features without having to modify the source code. This is done by implementing "user hook" functions in an external script. See Spider Help for more details on implementing external "user hooks".
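To make the idea of an external hook script concrete, here is a minimal sketch of what such a file might look like. The hook names (userOnStart, userOnAfterPage, userOnStop) and the pageUrl argument are assumptions chosen for illustration; the names and calling conventions Spider really uses are described in Spider Help.

```javascript
// Hypothetical user hook script. The hook names and the pageUrl argument
// are assumptions for illustration; check Spider Help for the real names.
var visitedUrls = [];

// Assumed to be called once before the first page is loaded.
function userOnStart() {
  visitedUrls = [];
}

// Assumed to be called after each page has finished loading.
function userOnAfterPage(pageUrl) {
  visitedUrls.push(pageUrl);
}

// Assumed to be called once when the spider has finished.
function userOnStop() {
  return 'Visited ' + visitedUrls.length + ' page(s)';
}
```

Because all state lives in the external script, a file like this can be swapped for another without touching Spider's own source.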
This is all nice and stuff, but what use is it?
Web Site Quality Assurance
As I discussed in Web Site Quality Assurance Testing using Mozilla, Mozilla can be a powerful QA tool, especially when combined with a spidering application such as Spider.
The test results were very interesting. :-)
Relatively few of these popular sites use Standards mode; most instead trigger Quirks mode and its Navigator 4.x compatible layout bugs. The proper use of DOCTYPE switching could definitely be improved on these sites.
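One way a hook could tell the two layout modes apart is the standard document.compatMode property, which reports 'BackCompat' for Quirks mode and 'CSS1Compat' otherwise (almost-standards mode also reports 'CSS1Compat'). The sketch below is not necessarily how the hooks used for these tests work; the function names are hypothetical and it accepts any object with a compatMode property.

```javascript
// Classify a document by its layout mode. document.compatMode is
// 'BackCompat' in Quirks mode and 'CSS1Compat' otherwise.
function layoutMode(doc) {
  return doc.compatMode === 'BackCompat' ? 'Quirks' : 'Standards';
}

// Tally layout modes over a list of document-like objects.
function tallyLayoutModes(docs) {
  var counts = { Quirks: 0, Standards: 0 };
  for (var i = 0; i < docs.length; i++) {
    counts[layoutMode(docs[i])]++;
  }
  return counts;
}
```

In a page-level hook, calling layoutMode(document) on each loaded page and collecting the results would give the kind of Quirks versus Standards summary discussed above.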
should be a concern for any site when Internet Explorer for Windows XP SP2
becomes available in the fall. Others included blocked access to the
history object or outright errors.
The large number of CSS Errors is an indicator that the majority of web developers have difficulty coding valid CSS. If you see layout differences between Mozilla and MSIE, the first thing to do is to invoke Standards mode with the appropriate DOCTYPE and fix all possible CSS Errors.
To illustrate using other web services from inside of the "user hook"
functions, I created the validatorhooks.js user hooks file,
which invokes the W3C Markup Validation
Service for each page loaded and uses the number of
elements returned in the XML output format to determine if a page
is valid.
Again the results were very interesting. Of the 70 pages (not counting the test-sites page), 63 were definitely invalid, 6 did not return a result from the service and only 1 validated!
Plugins / Rich Media
I used objecthooks.js as an example of
using user hook functions in detecting and analysing the use of
SCRIPT FOR EVENT tags
in web pages. The results
provide another picture of how Spider combined with custom "user hooks"
can be used to provide the answer to almost any question about a web site.
Simulating Mouseover and Mouseout Events
occur is useful; however, it is limited in terms of the code paths which are
actually executed. eventhooks.js
improves the coverage of the code being tested by sending
mouseover and mouseout events to each element on
a page. This improves the overall testing of a web site by exercising any
roll-over or DHTML menu code on the page.
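The technique of sending roll-over events to every element can be sketched as follows. This is not the code from eventhooks.js; the function names are hypothetical, and the event factory is passed in as a parameter so the loop itself does not depend on a particular browser API.

```javascript
// Dispatch a mouseover followed by a mouseout to every element.
// `elements` is any array-like of objects with a dispatchEvent method;
// `createEvent` is a function that builds an event for a given type.
function sendRolloverEvents(elements, createEvent) {
  // Copy first, so handlers that modify the DOM cannot change the list
  // while it is being walked.
  var targets = Array.prototype.slice.call(elements);
  var sent = 0;
  for (var i = 0; i < targets.length; i++) {
    targets[i].dispatchEvent(createEvent('mouseover'));
    targets[i].dispatchEvent(createEvent('mouseout'));
    sent += 2;
  }
  return sent;
}

// Event factory for use in a browser; MouseEvent is a standard DOM constructor.
function browserMouseEvent(type) {
  return new MouseEvent(type, { bubbles: true, cancelable: true });
}
```

In a page this would be called as sendRolloverEvents(document.getElementsByTagName('*'), browserMouseEvent), which runs any handlers attached for roll-overs or DHTML menus.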
the new (in version 0.0.0.4) ability to "Wait for User Hooks" in order to
allow the spider to wait until all mouse events have been sent and
combinedhooks.js combines the Quirks vs. Standards mode detection, the OBJECT/EMBED detection and the eventhooks to provide a more complete analysis of a site.
Sending Log Somewhere Else
Automated Test Runner
Calling external scripts from user hook functions
Spider 0.0.1.18 introduced a function loadScript(aScriptUrl[, aScope]) which can be used by user hook functions to load additional utility scripts.
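A hook script might use loadScript along the following lines. Only the signature loadScript(aScriptUrl[, aScope]) comes from Spider; the script URLs, the helper name loadUtilityScripts and the way the scope argument is passed through are assumptions made for this sketch.

```javascript
// Hypothetical utility script URLs, used only for illustration.
var UTILITY_SCRIPTS = [
  'http://example.com/hooks/string-utils.js',
  'http://example.com/hooks/report-utils.js'
];

// Load each utility script through Spider's loadScript(aScriptUrl[, aScope]).
// Passing a scope object is optional; what Spider does with it is assumed
// here, not documented in this article.
function loadUtilityScripts(scope) {
  if (typeof loadScript !== 'function') {
    throw new Error('loadScript is not available; it requires Spider 0.0.1.18 or later');
  }
  for (var i = 0; i < UTILITY_SCRIPTS.length; i++) {
    if (scope === undefined) {
      loadScript(UTILITY_SCRIPTS[i]);
    } else {
      loadScript(UTILITY_SCRIPTS[i], scope);
    }
  }
  return UTILITY_SCRIPTS.length;
}
```

Calling loadUtilityScripts() from a start-up hook keeps shared helpers in separate files instead of copying them into every hook script.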
- What's New
- User Hooks