| WWW-Scripter documentation | view source | Contained in the WWW-Scripter distribution. |
WWW::Scripter - For scripting web sites that have scripts
Version 0.022 (alpha)
    use WWW::Scripter;
    $w = new WWW::Scripter;
    $w->use_plugin('Ajax');  # packaged separately
    $w->get('http://some.site.com/that/relies/on/ajax');
    $w->eval(' alert("Hello from JavaScript") ');
    $w->document->getElementsByTagName('div')->[0]->....
    $w->content;  # returns the HTML content, possibly modified
                  # by scripts
This is a subclass of WWW::Mechanize that uses the W3C DOM and provides support for scripting.
No scripting engines are provided with WWW::Scripter itself; they are available as separate plugins. (See also the SEE ALSO section below.)
There are two basic modes in which you can use WWW::Scripter:
If you only need a single virtual window (which is usually the case), use WWW::Scripter itself, as described below and in WWW::Mechanize.
For multiple windows, start with a window group (see WWW::Scripter::WindowGroup) and fetch the WWW::Scripter object via its active_window method before proceeding.
At any time you can attach an existing window (a WWW::Scripter object) to a window group using the group's attach method. You can also call close on a window to detach it from its window group and put it back in single-window mode.
These two modes affect the behaviour of a few methods (open, close, blur, focus) and of hyperlinks and forms with explicit targets.
See WWW::Mechanize for a vast list of methods that this module inherits.
In addition to those, this module implements the well-known Window interface, providing also a few routines for attaching scripting engines and what-not.
In the descriptions below, $w refers to the WWW::Scripter object. You
can think of it as short for either 'WWW::Scripter' or 'window'.
my $w = new WWW::Scripter %args
The constructor accepts named arguments. There are only two that WWW::Scripter itself deals with directly. The rest are passed on to the superclass. See WWW::Mechanize and LWP::UserAgent for details on what other arguments the constructor accepts.
The two arguments are:
max_docs
The maximum number of document objects to keep in history (along with their corresponding request and response objects). If this is omitted, WWW::Mechanize's stack_depth + 1 will be used. This is off by one because stack_depth is the number of pages you can go back to, so it is one less than the number of recorded pages. max_docs considers 0 to be equivalent to infinity.
max_history
If the number of items in history exceeds max_docs, WWW::Scripter will still keep the request objects (so you can go back more than max_docs times and previously visited pages will reload). max_history restricts the total number of items in history (whether full document objects or just requests). 0 is equivalent to infinity.
In addition to the methods listed here, see also HTML::DOM::View and HTML::DOM::EventTarget.
Returns the location object (see WWW::Scripter::Location). If you pass an argument, it sets the href attribute of the location object.
Each of these calls the function assigned by one of the set_* methods
below under Window-Related Methods.
Returns the navigator object. See WWW::Scripter::Navigator.
Returns the screen object. It currently has no features.
This schedules the code to run after $ms milliseconds have elapsed, returning a number uniquely identifying the time-out. If the first argument is a coderef or an object with &{} overloading, it will be called as such. Otherwise, it is parsed as a string of JavaScript code. (If the JavaScript plugin is not loaded, it will be ignored.)
This method is just like setTimeout, except that, when the code runs,
it schedules it to run again after $ms milliseconds.
This cancels the time-out corresponding to the $timeout_id. This only works for those registered with setTimeout.
This cancels the timer corresponding to the $timer_id. This only works for those registered with setInterval.
If $target is not specified or if there is no window or frame named $target, this method opens the $url in a new window in multiple-window mode, or in the top-level window in single-window mode.
If there is a window or frame named $target, then the $url is opened
in that window. If $replace is true, it replaces the current page.
A relative $url is resolved according to the base URL of the current
window (the one that open is called on), not the $target.
The $features argument is ignored.
In multiple-window mode, this detaches this window from its window group.
In single-window mode (when there is no window group) it goes back to the
previous entry in history (so that it is the opposite of open).
In multiple-window mode, this brings this window to the front. In single-window mode (when there is no window group) it does nothing.
In multiple-window mode, this sends this window back one, if it is the frontmost window. In single-window mode (when there is no window group) it does nothing.
Returns the history object. See WWW::Scripter::History.
These two return the window object itself.
Although the W3C DOM specifies that this return $w (the window itself), for efficiency's sake this returns a separate object which one can use as a hash or array reference to access its sub-frames. (The window object itself cannot be used that way.) The frames object (class WWW::Scripter::Frames) also has a window method that returns $w.
In list context a list of frames is returned.
Returns the number of frames. $w->length is equivalent to
scalar @{$w->frames}.
Returns the 'top' window, which is the window itself if there are no frames.
Returns the parent frame, if there is one, or the window object itself otherwise.
This returns the window's name, if applicable. For a frame, this comes
from the frame element to which the window belongs. For a top-level window
created by open, this is the name that was passed as the second
argument.
These exist in case scripts try to call them. They don't do anything.
These are simple accessors. They don't do much apart from storing the assigned value as a string. The value assigned is associated with the current page.
These methods are not part of the Window interface, but are closely related to the object's window behaviour.
Use these to set the functions called by the above methods. There are no
default confirm and prompt functions. The default alert prints to
the currently selected file handle, with a line break tacked on the end.
This evaluates the code associated with each timeout registered with the setTimeout method, if the appropriate interval has elapsed.
This returns the number of timers currently registered.
This method waits for any registered timers to finish (calling
check_timers repeatedly in a loop). Its %args are as follows:
    max_wait    Number indicating for how many seconds the loop
                should run before giving up and returning.
    min_timers  Only run until this many timers are left, not until
                they have all finished.
    interval    Number of seconds to wait before each iteration of
                the loop. The default is .1.
Some websites have timers that run constantly and are never cleared. For these, you will usually need to set a value for min_timers (or max_wait) to avoid an infinite loop.
This returns the window group that owns this window. See SINGLE VS MULTIPLE WINDOWS, above.
You can also pass an argument to set it, but you should only do so if you
know what you are doing, as it does not update the window group's list.
Consider using WWW::Scripter::WindowGroup's
attach method (which itself uses this method).
This finds the WWW::Scripter object (window or frame) in which a link will be opened.
If $name is not an empty string, it returns the window corresponding to
$name.
If $name is the empty string or undefined, it returns the default target for this window, based on the first <base target> element.
If a named window cannot be found: in multiple-window mode, a new window is
opened and returned; in single-window mode, undef is returned.
Evaluates the $code passed to it. This method dies if there is no script
handler registered for the $scripting_language.
This will automatically require() the plugin for you, and then
initialise it. To pass extra
options to the plugin after loading it, just use the same syntax again.
This will return the plugin object if the plugin has one.
This will return the plugin object, if it has one. Some plugins may provide this as a way to communicate directly with the plugin.
You can also use the return value as a boolean, to see whether a plugin is loaded.
This returns a boolean indicating whether HTML pages are parsed and turned into a DOM tree. It is true by default. You can disable HTML parsing by passing a false value. Of course, if you are using WWW::Scripter to begin with, you won't want to turn this off, will you? Nevertheless, this is useful for fetching files behind the scenes when only the file contents are needed.
This returns a boolean indicating whether scripts are enabled. It is true by default. You can disable scripts by passing a false value. When you disable scripts, event handlers are also disabled, as is the registration of event handlers by HTML event attributes.
A script handler is a special object that knows how to run scripts in a particular language. Use this method to register such an object.
$language_re is a regular expression that will be matched against a scripting language name (from a 'language' HTML attribute) or MIME type (<script type=...). You can also use the special value 'default'.
$object is the script handler object. For its interface, see SCRIPT HANDLERS, below.
With this you can provide information for binding Perl classes to scripting languages, so that scripts can handle objects of those classes.
You should pass a hash ref that has the
structure described in HTML::DOM::Interface, except that this method
also accepts a _constructor hash element, which should be set to the
name of the method to be called when the constructor function is called
from the scripting language (e.g., _constructor => 'new') or a
subroutine reference.
The return value is a list of all hashrefs passed to class_info so far
plus a few that WWW::Scripter has by default (to support the DOM).
You can call it without any arguments just to get that list.
The equivalent of hitting the 'forward' button in a browser. This, of
course, only works after back.
This clears the history, preventing back from working until after the
next request, and freeing up some memory. If supplied with a true
argument, it also clears the current page. It returns $w.
These two return what was passed to the constructor, optionally setting it.
To trigger events (and event handlers), use the trigger_event method of
the object on which you want to trigger it. For instance:
    $w->trigger_event('resize'); # runs onresize handlers
    $w->document->links->[0]->trigger_event('mouseover');
    $w->current_form->trigger_event('submit'); # same as $w->submit
trigger_event accepts more arguments. See HTML::DOM and
HTML::DOM::EventTarget for details.
WWW::Scripter does not implement any event loop, so you have to call
check_timers or wait_for_timers yourself to trigger any timeouts. If
you set up a loop like this,
    sleep 1, $w->check_timers while $w->count_timers;
or if you use wait_for_timers, beware that these may cause an infinite
loop if a timeout sets another timeout, or if a timer is registered with
setInterval. You basically have to know what works with the
pages you are browsing.
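One way to guard against such a loop is to put a deadline on it. The following is a minimal sketch, not part of WWW::Scripter: the helper name run_timers_bounded and its arguments are made up for illustration, and it relies only on the check_timers and count_timers methods described above.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper: run pending timers on $w until none remain or
# until $max_seconds have passed, whichever comes first.
# $w may be any object that provides check_timers and count_timers.
# Returns the number of timers still registered when the loop stops.
sub run_timers_bounded {
    my ($w, $max_seconds, $interval) = @_;
    $interval = 0.1 unless defined $interval;

    my $deadline = Time::HiRes::time() + $max_seconds;
    while ($w->count_timers && Time::HiRes::time() < $deadline) {
        Time::HiRes::sleep($interval);
        $w->check_timers;
    }
    return $w->count_timers;
}
```

A call such as run_timers_bounded($w, 10) would then give up after roughly ten seconds even if a page keeps re-registering timers. The max_wait argument of wait_for_timers serves the same purpose.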
The hash named %WWW::Scripter::WindowInterface lists the interface members for the window object. It follows the same format as hashes within %HTML::DOM::Interface, like this:
    (
        alert   => VOID|METHOD,
        confirm => BOOL|METHOD,
        ...
    )
It only includes those methods listed above under The Window Interface.
This section is only of interest to those implementing scripting engines. If you are not writing one, skip this section (or just read it anyway).
A script handler object must provide the following methods:
(where $w is the WWW::Scripter object)
This is supposed to run the $code passed to it. It must set $@ to a
true value if there is an error.
This is called for each HTML event attribute (onclick, etc.). It should
return a coderef that runs the $code.
If it could not create a code ref, it should return undef and put the
error message, if any, in $@.
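As a rough illustration of the two methods just described, here is a toy handler that treats each script as a string of Perl. Several things in it are assumptions: the method names eval and event2sub and their argument lists are recalled from the WWW::Scripter POD (the headings are missing from this text) and should be checked against the installed documentation, and the package name My::ToyHandler is hypothetical.

```perl
use strict;
use warnings;

{
    package My::ToyHandler;    # hypothetical name

    sub new { return bless {}, shift }

    # Compile a string of Perl into a coderef. On failure this returns
    # undef and leaves the error message in $@.
    sub _compile {
        my ($code) = @_;
        return eval "sub { $code }";
    }

    # Assumed signature: ($w, $code, $url, $line, $is_inline).
    # Runs the code; $@ is true afterwards if there was an error.
    sub eval {
        my ($self, $w, $code, $url, $line, $is_inline) = @_;
        my $sub = _compile($code) or return;    # $@ already holds the error
        my $ret = eval { $sub->($w) };          # $@ is '' on success
        return $ret;
    }

    # Assumed signature: ($w, $elem, $event_name, $code, $url, $line).
    # Returns a coderef that runs the code, or undef with the error in $@.
    sub event2sub {
        my ($self, $w, $elem, $event_name, $code, $url, $line) = @_;
        return _compile($code);
    }
}
```

A real handler would pass the code to an actual scripting engine and would normally use $url and $line for error reporting; the toy above ignores them.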
Plugins are usually under the WWW::Scripter::Plugin:: namespace. If a
plugin name has a hyphen (-) in it, the module name will contain a double
colon (::). If, when you pass a plugin name to use_plugin or plugin,
it has a double colon in its name, it will be treated as a fully-qualified
module name (possibly) outside the usual plugin namespace. Here are
some examples:
    Plugin Name        Module Name
    -----------        -----------
    Chef               WWW::Scripter::Plugin::Chef
    Man-Page           WWW::Scripter::Plugin::Man::Page
    My::Odd::Plugin    My::Odd::Plugin
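The naming rule can be expressed in a few lines of Perl. This is only a sketch of the rule as described here, not the code WWW::Scripter uses internally, and the function name plugin_module_name is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: map a plugin name to a module name following
# the rule described above.
sub plugin_module_name {
    my ($name) = @_;
    return $name if $name =~ /::/;       # treated as fully qualified
    (my $suffix = $name) =~ s/-/::/g;    # each hyphen becomes '::'
    return "WWW::Scripter::Plugin::$suffix";
}

printf "%-16s %s\n", $_, plugin_module_name($_)
    for qw(Chef Man-Page My::Odd::Plugin);
```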
This module will need to have an init method, and possibly two more
named options and clone, respectively:
init will be called as a class method the first time use_plugin
is called for a particular plugin. The second argument ($_[1]) will be
the WWW::Scripter object. The third argument will be an array ref
of options (see options, below).
It may return an object if the plugin has one.
When $w->use_plugin is called, if there are any arguments after
the plugin name, then the plugin object's options method will be called
with the options themselves as the arguments.
If a plugin does not provide an object, an error will be thrown if options
are passed to use_plugin.
The init method can override this, however. When it is called, its
third argument is a reference to an array containing the options passed
to use_plugin. The contents of that same array will be used when
options is called,
so init can modify it and even prevent options from being called
altogether, by emptying the array.
When the WWW::Scripter object is cloned (via the clone method), every
plugin that has a clone method (as determined by
->can('clone')), will also be cloned. The new clone of the
WWW::Scripter object is passed as
its argument.
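The following skeleton shows how the three methods might fit together. It is a sketch under assumptions: the package name My::Plugin::Example is hypothetical, options are assumed to be key/value pairs, clone is assumed to return the new plugin object, and the lines at the bottom merely imitate the call sequence described above with a placeholder in place of a real WWW::Scripter object.

```perl
use strict;
use warnings;

{
    package My::Plugin::Example;    # hypothetical plugin module

    # Called as a class method. $w is the WWW::Scripter object and
    # $opts is the array ref of options passed to use_plugin.
    sub init {
        my ($class, $w, $opts) = @_;
        my $self = bless { settings => {} }, $class;

        # Handle 'verbose' here and remove it from the array, so that
        # options() never sees it. Emptying @$opts entirely would stop
        # options() from being called at all.
        my %o = @$opts;
        $self->{verbose} = delete $o{verbose};
        @$opts = %o;

        return $self;
    }

    # Receives the remaining options as its arguments.
    sub options {
        my ($self, %opts) = @_;
        $self->{settings}{$_} = $opts{$_} for keys %opts;
        return;
    }

    # Receives the new clone of the WWW::Scripter object.
    sub clone {
        my ($self, $new_w) = @_;
        my %copy = (%$self, settings => { %{ $self->{settings} } });
        return bless \%copy, ref $self;
    }
}

# Imitate the documented call sequence with a stand-in window object.
my $w      = {};    # placeholder, not a real WWW::Scripter object
my @opts   = (verbose => 1, depth => 3);
my $plugin = My::Plugin::Example->init($w, \@opts);
$plugin->options(@opts) if @opts;
print "depth=$plugin->{settings}{depth} verbose=$plugin->{verbose}\n";
```

Copying the settings hash in clone keeps the cloned plugin from sharing mutable state with the original.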
If the plugin needs to record data pertinent to the current page, it can do so by associating them with the document or the request via a field hash. See Hash::Util::FieldHash and Hash::Util::FieldHash::Compat.
See LWP's Handlers feature.
From within LWP's request_* and response_* handlers, you can call
WWW::Scripter::abort to abort the request
and prevent a new entry from being created in browser history. (The
JavaScript plugin does this with javascript: URLs.)
WWW::Scripter will export this function upon request:
    use WWW::Scripter qw[ abort ];
or you can call it with a fully qualified name:
    WWW::Scripter::abort();
This is still an unfinished work. There are probably scores of bugs crawling all over the place. Here are some that are known (apart from the fact that so many features are still missing):
To report a bug, please send an e-mail to bug-WWW-Scripter@rt.cpan.org or use the web interface at http://rt.cpan.org/.
perl 5.8.3 or higher (5.8.4 or higher recommended)
HTML::DOM 0.045 or higher
LWP 5.77 or higher
WWW::Mechanize 1.2 or higher
Tie::RefHash::Weak 0.08 or higher for perl 5.8.x.
Copyright (C) 2009-10, Father Chrysostomos (sprout at, um, cpan dot org)
This program is free software; you may redistribute or modify it (or both) under the same terms as perl.
Some of the code in here was stolen from the immediate superclass, WWW::Mechanize, as were some of the tests and test data.
WWW::Scripter sub-modules: ::Location (WWW::Scripter::Location), ::History (WWW::Scripter::History) and ::Navigator (WWW::Scripter::Navigator).
See WWW::Mechanize, of which this is a subclass.
See also the following plugins:
And, if you are curious, have a look at the plugin version of WWW::Mechanize and WWW::Mechanize::Plugin::DOM (experimental and now deprecated) that this was originally based on: http://www-mechanize.googlecode.com/svn/wm/branches/plugins/