WWW::CheckSite::Validator - A spider that assesses 'kwalitee' for a site


Contained in the WWW-CheckSite distribution.


NAME

WWW::CheckSite::Validator - A spider that assesses 'kwalitee' for a site

SYNOPSIS

    use WWW::CheckSite::Validator;
    my $wcv = WWW::CheckSite::Validator->new(
        uri => 'http://www.test-smoke.org'
    );

    while ( my $info = $wcv->get_page ) {
        # handle the info
    }

DESCRIPTION

This is a subclass of WWW::CheckSite::Spider.

WWW::CheckSite::Validator starts its work after the spider has fetched the page. It checks the following:

* links

All links on the page (<a href>, <area href>, <frame src>) are checked for availability.

* images

All images on the page (<img src>, <input type=image>) are checked for availability.

* stylesheets

All stylesheets on the page (<link rel=stylesheet type=text/css>) are checked for availability.

* W3 HTML validation

The contents of the page are sent to http://validator.w3.org for validation.

METHODS

WWW::CheckSite::Validator->new( %args )

Extends WWW::CheckSite::Spider->new() to check whether Image::Info is available, so that a basic check can be done on the images.

$wcs->process_page

This method overrides WWW::CheckSite::Spider::process_page() to check the availability of links, images and stylesheets. When requested, it also sends the page to W3.ORG for validation.

In addition to the standard information, it returns:

* images: a list of the images on the page, each with the extra information described under check_images()
* images_cnt: the number of images on the page
* images_ok: the number of images whose HEAD request returned status 200
* styles: a list of the stylesheets on the page, each with the extra information described under check_styles()
* styles_cnt: the number of stylesheets on the page
* styles_ok: the number of stylesheets whose HEAD request returned status 200
* valid: the result of the validation at W3.ORG
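The *_cnt and *_ok counters are simple tallies over the per-item structures documented under check_images() and check_styles(). The sketch below shows that arithmetic in isolation; count_ok() is a hypothetical helper, not part of WWW::CheckSite::Validator, and it assumes each entry is a hash reference with a status key.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::CheckSite::Validator): given a list
# of entries shaped like the image/stylesheet structures, return the total
# number of entries and how many of them returned status 200.
sub count_ok {
    my (@entries) = @_;
    my $ok = grep { ( $_->{status} // 0 ) == 200 } @entries;
    return ( scalar @entries, $ok );
}
```

Assuming the images were kept in an array reference, the counters could then be filled with something like `@{$stats}{qw(images_cnt images_ok)} = count_ok( @{ $stats->{images} } );`.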

$wcs->check_images( $stats )

The check_images() method gets information about the images on the page. The list comes from the images() method of the mechanize object. Only a HEAD request is sent for each image URI.

The structure for images:

* uri: the URI as returned after the HEAD request
* tag: set to 'ALT'
* text: the text of the ALT attribute
* status: the return status of the HEAD request
* ct: the 'Content-Type' returned by the HEAD request

$wcs->check_styles( $stats )

The check_styles() method checks the validity of the stylesheets used in the page. It looks for <link rel="stylesheet" type="text/css"> tags.
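To make the tag selection concrete, here is a deliberately naive sketch that picks out the href of every <link> tag with rel="stylesheet" and type="text/css". find_stylesheets() is hypothetical and uses regular expressions only to stay dependency-free; it is not how WWW::CheckSite::Validator does it, and real code should use an HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::CheckSite::Validator): a naive,
# regex-based scan for <link rel="stylesheet" type="text/css" href="...">.
# Attribute names and the rel/type values are compared case-insensitively.
sub find_stylesheets {
    my ($html) = @_;
    my @hrefs;
    while ( $html =~ /<link\b([^>]*)>/gi ) {
        my $attrs = $1;    # save before the inner match overwrites $1
        my %attr;
        # Accept double-quoted, single-quoted and unquoted attribute values.
        while ( $attrs =~ /(\w+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/g ) {
            $attr{ lc $1 } = defined $2 ? $2 : defined $3 ? $3 : $4;
        }
        next unless lc( $attr{rel}  // '' ) eq 'stylesheet';
        next unless lc( $attr{type} // '' ) eq 'text/css';
        push @hrefs, $attr{href} if defined $attr{href};
    }
    return @hrefs;
}
```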

The structure for stylesheets:

* uri: the URI as returned after the HEAD request
* text: set to empty, for compatibility with links and images
* status: the return status of the HEAD request
* ct: the 'Content-Type' returned by the HEAD request

$wcs->validate

The validate() method sends the URI or the page contents to W3.ORG for validation.

$wcs->validate_by_none

The fallback do-not-validate method.

$wcs->validate_by_uri

Sends only the URI to W3.ORG and gets the validation result.

$wcs->validate_by_upload( $stats )

Creates a temporary file (with File::Temp) from $agent->content, calls the validator with that temporary file and saves the result (as a boolean) in $stats->{validate}.
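The temporary-file step can be sketched with File::Temp's object interface. content_to_tempfile() is a hypothetical helper and the .html suffix is an assumption; the actual upload to the validator is not shown.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper (not part of WWW::CheckSite::Validator): write the
# fetched page content to a temporary file that could then be uploaded.
sub content_to_tempfile {
    my ($content) = @_;
    my $tmp = File::Temp->new( SUFFIX => '.html' );
    binmode $tmp, ':raw';
    print {$tmp} $content or die "Cannot write temporary file: $!";
    $tmp->flush;    # make sure the bytes are on disk before anyone reads them
    return $tmp;    # the file is unlinked when $tmp goes out of scope
}
```

Returning the File::Temp object rather than just the file name keeps the file alive for as long as the caller needs it and cleans it up automatically afterwards.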

$wcs->validate_by_xmllint( $stats )

Uses the xmllint(1) program to validate the (X)HTML.

$wcs->validate_style( $ua )

Dispatches stylesheet validation to the appropriate style_by_* method.
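A common Perl idiom for this kind of dispatch is to build a method name and look it up with can(), falling back to a do-nothing method. The class My::StyleChecker and its validate_style configuration key are hypothetical; only the style_by_none, style_by_uri and style_by_upload names come from this documentation, and the stub bodies just return labels.

```perl
use strict;
use warnings;

package My::StyleChecker;    # hypothetical class, for illustration only

sub new {
    my ( $class, %args ) = @_;
    my $self = {%args};
    return bless $self, $class;
}

# Stub strategies; real ones would contact a validator.
sub style_by_none   { return 'skipped' }
sub style_by_uri    { return 'checked by uri' }
sub style_by_upload { return 'checked by upload' }

# Pick style_by_<how> from the (hypothetical) validate_style setting and
# fall back to style_by_none when no matching method exists.
sub validate_style {
    my ( $self, $ua ) = @_;
    my $how    = $self->{validate_style} // 'none';
    my $method = $self->can("style_by_$how") || $self->can('style_by_none');
    return $self->$method($ua);
}

package main;
```

Using can() means an unknown setting degrades to no validation instead of dying with a missing-method error.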

$wcs->style_by_none

The fallback do-not-validate-stylesheet method.

$wcs->style_by_uri( $ua )

Sends only the URI to JIGSAW.W3.ORG and gets the validation result.

$wcs->style_by_upload( $ua )

Creates a temporary file (with File::Temp) from $ua->content, calls the validator with that temporary file and returns the result.

$wcs->validate_image( $ua )

This is a basic consistency check rather than a full validation; it uses Image::Info::image_info().

$wcs->ct_can_validate( $ua )

Checks whether the Content-Type of the response is one that can be validated.
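A content-type check of this kind usually strips parameters such as charset and compares the bare media type against a whitelist. The sketch below is hypothetical: ct_is_validatable() takes the Content-Type string directly instead of the $ua argument, and the two listed types are assumptions, since this documentation does not say which types the module treats as validatable.

```perl
use strict;
use warnings;

# Assumed whitelist; the module's real list is not documented here.
my %VALIDATABLE = map { $_ => 1 } qw(
    text/html
    application/xhtml+xml
);

# Hypothetical predicate: 1 if the media type is in the whitelist, else 0.
sub ct_is_validatable {
    my ($content_type) = @_;
    return 0 unless defined $content_type;
    # Drop parameters such as "; charset=UTF-8", trim and normalise case.
    my ($type) = split /;/, $content_type, 2;
    return 0 unless defined $type;
    $type =~ s/^\s+|\s+$//g;
    return $VALIDATABLE{ lc $type } ? 1 : 0;
}
```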

$wcs->set_action

Why?

SEE ALSO

WWW::CheckSite::Spider, WWW::CheckSite

AUTHOR

Abe Timmerman, <abeltje@cpan.org>

BUGS

Please report any bugs or feature requests to bug-WWW-CheckSite@rt.cpan.org, or through the web interface at http://rt.cpan.org. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

COPYRIGHT & LICENSE
