HTTP::Size - Get the byte size of an internet resource


HTTP-Size documentation  | view source Contained in the HTTP-Size distribution.

NAME

HTTP::Size - Get the byte size of an internet resource

SYNOPSIS

	use HTTP::Size;

	my $size = HTTP::Size::get_size( $url );

	if( defined $size )
		{
		print "$url size was $size";
		}
	elsif( $HTTP::Size::ERROR == $HTTP::Size::INVALID_URL )
		{
		print "$url is not a valid absolute URL";
		}
	elsif( $HTTP::Size::ERROR == $HTTP::Size::COULD_NOT_FETCH )
		{
		print "Could not fetch $url\nHTTP status is $HTTP::Size::HTTP_STATUS";
		}
	elsif( $HTTP::Size::ERROR == $HTTP::Size::BAD_CONTENT_LENGTH )
		{
		print "Could not determine content length of $url";
		}

DESCRIPTION

VARIABLES

The following global variables describe conditions from the last function call:

	$ERROR
	$HTTP_STATUS

The $ERROR variable may be set to any of these values:

	$INVALID_URL	    - the URL is not a valid absolute URL
	$COULD_NOT_FETCH    - the function encountered an HTTP error
	$BAD_CONTENT_LENGTH - could not determine the content length

The module does not export these variables, so you need to use the full package specification outside of the HTTP::Size package.

FUNCTIONS

get_size( URL )

Fetch the specified absolute URL and return its content length in bytes. The URL can be a string or a URI object. The function tries the HTTP HEAD method first and, if that fails, tries the GET method. In either case it sets $HTTP_STATUS to the HTTP response code. If the HEAD method returns a good status but no Content-Length header, the function retries with the GET method. If the response still does not contain a Content-Length header, the function takes the size of the message body.
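The HEAD-then-GET fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual code: sketch_get_size is a hypothetical name, and $ua is assumed to be any object whose head() and get() methods return HTTP::Tiny-style response hashes (success, status, headers with lowercase names, content).

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of HTTP::Size. Assumes $ua->head and
# $ua->get return hashes shaped like HTTP::Tiny responses:
# { success => BOOL, status => CODE, headers => {...}, content => BODY }
sub sketch_get_size
	{
	my( $ua, $url ) = @_;

	# Try the cheap request first: HEAD transfers no message body.
	my $res = $ua->head( $url );
	if( $res->{success} )
		{
		my $length = $res->{headers}{'content-length'};
		return ( $length, $res->{status} ) if defined $length;
		}

	# HEAD failed, or succeeded without Content-Length: fall back to GET.
	$res = $ua->get( $url );
	return ( undef, $res->{status} ) unless $res->{success};

	# Prefer the header; otherwise measure the message body itself.
	my $length = $res->{headers}{'content-length'};
	$length = length( $res->{content} // '' ) unless defined $length;

	return ( $length, $res->{status} );
	}
```

With HTTP::Tiny installed, a call might look like my( $size, $status ) = sketch_get_size( HTTP::Tiny->new, $url ), where the second return value plays the role of $HTTP_STATUS.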

On error, the function sets $ERROR to one of these values:

	$INVALID_URL	    - the URL is not a valid absolute URL
	$COULD_NOT_FETCH    - the function encountered an HTTP error
	$BAD_CONTENT_LENGTH - could not determine the content length

get_sizes( URL, BASE_URL )

The get_sizes function is like get_size, but for HTML pages it also fetches all of the images and adds their sizes to the size of the original page. It returns the total download size. In list context it returns the total download size and a hash reference whose keys are the URLs that a browser should download automatically (images) and whose values are hash references with these keys:

	size
	ERROR
	HTTP_STATUS

The ERROR and HTTP_STATUS correspond to the values of $ERROR and $HTTP_STATUS for that URL.

	my ( $total, $hash ) = HTTP::Size::get_sizes( $url );

	foreach my $key ( keys %$hash )
		{
		print "$key had an error" unless defined $hash->{$key}{size};
		}

The hash is always returned in list context (a change from version 0.4).

Relative image links resolve according to BASE_URL, or to a BASE tag found in the page. See HTML::SimpleLinkExtor.

JavaScript and style sheet links are not implemented yet.

TO DO

* If I have to use GET, I should use byte ranges to avoid downloading the whole thing.

* Add a way to specify Basic Auth credentials.

* Download JavaScript and style sheets too.
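As a rough illustration of the byte-range idea in the first item: a client can send "Range: bytes=0-0" with the GET request, and a server that supports ranges answers with status 206 and a Content-Range header such as "bytes 0-0/12345", where the number after the slash is the full size. The sketch below only parses that header; total_from_content_range is a hypothetical name and none of this exists in the module.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of HTTP::Size. Extracts the complete
# length from a Content-Range header value. Returns undef when the
# header is missing, malformed, or reports an unknown length ("*").
sub total_from_content_range
	{
	my( $header ) = @_;
	return undef unless defined $header;

	# Accepts "bytes FIRST-LAST/TOTAL" and "bytes */TOTAL".
	return $1 if $header =~ m{\A bytes \s+ (?: \d+ - \d+ | \* ) / (\d+) \s* \z}x;

	return undef;
	}
```

A server that ignores the Range header replies with status 200 and the whole body, so a caller would still need the Content-Length and body-size fallbacks.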

SEE ALSO

HTML::SimpleLinkExtor

SOURCE AVAILABILITY

This source is part of a SourceForge project which always has the latest sources in CVS, as well as all of the previous releases.

	http://sourceforge.net/projects/brian-d-foy/

If, for some reason, I disappear from the world, one of the other members of the project can shepherd this module appropriately.

AUTHOR

brian d foy, <bdfoy@cpan.org>

COPYRIGHT AND LICENSE
