WebService::CIA::Parser - Parse pages from the CIA World Factbook


WebService-CIA documentation  | view source Contained in the WebService-CIA distribution.

NAME

WebService::CIA::Parser - Parse pages from the CIA World Factbook

SYNOPSIS

  use WebService::CIA::Parser;
  my $parser = WebService::CIA::Parser->new;
  my $data = $parser->parse($string);




DESCRIPTION

WebService::CIA::Parser takes a string of HTML and parses it. It will only give sensible output if the string is the HTML for a page whose URL matches https://www.cia.gov/library/publications/the-world-factbook/print/[a-z]{2}\.html
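Because the parser only makes sense for pages matching that pattern, it can help to check a URL before fetching and parsing it. The following is a minimal sketch; is_factbook_print_url is a hypothetical helper written for this illustration and is not part of WebService::CIA::Parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by WebService::CIA::Parser):
# returns 1 if $url matches the print-page pattern described above,
# 0 otherwise.
sub is_factbook_print_url {
    my ($url) = @_;
    return $url =~ m!\Ahttps://www\.cia\.gov/library/publications/the-world-factbook/print/[a-z]{2}\.html\z!
        ? 1
        : 0;
}

my $url = "https://www.cia.gov/library/publications/the-world-factbook/print/uk.html";
print is_factbook_print_url($url)
    ? "looks like a Factbook print page\n"
    : "not a Factbook print page\n";
```

The pattern is anchored at both ends, so URLs with extra path segments or a longer country code are rejected.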

This parsing is somewhat fragile, since it assumes a certain page structure. It will work only as long as the CIA does not alter the structure of its pages.

METHODS

new

Creates a new WebService::CIA::Parser object. It takes no arguments.

parse($html)

Parses a string of HTML taken from the CIA World Factbook. It takes a single string as its argument and returns a hashref of fields and values.

The values are stripped of all HTML. <br> tags are replaced by newlines.
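The effect on a single value can be pictured with the sketch below. It is an illustration of the described behaviour only, not the module's actual implementation; strip_html is a hypothetical name, and the regular expressions are an assumption about one simple way to get the same result.

```perl
use strict;
use warnings;

# Hypothetical illustration (not the module's code): turn <br> tags into
# newlines, then remove every remaining tag.
sub strip_html {
    my ($value) = @_;
    $value =~ s{<br\s*/?>}{\n}gi;    # <br>, <BR> and <br /> become newlines
    $value =~ s{<[^>]*>}{}g;         # drop all other tags
    return $value;
}

print strip_html("first line<br>second <b>line</b>"), "\n";
```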

It also creates four extra fields: "URL", "URL - Print", "URL - Flag", and "URL - Map", which are the URLs of the country's Factbook page, the printable version of that page, a GIF flag of the country, and a GIF map of the country, respectively.
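As a sketch of how these extra fields might be read back, the helper below collects whichever of the four keys are present in the hashref. url_fields is a hypothetical helper, and the hashref in the example is a stand-in with placeholder values rather than real parse() output.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "key: value" strings for the four extra
# URL fields, skipping any that are missing from the hashref.
sub url_fields {
    my ($data) = @_;
    my @keys = ("URL", "URL - Print", "URL - Flag", "URL - Map");
    return map { "$_: $data->{$_}" } grep { defined $data->{$_} } @keys;
}

# Stand-in for the hashref returned by parse(); these values are
# placeholders, not real Factbook URLs.
my $data = {
    "URL"        => "http://example.invalid/page.html",
    "URL - Flag" => "http://example.invalid/flag.gif",
};
print "$_\n" for url_fields($data);
```

In real use, $data would come from $parser->parse($html) as in the EXAMPLE section.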

EXAMPLE

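  use strict;
  use warnings;
  use WebService::CIA::Parser;
  use LWP::Simple qw(get);

  # Fetch the printable Factbook page for the United Kingdom.
  my $html = get(
    "https://www.cia.gov/library/publications/the-world-factbook/print/uk.html"
  );
  # LWP::Simple's get returns undef on failure.
  die "Could not fetch page\n" unless defined $html;

  my $parser = WebService::CIA::Parser->new;
  my $data = $parser->parse($html);
  print $data->{"Population"}, "\n";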




AUTHOR

Ian Malpass (ian-cpan@indecorous.com)

COPYRIGHT

SEE ALSO

WebService::CIA

