HoneyClient-Agent documentation. Contained in the HoneyClient-Agent distribution.

NAME

HoneyClient::Agent::Driver::Browser - Perl extension to drive a web browser, running inside a HoneyClient VM.

VERSION

This documentation refers to HoneyClient::Agent::Driver::Browser version 0.98.

SYNOPSIS

  use HoneyClient::Agent::Driver::Browser;

  # Library used exclusively for debugging complex objects.
  use Data::Dumper;

  # Create a new Browser object, initialized with a collection
  # of URLs to visit.
  my $browser = HoneyClient::Agent::Driver::Browser->new(
      links_to_visit => {
          'http://www.google.com'  => 1,
          'http://www.cnn.com'     => 1,
      },
  );

  # If you want to see what type of "state information" is physically
  # inside $browser, try this command at any time.
  print Dumper($browser);

  # Continue to "drive" the driver, until it is finished.
  while (!$browser->isFinished()) {

      # Before we drive the application to a new set of resources,
      # find out where we will be going within the application, first.
      print "About to contact the following resources:\n";
      print Dumper($browser->next());

      # Now, drive browser for one iteration.
      $browser->drive();

      # Get the driver's progress.
      print "Status:\n";
      print Dumper($browser->status());

  }

  # At this stage, the driver has exhausted its collection of links
  # to visit.  Let's say we want to add the URL "http://www.mitre.org"
  # to the driver's list.
  $browser->{links_to_visit}->{'http://www.mitre.org'} = 1;

  # Now, drive the browser for one iteration.
  $browser->drive();

  # Or, we can specify the URL as an argument.
  $browser->drive(url => "http://www.mitre.org");

DESCRIPTION

This library allows the Agent module to drive an instance of any browser, running inside the HoneyClient VM. The purpose of this module is to programmatically navigate the browser to different websites, in order to become purposefully infected with new malware.

This module is object-oriented in design, retaining all state information within itself for easy access. A specific browser implementation, such as 'IE' or 'FF', must inherit from this package.

Fundamentally, the Browser driver is initialized with a set of absolute URLs for the browser to drive to. Upon visiting each URL, the driver collects any new links found and will attempt to drive the browser to each valid URL upon subsequent iterations of work.

For each top-level URL given, the driver will attempt to process all corresponding links that are hosted on the same server, in order to simulate a complete 'spider' of each server.

URLs are added to and removed from hashtables, as keys. For each URL, a calculated "priority" (a positive integer) is assigned as the value. When the Browser is ready to go to a new link, it will always go to the link that has the highest priority. If several URLs share the same highest priority, then the Browser will choose among them at random.
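The selection rule above can be sketched as follows. This is only an illustration under stated assumptions: pick_next_link() is a hypothetical helper and is not part of the module's documented interface; the actual selection code inside the driver may differ.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper (not part of the documented API): returns one of the
# highest-priority URLs, choosing at random among ties.
sub pick_next_link {
    my ($links) = @_;    # hashref: URL => positive integer priority
    return undef unless %{$links};

    my $top  = max(values %{$links});
    my @best = grep { $links->{$_} == $top } keys %{$links};

    # rand() evaluates @best in scalar context, giving an index in range.
    return $best[ int(rand(@best)) ];
}

my %links_to_visit = (
    'http://www.google.com' => 1,
    'http://www.cnn.com'    => 3,
    'http://www.mitre.org'  => 3,
);

# Prints either the cnn.com or the mitre.org URL, never google.com.
my $next = pick_next_link(\%links_to_visit);
print "Next link: $next\n";
```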

Furthermore, the Browser driver will try to visit all links hosted on a common server before moving on to other, external links. However, the Browser may end up re-visiting old sites, if new links were found there that the Browser has not visited yet.

As the Browser driver navigates the browser to each link, it maintains a set of hashtables that record when valid links were visited (see links_visited); when invalid links were found (see links_ignored); and when the browser attempted to visit a link but the operation timed out (see links_timed_out). By maintaining this internal history, the driver will never navigate the browser to the same link twice.
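As a rough sketch of how this history prevents repeat visits, the check below consults the four hashtables named in this documentation before queueing a link. The helper is_new_link() is hypothetical, and a plain hashref stands in for a real driver object; the timestamp value is illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if $url appears in none of the driver's
# link hashtables, so it is safe to queue.
sub is_new_link {
    my ($driver, $url) = @_;
    for my $table (qw(links_to_visit links_visited links_ignored links_timed_out)) {
        my $entries = $driver->{$table} || {};
        return 0 if exists $entries->{$url};
    }
    return 1;
}

# Plain hashref standing in for a driver object (an assumption made so the
# sketch runs on its own); history values are epoch timestamps.
my $driver = {
    links_to_visit  => { 'http://www.cnn.com/'    => 1 },
    links_visited   => { 'http://www.google.com/' => 1200000000 },
    links_ignored   => {},
    links_timed_out => {},
};

# Only the link that is absent from every table gets queued.
for my $url ('http://www.google.com/', 'http://www.mitre.org/') {
    next unless is_new_link($driver, $url);
    $driver->{links_to_visit}->{$url} = 1;
}

print join("\n", sort keys %{ $driver->{links_to_visit} }), "\n";
```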

Lastly, it is highly recommended that for each driver $object, one should call $object->isFinished() prior to making a subsequent call to $object->drive(), in order to verify that the driver has not exhausted its set of links to visit. Otherwise, if $object->drive() is called with an empty set of links to visit, the corresponding operation will croak.
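The recommended calling pattern, and a defensive alternative, might look like the sketch below. StubDriver is a hypothetical stand-in written only so the example runs without a HoneyClient VM; it models nothing beyond the documented behavior that drive() croaks when there are no links left. The error message text is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a concrete Browser driver.
package StubDriver;
use Carp qw(croak);

sub new {
    my ($class, %args) = @_;
    return bless { links_to_visit => {}, links_visited => {}, %args }, $class;
}

sub isFinished {
    my ($self) = @_;
    return !%{ $self->{links_to_visit} };
}

sub drive {
    my ($self) = @_;
    my ($url) = sort keys %{ $self->{links_to_visit} };
    croak "No links to visit" unless defined $url;
    delete $self->{links_to_visit}->{$url};
    $self->{links_visited}->{$url} = time;
    return $self;
}

package main;

my $driver = StubDriver->new(
    links_to_visit => { 'http://www.mitre.org' => 1 },
);

# Recommended pattern: check isFinished() before every drive().
while (!$driver->isFinished()) {
    $driver->drive();
}

# Defensive pattern: trap the croak if drive() is called anyway.
my $ok = eval { $driver->drive(); 1 };
print "drive() failed: $@" unless $ok;
```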

DEFAULT PARAMETER LIST

When a Browser $object is instantiated using the new() function, the following parameters are supplied default values. Each value can be overridden by passing a new (key => value) pair to the new() function, as arguments.

Furthermore, as each parameter is initialized, each can be individually retrieved and set at any time, using the following syntax:

  my $value = $object->{key}; # Gets key's value.
  $object->{key} = $value;    # Sets key's value.

process_name

A string containing the process name of the browser application, as it appears in the Task Manager.

positive_words

An array of positive words, where a link's probability of being visited (its score) will increase, if the link contains any of these words.

negative_words

An array of negative words, where a link's probability of being visited (its score) will decrease, if the link contains any of these words.

parse_active_content

If set to 1, then the code will attempt to parse and extract links within active content (e.g., Flash animations). Otherwise, the code will ignore all active content.
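To illustrate how positive_words and negative_words could influence a link's score, here is a minimal sketch. The scoring rule (start at 1, add 1 per positive word, subtract 1 per negative word, floor at 1) and the example words are assumptions made for illustration; this documentation does not state the formula or the default word lists the driver actually uses.

```perl
use strict;
use warnings;

# Hypothetical scoring rule: start at 1, add 1 for each positive word found
# in the URL, subtract 1 for each negative word found, and never drop below
# 1 so the priority stays a positive integer.
sub score_link {
    my ($url, $positive, $negative) = @_;
    my $haystack = lc($url);
    my $score    = 1;

    # grep in scalar context returns the number of matching words.
    $score += grep { index($haystack, lc($_)) >= 0 } @{$positive};
    $score -= grep { index($haystack, lc($_)) >= 0 } @{$negative};

    return $score < 1 ? 1 : $score;
}

# Illustrative word lists only; not the module's defaults.
my @positive_words = ('news', 'download');
my @negative_words = ('logout');

for my $url ('http://www.example.com/news/download',
             'http://www.example.com/logout') {
    printf "%-40s => %d\n", $url,
        score_link($url, \@positive_words, \@negative_words);
}
```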

METHODS IMPLEMENTED

The following functions have been implemented by the Browser driver. Many of these methods are implementations of the parent Driver interface.

As such, the following code descriptions pertain to this particular Driver implementation. For further information about the generic Driver interface, see the HoneyClient::Agent::Driver documentation.

HoneyClient::Agent::Driver::Browser->new($param => $value, ...)

Creates a new Browser driver object, which contains a hashtable containing any of the supplied "param => value" arguments.

Inputs: $param is an optional parameter variable. $value is $param's corresponding value.

Note: If any $param(s) are supplied, then an equal number of corresponding $value(s) must also be specified.

Output: The instantiated Browser driver $object, fully initialized.

$object->drive(url => $url)

Drives an instance of the browser for one iteration, navigating to the next URL and updating the driver's corresponding internal hashtables accordingly.

For a description of which hashtable is consulted upon each iteration of drive(), see the next_link_to_visit documentation, in the "DEFAULT PARAMETER LIST" section.

Once a drive() iteration has completed, the corresponding browser process is terminated. Thus, each call to drive() invokes a new instance of the browser.

Inputs: $url is an optional argument, specifying the next immediate URL the browser must drive to.

Output: The updated Browser driver $object, containing state information from driving the browser for one iteration.

Warning: This method will croak if the Browser driver object is unable to navigate to a new link, because its list of links to visit is empty and no new URL was supplied.

$object->next()

Returns the next set of server hostnames and/or IP addresses that the browser will contact, upon the next subsequent call to the $object's drive() method.

Specifically, the returned data is a reference to a hashtable, containing detailed information about which resources, hostnames, IPs, protocols, and ports the browser will contact upon the next drive() iteration.

Here is an example of such returned data:

  $hashref = {

      # The set of servers that the driver will contact upon
      # the next drive() operation.
      targets => {
          # The application will contact 'site.com' using
          # TCP ports 80 and 81.
          'site.com' => {
              'tcp' => [ 80, 81 ],
          },

          # The application will contact '192.168.1.1' using
          # UDP ports 53 and 123.
          '192.168.1.1' => {
              'udp' => [ 53, 123 ],
          },

          # Or, more generically:
          'hostname_or_IP' => {
              'protocol_type' => [ portnumbers_as_list ],
          },
      },

      # The set of resources that the driver will operate upon
      # the next drive() operation.
      resources => {
          'http://www.mitre.org/' => 1,
      },
  };

Note: For this implementation of the Driver interface, unless getNextLink() returns undef, the returned hashtable from this method will always contain only one hostname or IP address. Within this single entry, the protocol type is always guaranteed to be TCP, specifying a single port.

Output: The aforementioned $hashref containing the next set of resources that the back-end application will attempt to contact upon the next drive() iteration. Returns undef values for both 'targets' and 'resources' keys, if getNextLink() also returns undef.
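For a single URL, a structure of the shape described above could be derived roughly as follows. The helper targets_for_url() is hypothetical, and the sketch assumes http and https URLs only, with default ports 80 and 443 when none is given; the driver's own URL handling may be more thorough.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a next()-style hashref for one URL.
# Assumes http/https only, with default ports 80 and 443.
sub targets_for_url {
    my ($url) = @_;
    my $empty = { targets => undef, resources => undef };
    return $empty unless defined $url;

    # Capture scheme, host, and optional explicit port.
    my ($scheme, $host, $port) =
        $url =~ m{^(https?)://([^/:?#]+)(?::(\d+))?}i
        or return $empty;

    $port = (lc($scheme) eq 'https') ? 443 : 80 unless defined $port;
    my $key = lc($host);

    my $result = {
        targets   => { $key => { tcp => [ $port + 0 ] } },
        resources => { $url => 1 },
    };
    return $result;
}

my $next = targets_for_url('http://www.mitre.org/');
my ($host) = keys %{ $next->{targets} };
print "$host tcp/$next->{targets}->{$host}->{tcp}->[0]\n";
```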

# XXX: Resolve this, per parent Driver description.

$object->isFinished()

Indicates if the Browser driver $object has driven the browser process to all possible links it has found within its hashtables and is unable to navigate the browser further without additional, external input.

Output: True if the Browser driver $object is finished, false otherwise.

Note: Additional links can be fed to this Browser driver at any time, by simply adding new hashtable entries to the links_to_visit hashtable within the $object.

For example, if you wanted to add the URL "http://www.mitre.org" to the Browser driver $object, simply use the following code:

  $object->{links_to_visit}->{'http://www.mitre.org'} = 1;

$object->status()

Returns the current status of the Browser driver $object, as its state exists between subsequent calls to $object->drive().

Specifically, the data returned is a reference to a hashtable, containing specific statistical information about the status of the Browser driver's progress, between iterations of driving the browser process.

The following is an example hashtable, containing all the (key => value) pairs that would exist in the output.

  $hashref = {
      'relative_links_remaining' =>       10, # Number of URLs left to
                                              # process, at a given site.
      'links_remaining'          =>       56, # Number of URLs left to
                                              # process, for all sites.
      'links_processed'          =>       44, # Number of URLs processed.
      'links_total'              =>      100, # Total number of URLs given.
      'percent_complete'         => '44.00%', # Percent complete,
                                              #  (processed / total).
  };

Output: A corresponding $hashref, containing statistical information about the Browser driver's progress, as previously mentioned.

# XXX: Resolve this, per parent Driver description.

BUGS & ASSUMPTIONS

In a nutshell, this object is nothing more than a blessed anonymous reference to a hashtable, where (key => value) pairs are defined in the DEFAULT PARAMETER LIST, as well as fed via the new() function during object initialization. As such, this package does not perform any rigorous data validation prior to accepting any new or overriding (key => value) pairs.

However, additional links can be fed to any Browser driver at any time, by simply adding new hashtable entries to the links_to_visit hashtable within the $object.

For example, if you wanted to add the URL "http://www.mitre.org" to the Browser driver $object, simply use the following code:

  $object->{links_to_visit}->{'http://www.mitre.org'} = 1;

In general, the Browser driver does not know how many links it will ultimately end up browsing to, until it conducts an exhaustive spider of all initial URLs supplied. As such, expect the output of $object->status() to change significantly, upon each $object->drive() iteration.

For example, if at one given point, the status of percent_complete is 30% and then this value drops to 15% upon another iteration, then this means that the total number of links to drive to has greatly increased.
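The drop described above follows directly from the (processed / total) ratio. The sketch below reproduces it with a hypothetical percent_complete() helper; the formatting mirrors the status() example, but the helper itself is an assumption and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: formats processed / total the way the status()
# example shows it, guarding against an empty link set.
sub percent_complete {
    my ($processed, $total) = @_;
    return '0.00%' unless $total;
    return sprintf('%.2f%%', 100 * $processed / $total);
}

print percent_complete(30, 100), "\n";    # 30 of 100 known links processed
print percent_complete(30, 200), "\n";    # 100 new links were discovered
print percent_complete(44, 100), "\n";    # figures from the status() example
```

With the same 30 links processed, doubling the known total from 100 to 200 halves the reported percentage from 30.00% to 15.00%.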

Lastly, we assume the driven browser has been preconfigured to not cache any data. This ensures the browser will render the most recent version of the content hosted at each URL.

SEE ALSO

HoneyClient::Agent::Driver

HoneyClient::Agent::Driver::Browser::IE

HoneyClient::Agent::Driver::Browser::FF

http://www.honeyclient.org/trac

REPORTING BUGS

http://www.honeyclient.org/trac/newticket

AUTHORS

Kathy Wang, <knwang@mitre.org>

Thanh Truong, <ttruong@mitre.org>

Darien Kindlund, <kindlund@mitre.org>

Brad Stephenson, <stephenson@mitre.org>

COPYRIGHT & LICENSE
