NAME

URI::Find - Find URIs in arbitrary text

SYNOPSIS

      require URI::Find;

      my $finder = URI::Find->new(\&callback);

      my $how_many_found = $finder->find(\$text);

DESCRIPTION

This module does one thing: it finds URIs and URLs in plain text. It finds them quickly and it finds them all (or at least everything URI::URL considers a URI). It only finds URIs which include a scheme (http:// or the like); for something a bit less strict, have a look at URI::Find::Schemeless.
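
As a rough illustration of the scheme requirement, the sketch below runs the finder over a short string that contains one URI with a scheme and one bare host name. The sample text and the @found array are made up for this example; the callback returns the original text, so $text is left unchanged.

```perl
use strict;
use warnings;
use URI::Find;

# Hypothetical sample text: one URI with a scheme, one bare host name.
my $text = "See http://www.example.com/ or www.example.org for details.";

my @found;
my $finder = URI::Find->new(sub {
    my($uri, $orig_uri) = @_;
    push @found, $orig_uri;   # remember the text exactly as it was written
    return $orig_uri;         # put the original text back unchanged
});

# find() returns the number of URIs it handed to the callback.
my $how_many_found = $finder->find(\$text);

print "Found $how_many_found URI(s): @found\n";
```

Only the http:// form should be reported here. Swapping in URI::Find::Schemeless, which is built with the same new() and find() calls, is the way to pick up the bare host name as well.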

For a command-line interface, see Darren Chamberlain's "urifind" script. It's available from his CPAN directory, <http://www.cpan.org/authors/id/D/DA/DARREN/>.

EXAMPLES

Store a list of all URIs (normalized) in the document.

      my @uris;
      my $finder = URI::Find->new(sub {
          my($uri) = shift;
          push @uris, $uri;
      });
      $finder->find(\$text);

Print the original URI text found and the normalized representation.

      my $finder = URI::Find->new(sub {
          my($uri, $orig_uri) = @_;
          print "The text '$orig_uri' represents '$uri'\n";
          return $orig_uri;
      });
      $finder->find(\$text);

Check each URI in the document to see if it exists.

      use LWP::Simple;

      my $finder = URI::Find->new(sub {
          my($uri, $orig_uri) = @_;
          if( head $uri ) {
              print "$orig_uri is okay\n";
          }
          else {
              print "$orig_uri cannot be found\n";
          }
          return $orig_uri;
      });
      $finder->find(\$text);

Turn plain text into HTML, with each URI found wrapped in an HTML anchor.

      use CGI qw(escapeHTML);
      use URI::Find;

      my $finder = URI::Find->new(sub {
          my($uri, $orig_uri) = @_;
          return qq|<a href="$uri">$orig_uri</a>|;
      });
      $finder->find(\$text, \&escapeHTML);
      print "<pre>$text</pre>";

AUTHOR

Michael G Schwern <schwern@pobox.com> with insight from Uri Gutman, Greg Bacon, Jeff Pinyan, Roderick Schertler and others.

Roderick Schertler <roderick@argon.org> maintained versions 0.11 to 0.16.

LICENSE

Copyright 2000, 2009 by Michael G Schwern <schwern@pobox.com>.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perlfoundation.org/artistic_license_1_0

SEE ALSO

URI::Find::Schemeless, URI::URL, URI, RFC 3986 Appendix C