Plucene::SearchEngine::Index::URL - File reader for web URLs


Plucene-SearchEngine documentation. Contained in the Plucene-SearchEngine distribution.


NAME

Plucene::SearchEngine::Index::URL - File reader for web URLs

DESCRIPTION

This frontend module takes a URL, downloads its content, extracts its metadata, and passes the file on to a backend. The frontend registers the following Plucene fields:

mimetype

The MIME type of the data.

filename

The basename of the URL's filename.

id

The URL given.

modified

A Plucene date field representing the last modified date of the file.

language

The ISO language identifier of the content.

encoding

The original character set (before conversion to UTF-8).
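To make the field list above concrete, here is a minimal sketch of how such values could be derived from a URL and its HTTP response headers. It is not the module's own code: `fields_for_url` is a hypothetical helper, the headers are assumed to arrive as a hash with lowercase names, the language is assumed to come from a `Content-Language` header (the module may detect it differently), and epoch seconds stand in for the Plucene date field.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper, not part of Plucene::SearchEngine: derive the six
# fields from a URL and a hash of lowercase HTTP response headers.
sub fields_for_url {
    my ($url, $headers) = @_;

    # Split "text/html; charset=ISO-8859-1" into MIME type and charset.
    my $content_type = $headers->{'content-type'} // '';
    my ($mimetype) = $content_type =~ m{^\s*([^;\s]+)};
    my ($encoding) = $content_type =~ m{charset\s*=\s*"?([^";\s]+)}i;

    # Drop scheme, host, query and fragment, then keep the last path segment.
    my ($path) = $url =~ m{^[a-z][a-z0-9+.-]*://[^/?#]*([^?#]*)}i;
    my ($filename) = ($path // '') =~ m{([^/]*)$};

    # Parse an RFC 1123 date such as "Sun, 06 Nov 1994 08:49:37 GMT".
    # Assumption: epoch seconds stand in for the Plucene date field.
    my $modified;
    if (defined(my $stamp = $headers->{'last-modified'})) {
        $modified = eval {
            Time::Piece->strptime($stamp, '%a, %d %b %Y %H:%M:%S GMT')->epoch;
        };
    }

    return {
        mimetype => $mimetype,
        filename => $filename,
        id       => $url,
        modified => $modified,
        language => $headers->{'content-language'},  # assumption: header-based
        encoding => $encoding,
    };
}

# Example input; the URL and header values are made up for illustration.
my $fields = fields_for_url(
    'http://www.example.com/docs/report.html?v=2',
    {
        'content-type'     => 'text/html; charset=ISO-8859-1',
        'last-modified'    => 'Sun, 06 Nov 1994 08:49:37 GMT',
        'content-language' => 'en',
    },
);
printf "%-8s %s\n", $_, $fields->{$_} // '(unknown)' for sort keys %$fields;
```

Any field that cannot be determined is left undefined in this sketch rather than guessed.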

METHODS

    Plucene::SearchEngine::Index::URL->examine($url);

This downloads the content at the given URL and examines it for the above metadata, before handing it to a backend.
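Because `examine` fetches data over the network, a caller may want to guard the call. The following is a sketch only: `examine_url` is a hypothetical wrapper, the URL is a placeholder, and since this section does not describe what `examine` returns, the wrapper simply passes the result through as a list.

```perl
use strict;
use warnings;

# Hypothetical wrapper: call examine() on a URL and return its result,
# or an empty list if the module is missing or the call dies.
sub examine_url {
    my ($url) = @_;

    # Load the frontend lazily so a missing installation is not fatal.
    return unless eval { require Plucene::SearchEngine::Index::URL; 1 };

    # The call shown in METHODS, wrapped so a failed download only warns.
    my @results = eval { Plucene::SearchEngine::Index::URL->examine($url) };
    warn "Could not examine $url: $@" if $@;
    return @results;
}

my @results = examine_url('http://www.example.com/index.html');
printf "examine() returned %d item(s)\n", scalar @results;
```

Returning an empty list on failure keeps a batch run over many URLs going when a single download fails.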

