| AI-Categorizer documentation | view source | Contained in the AI-Categorizer distribution. |
AI::Categorizer::Learner::KNN - K Nearest Neighbour Algorithm For AI::Categorizer
use AI::Categorizer::Learner::KNN;
use AI::Categorizer::Collection::Files;
# Here $k is an AI::Categorizer::KnowledgeSet object
my $knn = AI::Categorizer::Learner::KNN->new(...parameters...);
$knn->train(knowledge_set => $k);
$knn->save_state('filename');
# ... time passes ...
my $l = AI::Categorizer::Learner->restore_state('filename');
my $c = AI::Categorizer::Collection::Files->new( path => ... );
while (my $document = $c->next) {
  my $hypothesis = $l->categorize($document);
  print "Best assigned category: ", $hypothesis->best_category, "\n";
  print "All assigned categories: ", join(', ', $hypothesis->categories), "\n";
}
This is an implementation of the k-Nearest-Neighbor decision-making algorithm, applied to the task of document categorization (as defined by the AI::Categorizer module). See AI::Categorizer for a complete description of the interface.
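As a rough illustration of the technique, and not the module's own code, the following self-contained Perl sketch assumes each document is a hash reference mapping terms to weights, measures closeness with cosine similarity, and scores each category by summing the similarities of those of the k nearest training documents that belong to it. The helpers cosine and knn_scores and the document layout are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Cosine similarity between two sparse term-weight vectors (hashrefs).
sub cosine {
    my ($x, $y) = @_;
    my $dot = sum0 map { $x->{$_} * ($y->{$_} // 0) } keys %$x;
    my $nx  = sqrt(sum0 map { $_ ** 2 } values %$x);
    my $ny  = sqrt(sum0 map { $_ ** 2 } values %$y);
    return ($nx && $ny) ? $dot / ($nx * $ny) : 0;
}

# Hypothetical helper: rank training documents by similarity to $doc,
# keep the $k nearest, and sum their similarities per category.
sub knn_scores {
    my ($doc, $training, $k) = @_;
    my @ranked = sort { $b->[0] <=> $a->[0] }
                 map  { [ cosine($doc, $_->{features}), $_ ] } @$training;
    splice @ranked, $k if @ranked > $k;    # keep only the k nearest
    my %scores;
    for my $pair (@ranked) {
        my ($sim, $train_doc) = @$pair;
        $scores{$_} += $sim for @{ $train_doc->{categories} };
    }
    return \%scores;
}
```

Summing similarities, rather than counting one vote per neighbour, lets closer neighbours count for more; both variants are common in k-NN classifiers.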
This class inherits from the AI::Categorizer::Learner class, so all
of its methods are available unless explicitly mentioned here.
The new() method creates a new KNN Learner and returns it. In addition
to the parameters accepted by the AI::Categorizer::Learner class, the
KNN subclass accepts the following parameters:
The threshold parameter sets the score threshold for category membership. The default is currently 0.1. Set the threshold lower to assign more categories per document, or higher to assign fewer; this can be an effective way to trade off between precision and recall.
Sets the k value (as in k-Nearest-Neighbor) to the given integer.
This indicates how many of each document's nearest neighbors should be
considered when assigning categories. The default is 5.
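To see how k and the threshold interact, the sketch below starts from a hypothetical list of one document's neighbours, already sorted by similarity, keeps the top k, turns each category's summed similarity into a share of the total, and returns the categories whose share is at or above the threshold. The normalization and the assign_categories helper are assumptions for illustration; the module's own scoring may differ.

```perl
use strict;
use warnings;

# $neighbors: arrayref of [category, similarity], most similar first.
# Keeps the top $k, normalizes each category's summed similarity by the
# total, and returns (sorted by name) the categories at or above $threshold.
sub assign_categories {
    my ($neighbors, $k, $threshold) = @_;
    my $n   = $k < @$neighbors ? $k : scalar @$neighbors;
    my @top = @$neighbors[0 .. $n - 1];
    my %score;
    my $total = 0;
    for my $pair (@top) {
        my ($category, $sim) = @$pair;
        $score{$category} += $sim;
        $total += $sim;
    }
    return () unless $total > 0;
    my @assigned = grep { $score{$_} / $total >= $threshold } keys %score;
    return sort @assigned;
}

# Hypothetical neighbours of one document, already sorted by similarity.
my @neighbors = (['sports', 0.9], ['politics', 0.5], ['sports', 0.4], ['politics', 0.3]);
print join(',', assign_categories(\@neighbors, 1, 0.1)), "\n";  # sports
print join(',', assign_categories(\@neighbors, 3, 0.1)), "\n";  # politics,sports
print join(',', assign_categories(\@neighbors, 3, 0.5)), "\n";  # sports
```

With k = 3, lowering the threshold from 0.5 to 0.1 admits a second category, which tends to raise recall at some cost in precision; with k = 1 only the single nearest neighbour decides.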
The threshold() method returns the current threshold value. Called with an optional numeric argument, it sets the threshold to that value.
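A combined getter/setter like this is a common Perl idiom. The sketch below shows the pattern on a hypothetical My::ThresholdHolder class; it is not the module's source, and the default of 0.1 is simply copied from the threshold parameter description.

```perl
use strict;
use warnings;

package My::ThresholdHolder;   # hypothetical class for illustration

sub new {
    my ($class, %args) = @_;
    return bless { threshold => $args{threshold} // 0.1 }, $class;
}

# With no argument, return the threshold; with one, set it and return it.
sub threshold {
    my $self = shift;
    $self->{threshold} = shift if @_;
    return $self->{threshold};
}

package main;

my $learner = My::ThresholdHolder->new;
print $learner->threshold, "\n";   # prints 0.1
$learner->threshold(0.25);
print $learner->threshold, "\n";   # prints 0.25
```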
The train(knowledge_set => $k) method trains the categorizer,
preparing it for later use in categorizing documents. The
knowledge_set parameter must provide an object of the class
AI::Categorizer::KnowledgeSet (or a subclass thereof), populated with
training documents and their categories. See
AI::Categorizer::KnowledgeSet for details on how to create such an
object.
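Because k-NN is a lazy learner, training mostly amounts to storing the labelled examples; the comparison work happens at categorization time. The sketch below assumes one optional preparation step, scaling each training vector to unit length so that cosine similarity later reduces to a plain dot product. It is not taken from the module's source, and train_vectors, dot and the document layout are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Return copies of the training documents with unit-length feature vectors.
sub train_vectors {
    my @docs = @_;
    my @model;
    for my $d (@docs) {
        my $norm = sqrt(sum0 map { $_ ** 2 } values %{ $d->{features} });
        next unless $norm;    # skip documents with no features
        my %unit = map { ($_ => $d->{features}{$_} / $norm) } keys %{ $d->{features} };
        push @model, { features => \%unit, categories => $d->{categories} };
    }
    return \@model;
}

# Dot product of two sparse vectors; for unit vectors this equals the cosine.
sub dot {
    my ($x, $y) = @_;
    return sum0 map { $x->{$_} * ($y->{$_} // 0) } keys %$x;
}
```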
The categorize($document) method returns an AI::Categorizer::Hypothesis
object representing the categorizer's "best guess" about which
categories the given document should be assigned to. See
AI::Categorizer::Hypothesis for more details on how to use this object.
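The usage example at the top of this page calls best_category and categories on the returned hypothesis. The toy class below only imitates those two calls, to show what they mean in terms of per-category scores and a threshold. It is a hypothetical stand-in, not AI::Categorizer::Hypothesis; in particular, treating scores at or above the threshold as assigned, and ordering them best first, are assumptions.

```perl
use strict;
use warnings;

package Toy::Hypothesis;   # hypothetical stand-in, not AI::Categorizer::Hypothesis

sub new {
    my ($class, %args) = @_;   # expects scores => {cat => score}, threshold => num
    return bless { %args }, $class;
}

# The single highest-scoring category (undef if there are no scores).
sub best_category {
    my $self = shift;
    my $s = $self->{scores};
    my ($best) = sort { $s->{$b} <=> $s->{$a} || $a cmp $b } keys %$s;
    return $best;
}

# All categories scoring at or above the threshold, best first.
sub categories {
    my $self = shift;
    my $s = $self->{scores};
    return sort { $s->{$b} <=> $s->{$a} || $a cmp $b }
           grep { $s->{$_} >= $self->{threshold} } keys %$s;
}

package main;
```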
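The save_state($path) method saves the categorizer to the given file
for later use; it can be reloaded with
AI::Categorizer::Learner->restore_state($path). This method is
inherited from AI::Categorizer::Storable.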
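Saving and restoring state amounts to serializing the trained object to disk. The sketch below illustrates that general mechanism with Perl's core Storable module (nstore and retrieve) on a hypothetical hash standing in for a trained learner; it makes no claim about what AI::Categorizer::Storable actually writes.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# A hypothetical trained model: settings plus stored training vectors.
my $model = {
    k         => 5,
    threshold => 0.1,
    training  => [ { features => { ball => 1 }, categories => ['sports'] } ],
};

my ($fh, $path) = tempfile(UNLINK => 1);   # temporary file, removed at exit
close $fh;
nstore($model, $path);                     # save in network byte order
my $restored = retrieve($path);            # ... later: load it back
print "k = $restored->{k}, threshold = $restored->{threshold}\n";
# prints "k = 5, threshold = 0.1"
```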
Originally written by David Bell (<dave@student.usyd.edu.au>),
October 2002.
Added to AI::Categorizer November 2002, modified, and maintained by
Ken Williams (<ken@mathforum.org>).
Copyright 2000-2003 Ken Williams. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
AI::Categorizer(3)
"A re-examination of text categorization methods" by Yiming Yang http://www.cs.cmu.edu/~yiming/publications.html