Parse::MediaWikiDump::Links - Object capable of processing link dump files


Parse-MediaWikiDump documentation Contained in the Parse-MediaWikiDump distribution.

NAME

Parse::MediaWikiDump::Links - Object capable of processing link dump files

ABOUT

This object is used to access the content of the SQL-based page links dump files by providing an iterative interface for extracting the individual links between articles. Each object returned is an instance of Parse::MediaWikiDump::link.
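
To make the extraction step concrete, here is a minimal sketch of how link tuples can be pulled out of one line of SQL text. Only the regular expression is taken from parse_more in the source listing further down; the helper name extract_tuples and the sample INSERT line are made up for illustration and are not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::MediaWikiDump: collect every
# (from, namespace, 'title') tuple found in one line of SQL text, using
# the same pattern that parse_more applies in the source listing.
sub extract_tuples {
    my ($line) = @_;
    my @tuples;

    while ($line =~ m/\((\d+),(-?\d+),'(.*?)'\)[;,]/g) {
        push @tuples, [$1, $2, $3];
    }

    return @tuples;
}

# Made-up sample data in the shape of a pagelinks INSERT statement.
my $sample = q{INSERT INTO `pagelinks` VALUES (1,0,'Main_Page'),(2,14,'Perl'),(3,0,'O\'Brien');};

my @tuples = extract_tuples($sample);

for my $t (@tuples) {
    print "from $t->[0] to $t->[1]:$t->[2]\n";
}
```

In this sketch the third title is captured as O\'Brien, with the SQL backslash escape still in place. That is one visible limit of a pattern-based approach, and it is in line with the source comment that recommends the SQL parser in MediaWiki::DumpFile::Compat.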

SYNOPSIS

  $pmwd = Parse::MediaWikiDump->new;
  $links = $pmwd->links('pagelinks.sql');
  $links = $pmwd->links(\*FILEHANDLE);

  #print the links between articles 
  while(defined($link = $links->next)) {
    print 'from ', $link->from, ' to ', $link->namespace, ':', $link->to, "\n";
  }
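
The two calls to links above differ only in what is passed as the source. In the source listing further down, open treats the argument as an already-open handle only when ref($source) is exactly 'GLOB'; anything else is used as a file name. The sketch below is not part of the module. It checks which common kinds of handle pass that test, and the helper name looks_like_handle is hypothetical.

```perl
use strict;
use warnings;
use IO::File;

# Hypothetical helper mirroring the test in open(): true when the module
# would use the argument as a handle rather than as a file name.
sub looks_like_handle {
    my ($source) = @_;
    return ref($source) eq 'GLOB' ? 1 : 0;
}

my $data = "(1,0,'Main_Page');\n";

# A lexical handle opened on an in-memory string.
open(my $lexical, '<', \$data) or die "could not open in-memory handle: $!";

# An IO::File object is a blessed glob reference, so ref() reports its class.
my $object = IO::File->new;

my %result = (
    'bareword glob ref' => looks_like_handle(\*STDIN),
    'lexical handle'    => looks_like_handle($lexical),
    'IO::File object'   => looks_like_handle($object),
    'file name'         => looks_like_handle('pagelinks.sql'),
);

for my $kind (sort keys %result) {
    print "$kind: ", ($result{$kind} ? 'handle' : 'file name'), "\n";
}
```

Under that test an IO::File object would be taken for a file name, so with the code as listed it seems safer to pass a plain glob reference or a lexical handle.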

STATUS

This software is being RETIRED. MediaWiki::DumpFile is the official successor to Parse::MediaWikiDump and includes a compatibility library, MediaWiki::DumpFile::Compat, that is 100% API compatible and is a near-perfect stand-in for this module. It is faster in all instances where it counts and is actively maintained. Any undocumented deviation of MediaWiki::DumpFile::Compat from Parse::MediaWikiDump is considered a bug and will be fixed.

METHODS

new

Create a new instance of a page links dump file parser.

next

Return the next available Parse::MediaWikiDump::link object, or undef if there is no more data left.
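
The contract of next (one item per call, then undef) is what lets the while(defined(...)) loop in the SYNOPSIS terminate. The following is a minimal sketch of a pull iterator with the same contract, modeled on the buffer-and-refill logic in the source listing further down. The package name Sketch::TupleIterator and its input lines are hypothetical, and it reads from an in-memory list rather than a dump file.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real class: it iterates over an in-memory
# list of lines instead of a dump file.
package Sketch::TupleIterator;

sub new {
    my ($class, @lines) = @_;
    return bless { LINES => [@lines], BUFFER => [] }, $class;
}

# Same contract as the documented next(): one item per call, undef at the end.
sub next {
    my ($self) = @_;

    while (1) {
        my $item = pop @{ $self->{BUFFER} };
        return $item if defined $item;

        # Buffer is empty: refill from the next line, or stop at end of input.
        my $line = shift @{ $self->{LINES} };
        return undef unless defined $line;

        while ($line =~ m/\((\d+),(-?\d+),'(.*?)'\)[;,]/g) {
            push @{ $self->{BUFFER} }, [$1, $2, $3];
        }
    }
}

package main;

my $iter = Sketch::TupleIterator->new(
    q{(1,0,'A'),(2,0,'B');},
    q{-- a line with no tuples},
    q{(3,0,'C');},
);

my @seen;
while (defined(my $t = $iter->next)) {
    push @seen, $t->[2];
}

print join(',', @seen), "\n";
```

Because the buffer is drained with pop, as in the listed source, tuples parsed from the same line come back last-first: this sketch prints B,A,C. Code that depends on dump order within a line should take that into account.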

EXAMPLE


package Parse::MediaWikiDump::Links;

#this needs to be fully replaced by MediaWiki::DumpFile::Compat
#because it uses a much more correct SQL parser

our $VERSION = '1.0.6';

use strict;
use warnings;

sub new {
	my ($class, $source) = @_;
	my $self = {};
	$$self{BUFFER} = [];

	bless($self, $class);

	$self->open($source);
	#fix for bug 58196 
	#$self->init;

	return $self;
}

sub next {
	my ($self) = @_;
	my $buffer = $$self{BUFFER};
	my $link;

	while(1) {
		if (defined($link = pop(@$buffer))) {
			last;
		}

		#signals end of input
		return undef unless $self->parse_more;
	}

	return Parse::MediaWikiDump::link->new($link);
}

#private functions with OO interface
sub parse_more {
	my ($self) = @_;
	my $source = $$self{SOURCE};
	my $need_data = 1;
	
	while($need_data) {
		my $line = <$source>;

		last unless defined($line);

		while($line =~ m/\((\d+),(-?\d+),'(.*?)'\)[;,]/g) {
			push(@{$$self{BUFFER}}, [$1, $2, $3]);
			$need_data = 0;
		}
	}

	#if we still need data and we are here it means we ran out of input
	if ($need_data) {
		return 0;
	}
	
	return 1;
}

sub open {
	my ($self, $source) = @_;

	if (ref($source) ne 'GLOB') {
		#three-argument open so the file name is never parsed for a mode
		die "could not open $source: $!" unless
			open($$self{SOURCE}, '<', $source);
	} else {
		$$self{SOURCE} = $source;
	}

	binmode($$self{SOURCE}, ':utf8');

	return 1;
}

sub init {
	my ($self) = @_;
	my $source = $$self{SOURCE};
	my $found = 0;
	
	while(<$source>) {
		if (m/^LOCK TABLES `pagelinks` WRITE;/) {
			$found = 1;
			last;
		}
	}

	die "not a MediaWiki link dump file" unless $found;
}

#deprecated backwards compatibility methods

#replaced by next()
sub link {
	my ($self) = @_;
	$self->next(@_);
}


1;
__END__