WWW::2ch - scraper for 2ch, a popular Japanese BBS.


WWW-2ch documentation. Contained in the WWW-2ch distribution.

NAME

WWW::2ch - scraper for 2ch, a popular Japanese BBS.

SYNOPSIS

  use WWW::2ch;

  my $bbs = WWW::2ch->new(url => 'http://live19.2ch.net/ogame/',
                          cache => '/tmp/www2ch-cache');
  $bbs->load_setting;
  $bbs->load_subject;
  foreach my $dat ($bbs->subject->threads) {
      $dat->load;
      my $one = $dat->res(1);
      print $dat->title . "\n";
      print '>>1: ' . $one->body;
      foreach my $res ($dat->reslist) {
          print $res->resid . ':' . $res->date . "\n";
          print $res->body_text . "\n";
      }
      last;
  }




  my $bbs = WWW::2ch->new(url => 'http://live19.2ch.net/test/read.cgi/ogame/1140947283/l50',
                          cache => '/tmp/www2ch-cache');
  my $dat = $bbs->subject->thread('1140947283');
  $dat->load;




  # retrieve a dat that is already in the cache
  my $bbs = WWW::2ch->new(url => 'http://live19.2ch.net/ogame/',
                          cache => '/home/ko/cpan/my/WWW-2ch/cache');
  my $dat = $bbs->recall_dat('1141300600');




  # parse a dat read from a file
  my $bbs = WWW::2ch->new(url => 'http://live19.2ch.net/ogame/',
                          cache => '/home/ko/cpan/my/WWW-2ch/cache');
  open my $fh, '<', 'test.dat' or die "cannot open test.dat: $!";
  my $data = do { local $/; <$fh> };  # slurp the whole file
  close($fh);
  my $dat = $bbs->parse_dat($data);

  # returns the raw article data
  $dat->dat;

  # load a plugin by name
  my $bbs = WWW::2ch->new(url => 'http://example.jp/test/read.cgi/ogame/1140947283/l50',
                          cache => '/tmp/www2ch-cache',
                          plugin => 'ExampleJp');

  # load a plugin from a file
  my $bbs = WWW::2ch->new(url => 'http://example.com/test/read.cgi/ogame/1140947283/l50',
                          cache => '/tmp/www2ch-cache',
                          plugin => '/usr/local/www-2ch/lib/ExampleCom.pm');




DESCRIPTION

WWW::2ch scrapes 2ch, a popular Japanese BBS.

Other BBSes, news sites, and other kinds of sites can also be scraped by adding a plugin.

Please avoid excessive access: throttle your requests so that you do not trip the server's flood control.
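One simple way to keep the request rate down is to pause between consecutive loads. The following is a minimal sketch, not part of WWW::2ch: `each_throttled` is a hypothetical helper, and the delay value is an assumption that should be tuned to the target site.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Hypothetical helper (not provided by WWW::2ch): run $callback on each
# item, sleeping $delay seconds between consecutive calls.
sub each_throttled {
    my ($delay, $callback, @items) = @_;
    my @results;
    for my $i (0 .. $#items) {
        sleep $delay if $i > 0 && $delay > 0;   # no pause before the first item
        push @results, $callback->($items[$i]);
    }
    return @results;
}
```

With the objects from the SYNOPSIS, a call might look like `each_throttled(2, sub { $_[0]->load }, $bbs->subject->threads)`, which loads each thread with a two-second gap.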

METHODS

new(%option)

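options

* url

URL of the board's top page. A thread URL (test/read.cgi/...) is also accepted, as shown in the SYNOPSIS.

* cache

Cache directory path, or a Cache module object.

* plugin

Plugin name (default: Base), or the path to a plugin file.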
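The SYNOPSIS passes two URL shapes to new(): a board top page and a read.cgi thread URL. How WWW::2ch itself interprets them is not documented here; the sketch below only illustrates how the two shapes differ. `split_2ch_url` and its hash keys are hypothetical names chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::2ch): split a URL into host,
# board name, and thread key (undef for a board top page).
sub split_2ch_url {
    my ($url) = @_;

    # Thread URL: http://HOST/test/read.cgi/BOARD/KEY/...
    if ($url =~ m{^(https?://[^/]+)/test/read\.cgi/([^/]+)/(\d+)}) {
        return { host => $1, board => $2, key => $3 };
    }

    # Board top page: http://HOST/BOARD/
    if ($url =~ m{^(https?://[^/]+)/([^/]+)/?$}) {
        return { host => $1, board => $2, key => undef };
    }

    return;    # unrecognized shape
}
```

The thread key extracted this way is the same kind of value that the SYNOPSIS passes to `$bbs->subject->thread(...)` and `$bbs->recall_dat(...)`.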

encoding

Name of the character encoding used by the plugin.

load_setting

Reads the board settings.

load_subject

Reads the article (thread) list, which is then available through $bbs->subject.

parse_dat($data[, $subject])

Parses $data (raw dat text) and returns a dat object.
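To give a feel for what the raw dat text looks like, here is a minimal sketch that splits it into responses. It assumes the commonly used 2ch layout of one response per line with five `<>`-separated fields (name, mail, date, body, title, the title being filled only on the first line). It is an illustration only, not the parser WWW::2ch uses, and `parse_dat_lines` is a hypothetical name.

```perl
use strict;
use warnings;

# Illustrative only: turn raw dat text into a list of hashes, one per
# response. Assumes each line is "name<>mail<>date<>body<>title".
sub parse_dat_lines {
    my ($data) = @_;
    my @responses;
    my $resid = 0;
    for my $line (split /\n/, $data) {
        next unless length $line;
        # The limit of 5 keeps a trailing empty title field.
        my ($name, $mail, $date, $body, $title) = split /<>/, $line, 5;
        push @responses, {
            resid => ++$resid,
            name  => $name  // '',
            mail  => $mail  // '',
            date  => $date  // '',
            body  => $body  // '',
            title => $title // '',
        };
    }
    return @responses;
}
```

In real use the input would be the contents of a dat file, read as in the SYNOPSIS; character decoding is left out of this sketch.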

recall_dat($key)

Restores the dat identified by $key from the cache.

SEE ALSO

http://2ch.net/, http://www.monazilla.org/, WWW::2ch::Subject, WWW::2ch::Dat, WWW::2ch::Res

AUTHOR

Kazuhiro Osawa <ko@yappo.ne.jp>

COPYRIGHT AND LICENSE
