combine (4.003) karmic; urgency=low
- Added extra space when concatenating extracted text in HTMLExtractor.pm
- Bugfix certain special characters in regexp when optimizing meta/title
in generated XML
- Added extra field, score, in table urldb to support new URL
scheduling algoritms
- Added support for new URL scheduling algorithms
-- Anders Ardo <anders@nike> Mon, 15 Jun 2009 10:22:38 +0200
combine (4.002) intrepid; urgency=low
- Fixed tests 05MySQL och 90Installationtest
- Added support for exceptions to GeoIP, config-file server2country
- Now handles special characters when using files for external parsers
- Disabled warnings for unknown config variables
-- Anders Ardo <anders@dbkit10.eit.lth.se> Mon, 30 Mar 2009 09:09:35 +0200
combine (4.001) intrepid; urgency=low
- New version numbering compatible with CPAN
- Integration with Solr enterprise search server (http://lucene.apache.org/solr/)
similar to Zebra: new module Solr.pm, configuration variable SolrHost,
switch SolrIndexing in combineExport
-- Anders Ardo <anders@eit.lth.se> Mon, 8 Dec 2008 14:55:21 +0100
combine (3.12) intrepid; urgency=low
- Added code for simple Lucene integration to templates directory.
Contributed by Xianghang Liu
- Changed documentation HTML-generator to ht4tex
-- Anders Ardo <anders@eit.lth.se> Sun, 16 Nov 2008 18:19:21 +0100
combine (3.11) intrepid; urgency=low
- Added switches 'collapseinlinks' and 'nooutlinks' to combineExport
- Moved tmp-files to /tmp/$$
- decoded output from extconverter to Perl internal utf8
- Added -nodrm to pdf2html switches
- Fixed bug in processing of pure text documents
- Added switch ZebraIndexing to combineExport. Enables updating of the
configured Zebra server with exported records
- Improved indexing of PDF documents
- Handled case when $md5 empty in DeleteKey
- Fixed bug in Zebra recordId handling
-- Anders Ardo <anders@eit.lth.se> Sun, 09 Nov 2008 17:18:10 +0100
combine (3.10) sarge; urgency=low
- Added a fulltext-index in MySQL table search plus configuration var
to enable/disable
- Added Zebra deleteRecord
-- Anders Ardo <anders@eit.lth.se> Sun, 12 Oct 2008 20:56:51 +0200
combine (3.9) sarge; urgency=low
- Changed default for useTidy to 0 (Not using Tidy)
- Added dependency on libclass-factory-util-perl
- Changed test Web-server to combine.it.lth.se
- Added Combine/utilPlugIn.pm Combine/classifySVM.pm - support for SVM
classifiers (depends SVMLight) plus documentation
- Changed *Check_record.pm to use XWI text extraction from
utilPlugIn.pm
- Added country determination (plus dependency on GeoIp)
- Updated extensions of binary files
- Added analysePlugin and relTextPlugin to default config
- Cleaned FromHTML.pm code
- Added call to external relTextPlugin
- Added call to Plugin for extra analysis in utilPlugin.pm
- Improved title extraction in XWI2XML
- Fixed in rules: shell IO redirection
- Added table 'localtags' - for storing local name/value pairs to be
added by analysePlugin
-- Anders Ardo <anders@eit.lth.se> Fri, 26 Sep 2008 15:22:43 +0200
combine (3.8) sarge; urgency=low
- Perl ABSTRACT now taken from bin/combine
- Disable old behaviour to save html as text if extracted text is very
short
- PosCheckRecord now only uses meta fields that maps to dc.subject and
dc.description to match against topic definition
- PosCheckRecord now use also the URLpath for topic checking
- Added new script: combineReClassify, that reclassifies all records
- Enabled gzip HTTP content compression
- Moved <META-tag parsing/extraction to FromHTML
- Disabled some of the more uncertain binary file detcetions in
FromHTML due to charset problems
- Charset detection and decoding now done in LWP
-- Anders Ardo <anders@eit.lth.se> Wed, 23 Apr 2008 11:14:31 +0200
combine (3.7) sarge; urgency=low
- Fixed bug with required but non-existant modifiedDate
- Added fix from Sean Dreilinger to import SaveConfigString in
Combine::Config
- Fixed tests: make sure modfiedtime is set, and take into account new
MD5 calculation
- Now requires v 1.06 od HTML::Tidy; newer versions handle UTF8
differently
- Applied patch from Sean Dreilinger: make Combine work with newer
Config::General (SaveConfigString)
- Added doc/Installationtest.pl as new test t/90Installationtest.t
-- Anders Ardo <anders@eit.lth.se> Thu, 27 Sep 2007 08:52:10 +0200
combine (3.6) sarge; urgency=low
- Bugfixes in DataBase.pm MD5-sum calculation
-- Anders Ardo <anders@it.lth.se> Mon, 26 Mar 2007 10:36:04 +0200
combine (3.5) sarge; urgency=low
- Added bin/combineRank - utility for calculating ranking
values like PageRank
- Changed package numbering convention
- Documentation updates
-- Anders Ardo <anders@it.lth.se> Wed, 13 Dec 2006 16:09:43 +0100
combine (3.4-5) sarge; urgency=low
- Disabled parsing of really short files
- Changed syntax of output redirection: ' >& /dev/null' to
'> /dev/null 2> /dev/null'
- Added test in 50FromHTML.t to avoid using HTML::Tidy when not
available
- Fixed bug with improper SQL esacping in MySQLhdb.pm
- Fixed bug with expire time giving error-messages '... too small'
- Added more tests
- Changed combineINIT to disable Tidy in configuration when HTML::Tidy
not available
- Made ZOOM optional
-- Anders Ardo <anders@it.lth.se> Tue, 5 Dec 2006 12:57:18 +0100
combine (3.4-4) sarge; urgency=low
- New switch 'myconf' for combineINIT to append a local configuration
file to the standard job-specific configuration
- Improved error checking for MySQL connection
- Workaround for mysterious MySQL user access bug
- Moved AutoRecycleLinks from job_default.cfg to default.cfg
-- Anders Ardo <anders@it.lth.se> Wed, 8 Nov 2006 12:33:15 +0100
combine (3.4-3) sarge; urgency=low
- Removed obsolete flag xwi->check_nofollow
- Documentation updates
- Removing obsolete files: DTDs and ValidateWP7
- Added omit-xml-declaration="yes" to combine2dc.xsl
-- Anders Ardo <anders@it.lth.se> Fri, 3 Nov 2006 10:31:14 +0100
combine (3.4-2) sarge; urgency=low
- Added external parsers for Excel, PowerPoint and RTF
- Fixed bug that prevented RobotRules cache to be used
- Moved setting of agent and from for UserAgent into
UA::TruncatingUserAgent
-- Anders Ardo <anders@it.lth.se> Fri, 20 Oct 2006 09:08:08 +0200
combine (3.4-1) sarge; urgency=low
- Added possibility to connect directly to a Zebra server
-- Anders Ardo <anders@it.lth.se> Sat, 14 Oct 2006 12:01:55 +0200
combine (3.3-2) sarge; urgency=low
- Removed the (empty) combine-doc package. Documentation is included
in the package combine.
- Restructured XML export: moved basic XML generation from
combineExport to XWI2XML.pm.
- Implemented possibility to apply XSLT script on exported records
- Added standard export XSLT scripts for alvis and dc
- Now builds and includes doc in the source dist
- Added module (Zebra.pm) and config var for direct connection the
Zebra indexer
- Fixed bug in UTF-8 handling for exported XML
-- Anders Ardo <anders@it.lth.se> Fri, 6 Oct 2006 12:02:45 +0200
combine (3.3-1) sarge; urgency=low
- Implemented plugin modules for document classifications in focused mode
- Added template for classify plugin
-- Anders Ardo <anders@it.lth.se> Sun, 1 Oct 2006 11:23:07 +0200
combine (3.2-2) sarge; urgency=low
- Fixes and updates in PODs
-- Anders Ardo <anders.ardo@it.lth.se> Thu, 27 Sep 2006 15:31:00 +0200
combine (3.2-1) sarge; urgency=low
- Fixes to make it work with MySQL 5
- Now suggests pdftohtml untex
- Disabled 'word-2000' in tidy.cfg since it conflicts with 'bare'
-- Anders Ardo <anders.ardo@it.lth.se> Tue, 22 Aug 2006 10:07:00 +0200
combine (3.1-4) sarge; urgency=low
- Added possibility to use HTTP proxy (conf variable httpProxy)
(thanks to Joost Zuurbier)
- Added conf variable UserAgentFollowRedirects controlling
how redirects are handled (thanks to Joost Zuurbier)
- Changed combineExport to use Alvis::Pipeline directly
- Moved conf-files from libcombine to combine package
- Added conffiles file to debian/
- Improved documentation
-- Anders Ardo <anders.ardo@it.lth.se> Thu, 15 Jun 2006 09:38:00 +0200
combine (3.1-3) sarge; urgency=low
- All binaries now without '-w' switch
- Perl testsuite added in directory t
- New configuration variable WaitIntervalHost
- Some bug fixes
- combineExport supports the action attribute to XML
element documentRecord
-- Anders Ardo <anders.ardo@it.lth.se> Fri, 2 Jun 2006 15:38:00 +0200
combine (3.1-2) sarge; urgency=low
- combineExport now supports enhanced ALVIS XML format
- new configuration var 'AutoRecycleLinks', default=1
enables automatic recycling of new links
- combineINIT takes new parameter --topic <file>
which enables focused mode using the given file
as topicdefinition
-- Anders Ardo <anders.ardo@it.lth.se> Fri, 24 Mar 2006 12:11:00 +0200
combine (3.1-1) sarge; urgency=low
- New SQL doc
- Bumped version to comply with conventions
-- Anders Ardo <anders.ardo@it.lth.se> Wed, 07 Mar 2006 15:37:00 +0200
combine (3.0.0-14) stable; urgency=low
-- Anders Ardo <anders.ardo@it.lth.se> Wed, 01 Mar 2006 10:37:00 +0200
combine (3.0.0-13) stable; urgency=low
-- Anders Ardo <anders.ardo@it.lth.se> Fri, 13 Jan 2006 13:37:00 +0200
combine (3.0.0-12) stable; urgency=low
- Added delete functionality to combineUtil
- Added a few extensions for binary files to config-files
-- Anders Ardo <anders.ardo@it.lth.se> Thu, 12 Jan 2006 10:37:00 +0200
combine (3.0.0-11) stable; urgency=low
- Added combineUtil for database statistic, sanity and cleaning functionality
- Changed Config to use include for additional configuration serveralias and exclude
-- Anders Ardo <anders.ardo@it.lth.se> Tue, 03 Jan 2006 09:50:00 +0200
combine (3.0.0-10) stable; urgency=low
- Added dependecy on libalvis-cannonical
- Changed to use Alvis::Canonical
-- Anders Ardo <anders.ardo@it.lth.se> Tue, 19 Dec 2005 15:24:00 +0200
combine (3.0.0-9) stable; urgency=low
- Removed erronous quoting of '"' in exported XML-records
-- Anders Ardo <anders.ardo@it.lth.se> Tue, 13 Dec 2005 13:38:00 +0200
combine (3.0.0-8) stable; urgency=low
- Added handling of Alvis Pipeline exports
- Added modifications for OAI including incremental exports
-- Anders Ardo <anders.ardo@it.lth.se> Sat, 10 Dec 2005 14:38:00 +0200
combine (3.0.0-7) stable; urgency=low
- UTF-8 fixes
- Updated Tidy.cfg
-- Anders Ardo <anders.ardo@it.lth.se> Tue, 01 Dec 2005 08:58:00 +0200
combine (3.0.0-6) stable; urgency=low
- Updates in documentation
- Added DTD files to conf
- Updated combineExport new switches - recordid, md5, validate
-- Anders Ardo <anders.ardo@it.lth.se> Tue, 08 Nov 2005 14:39:00 +0200
combine (3.0.0-5) stable; urgency=low
- Updates in documentation
- Added some Topic definition files
- combineINIT now creates empty files 'stopwords.txt'
- combineINIT now sets permission on /var/run/combine/xxx
-- Anders Ardo <anders.ardo@it.lth.se> Mon, 07 Nov 2005 16:16:00 +0200
combine (3.0.0-4) stable; urgency=low
-- Anders Ardo <anders.ardo@it.lth.se> Thu, 03 Nov 2005 13:11:00 +0200
combine (3.0.0-3) stable; urgency=low
- Bug fixes in combine, combineCtrl, SD_SQL.pm
- Added combineINIT
-- Anders Ardo <anders.ardo@it.lth.se> Wed, 02 Nov 2005 15:15:35 +0200
combine (3.0.0-2) stable; urgency=low
- Bug fixes in Check_record.pm, Config.pm, FromHTML.pm, PosCheck_record.pm,
combine, combineCtrl, combineExport
-- Anders Ardo <anders.ardo@it.lth.se> Wed, 02 Nov 2005 14:10:35 +0200
combine (3.0.0-1) stable; urgency=low
- Initial Release.
- Code taken from AlvisCombine cvs project
-- Anders Ardo <anders.ardo@it.lth.se> Mon, 25 Apr 2005 14:10:35 +0200