Revision history for Hailo

0.69 2011-05-07 04:02:38

0.68 2011-05-03 13:16:05

0.67 2011-04-27 23:33:08

0.66 2011-04-27 07:37:45

0.65 2011-04-26 19:28:27

0.64 2010-12-10 11:09:08

0.63 2010-12-09 09:03:30

0.62 2010-12-06 03:30:07

0.61 2010-12-03 06:47:22

0.60 2010-11-09 01:35:49

0.59 2010-10-23 21:20:22

0.58 2010-10-22 03:34:08

0.57 2010-10-21 01:25:09

0.56 2010-10-18 05:15:10

0.55 2010-10-16 17:58:00

0.54 2010-10-16 10:10:19

0.53 2010-10-15 21:29:02

0.52 2010-07-18 22:40:02

0.51 2010-07-18 15:49:41

0.50 2010-05-30 12:44:25

0.49 2010-05-29 19:20:26

0.48 2010-05-29 15:16:18

0.47 2010-05-29 13:08:51

          hailo --brain db.sqlite --tokenizer Chars --train file.trn
          hailo --brain db.sqlite --reply foo

      I.e. Hailo will record in the database that it used the Chars
      tokenizer, and load the correct tokenizer in the future. However,
      this will cause Hailo to die:

          hailo --brain db.sqlite --tokenizer Chars --train file.trn
          hailo --brain db.sqlite --tokenizer Words --reply foo

      Hailo spots that the tokenizer you explicitly requested for
      replies is incompatible with the one recorded in the database,
      and dies. This is what it already did if you did the exact same
      thing with the --order switch.
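
      The behaviour described above can be sketched roughly as follows.
      This is an illustrative Python sketch, not Hailo's actual (Perl)
      code: the names resolve_tokenizer and TokenizerMismatch, and the
      "Words" default, are assumptions made up for the example.

```python
class TokenizerMismatch(Exception):
    """Raised when an explicitly requested tokenizer conflicts with the stored one."""


def resolve_tokenizer(stored, requested, default="Words"):
    """Pick the tokenizer to use for a brain.

    stored    -- tokenizer name recorded in the brain, or None for a new brain
    requested -- tokenizer name given explicitly by the user, or None
    default   -- hypothetical fallback when nothing is stored or requested
    """
    if stored is None:
        # New brain: honour the explicit request, else fall back to the default.
        return requested if requested is not None else default
    if requested is not None and requested != stored:
        # Explicit request conflicts with what the brain was trained with.
        raise TokenizerMismatch(
            f"brain was trained with {stored!r}, but {requested!r} was requested"
        )
    # No explicit request, or a matching one: reuse the stored tokenizer.
    return stored


if __name__ == "__main__":
    stored = resolve_tokenizer(None, "Chars")   # training run records "Chars"
    print(resolve_tokenizer(stored, None))      # later reply run reuses it
    try:
        resolve_tokenizer(stored, "Words")      # conflicting explicit request
    except TokenizerMismatch as err:
        print("fatal:", err)
```

      The same shape of check would apply to any other setting that is
      stored with the brain, such as the order.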

0.46 2010-05-27 22:47:45

0.45 2010-05-27 19:56:31

0.44 2010-05-27 15:55:30

        This improves performance a lot on input that contains URIs.
        Previously Hailo would split them up nonsensically, which
        would inflate the token table a lot with little gain.
  
      - Preserve the capitalization of words that change case in the
        middle of the word. Examples include GumbyBRAIN, WoW, HoRRiBlE
        etc. Previously these and others that weren't 100% upper-case
        would all be lower cased.
  
      - Preserve the capitalization of words that are all upper-case
        followed by a non-word character followed by lower-case. This
        preserves words like KIA'd, FYIQ'ed and other things that are
        likely to be partial acronyms.
  
      - Tokenize Twitter names as-is, i.e. tokens matching
        @[A-Za-z0-9_]+ are left untouched. This ensures that Hailo
        users like Bot::Twatterhose don't corrupt their Twitter names.
  
      - Eliminate some redundant use of the regex engine in the Words
        tokenizer.
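
      The capitalization rules listed above can be approximated with a
      few regular expressions. This is a minimal Python sketch, not the
      Words tokenizer's real implementation: the function name
      normalize_case and the exact patterns (ASCII only, "mixed case"
      taken to mean a lower-case letter followed by an upper-case one)
      are assumptions chosen to match the examples in this entry.

```python
import re

# Twitter names are kept exactly as written (pattern taken from the entry above).
TWITTER_NAME = re.compile(r"@[A-Za-z0-9_]+")
# Assumed definition of "changes case in the middle": lower-case then upper-case.
MIXED_CASE = re.compile(r"[a-z][A-Z]")
# All upper-case, then non-word character(s), then lower-case (e.g. KIA'd).
UPPER_NONWORD_LOWER = re.compile(r"[A-Z]{2,}[^A-Za-z0-9_]+[a-z]")


def normalize_case(token):
    """Return the token with its case preserved or lowered per the rules above."""
    if TWITTER_NAME.fullmatch(token):
        return token                  # Twitter name: keep as-is
    if token.isupper():
        return token                  # 100% upper-case: keep as-is
    if MIXED_CASE.search(token):
        return token                  # GumbyBRAIN, WoW, HoRRiBlE
    if UPPER_NONWORD_LOWER.match(token):
        return token                  # KIA'd, FYIQ'ed
    return token.lower()              # everything else is lower-cased


if __name__ == "__main__":
    for word in ["GumbyBRAIN", "WoW", "KIA'd", "@Some_User", "Hello"]:
        print(word, "->", normalize_case(word))
```

      Only the last word in the usage loop is lower-cased; the others
      fall under one of the preservation rules.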

0.43 2010-05-11 19:54:36

0.42 2010-05-10 21:26:45

0.41 2010-04-23 00:24:24

0.40 2010-04-13 15:10:23

0.39 2010-04-09 13:21:22

0.38 2010-04-03 18:15:17

0.37 2010-03-31 14:28:46

0.36 2010-03-29 00:15:35

0.35 2010-03-27 21:27:33

0.34 2010-03-20 23:26:27

0.33 2010-03-20 01:57:33

                         s/iter System Hailo    lib Hailo
            System Hailo   74.8           --          -7%
            lib Hailo      69.4           8%           --
    
      Furthermore, replace the use of ->fetchall_hashref in a tight
      loop with ->fetchall_arrayref. This sped up mass replies by
      almost 60% (added to the 8% above):

                         s/iter System Hailo    lib Hailo
            System Hailo   68.2           --         -36%
            lib Hailo      43.6          57%           --
    
      Selective benchmarking aside, this made Hailo around 5% faster
      in the common case:
        
                         s/iter System Hailo    lib Hailo
            System Hailo   21.5           --          -6%
            lib Hailo      20.3           6%           --
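
      For readers unfamiliar with this table layout, the percentages
      can be recomputed from the s/iter column. The Python sketch below
      assumes the tables follow the usual convention of Perl's
      Benchmark cmpthese output, where each cell compares the row's
      rate to the column's rate; the function name percent_faster is
      made up for the example.

```python
def percent_faster(row_s_per_iter, col_s_per_iter):
    """Percent by which the row entry's rate exceeds the column entry's rate."""
    row_rate = 1.0 / row_s_per_iter   # iterations per second for the row entry
    col_rate = 1.0 / col_s_per_iter   # iterations per second for the column entry
    return 100.0 * (row_rate / col_rate - 1.0)


if __name__ == "__main__":
    # First table: lib Hailo (69.4 s/iter) against System Hailo (74.8 s/iter).
    print(round(percent_faster(69.4, 74.8)))   # 8
    print(round(percent_faster(74.8, 69.4)))   # -7
    # Last table: lib Hailo (20.3 s/iter) against System Hailo (21.5 s/iter).
    print(round(percent_faster(20.3, 21.5)))   # 6
```

      Because the s/iter values shown are themselves rounded,
      recomputing a cell this way can differ from the printed figure
      by a percentage point.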

0.32 2010-03-19 12:00:22

0.31 2010-03-18 21:45:25

                      s/iter   0.30 Hailo    0.31 Hailo
        0.30 Hailo      20.2           --          -16%
        0.31 Hailo      16.9          19%            --

0.30 2010-03-15 15:18:01

0.29 2010-03-13 10:32:43

0.28 2010-03-13 10:05:57

0.27 2010-03-13 09:41:46

0.26 2010-03-13 08:04:32

0.25 2010-03-12 17:45:42

0.24 2010-03-12 01:38:56

0.23 2010-03-11 20:08:27

0.22 2010-03-10 08:46:54

0.21 2010-03-09 18:25:46

0.20 Sun Feb 28 00:29:32 GMT 2010

0.19 Sat Feb 27 04:23:03 GMT 2010

0.18 Fri Feb 26 05:02:17 GMT 2010

0.17 Tue Feb 23 04:06:50 GMT 2010

0.16 Mon Feb 22 17:08:46 GMT 2010

0.15 Thu Feb 18 23:55:19 GMT 2010

0.14 Sat Feb 13 17:07:30 GMT 2010

0.13 Sat Feb 13 09:19:52 GMT 2010

0.12 Sat Feb 13 08:55:25 GMT 2010

0.11 Fri Feb 12 09:44:13 GMT 2010

0.10 Fri Feb 12 02:31:34 GMT 2010

0.09 Thu Feb 11 02:36:49 GMT 2010

0.08 Wed Feb 10 00:06:20 GMT 2010

0.07 Tue Feb 9 15:23:44 GMT 2010

0.06 Sat Jan 30 19:21:28 GMT 2010

0.05 Sat Jan 30 13:55:18 GMT 2010

0.04 Fri Jan 29 17:48:49 GMT 2010

0.03 Fri Jan 29 14:37:17 GMT 2010

0.02 Fri Jan 29 03:54:32 GMT 2010

0.01 Fri Jan 29 00:39:54 GMT 2010