NAME

DBD::PgLite - PostgreSQL emulation mode for SQLite

SUMMARY

      use DBI;
      my $dbh = DBI->connect('dbi:PgLite:dbname=file');
      # The following PostgreSQL-flavoured SQL is invalid 
      # in SQLite directly, but works using PgLite
      my $sql = q[
        SELECT
          news_id, title, cat_id, cat_name, sc_id sc_name,
          to_char(news_created,'FMDD.FMMM.YYYY') AS ndate
        FROM
          news
          NATURAL JOIN x_news_cat
          NATURAL JOIN cat
          NATURAL JOIN subcat
        WHERE
          news_active = TRUE
          AND news_created > NOW() - INTERVAL '7 days'
      ];
      my $res = $dbh->selectall_arrayref($sql,{Columns=>{}});
      # From v. 0.05 with full sequence function support
      my $get_nid = "SELECT NEXTVAL('news_news_id_seq')";
      my $news_id = $dbh->selectrow_array($get_nid);

DESCRIPTION

The module automatically and transparently transforms a broad range of SQL statements typical of PostgreSQL into a form suitable for use in SQLite. This involves both (a) parsing and filtering of the SQL; and (b) the addition of several PostgreSQL-compatible functions to SQLite.

Mainly because of datatype issues, support for many PostgreSQL features simply cannot be provided without elaborate planning and detailed metadata. Since this module is intended to be usable with any SQLite3 database, it follows that the emulation is limited in several respects. An overview of what works and what doesn't is given in the following section on PostgreSQL Compatibility.

DBD::PgLite has support of a sort for stored procedures. This is described in the Extras section below. So are the few database functions defined by this module which are not in PostgreSQL. Finally, the Extras section contains a brief mention of the DBD::PgLite::MirrorPgToSQLite companion module.

If you do not want SQL filtering to be turned on by default for the entire session, you can connect setting the connection attribute FilterSQL to a false value:

      my $dbh = DBI->connect("dbi:PgLite:dbname=$fn",
                             undef, undef, {FilterSQL=>0});

To turn filtering off (or on) for a single statement, you can specify FilterSQL option as a statement attribute, e.g.:

      $dbh->do($sql, {FilterSQL=>0}, @bind);
      my $sth = $dbh->prepare($sql, {FilterSQL=>0});
      $res = $dbh->selectall_arrayref($sql, {FilterSQL=>0}, @bind);

It is possible to specify user-defined pre- and postfiltering routines, both globally (by specifying them as attributes of the database handle) and locally (by specifying them as statement attributes):

      $dbh = DBI->connect("dbi:PgLite:$file",undef,undef,
                          {prefilter=>\&prefilter});
      $res = $dbh->selectall_arrayref($sql,
                                      {postfilter=>\&postfilter},
                                      @bind_values);

The pre-/postfiltering subroutine receives the SQL as parameter and is expected to return the changed SQL.

STATUS OF THE MODULE

This module was initially developed using SQLite 3.0 and PostgreSQL 7.3, but it should be fully compatible with newer versions of both SQLite (3.1 and 3.2 have been tested) and PostgreSQL (8.1 has been tested).

Support for SELECT statements and the WHERE-conditions of DELETE and UPDATE statements is rather good, though still incomplete. The module especially focuses on NATURAL JOIN differences and commonly used, built-in PostgreSQL functions.

Support for inserted/updated values in INSERT and UPDATE statements could use some improvement but is useable for simple things.

There is no support for differences in DDL.

The SQL transformations used are not based on a formal grammar but on applying simple regular expressions. An obvious consequence of this is that they may depend excessively on the author's SQL style. YMMV. (I would however like you to contact me if you come across some SQL statements which you feel should work but that don't).

The development of this module has been driven by personal needs, and so is likely to be even more one-sided than the above description suggests.

POSTGRESQL COMPATIBILITY

In this section, the PostgreSQL functions and operators supported by the module are enumerated.

Regex operators

Math Functions

String Functions
The only string functions which are present natively in SQLite are substr(), lower() and upper(). These have been left alone. Added functions are the following:

Except for convert(), where another input encoding can be specified explicitly, these functions all assume that the strings are in an 8-bit character set, preferably iso-8859-1.

The little-used idiom "substring(string from pattern for escape)" (where 'pattern' is not a POSIX regular expression but a SQL pattern) is not supported. Otherwise support for string functions is pretty complete.

Data Type Formatting Functions
The implementation of these functions is impeded by the sparse type system employed by SQLite. Workarounds are possible, however, so this area will probably be better covered in future.

Date/Time Functions
Again, SQLite's intrinsically bad support for dates and intervals makes this area somewhat hard to cover properly. Function support is as follows; also note the caveats below:

Versions of SQLite 3.1 and later support some of these functions, e.g. current_date. In these versions the built-in will be overridden.

The module makes no distinction between time/timestamp with and without time zone. It is assumed that times and timestamps are either all GMT or all localtime; time zone information is silently discarded. This may change later.

Support for calculations with dates and intervals is still very limited. Basically, what is supported are expressions of the form "expr +/- interval 'descr'" where expr reduces to a timestamp or date value.

If a transaction is started with begin_work(), the time as represented by now() and friends is "frozen" in the same way as in PostgreSQL until commit() or rollback() are called. A transaction started by simply running the SQL statement "BEGIN" does not, however, trigger this behaviour. Nor is the time automatically "unfrozen" when an error occurs during a transaction; you need to catch exceptions and call rollback() manually.

Sequence Manipulation Functions

Aggregate Functions

A Note on Casting
Casting using the construct "::datatype" is not supported in general. However, "::int", "::date" and "::bool" should work as expected. All other casts are silently discarded.

A Note on Booleans
This module assumes that booleans will be stored as numeric values in the SQLite database. SQLite interprets 0 as false and any non-zero numeric value as true. Accordingly, expressions such as "= TRUE" and "= 't'" are simply removed in SELECT and DELETE statements. Likewise, "expr

In INSERT and DELETE statements, TRUE and FALSE (as well as 't'::bool and 'f'::bool - but not 't' and 'f' by themselves) are turned into 1 and 0.

Current_user etc.
The functions current_user(), session_user() and user() - with or without parentheses - all mean the same thing. They return the username of the effective uid.

Other Functions
The main groups of other functions (not supported by this module at all)

are

EXTRAS
Stored Procedures
If the active database file contains a table called pglite_functions, the module assumes that it will have the following structure:

      CREATE TABLE pglite_functions (
        name   TEXT,   -- name  of the function
        argnum INT,    -- number of arguments (-1 means any number)
        type   TEXT,   -- can be 'sql' or 'perl'
        sql    TEXT,   -- the body of the function
        PRIMARY KEY (name, argnum)
      );

In the case of a SQL-type function, it can contain syntax supported through the module (and not directly by SQLite). The numeric arguments ($1-$9) customary in PostgreSQL are supported, so that in many cases simple functions will be directly transferrable from pg_proc in a PostgreSQL database.

An instance of a SQL snippet which would work as a function body both in PostgreSQL and PgLite (e.g. with the function name 'full_price_descr'):

      SELECT TRIM(group_name||': '||price_description) 
        FROM price_group NATURAL JOIN price 
        WHERE price_id = $1

As for perl-type functions, the function body is simply the text of a subroutine. Here is a simple example of a function body for the function 'commify', which takes two arguments: the number to be formatted and the desired number of decimal places:

      sub { 
        my ($num,$dp) = @_;
        my $format = "%.${dp}f";
        $num = scalar reverse(sprintf $format, $num);
        my $rest = $1 if $num =~ s/^(\d+)\.//;
        $num =~ s/(...)/$1,/g;
        $num = "$rest.$num" if $rest;
        return scalar reverse($num);
      }

Non-Pg Functions
matches(), imatches():

        These functions are used behind the scenes to implement support for
        the '~' regex-matching operator and its variants. They take two
        arguments, a string and a regular expression. matches() is case
        sensitive, imatches() isn't.

matches_safe(), imatches_safe():

        These work in the same way as matches() and imatches() except that
        metacharacters are escaped in the regex argument. They are therefore
        in many cases more suitable for user input and other untrusted
        sources.

lower_latin1():

        Depending on platform, lower() and upper() may not transform the
        case of non-ascii characters despite a proper locale being defined
        in the environment. This functions assumes that a Latin-1 locale is
        active and returns a lower-case version of the input given this
        assumption.

localeorder():

        DBD::SQLite does not provide access to defining SQLite collation
        functions. This is a workaround for a specific case where this
        limitation can be an issue. Given a Latin-1 encoded string, it
        returns a string of hex digits which can be ascii-sorted in the
        ordinary way. The resulting row order will be in accordance with the
        currently active locele - but only if the locale is Latin-1 based.
        The sort is case-insensitive.

locale():

        An information function simply returning the name of the current
        locale. The module sets the locale based on the environment
        variables $ENV{LC_COLLATE}, $ENV{LC_ALL}, $ENV{LANG}, and
        $ENV{LC_CTYPE}, in that order. Currently it is not possible to use
        different locales for character type and collation, as far as the
        module is concerned.

DBD::PgLite::MirrorPgToSQLite
The companion module, DBD::PgLite::MirrorPgToSQLite, may be of use in conjunction with this module. It can be used for easily mirroring specific tables from a PostgreSQL database, moving views and (some) functions as well if desired.

CAVEATS

Some functions defined by the module are not suitable for use with UTF-8 data and/or in an UTF-8 locale. (This, however, would be rather easy to change if you're willing to sacrifice proper support for 8-bit locales such as iso-8859-1).

Please do not make the mistake of using this module for an important production system - too much can go wrong. But as a development tool it can be useful, and as a toy it can be fun...

TODO

There is a lot left undone. The next step is probably to handle non-SELECT statements better.

SEE ALSO

DBI, DBD::SQLite, DBD::Pg, DBD::PgLite::MirrorPgToSQLite;

THANKS TO

Johan Vromans, for encouraging me to improve the sequence support.

AUTHOR

Baldur Kristinsson (bk@mbl.is), 2006.

     Copyright (c) 2006 Baldur Kristinsson. All rights reserved.
     This program is free software; you can redistribute it and/or
     modify it under the same terms as Perl itself.

NAME

DBD::PgLite::MirrorPgToSQLite - Mirror tables from PostgreSQL to SQLite

SUMMARY

     use DBD::PgLite::MirrorPgToSQLite qw(pg_to_sqlite);
     pg_to_sqlite(
         sqlite_file => '/var/pg_mirror/news.sqlite',
         pg_dbh      => $dbh,
         schema      => 'news',
         tables      => [ qw(news cat img /^x_news/)],
         views       => [ 'v_newslist' ],
         indexes     => 1,
         verbose     => 1,
         snapshot    => 1,
     );

USAGE

The purpose of this module is to facilitate mirroring of tables from a PostgreSQL dataabse to a SQLite file. The module has only be tested with PostgreSQL 7.3 and SQLite 3.0-3.2. SQLite 2.x will probably not work; as for PostgreSQL, any version after 7.2 is supposed to work. If it doesn't, please let me know.

As seen above, options to the pg_to_sqlite() function (which is exported on request) are passed in as a hash. These options are described below. The default values can be changed by overriding the DBD::PgLite::MirrorPgToSQLite::defaults() subroutine.

Required options
Obviously, the mirroring function needs either a PosgtgreSQL database connection or enough information to be able to connect to the database by itself. It also needs the name of a target SQLite file, and a list of tables to copy between the two databases.

pg_dbh, pg_user, pg_pass, pg_dsn

        If a database handle is specified in pg_dbh, it takes precedence.
        Otherwise we try to connect using pg_dsn, pg_user, and pg_pass
        (which are assigned defaults based on the environment variables
        PGDATABASE, PGUSER and PGPASSWORD, if any of these is present).

tables

        The value of the required tables option should be an arrayref of
        strings or a string containing a comma-separated list of tablenames
        and tablename patterns. A tablename pattern is a string or distinct
        string portion delimited by forward slashes. To clarify: Suppose
        that a database contains the tables news, img, img_group, cat,
        users, comments, news_read_log, x_news_cat, x_news_img, and
        x_img_group; and that we want to mirror news, img, cat, x_news_img
        and x_news_cat, leaving the other tables alone. To achieve this, you
        would set the tables option to any of the following (there are of
        course also other possibilities):

         (1) [qw(news img cat x_news_img x_news_cat)]
         (2) 'news, img, cat, x_news_img, x_news_cat'
         (3) [qw(news /img$/ /cat$/)]
         (4) 'news,/img$/,/cat/'

        The purpose of this seemingly unneccesary flexibility in how the
        table list is specified is to make the functionality of the module
        more easily accessible from the command line.

        Please note that the patterns between the slash delimiters are not
        Perl regular expressions but rather POSIX regular expressions, used
        to query the PostgreSQL system tables directly.

sqlite_file

        This should specify the full path to a SQLite file. While the
        mirroring takes place, the incoming data is not written directly to
        this file, but to a file with the same name except for a '.tmp'
        extension. When the operation has finished, the previous file with
        the name specified (if any) is renamed with a '.bak' extension, and
        the .tmp file is renamed to the requested filename. Unless you use
        the append option, the information previously in the file will be
        totally replaced.

Other options
schema

        This signifies the schema from which the tables on the PostgreSQL
        side are to be fetched. Default: 'public'. Only one schema can be
        specified at a time.

where

        A WHERE-condition appended to the SELECT-statement used to get data
        from the PostgreSQL tables.

views

        A list of views, specified in the same manner as the list of tables
        for the tables option. An attempt is made to define corresponding
        views on the SQLite side (though this functionality is far from
        reliable).

indexes

        A boolean option indicating whether to create indexes for the same
        columns in SQLite as in PostgreSQL. Default: false. (Normally only
        the primary key is created).

functions

        A boolean indicating whether to attempt to create functions on the
        SQLite side corresponding to any SQL language (NOT PL/pgSQL or other
        procedural language) functions in the PostgreSQL database. This is
        for use with DBD::PgLite only, since these functions are put into
        the pglite_functions table. Default: false.

page_limit

        Normally the information from the PostgreSQL tables is read into
        memory in one go and transferred directly to the SQLite file. This
        is, however, obviously not desireable for very large tables. If the
        PostgreSQL system tables report that the page count for the table is
        above the limit specified by page_limit, the table is instead
        transferred row-by-row. Default value: 5000; since each page
        normally is 8K, this represents about 40 MB on disk and perhaps
        70-100 MB of memory usage by the Perl process. For page_limit to
        work, the table must have a primary key.

        NB! Do not set this limit lower than necessary: it is orders of
        magnitude slower than the default "slurp into memory" mode.

append

        If this boolean option is true, then instead of creating a new
        SQLite file, the current contents of the sqlite_file are added to.
        If a table which is being mirrored existed previously in the file,
        it is dropped and recreated, but any tables not being copied from
        PostgreSQL in the current run are left alone. (This is primarily
        useful for mirroring some tables in toto, and others only in part,
        into the same file). Default: false. Incompatible with the
        snapshot option.

snapshot

        If this is true, then the copying from PostgreSQL takes place in
        serialized mode (transaction isolation level serializable), which
        should ensure consistency of relations between tables linked by
        foreign key constraints. Currently, foreign keys are not created on
        the SQLite side, however. Default: false. Incompatible with the
        append option.

cachedir

        The current method for getting information about table structure in
        PostgreSQL is somewhat slow, especially for databases with very many
        tables. To offset this, table definitions are cached in a temporary
        directory so that subsequent mirrorings of the same table will go
        faster. The downside is, of course, that if the table structure
        changes, the cache needs to be cleared manually. The cache directory
        can be specified using this option; the default is
        /tmp/sqlite_mirror_cache (with separate subdirectories for each
        user).

verbose

        If this is true, a few messages will be output to stderr during the
        mirroring process.

TODO

AUTHOR

Baldur Kristinsson (bk@mbl.is), 2004-2006.

     Copyright (c) 2006 Baldur Kristinsson. All rights reserved.
     This program is free software; you can redistribute it and/or
     modify it under the same terms as Perl itself.