NAME
PApp - creating applications for the WorldWideWeb [Vortrag]
PApp - what is it?
PApp (which is simply short for "Perl APPlication") is a basically a collection of perl modules directed at seasoned perl programmers that allows one to create large applications for stateful protocols (like http or wap). It is possible to implement a simple-but-complete content management system in a few hundred lines of papp-code.
About this document/presentation
The presentation I will hold at the linuxworldexpo will explain a lot of technical details not included in this document. The slides for the linuxworldexpo presentation will be available at <http://www.goof.com/pcg/marc/docs.html>, shortly after the linuxworldexpo.
What was the motivation for creating PApp?
The original motivation for writing PApp came when our company (nethype GmbH) started to implement a highly interactive website that required at least limited forms of content management.
Creating large or even medium-sized applications for the Web using CGI is a tedious task. Things like preserving state (a.k.a. session-tracking and/or user-tracking) is a major headache, both for the programmer and the security advisor.
PApp solves these and a lot of other problems we didn't originally envision by providing a generic and easy API.
Features of PApp / Advantages over other solutions
While PApp could be mistaken to be yet another CGI-wrapper, this is not the case. PApp concentrates on providing an application-centric view, rather than a web-page-centric one. This means that there can be one web page per file (as with CGI), but it is also possible to put multiple pages into a single file (papp pages, called modules, are often so short that it makes a lot of sense to group related pages together), or to distribute them between many files. Of course, the good-old include mechanism is still there (although not named "include").
State management / Persistance
At your option (you want this!), PApp can manage persistent variables
for a single session or user. This means that variables almost
automatically stay persistent for the whole session. This marking
mechanism is very generic: State variables (called state keys in PApp
parlance) can be marked as session-dependent (the default),
user-dependent (persistent over session borders, also called preference
items), local to a page or a group of pages or any mixture thereof.
Interesting planned extensions include things like transactions and
transaction-dependent state keys.
An example for a user-dependent state key is the language the user selected last. A typical session-dependent variable is the flag wether the user has authorized herself. A local variable could be data from a multi-page transaction.
Security
A common bug in cgi scripts is passing of sensitive data in so-called
hidden fields of web forms, hidden from the casual user but of course
open to attacks. With PApp this is very unnatural: The state data never
leaves the server, but instead an encrypted (128 bit twofish code)
cookie is used (usually encoded in the URL, not to be mistaken with
the cookies netscape implements). Compromising the server key gives
access to other sessions (similar to a broken caching proxy), but still
makes it impossible to change the data.
In general, the design of PApp makes security the first priority, not only by careful design of the network/server protocol but also by providing easy and standardized methods for common tasks.
Session/User-tracking
Session and user-tracking are done automatically by PApp. The
application can react to session starts if necesary (e.g. by redirecting
the user to a start page when the data on the accessed page is no longer
available or initializing state keys on session start). This is also
possible when new users start a new session, of course. A session is
defined by PApp as a tree with the page that started a session (i.e. one
without or with an invalid session cookie)
User tracking beyond sessions is currently done using the http-cookie mechanism. Care has been taken to do this sensibly, however: The session data from cookies is ignored and a user without (or with disabled) cookies will not be flooded with cookie request more than once or twice a day. This is another example of how PApp can easily adapt to users.
User administration
PApp manages users per-server, not per-application. Applications can use
an access right system similar to the unix user/group mechanism to
aministrate its users, but can also implement their own system. PApp
identifies every user using a unique user id with optional attributes
like name/password/group and preferences.
I18n
I18n is short for Internationalization, which, in the context of PApp
means multiple language support. PApp does this using language
tagging, string scanning and a generic translation editor. The I18n
model of PApp is more general than the widely-known gettext model
implimented by GNU and sun, among others. The target language is chosen
using the users preferred language and protocol-specific data (e.g. the
"Accept-Language"-http-header).
Every source file can use a different language (if necessary; the language format allows finer-grained distribution of languages but this is not yet implemented). PApp can scan for strings in papp-sources, text/html/xml files and even database fields (e.g. you can declare a single database table row as english, to be translated, or as mixed language, to be scanned for language tags). Every application specifies the destination languages it wants to support. A translation editor (an example application "delivered" with PApp) can be used by translators to translate as-of-yet-untranslated messages, updates can be done on the fly.
I18n is as easy as writing "__"Translate this"" (the tagging syntax is a reminiscent of the widely used "_("message")"-syntax of C) in your documents or program.
Unicode / Multi-charset ability
Internally, PApp supports only two datatypes: binary and text.
Binary is usually used for images or similar data, while text should be
used for html (or xml or wml...) pages. Relatively recent fixes to the
unicode standard mostly pacified the objections raised by a lot of
cultural groups against this standard, so PApp encodes everything using
unicode.
The internal representation is independent of the output encoding or even the output character set. You can opt to output "iso-8859-1" text or, if the user wants something else, another charset + encoding like "iso-2022-jp". This selection can be based on program choice (i.e. hardwired) or can be based on the users preferences or protocol data (e.g. the "Accept-Charset"-http-header), similar to the selection of the target language.
Speed
Speed was a major concern when designing the API. Although implemented
to a large part in perl, PApp aims at providing a similar speed than
bare in-server scripts (like "Apache::Registry"-based scripts). At least
for non-trivial ("real-world") pages, PApp pages should be very similar
in performance, but of course cannot beat "Apache::Registry". However,
features like I18n come basically at no cost, both with respect to
programming time and to the runtime, often leading to correctly tagged
applications even when multi-language was not originally a target of an
application ("because it's so easy to do").
Scalability
PApp applications easily scale to many servers if a single server cannot
sustain the load. The only limitation is that there must be a single
database-server managing the state keys, which is usually a very small
amount of processing power a PApp application consumes.
XML
PApp support for XML is two-fold: First, PApp uses XML for its own
papp-source-format specifying basic layout of an application. The
decision for using XML was not easy, as XML can be regarded as a format
designed for machines (but still decipherable and writable by humans, if
necessary), while humans are generally better suited with SGML.
However, XML is used not only for internal source files, but is fully supported as a text format. together with its evil brother PXML, which is basically xml (text) with embedded perl code (or vice versa), it allows PApp to apply stylesheets (XSLT) to webpages either at compliation time or at runtime. PApp applications can be written fully in XML, and only the output stylesheets decides wether to actually output HTML, WML (for mobile phones) or XML (for XML-capable-browsers, if sensible).
As an added gimmick, PApp can dynamically fetch XML or PXML data (or code!) from other sources at runtime. Content management systems usually want to store pages (and/or layout) inside a database. An example on how to implement this even fits as an example into the manpage for the "PApp::XML"-module.
Protocol- & Layout-independence
Together with stylesheets, PApp applications can be written with only
minimal dependencies on layout or target protocol. It is possible to
write applications that only provide "modules", and only the final
stylesheet decides how it is rendered. Since PApp supports this
implicitly this enables layout- and protocol-independent applications.
Together with the I18n model, this enables the almost complete seperation of translation/programming/design and environment. I18n comes at almost no cost, while layout seperation of course requires though on the programmer's side, while protocol independence requires translation stylesheets to translate internal xml representations into the target "language".
Platform-independence
Although the main target of PApp is Apache/mod_perl, an interface based
on CGI (or similar mechanisms) is possible (yet slower). PApp
applications are indifferent to the environment (i.e. it is easy to
write an application that runs both under CGI and inside apache).
Database support
Serious web applications without database support are, of course,
impossible. Therefore PApp provides a lot of conviniences for
applications: Each application (and sub-application) can define a
default database connection which is persistent, cached, and checked
(like all database connections in PApp). SQL support is compatible to
the underlying DBI interface, but PApp programmers rarely have to resort
to that API. PApp automatically caches prepared SQL statements (allowing
the query optimiser to work once). Since a code-excerpt is better than a
thousand marketing words, here is an actual example:
<:
my $st = sql_exec \my($id, name),
"select id, name from user where name like ?",
$S{name};
:>
<table><tr><th>ID</th><th>Name</th></tr>
<:
while ($st->fetch) {
?><tr><td>$id</td><td>$name</td></tr><:
}
:>
</table>
This displays all id/name-pairs in a given table using a nicely-formatted html-table.
PApp currently uses MySQL for internal state management (MySQL is very fast for the task at hand). Applications are free to use any database they want, of course.
Persistent helper objects
As mentioned earlier, database connections are persistent. PApp provides
a number of "unusual" persistent objects, for example, it is possible to
tell PApp that a given callback needs to be called after a specific URL
has been clicked, independent of the target page (i.e. the target page
has to know nothing about who called it). Another helper object is the
persistent SQL row object, which maps SQL rows into perl hashes. The
following code excerpt (using the editform library which is part of
PApp, and fully language-tagged) displays the id and name fields of a
table in a freely-editable (HTML-) form. Updates to the database are
done automatically, thus, the example is complete:
<:
my $row = new PApp::DataRef 'DB_row',
table => "user",
where => [id => $userid];
# pre-set name
$row->{name} ||= "<username>";
ef_begin;
:><br>__"ID:" <?ef_string \$row->{id} , 5:><:
:><br>__"Name:" <?ef_string \$row->{name}, 20:><:
ef_submit __"Update";
ef_end;
:>
Debugability
Since PApp is written by its main users (or, conversely, the main users
of PApp also develop it), debugging support is relatively strong. PApp
usually is able to deliver a complete backtrace of the program,
including "interesting objects", when a fatal error occurs and features
a powerful exception mechanism to gather information from all stages
(low- to high-level) of a request. It is possible to store the backtrace
and error information into a database, providing the user with a nice
error message while mailing the administrator about the incident. The
information saved by PApp allows the programmer to precisely reproduce
the error situation (as far as possible), but usually the URL suffices
the uniquely identify the precise state of the session.
Schemes where the user could add comments to such a "coredump" or even browse the data structures interactively (given correct access rights) should be possible, but not implemented so far (this would be implemented using a specialized PApp application supposedly called the "error browser").
"Web-Widgets"
The newest addition to PApp (and therefore not final in its
implementation) are reusable components also dubbed Web-Widgets,
similar to reusable GUI-objects named "widgets". An example for such a
component would be a standard "forum" widget. In one of our projects we
use the same forum widget to provide "web chat", "small ads" and the
"news!" page, all with the same code but using different stylesheets to
customize the layout.
A PApp-application is, in some sense, a large state-machine. this is a limitation of stateless protocols which is hard, but not impossible, to circumvent (if perl only had efficient continuations). Any application can be embedded into other applications, while retaining their own state and their own state machine and of course their own set of variables / state keys. Standard PApp applications are not much more than a single page with html header/footer, authenticitation check and an embedded "Web-widget".
Logging
Logging is a required part of any serious business. Gathering
statistical data is a basic requirenment today. PApp saves every state
key to a database for each page impression/request (usually between 300
and 900 bytes). This saved information contains everything nedeed to
recreate a given page except the actual program code. When session/state
data is being expired (necessary to put a sensible bound on the size of
the state database), PApp conviniently allows applications to gather
statistical data from individual "hits", using almost the same
environment as at runtime. Expiring and data gathering can, of course,
be done seperately if necessary.
Disadvantages
PApp is not actually a revolution, judged by its components. It does, however draw a lot of functionality and ideas into a single, well-contained package. Nevertheless, there are quite a few reasons on why NOT to use PApp, or at least not to use PApp YET.
References
Apology
I'd like to apologize for any typoes or other mistakes in this document. It was written in a single session without access to a spellchecker and so has not been debugged it yet ;)