MKDoc::Text::Structured - Another Text to HTML module


MKDoc-Text-Structured documentation Contained in the MKDoc-Text-Structured distribution.

Index


Code Index:

NAME

Top

MKDoc::Text::Structured - Another Text to HTML module

SYNOPSIS

Top

    my $text = some_structured_text();
    my $html = MKDoc::Text::Structured::process ($text);




SUMMARY

Top

MKDoc::Text::Structured is a library which allows simple syntaxic text construct to be turned into HTML. These constructs are the ones you would be using when writing a text email or newsgroup message.

MKDoc::Text::Structured follows the KISS philosophy. Comparing with similar modules which try to implement as many HTML constructs as possible, this module is incredibly conservative.

Block level elements

Top

P

Paragraphs are defined by blocks of text separated by one or more empty lines.

The text:

  This is a paragraph,
  until it meets an empty line.

  This is another paragraph.

Would become:

  <p>This is a paragraph,
  until it meets an empty line.</p>
  <p>This is another paragraph.</p>




H1, H2, H3

Headlines are really just like a paragraph, except that they have the following syntax:

The text:

  ==========
  Headline 1
  ==========

  Headline 2
  ==========

  Headline 3
  ----------

Would become:

  <h1>Headline 1</h1>
  <h2>Headline 2</h2>
  <h3>Headline 3</h3>




The advantage in treating headlines just like paragraph is that multi-line headlines are no problem. Also, it means you can use *strong* and _emphasized_ within a headline (see STRONG and EM sections).

PRE

Pre-formatted text looks like a paragraph, except that it must be indented with at least one space character.

The text:

  This is a paragraph,
  until it meets an empty line.

    But this is pre-formatted text.
    Hey  Hey Ho  Ho!

  This is another paragraph.

Would become:

  <p>This is a paragraph,
  until it meets an empty line.</p>
  <pre>But this is pre-formatted text.
  Hey  Hey Ho  Ho!</pre>
  <p>This is another paragraph.</p>

Again, you can use *strong* and _emphasized_ within pre-formatted text (see STRONG and EM sections).

Inline Elements

Top

STRONG

The text:

  This is *strong text*

Would become:

  <p>This is <strong>strong text</strong></p>




Note 1: The star character will act as a 'strong' marker only when:

- The "opening" star is preceded by whitespace or carriage return,

- The "closing" star is followed by whitespace or carriage return, or punctuation immediately followed by whitespace or carriage return.

In other words, you can write 3*3*2 = 18 safely. The module tries to follow the DWIM ("Do What I Mean") philosophy as much as possible.

Note 2: This can only work within one block level element. It will not work across paragraphs or lists (See UL, LI and OL, LI sections).

Example 1:

  * Hello, *I will not
  * be bold*
  * but
  * *I will be*

Example 2:

  This is a paragraph. *Nothing in this paragraph
  is going to be bold.

  Nor in this one*.




EM

The text:

  This is _emphasized text_

Would become:

  <p>This is <em>emphasized text</em></p>

Same notes as for bold / strong text also applied for emphasized text.

Entity substitution

Characters that would otherwise be interpreted as XML are encoded. i.e. &, < and > become &amp; &lt; and &gt;

Additionally some standard typed versions of special characters are substituted with a richer and better-looking HTML entity:

  --   surrounded by whitespace becomes &mdash;
  -    surrounded by whitespace becomes &ndash;
  ...  becomes &hellip;
  (tm) becomes &trade;
  (r)  becomes &reg;
  (c)  becomes &copy;
  x    between numbers becomes &times;
  ''   surrounding text becomes &lsquo; &rsquo;
  ""   surrounding text becomes &ldquo; &rdquo;

Nested Structures

Top

BLOCKQUOTE

Quoted text is text that starts with a 'greater than' character and followed by a space on each line.

  > > Hey, that's pretty cool!

  > Well, sort-of

  I think it's pretty cool...

Would become:

  <blockquote><blockquote><p>Hey, that's pretty cool!</p></blockquote>
  <p>Well, sort-of</p></blockquote>
  <p>I think it's pretty cool...</p>




UL, LI

Ordered lists and unordered lists can be constructed and nested:

The text:

  * An item
  * Another item

  * Headlines work too
    ==================

    I can write *paragraphs within lists*.

      And even _pre-formatted text_!

    - Also, I can have sub-lists
    - That's no problem
    - Notice that '*' and '-' have the same meaning.
      It's just syntaxic sugar, really :-)

Would become:

  <ul><li><p>An item</p></li>
  <li><p>Another item</p></li>
  <li><h2>Headlines work too</h2>
  <p>I can write <strong>paragraphs within lists</strong>.</p>
  <pre>And even <em>pre-formatted text</em>!</pre>
  <ul><li><p>Also, I can have sub-lists</p></li>
  <li><p>That's no problem</p></li>
  <li><p>Notice that '*' and '-' have the same meaning.
  It's just syntaxic sugar, really :-)</p></li></ul></li></ul>




OL, LI

Un-ordered lists and unordered lists can be constructed and nested:

The text:

  1. An item
  2. Another item

  3. Headlines work too
     ==================

     * An un-ordered list
     * Can be nested
     * It should all work nicely.

Would become:

  <ol><li><p>An item</p></li>
  <li><p>Another item</p></li>
  <li><h2>Headlines work too</h2>
  <ul><li><p>An un-ordered list</p></li>
  <li><p>Can be nested</p></li>
  <li><p>It should all work nicely.</p></li></ul></li></ol>




Hyperlinks

Top

Smilies

Top

Basic smilies such as :-) and :-( are wrapped in a CSS class:

  <span class="smiley-happy">:-)</span>
  <span class="smiley-sad">:-(</span>

Long Words

Top

Long words are split up into fragments separated by spaces if the length exceeds a 78 character default.

Change the default length using a package variable:

  local $MKDoc::Text::Structured::Inline::LongestWord = 12;

Disable this fuctionality by setting a value of 0.

AUTHOR

Top

Copyright 2003 - MKDoc Holdings Ltd.

Author: Jean-Michel Hiver

This module is free software and is distributed under the same license as Perl itself. Use it at your own risk.

SEE ALSO

Top

  MKDoc: http://www.mkdoc.com/

Help us open-source MKDoc. Join the mkdoc-modules mailing list:

  mkdoc-modules@lists.webarch.co.uk


MKDoc-Text-Structured documentation Contained in the MKDoc-Text-Structured distribution.
package MKDoc::Text::Structured;
use MKDoc::Text::Structured::Factory;
use strict;
use warnings;

our $Text    = '';
our $VERSION = 0.83;


sub process
{
    my $text    = shift;

    # Mac + DOS carriage returns -> bang!
    $text =~ s/\r\n/\n/gs;
    $text =~ s/\r/\n/gs;

    # Trailing spaces -> fizzle!
    # (except when the line is made only of trailing spaces...)
    $text = join "\n", map {
        chomp();
        s/\s+$// unless (/^\s*$/);
        $_;
    } split /\n/, $text;

    my @lines   = split /\n/, $text;
    my @result  = ();
    my $current = undef;

    while (scalar @lines)
    {
        my $line   = shift (@lines);
        $current ||= MKDoc::Text::Structured::Factory->new ($line);
        $current || next;

        if ($current->is_ok ($line))
        {
            $current->add_line ($line);
        }
        else
        {
            push @result, $current->process();
            unshift (@lines, $line);
            $current = undef;
        }
    }

    push @result, $current->process() if ($current);
    return join "\n", @result;
}


1;


__END__