Embperl - Building dynamic Websites with Perl

Copyright (c) 1997-2002 Gerald Richter / ecos gmbh  www.ecos.de

You may distribute under the terms of either the GNU General Public 
License or the Artistic License, as specified in the Perl README file.

THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED 
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF 
MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.

$Id$


### !! IMPORTANT !! IMPORTANT !! IMPORTANT !! IMPORTANT !! IMPORTANT !! 
###
###
### This is a BETA release of Embperl 2.0, before installing
### please read the README.v2. Documentation is not yet updated to
### reflect the changes in 2.0, everything that has changed is
### documented in README.v2. 
### I use Embperl 2.0b in a production environment myself.
### But be careful, this release may still contain bugs.
###
### The current stable release is Embperl 1.3.4
###
### !! IMPORTANT !! IMPORTANT !! IMPORTANT !! IMPORTANT !! IMPORTANT !! 


Hints for using Embperl 2.x
---------------------------

Embperl 2 is totally rewritten. Most of the Perl code has been moved
into C to speed up processing. The core has been totally redesigned to
give a lot of new possibilities.

This is still beta, so it may (and will) contain bugs.
Please report any weird behaviour to the embperl mailing list, but
be sure to read this whole README to understand what doesn't work yet.

The Embperl core now works in a totally different way. The processing
of the source towards the output is done by providers. Every provider
takes a small step. Which providers are used is defined by a recipe.
The standard Embperl recipe contains the following providers:

    1 reading the source
    2 parsing 
    3 compiling 
    4 executing
    5 outputting

The providers work in a similar way to Unix shell programs that
process a single source in a pipeline towards the output. In
Embperl it is not only a simple pipeline, but a tree structure,
so multiple sources can be incorporated into one result.
Rearranging the providers or writing and using new ones gives
flexibility and power. In addition to the standard Embperl providers,
Embperl ships with XML parser and XSLT processor providers.
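The pipeline idea can be sketched in a few lines of Perl. This is a toy under obvious assumptions: the five providers are plain subroutines working on strings, the file contents come from a hash, and only [+ $var +] blocks are understood. Real Embperl providers are implemented in C and pass tree structures around, so treat this only as an illustration of "each provider takes the previous provider's result".

```perl
use strict;
use warnings;

# Toy stand-ins for the five standard providers (NOT Embperl's real ones,
# which are written in C and pass tree structures around). Each provider
# receives the result of the previous one, like a stage in a shell pipeline.
my %files = ( 'hello.epl' => 'Hello [+ $name +]!' );

my @providers = (
    # 1 reading the source
    sub { my ($file) = @_; return $files{$file} },
    # 2 parsing: split into literal text and [+ $var +] parts
    sub { my ($text) = @_; return [ split /(\[\+.*?\+\])/, $text ] },
    # 3 compiling: turn every part into a code reference
    sub {
        my ($parts) = @_;
        return [ map {
            my $part = $_;
            $part =~ /^\[\+\s*\$(\w+)\s*\+\]$/
                ? do { my $var = $1; sub { $_[0]{$var} } }
                : sub { $part };
        } @$parts ];
    },
    # 4 executing: run the compiled parts against some data
    sub { my ($code) = @_; return [ map { $_->({ name => 'World' }) } @$code ] },
    # 5 outputting: join the pieces into the final result
    sub { my ($pieces) = @_; return join '', @$pieces },
);

# Feed the output of each provider into the next one
sub run_recipe {
    my ($input) = @_;
    $input = $_->($input) for @providers;
    return $input;
}

print run_recipe('hello.epl'), "\n";    # Hello World!
```

Because every step only depends on the previous result, steps can be swapped or inserted without touching the others, which is the flexibility the paragraph above describes.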

The new execution scheme is also faster, because html tags and metacommands
are parsed only once (Perl code was also, and still is, cached in 1.x).
My first benchmarks show 50%-100% faster execution under mod_perl 
compared to Embperl 1.x.

Another new feature is that the syntax of the Embperl parser is defined
within the module Embperl::Syntax and can be modified as necessary.
Embperl comes with a set of syntax definitions which can be modified by
the user. So far there are syntax definitions for SSI, Text only, Perl only,
ASP, POD, RTF and a Mail taglib. You can tell Embperl which syntax to use either in
the configuration via EMBPERL_SYNTAX, or with the syntax parameter of
Execute, or you can change the syntax dynamically inside the page via the
[$syntax $] command. You can also specify more than one syntax at the same
time, e.g. [$syntax Embperl SSI $] to mix Embperl tags and SSI tags in the same
page.

If you'd like to create your own syntax read:

   perldoc Embperl::Syntax

and look at the files under Embperl/Syntax/ for examples on how to do it.

Also new is the ability to cache (parts of) the output. See
the new configuration directives below.

Starting with 2.0b6 Embperl provides a set of new objects, which allow
you to access Embperl internals and manipulate the processing. Basically there
are three major objects:

    - Application
    - Request
    - Component

The application object is responsible for a set of pages that form an
application. It is used to configure things like session handling and
logging which should be unique across these pages. More importantly,
it can be overridden, and the overriding object can contain the application
logic, to create a proper separation of logic and presentation.

The request object holds everything which spans a whole (HTTP-)request.

The component object is responsible for a single component inside the
desired output. It holds things like the source file etc.

Each of the three objects has a subobject which holds its configuration and a
subobject for its current parameters.

See below for a short list of accessible members.


Debugging
---------

Starting with 2.0b2 Embperl files can be debugged via the interactive debugger.
The debugger shows the Embperl page source along with the correct line numbers. 
You can do anything you can do inside a normal Perl program via the debugger,
e.g. show variables, modify variables, single step, set breakpoints etc.

You can use the Perl interactive command line debugger via

    perl -d embpexec.pl file.epl  

or if you prefer a graphical debugger, try ddd (http://www.gnu.org/software/ddd/);
it's a great tool, also for debugging any other Perl script:

    ddd --debugger 'perl -d embpexec.pl file.epl'


NOTE: embpexec.pl can be found in the Embperl source directory.

If you want to debug your pages while running under mod_perl, Apache::DB is the
right tool. Apache::DB is available from CPAN.


The following differences to Embperl 1.x apply:
------------------------------------------------------

- When running under mod_perl you _must_ load Embperl
  at server startup time. Either with a

  PerlModule Embperl

  in your httpd.conf or a

  use Embperl ;

  inside of a startup script.
  You can now use the Embperl configuration directives
  directly (without PerlSetEnv/SetEnv). If you still
  want to use environment variables to configure Embperl, write

  Embperl_UseEnv on

- For every container in your httpd.conf (e.g. VirtualHost,Directory,Location)
  where you want to define any application level configuration directives
  (see below under tAppConfig for a list), you need to set a unique
  value for EMBPERL_APPNAME. This is, for example, necessary for all
  Embperl::Object parameters. Example:

  <Location /eo>
  EMBPERL_APPNAME my_embperl_app
  EMBPERL_OBJECT_BASE base.epl
  </Location>
    
- The following options can currently only be set from httpd.conf:
     optKeepSpaces

- The option optRawInput is replaced by EMBPERL_INPUT_ESCMODE,
  which is off by default (same as when optRawInput was set 
  in 1.x)

- The following options are currently not supported:
     optRedirectStdout
     optDisableHtmlScan, optDisableTableScan,
     optDisableInputScan, optDisableMetaScan

  optDisableHtmlScan can be replaced by switching the syntax, e.g.

  [$syntax EmbperlBlocks $]  # same as [- $optDisableHtmlScan = 1 -]

    (here goes your code - Embperl will not interpret any html tags here)

  [$syntax Embperl $]        # same as [- $optDisableHtmlScan = 0 -]


- Nesting must be done properly. I.e. you cannot put a <table> tag (for a
  dynamic table) inside an 'if' and the </table> inside another 'if'.
  (That still works for static tables)

- optUndefToEmptyValue is always set and cannot be disabled.

- [$ foreach $x (@x) $] now requires the parentheses around the
  array (like Perl)

- [+ +] blocks must now contain a valid Perl expression. Embperl 1.x
  allowed you to put multiple statements into such a block. For performance
  reasons this is not possible anymore. Also the expression must _not_ be
  terminated with a semicolon. To let old code work, just wrap it into a 'do'
  block, e.g. [+ do { my $a = $b + 5 ; $a } +]

- EMBPERL_INPUT_FUNC and EMBPERL_OUTPUT_FUNC are not supported anymore.
  You can achieve the same result and much more by writing a custom provider.

- Embperl doesn't change the current working directory anymore to the
  directory of the source file. This is done for performance reasons 
  and because it won't work reliably with threads under mod_perl 2.0.
  You can use $req -> component -> cwd to get the directory of the
  source file (where $req is the Embperl request object, which is the first
  parameter passed to the page, i.e. $_[0])
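The [+ +] rule in the list above can be illustrated with a simplified pure-Perl model. It assumes, for illustration only, that the block content is wrapped in parentheses and evaluated as one expression (eval_output_block is a hypothetical helper, and Embperl's real compiler works differently). The model shows why several statements or a trailing semicolon cannot form a single expression, while a do block can.

```perl
use strict;
use warnings;

# Simplified model of the [+ +] rule: the block content is evaluated as ONE
# expression, here by wrapping it in parentheses before a string eval.
# This is only an illustration, not how Embperl compiles pages.
sub eval_output_block {
    my ($code) = @_;
    my $value = eval "( $code )";
    return $@ ? 'syntax error' : $value;
}

print eval_output_block('3 + 5'), "\n";                        # 8
print eval_output_block('3 + 5 ;'), "\n";                      # syntax error
print eval_output_block('my $n = 3 + 5 ; $n'), "\n";           # syntax error
print eval_output_block('do { my $n = 3 + 5 ; $n }'), "\n";    # 8
```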


The following things are not fully tested/working yet:
------------------------------------------------------

- [- exit -]
  exit does not work inside of [$ sub $]; outside it works
  (it can now also exit the whole request, see below)

- safe namespaces


Embperl 1.x compatibility flag
------------------------------

The compatibility flag isn't available anymore in 2.0b6. Since
Embperl 2.0 now lives in its own namespace, you can install Embperl 1.x and
2.x on the same machine without conflicts.


Additional Config directives
----------------------------

Caching parameter
-----------------

execute parameter / httpd.conf environment variable / name inside page (must be set inside [! !])


cache_key / EMBPERL_CACHE_KEY / $CACHE_KEY 

literal string that is appended to the cache key


cache_key_options / EMBPERL_CACHE_KEY_OPTIONS / $CACHE_KEY_OPTIONS

    ckoptCarryOver = 1,     use result from CacheKeyFunc of previous step if any 
    ckoptPathInfo  = 2,     include the PathInfo into CacheKey 
    ckoptQueryInfo = 4,     include the QueryInfo into CacheKey 
    ckoptDontCachePost = 8, don't cache POST requests  (not yet implemented)

    Default: all options set


cache_key_func / EMBPERL_CACHE_KEY_FUNC / &CACHE_KEY

function that should be called when building a cache key. The result is
appended to the cache key.
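The cache_key_options values are powers of two, which suggests they are combined as a bit mask (the default "all options set" would then be 1 + 2 + 4 + 8 = 15). The sketch below assumes that reading. build_cache_key is a hypothetical helper: the order and separators of the key parts are invented for illustration and are not Embperl's actual key format.

```perl
use strict;
use warnings;

# Flag values for cache_key_options, taken from the list above.
use constant {
    ckoptCarryOver     => 1,
    ckoptPathInfo      => 2,
    ckoptQueryInfo     => 4,
    ckoptDontCachePost => 8,
};

# "All options set" (the default) is the bitwise OR of every flag: 15.
my $all_options = ckoptCarryOver | ckoptPathInfo | ckoptQueryInfo | ckoptDontCachePost;

# Hypothetical key builder: start from the file name and append the parts
# selected by the options, the literal cache_key and the cache_key_func
# result. Order and separators are assumptions, not Embperl's format.
sub build_cache_key {
    my (%arg) = @_;
    my $key = $arg{filename};
    $key .= $arg{path_info}          if $arg{options} & ckoptPathInfo;
    $key .= '?' . $arg{query_info}   if $arg{options} & ckoptQueryInfo;
    $key .= $arg{cache_key}          if defined $arg{cache_key};
    $key .= $arg{cache_key_func}->() if $arg{cache_key_func};
    return $key;
}

# Only PathInfo and QueryInfo: 2 + 4 = 6
print build_cache_key(
    filename       => '/news.epl',
    path_info      => '/sports',
    query_info     => 'page=2',
    options        => ckoptPathInfo | ckoptQueryInfo,
    cache_key_func => sub { '-de' },
), "\n";    # /news.epl/sports?page=2-de
```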


expires_func / EMBPERL_EXPIRES_FUNC / &EXPIRES

function that is called every time before data is taken from the cache.
If this function returns true, the data from the cache isn't used anymore,
but rebuilt.


A function can be either a coderef (when passed to Execute), the name of a
subroutine, or a string starting with "sub ", in which case it is compiled
as an anonymous subroutine.
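The three accepted forms of a function parameter can be normalized into a code reference in plain Perl. resolve_function below is a hypothetical helper that follows the rule just described; Embperl's own handling may differ in details such as error reporting.

```perl
use strict;
use warnings;

# Hypothetical helper turning the three accepted forms of a function
# parameter into a code reference. It mirrors the rule above; Embperl's
# own implementation may differ.
sub resolve_function {
    my ($spec) = @_;
    return $spec if ref $spec eq 'CODE';          # coderef passed to Execute
    if ($spec =~ /^sub\s/) {                      # "sub { ... }" string
        my $code = eval $spec;
        die "cannot compile '$spec': $@" unless ref $code eq 'CODE';
        return $code;
    }
    no strict 'refs';
    return \&{$spec};                             # name of a subroutine
}

sub never_expires { 0 }

my $from_ref    = resolve_function(sub { 1 });
my $from_string = resolve_function('sub { time() % 2 }');
my $from_name   = resolve_function('main::never_expires');
print $from_ref->(), $from_name->(), "\n";    # 10
```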


expires_in / EMBPERL_EXPIRES_IN / $EXPIRES

Time in seconds that the output should be cached. (0 = never, -1 = forever)

expires_filename / EMBPERL_EXPIRES_FILENAME / $EXPIRES_FILENAME

Expires when the given file has changed
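The expires_in values can be read as a small decision rule. cache_expired is a hypothetical helper that only models expires_in (0 = never cache, -1 = cache forever, n = cache for n seconds); it does not model expires_func or expires_filename.

```perl
use strict;
use warnings;

# Hypothetical check following the expires_in rules above:
#   0 = never cache (always expired), -1 = cache forever,
#   n > 0 = the cached output is valid for n seconds.
sub cache_expired {
    my ($cached_at, $expires_in, $now) = @_;
    return 1 if $expires_in == 0;
    return 0 if $expires_in == -1;
    return $now - $cached_at >= $expires_in ? 1 : 0;
}

print cache_expired(1000, 60, 1059), "\n";    # 0 (still fresh)
print cache_expired(1000, 60, 1060), "\n";    # 1 (60 seconds have passed)
```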


Syntax switching
----------------

syntax / EMBPERL_SYNTAX / [$ syntax $]

Used to tell Embperl which syntax to use inside a page. Embperl comes with
the following syntaxes: 

    - EmbperlHTML       # all the HTML tags that Embperl recognizes by default
    - EmbperlBlocks     # all the [ ] blocks that Embperl supports
    - Embperl           # (default; contains EmbperlHTML and EmbperlBlocks)
    - ASP               # <%  %> and <%=  %>, see perldoc Embperl::Syntax::ASP
    - SSI               # Server Side Includes, see perldoc Embperl::Syntax::SSI
    - Perl              # File contains pure Perl (similar to Apache::Registry), but
                        #  can be used inside EmbperlObject
    - Text              # File contains only Text, no actions are taken on the Text
    - Mail              # Defines the <mail:send> tag, for sending mail. This is an
                        # example for a taglib, which could be a base for writing
                        # your own taglib to extend the number of available tags
    - POD               # translates pod files to XML, which can be converted to 
                        # the desired output format by an XSLT transformation
    - RTF               # Can be used to process word processing documents in RTF format

You can get a description for each syntax if you type

    perldoc Embperl::Syntax::xxx

where 'xxx' is the name of the syntax.

You can also specify multiple syntaxes e.g.

    EMBPERL_SYNTAX "Embperl SSI"

    Execute ({inputfile => '*', syntax => 'Embperl ASP'}) ;

The 'syntax' metacommand allows you to switch the syntax or to 
add or subtract syntaxes, e.g.

    [$ syntax + Mail $]

will add the Mail taglib so the <mail:send> tag is available after
this line.

    [$ syntax - Mail $]

now the <mail:send> tag is unknown again

    [$ syntax SSI $]

now you can only use SSI commands inside your page.
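The effect of the three metacommand forms on the set of active syntaxes can be modelled in a few lines. apply_syntax_command is a hypothetical helper that only mimics the add/subtract/replace semantics described above; it does not load or switch any real Embperl syntax.

```perl
use strict;
use warnings;

# Hypothetical model of the [$ syntax ... $] argument: "+ X" adds syntaxes,
# "- X" removes them, anything else replaces the whole set.
sub apply_syntax_command {
    my ($active, $arg) = @_;
    my ($op, @names) = split ' ', $arg;
    if ($op eq '+') {
        my %seen = map { $_ => 1 } @$active;
        return [ @$active, grep { !$seen{$_}++ } @names ];
    }
    if ($op eq '-') {
        my %drop = map { $_ => 1 } @names;
        return [ grep { !$drop{$_} } @$active ];
    }
    return [ $op, @names ];
}

my $syntax = ['Embperl'];
for my $cmd ('+ Mail', '- Mail', 'SSI') {
    $syntax = apply_syntax_command($syntax, $cmd);
    print "@$syntax\n";    # Embperl Mail / Embperl / SSI
}
```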

EMBPERL_INPUT_ESCMODE
---------------------

0   don't interpret input (default)
1   unescape html escapes to their characters (i.e. &lt; becomes < )
    inside of Perl code
2   unescape url escapes to their characters (i.e. %26 becomes & )
    inside of Perl code
3   unescape html and url escapes, depending on the context

Add 4 to remove html tags inside of Perl code. This is helpful when
an html editor inserts html tags like <br> inside your Perl code.

Set EMBPERL_INPUT_ESCMODE to 7 to get the old default of Embperl < 2.0b6
Set EMBPERL_INPUT_ESCMODE to 0 to get the old behaviour when optRawInput was set.
This is the current default.
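To make the bit values concrete, here is a pure-Perl approximation of what each bit does to a piece of Perl code taken from a page. unescape_input is hypothetical: it only knows a handful of HTML entities, and when both 1 and 2 are set it simply applies both, whereas Embperl decides per context. Tags are stripped first so that an unescaped "&lt;" is not mistaken for the start of a tag.

```perl
use strict;
use warnings;

# Pure-Perl illustration of the EMBPERL_INPUT_ESCMODE bits. Only a few
# entities are handled, and when both 1 and 2 are set this sketch simply
# applies both, while Embperl decides per context.
my %entity = ( lt => '<', gt => '>', amp => '&', quot => '"' );

sub unescape_input {
    my ($code, $mode) = @_;
    # 4: strip tags first, so an unescaped "&lt;" is not mistaken for a tag
    $code =~ s/<[^>]*>//g                        if $mode & 4;
    # 1: html escapes, e.g. &lt; becomes <
    $code =~ s/&(lt|gt|amp|quot);/$entity{$1}/g  if $mode & 1;
    # 2: url escapes, e.g. %26 becomes &
    $code =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge   if $mode & 2;
    return $code;
}

print unescape_input('if ($x &lt; 5) {<br>', 7), "\n";    # if ($x < 5) {
```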

Error mailing
-------------

EMBPERL_MAIL_ERRORS_TO          <email>
    email address to mail any error to

EMBPERL_MAIL_ERRORS_LIMIT       <num>
    do not mail more than <num> errors. Set to 0 for no limit.

EMBPERL_MAIL_ERRORS_RESET_TIME  <sec>
    reset the error counter if no error has occurred for <sec> seconds

EMBPERL_MAIL_ERRORS_RESEND_TIME <sec>
    mail errors of <sec> seconds regardless of the error counter

All error counting is done per child, so if you run a large site and
have 100 children, you may get 100 * EMBPERL_MAIL_ERRORS_LIMIT mails
before they are limited.
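The interplay of the limit and the reset time can be sketched as a small per-child throttle. This is an interpretation of the directive descriptions, not Embperl's code: new_throttle and should_mail are hypothetical, and EMBPERL_MAIL_ERRORS_RESEND_TIME is deliberately not modelled.

```perl
use strict;
use warnings;

# Hypothetical per-child throttle modelled on EMBPERL_MAIL_ERRORS_LIMIT and
# EMBPERL_MAIL_ERRORS_RESET_TIME. Every Apache child keeps its own state,
# which is why the limit applies per child. RESEND_TIME is not modelled.
sub new_throttle {
    my (%cfg) = @_;    # limit => n (0 = no limit), reset_time => seconds
    return { limit => 0, reset_time => 0, %cfg, count => 0, last_error => undef };
}

sub should_mail {
    my ($t, $now) = @_;
    # Reset the counter if no error has occurred for reset_time seconds.
    $t->{count} = 0
        if $t->{reset_time} && defined $t->{last_error}
        && $now - $t->{last_error} >= $t->{reset_time};
    $t->{last_error} = $now;
    $t->{count}++;
    return ($t->{limit} == 0 || $t->{count} <= $t->{limit}) ? 1 : 0;
}

my $t = new_throttle(limit => 2, reset_time => 60);
print join(' ', map { should_mail($t, $_) } 0, 1, 2, 100), "\n";    # 1 1 0 1
```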


Session handling
----------------

Session handling has changed from 1.3.3 to 1.3.4 and 2.0b3 to 2.0b4. You must either
install Apache::SessionX or set

    PerlSetEnv EMBPERL_SESSION_HANDLER_CLASS "Embperl::Session"

to get the old behaviour.


Overview Embperl objects and their methods
------------------------------------------


 * Application object

   thread
   curr_req
   config
   lfd
   user_session
   state_session
   app_session
   udat
   sdat
   mdat
   debug
   errors_count
   errors_last_time 
   errors_last_send_time


 * Application configuration

   app_name
   app_handler_class
   session_args
   session_classes
   session_config
   session_handler_class
   cookie_name
   cookie_domain
   cookie_path
   cookie_expires
   log
   debug
   mailhost
   mailhelo
   mailfrom
   maildebug
   mail_errors_to
   mail_errors_limit
   mail_errors_reset_time
   mail_errors_resend_time
   object_base
   object_app
   object_addpath
   object_stopdir
   object_fallback
   object_handler_class
   new


 * Request object

   apache_req
   config
   param
   component
   app
   thread
   request_count
   request_time
   iotype
   session_mgnt
   session_id
   session_state_id
   session_user_id
   exit
   log_file_start_pos
   error
   errors
   errdat1
   errdat2
   lastwarn
   cleanup_vars
   cleanup_packages
   initial_cwd
   messages
   default_messages
   startclock
   stsv_count


 * Request configuration

   allow
   urimatch
   mult_field_sep
   path
   debug
   options
   session_mode


 * Request parameter

   filename
   unparsed_uri
   uri
   path_info
   query_info
   language
   cookies


 * Component object

   config
   param
   req_running
   sub_req
   inside_sub
   exit
   path_ndx
   cwd
   ep1_compat
   phase
   sourcefile
   buf
   end_pos
   curr_pos
   sourceline
   sourceline_pos
   line_no_curr_pos
   document
   curr_node
   curr_repeat_level
   curr_checkpoint
   curr_dom_tree
   source_dom_tree
   syntax
   ifd
   ifdobj
   append_to_main_req
   prev
   strict
   import_stash
   exports
   curr_package
   eval_package
   main_sub
   prog
   prog_run
   prog_def
   code


 * Component configuration

   package
   debug
   options
   escmode
   input_escmode
   input_charset
   cache_key
   cache_key_options
   expires_func
   cache_key_func
   expires_in
   syntax
   recipe
   xsltstylesheet
   xsltproc
   compartment
   cleanup


 * Component Parameter

   inputfile
   outputfile
   input
   output
   sub
   import
   firstline
   mtime
   param
   fdat
   ffld
   object
   isa
   errors
   xsltparam 

Configuration directives summary
--------------------------------


/* tComponentConfig */

PACKAGE 
DEBUG 
OPTIONS 
ESCMODE 
INPUT_ESCMODE 
INPUT_CHARSET 
CACHE_KEY 
CACHE_KEY_OPTIONS
EXPIRES_FUNC 
CACHE_KEY_FUNC
EXPIRES_IN 
SYNTAX 
RECIPE 
XSLTSTYLESHEET 
XSLTPROC 
COMPARTMENT


/* tReqConfig */

ALLOW 
URIMATCH 
MULTFIELDSEP 
PATH
DEBUG 
OPTIONS 
SESSION_MODE 


/* tAppConfig */

APPNAME 
APP_HANDLER_CLASS
SESSION_HANDLER_CLASS
SESSION_ARGS 
SESSION_CLASSES
SESSION_CONFIG 
COOKIE_NAME 
COOKIE_DOMAIN
COOKIE_PATH 
COOKIE_EXPIRES 
LOG 
DEBUG 
MAILDEBUG 
MAILHOST 
MAILHELO 
MAILFROM 
MAIL_ERRORS_TO
MAIL_ERRORS_LIMIT
MAIL_ERRORS_RESET_TIME
MAIL_ERRORS_RESEND_TIME
OBJECT_BASE
OBJECT_APP
OBJECT_ADDPATH
OBJECT_STOPDIR
OBJECT_FALLBACK
OBJECT_HANDLER_CLASS


When running under mod_perl, you can use these directly as Apache configuration
directives. They are case insensitive. You don't need to use environment 
variables for configuration anymore. For this to work you have to add a

  PerlModule Embperl
  AddModule embperl.c

before the first Embperl configuration directive. If you still want to
use environment variables, you must set

  Embperl_UseEnv on

In CGI mode, environment variables are still used for configuration.


exit
----

B<exit> will override the normal Perl exit in every Embperl document. Calling
exit will immediately stop any further processing of that file and send the
work done so far to the output/browser. 

B<NOTE 1:> If you are inside of an Execute, Embperl will only exit this Execute;
the file that called Execute on the file containing the exit will continue. If
you want to exit the whole request, call exit with an argument, e.g. exit (200)

B<NOTE 2:> If you write a module which should work with Embperl under mod_perl, 
you must use Embperl::exit instead of the normal Perl exit. (In 1.3.x it was
Apache::Exit)


Recipes
-------

Starting with 2.0b4 Embperl introduces the concept of recipes. A recipe basically
tells Embperl how a component should be built. While before 2.0b4 you could 
have only one processor that works on the request (the Embperl processor,
though you were able to define different syntaxes), now you can have multiple processors
arranged in a pipeline or even a tree. While you are able to give the full
recipe when calling Execute, this is not very convenient, so normally you
will only give the name of a recipe, either as parameter 'recipe' to
Execute or as EMBPERL_RECIPE in your httpd.conf. Of course you can have
different recipes for different locations and/or files. A recipe is constructed
out of providers. A provider can either read from some source or do some
processing on a source. There is no restriction on what sort of data a provider
has as in- and output - you just have to make sure that the output format of
a provider matches the input format of the next provider. In the current 
implementation Embperl comes with a set of built-in providers:

- file                  read file data
- memory                get data from a scalar
- epparse               parse file into an Embperl tree structure
- epcompile             compile Embperl tree structure
- eprun                 execute Embperl tree structure
- eptostring            convert Embperl tree structure to string
- libxslt-parse-xml     parse xml source for libxslt
- libxslt-compile-xsl   parse and compile stylesheet for libxslt
- libxslt               do an xsl transformation via libxslt
- xalan-parse-xml       parse xml source for xalan
- xalan-compile-xsl     parse and compile stylesheet for xalan
- xalan                 do an xsl transformation via xalan

There is a C interface, so new custom providers can be written, but what makes it
really useful is that the next release of Embperl will contain a
Perl interface, so you can write your own providers in Perl.

The default recipe is named Embperl and contains the following providers:

    +-----------+
    + file      +
    +-----------+
          |
          v
    +-----------+
    + epparse   +
    +-----------+
          |
          v
    +-----------+
    + epcompile +
    +-----------+
          |
          v
    +-----------+
    + eprun     +
    +-----------+

This causes Embperl to behave as it did in the past, when no
recipes existed.

Each intermediate result can be cached. For example, you are able
to cache the already parsed XML or compiled stylesheet in memory,
without the need to reparse/recompile it over and over again.
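Caching intermediate results means a provider step only does real work the first time its input is seen. The toy model below stores each result under "step:input"; cached_step and process are hypothetical helpers, the "parse" and "compile" steps are trivial stand-ins, and unlike a real cache it never invalidates an entry when the source changes.

```perl
use strict;
use warnings;

# Toy model of caching intermediate provider results (file -> epparse ->
# epcompile). Each result is stored under "step:input", so repeated requests
# reuse the parsed and compiled data instead of rebuilding it. A real cache
# would also invalidate entries when the source file changes.
my %cache;
my %runs;    # how often each step actually did its work

sub cached_step {
    my ($step, $input_key, $build) = @_;
    return $cache{"$step:$input_key"} //= do { $runs{$step}++; $build->() };
}

sub process {
    my ($file) = @_;
    my $source = cached_step(file      => $file, sub { "Hello [+ 1 + 2 +]" });
    my $tree   = cached_step(epparse   => $file, sub { [ split / /, $source ] });
    my $prog   = cached_step(epcompile => $file, sub { scalar @$tree });
    return "eprun executed a program of $prog parts";    # eprun is not cached
}

process('foo.epl') for 1 .. 3;
print "epparse ran $runs{epparse} time(s)\n";    # epparse ran 1 time(s)
```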

Another nice thing about recipes is that they are not static. A recipe
is defined by a recipe object. When a request comes in, Embperl calls
the get_recipe method of the application object, which by default
calls the get_recipe method of the named recipe object, which should return an array
that describes what Embperl has to do. The get_recipe methods can of course
build the array dynamically, looking, for example, at the request parameters
like filename, form values, mime type or whatever. For example, if you
give a scalar as input, the Embperl recipe replaces the file provider
with a memory provider. Additionally you can specify more than one
recipe (separated by spaces). Embperl will call all the new methods in
turn until one of them returns something other than undef. This way you can
create recipes that know what they are responsible for. One possibility would be
to check the file extension and only return the recipe if it matches.
Much more sophisticated things are possible...
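The "first recipe that does not return undef wins" rule can be sketched with plain subroutines. Everything here is hypothetical: the selectors stand in for recipe objects, and recipes are simplified to flat lists of provider names (real recipes can be trees, and an XSLT recipe would also need the stylesheet providers).

```perl
use strict;
use warnings;

# Hypothetical recipe selectors. Each returns a recipe (simplified here to a
# list of provider names; real recipes can be trees) or undef when it is not
# responsible. The first defined result wins.
my @selectors = (
    # responsible for .xml files only
    sub { my ($req) = @_;
          ($req->{filename} // '') =~ /\.xml$/ ? [qw(file libxslt-parse-xml libxslt)] : undef },
    # source given as a scalar: read it from memory instead of a file
    sub { my ($req) = @_;
          defined $req->{input} ? [qw(memory epparse epcompile eprun)] : undef },
    # fallback: the default Embperl recipe
    sub { [qw(file epparse epcompile eprun)] },
);

sub choose_recipe {
    my ($req) = @_;
    for my $select (@selectors) {
        my $recipe = $select->($req);
        return $recipe if defined $recipe;
    }
    return;
}

print join(' ', @{ choose_recipe({ filename => 'foo.xml' }) }), "\n";
# file libxslt-parse-xml libxslt
```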

See perldoc Embperl::Recipe for how to create your own provider.


XML, XSLT
---------

As mentioned above, Embperl now contains a provider for doing XSLT transformations.
More XML support will come in the next releases. The easiest way is to use the XSLT
features through the predefined recipes:

    EmbperlLibXSLT      the result of Embperl will run through the Gnome libxslt
    EmbperlXalanXSLT    the result of Embperl will run through Xalan-C
    EmbperlXSLT         the result of Embperl will run through the XSL transformer
                        given by xsltproc or EMBPERL_XSLTPROC

    LibXSLT             run source through the Gnome libxslt
    XalanXSLT           run source through Xalan-C
    XSLT                run source through the XSL transformer given by xsltproc or 
                        EMBPERL_XSLTPROC

For example, including the result of an XSLT 
transformation into your html page could look like this:


    <html><head><title>Include XML via XSLT</title></head>
    <body>

    <h1>Start xml</h1>
    [- Execute ({inputfile => 'foo.xml', recipe => 'EmbperlXalanXSLT', xsltstylesheet => 'foo.xsl'}) ; -]
    <h1>END</h1>

    </body>
    </html>

As you already guessed, the xsltstylesheet parameter gives the name of the xsl 
file. You can also use the EMBPERL_XSLTSTYLESHEET configuration directive
to set it from your configuration file.

By setting EMBPERL_ESCMODE (or $escmode) to 15 you get the correct escaping
for XML.


Internationalisation (I18N)
---------------------------

Starting with 2.0b6 Embperl has built-in support for multi-language applications.
There are two things to do. First, inside your pages, mark which parts are translatable
by using [= =] blocks. Inside the [= =] blocks you can either put ids, which are symbolic
names for the text, or put the text in your primary language.
An example could look like:

    [= heading =]

    <input name="foo" value="[=bar=]" type="submit">

Now you run the embpmsgid.pl utility, which extracts all the ids from your page:

    perl embpmsgid.pl -l de -l en -d msg.pl foo.htm

This will create a file msg.pl which contains empty definitions for 'en' and 'de'
with all the ids found in the page. If the file msg.pl already exists, the definitions
are added. You can give more than one filename on the command line. The msg.pl file
is written with Data::Dumper, so it can easily be read in via 'do' and
postprocessed. As the next step, fill the empty definitions with the correct translations.

The last thing to do is to tell Embperl which language set to use. You do this inside
the init method of the application object. Create an application object which reads
in the messages, and when the init method is called, pass the correct one to Embperl.
There are two methods, $r -> messages and $r -> default_messages. Both return an array
ref onto which you can push your message hashes. Embperl first consults the messages array
and, if the message is not found there, the default_messages array.
Because both are arrays you can push multiple message sets onto them. This is handy when
your application object calls its base class, which may also define some messages.
Here is an example:


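    package My::App ; 

    use strict ;
    our @ISA = ('Embperl::App') ;

    our %messages =
        (
        'de' =>
            {
            'heading' => 'Überschrift',
            'bar'     => 'Absenden',
            },
        'en' =>
            {
            'heading' => 'Heading',
            'bar'     => 'Submit',
            },
        ) ;

    # called by Embperl at the start of every request
    sub init
        {
        my $self = shift ;
        my $r = $self -> curr_req ;

        # language of the request; fall back to 'de' if it is unknown
        my $lang = $r -> param -> language ;
        $lang = 'de' if (!$lang || !exists $messages{$lang}) ;

        push @{$r -> messages}, $messages{$lang} ;
        # ids missing in the chosen language are looked up in English
        push @{$r -> default_messages}, $messages{'en'} if ($lang ne 'en') ;
        }

    1 ;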


Just load this package and set EMBPERL_APP_HANDLER_CLASS to My::App; then
Embperl will call the init method at the start of the request.

If you are using Embperl::Object, you may instead save it as a file in your
document hierarchy, make the filename known to Embperl::Object with the 
EMBPERL_OBJECT_APP directive, and Embperl::Object will retrieve the correct
application file, just in the same way it retrieves other files.

NOTE: When using it with Embperl::Object, don't put a package declaration at
the top of your application object; Embperl::Object assigns its own namespace
to the application object.

In case you need to retrieve a text inside your Perl code, you can do this
with $r -> gettext('bar')
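The lookup order (messages first, then default_messages, each array searched in order) can be modelled without Embperl. lookup_text is a hypothetical helper, not the implementation behind $r -> gettext; returning the id itself when no translation is found is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical model of the lookup order described above: every hash pushed
# onto messages is searched first, then every hash on default_messages.
# Returning the id itself when nothing matches is an assumption of this sketch.
sub lookup_text {
    my ($id, $messages, $default_messages) = @_;
    for my $set (@$messages, @$default_messages) {
        return $set->{$id} if exists $set->{$id};
    }
    return $id;
}

my @messages         = ({ bar => 'Absenden' });                     # 'de', incomplete
my @default_messages = ({ heading => 'Heading', bar => 'Submit' });  # 'en'

print lookup_text('bar',     \@messages, \@default_messages), "\n";    # Absenden
print lookup_text('heading', \@messages, \@default_messages), "\n";    # Heading
```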


-------------------


Enjoy

Gerald