XXX some quick stats:

	import cgi
	print "hello"

plain cgi:

	5 req/s

cgihandler

	40 req/s

handler function

	140 req/s


Httpdapy 1.7b - Dec 1999

Copyright Gregory Trubetskoy <grisha@ispol.com>

Original concept and first code by Aaron Watters from "Internet
Programming with Python" by Aaron Watters, Guido van Rossum and James
C. Ahlstrom, ISBN 1-55851-484-8

*** If you are impatient, skip on to INSTALLATION! ***

OVERVIEW

Httpdapy allows embedding Python within a webserver for a considerable
boost in performance and added flexibility in designing web based
applications.

Currently the only supported http server is Apache.

DOCUMENTATION

At the very least browse through this file, and make sure to read the doc
string at the beginning of the httpdapi.py module. The reason documentation
is in separate places is that httpdapi.py is meant to be server
independent. All Apache-specific (installation and Apache options)
information is in this README file.

HISTORY

While developing my first WWW applications a few years back, I found that
using CGI for programs that need to connect to relational databases
(commercial or not) is too slow because every hit requires loading of the
interpreter executable which can be megabytes in size, any database
libraries that can themselves be pretty big, plus, the database
connection/authentication process carries a very significant overhead
because it involves things like DNS resolutions, encryption, memory
allocation, etc. Under pressure to speed up the application, I nearly
gave up the idea of using Python for the project and started researching
other tools that claimed to specialize in www database integration. I did
not have any faith in MS's ASP; was quite frustrated by Netscape
LiveWire's slow performance and bugginess; Cold Fusion seemed promising,
but I soon learned that writing in html-like tags makes programs as
readable as assembly. Same is true for PHP. Besides, I *really* wanted to
write things in Python.

Around the same time the IPWP book came out and the chapter describing how
to embed Python within Netscape server immediately caught my attention.  
I used the example in my project, and developed an improved version of
what I later called Nsapy that compiled on both Windows NT and Solaris.

Although Nsapy only worked with Netscape servers, it was a very
intelligent generic OO design that, in the spirit of Python, lent
itself to easy porting to other web servers.

Incidentally, the popularity of Netscape's servers was taking a turn
south, and so I set out to port Nsapy to other servers starting with
the most popular one, Apache. And so from Nsapy was born Httpdapy.

Don't ask me how to pronounce it. I don't know, I never had to. If you
have ideas of a catchy and phonetically sensible name, e-mail me.

WHAT'S NEW SINCE THE 0.x VERSIONS

1. ZPublisher (formerly Bobo) support. The httpdapi_publisher module
allows the use of ZPublisher (http://www.digicool.com/site/Bobo/) with
httpdapi. See the module comments to find out how. Note that Zope
(which is a development environment which includes ZPublisher) does
not work reliably with Httpdapy yet. Zope does not have a good locking
mechanism for its persistent object storage, which creates a problem
with Apache, since Apache runs as several separate processes, any one of
which can try to modify the storage. I expect this will be resolved
in the near future by the producers of Zope.

2. This version makes a major leap forward by introducing multiple
interpreters. Scripts running in different directories will run in
(almost) completely separate and clean namespaces. At (Apache) module
initialization, a dictionary of interpreters keyed by interpreter name
is created. An interpreter name can be any string. For every hit,
Httpdapy will use the file's parent directory path from the URI as the
interpreter name. If an interpreter by this name already exists, it
will be used; otherwise it is created. You can also force an interpreter
name with the PythonInterpreter directive (whose effects recurse into
subdirectories). This is useful if you want to share an interpreter
between separate directories.

3. Autoreload mode. It is on by default. It makes the server keep
track of module import time and forces a reload when the script file
change time is later than the time of last import. (Don't confuse this
with Python's default behaviour. In Python, once the module has been
imported, nothing but the "reload" function will make the interpreter
read the file again.)

4. mod_python.c will cd into the directory to which the URL points and
httpdapi.py will prepend a '.' to PYTHONPATH. This means that scripts
can be imported from the current directory. This is more intuitive in
my opinion than knowing that scripts are imported from somewhere in
PYTHONPATH.

For authentication, the current directory is the directory in which
AuthPythonModule was last encountered.

5. URLs no longer have to end with .pye. Httpdapy will now simply cut
off the file extension to obtain the module name.

6. To ease migration from CGI programs, which usually have a lot of
print statements, there are two new functions allowing redirection of
stdout to the socket a la CGI. After a call to self.hook_stdout() all
stdout will be sent to the browser. A call to self.unhook_stdout()
restores the old sys.stdout. Note that the first write to stdout will
cause all the headers to be sent out, and therefore any later header
manipulation (e.g. by code in an overridden Headers() method) is
meaningless. Also note that this is only a hack. I do not recommend
you rely on this feature unless you absolutely have to.

7. The PythonInitFunction directive is now obsolete. It still works for
backwards compatibility, but it forces the interpreter to function in
"single interpreter" mode - i.e. all scripts share the same global
namespace. Note that you can still use separate interpreters in single
mode by forcing an interpreter name with the PythonInterpreter directive.
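
As a rough illustration of the interpreter lookup described above (this is
not the actual mod_python.c or httpdapi.py code), the sketch below keeps a
dictionary keyed by interpreter name and derives the name from the parent
directory of the URI. The names interpreters, interpreter_name and
get_interpreter are made up for this example, and a plain dictionary stands
in for a real Python sub-interpreter. Whether the real code uses the URI path
or the translated filesystem path is not shown in this README, so the URI is
an assumption.

```python
import posixpath

# Hypothetical stand-in: the real module holds Python sub-interpreters
# created through the C API; here each "interpreter" is just a namespace.
interpreters = {}


def interpreter_name(uri, forced_name=None):
    """Return the interpreter name to use for a request URI.

    forced_name imitates a PythonInterpreter directive being in effect.
    """
    if forced_name is not None:
        return forced_name
    # "/mydir/myscript.py" -> "/mydir"
    return posixpath.dirname(uri)


def get_interpreter(uri, forced_name=None):
    """Reuse the interpreter registered under the name, or create one."""
    name = interpreter_name(uri, forced_name)
    if name not in interpreters:
        interpreters[name] = {"__name__": name}
    return interpreters[name]
```

With this sketch, /mydir/myscript.py and /mydir/other.py land in the same
namespace, /hisdir/hisscript.py gets its own, and passing the same
forced_name for two directories makes them share one.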

IS THIS SAME AS PYAPACHE?

No, but they both attempt to solve the same problem - the inefficiency
of CGI. Here are some specific differences:

1. The development process for PyApache and Httpdapy is different. For
PyApache you write CGI scripts. You get your info from the environment
and write to stdout. With Httpdapy you have to inherit from the
httpdapi.RequestHandler() class, and the content is sent by returning
a string rather than writing to stdout.

2. Httpdapy takes advantage of a new feature in Python (since 1.5, I
think) that allows creation of multiple sub-interpreters, each having
its own versions of imported modules, separate sys.modules,
__builtin__, __main__, stdin, stdout, etc. The Python C API
documentation describes it better here:

http://www.python.org/doc/api/initialization.html#l2h-2379

Httpdapy creates separate sub-interpreters for different directories
in which scripts are located. So scripts /mydir/myscript.py and
/hisdir/hisscript.py will run in separate interpreters. (There is also
a way to override this and share sub-interpreters between
directories.)

As far as I understand mod_perl does something similar. PyApache does
not do this. In PyApache, the sub-interpreter is reset (destroyed and
recreated) for every hit, so you don't have different interpreters
running in parallel.

3. PyApache creates a sub-interpreter (Py_NewInterpreter()) for every
request and destroys it when done. This means that if your script
begins with "import HTMLgen", HTMLgen is imported (bytecode read from
file) for every hit.

Httpdapy keeps the interpreter around from the first hit until the
process dies. So only the first hit will actually read HTMLgen.py(c);
subsequent hits won't.

4. While PyApache is written in its entirety in C, Httpdapy only uses
enough C to provide a "link" between Python and the web server. Most
of the actual functionality of Httpdapy is implemented in Python.
Httpdapy's C code imports the module, instantiates a Python object
and from then on delegates handling all of the requests to that Python
object.

This is probably a tad slower than pure C, but it is more flexible this
way and allows for a tighter integration with the user scripts, I think.

5. Httpdapy has a couple of features convenient for developers. It can
write Python traceback prints to the browser and will also re-import
(using the "reload" function) scripts whose file change date is newer
than the time of last import. Before I introduced this feature, one
had to restart the server every time a change to a script was made.

6. The httpdapi_publisher module provides plumbing for Zope. This
still needs some work, but I think this is a very exciting
feature. While Zope provides its own tool for maintaining interpreter
persistence that does not use embedding but instead requires you to
run a separate "server" written in Python, I think embedding the
interpreter within the http server is a better solution.
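
The re-import of changed scripts mentioned above can be pictured with the
sketch below. It is written for a current Python (importlib) rather than
Python 1.5, and it is not the code in httpdapi.py. The names _import_times
and import_fresh are invented for the example, and it assumes every module
it handles is a plain source file with a usable __file__ attribute.

```python
import importlib
import os
import sys
import time

# Hypothetical bookkeeping: module name -> time of the last (re)import.
_import_times = {}


def import_fresh(name):
    """Import module `name`, reloading it if its file changed since then."""
    module = sys.modules.get(name)
    last = _import_times.get(name)
    if module is None or last is None:
        # First hit: a normal import reads the file.
        module = importlib.import_module(name)
    elif os.path.getmtime(module.__file__) > last:
        # File is newer than our last import: force the file to be re-read.
        module = importlib.reload(module)
    else:
        # Unchanged: keep using the module already in memory.
        return module
    _import_times[name] = time.time()
    return module
```

The cost is one stat call on the script file per hit, which is in line with
autoreload being described as only a very slight performance penalty.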

REQUIREMENTS

My testing was done with Python 1.5(.1) and Apache 1.3.3. It worked
on Linux 2.0 and Solaris 2.5.1, and it should work on Windows NT. I
haven't tried compiling this version on NT, but the original 0.1b was
actually first developed on NT.

INSTALLATION

If you want to compile it on Windows NT - you're on your own. It
shouldn't be hard, I just don't feel like writing the instructions for
it right now.

The instructions below describe only the "Configure" method, not the new
"Apaci" Apache configuration method. If you consider yourself a programmer,
then you should feel right at home with "Configure" and a bit lost with
"Apaci". At least I do.

On UNIX, do this:

1. Copy main/mod_python.c into src/modules/extra directory in Apache source
tree.

2. cd into the src directory in Apache source tree.

3. Edit the file Configuration so it has something like the settings
below. Edit EXTRA_LDFLAGS and EXTRA_LIBS to match your system. If you used
additional libraries when compiling Python, e.g. libreadline or
libmysqlclient, they have to be in EXTRA_LIBS.

This worked on Debian Linux:

PY_LIB_DIR=/usr/local/lib/python1.5/config
PY_INC_DIR=/usr/local/include/python1.5

EXTRA_CFLAGS=-Wall
EXTRA_LDFLAGS=-Xlinker -export-dynamic
EXTRA_LIBS=$(PY_LIB_DIR)/libpython1.5.a -lreadline -lncurses -ldl -lm -lpthread
EXTRA_INCLUDES=-I$(PY_INC_DIR)
EXTRA_DEPS=

On FreeBSD 3.3 (Python was installed using the ports collection) EXTRA_LIBS
and EXTRA_LDFLAGS could look like this:

EXTRA_LDFLAGS= -pthread -Xlinker -export-dynamic
EXTRA_LIBS=$(PY_LIB_DIR)/libpython1.5.a -lmytinfo -lreadline -ltermcap -lm -lcrypt

(You may want to try compiling without thread support on FreeBSD, I think there
are some issues between Apache and threads. You'll need Python compiled without
thread support for that. The ports collection Python has thread support enabled.)

On Sun Solaris 7, I used this (no EXTRA_LDFLAGS necessary):

EXTRA_LIBS=$(PY_LIB_DIR)/libpython1.5.a -lm -lthread -lpthread -ltermcap
EXTRA_LDFLAGS=

Then, somewhere down below in the Configuration file, add this:

AddModule modules/extra/mod_python.o

I recommend that it be the first AddModule line in the file, which means
Python processing takes place LAST.

4. Run ./Configure

5. Run make. Now you should have an httpd program to use.

From here on applicable to Windows also:

6. Drop httpdapi.py and apache.py (found in main dir) into your pythonpath
somewhere. An excellent place is /usr/local/lib/site-python.

7. Drop httpdapitest.py and auth.py (found in sample dir) in a
directory visible from the web. Do NOT make the directory where your
scripts live a ScriptAlias!

8. Add this line to .htaccess or wherever else you prefer. NOTE that in
order for AddHandler to work in .htaccess, you need to have AllowOverride
FileInfo, so edit your httpd.conf accordingly:

AddHandler python-program .py

9. Restart the server if necessary. You should now be able to look at

http://myserver/httpdapitest.py

10. Now go read the comments at the top of httpdapi.py file about how to
write your own programs. Enjoy!

AUTHENTICATION

If you want to do authentication via Python, put this in your .htaccess
file:

AuthPythonModule auth <-- replace "auth" with your module name
AuthName "My Realm"
AuthType Basic

<Limit GET>
require valid-user
</Limit>

Make sure to look at auth.py and that it is in your pythonpath. You can
replace auth with authDEBUG in your .htaccess to have the server reload the
module every time - useful in debugging.

TROUBLESHOOTING

It is helpful to realize that httpdapitest.py may be read from your
PYTHONPATH rather than the document directory of the server. To reduce
possible confusion, Httpdapy always prepends a "." to sys.path (unless it
is set explicitly with a PythonOption pythonpath). Also see SECURITY NOTE
below.

If you get a server error, try adding this to your .htaccess:

PythonOption debug 1

This should print some traceback information about your Python error. If it
doesn't, then something went wrong with the installation. Also check your
error_log.

If you're really having problems, edit mod_python.c and set debug
to 1. Then run httpd from the command line with an -X option. This
should print lots of debugging information. Also check the server
error logs for errors.

WHAT OTHER ARGUMENTS DOES PythonOption TAKE?

PythonOption takes two arguments, option name and option value. Whatever
you set by PythonOption will be passed to user scripts as part of the
self.pb parameter block. Options set by PythonOption recurse into
subdirectories. These values currently have special meaning to httpdapi:

* "PythonOption debug 1" Turns debugging on. Default is off.

* "PythonOption autoreload 0" Turns off autoreload mode for a very
slight performance gain. The default is 1 (on).

* "PythonOption rootpkg pkgname" "pkgname" will be prepended to all
module names before they are imported. Good for keeping things
organized, and it provides tighter security.

* "PythonOption handler handlermodule" When a handler is set, all
requests in this directory will be served by handlermodule only,
regardless of what the URL says. This is useful for integration with
Zope. See httpdapi_publisher.py for more details.

* "PythonOption pythonpath path" allows specifying a pythonpath. When
this option is present, Httpdapy will not prepend a "." to the
path. The "path" argument will be processed with "eval" so it should
be formatted accordingly. Here is an example:

PythonOption pythonpath "['.','/usr/lib/python']"
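
To make the pythonpath rule concrete, here is a minimal sketch of the
behaviour as described in this README: if the option is present, its value
is evaluated and used as given; otherwise a "." is prepended. The function
name build_sys_path and the options dictionary are hypothetical and are not
taken from httpdapi.py.

```python
def build_sys_path(current_path, options):
    """Return the module search path for a request.

    current_path is the existing list of directories; options is a
    hypothetical dict mapping PythonOption names to their string values.
    """
    if "pythonpath" in options:
        # The value is processed with eval, so it has to be a Python
        # expression that yields a list of directory names.
        return list(eval(options["pythonpath"]))
    # No explicit path: prepend "." so scripts in the current
    # directory are found first.
    return ["."] + list(current_path)
```

Because eval runs whatever expression it is given, the value should only
ever come from configuration files you control.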

SECURITY NOTE

So what if someone tries to execute python code from the standard
Python library with a malicious intent? Since all the modules are
imported from PYTHONPATH, doesn't that mean that anyone can do
anything by calling the right URLs? The answer is: No, because any
module that does not contain a RequestHandler class will error out.

Still, this is a very valid concern, and I by no means guarantee that
Httpdapy has no security holes, though at this point I am not aware of
any. For tighter security, always use the rootpkg option, as well as
watch carefully what your pythonpath contains.
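
The effect of rootpkg on what a URL can reach can be sketched as follows.
This is an illustration only: the function name module_name_from_uri is
invented, and joining the package and module name with a dot is an
assumption about what "prepended" means here.

```python
import posixpath


def module_name_from_uri(uri, rootpkg=None):
    """Derive the module name to import from a request URI."""
    filename = posixpath.basename(uri)        # "/mydir/myscript.py" -> "myscript.py"
    name = posixpath.splitext(filename)[0]    # cut off the extension -> "myscript"
    if rootpkg:
        # Assumed form: confine every import to one package.
        name = rootpkg + "." + name
    return name
```

Under that assumption a request for /mydir/os.py with rootpkg set to webapps
asks for webapps.os instead of the standard library's os module.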

APACHE NOTE

It is important to understand that Apache runs several processes that
are every once in a while recycled to service requests. This means
that if you assign a value to a variable in a script serviced by one
child process, that variable is not visible from all the others. It
also means that if you do any initialization, it may happen more often
than you might initially expect...
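
The per-process behaviour can be imitated outside Apache with a small
UNIX-only sketch that uses os.fork. Nothing here is Httpdapy code: hits,
handle_request and serve_in_child are made-up names, and each forked child
plays the role of one Apache child process.

```python
import os

hits = 0  # module-level state, as a script might keep it


def handle_request():
    """Pretend to service one request and count it."""
    global hits
    hits += 1
    return hits


def serve_in_child():
    """Fork a child, service one request there, return the child's count."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: works on its own copy of `hits`.
        os.close(read_fd)
        os.write(write_fd, str(handle_request()).encode())
        os._exit(0)
    # Parent: read the child's answer and reap the child.
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return int(data)
```

Every call to serve_in_child() reports a count of 1 and the parent's hits
stays at 0, because each child increments only its own copy. State that has
to be shared between requests belongs in a file or a database.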

Good Luck!

LINUX NOTE

You will encounter problems if your scripts use threads on Linux 2.0. The http
server will appear to hang upon attempts to create new threads. This is because
the LinuxThreads library (libpthread) uses a signal (SIGUSR1) that is also in
use by Apache. You can read more about the use of SIGUSR1 in LinuxThreads in
the LinuxThreads FAQ at http://pauillac.inria.fr/~xleroy/linuxthreads/faq.html.
I understand the issue with lack of signals has been addressed in the 2.1.x
(soon to be 2.2) kernel.

There is no simple resolution to this problem other than not using threads in
your programs. The FAQ suggests changing the LinuxThreads code to
use different signals. I have tried it and it works, but because LinuxThreads
is now part of glibc, compiling the LinuxThreads library means compiling libc. 
To make a long story short, it is not something you want to do unless you
really know what you are doing. A problem with libc may render your system
unusable.