XXX some quick stats:

    import cgi
    print "hello"

    plain cgi:           5 req/s
    cgihandler:         40 req/s
    handler function:  140 req/s


Httpdapy 1.7b - Dec 1999

Copyright Gregory Trubetskoy

Original concept and first code by Aaron Watters from "Internet
Programming with Python" by Aaron Watters, Guido van Rossum and
James C. Ahlstrom, ISBN 1-55851-484-8

*** If you are impatient, skip on to INSTALLATION! ***

OVERVIEW

Httpdapy allows embedding Python within a webserver for a considerable
boost in performance and added flexibility in designing web based
applications.

Currently the only supported http server is Apache.

DOCUMENTATION

At the very least browse through this file, and make sure to read the
doc string at the beginning of the httpdapi.py module. The documentation
is kept in separate places because httpdapi.py is meant to be server
independent. All Apache-specific information (installation and Apache
options) is in this README file.

HISTORY

While developing my first WWW applications a few years back, I found
that using CGI for programs that need to connect to relational
databases (commercial or not) is too slow. Every hit requires loading
the interpreter executable, which can be megabytes in size, plus any
database libraries, which can themselves be pretty big. On top of that,
the database connection/authentication process carries a very
significant overhead because it involves things like DNS resolutions,
encryption, memory allocation, etc.

Under pressure to speed up the application, I nearly gave up the idea
of using Python for the project and started researching other tools
that claimed to specialize in www database integration. I did not have
any faith in MS's ASP; I was quite frustrated by Netscape LiveWire's
slow performance and bugginess; Cold Fusion seemed promising, but I
soon learned that writing in html-like tags makes programs as readable
as assembly. The same is true for PHP. Besides, I *really* wanted to
write things in Python.

Around the same time the IPWP book came out, and the chapter describing
how to embed Python within the Netscape server immediately caught my
attention. I used the example in my project, and developed an improved
version of what I later called Nsapy that compiled on both Windows NT
and Solaris.

Although Nsapy only worked with Netscape servers, it was a very
intelligent generic OO design that, in the spirit of Python, lent
itself to easy portability to other web servers.

Incidentally, the popularity of Netscape's servers was taking a turn
south, and so I set out to port Nsapy to other servers, starting with
the most popular one, Apache. And so from Nsapy was born Httpdapy.

Don't ask me how to pronounce it. I don't know, I never had to. If you
have ideas for a catchy and phonetically sensible name, e-mail me.

WHAT'S NEW SINCE THE 0.x VERSIONS

1. ZPublisher (formerly Bobo) support. The httpdapi_publisher module
   allows the use of ZPublisher (http://www.digicool.com/site/Bobo/)
   with httpdapi. See the module comments to find out how.

   Note that Zope (a development environment that includes ZPublisher)
   does not work reliably with Httpdapy yet. Zope does not have a good
   locking mechanism for its persistent object storage. This creates a
   problem with Apache, since Apache runs as several separate
   processes, any one of which can try to modify the storage. I expect
   this will be resolved in the near future by the producers of Zope.

2. This version makes a major leap forward by introducing multiple
   interpreters. Scripts running in different directories will run in
   (almost) completely separate and clean namespaces.

   At (Apache) module initialization, a dictionary of interpreters
   keyed by interpreter name is created. An interpreter name can be any
   string. For every hit, Httpdapy will use the file's parent directory
   path from the URI as the interpreter name. If an interpreter by this
   name already exists, it will be used, else it is created.

   You can also force an interpreter name with the PythonInterpreter
   directive (whose effects recurse into subdirectories). This is
   useful if you want to share an interpreter between separate
   directories.

3. Autoreload mode. It is on by default. It makes the server keep
   track of module import time and forces a reload when the script
   file change time is later than the time of last import. (Don't
   confuse this with Python's default behaviour. In Python, once a
   module has been imported, nothing but the reload() function will
   make the interpreter read the file again.)

4. mod_python.c will cd into the directory to which the URL points and
   httpdapi.py will prepend a '.' to PYTHONPATH. This means that
   scripts can be imported from the current directory. This is more
   intuitive in my opinion than knowing that scripts are imported from
   somewhere in PYTHONPATH.

   For authentication, the current directory is the directory in which
   AuthPythonModule was last encountered.

5. URLs no longer have to end with .pye. Httpdapy will now simply cut
   off the file extension to obtain the module name.

6. To ease migration from CGI programs, which usually have a lot of
   print statements, there are two new functions allowing redirection
   of stdout to the socket a la CGI. After a call to
   self.hook_stdout() all stdout will be sent to the browser. A call
   to self.unhook_stdout() restores the old sys.stdout.

   Note that the first write to stdout will cause all the headers to
   be sent out, and therefore any later header manipulation (e.g. by
   code in an overridden Headers() method) is meaningless.

   Also note that this is only a hack. I do not recommend you rely on
   this feature unless you absolutely have to.

7. The PythonInitFunction directive is now obsolete. It still works
   for backwards compatibility, but it forces the interpreter to
   function in "single interpreter" mode - i.e. all scripts share the
   same global namespace.
   Note that you can still use separate interpreters in single mode by
   forcing an interpreter name with the PythonInterpreter directive.

IS THIS SAME AS PYAPACHE?

No, but they both attempt to solve the same problem - the inefficiency
of CGI. Here are some specific differences:

1. The development process for PyApache and Httpdapy is different. For
   PyApache you write CGI scripts. You get your info from the
   environment and write to stdout. With Httpdapy you have to inherit
   from the httpdapi.RequestHandler class, and the content is sent by
   returning a string rather than writing to stdout.

2. Httpdapy takes advantage of a new feature in Python (since 1.5 I
   think) that allows creation of multiple sub-interpreters, each
   having its own versions of imported modules, separate sys.modules,
   __builtin__, __main__, stdin, stdout, etc. The Python C API
   documentation describes it better here:

   http://www.python.org/doc/api/initialization.html#l2h-2379

   Httpdapy creates separate sub-interpreters for different
   directories in which scripts are located. So scripts
   /mydir/myscript.py and /hisdir/hisscript.py will run in separate
   interpreters. (There is also a way to override this and share
   sub-interpreters between directories.) As far as I understand,
   mod_perl does something similar.

   PyApache does not do this. It creates a sub-interpreter
   (Py_NewInterpreter()) for every request and destroys it when done,
   so you don't have different interpreters running in parallel. This
   also means that if your script begins with "import HTMLgen",
   HTMLgen is imported (bytecode read from file) for every hit.
   Httpdapy keeps the interpreter around from the first hit until the
   process dies, so only the first hit will actually read
   HTMLgen.py(c); the subsequent hits won't.

3. While PyApache is written in its entirety in C, Httpdapy only uses
   enough C to provide a "link" between Python and the web server.
   Most of the actual functionality of Httpdapy is implemented in
   Python. Httpdapy's C code imports the module, instantiates a Python
   object and from then on delegates the handling of all requests to
   that Python object. This is probably a tad slower than pure C, but
   it is more flexible this way and, I think, allows for a tighter
   integration with the user scripts.

4. Httpdapy has a couple of features convenient for developers. It can
   write Python traceback prints to the browser, and it will also
   re-import (using the reload() function) scripts whose file change
   date is newer than the time of last import. Before I introduced
   this feature, one had to restart the server every time a change to
   a script was made.

5. The httpdapi_publisher module provides plumbing for Zope. This
   still needs some work, but I think this is a very exciting feature.
   Zope provides its own tool for maintaining interpreter persistence,
   which does not use embedding but instead requires you to run a
   separate "server" written in Python. I think embedding the
   interpreter within the http server is a better solution.

REQUIREMENTS

My testing was done with Python 1.5(.1) and Apache 1.3.3. It worked on
Linux 2.0 and Solaris 2.5.1, and it should work on Windows NT. I
haven't tried compiling this version on NT, but the original 0.1b was
actually first developed on NT.

INSTALLATION

If you want to compile it on Windows NT - you're on your own. It
shouldn't be hard, I just don't feel like writing the instructions for
it right now.

The instructions below describe only the "Configure" method, not the
new "Apaci" Apache configuration method. If you consider yourself a
programmer, then you should feel right at home with "Configure" and a
bit lost with "Apaci". At least I do.

On UNIX, do this:

1. Copy main/mod_python.c into the src/modules/extra directory in the
   Apache source tree.

2. cd into the src directory in the Apache source tree.

3. Edit the file Configuration so it has something like the settings
   below.
   Edit EXTRA_LDFLAGS and EXTRA_LIBS to match your system. If you used
   additional libraries when compiling Python, e.g. libreadline or
   libmysqlclient, they have to be in EXTRA_LIBS.

   This worked on Debian Linux:

       PY_LIB_DIR=/usr/local/lib/python1.5/config
       PY_INC_DIR=/usr/local/include/python1.5

       EXTRA_CFLAGS=-Wall
       EXTRA_LDFLAGS=-Xlinker -export-dynamic
       EXTRA_LIBS=$(PY_LIB_DIR)/libpython1.5.a -lreadline -lncurses -ldl -lm -lpthread
       EXTRA_INCLUDES=-I$(PY_INC_DIR)
       EXTRA_DEPS=

   On FreeBSD 3.3 (Python was installed using the ports collection)
   EXTRA_LIBS and EXTRA_LDFLAGS could look like this:

       EXTRA_LDFLAGS= -pthread -Xlinker -export-dynamic
       EXTRA_LIBS=$(PY_LIB_DIR)/libpython1.5.a -lmytinfo -lreadline -ltermcap -lm -lcrypt

   (You may want to try compiling without thread support on FreeBSD; I
   think there are some issues between Apache and threads. You'll need
   Python compiled without thread support for that. The ports
   collection Python has thread support enabled.)

   On Sun Solaris 7, I used this (no EXTRA_LDFLAGS necessary):

       EXTRA_LIBS=$(PY_LIB_DIR)/libpython1.5.a -lm -lthread -lpthread -ltermcap
       EXTRA_LDFLAGS=

   Then, somewhere down below in the Configuration file, add this:

       AddModule modules/extra/mod_python.o

   I recommend that it be the first AddModule line in the file, which
   means Python processing takes place LAST.

4. Run ./Configure

5. Run make. Now you should have an httpd program to use.

From here on, the steps apply to Windows also:

6. Drop httpdapi.py and apache.py (found in the main dir) into your
   pythonpath somewhere. An excellent place is
   /usr/local/lib/site-python.

7. Drop httpdapitest.py and auth.py (found in the sample dir) in a
   directory visible from the web. Do NOT make the directory where
   your scripts are a ScriptAlias!

8. Add this line to .htaccess or wherever else you prefer. NOTE that
   in order for AddHandler to work in .htaccess, you need to have
   AllowOverride FileInfo, so edit your httpd.conf accordingly:

       AddHandler python-program .py

9. Restart the server if necessary.
   You should now be able to look at

       http://myserver/httpdapitest.py

10. Now go read the comments at the top of the httpdapi.py file about
    how to write your own programs. Enjoy!

AUTHENTICATION

If you want to do authentication via Python, put this in your
.htaccess file:

    AuthPythonModule auth    <-- replace "auth" with your module name
    AuthName "My Realm"
    AuthType Basic

    require valid-user

Make sure to look at auth.py and check that it is in your pythonpath.
You can replace auth with authDEBUG in your .htaccess to have the
server reload the module every time - useful in debugging.

TROUBLESHOOTING

It is helpful to realize that httpdapitest.py may be read from your
PYTHONPATH rather than the document directory of the server. To reduce
possible confusion, Httpdapy always prepends a "." to sys.path (unless
it is set explicitly with a PythonOption pythonpath). Also see SECURITY
NOTE below.

If you get a server error, try adding this to your .htaccess:

    PythonOption debug 1

This should print some traceback information about your Python error.
If it doesn't, then something went wrong with the installation. Also
check your error_log.

If you're really having problems, edit mod_python.c and set debug to 1.
Then run httpd from the command line with the -X option. This should
print lots of debugging information. Also check the server error logs
for errors.

WHAT OTHER ARGUMENTS does PythonOption take?

PythonOption takes two arguments, an option name and an option value.
Whatever you set by PythonOption will be passed to user scripts as part
of the self.pb parameter block. Options set by PythonOption recurse
into subdirectories.

These values currently have special meaning to httpdapi:

* "PythonOption debug 1"
  Turns debugging on. The default is off.

* "PythonOption autoreload 0"
  Turns off autoreload mode for a very slight performance gain. The
  default is 1 (on).

* "PythonOption rootpkg pkgname"
  "pkgname" will be prepended to all module names before they are
  imported.
  Good for keeping things organized, and it provides tighter security.

* "PythonOption handler handlermodule"
  When a handler is set, all requests in this directory will be served
  by handlermodule only, regardless of what the URL says. This is
  useful for integration with Zope. See httpdapi_publisher.py for more
  details.

* "PythonOption pythonpath path"
  Allows specifying a pythonpath. When this option is present,
  Httpdapy will not prepend a "." to the path. The "path" argument
  will be processed with "eval", so it should be formatted
  accordingly. Here is an example:

      PythonOption pythonpath "['.','/usr/lib/python']"

SECURITY NOTE

So what if someone tries to execute Python code from the standard
Python library with malicious intent? Since all the modules are
imported from PYTHONPATH, doesn't that mean that anyone can do anything
by calling the right URLs? The answer is: No, because any module that
does not contain a RequestHandler class will error out.

Still, this is a very valid concern, and I by no means guarantee that
Httpdapy has no security holes, though at this point I am not aware of
any. For tighter security, always use the rootpkg option, and watch
carefully what your pythonpath contains.

APACHE NOTE

It is important to understand that Apache runs several child processes
to service requests, and that these processes are recycled every once
in a while. This means that if you assign a value to a variable in a
script serviced by one child process, that variable is not visible from
the others. It also means that if you do any initialization, it may
happen more often than you might initially expect...

Good Luck!

Linux Note:

You will encounter problems if your scripts use threads on Linux 2.0.
The http server will appear to hang upon attempts to create new
threads. This is because the LinuxThreads library (libpthread) uses a
signal (SIGUSR1) that is also in use by Apache.
You can read more about the use of SIGUSR1 in LinuxThreads in the
LinuxThreads FAQ at
http://pauillac.inria.fr/~xleroy/linuxthreads/faq.html.
I understand the issue with the lack of signals has been addressed in
the 2.1.x (soon to be 2.2) kernel.

There is no simple resolution to this problem other than not using
threads in your programs. The FAQ suggests changing the LinuxThreads
code to use different signals. I have tried it and it works, but
because LinuxThreads is now part of glibc, compiling the LinuxThreads
library means compiling libc. To make a long story short, it is not
something you want to do unless you really know what you are doing. A
problem with libc may render your system unusable.
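
As a footnote to the multiple-interpreter scheme described under
WHAT'S NEW SINCE THE 0.x VERSIONS, here is a rough sketch of the
lookup rule: the parent directory of the URI is the interpreter name
unless a name is forced, and an interpreter is created only the first
time its name is seen. This is NOT the code from mod_python.c. It is
written in present-day Python, the class InterpreterRegistry and its
lookup() method are made-up names, and each "interpreter" is modeled
as a plain dictionary standing in for a real sub-interpreter.

```python
import posixpath


class InterpreterRegistry:
    """Hypothetical model of the dictionary of interpreters keyed by name."""

    def __init__(self):
        # interpreter name -> stand-in namespace (a real server would hold
        # a sub-interpreter here, not a dict)
        self._interpreters = {}

    def lookup(self, uri, forced_name=None):
        """Return (name, interpreter) for a request URI.

        forced_name models a name forced by the PythonInterpreter
        directive; without it, the parent directory path of the URI is
        used as the interpreter name.
        """
        if forced_name is not None:
            name = forced_name
        else:
            name = posixpath.dirname(uri)
        # Reuse an existing interpreter by this name, else create one.
        if name not in self._interpreters:
            self._interpreters[name] = {}
        return name, self._interpreters[name]


registry = InterpreterRegistry()
name_a, interp_a = registry.lookup("/mydir/myscript.py")
name_b, interp_b = registry.lookup("/hisdir/hisscript.py")
name_c, interp_c = registry.lookup("/mydir/other.py")
```

In this sketch the two scripts under /mydir share one namespace while
/hisdir gets its own, and passing the same forced_name from two
different directories makes them share an interpreter.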