Python API



Multiple Interpreters

When working with mod_python, it is important to be aware of a feature of Python that is normally not used when writing scripts to be run from the command line. This feature is not available from within Python itself (at least in 1.5.2) and can only be accessed through the C language API.

The Python C API provides the ability to create subinterpreters. A more detailed description of a subinterpreter is given in the documentation for the Py_NewInterpreter function. For this discussion, it will suffice to say that each subinterpreter has its own separate namespace, not accessible from other subinterpreters. Subinterpreters are a useful way to make sure that separate programs running under the same Apache server do not "step" on each other.

At server start-up or mod_python initialization time, mod_python initializes the global interpreter. The global interpreter contains a dictionary of subinterpreters. Initially, this dictionary is empty. With every hit, as needed, subinterpreters are created, and references to them are stored in this dictionary. The dictionary is keyed on a string, also known as the interpreter name. This name can be anything except "global_interpreter", which is the name reserved for the global interpreter. The way interpreters are named can be controlled by PythonInterp directives. The default behaviour is to name interpreters using the Apache virtual server name (ServerName directive). This means that all scripts in the same virtual server execute in the same subinterpreter, but scripts in different virtual servers execute in different subinterpreters with completely separate namespaces. The PythonInterpPerDirectory and PythonInterpPerDirective directives alter the naming convention to use the absolute path of the directory being accessed, or the directory in which the Python*Handler was encountered, respectively.
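The naming rules above can be sketched as a small function. This is only an illustration, written in present-day Python so that it can be run on its own. The function interpreter_name and all of its parameters are hypothetical and are not part of mod_python, which makes this choice internally. The precedence given to the two directives when both are set is an arbitrary choice of the sketch.

```python
def interpreter_name(server_name, request_dir, directive_dir,
                     per_directory=False, per_directive=False):
    """Pick a subinterpreter name following the rules described above.

    All parameters are hypothetical stand-ins for values that mod_python
    obtains from Apache.
    """
    if per_directory:
        # PythonInterpPerDirectory: absolute path of the directory accessed
        return request_dir
    if per_directive:
        # PythonInterpPerDirective: directory where Python*Handler was found
        return directive_dir
    # Default: the virtual server name (ServerName directive)
    return server_name


# By default, two scripts in the same virtual server share one name
print(interpreter_name("www.example.com", "/srv/www/app/", "/srv/www/"))
print(interpreter_name("www.example.com", "/srv/www/app/", "/srv/www/",
                       per_directory=True))
```

With the default settings, both calls for the same virtual server would yield the same name, which is why scripts in one virtual server share a namespace.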

Once created, a subinterpreter will be reused for subsequent requests, but it is never destroyed until the Apache child process dies.

Overview of a handler

A handler is a function that processes a particular phase of a request. Apache processes requests in phases: read the request, process headers, provide content, etc. For every phase, it will call handlers, provided by either the Apache core or one of its modules, such as mod_python, which passes control to functions provided by the user and written in Python. A handler written in Python is no different from a handler written in C, and follows these rules:

A handler function will always be passed a reference to a request object.

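Every handler can return a status value to Apache, for example apache.OK, as in the minimalistic handler shown below, or an HTTP error code such as apache.HTTP_FORBIDDEN.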

As an alternative to returning an HTTP error code, handlers can signal an error by raising the apache.SERVER_RETURN exception, and providing an HTTP error code as the exception value, e.g.

          raise apache.SERVER_RETURN, apache.HTTP_FORBIDDEN
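
The effect of raising an exception instead of returning a code can be sketched outside Apache as follows. This is a minimal illustration in present-day Python, not mod_python's own code: ServerReturn, run and the dictionary used as a request are hypothetical stand-ins, and OK is a stand-in for apache.OK.

```python
HTTP_FORBIDDEN = 403  # HTTP status code for "Forbidden"
OK = 0                # stand-in for apache.OK, used only in this sketch


class ServerReturn(Exception):
    """Hypothetical stand-in for apache.SERVER_RETURN."""


def handler(req):
    # req is a plain dictionary here, not a real request object
    if not req.get("authorized"):
        raise ServerReturn(HTTP_FORBIDDEN)
    return OK


def run(handler_func, req):
    """Call a handler and convert a raised ServerReturn into its status."""
    try:
        return handler_func(req)
    except ServerReturn as exc:
        # The exception value carries the HTTP error code
        return exc.args[0]


print(run(handler, {"authorized": False}))
print(run(handler, {"authorized": True}))
```

Either way the caller ends up with a single status value, so raising is mainly a convenience when the error is detected deep inside helper functions.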
        

Handlers can send content to the client using the request.write() function. Before sending the body of the response, headers must be sent using the request.send_http_header() function.

Client data, such as POST requests, can be read by using the req.read() function.

NOTE: The directory of the Apache Python*Handler in effect is prepended to the Python path. If the directive was specified in a server config file outside any <Directory>, then the directory is unknown and is not prepended.

An example of a minimalistic handler might be:

          from mod_python import apache

          def requesthandler(req):
              req.content_type = "text/plain"
              req.send_http_header()
              req.write("Hello World!")
              return apache.OK
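
Since the handler above depends on the apache module, it cannot be run directly from the command line. One possible way to try out handler logic outside Apache is to pass it a stand-in request object. The sketch below is written in present-day Python and makes several assumptions: FakeRequest and the module-level OK are hypothetical, they imitate only the members used by the example above, and raising an error when the body is written before the headers is a choice of this sketch rather than documented mod_python behaviour.

```python
OK = 0  # stand-in for apache.OK, used only in this sketch


class FakeRequest:
    """Hypothetical stand-in for the request object; records what is sent."""

    def __init__(self):
        self.content_type = None
        self.headers_sent = False
        self.body = []

    def send_http_header(self):
        self.headers_sent = True

    def write(self, string):
        # Sketch-only check of the rule that headers precede the body
        if not self.headers_sent:
            raise RuntimeError("headers must be sent before the body")
        self.body.append(string)


def requesthandler(req):
    req.content_type = "text/plain"
    req.send_http_header()
    req.write("Hello World!")
    return OK


req = FakeRequest()
status = requesthandler(req)
print(status, req.content_type, "".join(req.body))
```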

apache module

The Python Application Programmer interface to Apache internals is contained in a module appropriately named apache, located inside the mod_python package. This module provides some important objects that map to Apache internal structures, as well as some useful functions, all documented below.

The apache module can only be imported by a script running under mod_python. This is because it depends on a built-in module _apache provided by mod_python. It is best imported like this:

          from mod_python import apache

Mod_python's apache module defines the following objects and functions. For a more in-depth look at Apache internals, see the Shambhala API Notes.

log_error(message, [level=level], [server=server])
An interface to the Apache ap_log_error function. message is a string with the error message, level is one of the following constants:

                APLOG_EMERG
                APLOG_ALERT
                APLOG_CRIT
                APLOG_ERR
                APLOG_WARNING
                APLOG_NOTICE
                APLOG_INFO
                APLOG_DEBUG
                APLOG_NOERRNO
      
server is a reference to a server object which is passed as a member of the request, request.server. If server is not specified, then the error will be logged to the default error log, otherwise it will be written to the error log for the appropriate virtual server.

make_table()
Returns a new empty table object.

Table Object

The table object is a Python mapping to the Apache table. The table object behaves just like a dictionary, with the only difference being that key lookups are case insensitive.

Much of the information that Apache uses is stored in tables, for example request.headers_in and request.headers_out.

All the tables that mod_python provides inside the request object are actual mappings to the Apache structures, so changing the Python table also changes the underlying Apache table.

In addition to normal dictionary-like behavior, the table object also has an add(string key, string val) method. add() allows for creating duplicate keys, which is useful when multiple headers, such as Set-Cookie, are required.
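
The semantics described above (case-insensitive lookups, plus add() for duplicate keys) can be imitated in pure Python. The class below is a hypothetical stand-in written in present-day Python, not mod_python's table implementation. The get_all helper, and the decision to return the first value on lookup when a key is duplicated, are assumptions made for the sketch.

```python
class Table:
    """Hypothetical pure-Python imitation of the table semantics above."""

    def __init__(self):
        self._items = []  # list of (key, value) pairs, duplicates allowed

    def __setitem__(self, key, value):
        # Assignment replaces every existing entry for the key
        wanted = key.lower()
        self._items = [(k, v) for k, v in self._items if k.lower() != wanted]
        self._items.append((key, value))

    def __getitem__(self, key):
        # Lookups ignore case; the first matching entry wins in this sketch
        wanted = key.lower()
        for k, v in self._items:
            if k.lower() == wanted:
                return v
        raise KeyError(key)

    def add(self, key, value):
        # Unlike assignment, add() keeps existing entries for the key
        self._items.append((key, value))

    def get_all(self, key):
        """Sketch-only helper: every value stored under key."""
        wanted = key.lower()
        return [v for k, v in self._items if k.lower() == wanted]


headers = Table()
headers["Content-Type"] = "text/plain"
headers.add("Set-Cookie", "a=1")
headers.add("Set-Cookie", "b=2")
print(headers["content-type"])
print(headers.get_all("set-cookie"))
```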

Request Object

The request object is a Python mapping to the Apache request_rec structure.

When a handler is invoked, it is always passed a single argument - the request object. Here are the attributes of the request object:

Functions

add_handler(string htype, string handler [,string dir])
Allows dynamic handler registration. htype is the name of any of the Apache Python*Handler directives, e.g. "PythonHandler". handler is the name of the module and the handler function. The optional dir is the name of the directory to be added to the Python path. If no directory is specified, then, if there is already a handler of the same type specified, its directory is inherited; otherwise the directory of the presently executing handler is used.

A handler added this way only persists throughout the life of the request. It is possible to register more handlers while inside the handler of the same type. Take care not to create an infinite loop this way.

Dynamic handler registration is a useful technique that allows the code to decide what will happen next. A typical example might be a PythonAuthenHandler that assigns different PythonHandlers based on the authentication level, something like:

              if manager:
                  req.add_handler("PythonHandler", "menu::admin")
              else:
                  req.add_handler("PythonHandler", "menu::basic")
	    
Note: at this point there is no checking being done on the validity of the handler name. If you pass this function an invalid handler it will simply be ignored.

add_common_vars()
Calls the Apache ap_add_common_vars function. After a call to this function, request.subprocess_env will contain a lot of CGI information.

child_terminate()
Terminate a child process. This should terminate the current child process in a nice fashion.

This function does nothing in multithreaded environments (e.g. Windows).


get_basic_auth_pw()
Returns a string containing the password when basic authentication is used.

get_config()
Returns a reference to the table object containing the configuration in effect for this request. The table has directives as keys, and their values, if any, as values.

get_dirs()
Returns a reference to the table object keyed by directives currently in effect and having directory names of where the particular directive was last encountered as values. For every key in the table returned by get_config(), there will be a key in this table. If the directive was in one of the server config files outside of any <Directory>, then the value will be an empty string.

get_remote_host(int type = apache.REMOTE_NAME)
Returns a string with the remote client's DNS name or IP, or None on failure. The first call to this function may entail a DNS lookup, but subsequent calls will use the cached result from the first call.

The optional type argument can specify the following:

apache.REMOTE_HOST Look up the DNS name. Fail if Apache directive HostNameLookups is off or the hostname cannot be determined.

apache.REMOTE_NAME (Default) Return the DNS name if possible, or the IP (as a string in dotted decimal notation) otherwise.

apache.REMOTE_NOLOOKUP Don't perform a DNS lookup, return an IP. Note: if a lookup was performed prior to this call, then the cached host name is returned.

apache.REMOTE_DOUBLE_REV Force a double-reverse lookup. On failure, return None.


get_options()
Returns a reference to the table object containing the options set by the PythonOption directives.

read(int len)
Reads len bytes directly from the client, returning a string with the data read. When there is nothing more to read, None is returned. To find out how much there is to read, use the Content-length header sent by the client, for example:
            len = int(req.headers_in["content-length"])
            form_data = req.read(len)
	    
This function is affected by the Timeout Apache configuration directive. The read will be aborted and an IOError raised if the Timeout is reached while reading client data.
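
As a rough illustration of reading and decoding POST data, the sketch below combines the Content-length lookup with URL-encoded form parsing. It is written in present-day Python so that it can run outside Apache, which brings some assumptions: FakeRequest and read_form are hypothetical names, headers_in is a plain dictionary rather than a table object, and the body is handled as bytes rather than as a string.

```python
import io
from urllib.parse import parse_qs


class FakeRequest:
    """Hypothetical stand-in exposing only headers_in and read()."""

    def __init__(self, body):
        self._stream = io.BytesIO(body)
        self.headers_in = {"content-length": str(len(body))}

    def read(self, length):
        data = self._stream.read(length)
        return data if data else None  # None when nothing is left to read


def read_form(req):
    """Read the whole request body and parse it as URL-encoded form data."""
    length = int(req.headers_in.get("content-length", 0))
    raw = req.read(length) or b""
    return parse_qs(raw.decode("ascii"))


form = read_form(FakeRequest(b"name=alice&lang=python"))
print(form)
```

Under mod_python itself, a read that exceeds the Timeout would raise IOError as described above, so a real handler may want to wrap the read in a try/except block.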

register_cleanup(callable function, data=None)
Registers a cleanup. Argument function can be any callable object, the optional argument data can be any object. At the very end of the request, just before the actual request record is destroyed by Apache, function function will be called with one argument, data.
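
The calling convention for cleanups can be sketched with a stand-in request object. Everything here is hypothetical and written in present-day Python: FakeRequest imitates only register_cleanup(), the finish() method exists only in this sketch to imitate the end of the request, and the order in which several cleanups would run is not something the sketch tries to model.

```python
class FakeRequest:
    """Hypothetical stand-in that records cleanups and runs them on finish()."""

    def __init__(self):
        self._cleanups = []

    def register_cleanup(self, function, data=None):
        self._cleanups.append((function, data))

    def finish(self):
        # Sketch-only method imitating the very end of the request:
        # each registered function is called with its one argument, data
        for function, data in self._cleanups:
            function(data)
        self._cleanups = []


closed = []


def release(resource_name):
    """Example cleanup: record that a resource was released."""
    closed.append(resource_name)


req = FakeRequest()
req.register_cleanup(release, "db-connection")
req.finish()
print(closed)
```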

send_http_header()
Starts the output from the request by sending the HTTP headers. This function has no effect when called more than once within the same request. Any manipulation of request.headers_out after this function has been called is pointless since the headers have already been sent to the client.

write(string)
Writes string directly to the client, then flushes the buffer.

Other Members

The request object contains most of the members of the underlying request_rec.

connection connection object, RO
A connection object associated with this request. See Connection Object below for details.

server server object, RO
A server object associated with this request. See Server Object below for details.

next request object, RO
If this is an internal redirect, the request object we redirect to.

prev request object, RO
If this is an internal redirect, the request object we redirect from.

main request object, RO
If this is a sub-request, pointer to the main request.

the_request string, RO
First line of the request.

assbackwards int, RO
Is this an HTTP/0.9 "simple" request?

header_only int, RO
HEAD request, as opposed to GET.

protocol string, RO
Protocol, as given by the client, or "HTTP/0.9"

proto_num int, RO
Number version of protocol; 1.1 = 1001

hostname string, RO
Host, as set by full URI or Host:

request_time long, RO
When request started.

status_line string, RO
Status line. E.g. "200 OK".

status int, RW
An integer, whose value will be used in building the status line of the HTTP reply headers.

method string, RO
Method - GET, HEAD, POST, etc.

method_number int, RO
Method number.

allowed int, RO
A bitvector of the allowed methods. Used in relation with METHOD_NOT_ALLOWED.

sent_body int, RO
Byte count in stream is for body. (?)

bytes_sent long, RO
Bytes sent.

mtime long, RO
Time the resource was last modified.

boundary string, RO
Multipart/byteranges boundary.

range string, RO
The Range: header.

clength long, RO
The "real" content length. (I.e. can only be used after the content's been read?)

remaining long, RO
Bytes left to read. (Only makes sense inside a read operation.)

read_length long, RO
Bytes that have been read.

read_body int, RO
How the request body should be read. (?)

read_chunked int, RO
Read chunked transfer coding.

headers_in table
A table object containing the headers sent by the client.

headers_out
A table object representing the headers to be sent to the client. Note that manipulating this table after request.send_http_header() has been called is meaningless, since the headers have already gone out to the client.

err_headers_out table
These headers are sent with the error response, instead of headers_out.

subprocess_env table
A table representing the subprocess environment. See also request.add_common_vars().

notes table
A place-holder for request-specific information to be used by modules.

content_type string, RW
A string, representing the response content type.

headers_out table
Headers going out to the client.

handler string, RO
The name of the handler currently being processed. In all cases with mod_python, this should be "python-program".

content_encoding string, RO
Content encoding

vlist_validator string, RO
Variant list validator (if negotiated)

no_cache int, RO
No cache.

no_local_copy int, RO
No local copy exists.

unparsed_uri string, RO
The URI without any parsing performed.

uri string, RO
The path portion of the URI

filename string, RO
The file name being requested.

path_info string, RO
What follows after the file name.

args string, RO
QUERY_ARGS, if any
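
The proto_num member listed above encodes the protocol version as major * 1000 + minor, which is how 1.1 becomes 1001. The hypothetical helper below, written in present-day Python and not part of mod_python, reproduces that arithmetic from a protocol string of the form held in the protocol member.

```python
def proto_num(protocol):
    """Convert a protocol string such as "HTTP/1.1" to major*1000 + minor."""
    version = protocol.split("/", 1)[1]
    major, minor = version.split(".")
    return int(major) * 1000 + int(minor)


for name in ("HTTP/0.9", "HTTP/1.0", "HTTP/1.1"):
    print(name, proto_num(name))
```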

Connection Object

The connection object is a Python mapping to the Apache conn_rec structure.

server server object, RO
A server object associated with this connection. See Server Object below for details.

base_server server object, RO
A server object for the physical vhost that this connection came in through.

child_num int, RO
The number of the child handling the request.

local_addr tuple, RO
The (address, port) tuple for the server.

remote_addr tuple, RO
The (address, port) tuple for the client.

remote_ip string, RO
The IP of the client.

remote_host string, RO
The DNS name of the remote client. None if DNS has not been checked, "" (empty string) if no name found.

remote_logname string, RO
Remote name if using RFC1413 (ident).

user string, RO
If an authentication check is made, this will hold the user name. NOTE: You must call get_basic_auth_pw() before using this value.

ap_auth_type string, RO
Authentication type. (None == basic?)

keepalives int, RO
The number of times this connection has been used. (?)

local_ip string, RO
The IP of the server.

local_host string, RO
The DNS name of the server.

Server Object

The server object is a Python mapping to the Apache server_rec structure. The server structure describes the server (possibly virtual server) serving the request.

Functions


register_cleanup(request, callable function, data=None)
Registers a cleanup. Very similar to req.register_cleanup(), except this cleanup will be executed at child termination time. This function requires one extra argument - the request object.

Other Members

defn_name string, RO
The name of the configuration file where the server definition was found.

defn_line_number int, RO
Line number in the config file where the server definition is found.

srm_confname string, RO
Location of the srm config file.

server_admin string, RO
Value of the ServerAdmin directive.

server_hostname string, RO
Value of the ServerName directive.

port int, RO
TCP/IP port number.

error_fname string, RO
The name of the error log file for this server, if any.

loglevel int, RO
Logging level.

is_virtual int, RO
1 if this is a virtual server.

timeout int, RO
Timeout before we give up.

keep_alive_timeout int, RO
Keep-Alive timeout.

keep_alive_max int, RO
Maximum number of requests per Keep-Alive.

keep_alive int, RO
1 if keep-alive is on.

send_buffer_size int, RO
Size of the TCP send buffer.

path string, RO
Path for ServerPath.

pathlen int, RO
Path length.

server_uid int, RO
UID under which the server is running.

server_gid int, RO
GID under which the server is running.

Debugging

Mod_python supports the ability to execute handlers within the Python debugger (pdb) via the PythonEnablePdb Apache directive. Since the debugger is an interactive tool, httpd must be invoked with the -X option. (NB: When pdb starts, you will not see the usual ">>>" prompt. Just type in the pdb commands as you would if there were one.)

Internal Callback Object

The Apache server interfaces with the Python interpreter via a callback object, obCallBack. When a subinterpreter is created, an instance of obCallBack is created in this subinterpreter. Interestingly, obCallBack is not written in C; it is written in Python, and the code for it is in the apache module. Mod_python only uses the C API to import apache and then instantiate obCallBack, storing a reference to the instance in the interpreter dictionary described above. Thus, the values in the interpreter dictionary are callback object instances.
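
The relationship between interpreter names and callback instances can be pictured as a dictionary that is filled in on demand. The sketch below is a pure-Python model in present-day syntax, not the actual mechanism, which lives in C and involves real subinterpreters. The names CallBack, interpreters and get_callback are hypothetical.

```python
class CallBack:
    """Hypothetical stand-in for the callback object described above."""

    def __init__(self, interpreter_name):
        self.interpreter_name = interpreter_name


# Maps interpreter name -> callback instance; starts out empty
interpreters = {}


def get_callback(name):
    """Return the callback for name, creating it on first use."""
    if name not in interpreters:
        interpreters[name] = CallBack(name)
    return interpreters[name]


first = get_callback("www.example.com")
second = get_callback("www.example.com")
print(first is second)
```

Because the instance is created once and then reused, any state held by it would persist across requests served by the same child process.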

When a request handler is invoked by Apache, mod_python uses the obCallBack reference to call its method Dispatch, passing it the name of the handler being invoked as a string.

The Dispatch method then does the rest of the work: it imports the user module, resolves the callable object in it, and calls it with a request object as the argument.
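
The import, resolve and call steps can be sketched as follows. This is an assumption-laden illustration in present-day Python rather than the code of the Dispatch method: resolve_handler and dispatch are hypothetical names, the "module::function" form is borrowed from the add_handler example earlier in this document, and a standard-library function stands in for a user handler so that the sketch can run outside Apache.

```python
import importlib


def resolve_handler(spec):
    """Import the module named in "module::function" and return the callable."""
    module_name, _, func_name = spec.partition("::")
    if not func_name:
        raise ValueError("expected 'module::function', got %r" % spec)
    module = importlib.import_module(module_name)
    func = getattr(module, func_name)
    if not callable(func):
        raise TypeError("%s is not callable" % spec)
    return func


def dispatch(spec, request):
    """Resolve the handler and call it with the request object."""
    return resolve_handler(spec)(request)


# A standard-library function and a string stand in for handler and request
print(dispatch("os.path::basename", "/srv/www/index.py"))
```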


Last modified: Wed Oct 18 11:34:43 EDT 2000