IPC/SCOREBOARD DISCUSSION POINTS:

  (These notes are based purely on my own observations and thoughts, and
   may need to be corrected/revised/deleted; please do so!  [chrisd])

  * Update: [chrisd, 2007-01-29]
    - the socache API in 2.4/3.0 trunk implements much of what was
      suggested below for a hash API:
      - thanks to jorton
    - the slotmem API in 2.4/3.0 trunk implements much of what was
      suggested below for a tables/slots API:
      - thanks to jim and others

  * Historical review: [chrisd]
    - several long previous discussions on mailing lists, including:
      http://mail-archives.apache.org/mod_mbox/httpd-dev/199901.mbox/%3c19990124134912.A29193@engelschall.com%3e
      http://mail-archives.apache.org/mod_mbox/apr-dev/200201.mbox/%3c20020107103123.H1529@clove.org%3e
    - the preferred form of scoreboard storage is an anonymous virtual
      memory mapping using mmap():
      - this has the advantage over System V shmget() that when all
        processes exit, memory is reclaimed automatically
      - the lack of a backing store (i.e., file) when using anonymous
        mappings may improve performance
    - to improve performance and avoid contention, no locks are used:
      - the APR-util RMM API is therefore not used, as it requires locks
      - each process and/or thread is assigned a unique pair of indices
        (these are neither the system pid nor tid)
      - to support multi-process, multi-thread MPMs (e.g., worker),
        the scoreboard is allocated as a 2-D table of slots
      - each child process and/or thread writes to its own slot,
        addressed by its indices
      - the table is sparsely populated, because most indices are unused
      - under normal operation, only a single writer exists for each slot
      - readers (including the master process/thread, and non-MPM
        modules such as mod_status) expect partially-written data
    - shared memory can not normally be resized, at least not once it is
      shared between multiple processes, so the scoreboard table is
      allocated only once, at startup:
      - the size is based on compile-time maximum process and thread
        limits
      - these are large, leading to a fairly sparse table
      - slots are relatively small, to attempt to reduce scoreboard size
    - when the server is restarted, the scoreboard is reused:
      - the MPM's master process/thread normally watches for exiting
        processes/threads and reassigns their indices to new children
      - errors may cause a "long-lost" child to continue writing to a
        reassigned slot
      - multiple restarts may lead to multiple generations of children,
        especially when using graceful restarts
    - because modules may be added/removed at each restart, the
      scoreboard can not be shared with them, except as readers:
      - no way to resize the scoreboard to accommodate the needs of a
        module added at a restart (nor to release memory allocated to
        a module being unloaded)
      - certain modules in the httpd distribution (e.g., mod_proxy) do
        use the scoreboard, through custom extensions unavailable to
        third-party modules

  * High-level questions (and provisional answers): [chrisd]
    - is it still advisable to avoid locking?
      - yes, at least with shared memory, since this is the default and
        is intended to provide good performance for typical users
      - alternative special-purpose storage providers might
        provide/require locking, but users can be expected to be aware
        of the tradeoffs
    - is it still advisable to reuse a single block of shared memory
      after restarts?
      - this might be reconsidered, if the tradeoffs were acceptable
      - for example, suppose the MPM master process/thread allocated a
        new scoreboard on restart, and marked the old one stale by
        setting a flag bit inside it:
        - previous generations of children would attempt to exit as
          soon as the MPM detected the flag (in addition to usual
          pipe-of-death or other parent-child signalling)
        - once all children in a previous generation had exited, master
          would release old scoreboard
        - previous generations of children would still see their
          original scoreboard and not the new one:
          - scoreboard/IPC API would detect stale flag and signal
            callers through error return values, or at a minimum
            callers should test for staleness with a call prior to
            using scoreboard
          - long-lost children would, hopefully, exit once they
            detected the condition
        - during graceful restarts in particular, currently executing
          requests using modules that wrote private scoreboard data
          would succeed, but data would be absent from new scoreboard
          and lost when child exited; modules would have to expect this
        - would need to be tested with all flavours of shared memory
          used internally by APR, including mmap(), shm_open(),
          shmget(), named, anonymous, /dev/zero, etc.
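
    Sketch of the stale-flag idea above, as a rough illustration only.
    This is not existing httpd or APR code: every name here (sb_block,
    sb_set, sb_restart, the SB_* codes) is hypothetical, and the block
    lives in ordinary heap memory rather than a shared mapping so that
    the example stays self-contained. It assumes C11 atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical return codes; not APR status values. */
enum { SB_OK = 0, SB_ESTALE = 1, SB_ERANGE = 2 };

/* One scoreboard block per restart generation (hypothetical layout). */
typedef struct {
    atomic_int stale;   /* 0 = current, 1 = superseded by a restart */
    int generation;     /* restart generation that owns this block */
    int nslots;         /* number of per-worker slots */
    int slots[];        /* one status word per worker, single writer each */
} sb_block;

/* Allocate a zeroed block; heap memory stands in for shared memory. */
sb_block *sb_create(int generation, int nslots)
{
    assert(nslots > 0);
    sb_block *sb = calloc(1, sizeof(*sb) + (size_t)nslots * sizeof(int));
    if (sb == NULL)
        return NULL;
    atomic_init(&sb->stale, 0);
    sb->generation = generation;
    sb->nslots = nslots;
    return sb;
}

/* Lets callers poll for staleness before touching the scoreboard. */
int sb_is_stale(sb_block *sb)
{
    return atomic_load(&sb->stale);
}

/* Write a worker's status; refuses once the block has been marked stale. */
int sb_set(sb_block *sb, int slot, int status)
{
    if (slot < 0 || slot >= sb->nslots)
        return SB_ERANGE;
    if (sb_is_stale(sb))
        return SB_ESTALE;   /* caller belongs to an old generation */
    sb->slots[slot] = status;
    return SB_OK;
}

/* Master side of a restart: create the new block, then flag the old one. */
sb_block *sb_restart(sb_block *old, int nslots)
{
    sb_block *fresh = sb_create(old->generation + 1, nslots);
    if (fresh != NULL)
        atomic_store(&old->stale, 1);
    return fresh;
}
```

    The check in sb_set() is best-effort: a write may still race with
    the master setting the flag, which is in keeping with the lock-free
    approach above. The master would free the old block only after every
    child of that generation had exited.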

  * Provide a generic IPC storage API: [jim, chrisd]
    - support two types of storage, hashes and tables/slots
    - key/value hash functions (simplified from APR hashes):
      - get, set (also unsets if value null)
      - count
      - first, next, this
    - table functions (simplified from APR tables):
      - get (takes integer index, returns value)
      - set (takes integer index and/or value, unsets if value null,
        assigns and returns index if input index null; the two
        arguments may not both be null)
    - management functions:
      - create storage block
      - resize storage block
      - destroy storage block
      - create hash in block
      - destroy hash
      - create table in block (takes value size and maximum index)
      - destroy table
      - set hash memory limit (must be less than block size minus
        total table allocations)
      - query if provider supports resizing (returns true or false)
      - query if provider supports atomic table operations
    - hash functions must be atomic:
      - as a result, hash functions may not be efficient
      - callers should expect to handle APR_ENOMEM return codes from
        most functions
    - table functions may be non-atomic; in this case:
      - caller is responsible for ensuring writers either do not
        conflict or are assigned private indices
      - readers should not expect atomic reads unless the caller is
        doing its own locking to ensure atomicity
      - some storage providers may internally implement table functions
        using hash functions, in which case table-related management
        functions are essentially no-ops, and table access functions
        become atomic
    - notes on management functions:
      - block creation/deletion functions may be no-ops for some
        providers
      - block resizing function may return APR_ENOTIMPL if resizing is
        not supported
      - storage blocks may only be usable by calling process/thread and
        any future children, and not by existing children or external
        processes/threads
      - callers should expect to handle APR_ENOMEM return codes from
        most functions
    - possible storage providers:
      - for single-process servers, APR memory pools and wrappers
        around APR hashes are sufficient:
        - thread mutexes required to ensure atomicity
        - hash and table functions may be wrappers around APR hash and
          table functions
        - provider can support resizing
      - for complex distributed systems, a provider might function like
        Google's Chubby lock server (or a non-caching memcached?):
        - table functions may be wrappers around hash functions
        - provider can support resizing
      - the "intermediate" case is for multi-process/thread servers:
        - allocate block of shared memory at init time based on
          requested tables and hash memory limit
        - provider can not support resizing, so new blocks must be
          allocated at restart time and old ones destroyed when all
          children using them have exited
        - hash functions implemented using APR RMM allocators
        - use of RMM might require modifications to APR pools and/or
          hashes, if using them with RMM allocators
      - alternatively, APR DBM or DBD backends could be used in other
        providers

  * Provide a scoreboard/IPC API: [chrisd]
    - the scoreboard is implemented using table functions:
      - indexes should contain instance ID + process ID + thread ID
      - for now, instance ID will be 0; in the future this might allow
        multiple distributed instances of httpd to share a scoreboard
    - during pre/check/post-config phases, modules (including MPMs) may
      indicate if they need IPC space, what type, and how much:
      - private space in scoreboard table values
      - additional scoreboard states
      - private tables
      - private hashes
    - core config directives set total table and hash memory limits:
      - throw errors during post-config if total requested memory by
        modules exceeds these
      - if no limits are specified, and storage provider can not
        resize allocations, use a "best guess" based on current module
        requirements multiplied by a fudge factor
    - at startup, the master process initializes the storage provider:
      - MPM sizes scoreboard based on runtime process and thread
        limits, not compile-time maximums
      - assigns IDs for additional scoreboard states requested by
        modules
      - creates scoreboard state-to-ID hash mappings in regular memory
        as part of read-only configuration data inherited by children
      - creates tables and hashes requested by modules
    - modules may use their private tables and hashes as needed
    - during restarts, master process:
      - re-reads config files
      - determines new storage requirements
      - if the storage provider does support resizing:
        - makes request to storage provider for resized block
        - creates/destroys appropriate hashes and tables
      - if the storage provider does not support resizing (i.e.,
        default shared memory case):
        - makes request to storage provider for new block
        - sets "stale" flag in storage block for previous generation
        - when all children of an earlier generation have exited,
          destroys that generation's storage block
        - all IPC functions could check "stale" flag, and if set,
          return error code to callers indicating they need to exit
        - or, alternately, an IPC function could return this flag's
          state, and callers should be encouraged to use it regularly
        - possibly, the server should not support graceful restarts,
          to minimize long-lost child problems?
    - other possible users of the IPC API:
      - mod_ssl
      - mod_proxy
      - mod_auth_digest
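
    Sketch of the table get/set semantics proposed in the generic IPC
    storage API section above, as a rough illustration only. None of
    this is existing httpd or APR API: ipc_table, ipc_table_set and
    friends are hypothetical names, TBL_NOINDEX stands in for a "null"
    index, and plain heap memory stands in for a storage block. The
    table is non-atomic, so callers would still have to keep writers on
    private indices.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TBL_NOINDEX (-1)   /* hypothetical stand-in for a "null" index */

/* Fixed-size table of equally sized values (hypothetical layout). */
typedef struct {
    size_t value_size;       /* bytes per value */
    int max_index;           /* valid indices are 0..max_index */
    unsigned char *used;     /* 1 if the slot currently holds a value */
    unsigned char *values;   /* (max_index + 1) * value_size bytes */
} ipc_table;

/* "create table in block": takes value size and maximum index. */
ipc_table *ipc_table_create(size_t value_size, int max_index)
{
    assert(value_size > 0 && max_index >= 0);
    ipc_table *t = malloc(sizeof(*t));
    if (t == NULL)
        return NULL;
    size_t n = (size_t)max_index + 1;
    t->value_size = value_size;
    t->max_index = max_index;
    t->used = calloc(n, 1);
    t->values = calloc(n, value_size);
    if (t->used == NULL || t->values == NULL) {
        free(t->used);
        free(t->values);
        free(t);
        return NULL;
    }
    return t;
}

void ipc_table_destroy(ipc_table *t)
{
    if (t != NULL) {
        free(t->used);
        free(t->values);
        free(t);
    }
}

/* get: returns the stored value, or NULL if unset or out of range. */
const void *ipc_table_get(const ipc_table *t, int index)
{
    if (index < 0 || index > t->max_index || !t->used[index])
        return NULL;
    return t->values + (size_t)index * t->value_size;
}

/*
 * set: a NULL value unsets the slot; TBL_NOINDEX assigns the first free
 * index. Returns the index acted on, or TBL_NOINDEX on error (both
 * arguments "null", index out of range, or table full).
 */
int ipc_table_set(ipc_table *t, int index, const void *value)
{
    if (index == TBL_NOINDEX) {
        if (value == NULL)
            return TBL_NOINDEX;
        for (index = 0; index <= t->max_index && t->used[index]; index++)
            ;
        if (index > t->max_index)
            return TBL_NOINDEX;   /* no free slot left */
    } else if (index < 0 || index > t->max_index) {
        return TBL_NOINDEX;
    }
    if (value == NULL) {
        t->used[index] = 0;
        return index;
    }
    memcpy(t->values + (size_t)index * t->value_size, value, t->value_size);
    t->used[index] = 1;
    return index;
}
```

    A real provider would presumably distinguish "table full" from
    "bad arguments" with proper status codes (e.g., APR_ENOMEM), as
    the notes above suggest; the single sentinel here just keeps the
    sketch short.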