Philosophy ---------- as little work as possible is done on request thread. session destruction/migration is only done when we know that no relevant application threads are running. no session can be timed-out whilst a relevant application thread is in the container. invalidation is done by simply marking a session to be dealt with by the housekeeper. contention on the id:session map will be severe so we must keep it's use to a minimum. migrate - move persistant rep to 'shared' area replicate - store persistant rep to shared 'disaster-recovery' area whole session - replicate whole session to replication area delta - replicate delta to shared area per-attribute - delta might be add/remove/change attribute per-session - serialise session into byte[] and diff (optional compression) - may get better performance doing this on per-attribute basis... a background thread should go around periodically merging diffs into whole session sized storage... would analogue of isModified flag be of any use - an isModified attribute ? - On a per-attrbute basis, this is simply an agreement to call set/remove when a change occurs... Migration vs Replication Migration JVM -> Store -> JVM Replication JVM -> JVM & Store -> JVM & Store we need 2 stores : 1 for migration and 1 for replication do they need to cooperate ? I replicate a session, then evict it. Do I simply throw away the reference because I know it is already replicated ? or do I remove all replicants and migrate it onto e.g. disc. Replication: Object Identity: All sessions single session single attribute Safety: Immediate (attribute-based) Request RequestGroup Packaging Delta Whole Migration: Object Identity: single session all sessions Safety: some factorisable of RequestGroup Packaging whole (could we use deltas?) Combinations: ObjectIdentity | Safety | Packaging | ??? Ideas for Geronimo: every MBean has an election policy default value is EveryNode - i.e. Homogeneous other values are e.g. - OneNodeOnly, RoundRobin etc,NumInstances(2)=any 2 nodes, etc... OneNodeOnly=NumInstances(1)... See FileMigrationPolicy request enters filter with id id table is checked if session present try for Rlock and req proceeds if session not present take Wlock and try to activate - when activated, release Wlock, take Rlock and proceed. so when you look up a session, if it isn't in the main table check a secondary one, for sessions being activated/located etc. if we locate and proxy, do we cache the location - not at first. when search in local store fails take a W lock and try to activate. If another request comes in before activation is complete, it will fall through to secondary table and block before continuing. If a location enquiry comes in, we can search first the local map and then the activation map before responding. If we don't respond within a given time frame the other container will assume the session no longer exists. It would be useful for the session housekeeper to have a garbage collection policy. The default policy would simply remove invalid sessions. Alternative policies might e.g. move them to an analysis area where another program might look at them and figure out why the user lost interest in the web site etc... It looks like, in order to support routing info a la modjk, we will need another interceptor to add/remove it from the session id, as and when required... Where to put lock only facade is looked up by id via impl - so no impl, no facade facade API should ONLY be that of HttpSession lock needs to exist before and after impl so: we need impl placeholders, with no data, just a lock, which can be placed into map as and when needed, and turn into a session, cease to be one as and when also. - nasty, but much more efficient. - how do we do it ? transit table shared by node2node and node2store migrations work on WADI, Jetty, Tomcat, Resin integrations Geronimo/Jetty/Tomcat Clustering/JCluster-replacement High level clustering/election abstractions... distributed gc election protocol broken We can use the Wrapper class to spot when references to stateful sessions are migrated to store or elsewhere in the cluster and somehow coordinate the same movement for ejb tier state. Ultimately, this may not be such a good idea as it will result in cache misses but... If node clocks aren't synched FileMigrationPolicy gcs files at wrong times - ouch ! Replication. Every node holds open a connection to every one of it's buddies, each connection has a logical queue updates are put onto queues queue may be sync/async queue may know how to ellyde deltas/complete-sessions session should know how to write itself into an internal byte[] connection will use NIO to copy this byte[] across connection same byte[] should be shared by all queues if a session is migrated (from storage of another node) to a node it will arrive as such a byte[], which can be immediately replicated to necessary buddies by being put on their queues, etc. A common pattern (IOC) interface LifeCycle { public void start(); public void stop(); public boolean isRunning(); } class Bean implements LifeCycle { ... public void setFoo(Type foo){...} public void setBar(Type foo){...} public void setBaz(Type foo){...} ... } public Type method() { Bean b=new Bean(); b.setFoo(x); b.setBar(y); b.setBaz(z); b.start(); ... b.stop(); } I would like: an xdoclet tag to identify attributes that can only be set when !isRunning(); an aspect which will check an assertion on any setter identified in this way. the same thing with attributes in 1.5 ---------------------------------------- replication. in case of the loss (planned or catastrophic) of a node, all sessions which it hosts must be rehosted immediately or possibly initially reconstituted from backup and passivated into shared store... this revisits the migrate to disc or vm issue? ---------------------------------------- location the result of a location query should return whether the host node is bleeding sessions, if so the session should be migrated off it. ---------------------------------------- PROBLEM. What happens if a request thread arrives whilst we are migrating/(passivating) a session? It waits to acquire an app/read-lock We finish and it acquires lock It continues into the container and asks for the session. It is now too late to proxy it elsewhere, so we will have to migrate the session in underneath it. NO GOOD ! before we release the container/write-lock, we need to somehow interrupt all the waiting threads and force them to try to look up the session again. They will find it is not there and have to consider alternatives such as redirection, proxying and migration/activation. ---------------------------------------- shutting down We could evict all sessions to shared store, whence to be loaded by other nodes. We could divide sessions equally across cluster and migrate concurrently to all nodes. Bear in mind how this situation will be affected if every node/basket/session has a pair of buddies... ---------------------------------------- Refactoring The location query should really return a client side proxy for the session-manager/bucket containing the session in question. That it is still contained should be ascertained during the resulting two-way conversation. So maybe the return packet should just contain a serialised proxy ?, or simply a string from which we can construct said proxy on the other side ? advantages to string - no serialversionuid, smaller... ---------------------------------------- Servlets & Filters may be marked as 'stateless' - i.e. no session interaction. Non-container code running in a stateless component may not read/write the session. Attempts to do so will be met with an Exception. How do we handle container code ? setLastAccessedTime will/won't be called ? If called - does nothing. With this optimisation in place, we can avoid locking the session table (hopefully) and certainly any unecessary replication or contention when serving static resources from e.g. the DefaultServlet etc. ---------------------------------------- inbound/outbound lock tables.... refactoring local/activating lock tables.... should be replaced with local/inbound/outbound. whilst sessions are being migrated into a container (from store or another node) they take an inbound lock, whilst bein migrated out of a container (to store or another container) they take an outbound lock. I think we probably need the same arrangement in stores. Try (but maybe too slow...) : requests should : take read lock on outbound table for session id - now no-one can move their session off node take read lock on local table for session id - now no-one can garbage collect their session check if session is local if so is it outbound (wait for outbound lock, then check local again - maybe migration failed) if not take inbound readlock is session inbound wait else release inbound readlock, take inbound writelock and insert locked inbound lock try to load session release all locks maybe we don't need the outbound table - the w lock in the session would be locked - we wait for it and then try local map again - phew ! ---------------------------------------- do we really need to load a session back off disc and activate all it's attributes before destroying it? the attributes were passivated as we saved it - do they need to be activated when they are timed out ? ---------------------------------------- redirect/cookie/Jetty problem. If I get hold of the session cookie, rewrite its routing info and redirect back to the lb, then browser receives: Cookie: JSESSIONID=C82EA2883D12C57D1B1F47BE85E8FF71.web0; JSESSIONID=C82EA2883D12C57D1B1F47BE85E8FF71.web1 see headers.txt, instead of just one cookie - it looks like Jetty adds the web1 cookie after I have added the web0 one - why ? Then the browser, firebird, figures out the path for each of them differently ! the first gets /wadi/jsp, the second /wadi... ? when I logged their paths in Jetty, both were 'null' ----------------------------------------= eviction lock need not take any locks on first pass migration could remove session from table then reinsert on fail Commands could be nested into two-sided conversations. A whole conversation could be sent from one node then progressively unpacked and as passed backwards and forwards each command has a commit and rollback method. ----------------------------------------= two types of affinity ? (1) memory-based lb actually remembers where last request for a given session was processed and sends subsequent requests for same session to same place. with this type of lb we cannot redirect a request to another node via its routing info - see (2), but we could proxy it. However this should only be done in emergencies as the lb will still think that the session was available on the node that it sent the request to and will continue to route relevent requests since we it is awkward to relocate requests with this lb we shouldhere. Therefore we should choose policies that will relocate the session instead, knowing that the lb will remember where it last saw it. NB - memory is state and a tier of these lbs will need to replicate such state to each other. (2) routing-based lb holds no memory/state, but this is encoded as routing info on the end if the session cookie/param. This is advantageous since no state needs to be replicated across lb tier, but adds extra complexity in web tier, since if a session relocates its session cookie must be adjusted. If we understand how the routing info is encoded then we can use request relocation (cheaper than session relocation). This means that we can actually relocate sessions to where we want them, rather than where the lb expects them, then relocate initial requests to them. Subsequent requests will thenceforth be delivered to correct new location. node shutdown... we should avoid writing to long term store if we can, since it is a single point of contention and failure. under a routing-based lb, a node should share out its active sessions between remaining nodes up to their capacity, then write remaining sessions into long term storage. The lb will have no idea where to deliver requests for these sessions, since the node encoded in the routing info is inactive, so it will drop them anywhere on the cluster. The node receiving the request will relocate it to the sessions new owner. worst case scenario cost for a sbsequent incoming request will be: either a single request relocation, or a single session disc-node migration. under a memory-based lb a node has a choice :-( a) do as for a routing-based lb. worst case scenario cost will be: either one node/store-node session relocation or migrate all sessions to store where WCSC becomes one store-node session relocation so if we want to avoid use of store it looks as if both lb strategies should share their sessions out with remaining nodes. At least some requests will fall on correct node and if they fall on wrong node cost will be same or less than load from store. There is another way an lb can implement stickiness - by intrusive cookie that records where last successful request went, this can be considered to be the same as routing-based affinity except that the routing info is encoded in its own cookie - WADI should consider how to do this. in summary: - node is shutting down - takes exclusive lock on session map - multicasts to all other nodes enquiring about location and capacity - share out active sessions between answering nodes up to their capacity (all nodes should answer as this will be quicker than waiting for a timeout - we have to wait anyway if we don't know how many nodes there are) - release lock or shutdown whilst still holding it - browsers will reissue unanswered requests. issues. node A shuts down passing node B sessions that fill it to capacity node A puts remaining sessions in store node B receives a request that requires that it load one of As sessions from store but it has no capacity... it can temporarily override max-size it can forcibly evict another session to make space it can look for another that has space and proxy the request to it - but will end up being a long-term go-between in this conversation - not a good idea. ---------------------------------------- does identity need to live in a session ? does validity need to live in a session - is an invalid session simply not one that doesn't exist? ---------------------------------------- now that facade is set in HttpSessionImpl.readObject ensure that it is not being done unecessarily elsewhere. ---------------------------------------- we need a good joinpoint to hang session creation and destruction related aspects from - how about a session commission/decommission method pair. ---------------------------------------- wad-web.xml is written (currently) in Jetty-ese, because I anticipate the internal APIs changing frequently for a while and Jetty-ese is more flexible in this respectthan Digester. I understand that this ties the Tomcat integration to Jetty code. Too bad :-) its probably only a temporary situation. ---------------------------------------- The PassivationStrategy should probably live inside the EvictionPolicy ? Their naming convention should be rationalised. ----------------------------------------- wadi-web.xml needs FAQ-ing ----------------------------------------- What happens when Jetty or Tomcat are run via JMX :-) ----------------------------------------- for Tomcat (maybe Jetty as well) if get() comes in from above filter, we know it is request thread and maybe first time filter will wrap request and put aspect around get so that it arrives with ref to req object in threadlocal req obj will have field for current session req objects session methods will be aspected to keep this up to date so any session fetch/create/invalidate will update relevant request obj. on leaving through filter current session will be picked out of request wrapper and unlocked... In Jetty, we may not need to wrap request as every get/create will have req passed as param... what about invalidate ?, how do we ensure that req is kept up to date... invalidation goes through manager which must notify relevant request wrappers ? -----------------------------------------