Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements.  See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

==============================
Next actions
==============================

- Membership coordinator logic and membership view Id.

- Implement primary/secondary replication logic (fhanik)
  Idea for implementation:
    A cookie stores info about the primary/backup node and a hit counter.
  Steps to implement: (coming soon)
  Scenarios:
     1. Request in: new session        dest: any
     2. Request in: existing session   dest: primary server
     3. Request in: existing session   dest: non-primary server (fail over)
     4. Request in: existing session   dest: backup server
     5. Backup node crash
     6. Primary node crash (step 3)
     7. Sequence 6. then 2. (hit counter)
     8. Sequence 5. then 4. (swap)
     9. New member joins
    10. Backup and primary crash -> death!
    11. Sequence 2. in: invalid hit counter
    12. Sequence 4. in: invalid hit counter
    13. Session expires
    14. Sequence 3. then 13. on the old primary
  Notes:
    Add the ability to configure a custom "session expired" redirect, so that
    an invalid session is never forwarded into the server once the session
    data has been lost. In a normal scenario the session would simply be
    treated as "new", but users may want to display a "session expired" page
    instead.
  Security:
    Don't store any sensitive info in the new cookie, only pointers to
    members.

- Break out membership/messaging module and improve JMX
  The clustering code has become very cluttered. We need to break the
  membership and messaging layers out of the session-handling and JMX logic.
  The latter should be built on top of the clustering module, not baked into
  it. (maybe a complete refactor for Tomcat 6)
  Steps:
  1. Build a group object to allow sending of messages.
     The group object will encompass membership and messaging under one
     umbrella.
  2. Enable group RPC.
     Through the group object, enable RPC-style messaging. This should be
     used for messages like state transfers, instead of waiting for a
     timeout.
     RPC strategies: first response, majority response, all responses.
     (A sketch of what such an interface could look like follows after this
     item.)
  3. Group/message interceptors - to perform logic such as fragmentation,
     ordering etc.
  4. JMX
     JMX should be added on top of the core code, not inside it. JMX as a
     management layer should be attached to the core components from the
     outside. For example, SimpleTcpCluster should not contain JMX code; if I
     want to create a different cluster class, I have to rewrite all the JMX
     code again. Instead, create JMX beans that reference the core components
     through interfaces.
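  As a rough illustration of the direction in step 2, here is a minimal
  sketch of what a group/RPC abstraction could look like. The Group,
  GroupMember and RpcCallback names, the RESPONSE_* constants and all
  signatures below are hypothetical, not existing Tomcat classes.

    import java.io.IOException;
    import java.io.Serializable;

    /**
     * Hypothetical sketch of a group abstraction that puts membership and
     * messaging under one umbrella. Names and signatures are illustrative
     * only.
     */
    public interface Group {

        /** Response strategies for RPC-style calls (see step 2 above). */
        int RESPONSE_FIRST    = 1; // return as soon as one member has replied
        int RESPONSE_MAJORITY = 2; // wait for replies from a majority of members
        int RESPONSE_ALL      = 3; // wait until every member has replied

        /** Stand-in for the cluster member abstraction. */
        interface GroupMember extends Serializable {
            String getName();
            String getHost();
            int getPort();
        }

        /** Callback invoked on the receiving side to produce a reply. */
        interface RpcCallback {
            Serializable replyRequest(Serializable message, GroupMember sender);
        }

        /** Current membership view. */
        GroupMember[] getMembers();

        /** Fire-and-forget message to the given members. */
        void send(GroupMember[] destination, Serializable message)
            throws IOException;

        /**
         * RPC-style send: blocks until the chosen response strategy is
         * satisfied or the timeout expires, then returns the collected
         * replies. Meant for messages like state transfers, instead of
         * waiting for a timeout.
         */
        Serializable[] sendWithReply(GroupMember[] destination,
                                     Serializable message,
                                     int responseStrategy,
                                     long timeoutMillis) throws IOException;
    }

  The interceptors from step 3 (fragmentation, ordering, a GZip compressor)
  would then wrap the send path of such a group instead of being baked into
  SimpleTcpCluster.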
- Create stats object
  Instead of keeping stats embedded in the code, make them reusable in a
  shared component.

- Support cross-context session replication
  Idea: use endAccess() to signal the ReplicationValve that a session from
  another app has changed.
  Refactor the ReplicationValve: when a session from another app is
  serialized, set the right app classloader. endAccess() is also sent from
  the CoyoteAdapter for the context session when the request is recycled.

- Add an MBeanFactory to generate a dynamic cluster at runtime.
  Problem: how can we start those central services?

- StandardEngine support for loading MBeans from an external file.

- When a lot of sessions expire, it comes to a burst of messages: every 60
  sec, when ManagerBase#processExpires is called, a lot of messages are sent!
  - Better: transfer a special expire message with an array of the expired
    session IDs.
  - This reduces message transfers and reduces waits for acks.
  - Complicated implementation detail: sessions expire when isValid() is
    called :-(

- Build a tool that receives the state and presents it as a simple web app.
  Detect some problems:
    differing active session counts
    long queues
    long waits for acks
  Idea: write an ant script with the new JMX tasks!

- Optimized information access:
  Create an MBean attribute that stores the complete cluster state
  information inside one attribute of type TabularData and CompositeData!
  Implement as a SimpleTcpCluster operation.
    Cluster
    -- Attributes
    -- ClusterMembership
       -- Attributes
       -- Members
          -- Attributes
    -- ClusterReceiver
       -- Attributes
    -- ClusterSender
       -- Attributes
       -- DataSenders
          -- Sender
             -- Attributes
             -- (optional) Queue Stats
                -- Attributes
  (A sketch of building such an attribute with the open MBean types follows
  below.)
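  A minimal sketch, using the standard javax.management.openmbean types, of
  how such a compact state attribute could be assembled. The ClusterStateBuilder
  class and the row fields (name, host, port, activeSessions) are made-up
  examples, not the actual attribute layout.

    import javax.management.openmbean.CompositeDataSupport;
    import javax.management.openmbean.CompositeType;
    import javax.management.openmbean.OpenDataException;
    import javax.management.openmbean.OpenType;
    import javax.management.openmbean.SimpleType;
    import javax.management.openmbean.TabularData;
    import javax.management.openmbean.TabularDataSupport;
    import javax.management.openmbean.TabularType;

    /**
     * Illustrative only: builds one TabularData value holding a row per
     * cluster member, so a monitoring client can read the whole picture
     * with a single getAttribute() call.
     */
    public class ClusterStateBuilder {

        // Example row layout - the real attribute set would be drawn from
        // the membership/receiver/sender tree shown above.
        private static final String[] ITEM_NAMES =
            { "name", "host", "port", "activeSessions" };
        private static final String[] ITEM_DESCRIPTIONS =
            { "Member name", "Member host", "Member port", "Active session count" };
        private static final OpenType[] ITEM_TYPES =
            { SimpleType.STRING, SimpleType.STRING,
              SimpleType.INTEGER, SimpleType.INTEGER };

        /** Each row is { String name, String host, Integer port, Integer activeSessions }. */
        public TabularData buildState(Object[][] memberRows) throws OpenDataException {
            CompositeType rowType = new CompositeType(
                    "ClusterMember", "State of one cluster member",
                    ITEM_NAMES, ITEM_DESCRIPTIONS, ITEM_TYPES);
            TabularType tableType = new TabularType(
                    "ClusterState", "State of all known cluster members",
                    rowType, new String[] { "name" });
            TabularDataSupport table = new TabularDataSupport(tableType);
            for (int i = 0; i < memberRows.length; i++) {
                table.put(new CompositeDataSupport(rowType, ITEM_NAMES, memberRows[i]));
            }
            return table;
        }
    }

  A SimpleTcpCluster operation or attribute returning such a value would let
  the monitoring web app and the remote JMX ant tasks mentioned above avoid
  walking many separate MBeans.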
- Add a cluster setup template (src)

- Documentation
  Write a completely new how-to,
  add example configurations,
  add complete attribute descriptions,
  add JMX information,
  add deployment help.
  Need help (pero)

- Implement fragmentation of large replication objects
  (note this will be done in the Group object, see top.)
  Compression at message level - this will go away! Instead there will be a
  GZip interceptor.
  Splitting messages a la FarmDeployer war handling.

- Add a message type to the message header.
  - Filtering at the receiver that drops messages before building the object.
  - Define a short type definition.
  - Every session message with a different type.
  - Type as String.

- Add a test cluster project
  Functional testing:
    a lot of scenarios
    different replication modes
    restart after failure
    crash under load
    compressed and uncompressed
  JUnit tests (started)
  Automated regression testing with some standard configs.
  Write a test client that can read the JMX state to verify the different
  cluster test scenarios:
  - direct JSR 160 client - preferred option
  - via the MX4J http adaptor (xml protocol)
  - set up different scenarios;
    use the new remote JMX ant tasks to grab information from MBeans

- Filter all cluster messages
  - Filtering so that not all messages are sent to all members.
  - Now, with domain mode registration, session messages are only sent to
    members of the same domain.

- Set up a Cluster LifecycleListener to send the cluster status! (monitoring)
  - Make the compact state available via the JMX API first.
  - Set up a cluster listener that sends its own state to a special member
    (the sender of a message like GET_ALL_SESSIONS).

- Create a cluster status app (html/xml)
  - attributes
    - sent messages
    - current members
    - received messages
    - active sessions at the different clustered managers
    - which apps are clustered (registered managers)
    - avg processing times
  - operations
    - resetStats
    - send a message (String)
    - get display state from other members??
    - stop the send queue
    - queue received messages from some members (send later, after
      redeploying)
    - configure some send parameters
      - keepAlive
      - compress
      - wait for ack
  - other things
    - based on the Cluster JMX API
    - watch some values from the complete cluster and display some graphs
    - display the information from all nodes
    - display information from other cluster domains via XML documents and
      http
    - display stats as xml
    - operations via JMX (MX4J adaptor)

================
problems
================

- How can we stop the request traffic when restarting an application?
  Currently jk 1.2.10 can only disable the complete load balancer, but this
  only affects the routing decision for new-session requests; requests that
  carry a session are still sent to Tomcat.
  Fix: jk > 1.2.11 has a stopped flag, but then all applications stop getting
  traffic, and session transfer from other nodes is not stopped!!
  Stop the MembershipService via JMX; after redeploying the application,
  start the MembershipService again and restart the application's session
  manager, or get all sessions from a backup (GET_ALL_SESSIONS)? Then enable
  the apache worker again !! :-(
  Easier: stop the tomcats, deploy the new app and start again...

- Can't stop message replication for a special member and application
  - this needs a special cluster message and a send filter at
    SimpleTcpCluster

- Don't generate cluster messages when no other member is in the cluster!
  - Register the DeltaManager as a Cluster LifecycleListener and stop
    creating and sending messages.
  - Reduce memory consumption when only the local node is active.
  - Important feature when nodes have crashed and only one server exists
    under load...

==============================
Nice to have:
==============================

- Replication of context attributes

- Configure the McastService to not accept every member.
  - receive a secret key
  - have an allow or deny list like the RequestFilterValve
  - the receiver should also not accept requests from disallowed members
  - ReplicationListener and SocketReplicationListener only accept data from
    cluster members (low level IP restriction)

- PooledSocketSender
  Add more stats.
  Check checkKeepAlive on all pooled senders.

- Implement a NonSerializable interface for session attributes that do not
  wish to be replicated.
  Then we must have ClusterNonSerializable in the common classloader.
  - Extend StandardSession if possible.

- Implement context attribute replication (?)
  pero: Also send Start/Stop messages from the Context to the complete
  cluster! With 5.5.10 you can write a cluster lifecycle manager that does
  this. Register for AFTER_MANAGERREGISTER_EVENT and
  AFTER_MANAGERUNREGISTER_EVENT, and also register a
  ServletContextAttributeListener on the context.
  Access the Context via ((Manager)event.getData()).getContainer().

- Fix farm deployment for 5.5
  pero: At every start, applications are deployed only to the cluster nodes
  that are already running. Newly registered nodes don't get the
  applications! Deploy must send a GET_ALL_APP to all other Deployers.
  FH: Correction, you should not send GET_ALL_APP to all deployers, only to
  the main one. Which could be another property of the Member object; it
  would not make sense to transfer the same webapp over and over again. Only
  the watchEnabled Deployer sends this member all deployed applications.
  pero: Yes, very true, but currently we also distribute all wars from the
  watch node at the beginning. Waiting to start the other nodes is the only
  way not to get these wars! I have made some experiments to register war
  deployment on a new memberAdded event in the cluster.
  Add JMX support:
    Resend deployed applications to all or one cluster node.
    Show all watched resources and processing times.
    Change the FileMessage buffer size.
    Start/stop a cluster wide application.
  Deployer and Watcher sync with the engine background thread! Fixed!
  The last FileMessage fragment needs a longer ackTimeout.

- Change the cluster protocol so that developers can add their own data
  serializable/deserializable format (high risk)
  Note from Filip: Let's not do this, the last attempt caused a broken
  clustering piece for 4 versions.
  Current format:
    start header   6 bytes (FLT2002)
    compress flag  4 bytes
    data length    4 bytes
    data
    end header     6 bytes (TLF2003)
  Optimized format:
    header         2 bytes (TC)
    type           1 byte
    compress flag  1 byte
    data length    4 bytes
    data
  The data "type" means a user-defined type; the receiver extracts the bytes
  and the type and hands them to a callback, see ObjectReader or
  SocketObjectReader.
  When compress == 1, the first 4 data bytes are the real uncompressed data
  length (for better memory management at the receiver side, see
  XByteBuffer).
  Change DataSender.writeData and XByteBuffer, and add flexible handling to
  ClusterSender and ClusterReceiver.
  (A sketch of packing and parsing the optimized layout follows below.)
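  A minimal sketch, assuming only the byte layout given in the note above, of
  how the optimized frame could be packed and parsed. The
  OptimizedClusterFrame class and its methods are illustrative and not the
  existing XByteBuffer/DataSender code.

    import java.nio.ByteBuffer;

    /**
     * Illustrative packing/parsing of the proposed optimized layout:
     *   header 2 bytes ("TC"), type 1 byte, compress flag 1 byte,
     *   data length 4 bytes, data.
     * Not the actual XByteBuffer implementation.
     */
    public class OptimizedClusterFrame {

        private static final byte[] HEADER = { 'T', 'C' };

        /** Packs one frame for a user-defined message type and payload. */
        public static byte[] encode(byte type, boolean compressed, byte[] data) {
            ByteBuffer buf = ByteBuffer.allocate(2 + 1 + 1 + 4 + data.length);
            buf.put(HEADER);
            buf.put(type);                        // user-defined message type
            buf.put((byte) (compressed ? 1 : 0)); // compress flag
            buf.putInt(data.length);              // payload length
            buf.put(data);
            return buf.array();
        }

        /** Parses a frame; the type byte would be handed to a receiver callback. */
        public static byte[] decode(byte[] frame) {
            ByteBuffer buf = ByteBuffer.wrap(frame);
            if (buf.get() != HEADER[0] || buf.get() != HEADER[1]) {
                throw new IllegalArgumentException("Invalid frame header");
            }
            byte type = buf.get();         // lets the receiver drop unwanted
            byte compressFlag = buf.get(); // messages before deserializing
            byte[] data = new byte[buf.getInt()];
            buf.get(data);
            // Per the note above: when compressFlag == 1, the first 4 payload
            // bytes give the uncompressed length, so the receiver can size its
            // buffer up front.
            return data;
        }
    }

  Compared to the current FLT2002/TLF2003 framing, this drops the end marker,
  and the type byte lets the receiver filter messages before building the
  object, as described in the "add a message type to the message header" item
  above.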
- Add single sign on support

==============================
COMPLETED
==============================

5.5.16 (fhanik)
- Extensive logging when a member crashes
  When a member crashes and you use pooled socket mode, the logs spew out
  info. Instead, we now only report one IOException if it happens, and mark
  the member as suspect.

5.5.16 (fhanik)
- Average message size
  For my test application, the average message size has increased from 526
  bytes per message to over 750 bytes per message from version 5.0.28 to
  5.5.15. Need to investigate why this is the case and where the 200+ extra
  bytes are coming from.
  Fix: The default compress value had changed. The new default is that no
  compression is used. To re-enable compression,