Testing (again)

Steve Loughran

Where we are today?

We are now at the second revision of the Axis test framework. The original design had all classes in a package under java/test, built them in one big <javac> and then executed them. While it ran the tests, it was not that flexible: it was hard to run individual tests, and hard to maintain. A major effort by Matt Siebert refactored the test process to address these problems with a modular design, based on separate Ant build files for each test package, using common XML entity references to share build file fragments between the files. This gave us isolated builds of each subcomponent, the ability to build and run tests on their own, and the flexibility to have very different build processes for each of the tests. This architecture makes it easy to add a new test package, without having to edit shared build files.

The Tests

The many build files compile the source, all of java/test/*.java, into build/classes, placing the classes in the hierarchy build/classes/test. Test packages which have special dependencies can make their builds conditional, so only those tests for which we have support get compiled down.

We have a separation between "unit tests" and "functional" tests; the latter includes the interop and attachment tests. There are separate targets in the build to build them, as the choice of which tests to execute is primarily driven by the compilation process. Nearly all the tests in the class tree will get executed, so to select which tests to run, you control which tests get built. This is a simple way of letting the test-package-specific build files control which tests to run.

WSDL

A core component of many of the tests is generating Java source from WSDL files, both local test case WSDL and remote interop WSDL. The latter introduces a requirement to be on line, and on line through a transparent firewall - we don't look after proxy settings well enough to run behind a firewall whose sole net access is via a proxy on port 80. This is somewhat ironic, given that such a facility is the selling point of the transport-stack-atop-port-80 that is the SOAP and WS-* specification suite. As well as lacking off-line support for generating code from remote WSDL, we (obviously) can't run the interop tests without a network connection. This means that when a remote interop server goes down, the build fails.

Execution

After compiling all the code down, we run the tests. This is done by batch JUnit execution of all the test suites in all the packages with a PackageTests class in them (i.e. all of build/classes/**/PackageTests.class).
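To make the convention concrete, a per-package aggregator usually looks something like the following sketch; the package name and the trivial inner test class are invented for illustration:

    package test.example;

    import junit.framework.Test;
    import junit.framework.TestCase;
    import junit.framework.TestSuite;

    // One of these sits in each test package; the batch <junit> run
    // picks up every build/classes/**/PackageTests.class it finds.
    public class PackageTests extends TestCase {
        public PackageTests(String name) {
            super(name);
        }

        // Aggregate the package's individual test classes into one suite.
        public static Test suite() {
            TestSuite suite = new TestSuite("test.example");
            suite.addTestSuite(ExampleTest.class);
            return suite;
        }

        // A trivial stand-in for the package's real test classes.
        public static class ExampleTest extends TestCase {
            public void testNothing() {
                assertTrue(true);
            }
        }
    }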
Functional tests are all of **/FunctionalTests.class and **/*TestCase.class; the latter are those test cases which are auto-created by the Wsdl2Java routine, often with manual editing to make the test complete. When the tests need a functional servlet engine to host the web services, we bring up the simple Axis server: a minimal implementation of the servlet API that omits the production-quality aspects of a web server, including JSP support. The <runaxisfunctionaltests> task starts and stops the server, using an execution process borrowed from Cactus: we supply the task with the names of the start and stop targets, and the task executes them before and after running all the functional tests.

Result Processing

In a Gump build, the build stops after the first failure, and the team is notified. The property test.functional.fail sets the haltonfailure attribute of the <junit> task, and so controls whether the run stops at the first failure or runs every test before completing. Either way, the create-test-report target will, if Xalan or another XSLT engine is present, convert the XML reports of the test run into an HTML report, package by package.

What do we want from a test suite?

Basic Improvements to the current status quo

- All the tests to pass :)
- Faster tests
- Scalability: easy to add new tests
- Offline support, and robustness against unavailable interop servers

Functional testing on production app servers

A lot of the bug reports we see are related to configuration and operations on app servers: "WebLogic doesn't save server-config.wsdd", "SunOne server has the wrong classpath", "JBoss.net won't compile .JWS pages that import javax.servlet.*", and so on. We need to run more tests on production systems, rather than wait for user feedback after we ship. Everybody runs their apps on some such systems, so we have implicit testing, but it is not part of the daily Gump or any other regular, controlled test process. We could modify the test system so that instead of starting the SimpleAxisServer servlet routine, we deploy to a local web server or app server. This would verify that the core test suite runs on different systems.

Test more than SOAP

We need more tests to validate the configuration, extending the httpunit tests with more tests of not-quite-SOAP requests. What happens when the server gets less than it was promised? What about more than promised? Near-infinite recursive XML? xsd:import statements in the XML? What happens when a client starts parsing from a socket that doesn't close its connection, or lies about the amount it is sending? These are the security and robustness categories we aren't testing for today.
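As a flavour of what such a test could look like, here is a minimal sketch of one robustness probe. It uses a raw socket so the test controls exactly what goes onto the wire; the endpoint path and port are invented:

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.Socket;
    import junit.framework.TestCase;

    public class MalformedRequestTest extends TestCase {
        // Send a request whose Content-Length promises far more data
        // than we deliver; the server should answer with an error,
        // not hang waiting for the rest of the body.
        public void testShortBody() throws Exception {
            Socket socket = new Socket("localhost", 8080);
            socket.setSoTimeout(10000);       // a hung server fails the test
            OutputStream out = socket.getOutputStream();
            String request =
                "POST /axis/services/echo HTTP/1.0\r\n" +
                "Content-Type: text/xml\r\n" +
                "Content-Length: 65536\r\n" +  // lie about the size
                "\r\n" +
                "<not-soap/>";                 // ...then stop short
            out.write(request.getBytes());
            out.flush();
            socket.shutdownOutput();
            // Any HTTP response at all means the server didn't hang;
            // a stricter test would assert on the status code too.
            InputStream in = socket.getInputStream();
            byte[] reply = new byte[16];
            int n = in.read(reply);
            assertTrue("no response from server", n > 0);
            socket.close();
        }
    }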
Automated invocation of compliance test suites: JAX-RPC TCK, WS-I Basic

We have one test suite, the JAX-RPC TCK, that is only available under restricted conditions. We need someone with access to the test suite to run it.

Understanding that Interop servers are regularly unavailable

If Axis depends on everyone's interop server being present, then we have a global build system that breaks every time somebody turns their machine off - "the light switch in Belgium problem". This is too brittle. We need to cache the external WSDL in CVS, then probe the servers to see if it has changed, downloading it only if it is different. It would be nice to regenerate the Java proxy classes from the WSDL only when such a change has occurred.

Load testing

What happens to the system under load? This is very dependent upon the app server; having tests running on the production server is a first step to this. Traditional load testing has N clients each making M requests, for N*M simultaneous requests. The facilities for individuals to perform aggressive load tests are limited, but there is strength in numbers; many Axis developers could have their test systems synchronised to test an externally visible server together. This co-ordination could be through a P2P infrastructure, such as JXTA or Chord, but as we are not trying to design a stealth DDoS system, we could do it client-server, with a (separate) SOAP service choreographing the clients. This would seem to be a long term project.

Performance testing

This is related to load testing, but doesn't need as many clients. We need to know how long it takes for actions to be executed, and we need to log this over time so that major regressions get picked up. If on the daily run one test takes 50% longer than usual, it is immediately obvious that one of the day's changes caused the problem. If the performance of the system doesn't get looked at till the next version goes into the RC phase, performance slippages get missed, and even institutionalised.

Coverage Analysis

We should be able to use Quilt (http://quilt.sourceforge.net/) to do this today. As with performance tests, tracking coverage changes over time is informative; it shows whether things are improving or getting worse.

Local interop testing with non-Axis clients and servers

We already have some examples of .NET client tests against Axis, with an Ant build but sadly no automatic NUnit test case generation. We can also build Axis clients against .NET and other servers, where we can create JUnit stubs for ease of test generation. If such tests were part of the main test suite, then suitably equipped systems could run our own interop tests. These would be an extension of the SoapBuilders main suite. Here we'd want to verify that fixes worked and continued to work (e.g. the .NET 1.0 empty array bugfix). We can also add things that aren't in the SoapBuilders tests: cookie sessions, HTTP header sessions, is-the-date-in-the-right-TZ-at-the-far-end tests, and so on.

There are logistical/tactical and strategic arguments against doing this. Logistical: even the example of one client platform is daunting; we don't want to expose a .NET server to the internet for anyone to hit, so the tests will only run on localhost when localhost=Windows, which excludes the Gump builds. The strategic argument is that the combinatorial explosion of local interop testing against multiple clients and servers is too big; that is what the SoapBuilders are for. Either we focus on one or two key platforms to interop test against - .NET and MSSTK - or we raise the problem back to SoapBuilders.

What would we want from SoapBuilders, to help our regression and interop problems? I'd argue for extra tests, above and beyond the formal "rounds", wherever someone has an interop issue. We should be able to announce that we have a problem and the URL of a test endpoint, and everyone can add it to the things they test against. Similarly, other platforms should not just fix things; they should provide means for outsiders to test the system.

Glen Daniels has proposed a pattern-matching server that waits for expected wire traces and generates preprogrammed responses, simulating part of the behaviour of a foreign server. This has the advantage of being standalone, but the disadvantage of not being as thorough as a live test. You also have the challenge of keeping the datasets up to date.
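To make the idea concrete, here is a toy sketch of such a responder. The matching is a crude substring test, and a real tool would need proper HTTP handling and a maintained dataset of captured wire traces:

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class CannedResponseServer {
        public static void main(String[] args) throws Exception {
            ServerSocket server = new ServerSocket(9090);
            // A captured wire trace would go here, verbatim.
            String canned =
                "HTTP/1.0 200 OK\r\n" +
                "Content-Type: text/xml\r\n\r\n" +
                "<soapenv:Envelope>...</soapenv:Envelope>";
            while (true) {
                Socket client = server.accept();
                InputStream in = client.getInputStream();
                byte[] buffer = new byte[8192];
                int n = in.read(buffer);
                String request = (n > 0) ? new String(buffer, 0, n) : "";
                OutputStream out = client.getOutputStream();
                // Play back the canned response for recognised traffic.
                if (request.indexOf("getVersion") != -1) {
                    out.write(canned.getBytes());
                } else {
                    out.write("HTTP/1.0 404 No matching trace\r\n\r\n"
                            .getBytes());
                }
                out.close();
                client.close();
            }
        }
    }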
Wiretrace logging in the test case results

This is just an extra little feature for diagnosis: can we record the wire traces of sent and received messages, and whenever we get a test failure, save them to the JUnit log for an easier post-mortem? Just a thought :)

Ease of learning, installation, use

We are an open source project where anyone can download the source, build it and run the tests. Therefore the test framework must be easy to run, easy to work with, and easy to maintain by people other than the original authors. We also want to minimise effort by re-using as much as possible from other OSS projects.

Options

Here are some of the things we can do.

Nothing

Leave things as they are. Maybe move to the Ant 1.6 alpha builds to get better memory management, the faster build.xml parser and the refactored .NET tasks, or just up to 1.5.3/1.5.4 to get the memory leak fix. Costs: nothing at first; a gradual increase in longer term costs.

Improve build.xml reuse

We don't necessarily need a separate build file in every test package. Instead we can have a properties file in there that sets well-known properties:

    package=test/example
    conditions=httpunit.present
    online=false
    needsserver=true
    functional=true

This can be read in by something (Ant or custom Java) and used to control the build. Reading it into pure Ant (i.e. without writing our own task) would be hard, as condition expressions are tricky. An XML description might be better, and could be XSLT'd into the build files.

Caching WSDL Generation

This is a trick we can do with any of these options. Write a new <axis-importwsdl> task that implements dependency awareness around the inner generation process. (We could go one step further and integrate dependency logic into the generation process itself, but as that is more fundamental, I am a bit leery of it.) The task:

- caches the results of the fetch
- uses the If-Modified-Since header to conditionally retrieve content
- even if content is returned, compares it byte-for-byte against the cached copy
- only imports the WSDL if the WSDL file is newer than a timestamp marker file in the generated dir (and a force option is false)
- if the server is unreachable but the cached copy is present, doesn't fail the build; it just sets a property saying the server isn't there and continues with the WSDL generation
- if the server is unreachable and the cached copy isn't there, only fails the build if some attribute is set; otherwise a wsdlnotpresent property is set

We could do this with a fairly convoluted set of Ant 1.6 tasks, but only if the http tasks in the Ant sandbox were pulled in for more graceful handling of missing servers. Pulling it into one Axis task gives us more control and Ant 1.5 support, and wouldn't be that hard to implement.
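A rough sketch of the fetch-and-cache half of such a task follows; the class and attribute names are invented, error handling is pared down, and the generation step itself is elided:

    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Arrays;
    import org.apache.tools.ant.BuildException;
    import org.apache.tools.ant.Task;

    public class ImportWsdlTask extends Task {
        private String url;                    // remote WSDL location
        private File cache;                    // cached local copy
        private boolean failOnMissing = false;

        public void setUrl(String url) { this.url = url; }
        public void setCache(File cache) { this.cache = cache; }
        public void setFailOnMissing(boolean b) { failOnMissing = b; }

        public void execute() throws BuildException {
            try {
                HttpURLConnection conn =
                    (HttpURLConnection) new URL(url).openConnection();
                if (cache.exists()) {
                    // Conditional GET: fetch only if the server copy changed.
                    conn.setIfModifiedSince(cache.lastModified());
                }
                if (conn.getResponseCode() == HttpURLConnection.HTTP_OK) {
                    byte[] remote = readFully(conn.getInputStream());
                    byte[] local = cache.exists()
                        ? readFully(new FileInputStream(cache)) : null;
                    // Compare byte-for-byte, so an unchanged 200 response
                    // does not disturb the cache timestamp.
                    if (local == null || !Arrays.equals(remote, local)) {
                        FileOutputStream out = new FileOutputStream(cache);
                        out.write(remote);
                        out.close();
                    }
                }
                // HTTP_NOT_MODIFIED falls through to the cached copy.
            } catch (IOException e) {
                if (!cache.exists()) {
                    if (failOnMissing) {
                        throw new BuildException(
                            "no server and no cached WSDL", e);
                    }
                    getProject().setProperty("wsdlnotpresent", "true");
                    return;
                }
                log("server unreachable; using cached WSDL");
            }
            // The generation step itself would run here, and only when the
            // cache is newer than a marker file in the generated directory.
        }

        private static byte[] readFully(InputStream in) throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            for (int n; (n = in.read(chunk)) != -1;) {
                buffer.write(chunk, 0, n);
            }
            in.close();
            return buffer.toByteArray();
        }
    }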
Write our own test harness

This deserves a mention: could we do our own complete test harness? "Why?" is the response: what would that add? In theory, having our own hosting app would let us run tests differently from core JUnit, doing more SOAP-related things like posting XML and expecting responses.

Return to being JUnit-centric

The advantage of an Ant-centric test system is that we can use Ant to build things during testing. The disadvantage is the complexity and time involved in running the tests. Is there a better compromise? Maybe. It is possible to run Ant from inside JUnit; this is how Ant runs its many self-tests. If we put JUnit in charge, then those tests that need a complex Ant-based test system can call Ant for help, while the rest run straight from JUnit (a sketch of the calling pattern appears at the end of this section).

We may be able to take advantage of this by categorising tests into various patterns that we can build and run differently:

- Pure unit tests that compile and run without needing any server at all
- WSDL-importing tests that need to import WSDL and generate code before the tests run
- Local functional tests that run against an instance of Axis running on a servlet engine
- Local functional tests that only run on a full J2EE app server
- Remote interop tests

Clearly these categories are not 100% exclusive: a lot of the local functional tests generate WSDL first, as do all the interop tests. And there are other flags which will include/exclude test items: the presence/absence of needed libraries, such as attachment support, and an online/offline flag to distinguish tests that need a full internet connection from those that don't. All the interop tests are online, but so are a few of the others.

In a JUnit-centric world, first the local unit tests would get built and run, all in a couple of big <javac> and <junit> tasks. Then the WSDL import process can take place, using something like the new dependency-aware WSDL import task proposed earlier.
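For the tests that do need Ant's help, the calling pattern would be the one Ant's own test suite uses. A minimal sketch, assuming Ant 1.5+ on the classpath; the fixture build file and target names are invented:

    import java.io.File;
    import junit.framework.TestCase;
    import org.apache.tools.ant.Project;
    import org.apache.tools.ant.ProjectHelper;

    public class AntDrivenTest extends TestCase {
        private Project project;

        protected void setUp() {
            // Parse the helper build file that prepares the fixture.
            project = new Project();
            project.init();
            ProjectHelper.configureProject(
                project, new File("test-fixture.xml"));
        }

        public void testDeployAndCall() {
            // Let Ant do the heavy lifting (WSDL import, deployment...),
            project.executeTarget("prepare-fixture");
            // ...then make normal JUnit assertions against the result.
        }
    }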