Testing Apache Slider¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Functional Tests¶
The functional test suite is designed to test slider against a live cluster.
For these to work you MUST have:
- A YARN cluster, secure or insecure
- A slider-client.xml file configured to interact with the cluster
- Agent tests and Accumulo Agent tests: nothing additional
- (deprecated) HBase provider tests: an HBase .tar.gz uploaded to HDFS, and a local or remote HBase conf directory
- (deprecated) Accumulo provider tests: an Accumulo .tar.gz uploaded to HDFS, and a local or remote Accumulo conf directory
Configuration of functional tests¶
Maven MUST be given:
- A path to the expanded test archive
- A path to a Slider configuration directory for the cluster
The path to the expanded test archive is calculated automatically as the directory under ../slider-assembly/target where an untarred Slider distribution can be found.
If it is not present, the tests will fail.
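The lookup above can be approximated by a pre-flight check before a long test run. This is a minimal Python sketch under a stated assumption: it treats any subdirectory of the target directory that contains a bin/slider script as the untarred distribution. The real test harness may recognise the distribution differently.

```python
from pathlib import Path
from typing import Optional


def find_expanded_distribution(target_dir: Path) -> Optional[Path]:
    """Return the first subdirectory of target_dir that looks like an
    untarred Slider distribution, or None if none is present.

    Assumption (hypothetical criterion): a distribution is recognised by
    the presence of a bin/slider launcher script.
    """
    if not target_dir.is_dir():
        return None
    for candidate in sorted(target_dir.iterdir()):
        if candidate.is_dir() and (candidate / "bin" / "slider").is_file():
            return candidate
    return None
```

For example, if find_expanded_distribution(Path("../slider-assembly/target")) returns None, the assembly has to be built before the functional tests can run.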
The path to the configuration directory must be supplied in the property slider.conf.dir, which can be set on the command line:
mvn clean verify -Dslider.conf.dir=src/test/clusters/sandbox/slider
It can also be set in the (optional) file slider-funtest/build.properties:
slider.conf.dir=src/test/clusters/sandbox/slider
This file is loaded whenever a Slider build or test run takes place.
Configuration of slider-client.xml¶
The slider-client.xml file must have extra configuration options for both the HBase and Accumulo tests, as well as a common set for actually talking to a YARN cluster.
How to pick up core-site.xml, hdfs-site.xml and yarn-site.xml values¶
The slider-client.xml file can declare a HADOOP_CONF_DIR value for use by both the test runner AND the bin/slider command-line application that the tests invoke.
<property>
  <name>HADOOP_CONF_DIR</name>
  <value>/home/tester/sites/production/hadoop-conf</value>
</property>
If set:
- The standard *-site.xml files are loaded by the JUnit test runner, to bond the test classes to the YARN cluster.
- The property is used to set the environment variable HADOOP_CONF_DIR before the bin/slider or bin\slider.py script is executed.
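How the property becomes an environment variable for the launched script can be sketched as follows. This minimal Python illustration is not Slider's implementation. The helper names are hypothetical, and the sketch does not expand ${...} references, which the test runner handles separately.

```python
import os
import subprocess
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def read_property(conf_file: str, name: str) -> Optional[str]:
    """Return the <value> of the named <property> in a Hadoop-style XML file, or None."""
    root = ET.parse(conf_file).getroot()
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def slider_environment(client_xml: str, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy base_env (default: os.environ), setting HADOOP_CONF_DIR if slider-client.xml declares it."""
    env = dict(os.environ if base_env is None else base_env)
    hadoop_conf = read_property(client_xml, "HADOOP_CONF_DIR")
    if hadoop_conf:
        env["HADOOP_CONF_DIR"] = hadoop_conf
    return env


def run_slider(slider_script: str, args: List[str], client_xml: str) -> int:
    """Launch bin/slider (hypothetical path supplied by the caller) with the derived environment."""
    return subprocess.call([slider_script, *args], env=slider_environment(client_xml))
```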
Note 1: a path can be set relative to ${SLIDER_CONF_DIR}:
<property>
  <name>HADOOP_CONF_DIR</name>
  <value>${SLIDER_CONF_DIR}/../hadoop-conf</value>
</property>
or, on Windows:
<property>
  <name>HADOOP_CONF_DIR</name>
  <value>${SLIDER_CONF_DIR}\..\hadoop-conf</value>
</property>
Note 2: To test on the local cluster, use either an absolute path, or refer to the environment variable env.HADOOP_CONF_DIR:
<property>
  <name>HADOOP_CONF_DIR</name>
  <value>${env.HADOOP_CONF_DIR}</value>
</property>
Be aware that ${env.*} environment variable references and ${SLIDER_CONF_DIR} are expanded only during test runs. If the same slider-client.xml value is used directly from the CLI, the environment variable HADOOP_CONF_DIR must be set to the absolute path of the Hadoop configuration directory.
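The difference between test runs and direct CLI use comes down to substitution. This minimal Python sketch shows the kind of substitution a test run performs. It is an assumption, not Slider's actual expansion code: it handles only ${SLIDER_CONF_DIR} and ${env.NAME}, and leaves any other reference untouched. An untouched reference is also what the CLI would receive.

```python
import os
import re
from typing import Mapping, Optional

_VAR = re.compile(r"\$\{([^}]+)\}")


def expand_value(value: str, slider_conf_dir: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Expand ${SLIDER_CONF_DIR} and ${env.NAME} references in a config value.

    Unknown or unset references are returned unchanged.
    """
    environ = os.environ if env is None else env

    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key == "SLIDER_CONF_DIR":
            return slider_conf_dir
        if key.startswith("env.") and key[4:] in environ:
            return environ[key[4:]]
        return match.group(0)  # leave the reference as-is

    return _VAR.sub(substitute, value)
```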
How to validate the configuration¶
A test case executing the slider diagnostics command can be used to print out the configuration as seen by the bin/slider process:
mvn integration-test \
  -Dslider.conf.dir=/home/tester/sites/production/slider \
  -Dit.test=DiagnosticsCommandIT
Disabling the functional tests entirely¶
All functional tests which require a live YARN cluster are run during the integration-test phase. They are executed with the commands mvn verify or mvn integration-test.
If you do not wish to run the functional tests, simply use the mvn package command, and only those tests which do not require a live YARN cluster will run.
Non-mandatory options¶
The following test options may be added to slider-client.xml if the defaults need to be changed:
<property>
  <name>slider.test.thaw.wait.seconds</name>
  <description>Time to wait in seconds for a thaw to result in a running AM</description>
  <value>60000</value>
</property>
<property>
  <name>slider.test.freeze.wait.seconds</name>
  <description>Time to wait in seconds for a freeze to halt the cluster</description>
  <value>60000</value>
</property>
<property>
  <name>slider.test.timeout.millisec</name>
  <description>Time out in milliseconds before a test is considered to have failed.
    There are some maven properties which also define limits and may need adjusting</description>
  <value>180000</value>
</property>
<property>
  <name>slider.test.yarn.ram</name>
  <description>Size in MB to ask for containers</description>
  <value>192</value>
</property>
<property>
  <name>slider.test.agent.enabled</name>
  <description>Flag to enable/disable Agent tests</description>
  <value>true</value>
</property>
<property>
  <name>slider.test.windows.cluster</name>
  <description>Flag to indicate that the test cluster is windows.
    This defaults to the same OS as the host running the tests</description>
  <value>false</value>
</property>
<property>
  <name>slider.test.am.restart.time</name>
  <description>Time in millis to await an AM restart</description>
  <value>60000</value>
</property>
Note that while the same properties need to be set in slider-core/src/test/resources/slider-client.xml, those tests take a file in the local filesystem. Here, a URI to a path visible across all nodes in the cluster is required, because the tests do not copy the .tar/.tar.gz files over. The application configuration directories may be local or remote; they are copied into the .slider directory during cluster creation.
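The optional settings listed above can be read with a small helper when scripting around the tests. In this minimal Python sketch, the example values shown for slider-client.xml are assumed to be the defaults. The defaults compiled into the test classes may differ. Only the listed slider.test.* names are recognised.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Assumption: the example values above are treated as defaults here;
# the defaults built into the test classes may differ.
TEST_OPTION_DEFAULTS: Dict[str, str] = {
    "slider.test.thaw.wait.seconds": "60000",
    "slider.test.freeze.wait.seconds": "60000",
    "slider.test.timeout.millisec": "180000",
    "slider.test.yarn.ram": "192",
    "slider.test.agent.enabled": "true",
    "slider.test.windows.cluster": "false",
    "slider.test.am.restart.time": "60000",
}


def load_test_options(client_xml: str) -> Dict[str, str]:
    """Overlay the known slider.test.* values found in slider-client.xml onto the defaults."""
    options = dict(TEST_OPTION_DEFAULTS)
    for prop in ET.parse(client_xml).getroot().iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in options:  # unknown properties are ignored
            options[name] = (prop.findtext("value") or "").strip()
    return options


def option_int(options: Dict[str, str], name: str) -> int:
    return int(options[name])


def option_bool(options: Dict[str, str], name: str) -> bool:
    return options[name].lower() == "true"
```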
Provider-specific parameters¶
An individual provider can pick up settings from its own src/test/resources/slider-client.xml file, or the one in slider-core.
We strongly advise placing all the values in the slider-core file:
- All uncertainty about which file is picked up on the class path first goes away.
- There's one place to keep all the configuration values in sync.
Agent Tests¶
Agent tests are executed in slider/slider-funtest with the following mvn command:
cd slider-funtest
mvn verify -Dslider.conf.dir=../src/test/clusters/remote/slider -Dit.test=AppsThroughAgentIT -DfailIfNoTests=false
Enable/Execute the tests
To enable the tests, ensure that slider.test.agent.enabled is set to true:
<property>
  <name>slider.test.agent.enabled</name>
  <description>Flag to enable/disable Agent tests</description>
  <value>true</value>
</property>
Test setup
Edit the config file src/test/clusters/remote/slider/slider-client.xml, or your chosen equivalent, and ensure that the host names are accurate for the test cluster.
User setup
Ensure that the user running the tests is present on the cluster against which you are running the tests. The user must be a member of the hadoop group.
For example:
adduser testuser -d /home/testuser -G hadoop -m
HDFS Setup
Set up HDFS folders for the test user:
su hdfs
hdfs dfs -mkdir /user/testuser
hdfs dfs -chown testuser:hdfs /user/testuser
Configuring the YARN cluster for tests¶
Here are the configuration options we use in yarn-site.xml for testing.
These tell YARN to ignore memory requirements when allocating containers, and to keep the log files around after an application run.
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1</value>
</property>
<property>
  <description>Whether physical memory limits will be enforced for containers.</description>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<!-- we really don't want checking here-->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<!-- how long after a failure to see what is left in the directory-->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>60000</value>
</property>
<!-- thirty seconds (30000 ms) before the process gets a -9 -->
<property>
  <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
  <value>30000</value>
</property>
<!-- registry-->
<property>
  <name>hadoop.registry.rm.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hadoop.registry.zk.quorum</name>
  <value>${hostname}:2181</value>
</property>
Testing against a secure cluster¶
To test against a secure cluster:
- slider-client.xml must be configured as per Security.
- The client must have the Kerberos tokens issued so that the user running the tests has access to HDFS and YARN.
If there are problems authenticating (including the cluster being offline), the tests appear to hang.
Validating the configuration¶
mvn clean verify -Dit.test=BuildSetupIT
Parallel execution¶
Attempts to run test cases in parallel failed, even with a configuration that ran the methods within a class sequentially but ran separate classes independently.
Even after identifying and eliminating some unintended sharing of static mutable variables, trying to run test cases in parallel seemed to hang tests and produce timeouts.
For this reason parallel tests have been disabled. To accelerate test runs through parallelization, run different tests on different hosts instead.
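The host-level split suggested above can be scripted. The following Python sketch is an illustration under assumptions. It deals the test classes round-robin across a chosen number of hosts and builds one mvn command per host. It assumes that the Failsafe plugin version in use accepts a comma-separated class list in it.test. Dispatching each command to its host is left out.

```python
from typing import List, Sequence


def shard(test_classes: Sequence[str], hosts: int) -> List[List[str]]:
    """Split test classes round-robin into `hosts` groups; sorting keeps the split deterministic."""
    if hosts < 1:
        raise ValueError("hosts must be >= 1")
    groups: List[List[str]] = [[] for _ in range(hosts)]
    for i, name in enumerate(sorted(test_classes)):
        groups[i % hosts].append(name)
    return groups


def mvn_command(group: Sequence[str], conf_dir: str) -> List[str]:
    """Build the mvn invocation one host would run for its group of classes."""
    return [
        "mvn", "verify",
        f"-Dslider.conf.dir={conf_dir}",
        "-Dit.test=" + ",".join(group),  # assumption: comma-separated list is accepted
        "-DfailIfNoTests=false",
    ]
```

Sorting before dealing means every host computes the same split from the same class list, so no coordination is needed between hosts.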
Other constraints¶
- Port assignments SHOULD NOT be fixed, as this will cause clusters to fail if there are too many instances of a role on the same host, or if other tests are using the same port.
- If a test does need to fix a port, it MUST be for a single instance of a role, and it must be different from all others. The assignment should be set in org.apache.slider.funtest.itest.PortAssignments so as to ensure uniqueness over time. Otherwise, use the value 0 to allow the OS to assign free ports on demand.
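Port 0 behaves the same way in any socket API. The Python sketch below shows the OS choosing a free port. It also includes a hypothetical uniqueness check in the spirit of PortAssignments, with invented names and values. A port probed this way is released again when the socket closes, so another process could take it. It is therefore safer to pass 0 to the component itself and read back the port it actually bound.

```python
import socket
from typing import Dict


def os_assigned_port() -> int:
    """Bind to port 0 so the OS picks a free TCP port, and report which one it chose."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0: the kernel assigns a free port
        return sock.getsockname()[1]


def check_fixed_ports_unique(assignments: Dict[str, int]) -> None:
    """Raise ValueError if two fixed port assignments (hypothetical registry) collide."""
    seen: Dict[int, str] = {}
    for owner, port in assignments.items():
        if port in seen:
            raise ValueError(f"port {port} used by both {seen[port]} and {owner}")
        seen[port] = owner
```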
Test Requirements¶
- Test cases should be written so that each class works with exactly one Slider-deployed cluster.
- Every test MUST have its own cluster name, preferably derived from the class name.
- This cluster should be deployed in an @BeforeClass method.
- The @AfterClass method MUST tear this cluster down.
- Tests must skip their execution if functional tests (or the specific HBase or Accumulo categories) are disabled.
- Tests within the suite (i.e. class) must be designed to be independent, working irrespective of the ordering of other tests.
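The required lifecycle can be illustrated with Python's unittest. In unittest, setUpClass and tearDownClass play the roles of JUnit's @BeforeClass and @AfterClass. This is a structural sketch only. deploy_cluster, destroy_cluster and the SLIDER_FUNTESTS_ENABLED switch are hypothetical stand-ins, not Slider APIs. The lower-case, hyphenated cluster-name format is also an assumption.

```python
import os
import re
import unittest


def cluster_name_for(cls: type) -> str:
    """Derive a per-class cluster name, e.g. TestAgentEcho -> test-agent-echo."""
    return re.sub(r"(?<!^)(?=[A-Z])", "-", cls.__name__).lower()


def functional_tests_enabled() -> bool:
    # Hypothetical switch standing in for the real "functional tests enabled" check.
    return os.environ.get("SLIDER_FUNTESTS_ENABLED", "false").lower() == "true"


DEPLOYED: set = set()  # records "live" clusters so the sketch runs without YARN


def deploy_cluster(name: str) -> None:
    DEPLOYED.add(name)  # hypothetical: would create the cluster through bin/slider


def destroy_cluster(name: str) -> None:
    DEPLOYED.discard(name)  # hypothetical: would stop and destroy the cluster


@unittest.skipUnless(functional_tests_enabled(), "functional tests disabled")
class TestAgentEcho(unittest.TestCase):
    cluster = ""

    @classmethod
    def setUpClass(cls) -> None:
        # One cluster per class, named after the class.
        cls.cluster = cluster_name_for(cls)
        deploy_cluster(cls.cluster)

    @classmethod
    def tearDownClass(cls) -> None:
        destroy_cluster(cls.cluster)

    def test_cluster_is_live(self) -> None:
        # Tests only read shared state, so their ordering does not matter.
        self.assertIn(self.cluster, DEPLOYED)
```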
Running and debugging the functional tests¶
To run the functional tests:
- In the root slider directory, build a complete Slider release: mvn install -DskipTests
- Start the YARN cluster/set up proxies to connect to it, etc.
- In the slider-funtest directory, run the tests: mvn clean verify
Slider does not need mvn install to be run before executing the functional tests. In fact, this may interfere with the tests picking up the most recent changes to the code. However, if you want to run tests for an individual module instead of running all Slider tests at once, you will need to install first. Make sure to install from the top-level Slider directory so that all code changes are included. To run all functional tests, simply run mvn clean verify at the top level.
If you are testing an individual module, and you want to propagate changes in slider-core through to the funtest classes for testing, you must build/install all the Slider packages from the root assembly.
A common mistake during development is to rebuild the slider-core JARs, then the slider-funtest tests, without rebuilding the slider-assembly.
In this situation, the tests are in sync with the latest build of the code (including any bug fixes), but the scripts executed by those tests are from a previous build of slider-core.jar. As a result, the fixes are not picked up.
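A small pre-flight check can catch this mistake before a long test run. The Python sketch below is hypothetical. It compares the modification time of the newest slider-core JAR with that of the newest assembly archive. The file-name patterns (slider-core*.jar, *.tar.gz, *.zip) are assumptions about the build output, not documented names.

```python
from pathlib import Path
from typing import Iterable, Optional


def newest_mtime(paths: Iterable[Path]) -> Optional[float]:
    """Latest modification time among the given files, or None if there are none."""
    times = [p.stat().st_mtime for p in paths if p.is_file()]
    return max(times) if times else None


def assembly_is_stale(core_target: Path, assembly_target: Path) -> bool:
    """True if a slider-core JAR is newer than every assembly archive.

    A missing assembly counts as stale; a missing core JAR does not.
    """
    core = newest_mtime(core_target.glob("slider-core*.jar"))
    archives = list(assembly_target.glob("*.tar.gz")) + list(assembly_target.glob("*.zip"))
    assembly = newest_mtime(archives)
    if core is None:
        return False
    return assembly is None or core > assembly
```

Run from the root slider directory, assembly_is_stale(Path("slider-core/target"), Path("slider-assembly/target")) returning True suggests rebuilding the assembly before running slider-funtest.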
Limitations of slider-funtest¶
- All tests run from a single client, so the workload can't scale.
- Output from failed AMs and containers isn't collected.
Troubleshooting the functional tests¶
- If application instances fail to come up because there are still outstanding requests, YARN didn't have the RAM/cores to spare for the number of containers. Edit slider.test.yarn.ram to make it smaller.
- If you are testing in a local VM and it stops responding, it has probably run out of RAM and is swapping. Rebooting can help, but for a long-term fix go through all the Apache Hadoop configurations (HDFS, YARN, ZooKeeper) and set their heaps to smaller numbers, like 256M each. Also: turn off unused services (hcat, oozie, webHDFS).
- The YARN UI will list the cluster launches; look for the one with a name close to the test and view its logs.
- Container logs will appear "elsewhere". The log lists the containers used; you may be able to track the logs down from the specific nodes.
- If you browse the filesystem, look for the specific test clusters in ~/.slider/cluster/$testname.
- If you are using a secure cluster, make sure that the clocks are synchronized and that you have a current token; klist will tell you this. In a VM, install and enable ntp, and consider rebooting if there are any problems. Also check that the VM has the same time zone settings as the host OS.
Running a single integration test¶
You can run a single integration test (or a pattern of tests) by setting the it.test property.
For example, to run all Agent IT tests:
mvn integration-test -Dslider.conf.dir=${your-config-dir} -Dit.test=Agent\*IT
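In the command above, the backslash keeps the shell from expanding the *, so Maven receives the literal pattern Agent*IT. Python's fnmatch can give a rough preview of which class names a simple wildcard selects. This is an approximation only: Maven Failsafe's own matching (package paths, method filters, exclusions) is richer than a plain glob. Of the class names below, only AppsThroughAgentIT and DiagnosticsCommandIT appear in this document; the first two are illustrative.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence


def select_tests(class_names: Sequence[str], pattern: str) -> List[str]:
    """Return the simple class names matching a glob-style it.test pattern (case-sensitive)."""
    return [name for name in class_names if fnmatchcase(name, pattern)]


classes = ["AgentClusterLifecycleIT", "AgentFailuresIT", "AppsThroughAgentIT", "DiagnosticsCommandIT"]
print(select_tests(classes, "Agent*IT"))  # prints ['AgentClusterLifecycleIT', 'AgentFailuresIT']
```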
Accumulo configuration options¶
Optional parameters:
<property>
  <name>slider.test.accumulo.launch.wait.seconds</name>
  <description>Time to wait in seconds for Accumulo to start</description>
  <value>1800</value>
</property>