Testing Apache Slider

 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
  NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
  "OPTIONAL" in this document are to be interpreted as described in
  RFC 2119.

Functional Tests

The functional test suite is designed to test Slider against a live cluster.

For these to work you MUST have:

  1. A YARN cluster, secure or insecure
  2. A slider-client.xml file configured to interact with the cluster
  3. Agent tests and Accumulo Agent tests: nothing additional
  4. (deprecated) HBase provider tests: HBase .tar.gz uploaded to HDFS, and a local or remote hbase conf directory
  5. (deprecated) Accumulo provider tests: Accumulo .tar.gz uploaded to HDFS, and a local or remote accumulo conf directory

Configuration of functional tests

Maven MUST be given:

  1. A path to the expanded test archive
  2. A path to a slider configuration directory for the cluster

The path to the expanded test archive is calculated automatically: it is the directory under ../slider-assembly/target where an untarred Slider distribution can be found. If it is not present, the tests will fail.

The path to the configuration directory must be supplied in the property slider.conf.dir, which can be set on the command line:

mvn clean verify -Dslider.conf.dir=src/test/clusters/sandbox/slider

It can also be set in the (optional) file slider-funtest/build.properties:

slider.conf.dir=src/test/clusters/sandbox/slider

This file is loaded whenever a Slider build or test run takes place.
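Since a wrong slider.conf.dir only shows up once the tests start, it can help to check the directory first. The following is a minimal sketch, not part of Slider: the function name check_slider_conf_dir is hypothetical, and it assumes only that the configuration directory is expected to contain a slider-client.xml file.

```shell
# Hypothetical pre-flight helper; not shipped with Slider.
# Succeeds only if the directory exists and contains slider-client.xml.
check_slider_conf_dir() {
  conf_dir="$1"
  if [ ! -d "$conf_dir" ]; then
    echo "not a directory: $conf_dir" >&2
    return 1
  fi
  if [ ! -f "$conf_dir/slider-client.xml" ]; then
    echo "missing slider-client.xml in $conf_dir" >&2
    return 1
  fi
  return 0
}

# Example use (commented out so the sketch has no side effects):
# check_slider_conf_dir src/test/clusters/sandbox/slider \
#   && mvn clean verify -Dslider.conf.dir=src/test/clusters/sandbox/slider
```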

Configuration of slider-client.xml

The slider-client.xml must have extra configuration options for both the HBase and Accumulo tests, as well as a common set for actually talking to a YARN cluster.

How to pick up core-site.xml, hdfs-site.xml and yarn-site.xml values

The slider-client.xml file can declare a HADOOP_CONF_DIR value for use by both the test runner and the bin/slider command line application that it invokes.

  <property>
    <name>HADOOP_CONF_DIR</name>
    <value>/home/tester/sites/production/hadoop-conf</value>
  </property>

If set:

  1. The standard -site.xml files are loaded by the JUnit test runner, to bind the test classes to the YARN cluster.
  2. The property is used to set the environment variable HADOOP_CONF_DIR before the bin/slider or bin\slider.py script is executed.

Note 1: a path can be set relative to ${SLIDER_CONF_DIR}

  <property>
    <name>HADOOP_CONF_DIR</name>
    <value>${SLIDER_CONF_DIR}/../hadoop-conf</value>
  </property>

or, on Windows

  <property>
    <name>HADOOP_CONF_DIR</name>
    <value>${SLIDER_CONF_DIR}\..\hadoop-conf</value>
  </property>

Note 2: To test on the local cluster, use either an absolute path, or refer to the environment variable env.HADOOP_CONF_DIR:

  <property>
    <name>HADOOP_CONF_DIR</name>
    <value>${env.HADOOP_CONF_DIR}</value>
  </property>

Be aware that expansion of the env. environment variables and of SLIDER_CONF_DIR is only performed during test runs. If the same slider-client.xml value is used directly from the CLI, the environment variable HADOOP_CONF_DIR must be set to the absolute path of the Hadoop configuration directory.
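For CLI use, the absolute path therefore has to be computed by hand. A minimal sketch, assuming the layout from Note 1 (a hadoop-conf directory next to the Slider configuration directory); the function name resolve_hadoop_conf_dir is hypothetical and not part of Slider:

```shell
# Hypothetical helper: print the absolute path that
# ${SLIDER_CONF_DIR}/../hadoop-conf refers to.
resolve_hadoop_conf_dir() {
  slider_conf_dir="$1"
  # cd inside a subshell so the caller's working directory is untouched;
  # pwd -P prints the physical absolute path with ".." resolved.
  (cd "$slider_conf_dir/../hadoop-conf" && pwd -P)
}

# Example use (commented out; the path is the one used earlier in this document):
# HADOOP_CONF_DIR=$(resolve_hadoop_conf_dir /home/tester/sites/production/slider)
# export HADOOP_CONF_DIR
```

With the variable exported, bin/slider can then be invoked from the same shell.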

How to validate the configuration

A test case that executes the slider diagnostics command can be used to print out the configuration as seen by the bin/slider process:

mvn integration-test \
 -Dslider.conf.dir=/home/tester/sites/production/slider \
 -Dit.test=DiagnosticsCommandIT

Disabling the functional tests entirely

All functional tests which require a live YARN cluster are run during the integration-test phase. They are executed with the commands mvn verify or mvn integration-test.

If you do not wish to run the functional tests, simply use the mvn package command and only those tests which do not require a live YARN cluster will run.

Non-mandatory options

The following test options may be added to slider-client.xml if the defaults need to be changed:

<property>
  <name>slider.test.thaw.wait.seconds</name>
  <description>Time to wait in seconds for a thaw to result in a running AM</description>
  <value>60000</value>
</property>

<property>
  <name>slider.test.freeze.wait.seconds</name>
  <description>Time to wait in seconds for a freeze to halt the cluster</description>
  <value>60000</value>
</property>

<property>
  <name>slider.test.timeout.millisec</name>
  <description>
  Time out in milliseconds before a test is considered to have failed.
  There are some maven properties which also define limits and may need adjusting
  </description>
  <value>180000</value>
</property>

<property>
  <name>slider.test.yarn.ram</name>
  <description>Size in MB to ask for containers</description>
  <value>192</value>
</property>

<property>
  <name>slider.test.agent.enabled</name>
  <description>Flag to enable/disable Agent tests</description>
  <value>true</value>
</property>

<property>
  <name>slider.test.windows.cluster</name>
  <description>Flag to indicate that the test cluster is windows.
  This defaults to the same OS as the host running the tests
  </description>
  <value>false</value>
</property>

<property>
  <name>slider.test.am.restart.time</name>
  <description>Time in millis to await an AM restart</description>
  <value>60000</value>
</property>

Note that while the same properties need to be set in slider-core/src/test/resources/slider-client.xml, those tests take a file in the local filesystem. Here, a URI to a path visible across all nodes in the cluster is required, because the tests do not copy the .tar/.tar.gz files over. The application configuration directories may be local or remote; they are copied into the .slider directory during cluster creation.

Provider-specific parameters

An individual provider can pick up settings from its own src/test/resources/slider-client.xml file, or the one in slider-core. We strongly advise placing all the values in the slider-core file:

  1. All uncertainty about which file is picked up on the class path first goes away
  2. There's one place to keep all the configuration values in sync.

Agent Tests

Agent tests are run with the following mvn command from slider/slider-funtest:

cd slider-funtest
mvn verify -Dslider.conf.dir=../src/test/clusters/remote/slider -Dit.test=AppsThroughAgentIT -DfailIfNoTests=false

Enable/Execute the tests

To enable the tests, ensure that slider.test.agent.enabled is set to true:

<property>
  <name>slider.test.agent.enabled</name>
  <description>Flag to enable/disable Agent tests</description>
  <value>true</value>
</property>

Test setup

Edit the configuration file src/test/clusters/remote/slider/slider-client.xml, or your chosen equivalent, and ensure that the host names are accurate for the test cluster.

User setup

Ensure that the user running the tests is present on the cluster against which you are running them. The user must be a member of the hadoop group.

E.g:

adduser testuser -d /home/testuser -G hadoop -m

HDFS Setup

Set up HDFS folders for the test user:

su hdfs
hdfs dfs -mkdir /user/testuser
hdfs dfs -chown testuser:hdfs /user/testuser
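The same two commands apply to any test user name. As a small sketch (the function name hdfs_home_setup_commands is hypothetical), the following prints the commands for a given user so that they can be reviewed before being run as the hdfs user; it does not call hdfs itself:

```shell
# Hypothetical helper: print the HDFS commands that create and hand over
# a home directory for the named test user. Nothing is executed.
hdfs_home_setup_commands() {
  user="$1"
  printf 'hdfs dfs -mkdir /user/%s\n' "$user"
  printf 'hdfs dfs -chown %s:hdfs /user/%s\n' "$user" "$user"
}

hdfs_home_setup_commands testuser
```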

Configuring the YARN cluster for tests

Here are the configuration options we use in yarn-site.xml for testing. They tell YARN to ignore memory requirements when allocating containers, and to keep the log files around after an application run.

  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1</value>
  </property>
  <property>
    <description>Whether physical memory limits will be enforced for
      containers.
    </description>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <!-- we really don't want checking here-->
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>

  <!-- how long after a failure to see what is left in the directory-->
  <property>
    <name>yarn.nodemanager.delete.debug-delay-sec</name>
    <value>60000</value>
  </property>

  <!--thirty seconds before the process gets a -9 -->
  <property>
    <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
    <value>30000</value>
  </property>


  <!-- registry-->
  <property>
    <name>hadoop.registry.rm.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>hadoop.registry.zk.quorum</name>
    <value>${hostname}:2181</value>
  </property>

Testing against a secure cluster

To test against a secure cluster:

  1. slider-client.xml must be configured as per Security.
  2. The client must have Kerberos tokens issued so that the user running the tests has access to HDFS and YARN.

If there are problems authenticating (including the cluster being offline), the tests appear to hang.
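Because an authentication problem shows up as a hang rather than an error, it can be worth checking for a ticket before starting a run. This is a minimal sketch, not part of Slider. It assumes MIT Kerberos, whose klist accepts -s to run silently and exit non-zero when the credential cache is missing or expired; other klist implementations may behave differently. The function name and the KLIST override variable are hypothetical.

```shell
# Hypothetical pre-flight check; assumes MIT Kerberos `klist -s` semantics.
# KLIST may be set to point at a specific klist binary.
require_kerberos_ticket() {
  klist_cmd="${KLIST:-klist}"
  if ! command -v "$klist_cmd" >/dev/null 2>&1; then
    echo "klist not found; cannot check for a ticket" >&2
    return 2
  fi
  if ! "$klist_cmd" -s; then
    echo "no valid Kerberos ticket; obtain one before running the tests" >&2
    return 1
  fi
}

# Example use (commented out so the sketch has no side effects):
# require_kerberos_ticket && mvn clean verify
```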

Validating the configuration

mvn clean verify -Dit.test=BuildSetupIT

Parallel execution

Attempts to run test cases in parallel failed, even with a configuration that ran the methods in a class sequentially but separate classes independently.

Even after identifying and eliminating some unintended sharing of static mutable variables, trying to run test cases in parallel seemed to hang tests and produce timeouts.

For this reason parallel tests have been disabled. To accelerate test runs through parallelization, run different tests on different hosts instead.

Other constraints

  • Port assignments SHOULD NOT be fixed, as this will cause clusters to fail if there are too many instances of a role on the same host, or if other tests are using the same port.
  • If a test does need to fix a port, it MUST be for a single instance of a role, and it must be different from all others. The assignment should be set in org.apache.slider.funtest.itest.PortAssignments so as to ensure uniqueness over time. Otherwise: use the value of 0 to allow the OS to assign free ports on demand.

Test Requirements

  1. Test cases should be written so that each class works with exactly one Slider-deployed cluster
  2. Every test MUST have its own cluster name -preferably derived from the classname.
  3. This cluster should be deployed in an @BeforeClass method.
  4. The @AfterClass method MUST tear this cluster down.
  5. Tests must skip their execution if functional tests -or the specific hbase or accumulo categories- are disabled.
  6. Tests within the suite (i.e. class) must be designed to be independent -to work irrespective of the ordering of other tests.

Running and debugging the functional tests

To run the functional tests:

  1. In the root slider directory, build a complete Slider release

    mvn install -DskipTests
    
  2. Start the YARN cluster/set up proxies to connect to it, etc.

  3. In the slider-funtest dir, run the tests

    mvn clean verify
    

Slider does not need mvn install to be run before executing the functional tests. In fact, this may interfere with the tests picking up the most recent changes to the code. However, if you want to run tests for an individual module instead of running all Slider tests at once, you will need to install first. Make sure to install from the top level Slider directory so that all code changes are included. To run all functional tests, simply run mvn clean verify at the top level.

If you are testing an individual module, and you want to propagate changes in slider-core through to the funtest classes for testing, you must build/install all the slider packages from the root assembly. A common mistake during development is to rebuild the slider-core JARs then the slider-funtest tests without rebuilding the slider-assembly. In this situation, the tests are in sync with the latest build of the code -including any bug fixes- but the scripts executed by those tests are of a previous build of slider-core.jar. As a result, the fixes are not picked up.
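For the individual-module case, the order that avoids this mistake can be sketched as a small wrapper. This is an illustration only: the function name and the MVN override variable are hypothetical, and the Maven goals are the ones already shown in the steps above. Setting MVN=echo previews the sequence without building anything.

```shell
# Hypothetical wrapper; not part of Slider.
# MVN defaults to mvn; set MVN=echo to preview the steps.
rebuild_and_run_funtests() {
  slider_root="$1"
  mvn_cmd="${MVN:-mvn}"
  # 1. Build and install every module, including slider-assembly, from the
  #    top-level directory so that scripts and JARs come from the same build.
  (cd "$slider_root" && "$mvn_cmd" install -DskipTests) || return 1
  # 2. Only then run the functional tests from slider-funtest.
  (cd "$slider_root/slider-funtest" && "$mvn_cmd" clean verify) || return 1
}

# Example use (commented out so the sketch has no side effects):
# rebuild_and_run_funtests /path/to/slider
```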

Limitations of slider-funtest

  1. All tests run from a single client -workload can't scale
  2. Output from failed AMs and containers isn't collected

Troubleshooting the functional tests

  1. If application instances fail to come up because there are still outstanding requests, it means that YARN didn't have the RAM/cores to spare for the number of containers. Reduce the value of slider.test.yarn.ram.

  2. If you are testing in a local VM and it stops responding, it will have been swapped out of RAM. Rebooting can help, but for a long-term fix go through all the Apache Hadoop configurations (HDFS, YARN, ZooKeeper) and set their heaps to smaller numbers, like 256M each. Also, turn off unused services (hcat, oozie, webHDFS).

  3. The YARN UI will list the cluster launches -look for the one with a name close to the test and view its logs

  4. Container logs will appear "elsewhere". The log lists the containers used -you may be able to track the logs down from the specific nodes.

  5. If you browse the filesystem, look for the specific test clusters in ~/.slider/cluster/$testname

  6. If you are using a secure cluster, make sure that the clocks are synchronized, and that you have a current token -klist will tell you this. In a VM: install and enable ntp, and consider rebooting if there are any problems. Check also that it has the same time zone settings as the host OS.

Running a single integration test

You can run a single integration test (or a pattern of tests) by setting the it.test property.

For example, to run all Agent IT tests:

mvn integration-test -Dslider.conf.dir=${your-config-dir} -Dit.test=Agent\*IT
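The backslash in the command above keeps the shell from expanding * against file names in the current directory. Single quotes have the same effect and can be easier to read. The short check below only demonstrates that the two spellings produce the same string; the variable names are illustrative.

```shell
# Both spellings yield the literal string Agent*IT.
escaped=Agent\*IT
quoted='Agent*IT'
[ "$escaped" = "$quoted" ] && echo "same pattern: $quoted"

# Equivalent invocation using quotes (commented out; conf_dir is a
# placeholder for your configuration directory):
# mvn integration-test -Dslider.conf.dir="$conf_dir" -Dit.test='Agent*IT'
```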

Accumulo configuration options

Optional parameters

 <property>
  <name>slider.test.accumulo.launch.wait.seconds</name>
  <description>Time to wait in seconds for Accumulo to start</description>
  <value>1800</value>
 </property>