Apache Slider: Getting Started

This page has been updated to reflect the latest code in the "develop" branch.

Introduction

This page describes the steps required to set up a cluster and deploy a YARN-hosted application using Slider.

System Requirements

The Slider deployment has the following minimum system requirements:

  • Apache Hadoop 2.6+

  • Required Services: HDFS, YARN and ZooKeeper

  • Oracle JDK 1.7 (64-bit)

  • Python 2.6

  • openssl

Set Up the Cluster

Set up your Hadoop cluster with the services listed above.

Note: Ensure the debug delay configuration is set to a non-zero value to allow easy debugging. If you are using a single VM or a sandbox, you may need to modify your YARN configuration to allow multiple containers on a single host. In yarn-site.xml, make the following modifications:

Property                                   Value
yarn.scheduler.minimum-allocation-mb       >= 256 (ensures that YARN can allocate a sufficient number of containers)
yarn.nodemanager.delete.debug-delay-sec    >= 3600 (retains an application's local files and logs for an hour after it finishes)

Example

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>256</value>
</property>
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>3600</value>
</property>
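To confirm the two settings on a given host, a short script can parse yarn-site.xml and compare the values with the thresholds in the table. This is a minimal Python 3 sketch, assuming a standard Hadoop configuration file (a <configuration> root with <property> children). The helper names read_properties and check_yarn_site are hypothetical and not part of Slider or Hadoop.

```python
import xml.etree.ElementTree as ET

# Minimum values taken from the table above (hypothetical helper, not part of Slider).
REQUIRED_MINIMUMS = {
    "yarn.scheduler.minimum-allocation-mb": 256,
    "yarn.nodemanager.delete.debug-delay-sec": 3600,
}


def read_properties(xml_text):
    """Return a {name: value} dict built from Hadoop-style <property> elements."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def check_yarn_site(xml_text):
    """Return a list of problems; an empty list means both settings meet the minimums."""
    props = read_properties(xml_text)
    problems = []
    for name, minimum in REQUIRED_MINIMUMS.items():
        if name not in props:
            problems.append(f"{name} is not set")
        elif int(props[name]) < minimum:
            problems.append(f"{name}={props[name]} is below {minimum}")
    return problems


if __name__ == "__main__":
    sample = """<configuration>
      <property><name>yarn.scheduler.minimum-allocation-mb</name><value>256</value></property>
      <property><name>yarn.nodemanager.delete.debug-delay-sec</name><value>3600</value></property>
    </configuration>"""
    print(check_yarn_site(sample))  # prints []
```

On a real host, pass in the contents of the yarn-site.xml file from your Hadoop configuration directory.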

Other options are described in the Troubleshooting guide.

Download Slider Packages

You can build Slider from source as described below.

Build Slider

  • From the top level directory, execute mvn clean site:site site:stage package -DskipTests
  • Use the generated archive in the slider-assembly/target directory (e.g. slider-0.80.0-incubating-all.tar.gz or slider-0.80.0-incubating-all.zip) for the subsequent steps
  • If you are cloning the Slider git repo, check out the releases/slider-0.80.0-incubating branch for the latest release, or develop for the latest code under development

Install Slider

Slider is installed on a client machine that can access the Hadoop cluster. Follow these steps to install Slider:

mkdir ${slider-install-dir}

cd ${slider-install-dir}

You can run a Slider application as any user; the only requirement is that the user has a home directory on HDFS. For the remainder of this document, it is assumed that the "yarn" user is being used.

Log in as the "yarn" user (assuming this is a host associated with the installed cluster), e.g. su yarn. This assumes that all apps are run as the "yarn" user.

Expand the archive: tar -xvf slider-0.80.0-incubating-all.tar.gz, or unzip slider-0.80.0-incubating-all.zip
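If you provision several client machines, you can script the expand step. Below is a minimal Python 3 sketch using the standard tarfile and zipfile modules. The function name expand_slider_archive is hypothetical. Where the running Python supports tar extraction filters, the sketch uses the "data" filter, which rejects absolute paths and ".." members.

```python
import os
import tarfile
import zipfile


def expand_slider_archive(archive_path, install_dir):
    """Expand a Slider .tar.gz or .zip archive into install_dir (hypothetical helper).

    Returns the sorted top-level entries of install_dir after extraction.
    """
    os.makedirs(install_dir, exist_ok=True)
    if archive_path.endswith((".tar.gz", ".tgz")):
        with tarfile.open(archive_path, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons: refuse absolute paths and path traversal in members.
                tar.extractall(install_dir, filter="data")
            else:
                tar.extractall(install_dir)
    elif archive_path.endswith(".zip"):
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(install_dir)
    else:
        raise ValueError(f"unsupported archive type: {archive_path}")
    return sorted(os.listdir(install_dir))
```

For example, expand_slider_archive("slider-0.80.0-incubating-all.tar.gz", "/opt/slider") would expand the release into /opt/slider. That install directory is only an example.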

Configure Slider

Browse to the Slider directory: cd slider-0.80.0-incubating/conf

Edit slider-env.sh and specify correct values.

export JAVA_HOME=/usr/jdk64/jdk1.7.0_67
export HADOOP_CONF_DIR=/etc/hadoop/conf

(You only need to set JAVA_HOME if it is not already set)

If you are on a node that does not have the Hadoop configuration folder, you can add the relevant configuration properties to slider-client.xml instead.

You can also simply configure slider-client.xml with the path to the Hadoop configuration directory.

This path can be absolute:

<property>
  <name>HADOOP_CONF_DIR</name>
  <value>/etc/hadoop/conf</value>
</property>

Or it can be relative to the property SLIDER_CONF_DIR, which is set to the directory containing the slider-client.xml file:

<property>
  <name>HADOOP_CONF_DIR</name>
  <value>${SLIDER_CONF_DIR}/../hadoop-conf</value>
</property>
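To see how the relative form resolves: SLIDER_CONF_DIR is the directory that contains slider-client.xml, so ${SLIDER_CONF_DIR}/../hadoop-conf names a hadoop-conf directory that sits next to conf. The Python 3 sketch below mimics that substitution with plain string replacement. It is an approximation of the behavior described above, not Slider's own resolution code, and resolve_hadoop_conf_dir is a hypothetical name.

```python
import os


def resolve_hadoop_conf_dir(value, slider_client_xml):
    """Expand ${SLIDER_CONF_DIR} in a HADOOP_CONF_DIR value (approximation).

    SLIDER_CONF_DIR is taken to be the directory containing slider-client.xml.
    """
    slider_conf_dir = os.path.dirname(os.path.abspath(slider_client_xml))
    expanded = value.replace("${SLIDER_CONF_DIR}", slider_conf_dir)
    # Collapse ".../conf/../hadoop-conf" into ".../hadoop-conf".
    return os.path.normpath(expanded)
```

For example, if slider-client.xml lives at /opt/slider/slider-0.80.0-incubating/conf/slider-client.xml (an example install path), the relative value resolves to /opt/slider/slider-0.80.0-incubating/hadoop-conf. An absolute value such as /etc/hadoop/conf is returned unchanged.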

If the hadoop configuration dir is configured this way, the

For example, if you are targeting a non-secure cluster with no HA for the NameNode or ResourceManager, modify the Slider configuration file ${slider-install-dir}/slider-0.80.0-incubating/conf/slider-client.xml to add the following properties:

  <property>
    <name>hadoop.registry.zk.quorum</name>
    <value>yourZooKeeperHost:port</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>yourResourceManagerHost:8050</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>yourResourceManagerHost:8030</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://yourNameNodeHost:8020</value>
  </property>
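When several client machines need the same settings, you can generate the properties instead of editing them by hand. This Python 3 sketch uses xml.etree.ElementTree. The host arguments correspond to the placeholders above, and the ports 8050, 8030 and 8020 are copied from the example, so they must match your cluster. build_slider_client_xml is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_slider_client_xml(zk_quorum, rm_host, nn_host):
    """Return <configuration> XML text for a non-secure, non-HA cluster (sketch).

    Ports follow the example above and may differ on your cluster.
    """
    props = {
        "hadoop.registry.zk.quorum": zk_quorum,
        "yarn.resourcemanager.address": f"{rm_host}:8050",
        "yarn.resourcemanager.scheduler.address": f"{rm_host}:8030",
        "fs.defaultFS": f"hdfs://{nn_host}:8020",
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

If slider-client.xml already contains other properties, merge these <property> elements into its existing <configuration> element rather than overwriting the file.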

Execute:

${slider-install-dir}/slider-0.80.0-incubating/bin/slider version

or, on Windows:

python %slider-install-dir%/slider-0.80.0-incubating/bin/slider.py version

Ensure there are no errors and that the output includes "Compiled against Hadoop 2.6.0". This confirms that Slider is installed correctly.
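For automated installs, you can script the same check: run the version command and look for the "Compiled against Hadoop" line. This is a Python 3 sketch. The launcher path is a placeholder, and the wording of the version output may differ between Slider builds, so treat the string match as an assumption. Both helper names are hypothetical.

```python
import subprocess


def compiled_against_hadoop(version_output):
    """Return the text after 'Compiled against Hadoop' (e.g. '2.6.0'), or None."""
    marker = "Compiled against Hadoop"
    for line in version_output.splitlines():
        if marker in line:
            return line.split(marker, 1)[1].strip() or None
    return None


def check_slider_install(slider_bin):
    """Run '<slider_bin> version' and return the reported Hadoop version."""
    result = subprocess.run(
        [slider_bin, "version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge in case the version is logged to stderr
        universal_newlines=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"slider version failed:\n{result.stdout}")
    return compiled_against_hadoop(result.stdout)
```

For example, check_slider_install("/opt/slider/slider-0.80.0-incubating/bin/slider") should return "2.6.0" on an installation like the one described above. The /opt/slider path is an example.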

Deploy Slider Resources

Ensure that all files and folders are accessible to the user creating the application instance. The examples assume that this user is "yarn"; the default YARN user on your cluster may be different, e.g. hadoop.

Ensure an HDFS Home Folder Exists for the User

Perform the following steps to create the user home folder with the appropriate permissions:

su hdfs

hdfs dfs -mkdir /user/yarn

hdfs dfs -chown yarn:hdfs /user/yarn

Create Application Packages

There are several sample application packages available for use with Slider:

  • app-packages/memcached-win or memcached: The README.txt describes how to create the Slider Application package for memcached on Linux or Windows. The memcached app packages are a good place to start.

  • app-packages/hbase or hbase-win: The README.txt file describes how to create a Slider Application Package for HBase. By default it will create a package for HBase 0.98.3 but you can create the same for other versions.

  • app-packages/accumulo: The README.txt file describes how to create a Slider Application Package for Accumulo.

  • app-packages/storm or storm-win: The README.txt describes how to create the Slider Application package for Storm.

Create one or more Slider application packages and follow the steps below to install them.

Install, Configure, Start and Verify Sample Application

Install Sample Application Package

slider install-package --name <package name> --package <sample-application-package>

The package is deployed on HDFS at <User home dir>/.slider/package/<name provided in the command>. This path is also referenced in appConfig.json through the property "application.def", so the path and the property value must be kept in sync.

For example, you can use the following command for HBase application package.

slider install-package --name HBASE --package path-to-hbase-package.zip
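The install location and "application.def" must stay in sync, so it can help to compute the expected HDFS directory from the user and the --name value. This is a Python 3 sketch. It assumes the home directory is /user/<user>, as created in the HDFS home folder step above. It also assumes that a relative application.def is resolved against that home directory; check the sample appConfig-default.json in your package. Both function names are hypothetical.

```python
import posixpath


def package_dir(user, package_name, home_root="/user"):
    """HDFS directory used by 'slider install-package --name <package_name>'."""
    return posixpath.join(home_root, user, ".slider", "package", package_name)


def application_def_matches(application_def, user, package_name, home_root="/user"):
    """True if application.def points inside the installed package directory.

    Assumption: a relative application.def is resolved against the user's
    HDFS home directory; an absolute one is used as-is.
    """
    home = posixpath.join(home_root, user)
    full = posixpath.normpath(posixpath.join(home, application_def))
    return full.startswith(package_dir(user, package_name, home_root) + "/")
```

For the HBase example above, package_dir("yarn", "HBASE") gives /user/yarn/.slider/package/HBASE, so application.def should name a file under that directory.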

Create Application Specifications

Configuring a Slider application consists of two parts: the Resource Specification and the Application Configuration.

Note: Sample Resource Specification (resources-default.json) and Application Configuration (appConfig-default.json) files are included in the application package zip files.

Resource Specification

Slider needs to know what components (and how many components) to deploy.

As Slider creates each instance of a component in its own YARN container, it also needs to know what to ask YARN for in terms of memory and CPU for those containers.

All this information goes into the Resource Specification file ("Resource Spec"), named resources.json. The Resource Spec tells Slider how many instances of each component in the application (such as an HBase RegionServer) to deploy, and which resources to request from YARN for them.

An application package should contain a default resources.json that you can start from. Alternatively, you can create one based on the Resource Specification.

Store the Resource Spec file on your local disk (e.g. /tmp/resources.json).
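To show the general shape of a Resource Spec, the Python 3 sketch below builds a minimal resources.json with one component. The schema string and the yarn.* keys follow the sample files shipped with Slider application packages as far as we know. The component name HBASE_MASTER and the numbers are placeholders, and make_resources is a hypothetical helper. Your package's resources-default.json remains the authoritative starting point.

```python
import json


def make_resources(components):
    """Build a resources.json-style dict.

    components maps a component name to (priority, instances, memory_mb).
    Values are written as strings, as in the sample files.
    """
    spec = {
        "schema": "http://example.org/specification/v2.0.0",
        "metadata": {},
        "global": {},
        "components": {"slider-appmaster": {}},
    }
    for name, (priority, instances, memory_mb) in components.items():
        spec["components"][name] = {
            "yarn.role.priority": str(priority),
            "yarn.component.instances": str(instances),
            "yarn.memory": str(memory_mb),
        }
    return spec


if __name__ == "__main__":
    # Placeholder component; redirect the output to e.g. /tmp/resources.json.
    print(json.dumps(make_resources({"HBASE_MASTER": (1, 1, 256)}), indent=2))
```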

Application Configuration

Alongside the Resource Spec is the Application Configuration file ("App Config"), which contains parameters that are specific to the application rather than to YARN, for example the heap sizes of the JVMs. This configuration is applied on top of the default configuration provided by the application definition and then handed off to the associated component agent.

The App Config defines the configuration details specific to the application and its component instances. An application package should contain a default App Config (appConfig-default.json) that you can start from.

Ensure the following property is set correctly:

  • "application.def": "application_package.zip" (path to uploaded application package)
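As a sketch of the step above, this Python 3 example loads the package's sample App Config, points application.def at the uploaded package, and saves the result as appConfig.json. It assumes application.def lives in the "global" section, as in the sample appConfig-default.json files. The file names and the helper names are placeholders.

```python
import copy
import json


def set_application_def(app_config, package_path):
    """Return a copy of an App Config dict with global.application.def updated."""
    updated = copy.deepcopy(app_config)
    updated.setdefault("global", {})["application.def"] = package_path
    return updated


def rewrite_app_config(src, dest, package_path):
    """Read src (e.g. appConfig-default.json), update application.def, write dest."""
    with open(src) as f:
        config = json.load(f)
    with open(dest, "w") as f:
        json.dump(set_application_def(config, package_path), f, indent=2)
```

The original dict is left untouched, so you can derive several App Configs from one sample file.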

Start the Application

Once the steps above are completed, the application can be started through the Slider Command Line Interface (CLI).

Change to the "bin" directory under the Slider installation:

cd ${slider-install-dir}/slider-0.80.0-incubating/bin

Execute the following command:

./slider create cl1 --template appConfig.json --resources resources.json

Verify the Application

The successful launch of the application can be verified via the YARN Resource Manager Web UI. In most installations, this UI is accessible in a web browser at port 8088 of the Resource Manager host.

Details of the running application are available via the "ApplicationMaster" link in the far-right column of the row for that application (usually the top row).

Obtain client config

An application publishes several useful details that can be used to manage and communicate with the application instance.

You can use registry commands to get all published data.

All published data

All published data is available at the /ws/v1/slider/publisher endpoint of the AppMaster (e.g. http://c6403.ambari.apache.org:34701/ws/v1/slider/publisher). The base address can be obtained through slider status <app name>, where the field info.am.web.url specifies it. The URL is also advertised in the AppMaster tracking UI.

Client configuration

Client configurations are at /ws/v1/slider/publisher/slider/<config name>, where config name can be:

  • site config file name without extension, such as hbase-site (e.g. http://c6403.ambari.apache.org:34701/ws/v1/slider/publisher/slider/hbase-site) where the output is json formatted name-value pairs
  • or, with extension such as hbase-site.xml (http://c6403.ambari.apache.org:34701/ws/v1/slider/publisher/slider/hbase-site.xml) to get an XML formatted output that can be consumed by the clients
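You can also read the publisher endpoints programmatically. This Python 3 sketch uses only the standard library. Here base_url is the AppMaster web address (the info.am.web.url value), and the same path pattern also serves the logfolders and quicklinks documents. The helper names are hypothetical, and the sketch does not handle authentication, so it assumes a non-secure cluster.

```python
import json
import urllib.request

PUBLISHER_PATH = "/ws/v1/slider/publisher/slider/"


def publisher_url(base_url, config_name):
    """URL for a published config: 'hbase-site' gives JSON, 'hbase-site.xml' gives XML."""
    return base_url.rstrip("/") + PUBLISHER_PATH + config_name


def fetch_published_json(base_url, config_name, timeout=10):
    """Fetch a JSON-formatted published document and return it as a dict."""
    url = publisher_url(base_url, config_name)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, fetch_published_json("http://c6403.ambari.apache.org:34701", "hbase-site") would request the JSON form of hbase-site shown in the first bullet above.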

Log locations

The log locations for the various containers in the application instance are published at /ws/v1/slider/publisher/slider/logfolders, for example:

{

  "description": "LogFolders",
  "entries": {
    "c6403.ambari.apache.org-container_1404783501621_0002_01_000002-AGENT_LOG_ROOT": "/hadoop/yarn/log/application_1404783501621_0002/container_1404783501621_0002_01_000002",
    "c6403.ambari.apache.org-container_1404783501621_0002_01_000003-AGENT_LOG_ROOT": "/hadoop/yarn/log/application_1404783501621_0002/container_1404783501621_0002_01_000003",
    "c6403.ambari.apache.org-container_1404783501621_0002_01_000003-AGENT_WORK_ROOT": "/hadoop/yarn/local/usercache/yarn/appcache/application_1404783501621_0002/container_1404783501621_0002_01_000003",
    "c6403.ambari.apache.org-container_1404783501621_0002_01_000002-AGENT_WORK_ROOT": "/hadoop/yarn/local/usercache/yarn/appcache/application_1404783501621_0002/container_1404783501621_0002_01_000002"
  },
  "updated": 0,
  "empty": false
}
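Each key in the logfolders document combines the host, the container ID, and the folder type (AGENT_LOG_ROOT or AGENT_WORK_ROOT). This Python 3 sketch groups the entries by container. It assumes the key layout <host>-<container id>-<type> seen in the example above. Splitting from the right keeps host names that contain hyphens intact, because the container ID and type in the example contain none. The function name is hypothetical.

```python
import json
from collections import defaultdict


def log_folders_by_container(logfolders_json):
    """Map container ID -> {'host': ..., 'AGENT_LOG_ROOT': ..., 'AGENT_WORK_ROOT': ...}."""
    doc = json.loads(logfolders_json)
    result = defaultdict(dict)
    for key, path in doc.get("entries", {}).items():
        # Split from the right: host names may contain '-', the other two parts do not.
        host, container, kind = key.rsplit("-", 2)
        result[container]["host"] = host
        result[container][kind] = path
    return dict(result)
```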

Quick links

An application may publish quick links for its monitoring UI, JMX endpoint, etc., because the listening ports are dynamically allocated and therefore need to be published. This information is available at /ws/v1/slider/publisher/slider/quicklinks.

Manage the Application Lifecycle

Once started, applications can be stopped, (re)started, destroyed, and flexed as follows:

Stop:

./slider stop cl1

Start:

./slider start cl1

Destroy:

./slider destroy cl1

Flex (change the number of instances of a component):

./slider flex cl1 --component worker 5
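When a script drives these commands, a thin wrapper keeps the argument order in one place. This Python 3 sketch assumes it runs from the Slider bin directory (so the launcher is ./slider) and uses only the actions and options shown above. The dry_run flag and the helper names are hypothetical; a dry run returns the command instead of executing it.

```python
import subprocess

SLIDER = "./slider"  # assumed: run from ${slider-install-dir}/slider-0.80.0-incubating/bin


def slider_command(action, app_name, component=None, count=None):
    """Build the argv for the stop/start/destroy/flex commands shown above."""
    if action not in ("stop", "start", "destroy", "flex"):
        raise ValueError(f"unsupported action: {action}")
    argv = [SLIDER, action, app_name]
    if action == "flex":
        if component is None or count is None:
            raise ValueError("flex needs a component and an instance count")
        argv += ["--component", component, str(count)]
    return argv


def run_slider(action, app_name, dry_run=False, **kwargs):
    """Run the command (raising on a non-zero exit), or return it when dry_run is set."""
    argv = slider_command(action, app_name, **kwargs)
    if dry_run:
        return argv
    return subprocess.run(argv, check=True).returncode
```

For example, run_slider("flex", "cl1", component="worker", count=5) runs the same flex command as above.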

Appendix A: Debugging Slider-Agent

Create and deploy Slider Agent configuration

Create an agent config file (agent.ini) based on the sample available at:

${slider-install-dir}/slider-0.80.0-incubating/agent/conf/agent.ini

The sample agent.ini file can be used as is (see below). Some of the parameters of interest are:

  • log_level: INFO or DEBUG, to control the verbosity of the log

  • app_log_dir: the relative location of the application log files

  • log_dir: the relative location of the agent and command log files
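To turn on verbose agent logging before uploading agent.ini, you can edit the sample file with Python 3's configparser. This sketch does not assume which section holds log_level; it updates whichever section(s) already define it. The file paths and the function name are placeholders. configparser does not preserve comments, so the rewritten file loses the sample's comments.

```python
import configparser


def set_agent_log_level(src, dest, level="DEBUG"):
    """Copy agent.ini from src to dest with log_level set to level.

    Updates every section that already defines log_level and returns their names.
    """
    if level not in ("INFO", "DEBUG"):
        raise ValueError("log_level is documented as INFO or DEBUG")
    cfg = configparser.ConfigParser(interpolation=None, strict=False)
    cfg.optionxform = str  # keep key case exactly as written
    with open(src) as f:
        cfg.read_file(f)
    updated = [s for s in cfg.sections() if cfg.has_option(s, "log_level")]
    if not updated:
        raise KeyError(f"no log_level setting found in {src}")
    for section in updated:
        cfg.set(section, "log_level", level)
    with open(dest, "w") as f:
        cfg.write(f)
    return updated
```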

Once created, deploy the agent.ini file to HDFS (it can be deployed to any location accessible to the application instance):

su yarn

hdfs dfs -mkdir -p /user/yarn/agent/conf

hdfs dfs -copyFromLocal agent.ini /user/yarn/agent/conf/

Modify the --template JSON file (appConfig.json) to add the location of the agent.ini file:

"agent.conf": "/user/yarn/agent/conf/agent.ini"