Apache Slider: YARN-hosted applications

NAME

slider - YARN-hosted applications

SYNOPSIS

Slider enables applications to be dynamically created in a YARN-managed datacenter. The program can be used to create, stop, and restart application instances. It can also be used to list currently active instances as well as existing but not running ("stopped") application instances.

CONCEPTS

  1. A Slider application is an application packaged to be deployed by Slider. It consists of one or more distributed components.

  2. A Slider application instance is a Slider application configured to be deployable on a specific YARN cluster, with a specific configuration. An instance can be live (actually running) or stopped. When stopped, all its configuration details and instance-specific data are preserved on HDFS.

  3. An instance directory is a directory created in HDFS describing the application instance; it records the configuration: user-specified values, application defaults, and any values dynamically created by Slider.

  4. A user can create an application instance.

  5. A live instance can be stopped, saving its final state to its instance directory. All running components are shut down.

  6. A stopped instance can be started: its components are started on or near the servers where they were previously running.

  7. A stopped instance can be destroyed.

  8. Running instances can be listed.

  9. An instance consists of a set of components.

  10. The supported component types depend upon the Slider application.

  11. The count of each component must initially be specified when an application instance is created.

  12. Users can flex an application instance: adding or removing components dynamically. If the application instance is live, the changes will have immediate effect. If not, the changes will be picked up when the instance is next started.

Invoking Slider

slider <ACTION> [<name>] [<OPTIONS>]

COMMON COMMAND-LINE OPTIONS

--conf configuration.xml

Configure the Slider client. This allows the filesystem, ZooKeeper instance and other properties to be picked up from the configuration file, rather than from the command line.

Important: this configuration file is not propagated to the application. It is purely for configuring the client itself.

-D name=value

Define a Hadoop configuration option which overrides any options in the client configuration XML files.

-m, --manager url

URL of the YARN resource manager

--fs filesystem-uri

Use the specified filesystem URI as an argument to the operation.

Instance Naming

Application instance names must:

  1. be at least one character long
  2. begin with a lower case letter
  3. contain only characters in the range [a-z,0-9,_]

Upper case characters are converted to lower case.

Example valid names:

slider1
storm4
hbase_instance
accumulo_m1_tserve4
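The naming rules can be captured in a short validator, for example to check names in a deployment script before calling slider. This is a minimal sketch, assuming the lower-case conversion happens before the other rules are checked; normalize_instance_name is a hypothetical helper, not part of Slider.

```python
import re

# Hypothetical helper, not part of Slider: one reading of the naming rules.
# Upper case is converted to lower case first, then the rules are checked.
_NAME_RE = re.compile(r"[a-z][a-z0-9_]*")


def normalize_instance_name(name: str) -> str:
    """Return the normalized (lower-cased) name or raise ValueError."""
    lowered = name.lower()
    # fullmatch enforces: non-empty, a leading lower case letter, then only [a-z0-9_]
    if not _NAME_RE.fullmatch(lowered):
        raise ValueError(f"invalid application instance name: {name!r}")
    return lowered


if __name__ == "__main__":
    for candidate in ["slider1", "storm4", "hbase_instance", "accumulo_m1_tserve4", "HBase2"]:
        print(candidate, "->", normalize_instance_name(candidate))
```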

Actions

COMMANDS

slider build <name>

Build an instance of the given name, with the specified options.

It is not started; this can be done later with a start command.

slider create <name>

Build and run an application instance of the given name.

Arguments for build and create

--package <uri-to-package>

This defines the Slider application package to be deployed.

--option <name> <value>

Set an application instance option.

Example:

Set an option to be passed into the -site.xml file of the target system, setting the HDFS block size to 128m:

--option site.dfs.blocksize 128m

Increase the number of YARN containers which must fail before the Slider application instance itself fails:

-O slider.container.failure.threshold 16

--appconf dfspath

A URI path to the configuration directory containing the template application specification. The path must be on a filesystem visible to all nodes in the YARN cluster.

  1. Only one configuration directory can be specified.
  2. The contents of the directory will only be read when the application instance is created or built.

Example:

--appconf hdfs://namenode/users/slider/conf/hbase-template
--appconf file:///users/accumulo/conf/template

--apphome localpath

A path to the home directory of a pre-installed application. If set when a Slider application instance is created, the instance will run with the binaries pre-installed on the nodes at this location.

Important: this is a path in the local filesystem which must be present on all hosts in the YARN cluster.

Example

--apphome /usr/hadoop/hbase

--template <filename>

Filename for the template application instance configuration. This is merged with (and can override) the built-in configuration options, and can in turn be overridden by any command-line --option and --compopt arguments to generate the final application configuration.

--resources <filename>

Filename for the resources configuration. This is merged with (and can override) the built-in resource options, and can in turn be overridden by any command-line --resopt, --rescompopt and --component arguments to generate the final resource configuration.
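The precedence described for --template and --resources (built-in values first, then the file, then command-line options) amounts to a layered merge. The sketch below assumes the configurations are JSON-like nested dictionaries; merge_layers is a hypothetical helper, the layer contents and the "components" layout are illustrative only, and Slider's own merge logic may differ in detail.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_layers(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Merge configuration layers; values in later layers override earlier ones.

    Nested dictionaries (for example per-component sections) are merged key by
    key, so a later layer only replaces the keys it actually sets.
    """
    result: Dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = merge_layers(result[key], value)
            else:
                result[key] = deepcopy(value)  # copy so the caller's inputs stay untouched
    return result


if __name__ == "__main__":
    # Hypothetical layers: built-in defaults, a --resources file, then --component/--rco.
    built_in = {"components": {"worker": {"component.instances": "1", "yarn.memory": "256"}}}
    resources_file = {"components": {"worker": {"yarn.memory": "1024"}}}
    command_line = {"components": {"worker": {"component.instances": "16"}}}
    print(merge_layers(built_in, resources_file, command_line))
```

Here the file raises the worker memory and the command line raises the instance count, while each layer leaves the other keys alone.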

--image path

The full path in HDFS to a .tar or .tar.gz file containing the binaries needed to run the target application.

Example

--image hdfs://namenode/shared/binaries/hbase-0.96.tar.gz

--component <name> <count>

The desired number of instances of a component.

Example

--component worker 16

This just sets the component.instances value of the named component's resource configuration. It is exactly equivalent to:

--rco worker component.instances 16

--compopt <component> <option> <value>

Provide a specific application configuration option for a component.

Example

--compopt master env.TIMEOUT 10000

These options are saved into the app_conf.json file; they are not used to configure the YARN resource allocations, which must use the --rco parameter.

Resource Component option --rescompopt --rco

--rescompopt <component> <option> <value>

Set any role-specific option, such as its YARN memory requirements.

Example

--rco master yarn.memory 2048
--rco worker yarn.memory max

--zkhosts host1:port1,[host2:port2,host3:port3, ...]

The ZooKeeper quorum.

Example

--zkhosts zk1:2181,zk2:2181,zk3:4096

If unset, the ZooKeeper quorum defined in the property slider.zookeeper.quorum is used.

--zkpath <zookeeper-path>

A path in the ZooKeeper cluster to create for an application. This is useful for applications which require a path to be created in advance of their deployment. When the application instance is destroyed, this path will be deleted.

--queue <queue name>

The queue to deploy the application to. By default, YARN will pick the queue.

Example

--queue applications

Arguments purely for the create operation

--wait <milliseconds>

The --wait parameter, if provided, specifies the time in milliseconds to wait until the YARN application is actually running. Even after the YARN application has started, there may be some delay for the instance to start up.


--out <filename>

The name of a file to save a YARN application report to as a JSON file. This file will contain the YARN application ID and other information about the submitted application.

Examples

Create an application by providing template and resources.

slider create hbase1 --template /usr/work/hbase/appConfig.json --resources /usr/work/hbase/resources.json

Create an application by providing template and resources and queue.

slider create hbase1 --template /usr/work/hbase/appConfig.json --resources /usr/work/hbase/resources.json --queue default

destroy <name>

Destroy a (stopped) application instance.

Important: because this deletes all persistent data, invoking the command without --force does not destroy the application. Instead it prints a warning and asks the owner to re-invoke the same command with the --force option, if they are sure they know what they are doing.

Example

slider destroy --force instance1

exists <name> [--live] [--state status]

Probe the existence of the named Slider application instance. If the --live flag is set, the instance must actually be running; if it is not, an error code is returned.

When the --live flag is unset, the command looks for the application instance to be defined in the filesystem; its operational state is not checked.

It will "succeed" if the definition files of the named application instance are found.

Example:

slider exists instance4

Return codes

 0 : application instance is running
-1 : application instance exists but is not running
69 : application instance is unknown
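Scripts that call slider exists usually branch on the return code. The sketch below maps the codes listed above to short descriptions; describe_exists_code and probe_instance are hypothetical helpers, and probe_instance assumes the slider launcher is on the PATH. Because POSIX exit statuses are 8-bit, a process that exits with -1 is normally reported to its caller as 255, so the sketch accepts both values.

```python
import subprocess


def describe_exists_code(code: int) -> str:
    """Translate a `slider exists` return code into a short description."""
    if code == 0:
        return "running"
    if code in (-1, 255):  # -1 is usually observed as 255 by the calling process
        return "exists but not running"
    if code == 69:
        return "unknown instance"
    return f"unexpected return code {code}"


def probe_instance(name: str, live: bool = False) -> str:
    """Run `slider exists <name> [--live]` and describe the outcome."""
    cmd = ["slider", "exists", name]
    if live:
        cmd.append("--live")
    completed = subprocess.run(cmd, capture_output=True, text=True)
    return describe_exists_code(completed.returncode)


if __name__ == "__main__":
    print(probe_instance("instance4", live=True))
```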

Live Tests

When the --live flag is set, the application instance must be running or about to run for the probe to succeed. That is, the application must either be running (RUNNING) or be in one of the states from which an application can start running: NEW, NEW_SAVING, SUBMITTED or ACCEPTED.

An application instance that is FINISHED or FAILED or KILLED is not considered to be live.

Note that the probe does not check the liveness of the deployed application itself, merely that the application instance has been deployed.

Return codes

 0 : application instance is running
-1 : application instance exists but is not running
69 : application instance is unknown

Example:

slider exists instance4 --live

When the --state flag is set, a specific YARN application state is checked for.

The allowed YARN states are:

  NEW: Application which was just created.
  NEW_SAVING: Application which is being saved.
  SUBMITTED: Application which has been submitted.
  ACCEPTED: Application which has been accepted by the scheduler.
  RUNNING: Application which is currently running.
  FINISHED: Application which finished successfully.
  FAILED: Application which failed.
  KILLED: Application which was terminated by a user or admin.
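The state list and the --live rule fit together as a set membership test. The sketch below uses a local Python enum as a stand-in for the YARN states listed above (it is not a YARN or Slider API); parse_state and is_live are hypothetical helpers, and the case-insensitive parsing is an assumption made for convenience.

```python
from enum import Enum


class YarnState(Enum):
    """Local stand-in for the YARN application states listed above."""
    NEW = "NEW"
    NEW_SAVING = "NEW_SAVING"
    SUBMITTED = "SUBMITTED"
    ACCEPTED = "ACCEPTED"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    FAILED = "FAILED"
    KILLED = "KILLED"


# States the --live probe counts as live: running, or still able to start running.
LIVE_STATES = frozenset({
    YarnState.NEW, YarnState.NEW_SAVING, YarnState.SUBMITTED,
    YarnState.ACCEPTED, YarnState.RUNNING,
})


def parse_state(text: str) -> YarnState:
    """Parse a --state argument case-insensitively; raises ValueError if unknown."""
    return YarnState(text.strip().upper())


def is_live(state: YarnState) -> bool:
    """True if the state is one the --live probe would accept."""
    return state in LIVE_STATES
```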

Example:

slider exists instance4 --state ACCEPTED

Return codes

 0 : application instance is running
-1 : application instance exists but is not in the desired state
69 : application instance is unknown

flex <name> [--component component count]*

Flex the number of instances of each named component in an application instance to the new value. If the new count is greater than before, new instances of the component are requested. If it is less, surplus component instances are destroyed.

This operation returns 0 if the change was submitted. Changes are not immediate and depend on the availability of resources in the YARN cluster.

It returns -1 if there is no running application instance.

Example

slider flex instance1 --component worker 8 --filesystem hdfs://host:port
slider flex instance1 --component master 2 --filesystem hdfs://host:port
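Because the synopsis marks --component as repeatable, a script may be able to flex several components in one call instead of the two separate calls above. The sketch below only builds the argument list; flex_command is a hypothetical helper, and passing several --component pairs in a single invocation is an assumption based on the `*` in the synopsis.

```python
import shlex
from typing import Dict, List


def flex_command(name: str, counts: Dict[str, int]) -> List[str]:
    """Build a `slider flex` argument list with one --component pair per entry."""
    cmd = ["slider", "flex", name]
    for component, count in counts.items():
        if count < 0:
            raise ValueError(f"negative count for component {component!r}: {count}")
        cmd += ["--component", component, str(count)]
    return cmd


if __name__ == "__main__":
    argv = flex_command("instance1", {"worker": 8, "master": 2})
    print(shlex.join(argv))  # the list could also be passed to subprocess.run
```

Building a list rather than a single string avoids shell quoting problems when the result is handed to a process launcher.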

install-package --name <name of the package> --package <package file> [--replacepkg]

Install the application package to the default package location for the user under ~/.slider/package/. This is the location referred to by the appConfig.json file provided in the --template parameter in the create command.

--name <name of the package>

Name of the package. It may be the same as the name provided in the metainfo.xml. Ensure that the same value is used in the default application package location specified in the default appConfig.json file.

--package <package file>

Location of the package on local disk.

--replacepkg

Optional. Whether to overwrite an already installed package.

Example

slider install-package --name HBASE --package /usr/work/package/hbase/slider-hbase-app-package-0.98.4-hadoop2.zip
slider install-package --name HBASE --package /usr/work/package/hbase/slider-hbase-app-package-0.98.4-hadoop2.zip --replacepkg

kdiag [--keytab <keytab> --principal <principal>] [--out <outfile>] [--keylength <length>] [--secure]

Kerberos diagnostics.

The command collects any information which can help diagnose Kerberos problems: dumping settings, attempting to log in from a given keytab, and so on.

The aim is to produce output which helps begin to understand why the client is having problems talking to Kerberos, in a file which can be attached to support calls.

For an example of the output, see SLIDER-1027.

Arguments

  • --keytab <keytab> --principal <principal> : name a keytab file to use and the principal to log in as. The file must contain the specific principal.

  • --keylength <length>: set the minimum encryption key length, measured in bits. If the JVM does not support this length, the command will fail. The default value is 256, as needed for the AES256 encryption scheme. A JVM without the Java Cryptography Extensions installed does not support a key length of 256 bits: Kerberos will fail unless configured to use an encryption scheme with a shorter key length.

  • --secure: fail if the command is not executed on a secure cluster, that is, if the Hadoop authentication mechanism of the cluster is "simple".

Although there is a --out outfile option, much of the output can come from the JRE (to stderr) and via log4j (to stdout). To get all the output, it is best to redirect both these output streams to the same file, and omit the --out option.

slider kdiag --keytab zk.service.keytab --principal zookeeper/devix.example.org@REALM > out.txt 2>&1
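The same stream merging can be done when kdiag is run from a script. This sketch sends the child's stdout and stderr to one file, which has the effect of `> out.txt 2>&1` above; run_to_file is a hypothetical helper, and the example invocation assumes slider is on the PATH.

```python
import subprocess
from typing import List


def run_to_file(cmd: List[str], outfile: str) -> int:
    """Run cmd with stdout and stderr both written to outfile; return the exit code."""
    with open(outfile, "w") as out:
        # stderr=STDOUT merges the error stream into the same file handle.
        completed = subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT)
    return completed.returncode


if __name__ == "__main__":
    code = run_to_file(
        ["slider", "kdiag", "--keytab", "zk.service.keytab",
         "--principal", "zookeeper/devix.example.org@REALM"],
        "out.txt",
    )
    print("kdiag exit code:", code)
```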

For extra logging during the operation:

  1. Set the environment variable HADOOP_JAAS_DEBUG to true.

    export HADOOP_JAAS_DEBUG=true
    
  2. Edit the log4j.properties file for the slider client:

    log4j.logger.org.apache.hadoop.security=DEBUG
    

The diagnostic information currently generated is incomplete. Contributions to this code are very welcome.

list [name] [--live] [--state state]

List Slider application instances visible to the user. This includes instances which are defined on the filesystem but not running.

If no instance name is specified, all instances matching the criteria are listed.

  1. --live indicates live instances are to be listed: that is, anything RUNNING or awaiting execution (e.g. ACCEPTED) or earlier.
  2. --state <state> defines an explicit state for which a record of the cluster must be found in the RM.

The default is to list all application instances, running or not.

If an instance name is given, then that instance must exist in the filesystem or the operation will fail with the unknown cluster exit code.

If the instance exists but is not live (when --live is set), or is not in the state specified by a --state argument, the operation returns -1.

Example

slider list
slider list instance1
slider list --live
slider list instance1 --live
slider list --state FINISHED
slider list --state KILLED
slider list --state FAILED
slider list instance1 --state FAILED

Important: listings which search for completed instances may succeed while an instance of the same name is running. This is because the operation lists YARN records, and records of completed applications are retained for some time.

That is, if an instance is started and then stopped, and then a new instance is started, the following two operations may both succeed:

slider list instance1 --live
slider list --state FINISHED

registry (--list | --listconf | --getconf <conf>) [--name <name>] [--servicetype <servicetype>] [--user <user>] [--internal] [--out <filename>] [--verbose]

List registered application instances visible to the user. This differs slightly from the slider list command in that it does not use the YARN application list. Instead it communicates with ZooKeeper, and works with any application which has registered itself with the "service registry".

The --name <name> option names the registry entry to work with. For Slider applications, this is the application instance name.

The --user <user> option names the user who owns or deployed the service. It defaults to the current user.

The --servicetype <servicetype> option allows a different service type to be chosen. The default is org-apache-slider.

The --verbose flag triggers more verbose output from the operations.

The --internal flag indicates the configurations to be listed and retrieved are from the "internal" list of configuration data provided for use within a deployed application.

There are two common failure cases; the exact exit code values are documented in Exit Codes:

  1. If there is no matching service then the operation fails with the EXIT_NOT_FOUND code (77).
  2. If there are no configurations in a listing, or the named configuration is not found, the command returns the exit code EXIT_NOT_FOUND (77).

Operations:

slider registry --list [--servicetype <servicetype>] [--name <name>] [--verbose] [--user <user>] [--out <filename>]

List all services of the service type and, optionally, of the given name.

If --out specified a file, the output is listed to the file, one entry per line. An empty file means that no entries were found.
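A script can combine the --out file format with the EXIT_NOT_FOUND code. This is a minimal sketch: read_listing and list_services are hypothetical helpers, list_services assumes slider is on the PATH, and treating exit code 77 as an empty result (rather than an error) is a design choice of the sketch.

```python
import subprocess
from pathlib import Path
from typing import List

EXIT_NOT_FOUND = 77  # exit code documented above for "no matching service"


def read_listing(path: str) -> List[str]:
    """Parse a file written by --out: one entry per line; an empty file means no entries."""
    text = Path(path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def list_services(out_file: str, service_type: str = "org-apache-slider") -> List[str]:
    """Run `slider registry --list` and return the listed entries."""
    completed = subprocess.run(
        ["slider", "registry", "--list", "--servicetype", service_type, "--out", out_file]
    )
    if completed.returncode == EXIT_NOT_FOUND:
        return []  # no matching service
    if completed.returncode != 0:
        raise RuntimeError(f"slider registry failed with exit code {completed.returncode}")
    return read_listing(out_file)
```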

slider registry --listconf [--name <name>] [--internal] [--servicetype <servicetype>] [--user <user>] [--out <filename>]

List the configurations exported by a named application.

If --out specified a file, the output is listed to the file, one entry per line. An empty file means that no entries were found.

slider registry --getconf <configuration> [--format (xml|json|properties)] [--servicetype <servicetype>] [--name <name>] [--dest <path>] [--internal] [--user <user>]

Get a named configuration in a chosen format. Default: XML

--dest <path>: the filename or directory to save the configuration to.

--format (xml|json|properties): the output format.

If the --dest argument is set and refers to a directory, the file is saved in that directory, with the filename derived from the configuration name requested:

Example:

slider registry --getconf hbase-site.xml --name hbase1 --dest confdir

If confdir exists, this downloads the hbase site configuration to confdir/hbase-site.xml.

If the destination path refers to a file (or does not exist), the specified path is used for the file:

slider registry --getconf hbase-site.xml --name hbase1 --dest configfile.xml

This will download the configuration to the file configfile.xml.
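The rule for where --getconf writes its output depends only on whether --dest names an existing directory. The sketch below expresses that rule; getconf_destination is a hypothetical helper, and it assumes the derived filename is exactly the requested configuration name, as in the hbase-site.xml example.

```python
import os


def getconf_destination(conf_name: str, dest: str) -> str:
    """Return the file that --getconf <conf_name> --dest <dest> is described as writing."""
    if os.path.isdir(dest):
        # Existing directory: the file name is derived from the configuration name.
        return os.path.join(dest, conf_name)
    # A file, or a path that does not exist yet: the path is used as given.
    return dest
```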

slider resolve --path <path> [--out <filename>] [--list] [--destdir <dir>]

This command resolves the service record under a path in the registry, or lists all service records directly under a path.

The result can be printed to the console (default) or saved to the filesystem; the means of specifying the destination varies depending on whether a single record or a listing is requested.

Resolve a single entry: slider resolve --path <path> [--out <file>]

The basic slider resolve --path <path> command, without the --list option, attempts to resolve the service record at the given path. The record may be saved to the file specified with the --out option.

Example: resolve and print the record at /users/hbase/services/org-apache-hbase/instance1

slider resolve --path /users/hbase/services/org-apache-hbase/instance1

Example: resolve the record at /users/hbase/services/org-apache-hbase/instance1 and save it to a file

slider resolve --path /users/hbase/services/org-apache-hbase/instance1 --out hbase.json

If the specified path is not in the registry, or the path exists but there is no service record there, the return code is EXIT_NOT_FOUND, 77.

List all entries and services under a path: slider resolve --path <path> --list

The slider resolve --path <path> --list command lists all service records directly under a path.

All entries are listed to the console, followed by the individual service records of those entries that contain a service record declaration.

The service records can be saved to a directory, one JSON file per entry. The --destdir option enables this saving of the entries and identifies the destination directory for them. Each entry is saved with the entry name suffixed by .json.

It is an error if the path does not exist; the exit code will be EXIT_NOT_FOUND, 77.

It is not an error if the path does exist but there are no records underneath it.

Example: list services under /users/hbase/services/org-apache-hbase/

slider resolve --path /users/hbase/services/org-apache-hbase/ --list

This will list all services deployed under this path. If a service "hbase-1" had been deployed, it would be printed.

Example: list services under /users/hbase/services/org-apache-hbase/ and save the results

slider resolve --path /users/hbase/services/org-apache-hbase/ --list --destdir services

This will create a directory services and save service records to it. If a service "hbase-1" was registered under this path, its service record would be saved to the file services/hbase-1.json.
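The --destdir layout (one file per entry, named after the entry with a .json suffix) can be mimicked when post-processing records in a script. This is a sketch of the layout only, not of how Slider writes the files: save_records is a hypothetical helper, and the record contents in the example are placeholders rather than the real service record schema.

```python
import json
import os
from typing import Any, Dict


def save_records(destdir: str, records: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Write each record to <destdir>/<entry>.json; return a map of entry -> file path."""
    os.makedirs(destdir, exist_ok=True)
    paths = {}
    for entry, record in records.items():
        path = os.path.join(destdir, entry + ".json")
        with open(path, "w") as f:
            json.dump(record, f, indent=2)
        paths[entry] = path
    return paths


if __name__ == "__main__":
    # Placeholder record contents; real service records have their own schema.
    print(save_records("services", {"hbase-1": {"description": "example"}}))
```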

The current user's base path can be referred to via the "~" prefix:

slider resolve --path ~/services/ --list
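The "~" shorthand can be reproduced in scripts that build registry paths themselves. This sketch assumes the user's base path is /users/<user>, matching the /users/hbase/... examples above, and it skips any user-name encoding the real registry may apply; expand_registry_path is a hypothetical helper.

```python
import getpass
from typing import Optional


def expand_registry_path(path: str, user: Optional[str] = None) -> str:
    """Expand a leading "~" to /users/<user> (assumed base path, as in the examples)."""
    if user is None:
        user = getpass.getuser()
    if path == "~" or path.startswith("~/"):
        return "/users/" + user + path[1:]
    return path  # absolute or other paths are left unchanged
```

For example, with user hbase, ~/services/ expands to /users/hbase/services/.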

This simplifies path creation and testing.

start <name> [--wait milliseconds] [--out <filename>]

Start a stopped application instance, recreating it from its previously saved state.

After the application is launched, if an --out argument specified a file, the "YARN application report" will be saved as a JSON document into the file specified.

Examples:

slider start instance2
slider start instance1 --wait 60000
slider start instance1 --out appreport.json

If the application instance is already running, this command will not affect it.

status <name> [--out <filename>]

Get the status of the named application instance in JSON format. The --out option can be used to save the JSON document to a file.

Examples:

slider status instance1 --manager host:port

slider status instance2 --manager host:port --out status.json

stop <name> [--force] [--wait time] [--message text]

Stop the application instance. The running application is stopped; its settings are retained in HDFS.

The --wait argument can specify a time in seconds to wait for the application instance to be stopped.

The --force flag causes YARN to be asked directly to terminate the application instance. The --message argument supplies an optional text message to be used in the request: this will appear in the application's diagnostics in the YARN RM UI.

If an unknown (or already stopped) application instance is named, no error is returned.

Examples

slider stop instance1 --wait 30
slider stop instance2 --force --message "maintenance session"

version

The command slider version prints out information about the compiled Slider application, the version of Apache Hadoop against which it was built, and the version of Hadoop that is currently on its classpath.

Note that this is the client-side Hadoop version, not the version running on the cluster, though that can be obtained through the status operation.

Commands for testing

These operations are here primarily for testing.

kill-container <name> --id container-id

Kill a YARN container belonging to the application. This is useful primarily for testing resilience to failures.

Container IDs can be determined from the application instance status JSON document.

am-suicide <name> [--exitcode code] [--message message] [--wait time]

This operation is purely for testing Slider Application Master restart; it triggers an asynchronous self-destruct operation in the AM, an operation that makes no attempt to cleanly shut down the process.

If the application has not exceeded its restart limit (as set by slider.yarn.restart.limit), YARN will attempt to restart the failed application.

Example

slider am-suicide instance1 --exitcode 1 --wait 5000 --message "test"

tokens [--source <file>] [--out file] [--keytab <keytab> --principal <principal>]

Lists current delegation tokens or, on a secure cluster, creates new ones.

This is useful for testing the delegation token mechanism offered by Oozie, in which Oozie collects the tokens needed by Slider, saves them to a file, then starts slider with the environment variable HADOOP_TOKEN_FILE_LOCATION set to the location of this file. To make this easier, the bash command to set the environment variable is printed.

For reference the tokens needed are:

  • An HDFS token.
  • A YARN client token to interact with the RM.
  • If the timeline server is enabled, a timeline server delegation token.

If the --keytab and --principal arguments are supplied, then the credentials will be generated with the named principal logged in from the specific keytab.
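The Oozie-style hand-off described above can be imitated in a test script: launch the client with HADOOP_TOKEN_FILE_LOCATION pointing at a saved token file. This is a minimal sketch; env_with_token_file and run_with_tokens are hypothetical helpers, and it assumes the token file was produced beforehand (for example by slider tokens with --out) and that slider is on the PATH.

```python
import os
import subprocess
from typing import Dict, List, Optional


def env_with_token_file(token_file: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the base environment (default: os.environ) and point it at the token file."""
    env = dict(os.environ if base is None else base)
    env["HADOOP_TOKEN_FILE_LOCATION"] = os.path.abspath(token_file)
    return env


def run_with_tokens(cmd: List[str], token_file: str) -> int:
    """Run cmd (e.g. ["slider", "list"]) with the token file location set."""
    return subprocess.run(cmd, env=env_with_token_file(token_file)).returncode
```

Copying the environment rather than modifying os.environ keeps the token setting scoped to the child process.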