Apache Slider Core Configuration Specification, version 2.0¶
Terminology¶
Application A single application, such as an HBase cluster. An application is distributed across the YARN cluster.
Component A single executable part of the larger application. An application may have multiple components, and multiple instances of each component.
YARN Yet Another Resource Negotiator
YARN Resource Requirements The requirements for a YARN resource request. Currently this consists of RAM and CPU requirements.
YARN Container An allocated portion of a server's resources granted to satisfy the requested YARN resource requirements. A process can be deployed to a container.
resources.json
: A file that describes the
size of the application in terms of its component requirements: how many,
and what their resource requirements are.
application.json
: A file that describes the
deployment parameters of the application: the values used when creating
the application instance and the instances of its individual components.
internal.json
: A file which contains Slider's internal configuration
parameters.
Structure¶
Configurations are stored in well-formed JSON files.
1. Text MUST be saved in the UTF-8 format.
2. Duplicate entries MUST NOT occur in any section.
3. The ordering of elements is NOT significant.
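The first two rules can be checked mechanically when a file is read. The following is a minimal sketch using only the Python standard library, not Slider's own loader; the function names `reject_duplicates` and `load_specification` are hypothetical.

```python
import json


def reject_duplicates(pairs):
    """Build a dict from (key, value) pairs, failing on a repeated key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate entry: {key!r}")
        result[key] = value
    return result


def load_specification(raw_bytes):
    """Decode UTF-8 bytes and parse them as JSON, rejecting duplicate keys."""
    text = raw_bytes.decode("utf-8")  # raises UnicodeDecodeError if not UTF-8
    # object_pairs_hook is called for every JSON object, so duplicates are
    # caught in every section, not just at the top level.
    return json.loads(text, object_pairs_hook=reject_duplicates)


spec = load_specification(b'{"global": {"g1": "a", "g2": "b"}, "components": {}}')
print(spec["global"])
```

Because the result is an ordinary dictionary, the ordering of elements plays no role in any later comparison.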
The JSON specification files all have a similar structure:
- A schema string indicating version. Currently this is temporarily set to "http://example.org/specification/v2.0.0".
- A global section, /global, containing string properties.
- A component section, /components.
- 0 or more sections under /components for each component, identified by component name, containing string properties.
- An optional section, /metadata, containing arbitrary metadata (such as a description, author, or any other information that is not parsed or processed directly).
- An optional section, /credentials, containing security information.
The simplest valid specification file is
    {
        "schema": "http://example.org/specification/v2.0.0",
        "global": { },
        "components": { }
    }
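The structural rules can be expressed as a small check over a parsed file. This is a sketch only: the function name `validate_structure` is hypothetical, and treating schema, /global and /components as mandatory is an assumption inferred from the minimal file above.

```python
def validate_structure(spec):
    """Return a list of problems found in a parsed specification (empty if none)."""
    problems = []
    if not isinstance(spec.get("schema"), str):
        problems.append("schema must be a string")
    # Assumption: /global and /components are required, as in the minimal file.
    for section in ("global", "components"):
        if not isinstance(spec.get(section), dict):
            problems.append(f"/{section} must be a JSON object")

    global_section = spec.get("global")
    if isinstance(global_section, dict):
        for key, value in global_section.items():
            if not isinstance(value, str):
                problems.append(f"/global/{key} must be a string")

    components = spec.get("components")
    if isinstance(components, dict):
        for name, props in components.items():
            if not isinstance(props, dict):
                problems.append(f"/components/{name} must be a JSON object")
                continue
            for key, value in props.items():
                if not isinstance(value, str):
                    problems.append(f"/components/{name}/{key} must be a string")
    return problems


minimal = {
    "schema": "http://example.org/specification/v2.0.0",
    "global": {},
    "components": {},
}
print(validate_structure(minimal))  # []
```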
Property inheritance model and resolution¶
There is a simple global to component inheritance model.
- Properties defined in /global define parameters across the entire application.
- Properties defined in a section under /components define parameters for a specific component in the application.
- All global properties are propagated to each component.
- A component section may override any global property.
- The final set of configuration properties for a component is the global properties extended and overridden by the component's own set.
- The process of expanding the properties is termed resolution; the resolved specification is the outcome.
- There is NO form of explicitly cross-referencing another attribute. This MAY be added in future.
- There is NO sharing of information between the different .json files in an application configuration.
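The resolution rules above amount to a dictionary merge per component. The following sketch illustrates the model and is not Slider's implementation; the function name `resolve` and the sample property `timeout` are hypothetical.

```python
import copy


def resolve(spec):
    """Return a resolved copy of a parsed specification.

    Each component receives every global property; values set in the
    component section take precedence over the global ones.
    """
    resolved = copy.deepcopy(spec)
    global_props = resolved.get("global", {})
    resolved["components"] = {
        name: {**global_props, **props}  # later entries (the component's) win
        for name, props in resolved.get("components", {}).items()
    }
    return resolved


spec = {
    "global": {"timeout": "30"},
    "components": {"master": {}, "worker": {"timeout": "60"}},
}
print(resolve(spec)["components"])
# {'master': {'timeout': '30'}, 'worker': {'timeout': '60'}}
```

The input is left unmodified, and the /global section is kept as-is in the output, so the function can be applied again to its own result.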
Example¶
Here is an example configuration
    {
        "schema": "http://example.org/specification/v2.0.0",
        "global": {
            "g1": "a",
            "g2": "b"
        },
        "components": {
            "simple": { },
            "master": {
                "name": "m",
                "g1": "overridden"
            },
            "worker": {
                "name": "w",
                "g1": "overridden-by-worker",
                "timeout": "1000"
            }
        }
    }
The /global section defines two properties:
g1="a" g2="b"
These are the values visible to any part of the application which is not itself one of the components.
There are three components defined: simple, master and worker.
component simple:¶
g1="a" g2="b"
No settings have been defined specifically for the component; the global settings are applied.
component master:¶
name="m", g1="overridden" g2="b"
A new attribute, name, has been defined with the value "m", and the global property g1 has been overridden with the new value, "overridden".
The global property g2 is passed down unchanged.
component worker:¶
name="w", g1="overridden-by-worker" g2="b" timeout="1000"
A new attribute, name, has been defined with the value "w", and another, timeout, with the value "1000".
The global property g1 has been overridden with the new value, "overridden-by-worker".
The global property g2 is passed down unchanged.
This example shows some key points about the design:
- Each component gets its own map of properties, which is independent from that of other components.
- All global properties are either present or overridden by a new value. They cannot be "undefined".
- New properties defined in a component are not visible to any other component.
The final resolved model is as follows
    {
        "schema": "http://example.org/specification/v2.0.0",
        "global": {
            "g1": "a",
            "g2": "b"
        },
        "components": {
            "simple": {
                "g1": "a",
                "g2": "b"
            },
            "master": {
                "name": "m",
                "g1": "overridden",
                "g2": "b"
            },
            "worker": {
                "name": "w",
                "g1": "overridden-by-worker",
                "g2": "b",
                "timeout": "1000"
            }
        }
    }
This is the specification JSON that would have generated exactly the same result as the example, without any propagation of data from the global section to individual components.
Note that a resolved specification can still have the resolution operation applied to it; doing so has no effect.
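Resolution has no effect exactly when every component already carries a value for every global property. A minimal sketch of that test follows, using the example data from this section; the function name `is_resolved` is hypothetical.

```python
def is_resolved(spec):
    """True if every component already has a value for every global property."""
    global_keys = set(spec.get("global", {}))
    return all(
        global_keys <= set(props)  # subset test: no global key is missing
        for props in spec.get("components", {}).values()
    )


original = {
    "global": {"g1": "a", "g2": "b"},
    "components": {
        "simple": {},
        "master": {"name": "m", "g1": "overridden"},
        "worker": {"name": "w", "g1": "overridden-by-worker", "timeout": "1000"},
    },
}
resolved = {
    "global": {"g1": "a", "g2": "b"},
    "components": {
        "simple": {"g1": "a", "g2": "b"},
        "master": {"name": "m", "g1": "overridden", "g2": "b"},
        "worker": {"name": "w", "g1": "overridden-by-worker", "g2": "b",
                   "timeout": "1000"},
    },
}
print(is_resolved(original), is_resolved(resolved))  # False True
```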
Metadata¶
The metadata section can contain arbitrary string values for use in diagnostics and by other applications.
To avoid conflict with other applications, please use a unique name in strings, such as Java-style package names.
Resource Requirements: resources.json¶
This file declares the resource requirements for YARN for the components of an application.
instances
: the number of instances of a role desired.
yarn.vcores
: the number of "virtual cores" required by a component.
yarn.memory
: the number of megabytes required by a component.
    {
        "schema": "http://example.org/specification/v2.0.0",
        "metadata": {
            "description": "example of a resources file"
        },
        "global": {
            "yarn.vcores": "1",
            "yarn.memory": "512"
        },
        "components": {
            "master": {
                "instances": "1",
                "yarn.memory": "1024"
            },
            "worker": {
                "instances": "5"
            }
        }
    }
The resolved file would be
    {
        "schema": "http://example.org/specification/v2.0.0",
        "metadata": {
            "description": "example of a resources file"
        },
        "global": {
            "yarn.vcores": "1",
            "yarn.memory": "512"
        },
        "components": {
            "master": {
                "instances": "1",
                "yarn.vcores": "1",
                "yarn.memory": "1024"
            },
            "worker": {
                "instances": "5",
                "yarn.vcores": "1",
                "yarn.memory": "512"
            }
        }
    }
This declares that this deployment of the application consists of one instance of the master component, using 1 vcore and 1024 MB of RAM, and five instances of the worker component, each using 1 vcore and 512 MB of RAM.
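The total footprint requested by such a file follows by multiplying each component's requirements by its instance count. The sketch below works on the resolved components of the example; the function name `total_requirements` is hypothetical, and it counts only what the file declares.

```python
def total_requirements(resolved):
    """Sum vcores and memory (MB) over all requested instances of all components."""
    vcores = 0
    memory_mb = 0
    for props in resolved["components"].values():
        count = int(props["instances"])  # property values are strings
        vcores += count * int(props["yarn.vcores"])
        memory_mb += count * int(props["yarn.memory"])
    return vcores, memory_mb


resolved_resources = {
    "components": {
        "master": {"instances": "1", "yarn.vcores": "1", "yarn.memory": "1024"},
        "worker": {"instances": "5", "yarn.vcores": "1", "yarn.memory": "512"},
    }
}
print(total_requirements(resolved_resources))  # (6, 3584)
```

Here 6 vcores is 1 + 5 × 1, and 3584 MB is 1024 + 5 × 512.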
Deployment specification: app_configuration.json¶
This defines parameters that are to be used when creating the instance of the application, and instances of the individual components.
    {
        "schema": "http://example.org/specification/v2.0.0",
        "global": {
            "zookeeper.port": "2181",
            "zookeeper.path": "/yarnapps_small_cluster",
            "zookeeper.hosts": "zoo1,zoo2,zoo3",
            "env.MALLOC_ARENA_MAX": "4",
            "site.hbase.master.startup.retainassign": "true",
            "site.fs.defaultFS": "hdfs://cluster:8020",
            "site.fs.default.name": "hdfs://cluster:8020",
            "site.hbase.master.info.port": "0",
            "site.hbase.regionserver.info.port": "0"
        },
        "components": {
            "worker": {
                "jvm.heapsize": "512M"
            },
            "master": {
                "jvm.heapsize": "512M"
            }
        },
        "credentials": { }
    }
The resolved specification defines the values that are passed to the different components.
    {
        "schema": "http://example.org/specification/v2.0.0",
        "global": {
            "zookeeper.port": "2181",
            "zookeeper.path": "/yarnapps_small_cluster",
            "zookeeper.hosts": "zoo1,zoo2,zoo3",
            "env.MALLOC_ARENA_MAX": "4",
            "site.hbase.master.startup.retainassign": "true",
            "site.fs.defaultFS": "hdfs://cluster:8020",
            "site.fs.default.name": "hdfs://cluster:8020",
            "site.hbase.master.info.port": "0",
            "site.hbase.regionserver.info.port": "0"
        },
        "components": {
            "worker": {
                "zookeeper.port": "2181",
                "zookeeper.path": "/yarnapps_small_cluster",
                "zookeeper.hosts": "zoo1,zoo2,zoo3",
                "env.MALLOC_ARENA_MAX": "4",
                "site.hbase.master.startup.retainassign": "true",
                "site.fs.defaultFS": "hdfs://cluster:8020",
                "site.fs.default.name": "hdfs://cluster:8020",
                "site.hbase.master.info.port": "0",
                "site.hbase.regionserver.info.port": "0",
                "jvm.heapsize": "512M"
            },
            "master": {
                "zookeeper.port": "2181",
                "zookeeper.path": "/yarnapps_small_cluster",
                "zookeeper.hosts": "zoo1,zoo2,zoo3",
                "env.MALLOC_ARENA_MAX": "4",
                "site.hbase.master.startup.retainassign": "true",
                "site.fs.defaultFS": "hdfs://cluster:8020",
                "site.fs.default.name": "hdfs://cluster:8020",
                "site.hbase.master.info.port": "0",
                "site.hbase.regionserver.info.port": "0",
                "jvm.heapsize": "512M"
            }
        },
        "credentials": { }
    }
The site. properties have been passed down to each component, whose
templates may generate local site configurations. The override model
does not prevent a component from overriding global configuration in a way
that creates local configurations incompatible with the global state;
there is no way to declare an attribute as final. It is the responsibility
of the author of the configuration file (and their tools) to detect such issues.
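A tool of the kind mentioned above could list every place where a component replaces a global site. value, leaving the author to judge whether the override is intended. The following is a sketch of such a check, not part of Slider; the function name `find_overrides` and the overriding value "hdfs://other:8020" are hypothetical.

```python
def find_overrides(spec, prefix="site."):
    """List (component, key, global_value, component_value) tuples for each
    component property under `prefix` that differs from the global value."""
    global_props = spec.get("global", {})
    overrides = []
    for name, props in spec.get("components", {}).items():
        for key, value in props.items():
            if (key.startswith(prefix)
                    and key in global_props
                    and global_props[key] != value):
                overrides.append((name, key, global_props[key], value))
    return overrides


config = {
    "global": {"site.fs.defaultFS": "hdfs://cluster:8020"},
    "components": {
        "master": {"jvm.heapsize": "512M"},
        # Hypothetical override that conflicts with the global filesystem.
        "worker": {"site.fs.defaultFS": "hdfs://other:8020"},
    },
}
for entry in find_overrides(config):
    print(entry)
```

The check works on both unresolved and resolved files, since a propagated value is equal to the global one and is therefore not reported.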
Key Application Configuration Items¶
The following sections provide details about application configuration properties that can be used to tailor the deployment of a given application:
Controlling assigned port ranges¶
For certain deployments, the ports available for communication with clients
(Web UI ports, RPC ports, etc.) are restricted to a specific set (e.g. when using a firewall).
In those situations you can designate the set of allowed ports with the
site.global.slider.allowed.ports setting.
This takes a comma-delimited set of port numbers and ranges, e.g.:
"site.global.slider.allowed.ports": "48000, 49000, 50001-50010"
The AM exposed ports (Web UI, RPC), as well as the ports allocated to launched application containers, will be limited to the ranges specified by the property value.
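To see which ports a given value permits, the string can be expanded into individual port numbers. This sketch assumes that ranges are inclusive at both ends and that whitespace around entries is ignored; it is not Slider's own parser, and the function name `parse_allowed_ports` is hypothetical.

```python
def parse_allowed_ports(value):
    """Expand a comma-delimited list of ports and ranges into a sorted list."""
    ports = set()
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            low, high = (int(part) for part in item.split("-", 1))
            if low > high:
                raise ValueError(f"invalid port range: {item}")
            ports.update(range(low, high + 1))  # assumption: inclusive range
        else:
            ports.add(int(item))
    return sorted(ports)


allowed = parse_allowed_ports("48000, 49000, 50001-50010")
print(len(allowed), allowed[0], allowed[-1])  # 12 48000 50010
```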
Delaying container launch¶
In situations where container restarts may need to be delayed to allow
platform resources to be released (e.g. a port assigned to a previous container
may be slow to release), a delay, in seconds, can be designated by setting the
container.launch.delay.sec property:
    "worker": {
        "jvm.heapsize": "512M",
        "container.launch.delay.sec": "30"
    }
Specifying the Python Executable Path¶
Slider containers use Python for component scripts.
When deploying applications on certain variations of Linux or other operating systems (e.g. CentOS 5),
the version of Python on the system PATH may be incompatible with the component script.
In those circumstances the path to the Python executable for container script execution can be
specified by the agent.python.exec.path property:
    "global": {
        "agent.python.exec.path": "/usr/bin/python",
        . . .
    }
This property may also be specified in the slider-client.xml file (typically in the "conf" directory
of the Slider installation) if the specified Python version is to be used across multiple deployments:
    <property>
        <name>agent.python.exec.path</name>
        <value>/usr/bin/python</value>
    </property>
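To confirm which value a client-side file supplies, the property can be read back with a standard XML parser. The sketch below assumes the file wraps its property elements in a Hadoop-style configuration root element, which the fragment above does not show; the function name `read_property` is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_property(xml_text, name):
    """Return the value of the named property, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


# Assumption: <configuration> is the root element of slider-client.xml.
SAMPLE = """<configuration>
  <property>
    <name>agent.python.exec.path</name>
    <value>/usr/bin/python</value>
  </property>
</configuration>"""

print(read_property(SAMPLE, "agent.python.exec.path"))  # /usr/bin/python
```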