Goal

Define the Slider application package for YARN-hosted jmemcached. Jmemcached is a functionally equivalent Java implementation of memcached, a distributed memory object caching system. The memcached daemons export the host and port they are listening on.

Basic version

The basic version of the app will allow creation of one or more memcached daemons on custom ports. Some memory settings may be configured.

The structure of an app package is discussed here.

In this example, the application package created looks as follows:

unzip -l "$@" jmemcached-1.0.0.zip
Archive:  /jmemcached-1.0.0.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
      637  07-15-14 19:17   appConfig-default.json
     1673  07-15-14 17:58   metainfo.xml
        0  07-15-14 17:54   package/
        0  07-15-14 18:03   package/files/
   122880  07-15-14 18:03   package/files/jmemcached-1.0.0.tar
        0  07-15-14 19:31   package/scripts/
     1530  07-15-14 19:31   package/scripts/memcached.py
     1287  07-15-14 18:46   package/scripts/params.py
     1581  07-15-14 19:16   README.txt
      252  07-15-14 17:58   resources-default.json
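To assemble and sanity-check a zip with this layout, a short Python sketch such as the following could be used. `build_package` and `missing_entries` are hypothetical helpers, and the set of required entries is inferred from the listing above rather than taken from Slider's own validation.

```python
import os
import zipfile

# Entries checked here are taken from the listing above; whether Slider
# requires exactly this set is an assumption of this sketch.
REQUIRED_ENTRIES = ("metainfo.xml", "package/files/", "package/scripts/")


def build_package(src_dir, zip_path):
    """Zip src_dir so that its contents sit at the root of the archive.

    zip_path should live outside src_dir, otherwise the walk would pick it up.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to src_dir, e.g. "package/scripts/memcached.py".
                arcname = os.path.relpath(full_path, src_dir).replace(os.sep, "/")
                zf.write(full_path, arcname)


def missing_entries(zip_path):
    """Return the entries from REQUIRED_ENTRIES that the archive lacks."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    missing = []
    for entry in REQUIRED_ENTRIES:
        if entry.endswith("/"):
            # A directory counts as present if any member lives under it.
            present = any(n.startswith(entry) for n in names)
        else:
            present = entry in names
        if not present:
            missing.append(entry)
    return missing
```

For example, `build_package("jmemcached-pkg", "jmemcached-1.0.0.zip")` would zip a hypothetical jmemcached-pkg directory holding metainfo.xml and package/, after which `missing_entries` should return an empty list.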

Step 1: Create metainfo.xml

The minimal metainfo contains some information about the application (name, comment, version), at least one component type (in this case, MEMCACHED), and information about the tarball. This example also defines an export group so that each MEMCACHED instance publishes its host and port. More details are available here.

<metainfo>
  <schemaVersion>2.0</schemaVersion>
  <application>
    <name>MEMCACHED</name>
    <comment>Memcache is a network accessible key/value storage system, often used as a distributed cache.</comment>
    <version>1.0.0</version>
    <exportedConfigs>None</exportedConfigs>
    <exportGroups>
      <exportGroup>
        <name>Servers</name>
        <exports>
          <export>
            <name>host_port</name>
            <value>${MEMCACHED_HOST}:${site.global.listen_port}</value>
          </export>
        </exports>
      </exportGroup>
    </exportGroups>

    <components>
      <component>
        <name>MEMCACHED</name>
        <category>MASTER</category>
        <compExports>Servers-host_port</compExports>
        <commandScript>
          <script>scripts/memcached.py</script>
          <scriptType>PYTHON</scriptType>
        </commandScript>
      </component>
    </components>

    <osSpecifics>
      <osSpecific>
        <osType>any</osType>
        <packages>
          <package>
            <type>tarball</type>
            <name>files/jmemcached-1.0.0.tar</name>
          </package>
        </packages>
      </osSpecific>
    </osSpecifics>

  </application>
</metainfo>
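The relationship between the export group and `<compExports>` can be checked mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` to read a metainfo.xml string and keys each export as `<group>-<export>`; that key format is inferred from the `Servers-host_port` value above, and `summarize_metainfo` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def summarize_metainfo(xml_text):
    """Return (application name, components, exports) from a metainfo.xml string."""
    app = ET.fromstring(xml_text).find("application")
    # Exports keyed as "<exportGroup name>-<export name>", the form that
    # <compExports> uses ("Servers-host_port").
    exports = {}
    for group in app.iter("exportGroup"):
        group_name = group.findtext("name")
        for export in group.iter("export"):
            exports[group_name + "-" + export.findtext("name")] = export.findtext("value")
    # Component name -> (category, compExports)
    components = {}
    for comp in app.iter("component"):
        components[comp.findtext("name")] = (comp.findtext("category"),
                                              comp.findtext("compExports"))
    return app.findtext("name"), components, exports
```

A component's compExports value can then be looked up directly in the returned exports dictionary to find the host/port template it publishes.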

Step 2: Ensure application tarball

Most applications release a tarball that you can download; otherwise, you can create one. For this sample, we created a simple tarball that contains the CLI and core jars from jmemcached.

tar tvf jmemcached-1.0.0.tar
drwxr-xr-x  0 smohanty staff       0 Nov  5 20:22 ./
-rw-r--r--  0 yarn   hadoop  13537 Jul 15 17:51 jmemcached-cli-1.0.0.jar
-rwxr-xr-x  0 yarn   hadoop 101467 Jul 15 17:51 jmemcached-core-1.0.0.jar
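If the application does not ship a tarball, one way to produce a flat archive like the one listed is Python's `tarfile` module. This is a sketch: `make_app_tarball` is a hypothetical helper, and it assumes the jars should sit at the root of an uncompressed archive, as in the listing.

```python
import os
import tarfile


def make_app_tarball(tar_path, jar_paths):
    """Write an uncompressed tar with each jar at the archive root; return its members."""
    with tarfile.open(tar_path, "w") as tar:
        for jar in jar_paths:
            # arcname drops the local directory so the jar lands at the root.
            tar.add(jar, arcname=os.path.basename(jar))
    with tarfile.open(tar_path, "r") as tar:
        return tar.getnames()
```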

Step 3: Create a default resources file (resources.json)

By default, every resources.json file must include a slider-appmaster component. Add one more entry for the MEMCACHED component, and assign it a unique priority and a default number of instances. Ensure that a suitable default value is provided for yarn.memory. Additional details are available here.

{
  "schema" : "http://example.org/specification/v2.0.0",
  "metadata" : {
  },
  "global" : {
  },
  "components": {
    "slider-appmaster": {
    },
    "MEMCACHED": {
      "yarn.role.priority": "1",
      "yarn.component.instances": "1",
      "yarn.memory": "256"
    }
  }
}
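The rules in this step (a slider-appmaster entry, a unique priority per component, and defaults for instances and memory) can be expressed as a small check. This is a hypothetical validator written only from the requirements stated above; Slider's own validation may check more or differ in detail.

```python
import json

REQUIRED_KEYS = ("yarn.role.priority", "yarn.component.instances", "yarn.memory")


def check_resources(text):
    """Return a list of problems found in a resources.json document."""
    doc = json.loads(text)
    components = doc.get("components", {})
    problems = []
    if "slider-appmaster" not in components:
        problems.append("missing slider-appmaster component")
    seen = {}  # priority -> first component that used it
    for name, props in components.items():
        if name == "slider-appmaster":
            continue
        for key in REQUIRED_KEYS:
            if key not in props:
                problems.append("%s: missing %s" % (name, key))
        prio = props.get("yarn.role.priority")
        if prio is not None:
            if prio in seen:
                problems.append("%s: priority %s already used by %s" % (name, prio, seen[prio]))
            else:
                seen[prio] = name
    return problems
```

Run against the resources.json above, the check should report no problems.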

Step 4: Create a default configuration template (appConfig.json)

The config template has a few mandatory parameters, such as:

  • application.def - location of the application definition package in the default FS (e.g. HDFS). This is where the application package is stored.
  • java_home - location of Java on the target hosts

Add other parameters needed by the application itself. This package currently uses the following parameters:

  • site.global.additional_cp - location of other helper jars used by this package, such as those in the Hadoop client jars directory (e.g. /usr/lib/hadoop/lib; your deployment may keep them elsewhere)
  • site.global.xmx_val - maximum JVM heap size (passed as -Xmx)
  • site.global.xms_val - initial JVM heap size (passed as -Xms)
  • site.global.memory_val - cache memory for jmemcached (passed as --memory)
  • site.global.listen_port - tells Slider that a port needs to be allocated for each container

You can add additional parameters as needed.

{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": {
  },
  "global": {
    "application.def": ".slider/package/MEMCACHED/jmemcached-1.0.0.zip",
    "java_home": "/usr/jdk64/jdk1.7.0_67",

    "site.global.additional_cp": "/usr/lib/hadoop/lib/*",
    "site.global.xmx_val": "256m",
    "site.global.xms_val": "128m",
    "site.global.memory_val": "200M",
    "site.global.listen_port": "${MEMCACHED.ALLOCATED_PORT}{PER_CONTAINER}"
  },
  "components": {
    "slider-appmaster": {
      "jvm.heapsize": "256M"
    }
  }
}
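params.py (Step 5) reads these values as config['configurations']['global']['xmx_val'] and so on, which suggests that a `site.global.<name>` key surfaces as `<name>` under the `global` configuration type. The sketch below models that mapping, assuming it holds for every `site.<type>.<name>` key; the port substitution merely stands in for the allocation Slider performs, and `to_configurations` is hypothetical. Values such as app_root and pid_file are supplied by Slider itself and do not appear here.

```python
import json

PORT_PLACEHOLDER = "${MEMCACHED.ALLOCATED_PORT}{PER_CONTAINER}"


def to_configurations(app_config_text, allocated_port):
    """Group site.<type>.<name> keys from the "global" section by <type>."""
    doc = json.loads(app_config_text)
    configurations = {}
    for key, value in doc.get("global", {}).items():
        parts = key.split(".", 2)
        if len(parts) != 3 or parts[0] != "site":
            # application.def, java_home, ... are not site.* keys.
            continue
        _, config_type, name = parts
        # Hypothetical stand-in for Slider's per-container port allocation.
        value = value.replace(PORT_PLACEHOLDER, str(allocated_port))
        configurations.setdefault(config_type, {})[name] = value
    return configurations
```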

Additional details on how to define a configuration template are available here.

Step 5: Implement the basic commands

All Slider applications are expected to implement INSTALL, CONFIGURE, START, STOP, and STATUS for each component. Some of these implementations can be no-ops: in our case, we implement INSTALL, START, and STATUS, and leave CONFIGURE and STOP as empty defaults. Note that Slider has an extensive library that can be used to implement the commands. More details can be found here.

The parameters file (params.py) we will use is:

from resource_management import *

config = Script.get_config()

app_root = config['configurations']['global']['app_root']
java64_home = config['hostLevelParams']['java_home']
pid_file = config['configurations']['global']['pid_file']

additional_cp = config['configurations']['global']['additional_cp']
xmx_val = config['configurations']['global']['xmx_val']
xms_val = config['configurations']['global']['xms_val']
memory_val = config['configurations']['global']['memory_val']
port = config['configurations']['global']['listen_port']

Note that params.py only reads the parameters needed by the command implementations. The command script, memcached.py, is:

import sys
from resource_management import *

class Memcached(Script):
  def install(self, env):
    self.install_packages(env)

  def configure(self, env):
    import params
    env.set_params(params)

  def start(self, env):
    import params
    env.set_params(params)
    self.configure(env)
    process_cmd = format("{java64_home}/bin/java -Xmx{xmx_val} -Xms{xms_val} -classpath {app_root}/*:{additional_cp} com.thimbleware.jmemcached.Main --memory={memory_val} --port={port}")

    Execute(process_cmd,
        logoutput=False,
        wait_for_finish=False,
        pid_file=params.pid_file
    )

  def stop(self, env):
    import params
    env.set_params(params)

  def status(self, env):
    import params
    env.set_params(params)
    check_process_status(params.pid_file)

if __name__ == "__main__":
  Memcached().execute()

That's pretty much it. The script basically does the following:

  • INSTALL expands the given tarball
  • START reads the provided configuration and builds the command string
  • START executes the command to launch jmemcached
  • START writes the PID into a file, which STATUS uses to check the daemon
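The PID-file check that STATUS relies on can be approximated in plain Python. The sketch below is not the resource_management implementation of `check_process_status`: `is_running` simply returns a bool, and `stop` is a hypothetical STOP command, since the memcached.py above leaves STOP empty.

```python
import errno
import os
import signal


def is_running(pid_file):
    """Approximate STATUS: True if pid_file names a live process."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (IOError, OSError, ValueError):
        return False  # no PID file, or unreadable contents
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering a signal
    except OSError as e:
        # EPERM means the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True


def stop(pid_file):
    """Hypothetical STOP: send SIGTERM to the recorded PID, then remove the file."""
    if is_running(pid_file):
        with open(pid_file) as f:
            os.kill(int(f.read().strip()), signal.SIGTERM)
    if os.path.exists(pid_file):
        os.remove(pid_file)
```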

Debugging Tips

Ultimately, the package runs the command produced by the following format string:

format("{java64_home}/bin/java -Xmx{xmx_val} -Xms{xms_val} -classpath {app_root}/*:{additional_cp} com.thimbleware.jmemcached.Main --memory={memory_val} --port={port}")

which expands to:

/usr/jdk64/jdk1.7.0_67/bin/java -Xmx256m -Xms128m -classpath /hadoop/yarn/local/usercache/yarn/appcache/application_1428879923172_0003/container_e01_1428879923172_0003_01_000002/app/install/*:/usr/lib/hadoop/lib/* com.thimbleware.jmemcached.Main --memory=200M --port=37539

The port and memory values come from the supplied configuration, and the classpath includes the directory of the active YARN container where the tarball was expanded.
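To reproduce that expansion outside Slider, the same template can be filled in with Python's `str.format`. The sketch assumes you pass the values yourself instead of having resource_management's `format()` resolve them from params.py; `build_command` and `build_argv` are hypothetical helpers.

```python
import shlex

# Same template as memcached.py's start(); values are passed in explicitly
# rather than resolved from params.py by resource_management's format().
CMD_TEMPLATE = ("{java64_home}/bin/java -Xmx{xmx_val} -Xms{xms_val} "
                "-classpath {app_root}/*:{additional_cp} "
                "com.thimbleware.jmemcached.Main --memory={memory_val} --port={port}")


def build_command(**values):
    """Return the expanded command line as a single string."""
    return CMD_TEMPLATE.format(**values)


def build_argv(**values):
    """Split the command into an argv list for subprocess (no shell involved)."""
    return shlex.split(build_command(**values))
```

Passing the result of `build_argv(...)` to `subprocess.Popen` avoids a shell, so the `*` classpath entries reach java literally and java expands them itself.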

So, even without Slider or YARN, you should be able to execute the command above and see memcached up and running. Trying the command manually first is a good way to confirm that you have the right tarballs and environment to run memcached.
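Once the command is running, a simple way to confirm the daemon is up is to poll its TCP port. This is a minimal sketch using only the standard library; `wait_for_port` is hypothetical, and a successful connection shows only that something is listening, not that memcached answers correctly.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable, or connect timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the expansion shown above, `wait_for_port("localhost", 37539)` would be the check to run after starting the java command.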

Add-on package

You can also deploy the application with add-on packages. Add-on packages are extension libraries, configurations, or scripts that the master application can use. For example, Phoenix provides SQL access to HBase, and Ranger adds authorization to it. With Slider, you can deploy HBase as the master application, and Phoenix and/or Ranger as add-on packages.

To do that, package your add-on in a similar way to the master package. For example, to deploy Phoenix with HBase, you need to create an add-on package as below:

unzip -l "$@" Phoenix.zip
Archive:  Archive.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
     2143  04-19-15 12:03   metainfo.xml
        0  11-19-14 15:17   package/
        0  04-19-15 11:41   package/files/
     1908  12-04-14 17:07   package/files/end2endTest.py
      840  12-04-14 17:07   package/files/hadoop-metrics2-hbase.properties
     2271  12-04-14 17:07   package/files/hadoop-metrics2-phoenix.properties
     1136  12-04-14 17:07   package/files/hbase-site.xml
     2342  12-04-14 17:07   package/files/log4j.properties
     3762  12-04-14 17:07   package/files/performance.py
 38015961  12-04-14 17:07   package/files/phoenix-4.2.2-client.jar
  4039324  12-04-14 17:07   package/files/phoenix-4.2.2-server.jar
     3171  12-04-14 17:07   package/files/phoenix_utils.py
     1669  12-04-14 17:07   package/files/psql.py
     1820  12-04-14 17:07   package/files/readme.txt
     2314  12-04-14 17:07   package/files/sqlline.py
        0  04-19-15 21:57   package/scripts/
     2287  04-19-15 21:57   package/scripts/addon_hbase_master.py
     1659  04-19-15 11:45   package/scripts/addon_hbase_regionserver.py
     1079  04-18-15 17:36   package/scripts/params.py
        0  04-17-15 18:42   package/templates/
 --------                   -------
 42106790                   21 files

Note that, unlike the master package, the add-on package has no appConfig.json or resources.json. An add-on appConfig.json is not supported for now, but you can define all the variables you need in the master package's appConfig.json. An add-on package cannot add new components to the master application, so resources.json is not needed either.

That leaves metainfo.xml, in which you also cannot add any new components. However, an entry in the add-on package's metainfo can apply to all components in the master package's metainfo if you specify 'ALL' as the component name.

<metainfo>
  <schemaVersion>2.0</schemaVersion>
  <applicationPackage>
    <name>PHOENIX</name>
    <comment>
      Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data.
    </comment>
    <version>2.0</version>
    <type>ADDON-PACKAGE</type>
    <minHadoopVersion>1.0</minHadoopVersion>

    <components>
      <component>
        <name>HBASE_REGIONSERVER</name>
        <commandScript>
          <script>scripts/addon_hbase_regionserver.py</script>
          <scriptType>PYTHON</scriptType>
          <timeout>600</timeout>
        </commandScript>
      </component>
      <component>
        <name>HBASE_MASTER</name>
        <commandScript>
          <script>scripts/addon_hbase_master.py</script>
          <scriptType>PYTHON</scriptType>
          <timeout>600</timeout>
        </commandScript>
      </component>
    </components>

    <osSpecifics>
      <osSpecific>
        <osType>any</osType>
        <packages>
          <package>
            <type>tarball</type>
            <name>files/phoenix-4.2.2-server.jar</name>
          </package>
        </packages>
      </osSpecific>
    </osSpecifics>
  </applicationPackage>
</metainfo>

Please note the type of the application package is specified as 'ADDON-PACKAGE'.
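The rules described above (an add-on may not introduce new components, and 'ALL' applies an entry to every master component) can be sketched as follows. This models only those stated rules; how Slider itself merges add-on entries is an assumption here, and `addon_script_map` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def addon_script_map(addon_xml, master_components):
    """Map each master component to the add-on script(s) that apply to it."""
    pkg = ET.fromstring(addon_xml).find("applicationPackage")
    if pkg.findtext("type") != "ADDON-PACKAGE":
        raise ValueError("not an add-on package")
    mapping = {}
    for comp in pkg.iter("component"):
        name = comp.findtext("name")
        script = comp.findtext("commandScript/script")
        # 'ALL' applies the entry to every component of the master package.
        targets = master_components if name == "ALL" else [name]
        for target in targets:
            if target not in master_components:
                raise ValueError("add-on cannot introduce new component %s" % target)
            mapping.setdefault(target, []).append(script)
    return mapping
```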

The add-on package is expected to implement INSTALL for each component; it does not need the other commands, which are defined by the master application. You can still use Slider's extensive library to implement the command. More details can be found here.

Below is an example of the Python script for HBASE_MASTER (addon_hbase_master.py):

import sys
import os
from shutil import copyfile
from resource_management import *


class HbaseMaster(Script):
  def install(self, env):
    config = Script.get_config()

    src = config['commandParams']['addonPackageRoot'] + "/package/files/phoenix-4.2.2-client.jar"
    dst = config['configurations']['global']['app_root'] + "/lib/phoenix-4.2.2-client.jar"
    copyfile(src, dst)

    files_to_copy_to_bin = ["end2endTest.py", "hadoop-metrics2-hbase.properties",
                            "hadoop-metrics2-phoenix.properties", "hbase-site.xml",
                            "log4j.properties", "performance.py", "phoenix_utils.py",
                            "psql.py", "readme.txt", "sqlline.py"]

    # Copy the Phoenix client scripts and configuration files into HBase's bin directory
    for file_name in files_to_copy_to_bin:
      src = config['commandParams']['addonPackageRoot'] + "/package/files/" + file_name
      dst = config['configurations']['global']['app_root'] + "/bin/" + file_name
      copyfile(src, dst)
    bin_file = config['configurations']['global']['app_root'] + "/bin/" + "sqlline.py"
    # Make sqlline.py executable (r-xr-xr-x); 0o555 is valid in Python 2.6+ and 3
    os.chmod(bin_file, 0o555)

if __name__ == "__main__":
  HbaseMaster().execute()
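The same copy steps can be written with only the Python 3 standard library, which makes them easy to test outside a Slider agent. `install_addon_files` is a hypothetical helper; creating missing lib/ and bin/ directories is an added assumption that the script above does not make.

```python
import os
import shutil


def install_addon_files(addon_root, app_root, lib_jars, bin_files, executables):
    """Copy add-on files into the master application's lib/ and bin/ directories."""
    files_dir = os.path.join(addon_root, "package", "files")
    for sub, names in (("lib", lib_jars), ("bin", bin_files)):
        dest_dir = os.path.join(app_root, sub)
        os.makedirs(dest_dir, exist_ok=True)  # assumption: create if missing
        for name in names:
            shutil.copyfile(os.path.join(files_dir, name), os.path.join(dest_dir, name))
    for name in executables:
        # r-xr-xr-x, the same mode the script above sets on sqlline.py.
        os.chmod(os.path.join(app_root, "bin", name), 0o555)
```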

Below is an example of the Python script for HBASE_REGIONSERVER (addon_hbase_regionserver.py):

import sys
from shutil import copyfile
from resource_management import *


class HbaseRegionserver(Script):
  def install(self, env):
    config = Script.get_config()
    src = config['commandParams']['addonPackageRoot'] + "/package/files/phoenix-4.2.2-server.jar"
    dst = config['configurations']['global']['app_root'] + "/lib/phoenix-4.2.2-server.jar"
    copyfile(src, dst)

  def configure(self, env):
    import params
    env.set_params(params)


  def start(self, env):
    import params
    env.set_params(params)
    self.configure(env) # for security


  def stop(self, env):
    import params
    env.set_params(params)


  def status(self, env):
    # Only params.py ships in package/scripts, so use it here as well
    import params
    env.set_params(params)

if __name__ == "__main__":
  HbaseRegionserver().execute()

Similarly, you can provide a params.py that supplies all the configuration needed to run the INSTALL command:

from resource_management import *

# server configurations
config = Script.get_config()

hbase_root = config['configurations']['global']['app_root']
jar_location = config['commandParams']['addonPackageRoot'] + "/package/files/phoenix-4.2.2-server.jar"

print('jar_location: ' + jar_location)

When submitting an application with an add-on package, use Slider's '--addon' option to specify the add-on package name and the path to its zipped package:

slider create [application_name] --template [path to appConfig.json] --resources [path to resources.json] --addon PHOENIX [path to Phoenix.zip]
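When scripting deployments, the command line above can be built as an argument list. This sketch assumes `--addon` may be repeated to attach more than one add-on (for example, Phoenix and Ranger), which the text implies but does not show; `slider_create_argv` is hypothetical.

```python
def slider_create_argv(app_name, app_config, resources, addons=()):
    """Build the argv for 'slider create' with zero or more --addon options."""
    argv = ["slider", "create", app_name,
            "--template", app_config,
            "--resources", resources]
    for addon_name, addon_zip in addons:
        # Each add-on contributes: --addon <NAME> <path to zip>
        argv += ["--addon", addon_name, addon_zip]
    return argv
```

The resulting list could be passed to `subprocess.run(argv, check=True)`.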