Goal
Define the Slider application package for YARN-hosted jmemcached. Jmemcached is a functionally equivalent Java implementation of memcached, a distributed memory object caching system. The memcached daemons export the host/port they are listening on.
Basic version
The basic version of the app will allow creation of one or more memcached daemons on custom ports. Some memory settings may be configured.
The structure of an app package is discussed here.
In this example, the application package created looks as follows:
    unzip -l "$@" jmemcached-1.0.0.zip
    Archive:  /jmemcached-1.0.0.zip
      Length     Date   Time    Name
     --------    ----   ----    ----
          637  07-15-14 19:17   appConfig-default.json
         1673  07-15-14 17:58   metainfo.xml
            0  07-15-14 17:54   package/
            0  07-15-14 18:03   package/files/
       122880  07-15-14 18:03   package/files/jmemcached-1.0.0.tar
            0  07-15-14 19:31   package/scripts/
         1530  07-15-14 19:31   package/scripts/memcached.py
         1287  07-15-14 18:46   package/scripts/params.py
         1581  07-15-14 19:16   README.txt
          252  07-15-14 17:58   resources-default.json
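As a quick sanity check before uploading a package, the layout above can be verified programmatically. The following is a minimal sketch using Python's standard zipfile module; the choice of which entries count as "required" and the helper names (missing_entries, build_demo_package) are assumptions made for illustration, not part of Slider.

```python
import io
import zipfile

# Top-level entries taken from the listing above. Treating exactly these
# three as "required" is an assumption of this sketch.
REQUIRED_ENTRIES = ("metainfo.xml", "appConfig-default.json", "resources-default.json")


def missing_entries(zip_source, required=REQUIRED_ENTRIES):
    """Return the required entries that are absent from the zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        names = set(archive.namelist())
    return [name for name in required if name not in names]


def build_demo_package():
    """Build a tiny in-memory archive that mimics part of the layout above."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("metainfo.xml", "<metainfo/>")
        archive.writestr("appConfig-default.json", "{}")
        archive.writestr("package/scripts/memcached.py", "")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # The demo archive deliberately omits resources-default.json.
    print(missing_entries(build_demo_package()))
```

To check a real package, pass its path instead of the in-memory buffer, for example missing_entries("jmemcached-1.0.0.zip").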
Step 1: Create metainfo.xml
The minimal metainfo contains some information about the application (name, comment, version), at least one component type (in this case MEMCACHED), and information about the tarball. More details are available here.
    <metainfo>
      <schemaVersion>2.0</schemaVersion>
      <application>
        <name>MEMCACHED</name>
        <comment>Memcache is a network accessible key/value storage system, often used as a distributed cache.</comment>
        <version>1.0.0</version>
        <exportedConfigs>None</exportedConfigs>
        <exportGroups>
          <exportGroup>
            <name>Servers</name>
            <exports>
              <export>
                <name>host_port</name>
                <value>${MEMCACHED_HOST}:${site.global.listen_port}</value>
              </export>
            </exports>
          </exportGroup>
        </exportGroups>
        <components>
          <component>
            <name>MEMCACHED</name>
            <category>MASTER</category>
            <compExports>Servers-host_port</compExports>
            <commandScript>
              <script>scripts/memcached.py</script>
              <scriptType>PYTHON</scriptType>
            </commandScript>
          </component>
        </components>
        <osSpecifics>
          <osSpecific>
            <osType>any</osType>
            <packages>
              <package>
                <type>tarball</type>
                <name>files/jmemcached-1.0.0.tar</name>
              </package>
            </packages>
          </osSpecific>
        </osSpecifics>
      </application>
    </metainfo>
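To see how the pieces of the metainfo relate to each other, the sketch below parses a trimmed copy of it with Python's xml.etree module, lists the components, and expands the export value template. It assumes that a compExports entry such as Servers-host_port is the export group name joined to the export name with a hyphen, which is what the example suggests. The expand helper and the sample host name are hypothetical, and the substitution shown is not Slider's own implementation.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the metainfo above, kept to the parts used here.
METAINFO = """
<metainfo>
  <application>
    <name>MEMCACHED</name>
    <exportGroups>
      <exportGroup>
        <name>Servers</name>
        <exports>
          <export>
            <name>host_port</name>
            <value>${MEMCACHED_HOST}:${site.global.listen_port}</value>
          </export>
        </exports>
      </exportGroup>
    </exportGroups>
    <components>
      <component>
        <name>MEMCACHED</name>
        <compExports>Servers-host_port</compExports>
      </component>
    </components>
  </application>
</metainfo>
"""


def component_names(xml_text):
    """Return the name of every <component> element."""
    root = ET.fromstring(xml_text.strip())
    return [component.findtext("name") for component in root.iter("component")]


def export_templates(xml_text):
    """Map "<group>-<export>" (assumed naming) to the export's value template."""
    root = ET.fromstring(xml_text.strip())
    templates = {}
    for group in root.iter("exportGroup"):
        group_name = group.findtext("name")
        for export in group.iter("export"):
            key = group_name + "-" + export.findtext("name")
            templates[key] = export.findtext("value")
    return templates


def expand(template, values):
    """Replace each ${key} with values[key]; unknown keys are left untouched."""
    def substitute(match):
        return str(values.get(match.group(1), match.group(0)))
    return re.sub(r"\$\{([^}]+)\}", substitute, template)


if __name__ == "__main__":
    print(component_names(METAINFO))
    template = export_templates(METAINFO)["Servers-host_port"]
    # Hypothetical host and port values for illustration only.
    print(expand(template, {"MEMCACHED_HOST": "host1.example.com",
                            "site.global.listen_port": 37539}))
```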
Step 2: Ensure application tarball
Most applications release a tarball that you can download; otherwise, you can create one. For this sample, we created a simple tarball that contains the CLI and core jars from jmemcached.
    tar tvf jmemcached-1.0.0.tar
    drwxr-xr-x  0 smohanty staff       0 Nov  5 20:22 ./
    -rw-r--r--  0 yarn     hadoop  13537 Jul 15 17:51 jmemcached-cli-1.0.0.jar
    -rwxr-xr-x  0 yarn     hadoop 101467 Jul 15 17:51 jmemcached-core-1.0.0.jar
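If you need to create the tarball yourself, one option is Python's standard tarfile module. The sketch below writes the given files flat at the archive root, as in the listing above. The helper names and the placeholder jar contents are hypothetical; a real package would use the actual jmemcached jars.

```python
import os
import tarfile
import tempfile


def build_tarball(tar_path, jar_paths):
    """Write the given files into an uncompressed tar, flat at the archive root."""
    with tarfile.open(tar_path, "w") as archive:
        for jar_path in jar_paths:
            archive.add(jar_path, arcname=os.path.basename(jar_path))


def list_tarball(tar_path):
    """Return the sorted member names of the tar archive."""
    with tarfile.open(tar_path, "r") as archive:
        return sorted(archive.getnames())


if __name__ == "__main__":
    work_dir = tempfile.mkdtemp()
    jars = []
    for name in ("jmemcached-cli-1.0.0.jar", "jmemcached-core-1.0.0.jar"):
        path = os.path.join(work_dir, name)
        with open(path, "wb") as handle:
            handle.write(b"placeholder")  # stand-in for the real jar contents
        jars.append(path)
    tar_path = os.path.join(work_dir, "jmemcached-1.0.0.tar")
    build_tarball(tar_path, jars)
    print(list_tarball(tar_path))
```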
Step 3: Create a default resources file (resources.json)
By default, all resources.json files must include a slider-appmaster component. Add one more entry for the MEMCACHED component and assign it a unique priority and a default number of instances. Ensure that a suitable default value is provided for yarn.memory. Additional details are available here.
    {
      "schema" : "http://example.org/specification/v2.0.0",
      "metadata" : {
      },
      "global" : {
      },
      "components": {
        "slider-appmaster": {
        },
        "MEMCACHED": {
          "yarn.role.priority": "1",
          "yarn.component.instances": "1",
          "yarn.memory": "256"
        }
      }
    }
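The rules described in this step (a slider-appmaster entry, a unique priority per component, a yarn.memory default) can be checked with a few lines of Python. This is a minimal sketch of such a check, not a validator shipped with Slider; the function name check_resources and the wording of the messages are made up for illustration.

```python
import json

# Copy of the resources.json shown above.
RESOURCES = """
{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": {},
  "global": {},
  "components": {
    "slider-appmaster": {},
    "MEMCACHED": {
      "yarn.role.priority": "1",
      "yarn.component.instances": "1",
      "yarn.memory": "256"
    }
  }
}
"""


def check_resources(text):
    """Return a list of problems found in a resources.json document."""
    components = json.loads(text).get("components", {})
    problems = []
    if "slider-appmaster" not in components:
        problems.append("missing slider-appmaster component")
    seen = {}  # priority -> first component that used it
    for name, settings in components.items():
        if name == "slider-appmaster":
            continue
        priority = settings.get("yarn.role.priority")
        if priority is None:
            problems.append(name + ": no yarn.role.priority")
        elif priority in seen:
            problems.append(name + ": priority " + priority
                            + " already used by " + seen[priority])
        else:
            seen[priority] = name
        if "yarn.memory" not in settings:
            problems.append(name + ": no yarn.memory")
    return problems


if __name__ == "__main__":
    print(check_resources(RESOURCES))
```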
Step 4: Create a default configuration template (appConfig.json)
The config template has a few mandatory parameters, such as:
- application.def - location of the application definition package in the default FS (e.g. HDFS). This is where the application package is stored.
- java_home - location of Java on target hosts
Add other parameters needed by the application itself. This package currently uses the following parameters:
- site.global.additional_cp - location of additional helper jars, namely those found in the Hadoop client jars location (e.g. /usr/lib/hadoop/lib; your deployment may have the jars in a different location)
- site.global.xmx_val - value of Xmx
- site.global.xms_val - value of Xms
- site.global.memory_val - value passed to jmemcached's --memory option
- site.global.listen_port - tells Slider that a port needs to be allocated
You can add additional parameters as needed.
    {
      "schema": "http://example.org/specification/v2.0.0",
      "metadata": {
      },
      "global": {
        "application.def": ".slider/package/MEMCACHED/jmemcached-1.0.0.zip",
        "java_home": "/usr/jdk64/jdk1.7.0_67",
        "site.global.additional_cp": "/usr/lib/hadoop/lib/*",
        "site.global.xmx_val": "256m",
        "site.global.xms_val": "128m",
        "site.global.memory_val": "200M",
        "site.global.listen_port": "${MEMCACHED.ALLOCATED_PORT}{PER_CONTAINER}"
      },
      "components": {
        "slider-appmaster": {
          "jvm.heapsize": "256M"
        }
      }
    }
Additional details on how to define a configuration template are available here.
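The naming used in this guide suggests a simple relationship: a template key such as site.global.xmx_val is later read by the command scripts as config['configurations']['global']['xmx_val']. The sketch below illustrates that grouping only. It is inferred from the examples in this document, not taken from Slider's implementation, and the function name site_configs is hypothetical.

```python
# The "global" section of the appConfig.json shown above.
APP_GLOBAL = {
    "application.def": ".slider/package/MEMCACHED/jmemcached-1.0.0.zip",
    "java_home": "/usr/jdk64/jdk1.7.0_67",
    "site.global.additional_cp": "/usr/lib/hadoop/lib/*",
    "site.global.xmx_val": "256m",
    "site.global.xms_val": "128m",
    "site.global.memory_val": "200M",
    "site.global.listen_port": "${MEMCACHED.ALLOCATED_PORT}{PER_CONTAINER}",
}


def site_configs(app_global):
    """Group "site.<type>.<name>" keys into {type: {name: value}}.

    Keys without the "site." prefix (application.def, java_home) are skipped.
    This grouping is an assumption based on the names used in this guide.
    """
    prefix = "site."
    configurations = {}
    for key, value in app_global.items():
        if not key.startswith(prefix):
            continue
        config_type, _, name = key[len(prefix):].partition(".")
        configurations.setdefault(config_type, {})[name] = value
    return configurations


if __name__ == "__main__":
    configurations = site_configs(APP_GLOBAL)
    print(sorted(configurations["global"]))
```

Note that the listen_port value is still a placeholder at this stage; the actual port is only known once a port has been allocated for the container.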
Step 5: Implement the basic commands
All Slider applications are expected to implement INSTALL/CONFIGURE/START/STOP/STATUS for each component. Some of the implementations can be no-ops; in our case, we will implement only INSTALL and START and leave the rest of the code as default. Note that Slider has an extensive library that can be used to implement the commands. More details can be found here.
The parameters file we will use is:
    from resource_management import *

    config = Script.get_config()

    app_root = config['configurations']['global']['app_root']
    java64_home = config['hostLevelParams']['java_home']
    pid_file = config['configurations']['global']['pid_file']
    additional_cp = config['configurations']['global']['additional_cp']
    xmx_val = config['configurations']['global']['xmx_val']
    xms_val = config['configurations']['global']['xms_val']
    memory_val = config['configurations']['global']['memory_val']
    port = config['configurations']['global']['listen_port']
Note that the params.py file only reads the parameters needed by the command implementations. The command script, memcached.py, is:
    import sys
    from resource_management import *

    class Memcached(Script):
        def install(self, env):
            self.install_packages(env)

        def configure(self, env):
            import params
            env.set_params(params)

        def start(self, env):
            import params
            env.set_params(params)
            self.configure(env)
            process_cmd = format("{java64_home}/bin/java -Xmx{xmx_val} -Xms{xms_val} -classpath {app_root}/*:{additional_cp} com.thimbleware.jmemcached.Main --memory={memory_val} --port={port}")

            Execute(process_cmd,
                    logoutput=False,
                    wait_for_finish=False,
                    pid_file=params.pid_file
            )

        def stop(self, env):
            import params
            env.set_params(params)

        def status(self, env):
            import params
            env.set_params(params)
            check_process_status(params.pid_file)

    if __name__ == "__main__":
        Memcached().execute()
That's pretty much it. The script basically does the following:
- Expands the given tarball
- Reads the provided configuration and creates the command string
- Executes the command to start jmemcached
- On start, writes the PID into a file that is used to check the status of the daemon
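The PID-file approach to status checking can be illustrated with plain Python. The sketch below is not the check_process_status function from resource_management, whose implementation is not shown in this guide. It only demonstrates the general technique on POSIX systems: read the PID from the file and probe the process with signal 0. The function names are hypothetical.

```python
import errno
import os


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or invalid."""
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (IOError, OSError, ValueError):
        return None


def is_running(pid_file):
    """Return True if the process recorded in pid_file appears to be alive.

    POSIX only: signal 0 performs the existence and permission checks
    without delivering a signal to the process.
    """
    pid = read_pid(pid_file)
    if pid is None:
        return False
    try:
        os.kill(pid, 0)
    except OSError as error:
        # EPERM means the process exists but belongs to another user.
        return error.errno == errno.EPERM
    return True


if __name__ == "__main__":
    import tempfile

    pid_path = os.path.join(tempfile.mkdtemp(), "demo.pid")
    with open(pid_path, "w") as handle:
        handle.write(str(os.getpid()))  # use this process as the "daemon"
    print(is_running(pid_path))
```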
Debugging Tips
At the end of the day, the above package runs the following command, after formatting:
    format("{java64_home}/bin/java -Xmx{xmx_val} -Xms{xms_val} -classpath {app_root}/*:{additional_cp} com.thimbleware.jmemcached.Main --memory={memory_val} --port={port}")
which expands to
    /usr/jdk64/jdk1.7.0_67/bin/java -Xmx256m -Xms128m -classpath /hadoop/yarn/local/usercache/yarn/appcache/application_1428879923172_0003/container_e01_1428879923172_0003_01_000002/app/install/*:/usr/lib/hadoop/lib/* com.thimbleware.jmemcached.Main --memory=200M --port=37539
The port and memory values are based on the input provided, and the classpath includes the YARN container location of the active container where the tarball was expanded.
So, without Slider/YARN, one should be able to execute the command above and see memcached up and running. It might be a better option to try the command first to ensure that you have the right tarballs and environment to run memcached.
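To try the command outside Slider, you can build the same string from your own values. The sketch below uses Python's built-in str.format as a stand-in; the format helper from resource_management resolves names differently and is not reproduced here. The app_root directory and the port are hypothetical local values; the other values come from the appConfig.json in this guide.

```python
# Same template as the command string used in memcached.py.
COMMAND_TEMPLATE = (
    "{java64_home}/bin/java -Xmx{xmx_val} -Xms{xms_val} "
    "-classpath {app_root}/*:{additional_cp} "
    "com.thimbleware.jmemcached.Main --memory={memory_val} --port={port}"
)


def build_command(params):
    """Fill the command template from a plain dictionary of parameters."""
    return COMMAND_TEMPLATE.format(**params)


LOCAL_PARAMS = {
    "java64_home": "/usr/jdk64/jdk1.7.0_67",
    "xmx_val": "256m",
    "xms_val": "128m",
    "app_root": "/tmp/jmemcached",  # hypothetical: where you unpacked the tarball
    "additional_cp": "/usr/lib/hadoop/lib/*",
    "memory_val": "200M",
    "port": 11211,  # hypothetical fixed port for a local test
}

if __name__ == "__main__":
    print(build_command(LOCAL_PARAMS))
```

Copy the printed command into a shell on a host that has Java and the jars in place to confirm that jmemcached starts before packaging it for Slider.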
Add on package
You can also deploy the application with add on packages. Add on packages are extension libraries, configurations, or scripts that the master application can use. For example, HBase has Phoenix providing SQL access to it and Ranger providing authorization for it. With Slider, you can deploy HBase as the master application, and Phoenix and/or Ranger as add on packages to it.
To do that, you need to package your add on packages in a similar way to the master package. For example, in order to deploy Phoenix with HBase, you need to create an add on package as below:
    unzip -l "$@" Phoenix.zip
    Archive:  Archive.zip
      Length     Date   Time    Name
     --------    ----   ----    ----
         2143  04-19-15 12:03   metainfo.xml
            0  11-19-14 15:17   package/
            0  04-19-15 11:41   package/files/
         1908  12-04-14 17:07   package/files/end2endTest.py
          840  12-04-14 17:07   package/files/hadoop-metrics2-hbase.properties
         2271  12-04-14 17:07   package/files/hadoop-metrics2-phoenix.properties
         1136  12-04-14 17:07   package/files/hbase-site.xml
         2342  12-04-14 17:07   package/files/log4j.properties
         3762  12-04-14 17:07   package/files/performance.py
     38015961  12-04-14 17:07   package/files/phoenix-4.2.2-client.jar
      4039324  12-04-14 17:07   package/files/phoenix-4.2.2-server.jar
         3171  12-04-14 17:07   package/files/phoenix_utils.py
         1669  12-04-14 17:07   package/files/psql.py
         1820  12-04-14 17:07   package/files/readme.txt
         2314  12-04-14 17:07   package/files/sqlline.py
            0  04-19-15 21:57   package/scripts/
         2287  04-19-15 21:57   package/scripts/addon_hbase_master.py
         1659  04-19-15 11:45   package/scripts/addon_hbase_regionserver.py
         1079  04-18-15 17:36   package/scripts/params.py
            0  04-17-15 18:42   package/templates/
     --------                   -------
     42106790                   21 files
Please note that, unlike the master package, the add on package has neither appConfig.json nor resources.json. appConfig.json is not supported for add on packages for now; instead, you can define all the variables you need in the master package's appConfig.json. An add on package cannot add any new components to the master application, so resources.json is not needed either.
What remains is metainfo.xml, in which you cannot add any new components. However, you can make a component entry in the add on package's metainfo.xml apply to all components in the master package's metainfo.xml by specifying 'ALL' as the component name.
    <metainfo>
      <schemaVersion>2.0</schemaVersion>
      <applicationPackage>
        <name>PHOENIX</name>
        <comment>
          Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data.
        </comment>
        <version>2.0</version>
        <type>ADDON-PACKAGE</type>
        <minHadoopVersion>1.0</minHadoopVersion>
        <components>
          <component>
            <name>HBASE_REGIONSERVER</name>
            <commandScript>
              <script>scripts/addon_hbase_regionserver.py</script>
              <scriptType>PYTHON</scriptType>
              <timeout>600</timeout>
            </commandScript>
          </component>
          <component>
            <name>HBASE_MASTER</name>
            <commandScript>
              <script>scripts/addon_hbase_master.py</script>
              <scriptType>PYTHON</scriptType>
              <timeout>600</timeout>
            </commandScript>
          </component>
        </components>
        <osSpecifics>
          <osSpecific>
            <osType>any</osType>
            <packages>
              <package>
                <type>tarball</type>
                <name>files/phoenix-4.2.2-server.jar</name>
              </package>
            </packages>
          </osSpecific>
        </osSpecifics>
      </applicationPackage>
    </metainfo>
Please note the type of the application package is specified as 'ADDON-PACKAGE'.
The add on package is expected to implement INSTALL for each component; it does not need to implement the other commands defined for the master application. Note that you can still use Slider's extensive library to implement the commands. More details can be found here.
Below is an example of the python script for HbaseMaster:
    import sys
    import os
    from shutil import copyfile
    from resource_management import *

    class HbaseMaster(Script):
        def install(self, env):
            config = Script.get_config()
            src = config['commandParams']['addonPackageRoot'] + "/package/files/phoenix-4.2.2-client.jar"
            dst = config['configurations']['global']['app_root'] + "/lib/phoenix-4.2.2-client.jar"
            copyfile(src, dst)

            filestocopytobin = ["end2endTest.py","hadoop-metrics2-hbase.properties","hadoop-metrics2-phoenix.properties","hbase-site.xml","log4j.properties","performance.py","phoenix_utils.py","psql.py","readme.txt","sqlline.py"]
            for file in filestocopytobin:
                src = config['commandParams']['addonPackageRoot'] + "/package/files/" + file
                dst = config['configurations']['global']['app_root'] + "/bin/" + file
                copyfile(src, dst)

            bin_file = config['configurations']['global']['app_root'] + "/bin/" + "sqlline.py"
            # Read and execute for everyone; the 0o555 form is valid in Python 2.6+ and 3
            os.chmod(bin_file, 0o555)

    if __name__ == "__main__":
        HbaseMaster().execute()
        pass
Below is an example of the python script for HbaseRegionServer:
    import sys
    from shutil import copyfile
    from resource_management import *

    class HbaseRegionserver(Script):
        def install(self, env):
            config = Script.get_config()
            src = config['commandParams']['addonPackageRoot'] + "/package/files/phoenix-4.2.2-server.jar"
            dst = config['configurations']['global']['app_root'] + "/lib/phoenix-4.2.2-server.jar"
            copyfile(src, dst)

        def configure(self, env):
            import params
            env.set_params(params)

        def start(self, env):
            import params
            env.set_params(params)
            self.configure(env) # for security

        def stop(self, env):
            import params
            env.set_params(params)

        def status(self, env):
            import status_params
            env.set_params(status_params)

    if __name__ == "__main__":
        HbaseRegionserver().execute()
        pass
Similarly, you can supply a params.py that provides all the configuration needed to run the install command:
    from resource_management import *

    # server configurations
    config = Script.get_config()

    hbase_root = config['configurations']['global']['app_root']
    jar_location = config['commandParams']['addonPackageRoot'] + "/package/files/phoenix-4.2.2-server.jar"
    print ('jar_location' + jar_location)
When submitting an application with an add on package through Slider, use the '--addon' option to specify the add on package name and the path to its zipped package:
slider create [application_name] --template [path to appConfig.json] --resources [path to resources.json] --addon PHOENIX [path to Phoenix.zip]
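If you script your deployments, the command line above can be assembled as an argument list. The sketch below uses only the options shown in this guide. The function name and the example application name and paths are hypothetical, and the sketch only builds the arguments without running anything.

```python
def slider_create_args(app_name, template, resources, addon=None):
    """Build the argument list for "slider create".

    addon, if given, is a (package_name, path_to_zip) pair that is passed
    through the --addon option described above.
    """
    args = ["slider", "create", app_name,
            "--template", template,
            "--resources", resources]
    if addon is not None:
        addon_name, addon_zip = addon
        args += ["--addon", addon_name, addon_zip]
    return args


if __name__ == "__main__":
    # Hypothetical application name and file locations.
    print(" ".join(slider_create_args(
        "hbase1", "appConfig.json", "resources.json",
        addon=("PHOENIX", "Phoenix.zip"))))
```

A list in this form can be handed to subprocess.call on a host where the slider client is installed.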