Building Apache Slider¶
This page describes how to build Apache Slider from source, along with compatible versions of the projects it depends on.
Before you begin¶
Networking¶
The network on the development system must be functional, with hostname lookup of the local host working. Tests will fail without this. For maven builds to work, remote network access is often a pre-requisite.
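A quick way to confirm that the local hostname resolves before starting a build is sketched below. The can_resolve helper is a hypothetical name, not part of Slider; it assumes getent is available (typical on Linux) and falls back to ping on systems without it.

```shell
# Succeeds if the given host name resolves to an address.
# Assumption: getent exists on Linux; ping is used as a fallback elsewhere.
can_resolve() {
  if command -v getent >/dev/null 2>&1; then
    getent hosts "$1" >/dev/null 2>&1
  else
    ping -c 1 "$1" >/dev/null 2>&1
  fi
}

# Example: check the name reported by the hostname command.
if can_resolve "$(hostname 2>/dev/null)"; then
  echo "local hostname resolves"
else
  echo "local hostname does NOT resolve; expect test failures"
fi
```

If the second message appears, fixing the host's name resolution (for example its hosts file entry) before running the tests is likely to save time.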
Java¶
Slider is built on Java 7 or later.
Python¶
Slider requires Python 2.7 or later.
Maven¶
You will need Apache Maven 3.0 or later, set up with enough memory:
export MAVEN_OPTS="-Xms256m -Xmx512m -Djava.awt.headless=true"
Important: As of December 2014, Maven 3.1 is not supported due to version issues.
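The version constraints above can be checked with a small helper such as the sketch below. The function name maven_version_status is hypothetical. Versions later than 3.1 are reported as ok only because the text above does not mention them; that is an assumption, not a statement of support.

```shell
# Classify a Maven version string against the constraints stated above.
# Prints "too-old" before 3.0, "unsupported" for 3.1.x, "unknown" for an
# empty string, and "ok" otherwise (assumption: nothing else is excluded).
maven_version_status() {
  case "$1" in
    "")            echo "unknown" ;;
    [0-2]|[0-2].*) echo "too-old" ;;
    3.1|3.1.*)     echo "unsupported" ;;
    *)             echo "ok" ;;
  esac
}

# Example: classify the Maven found on the PATH, if any.
if command -v mvn >/dev/null 2>&1; then
  # "mvn -v" prints a first line such as "Apache Maven 3.0.5 (...)"
  installed=$(mvn -v 2>/dev/null | awk '/^Apache Maven/ {print $3; exit}')
  echo "Maven ${installed:-?}: $(maven_version_status "$installed")"
fi
```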
Protoc¶
You need a copy of the protoc compiler for protobuf compilation.
- OS/X:
brew install protobuf
- Others: consult Building Hadoop documentation.
The version of protoc installed must be the same as that used by Hadoop itself. This is absolutely critical to prevent JAR version problems.
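One way to catch a mismatch early is to compare the output of protoc --version with the version Hadoop expects. The sketch below uses a hypothetical helper, protoc_version_matches, and a hypothetical variable, EXPECTED_PROTOC_VERSION. The default of 2.5.0 is an assumption for the Hadoop 2.x line; confirm the real value in the Hadoop build documentation.

```shell
# Succeeds if the version reported by protoc equals the expected one.
# $1: output of "protoc --version", for example "libprotoc 2.5.0"
# $2: the version Hadoop was built with
protoc_version_matches() {
  # strip the leading "libprotoc " label and compare what remains
  [ "${1#libprotoc }" = "$2" ]
}

# Assumption: 2.5.0 is only a placeholder default; set the real value.
EXPECTED_PROTOC_VERSION="${EXPECTED_PROTOC_VERSION:-2.5.0}"

# Example: check the protoc found on the PATH, if any.
if command -v protoc >/dev/null 2>&1; then
  if protoc_version_matches "$(protoc --version)" "$EXPECTED_PROTOC_VERSION"; then
    echo "protoc version matches $EXPECTED_PROTOC_VERSION"
  else
    echo "protoc version mismatch: wanted $EXPECTED_PROTOC_VERSION"
  fi
fi
```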
Obtain the source code¶
Download the source tarball for the release to be built. Alternatively, you can clone the Slider git repo.
git clone https://git-wip-us.apache.org/repos/asf/incubator-slider.git -b develop
Build, install and run unit tests¶
mvn clean install
The Slider code base includes unit tests and functional tests. By default the functional tests are not run, as they are designed to run against live Hadoop clusters and require some manual setup. You can run them as described in the functional test documentation.
Create Slider Package¶
mvn clean site:site site:stage package -DskipTests
At this point you are ready to use the Slider toolset. The build instructions below are optional; they have proven useful when debugging deep into the Hadoop code base.
Building a compatible version of Apache Hadoop¶
Slider is built against Apache Hadoop; you can download and install a copy from the Apache Hadoop web site.
During development, it's convenient (but not mandatory) to have a local version of Hadoop, so that we can find and fix bugs or add features in Hadoop as well as in Slider.
To build and install locally, check out branch-2 from Apache svn/GitHub and create a local branch off it:
git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
git fetch --tags origin
# check out the remote branch, then create a local branch from it
git checkout origin/branch-2
git checkout -b branch-2
export HADOOP_VERSION=2.6.0-SNAPSHOT
To build against a release instead, check out that specific release and create a branch off it:
git checkout release-2.6.0 --
git checkout -b release-2.6.0
To build and install it locally, skipping the tests:
mvn clean install -DskipTests
To make a tarball for use in test runs:
# On OS X
mvn clean install package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip=true
# On Linux
mvn clean package -Pdist -Pnative -Dtar -DskipTests -Dmaven.javadoc.skip=true
Then expand the tarball:
pushd hadoop-dist/target/
tar -xvzf hadoop-$HADOOP_VERSION.tar.gz
popd
This creates an expanded version of Hadoop. You can now actually run Hadoop from this directory. Do note that unless you have the native code built for your target platform, Hadoop will be slower.
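As a quick check that the expanded distribution is usable, the sketch below runs its hadoop version command from the Hadoop source tree. The helper name hadoop_dist_dir is hypothetical, and the directory layout (the tarball expanding to hadoop-$HADOOP_VERSION) is an assumption based on the tarball name used above.

```shell
# Path of the expanded Hadoop distribution, relative to the Hadoop source tree.
# $1: the Hadoop version, for example the value of HADOOP_VERSION
# Assumption: the tarball expands into a directory named hadoop-<version>.
hadoop_dist_dir() {
  echo "hadoop-dist/target/hadoop-$1"
}

# Example: report the version of the freshly built Hadoop, if it exists.
dist_dir=$(hadoop_dist_dir "${HADOOP_VERSION:-2.6.0-SNAPSHOT}")
if [ -x "$dist_dir/bin/hadoop" ]; then
  "$dist_dir/bin/hadoop" version
else
  echo "no expanded Hadoop found at $dist_dir"
fi
```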
Building a compatible HBase version¶
If you need to build a version of HBase rather than use a released version, here are the instructions (for the hbase-0.98 release branch).
Check out the HBase trunk branch from Apache svn/GitHub:
git clone git://git.apache.org/hbase.git
cd hbase
git remote rename origin apache
git fetch --tags apache
then
git checkout apache/0.98
or
git checkout tags/0.98.4
If you have already been building versions of HBase, remove the existing set of artifacts for safety:
rm -rf ~/.m2/repository/org/apache/hbase/
The maven command for building hbase artifacts against this hadoop version is
mvn clean install assembly:single -DskipTests -Dmaven.javadoc.skip=true
To use a different version of Hadoop from that defined in the hadoop-two.version property of /pom.xml:
mvn clean install assembly:single -DskipTests -Dmaven.javadoc.skip=true -Dhadoop-two.version=$HADOOP_VERSION
This will create an hbase tar.gz file in the directory hbase-assembly/target/ in the hbase source tree.
export HBASE_VERSION=0.98.4
pushd hbase-assembly/target
gunzip hbase-$HBASE_VERSION-bin.tar.gz
tar -xvf hbase-$HBASE_VERSION-bin.tar
gzip hbase-$HBASE_VERSION-bin.tar
popd
This will create an untarred directory containing hbase. Both the .tar.gz file and the untarred directory are needed for testing. Most tests work directly with the untarred directory, as this saves the time spent uploading, downloading and then expanding the file.
If you set HBASE_VERSION to something else, you can pick up that version instead, provided that slider is kept in sync.
For more information (including recommended Maven memory configuration options), see HBase building
For building just the JAR files:
mvn clean install -DskipTests -Dhadoop.profile=2.0 -Dhadoop-two.version=$HADOOP_VERSION
Tip: you can force set a version in Maven by having it update all the POMs:
mvn versions:set -DnewVersion=0.98.1-SNAPSHOT
Building Accumulo¶
Clone Accumulo from Apache:
git clone http://git-wip-us.apache.org/repos/asf/accumulo.git
Check out the 1.6.1 tag
git checkout 1.6.1
In the accumulo project directory, build it
mvn clean install -Passemble -DskipTests -Dmaven.javadoc.skip=true \
  -Dhadoop.profile=2
The default Hadoop version for accumulo-1.6.1 is hadoop 2.4.0; to build against a different version use the command
mvn clean install -Passemble -DskipTests -Dmaven.javadoc.skip=true \
  -Dhadoop.profile=2 -Dhadoop.version=$HADOOP_VERSION
This creates an accumulo tar.gz file in assemble/target/. Extract this to create an expanded directory
accumulo/assemble/target/accumulo-1.6.1-bin.tar.gz
This can be done with the command sequence
export ACCUMULO_VERSION=1.6.1
pushd assemble/target/
gunzip -f accumulo-$ACCUMULO_VERSION-bin.tar.gz
# gunzip removes the .gz suffix, so untar the .tar file
tar -xvf accumulo-$ACCUMULO_VERSION-bin.tar
popd
Note that the final location of the accumulo files is needed for the configuration. It may be directly under target/ or it may be in a subdirectory, with a path such as target/accumulo-$ACCUMULO_VERSION-dev/accumulo-$ACCUMULO_VERSION/
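Because the expanded directory can end up at more than one depth, a search is a convenient way to find it. The sketch below uses a hypothetical helper, find_accumulo_home; the search depth of 3 is an assumption chosen to cover the two layouts described above.

```shell
# Print the first directory named accumulo-<version> found under a directory.
# $1: the target directory to search, $2: the Accumulo version
find_accumulo_home() {
  find "$1" -maxdepth 3 -type d -name "accumulo-$2" 2>/dev/null | head -n 1
}

# Example: search assemble/target for the version set in ACCUMULO_VERSION.
accumulo_home=$(find_accumulo_home assemble/target "${ACCUMULO_VERSION:-1.6.1}")
echo "Accumulo files: ${accumulo_home:-not found}"
```

Run this from the accumulo project directory after extracting the tarball.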
Building the Slider RPM¶
It is possible to build an RPM file for slider. This is an architecture-independent RPM with the artifacts and layout of the slider .tar file, but hosted under /usr/lib/slider.
The configuration directory is /usr/lib/slider/conf; binaries are found under /usr/lib/slider/bin
The RPM can only be built on a Linux system with the rpm command installed; it is an optional artifact that must be explicitly created by enabling the maven rpm profile:
mvn clean install -DskipTests -Prpm
This creates an RPM under slider-assembly/target/rpm/slider/RPMS/noarch/; the RPM name is built from the project version, with timestamp generation during -SNAPSHOT builds to ensure that RPM updates succeed.
The RPM can be manually examined by opening it (it is really an archive file) and verifying its contents.
For a release, the installation of the RPM itself must be verified:
- Build the RPM:
mvn install -Prpm
- scp the RPM to the target machine. If you build it on the target machine this step can be omitted.
- Install the RPM:
rpm -Uvh slider*.rpm
- Verify that the installation has succeeded by executing the slider version command.
- Modify the configuration files in /usr/lib/slider/conf to bind slider to the target cluster.
- To uninstall:
rpm -e slider
Example¶
# rpm -Uvh slider-0.31.0-incubating_SNAPSHOT20140709153353.noarch.rpm
Preparing... ########################################### [100%]
1:slider ########################################### [100%]
# ls -l /usr/lib/slider
total 16
drwxr-xr-x 3 root root 4096 Jul 9 19:57 agent
drwxr-xr-x 2 mapred hadoop 4096 Jul 9 19:57 bin
drwxr-xr-x 2 mapred hadoop 4096 Jul 9 19:57 conf
drw-r--r-- 2 mapred hadoop 4096 Jul 9 19:57 lib
# ls -l /usr/lib/slider/bin
total 12
-rwxr-xr-x 1 mapred hadoop 2345 Jul 9 2014 slider
-rwxr-xr-x 1 mapred hadoop 5096 Jul 9 2014 slider.py
# /usr/lib/slider/bin/slider version
slider_home = "/usr/lib/slider"
slider_jvm_opts = "-Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -Xmx256m -Djava.confdir=/usr/lib/slider/conf"
classpath = "/usr/lib/slider/lib/*:/usr/lib/slider/conf:"
command is java -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -Xmx256m -Djava.confdir=/usr/lib/slider/conf --classpath "/usr/lib/slider/lib/*:/usr/lib/slider/conf:" org.apache.slider.Slider version
2014-05-16 19:34:34,730 [main] INFO client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2014-05-16 19:34:35,300 [main] INFO client.SliderClient - Slider Core-0.31.0-incubating-SNAPSHOT Built against commit# d44d4c1bf0
2014-05-16 19:34:35,304 [main] INFO client.SliderClient - Compiled against Hadoop 2.4.0
2014-05-16 19:34:35,310 [main] INFO client.SliderClient - Hadoop runtime version branch-2.4.0 with source checksum 375b2832a6641759c6eaf6e3e998147 and build date 2014-03-31T08:31Z
2014-05-16 19:34:35,314 [main] INFO util.ExitUtil - Exiting with status 0
This output shows that slider is not configured yet: the ResourceManager address of /0.0.0.0:8032 is invalid.
Uninstallation can also be tested
# rpm -e slider
# /usr/lib/slider/bin/slider version
-bash: /usr/lib/slider/bin/slider: No such file or directory
RPM Configuration files¶
The configuration directory of slider is chosen when the RPM is built. It is fixed in slider-assembly/pom.xml to src/conf-hdp
<src.confdir>src/conf-hdp</src.confdir>
This configuration sets the yarn.application.classpath value to that required by HDP installations. To target other installations, alternate maven profiles will need to be defined.
Testing¶
Debugging a failing test¶
- Locate the directory target/$TESTNAME where TESTNAME is the name of the test case and/or test method. This directory contains the Mini YARN Cluster logs. For example, TestLiveRegionService stores its data under target/TestLiveRegionService
- Look under that directory for -logdir directories, then an application and container containing logs. There may be more than one node being simulated; every node manager creates its own logdir.
- Look for the out.txt and err.txt files for stdout and stderr log output.
- Slider uses SLF4J to log to out.txt; remotely executed processes may use either stream for logging.
Example:
target/TestLiveRegionService/TestLiveRegionService-logDir-nm-1_0/application_1376095770244_0001/container_1376095770244_0001_01_000001/out.txt
- The actual test log from JUnit itself goes to the console and into target/surefire/; this shows the events happening in the YARN services as well as (if configured) HDFS and Zookeeper. It is noisy: everything after the teardown message happens during cluster teardown, after the test itself has been completed. Exceptions and messages logged during teardown can generally be ignored.
This is all a bit complicated. Debugging is simpler if a single test is run at a time, which is straightforward:
mvn clean test -Dtest=TestLiveRegionService
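After a single-test run, the container logs described above can be located with a search rather than by walking the directories by hand. This is a minimal sketch; list_test_logs is a hypothetical helper name, and it relies only on the out.txt and err.txt file names given above.

```shell
# List the out.txt and err.txt files produced by one test case.
# $1: the build output directory (normally "target"), $2: the test name
list_test_logs() {
  find "$1/$2" -type f \( -name out.txt -o -name err.txt \) 2>/dev/null | sort
}

# Example: show the container logs left behind by TestLiveRegionService.
list_test_logs target TestLiveRegionService
```

The output is one path per line, so it can be passed on to a pager or to grep when looking for a specific exception.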
Building the JAR file¶
You can create the JAR file and set up its directories with
mvn package -DskipTests
Development Notes¶
Git branch model¶
The git branch model used is Git Flow.
This is a common workflow model for Git, and is built into Atlassian SourceTree.
The command line git-flow tool is easy to install:
brew install git-flow
or
apt-get install git-flow
You should then work on all significant features in their own branch and merge them back in when they are ready.
# until we get a public JIRA we're just using an in-house one. sorry
git flow feature start SLIDER-8192
# finishes merges back in to develop/
git flow feature finish SLIDER-8192
# release branch
git flow release start 0.4.0
git flow release finish 0.4.0
Groovy¶
Slider uses Groovy 2.x as its language for writing tests, for better assertions and easier handling of lists and closures. The first prototype used Groovy on the production source; this was dropped in favor of a Java-only production codebase.
Maven utils¶
Here are some handy aliases to make maven easier
alias mi="mvn install -DskipTests"
alias mvi="mvn install -DskipTests"
alias mci="mvn clean install -DskipTests"
alias mvt="mvn test"
alias mvct="mvn clean test"
alias mvp="mvn package -DskipTests"
alias mvcp="mvn clean package -DskipTests"
alias mvnsite="mvn site:site -Dmaven.javadoc.skip=true"
alias mvs="mvn site:site -Dmaven.javadoc.skip=true -DskipTests"
alias mvndep="mvn dependency:tree -Dverbose"