~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

  ---
  Native Libraries Guide
  ---
  ---
  ${maven.build.timestamp}

Native Libraries Guide

%{toc|section=1|fromDepth=0}

* Overview

   This guide describes the native hadoop library and includes a small
   discussion about native shared libraries.

   Note: Depending on your environment, the term "native libraries" could
   refer to all *.so's you need to compile; and, the term "native
   compression" could refer to all *.so's you need to compile that are
   specifically related to compression. Currently, however, this document
   only addresses the native hadoop library (<<<libhadoop.so>>>).

* Native Hadoop Library

   Hadoop has native implementations of certain components for performance
   reasons and for non-availability of Java implementations. These
   components are available in a single, dynamically-linked native library
   called the native hadoop library. On the *nix platforms the library is
   named <<<libhadoop.so>>>.

* Usage

   It is fairly easy to use the native hadoop library:

    [[1]] Review the components.

    [[2]] Review the supported platforms.

    [[3]] Either download a hadoop release, which will include a pre-built
       version of the native hadoop library, or build your own version of
       the native hadoop library. Whether you download or build, the name
       for the library is the same: libhadoop.so

    [[4]] Install the compression codec development packages (>zlib-1.2,
       >gzip-1.2):
          + If you download the library, install one or more development
            packages - whichever compression codecs you want to use with
            your deployment.
          + If you build the library, it is mandatory to install both
            development packages.

    [[5]] Check the runtime log files.

* Components

   The native hadoop library includes two components, the zlib and gzip
   compression codecs:

     * zlib

     * gzip

   The native hadoop library is imperative for gzip to work.

* Supported Platforms

   The native hadoop library is supported on *nix platforms only. The
   library does not to work with Cygwin or the Mac OS X platform.

   The native hadoop library is mainly used on the GNU/Linus platform and
   has been tested on these distributions:

     * RHEL4/Fedora

     * Ubuntu

     * Gentoo

   On all the above distributions a 32/64 bit native hadoop library will
   work with a respective 32/64 bit jvm.

* Download

   The pre-built 32-bit i386-Linux native hadoop library is available as
   part of the hadoop distribution and is located in the <<<lib/native>>>
   directory. You can download the hadoop distribution from Hadoop Common
   Releases.

   Be sure to install the zlib and/or gzip development packages -
   whichever compression codecs you want to use with your deployment.

* Build

   The native hadoop library is written in ANSI C and is built using the
   GNU autotools-chain (autoconf, autoheader, automake, autoscan,
   libtool). This means it should be straight-forward to build the library
   on any platform with a standards-compliant C compiler and the GNU
   autotools-chain (see the supported platforms).

   The packages you need to install on the target platform are:

     * C compiler (e.g. GNU C Compiler)

     * GNU Autools Chain: autoconf, automake, libtool

     * zlib-development package (stable version >= 1.2.0)

   Once you installed the prerequisite packages use the standard hadoop
   build.xml file and pass along the compile.native flag (set to true) to
   build the native hadoop library:

----
   $ ant -Dcompile.native=true <target>
----

   You should see the newly-built library in:

----
   $ build/native/<platform>/lib
----

   where <platform> is a combination of the system-properties:
   ${os.name}-${os.arch}-${sun.arch.data.model} (for example,
   Linux-i386-32).

   Please note the following:

     * It is mandatory to install both the zlib and gzip development
       packages on the target platform in order to build the native hadoop
       library; however, for deployment it is sufficient to install just
       one package if you wish to use only one codec.

     * It is necessary to have the correct 32/64 libraries for zlib,
       depending on the 32/64 bit jvm for the target platform, in order to
       build and deploy the native hadoop library.

* Runtime

   The bin/hadoop script ensures that the native hadoop library is on the
   library path via the system property:
   <<<-Djava.library.path=<path> >>>

   During runtime, check the hadoop log files for your MapReduce tasks.

     * If everything is all right, then:
       <<<DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...>>>
       <<<INFO util.NativeCodeLoader - Loaded the native-hadoop library>>>

     * If something goes wrong, then:
       <<<INFO util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable>>>

* Native Shared Libraries

   You can load any native shared library using DistributedCache for
   distributing and symlinking the library files.

   This example shows you how to distribute a shared library, mylib.so,
   and load it from a MapReduce task.

    [[1]] First copy the library to the HDFS:
       <<<bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1>>>

    [[2]] The job launching program should contain the following:
       <<<DistributedCache.createSymlink(conf);>>>
       <<<DistributedCache.addCacheFile("hdfs://host:port/libraries/mylib.so. 1#mylib.so", conf);>>>

    [[3]] The MapReduce task can contain:
       <<<System.loadLibrary("mylib.so");>>>

   Note: If you downloaded or built the native hadoop library, you don’t
   need to use DistibutedCache to make the library available to your
   MapReduce tasks.