~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

  ---
  Native Libraries Guide
  ---
  ---
  ${maven.build.timestamp}

Native Libraries Guide

%{toc|section=1|fromDepth=0}

* Overview

   This guide describes the native hadoop library and includes a small
   discussion about native shared libraries.

   Note: Depending on your environment, the term "native libraries" could
   refer to all *.so's you need to compile, and the term "native
   compression" could refer to all *.so's you need to compile that are
   specifically related to compression. Currently, however, this document
   only addresses the native hadoop library (<<>>).

* Native Hadoop Library

   Hadoop has native implementations of certain components, either for
   performance reasons or because no Java implementation is available.
   These components are available in a single, dynamically-linked native
   library called the native hadoop library. On the *nix platforms the
   library is named <<>>.

* Usage

   It is fairly easy to use the native hadoop library:

   [[1]] Review the components.

   [[2]] Review the supported platforms.

   [[3]] Either download a hadoop release, which will include a pre-built
   version of the native hadoop library, or build your own version of the
   native hadoop library.
   Whether you download or build, the name for the library is the same:
   libhadoop.so

   [[4]] Install the compression codec development packages (>zlib-1.2, >gzip-1.2):

      + If you download the library, install one or more development
      packages, depending on which compression codecs you want to use
      with your deployment.

      + If you build the library, it is mandatory to install both
      development packages.

   [[5]] Check the runtime log files.

* Components

   The native hadoop library includes two components, the zlib and gzip
   compression codecs:

   * zlib

   * gzip

   The native hadoop library is required for gzip to work.

* Supported Platforms

   The native hadoop library is supported on *nix platforms only. The
   library does not work with Cygwin or the Mac OS X platform.

   The native hadoop library is mainly used on the GNU/Linux platform and
   has been tested on these distributions:

   * RHEL4/Fedora

   * Ubuntu

   * Gentoo

   On all the above distributions a 32/64 bit native hadoop library will
   work with the corresponding 32/64 bit JVM.

* Download

   The pre-built 32-bit i386-Linux native hadoop library is available as
   part of the hadoop distribution and is located in the <<>> directory.
   You can download the hadoop distribution from Hadoop Common Releases.

   Be sure to install the zlib and/or gzip development packages, depending
   on which compression codecs you want to use with your deployment.

* Build

   The native hadoop library is written in ANSI C and is built using the
   GNU autotools-chain (autoconf, autoheader, automake, autoscan, libtool).
   This means it should be straightforward to build the library on any
   platform with a standards-compliant C compiler and the GNU
   autotools-chain (see the supported platforms).

   The packages you need to install on the target platform are:

   * C compiler (e.g.
   GNU C Compiler)

   * GNU Autotools Chain: autoconf, automake, libtool

   * zlib-development package (stable version >= 1.2.0)

   Once you have installed the prerequisite packages, use the standard
   hadoop build.xml file and pass along the compile.native flag (set to
   true) to build the native hadoop library:

----
   $ ant -Dcompile.native=true
----

   You should see the newly-built library in:

----
   $ build/native//lib
----

   where the directory between build/native/ and /lib is named by a
   combination of the system properties:
   ${os.name}-${os.arch}-${sun.arch.data.model} (for example,
   Linux-i386-32).

   Please note the following:

   * It is mandatory to install both the zlib and gzip development
   packages on the target platform in order to build the native hadoop
   library; however, for deployment it is sufficient to install just one
   package if you wish to use only one codec.

   * It is necessary to have the correct 32/64 bit libraries for zlib,
   matching the 32/64 bit JVM on the target platform, in order to build
   and deploy the native hadoop library.

* Runtime

   The bin/hadoop script ensures that the native hadoop library is on the
   library path via the system property: <<<-Djava.library.path= >>>

   During runtime, check the hadoop log files for your MapReduce tasks.

   * If everything is all right, then:
   <<>>
   <<>>

   * If something goes wrong, then:
   <<>>

* Native Shared Libraries

   You can load any native shared library using DistributedCache for
   distributing and symlinking the library files.

   This example shows you how to distribute a shared library, mylib.so,
   and load it from a MapReduce task.

   [[1]] First copy the library to HDFS:
   <<>>

   [[2]] The job launching program should contain the following:
   <<>>
   <<>>

   [[3]] The MapReduce task can contain:
   <<>>

   Note: If you downloaded or built the native hadoop library, you don’t
   need to use DistributedCache to make the library available to your
   MapReduce tasks.
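
   As a rough illustration of the Build and Runtime sections, the following
   self-contained Java sketch assembles the platform string from the three
   system properties named in the Build section and lists the places where
   the JVM would look for the native hadoop library on java.library.path.
   The class name NativeLibCheck and its two helper methods are
   hypothetical and are not part of Hadoop. The sketch uses only standard
   JDK calls, does not load the library, and assumes nothing about the
   directory layout beyond what java.library.path contains.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: inspects the JVM settings that
// determine where a native library named "hadoop" would be looked up.
public class NativeLibCheck {

    // Joins the three system properties into the platform string,
    // e.g. ("Linux", "i386", "32") gives "Linux-i386-32".
    public static String platformName(String osName, String osArch, String dataModel) {
        return osName + "-" + osArch + "-" + dataModel;
    }

    // Returns one candidate file per non-empty directory on the library path.
    public static List<File> candidates(String libraryPath, String fileName) {
        List<File> result = new ArrayList<>();
        if (libraryPath == null || libraryPath.isEmpty()) {
            return result;
        }
        for (String dir : libraryPath.split(Pattern.quote(File.pathSeparator))) {
            if (!dir.isEmpty()) {
                result.add(new File(dir, fileName));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // sun.arch.data.model is JVM-specific and may be absent on some JVMs,
        // in which case the last component prints as "null".
        String platform = platformName(
                System.getProperty("os.name"),
                System.getProperty("os.arch"),
                System.getProperty("sun.arch.data.model"));
        System.out.println("platform: " + platform);

        // mapLibraryName turns "hadoop" into the platform's file name
        // (libhadoop.so on Linux).
        String fileName = System.mapLibraryName("hadoop");
        for (File f : candidates(System.getProperty("java.library.path"), fileName)) {
            System.out.println(f + (f.isFile() ? "  [found]" : "  [missing]"));
        }
    }
}
```

   Running the sketch with the same java.library.path value that the
   bin/hadoop script sets shows whether a file with the expected name
   exists in any of those directories. It is only a diagnostic aid; the
   hadoop log files remain the authoritative place to check whether the
   library was loaded.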