This guide describes the native hadoop library and includes a small discussion about native shared libraries.
Note: Depending on your environment, the term “native libraries” could refer to all *.so’s you need to compile; and, the term “native compression” could refer to all *.so’s you need to compile that are specifically related to compression. Currently, however, this document only addresses the native hadoop library (libhadoop.so). The document for libhdfs library (libhdfs.so) is here.
Hadoop has native implementations of certain components for performance reasons and for non-availability of Java implementations. These components are available in a single, dynamically-linked native library called the native hadoop library. On the *nix platforms the library is named libhadoop.so.
It is fairly easy to use the native hadoop library:
The native hadoop library includes various components:
The native hadoop library is supported on *nix platforms only. The library does not to work with Cygwin or the Mac OS X platform.
The native hadoop library is mainly used on the GNU/Linus platform and has been tested on these distributions:
On all the above distributions a 32/64 bit native hadoop library will work with a respective 32/64 bit jvm.
The pre-built 32-bit i386-Linux native hadoop library is available as part of the hadoop distribution and is located in the lib/native directory. You can download the hadoop distribution from Hadoop Common Releases.
Be sure to install the zlib and/or gzip development packages - whichever compression codecs you want to use with your deployment.
The native hadoop library is written in ANSI C and is built using the GNU autotools-chain (autoconf, autoheader, automake, autoscan, libtool). This means it should be straight-forward to build the library on any platform with a standards-compliant C compiler and the GNU autotools-chain (see the supported platforms).
The packages you need to install on the target platform are:
Once you installed the prerequisite packages use the standard hadoop pom.xml file and pass along the native flag to build the native hadoop library:
$ mvn package -Pdist,native -DskipTests -Dtar
You should see the newly-built library in:
$ hadoop-dist/target/hadoop-3.0.2/lib/native
Please note the following:
The bin/hadoop script ensures that the native hadoop library is on the library path via the system property: -Djava.library.path=<path>
During runtime, check the hadoop log files for your MapReduce tasks.
NativeLibraryChecker is a tool to check whether native libraries are loaded correctly. You can launch NativeLibraryChecker as follows:
$ hadoop checknative -a 14/12/06 01:30:45 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version 14/12/06 01:30:45 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library Native library checking: hadoop: true /home/ozawa/hadoop/lib/native/libhadoop.so.1.0.0 zlib: true /lib/x86_64-linux-gnu/libz.so.1 snappy: true /usr/lib/libsnappy.so.1 zstd: true /usr/lib/libzstd.so.1 lz4: true revision:99 bzip2: false
You can load any native shared library using DistributedCache for distributing and symlinking the library files.
This example shows you how to distribute a shared library, mylib.so, and load it from a MapReduce task.
Note: If you downloaded or built the native hadoop library, you don’t need to use DistibutedCache to make the library available to your MapReduce tasks.