Build instructions for Hadoop

----------------------------------------------------------------------------------
Requirements:

* Unix System
* JDK 1.6+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code)
* Zlib devel (if compiling native code)
* openssl devel (if compiling native hadoop-pipes)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
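
Before the first build, it can help to confirm the core tools are on the PATH.
A minimal sketch (it only checks that the commands exist, not their versions):

```shell
#!/bin/sh
# Report which of the required build tools are visible on the PATH.
# Version checks (e.g. that protoc reports 2.5.0) are left to the reader.
for tool in mvn protoc cmake; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done
```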

----------------------------------------------------------------------------------
Maven main modules:

  hadoop                          (Main Hadoop project)
    - hadoop-project              (Parent POM for all Hadoop Maven modules.
                                   All plugin and dependency versions are defined here.)
    - hadoop-project-dist         (Parent POM for modules that generate distributions.)
    - hadoop-annotations          (Generates the Hadoop doclet used to generate the Javadocs.)
    - hadoop-assemblies           (Maven assemblies used by the different modules.)
    - hadoop-common-project       (Hadoop Common)
    - hadoop-hdfs-project         (Hadoop HDFS)
    - hadoop-mapreduce-project    (Hadoop MapReduce)
    - hadoop-tools                (Hadoop tools like Streaming, Distcp, etc.)
    - hadoop-dist                 (Hadoop distribution assembler)

----------------------------------------------------------------------------------
Where to run Maven from?

Maven can be run from any module. The only catch is that, if it is not run from
the trunk (top-level) directory, all modules that are not part of the build must
already be installed in the local Maven cache or available in a Maven repository.

----------------------------------------------------------------------------------
Maven build goals:

 * Clean                     : mvn clean
 * Compile                   : mvn compile [-Pnative]
 * Run tests                 : mvn test [-Pnative]
 * Create JAR                : mvn package
 * Run findbugs              : mvn compile findbugs:findbugs
 * Run checkstyle            : mvn compile checkstyle:checkstyle
 * Install JAR in M2 cache   : mvn install
 * Deploy JAR to Maven repo  : mvn deploy
 * Run clover                : mvn test -Pclover [-DcloverLicenseLocation=${user.name}/.clover.license]
 * Run Rat                   : mvn apache-rat:check
 * Build javadocs            : mvn javadoc:javadoc
 * Build distribution        : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar]
 * Change Hadoop version     : mvn versions:set -DnewVersion=NEWVERSION

 Build options:

  * Use -Pnative to compile/bundle native code
  * Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist)
  * Use -Psrc to create a project source TAR.GZ
  * Use -Dtar to create a TAR with the distribution (using -Pdist)
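
 The options above combine with the goals. For instance, the following
 (echoed here rather than executed, since it needs a full source tree)
 would produce a tarred binary distribution with native code and docs:

```shell
# Illustrative combination of the build options above.
echo mvn package -Pdist,native,docs -DskipTests -Dtar
```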

 Snappy build options:

   Snappy is a compression library that can be utilized by the native code.
   It is currently an optional component, meaning that Hadoop can be built with
   or without this dependency.

  * Use -Drequire.snappy to fail the build if libsnappy.so is not found.
    If this option is not specified and the snappy library is missing,
    we silently build a version of libhadoop.so that cannot make use of snappy.
    This option is recommended if you plan on making use of snappy and want
    to get more repeatable builds.

  * Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy
    header files and library files. You do not need this option if you have
    installed snappy using a package manager.

  * Use -Dsnappy.lib to specify a nonstandard location for the libsnappy library
    files. Similarly to snappy.prefix, you do not need this option if you have
    installed snappy using a package manager.

  * Use -Dbundle.snappy to copy the contents of the snappy.lib directory into
    the final tar file. This option requires that -Dsnappy.lib is also given,
    and it ignores the -Dsnappy.prefix option.
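
 As a sketch of how these flags combine (the /opt/snappy path is an
 assumption; the command is echoed rather than executed):

```shell
# Illustrative: a native tar distribution that fails fast if snappy is
# missing and bundles libsnappy from an assumed custom install location.
SNAPPY_LIB=/opt/snappy/lib
echo mvn package -Pdist,native -DskipTests -Dtar \
  -Drequire.snappy -Dsnappy.lib="$SNAPPY_LIB" -Dbundle.snappy
```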

 Tests options:

  * Use -DskipTests to skip tests when running the following Maven goals:
    'package', 'install', 'deploy' or 'verify'
  * -Dtest=<TESTCLASSNAME>,<TESTCLASSNAME#METHODNAME>,....
  * -Dtest.exclude=<TESTCLASSNAME>
  * -Dtest.exclude.pattern=**/<TESTCLASSNAME1>.java,**/<TESTCLASSNAME2>.java
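
 For example (the class and method names below are hypothetical, and the
 commands are echoed rather than executed):

```shell
# Run two specific test classes, one restricted to a single method.
echo mvn test '-Dtest=TestFoo,TestBar#testBaz'
# Exclude a family of tests by filename pattern (quoted to avoid shell globbing).
echo mvn test '-Dtest.exclude.pattern=**/TestSlow*.java'
```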

----------------------------------------------------------------------------------
Building components separately

If you are building a submodule directory, all the hadoop dependencies this
submodule has will be resolved as all other 3rd party dependencies. That is,
from the Maven cache or from a Maven repository (if not available in the cache
or the SNAPSHOT 'timed out').
An alternative is to run 'mvn install -DskipTests' from the Hadoop source top
level once, and then work from the submodule. Keep in mind that SNAPSHOTs
time out after a while; using the Maven '-nsu' option will stop Maven from
trying to update SNAPSHOTs from external repos.

----------------------------------------------------------------------------------
Protocol Buffer compiler

The version of the Protocol Buffer compiler, protoc, must match the version of
the protobuf JAR.

If you have multiple versions of protoc on your system, you can set the
HADOOP_PROTOC_PATH environment variable in your build shell to point to the one
you want to use for the Hadoop build. If you don't define this environment
variable, protoc is looked up in the PATH.
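
For example, in the shell used for the build (the install path below is an
assumption; point it at whichever protoc 2.5.0 binary you installed):

```shell
# Pin the Hadoop build to one specific protoc binary.
export HADOOP_PROTOC_PATH=/usr/local/protobuf-2.5.0/bin/protoc
echo "$HADOOP_PROTOC_PATH"
```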

----------------------------------------------------------------------------------
Importing projects to eclipse

When you import the project to eclipse, install hadoop-maven-plugins first:

  $ cd hadoop-maven-plugins
  $ mvn install

Then, generate eclipse project files:

  $ mvn eclipse:eclipse -DskipTests

Finally, import to eclipse by specifying the root directory of the project via
[File] > [Import] > [Existing Projects into Workspace].

----------------------------------------------------------------------------------
Building distributions:

Create binary distribution without native code and without documentation:

  $ mvn package -Pdist -DskipTests -Dtar

Create binary distribution with native code and with documentation:

  $ mvn package -Pdist,native,docs -DskipTests -Dtar

Create source distribution:

  $ mvn package -Psrc -DskipTests

Create source and binary distributions with native code and documentation:

  $ mvn package -Pdist,native,docs,src -DskipTests -Dtar

Create a local staging version of the website (in /tmp/hadoop-site):

  $ mvn clean site; mvn site:stage -DstagingDirectory=/tmp/hadoop-site

----------------------------------------------------------------------------------
Handling out of memory errors in builds
----------------------------------------------------------------------------------

If the build process fails with an out of memory error, you should be able to
fix it by increasing the memory used by Maven, which can be done via the
environment variable MAVEN_OPTS.

Here is an example setting to allocate between 256 MB and 512 MB of heap space
to Maven:

  export MAVEN_OPTS="-Xms256m -Xmx512m"

----------------------------------------------------------------------------------
Building on OS/X
----------------------------------------------------------------------------------

A one-time manual step is required to enable building Hadoop on OS X with
Java 7, every time the JDK is updated.
See: https://issues.apache.org/jira/browse/HADOOP-9350

  $ sudo mkdir `/usr/libexec/java_home`/Classes
  $ sudo ln -s `/usr/libexec/java_home`/lib/tools.jar `/usr/libexec/java_home`/Classes/classes.jar

----------------------------------------------------------------------------------
Building on Windows
----------------------------------------------------------------------------------
Requirements:

* Windows System
* JDK 1.6+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer
* Windows SDK or Visual Studio 2010 Professional
* Unix command-line tools from GnuWin32 or Cygwin: sh, mkdir, rm, cp, tar, gzip
* zlib headers (if building native code bindings for zlib)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)

If using Visual Studio, it must be Visual Studio 2010 Professional (not 2012).
Do not use Visual Studio Express. It does not support compiling for 64-bit,
which is problematic if running a 64-bit system. The Windows SDK is a free
download here:

  http://www.microsoft.com/en-us/download/details.aspx?id=8279

----------------------------------------------------------------------------------
Building:

Keep the source code tree in a short path to avoid running into problems
related to the Windows maximum path length limitation (for example, C:\hdc).

Run builds from a Windows SDK Command Prompt. (Start, All Programs,
Microsoft Windows SDK v7.1, Windows SDK 7.1 Command Prompt.)

JAVA_HOME must be set, and the path must not contain spaces. If the full path
would contain spaces, then use the Windows short path instead.

You must set the Platform environment variable to either x64 or Win32,
depending on whether you're running a 64-bit or 32-bit system. Note that this
is case-sensitive: it must be "Platform", not "PLATFORM" or "platform".
Environment variables on Windows are usually case-insensitive, but Maven
treats them as case-sensitive. Failure to set this environment variable
correctly will cause msbuild to fail while building the native code in
hadoop-common.

  set Platform=x64   (when building on a 64-bit system)
  set Platform=Win32 (when building on a 32-bit system)

Several tests require that the user must have the Create Symbolic Links
privilege.

All Maven goals are the same as described above, with the exception that
native code is built by enabling the 'native-win' Maven profile. -Pnative-win
is enabled by default when building on Windows, since the native components
are required (not optional) on Windows.

If native code bindings for zlib are required, then the zlib headers must be
deployed on the build machine. Set the ZLIB_HOME environment variable to the
directory containing the headers.

  set ZLIB_HOME=C:\zlib-1.2.7

At runtime, zlib1.dll must be accessible on the PATH. Hadoop has been tested
with zlib 1.2.7, built using Visual Studio 2010 out of contrib\vstudio\vc10 in
the zlib 1.2.7 source tree.

  http://www.zlib.net/

----------------------------------------------------------------------------------
Building distributions:

 * Build distribution with native code : mvn package [-Pdist][-Pdocs][-Psrc][-Dtar]