Flume Developer Notes
=====================
Jonathan Hsieh
6/22/11

// This is in asciidoc markup

== Introduction

This is meant to be a guide for issues that occur when building, debugging and setting up Flume as a developer.

== High level directory and file structure.

Flume uses the Maven build system and has a Maven project object model (pom) that has many components broken down into Maven modules. Below we describe the contents of different directories.

----
./bin/                   Flume startup scripts
./conf/                  Flume configuration file samples
./contrib/flogger        Flume logger: a Flume client implemented in C
./docs/man               Flume man pages
./flume-config-web       Flume master configuration servlet module
./flume-core             Flume core module
./flume-distribution     Flume distribution package module
./flume-docs             Flume documentation generation module
./flume-log4j-appender   Flume log4j-avro appender module
./flume-microbenchmarks  Flume performance microbenchmark test suite
./flume-node-web         Flume node status servlet module
./flume-windows-dist     Flume node Windows distribution package module
./plugins/               Flume plugin modules (hello world skeleton and hbase)
./src/javaperf           Flume performance tests (out of date)
./src/javatest-torture   Flume reliability tests (out of date)
----

The files excluded in `.gitignore` are autogenerated by either Maven or Eclipse.

== Building and Testing Flume

=== Prerequisites

There are several tools required to do a full build of Flume, but only the Thrift compiler is required for development and testing builds. To build documentation, you will need to have asciidoc installed. To build Windows installers, you will need to have makensis installed.

==== Building Thrift

The Thrift compiler is required to build Flume and currently does not have binary packages available for Linux-based platforms (a Windows binary is available). There are several requirements necessary to build it.
Here's a link to the requirements:

http://wiki.apache.org/thrift/ThriftRequirements

This page also contains links explaining how to install the requirements for various platforms.

=== Using Maven

We are using Maven v2.x.x.

The Maven build system steps through several phases to create build artifacts. At the highest level, the phases that are relevant to most devs are "compile" -> "test" -> "package" -> "install".

There are several options and "profiles" available in the Flume build. The default profile is a "dev" profile. Below we include examples of common build command lines to build different profiles.

A development build that runs unit tests and installs to the local Maven repo. This builds and tests all plugins, but excludes modules that aren't needed during development (e.g. Windows installer, documentation).

----
mvn install
----

A development build that skips the execution of unit tests.

----
mvn install -DskipTests
----

A development build that runs unit tests (no package generation).

----
mvn test
----

A development build that runs only specific unit tests (the value given to `-Dtest=` is a regex of a class name without .java or .class or path).

----
mvn test -Dtest=
----

Windows node build, skipping unit tests (requires makensis).

NOTE: makensis is available on Linux and, through Homebrew, on Mac OS X, so this can be built while running on these operating systems.

----
mvn install -Pwindows -DskipTests
----

Full build, skipping unit tests (requires asciidoc). This does not build the Windows installer.

----
mvn install -Pfull-build -DskipTests
----

Full build, making both docs and the Windows installer.

----
mvn install -Pfull-build,windows
----

==== Pointing the Maven build at the proper Thrift executable

Flume has, over time, upgraded to newer versions of Thrift. The Maven build requires a pointer to the proper Thrift compiler. If you install Thrift in a non-standard location (not /usr/local/thrift/bin), you will need to provide the build some extra information.
This may be the case if you overrode the standard Thrift install location (the default target of +make install+) or are running Thrift from a home directory.

One way to provide this is via the Maven command line by setting the thrift.executable variable (this assumes that we made different dirs for different versions of Thrift):

----
mvn install -Dthrift.executable=/usr/local/thrift-0.6.0/bin/thrift
----

Another way to provide this information to your Maven build is to modify your Maven profile by adding to or modifying your ~/.m2/settings.xml file and overriding the default thrift.executable setting to point to your Thrift compiler executable. In the example below, we install different versions of the Thrift compiler in different directories and thus need to change the setting.

----
<settings>
  <profiles>
    <profile>
      <id>flume</id>
      <properties>
        <thrift.executable>/usr/local/thrift-0.6.0/bin/thrift</thrift.executable>
      </properties>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>flume</activeProfile>
  </activeProfiles>
</settings>
----

==== Including or excluding specific sets of tests.

We've added hooks to the Maven build that will enable you to exclude or include specific tests on a test run. This is useful for excluding flaky tests or making a build that focuses solely upon flaky tests.

To do this we created two variables:

* test.include.pattern
* test.exclude.pattern

These variables take patterns matching the files to be included or excluded.

For the next set of examples, let's say you have flaky tests called TestFlaky1 and TestFlaky2.

You can execute tests that skip TestFlaky1 and TestFlaky2 by using the following command line:

----
mvn test -Dtest.exclude.pattern=**/TestFlaky*.java
----

Alternately, you could be more explicit:

----
mvn test -Dtest.exclude.pattern=**/TestFlaky1.java,**/TestFlaky2.java
----

Conversely, you could execute only the flaky tests by using:

----
mvn test -Dtest.include.pattern=**/TestFlaky*.java
----

You can also combine includes and excludes.
This runs TestFlaky* but skips over TestFlaky2:

----
mvn test -Dtest.include.pattern=**/TestFlaky*.java -Dtest.exclude.pattern=**/TestFlaky2.java
----

NOTE: Both test.exclude.pattern and test.include.pattern get overridden if the test parameter is used. Consider:

----
mvn test -Dtest.exclude.pattern=**/TestFlaky*.java -Dtest=TestFlaky1
----

In this case, TestFlaky1 will be run despite being in the test.exclude.pattern.

=== Running the most recent build

To run the most recent build of Flume, first build the distribution packages.

----
mvn install -DskipTests
----

You can then traverse into ./flume-distribution/target/flume-distribution--bin/flume-distribution-. This directory is set up exactly as the tarball installation of Flume would be.

=== Running Performance Microbenchmarks.

The suite of source and sink microbenchmark tests (located in ./flume-microbenchmarks/javaperf) can be run by using `mvn test -Pperf`. Just like with the normal test cases, you can use the `-Dtest=` option to select specific tests. So you can do:

----
mvn test -Pperf -Dtest=PerfThriftSinks
----

The logs should output lines that are formatted similarly to these lines:

----
[junit] nullsink,ubuntu,begin,10998597,552872,disk_loaded,2895851957,301662152,receiver_started,156786445,305698624,sink_started,105303802,305704456,thrift sink to thrift source done,39520160510,320377056,MB/s,4.579940971898899,23094932,320379168
[junit] [ 0us, 547,544 b mem] Starting (after gc)
[junit] [ 10,998,597ns d 10,998,597ns 552,872 b mem] begin
[junit] [ 2,914,443,637ns d 2,895,851,957ns 301,662,152 b mem] disk_loaded
[junit] [ 3,514,297,391ns d 156,786,445ns 305,698,624 b mem] receiver_started
[junit] [ 4,082,661,503ns d 105,303,802ns 305,704,456 b mem] sink_started
[junit] [ 44,235,264,972ns d 39,520,160,510ns 320,377,056 b mem] thrift sink to thrift source done
[junit] [ 44,878,445,315ns d 23,094,932ns 320,379,168 b mem] MB/s,4.579940971898899
----

The first line is a summary of all the information in CSV format.
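
As an illustration only, the summary line in the sample above could be split into per-checkpoint records with a short script. The sketch below is in Python, which is not part of the Flume build, and `parse_summary` and `Checkpoint` are hypothetical names. It assumes the first two fields are a benchmark name and a host name, and that the remaining fields repeat as label, elapsed nanoseconds and heap bytes. That layout is inferred from comparing the sample summary with the other sample lines, not from a documented format.

```python
from typing import List, NamedTuple, Tuple


class Checkpoint(NamedTuple):
    """One (label, elapsed ns, heap bytes) record from the summary line."""
    label: str
    delta_ns: int
    heap_bytes: int


# Summary line copied from the sample log output above.
SAMPLE = (
    "[junit] nullsink,ubuntu,begin,10998597,552872,"
    "disk_loaded,2895851957,301662152,"
    "receiver_started,156786445,305698624,"
    "sink_started,105303802,305704456,"
    "thrift sink to thrift source done,39520160510,320377056,"
    "MB/s,4.579940971898899,23094932,320379168"
)


def parse_summary(line: str) -> Tuple[str, str, List[Checkpoint]]:
    """Split a summary line into (name, host, checkpoints).

    Assumption: layout is name, host, then repeated label/ns/bytes groups.
    """
    line = line.strip()
    if line.startswith("[junit]"):
        line = line[len("[junit]"):].strip()
    fields = line.split(",")
    name, host, rest = fields[0], fields[1], fields[2:]

    checkpoints: List[Checkpoint] = []
    label_parts: List[str] = []
    i = 0
    while i < len(rest):
        label_parts.append(rest[i])
        i += 1
        # Close the record only when the next two fields are plain integers,
        # so labels that contain commas stay in one piece.
        if i + 1 < len(rest) and rest[i].isdigit() and rest[i + 1].isdigit():
            label = ",".join(label_parts)
            checkpoints.append(Checkpoint(label, int(rest[i]), int(rest[i + 1])))
            label_parts = []
            i += 2
    return name, host, checkpoints


if __name__ == "__main__":
    name, host, checkpoints = parse_summary(SAMPLE)
    for cp in checkpoints:
        print(name, host, cp.label, cp.delta_ns, cp.heap_bytes)
```

Labels that themselves contain commas, such as the final `MB/s,...` entry in the sample, are kept together because a record is only closed once two integer fields follow it. A label that is itself a plain integer would confuse this heuristic.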
The other lines are in a tabular, more human-readable form. The left column is cumulative time in ns and the middle is the delta from the previous line in ns. The last column of numbers is the amount of memory in heap, followed by some comments or labels.

=== Building on Windows platforms

Building Flume on Windows is possible. One can generate packages and the installer executable on Windows. This build assumes a cygwin environment, but may not require it.

This build requires

* Maven for Windows
* makensis (for Windows installer build)
* java 1.6+

You should be able to run the normal mvn commands.

The current Windows installer executable does not handle all error situations and does not check whether it is being run as administrator.

=== Building documentation

Documentation for Flume is written in asciidoc. It relies on several libraries to generate images.

* asciidoc v8.5.2
* graphviz (dot) v2.26.3
* xmlto

Documents can be built by running 'mvn -Pfull-build'.

== Integrated Development Environments for Flume

Currently most Flume developers use the Eclipse IDE. We have included some instructions for getting started with Eclipse.

=== Setting up Flume Eclipse projects from the Maven POMs.

If you use Eclipse, we suggest you use the m2eclipse plugin, available here, to properly create an environment for dev and testing in Eclipse.

http://m2eclipse.sonatype.org/

After installing it in Eclipse you will want to "Import" the Flume pom.xml project. This can be done by going to the Eclipse application menu and navigating to File > Import... > Existing Maven Projects. From there, browse to and select the directory that contains the root of the Flume project.

The build requires the location of the Thrift compiler executable -- see the instructions about .m2/settings.xml files in the building Flume section for more details.
The flume-core project will have errors -- these can be resolved by manually adding these dirs to your build source dirs:

* ./flume-core/target/generated-sources/antlr3
* ./flume-core/target/generated-sources/avro
* ./flume-core/target/generated-sources/thrift
* ./flume-core/target/generated-sources/version

== Debugging Flume

=== Flume's web applications

The default setup for Flume is to run its servlets from .WAR files that include precompiled JSPs. One can have the node or master start specific servlet .WARs by setting the following properties in the system's flume-site.conf file, as shown below.

----
<property>
  <name>flume.master.webapps.root</name>
  <value>webapps/flumemaster.war</value>
  <description>Path where Flume master war lives. If a file it will load the war, if a dir it will load all *.war in that dir.</description>
</property>

<property>
  <name>flume.node.webapps.root</name>
  <value>webapps/flumemaster.war</value>
  <description>Path where Flume node war lives. If a file it will load the war, if a dir it will load all *.war in that dir.</description>
</property>
----

// TODO document how to debug JSPs while in Eclipse

== Rules of the Repository

We have a few basic rules for code in the repository.

The master/trunk pointer:

* MUST always build.
* SHOULD always pass all unit tests.

When committing code we tag pushes with JIRA numbers and their short descriptions. Generally these are in the following format:

----
FLUME-42: Description from the jira
----

All source files must include the following header (or a variant depending on comment characters):

----
/**
 * Licensed to Cloudera, Inc. under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. Cloudera, Inc. licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
----

No build-generated files should be checked in. Here are some examples of generated files that should not be checked in:

* html documentation
* thrift-generated source
* avro-generated source
* antlr-generated source
* auto-generated versioning annotations
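
The commit-message format and license-header rules in this section lend themselves to a mechanical check. The following is a minimal sketch in Python, not an existing Flume tool: `is_valid_subject`, `has_license_header` and the `max_lines` cutoff are hypothetical names and values chosen for illustration. It assumes that a commit subject should look like the `FLUME-42: Description from the jira` example and that a source file counts as having the header if two phrases from it appear near the top of the file.

```python
import re

# Matches subjects shaped like "FLUME-42: Description from the jira".
COMMIT_SUBJECT = re.compile(r"^FLUME-\d+: \S.*$")

# Phrases taken from the license header shown in this section.
HEADER_MARKERS = ("Licensed to Cloudera, Inc.", "Apache License, Version 2.0")


def is_valid_subject(subject: str) -> bool:
    """Return True if the commit subject starts with a JIRA key and a description."""
    return COMMIT_SUBJECT.match(subject) is not None


def has_license_header(source_text: str, max_lines: int = 20) -> bool:
    """Return True if every marker phrase appears in the first max_lines lines.

    Assumption: 20 lines is enough to cover the header; adjust as needed.
    """
    head = "\n".join(source_text.splitlines()[:max_lines])
    return all(marker in head for marker in HEADER_MARKERS)


if __name__ == "__main__":
    print(is_valid_subject("FLUME-42: Description from the jira"))
    print(has_license_header("class A {}"))
```

Checking for marker phrases instead of the full header text keeps the check tolerant of the comment-character variants that the rule allows.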