We welcome contributions to Apache DataFu. If you're interested, please read the following guide:
https://cwiki.apache.org/confluence/display/DATAFU/Contributing+to+Apache+DataFu
Common tasks for working in the DataFu code can be found below. For information on how to contribute patches, please follow the wiki link above.
If you haven't done so already:
git clone https://git-wip-us.apache.org/repos/asf/datafu.git
cd datafu
The following command generates the necessary files to load the project in Eclipse:
./gradlew eclipse
To clean up the Eclipse files:
./gradlew cleanEclipse
Note that you may run out of heap space when executing tests in Eclipse. To fix this, adjust the heap settings for the TestNG plugin: go to Eclipse->Preferences, select TestNG->Run/Debug, and add "-Xmx1G" to the JVM args.
All the JARs for the project can be built with the following command:
./gradlew assemble
This builds SNAPSHOT versions of the JARs for DataFu Pig, Spark, and Hourglass. The built JARs can be found under datafu-pig/build/libs, datafu-spark/build/libs, and datafu-hourglass/build/libs, respectively.
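To check what a build actually produced, a small shell sketch like the one below lists whatever JARs exist in those three directories. The function name list_built_jars is hypothetical and not part of the DataFu build. It assumes the current directory is the repository root and makes no assumption about the JAR file names.

```shell
# Hypothetical helper (not part of DataFu): list JARs under each module's build/libs.
# Assumes the current directory is the repository root.
list_built_jars() {
  for module in datafu-pig datafu-spark datafu-hourglass; do
    dir="$module/build/libs"
    if [ ! -d "$dir" ]; then
      echo "missing: $dir (module not built yet?)" >&2
      continue
    fi
    for jar in "$dir"/*.jar; do
      # If the glob matched nothing, the literal pattern is left behind; skip it.
      if [ -e "$jar" ]; then
        echo "$jar"
      fi
    done
  done
}
```

Calling list_built_jars after ./gradlew assemble prints one path per JAR found, and reports any module directory that does not exist yet on standard error.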
A single project, for example DataFu Pig, can be built with the following command:
./gradlew :datafu-pig:assemble
Tests can be run with the following command:
./gradlew test
All the tests can also be run from within Eclipse.
To run the tests for a single project, for example DataFu Pig only:
./gradlew :datafu-pig:test
To run a specific set of tests from the command line, pass the --tests option with a value matching the test class you want to run. For example, to run all tests defined in the QuantileTests test class for DataFu Pig:
./gradlew :datafu-pig:test --tests QuantileTests
You can similarly run a specific Hourglass test like so:
./gradlew :datafu-hourglass:test --tests PartitionCollapsingTests
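The two commands above follow the same pattern, so they can be wrapped in a small shell function. This is only a sketch: the name run_module_test and the DRY_RUN variable are hypothetical conveniences, not part of the DataFu build. It assumes it is run from the repository root, where ./gradlew lives.

```shell
# Hypothetical helper (not part of DataFu): run one test class in one module.
# Usage: run_module_test <module> <test-class>
run_module_test() {
  module="$1"
  test_class="$2"
  if [ -z "$module" ] || [ -z "$test_class" ]; then
    echo "usage: run_module_test <module> <test-class>" >&2
    return 2
  fi
  # With DRY_RUN=1 the command is printed instead of executed.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "./gradlew :$module:test --tests $test_class"
  else
    ./gradlew ":$module:test" --tests "$test_class"
  fi
}
```

For example, run_module_test datafu-pig QuantileTests runs the same command as the DataFu Pig example above, and setting DRY_RUN=1 first shows the command without starting Gradle.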