How To Contribute
"Contributing" to an Apache project is about more then just writing code – it's about doing what you can to make the project better. There are lots of ways to contribute....
Contributors should join the Mahout mailing lists. In particular:
Please keep discussions about Mahout on the list so that everyone benefits. Emailing individual committers with questions about specific Mahout issues is discouraged. See http://people.apache.org/~hossman/#private_q.
You can also track issues that you've raised in JIRA.
What do you like to work on? There are a ton of things in Mahout that we would love to have contributions for: data ingestion, data visualization, documentation, new algorithms, performance improvements, better tests, etc. The best place to start is JIRA: look under the Mahout project at the bugs that have been reported and see whether any look like ones you could take on. Small, well-written, well-tested patches are a great way to get your feet wet. It could be something as simple as fixing a typo. What matters more is showing that you understand the necessary steps for making changes to the code. Mahout is a pretty big beast at this point, so changes, especially from non-committers, need to be evolutionary, not revolutionary, since it is often very difficult to evaluate the merits of a very large patch. Think small, at least to start!
Beyond JIRA, hang out on the dev@ mailing list. That's where we discuss what we are working on in the internals and where you can get a sense of where people are working.
Also, documentation is a great way to familiarize yourself with the code and is always a welcome addition to the codebase and to this Wiki.
Also, check out the MAHOUT_INTRO_CONTRIBUTE items in JIRA, as these have been deemed to be fairly easy to start on.
Also, feel free to jump in on the backlog or on the next version.
If you are interested in working towards being a committer, general guidelines are available online.
This section identifies the ''optimal'' steps a community member can take to submit changes or additions to the Mahout code base. These can be new features, bug fixes, optimizations of existing features, or tests of existing code to prove it works as advertised (and to make it more robust against possible future changes).
Please note that these are the "optimal" steps, and community members who don't have the time or resources to do everything outlined below should not be discouraged from submitting their ideas "as is" per "Yonik Seeley's (Solr committer) Law of Patches" ...
A half-baked patch in Jira, with no documentation, no tests
and no backwards compatibility is better than no patch at all.
Just because you may not have the time to write unit tests, or clean up backwards compatibility issues, or add documentation, doesn't mean other people don't. Putting your patch out there allows other people to try it and possibly improve it.
First of all, you need the Mahout source code.
Get the source code on your local drive using SVN. Most development is done on the "trunk":
> svn checkout http://svn.apache.org/repos/asf/mahout/trunk mahout-trunk
Note that committers have to use https instead of http here, but http is fine for read-only access to the trunk code.
Before you start, you should send a message to the Mahout developer mailing list (note: you have to subscribe before you can post), or file a bug in Jira. Describe your proposed changes and check that they fit in with what others are doing and have planned for the project. Be patient; it may take folks a while to understand your requirements.
Modify the source code and add some (very) nice features using your favorite IDE.
But take care about the following points
A "patch file" is the format that all good contributions come in. It bundles up everything that is being added, removed, or changed in your contribution.
Please make sure that all unit tests succeed before constructing your patch.
> cd mahout-trunk
> mvn clean test
After a while, if you see
BUILD SUCCESSFUL
all is ok, but if you see
BUILD FAILED
please read the error messages carefully and check your code.
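Reading the final banner works, but a script can rely on the exit status instead, since Maven exits with a non-zero status when the build or the tests fail. A minimal sketch, assuming a POSIX shell; the helper name run_and_report is hypothetical and is not part of Mahout or Maven:

```shell
# Run a test command and report the result based on its exit status.
# run_and_report is a hypothetical helper name used for illustration only.
run_and_report() {
    if "$@"; then
        echo "tests passed: safe to build the patch"
    else
        status=$?
        echo "tests failed (exit status $status): check the errors above" >&2
        return "$status"
    fi
}
```

From the mahout-trunk directory this could be invoked as: run_and_report mvn clean test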
Check to see what files you have modified with:
svn stat
Add any new files with:
svn add src/.../MyNewClass.java
Subversion's "add" command only modifies your local copy, so it does not require commit permissions. By using "svn add", your entire contribution can be included in a single patch file, without needing to submit a separate set of "new" files.
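New files that were never added are easy to miss, because "svn diff" silently leaves them out of the patch. In the output of "svn stat" they show up with a "?" in the first column. A small sketch that filters for those lines; the helper name list_unversioned and the example path are hypothetical:

```shell
# Print the paths that "svn stat" marks as unversioned ("?" in column one).
# list_unversioned is a hypothetical helper; it reads status output on stdin.
list_unversioned() {
    sed -n 's/^?[[:space:]]*//p'
}

# Example with canned status output (the paths are made up):
printf '%s\n' 'M       core/pom.xml' '?       core/src/MyNewClass.java' | list_unversioned
```

In a working copy the same filter could be fed directly: svn stat | list_unversioned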
Edit the ''CHANGES.txt'' file, adding a description of your change, including the bug number it fixes.
In order to create a patch, just type:
svn diff > MAHOUT-$issuenumber.patch
$issuenumber here should be the number of the JIRA issue the patch is supposed to fix. This will report all modifications done on Mahout sources on your local disk and save them into the ''MAHOUT-$issuenumber.patch'' file. Read the patch file. Make sure it includes ONLY the modifications required to fix a single issue.
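One quick way to review a patch is to list the files it touches before reading it in full. "svn diff" writes an "Index: <path>" line ahead of each file's changes, so those lines give a summary. A sketch under that assumption; the helper names patch_name and patch_files are hypothetical, and 123 is a made-up issue number:

```shell
# Build the patch file name for a JIRA issue number.
# patch_name is a hypothetical helper; 123 below is a made-up issue number.
patch_name() {
    printf 'MAHOUT-%s.patch\n' "$1"
}

# List the files a patch touches, using the "Index: <path>" lines
# that "svn diff" writes before each file's changes.
patch_files() {
    sed -n 's/^Index: //p' "$1"
}

patch_name 123
```

After creating the patch, running patch_files on it should list only the files that belong to the one issue being fixed, plus CHANGES.txt.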
Please do not:
Please do:
Finally, patches should be attached to a bug report in Jira. If you are revising an existing patch, please re-use the exact same name as the previous attachment. Jira will "grey out" the older versions so it's clear which version is the newest.
Please be patient. Committers are busy people too. If no one responds to your patch after a few days, please send a friendly reminder. Please incorporate others' suggestions into your patch if you think they're reasonable. Finally, remember that even a patch that is not committed is useful to the community.
If there's a Jira issue that already has a patch you think is really good, and works well for you – please add a comment saying so. If there's room for improvement (more tests, better javadocs, etc.) then make the changes and attach them as well. If a lot of people review a patch and give it a thumbs up, that's a good sign for committers when deciding if it's worth spending time on the patch – and if other people have already put in effort to improve the docs/tests for a patch, that helps even more.
From the base directory (assuming that is where the patch is generated from), run:
patch -p 0 -i <PATH TO PATCH> [--dry-run]
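The optional --dry-run flag checks whether the patch applies without changing any files, so it can be used as a gate before the real run. A minimal sketch, assuming a patch implementation that supports --dry-run (GNU patch does); the helper name apply_checked is hypothetical:

```shell
# Apply a patch only if a dry run succeeds first.
# apply_checked is a hypothetical helper; $1 is the path to the patch file.
apply_checked() {
    if patch -p0 --dry-run -i "$1" >/dev/null; then
        patch -p0 -i "$1"
    else
        echo "dry run failed; working copy left untouched" >&2
        return 1
    fi
}
```

Run it from the same base directory the patch was generated from, passing the path to the downloaded patch file as the only argument.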
The following resources may prove helpful when developing Mahout contributions. (These are not an endorsement of any specific development tools.) Note that these are the same code styles that Lucene and Solr use.