Developer Documentation : How to contribute to Apache Lens?

Welcome contributors! This page provides necessary guidelines on how to contribute towards furthering the development and evolution of Apache Lens.

Contributions

Contributions are welcome in any of the following forms, all of which improve the project:

  • Code contributions
  • Documentation
  • Quality Improvements
  • Reviewing
  • Miscellaneous contributions
    • Simpler tasks such as setting the Component field on JIRA issues that have no component, and setting an appropriate Priority on JIRA issues
    • Propose new features and improvements for the project
    • Participate in discussions on dev list and jiras
    • Verify and Vote for a release
    • Volunteer for releasing a version of the project
    • Improve the review process, release process, builds, packaging, Jenkins jobs, code contribution process, etc.

Development Environment Setup

The sections below guide a developer on how to contribute code or documentation changes to Lens.

Source Repository

Lens uses git for its code repository. The repository is available at https://git-wip-us.apache.org/repos/asf/lens.git.

If you are comfortable working in a GitHub environment by forking a GitHub repo, so that you can push changes to your own repository before they are accepted in Apache, a mirror of the source is available at https://github.com/apache/lens. It is better to add the Apache repo as a remote than the GitHub repo, because the GitHub repo is a mirror of the Apache repo and might lag behind it.

Build tools

  • A Java Development Kit. You can use Java 7 or Java 8.
  • Generating the site and javadoc with Java 8 is not supported yet. See the Enunciate issue and the Javadoc issue for more details.
  • Apache Maven (3.x+)

    Ensure all the tools are installed by executing mvn, git and javac respectively.

    As the Lens builds use the external Maven repository to download artifacts, Maven needs to be set up with the proxy settings needed to make external HTTP requests. The first build of every Lens project needs internet connectivity to download Maven dependencies.

  • Be online for that first build, on a good network
  • See Maven proxy settings
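
The tool check described above can be scripted. The following is a minimal sketch, not part of the Lens build: the function name check_build_tools is hypothetical, and it only tests that each named command is on the PATH, not that its version is suitable.

```shell
# Report which of the named tools are on the PATH.
# Prints one line per tool and returns non-zero if any is missing.
# check_build_tools is a hypothetical helper, not something Lens ships.
check_build_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be check_build_tools mvn git javac, run once before the first build.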

Integrated Development Environment (IDE)

You are free to use whatever IDE you prefer or your favorite text editor. Note that:

  • Building and testing is often done on the command line or at least via the Maven support in the IDEs.
  • Set up the IDE to follow the source layout rules of the project.

Building from source

Building Lens from Source

Download Apache Lens source release from here.

   unzip apache-lens-<version>-source.release.zip
   cd apache-lens-<version>
   mvn clean package -DskipTests

OR Clone Apache Lens source code from https://git-wip-us.apache.org/repos/asf/lens.git

   git clone https://git-wip-us.apache.org/repos/asf/lens.git
   cd lens
   mvn clean package -DskipTests

Once one of the above sets of commands completes successfully, the build will produce lens-dist/target/apache-lens-<version>-bin/apache-lens-<version>-bin/server and lens-dist/target/apache-lens-<version>-bin/apache-lens-<version>-bin/client. The former can be used as the Lens server installation directory and the latter as the Lens client installation directory, to run the Lens server and the Lens client.

The build will also produce Debian packages for both client and server in lens-dist/target. The client package uses /usr/local/lens/client as the Lens client installation directory and the server package uses /usr/local/lens/server as the Lens server installation directory.

Apache Lens depends on Hive. Please build Hive from Source or install it using the documentation here. After installing Lens and Hive, refer here for running lens client and lens server from installation directories.

Building Hive from Source

   git clone https://github.com/InMobi/hive.git

   cd hive

   git checkout <release-tag>

   mvn clean package -DskipTests -Phadoop-2,dist

Once the above package command completes successfully, packaging/target will contain apache-hive-$project.version-bin. This build also produces source and binary tar.gz files and a deb package for Hive.

Set the environment variable HIVE_HOME to point to the Hive installation directory built from source:

  cd packaging/target/apache-hive-$project.version-bin/apache-hive-$project.version-bin
  export HIVE_HOME=`pwd`

Code Contributions

All code changes should be initiated based on an issue in the LENS JIRA, so that other contributors are aware of the proposed work and have the opportunity to participate actively (through review, suggestions, etc.). This also allows the changes to be scoped into appropriate releases. Code contributions are to be made available as a patch against the specific JIRA created for the task. Once a patch is attached to the JIRA, the issue should be marked as "Patch available" by clicking submit. The Lens project follows RTC (Review Then Commit). If the change is bigger than a couple of lines of code, the contributor should raise a review request on Review Board and attach the patch to the JIRA once the review request gets a "Ship it" from one of the reviewers. It is recommended that large changes be broken up into smaller changes, which makes them easier to review. Patches should comply with the requirements below before they are made available for review.

Code compliance

All contributions should satisfy the following requirements:

  • All public classes and methods should have informative javadoc comments.
    • Do not use @author tags.
  • All existing unit tests and integration tests should pass.
  • New unit tests should be provided to demonstrate bugs and fixes. Lens uses the TestNG test framework. If a code change does not include a unit test, the contributor should explain why it is not possible to include one.
  • Project documentation corresponding to the change should be updated along with the code change. See the Documentation section for how Lens documentation is organized and how to update it.
  • Code must be formatted according to Java standards, with the following changes:
    • Trailing white space: Remove all trailing white space.
    • Indentation: Never use tabs! Always use 2 space indents.
    • Line wrapping: Always use a 120-column line width.
  • Use the slf4j framework for logging and use parameterized logging. Avoid commons logging and log4j entirely; they should also be excluded from the transitive dependencies of any newly added dependency.
  • All versioned files (java, xml, others) should have the ASF license header.
  • If a new feature requires illustrative examples, they should be added in lens-examples.
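
The three formatting rules above (no trailing white space, no tabs, 120-column lines) can be checked before creating a patch. The following is a rough sketch using awk; the function name check_format is hypothetical, and it is not a substitute for whatever style checks the Lens build itself runs.

```shell
# Print formatting violations for each file given as an argument:
# trailing white space, tab characters, and lines longer than 120 columns.
# Returns non-zero if any violation was found.
# check_format is a hypothetical helper, not part of the Lens build.
check_format() {
  awk '
    /[ \t]+$/        { printf "%s:%d: trailing white space\n", FILENAME, FNR; bad = 1 }
    /\t/             { printf "%s:%d: tab character\n", FILENAME, FNR; bad = 1 }
    length($0) > 120 { printf "%s:%d: line longer than 120 columns\n", FILENAME, FNR; bad = 1 }
    END              { exit bad + 0 }
  ' "$@"
}
```

It could be run on the files you touched, for example check_format src/.../MyNewClass.java, where the path is a placeholder.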

Naming convention for configuration properties

Developers should use the following naming convention for configuration properties:

  • All server configuration names start with lens.server.
  • All configuration overridable for each query start with lens.query.
  • All configuration of drivers start with lens.driver. For HiveDriver the names start with lens.driver.hive. and for JDBCDriver with lens.driver.jdbc.
  • All configuration overridable for each session start with lens.session.
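
As an illustration of the convention, the sketch below accepts a property name only if it starts with one of the four prefixes listed above. The function name is_valid_lens_property is hypothetical, and the check deliberately covers only the prefixes named in this section; other Lens modules may use prefixes that are not listed here.

```shell
# Succeeds if the property name starts with one of the prefixes listed in
# this section and has at least one character after the prefix.
# is_valid_lens_property is a hypothetical helper for illustration only.
is_valid_lens_property() {
  case "$1" in
    lens.server.?*|lens.query.?*|lens.driver.?*|lens.session.?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_valid_lens_property lens.driver.hive.example succeeds, while a name without a lens. prefix fails; both names are made up for the example.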

How do I suggest my code changes to the community?

So you've cloned Lens, assigned a JIRA to yourself and made changes for it. You've also tested and verified the changes. Now you want to suggest the change to the Lens community. There are two major steps involved:

Generate a patch

Creating patch

Check to see what files you have modified with:

   git status

Add any new files with:

  git add src/.../MyNewClass.java
  git add src/.../TestMyNewClass.java

In order to create a patch, type the following:

  git diff > LENS-1234.patch

The above command only generates a diff of your uncommitted changes. If you've made commits, you'll have to diff between the last community commit of Lens and your own commit id. In this case, it's recommended to commit all uncommitted changes, and run:

  git diff master..HEAD > LENS-1234.patch

where master is the branch whose last commit is a community commit. Alternatively, if you've made commits on the master branch itself, you can provide the id of the last community commit in place of master.

This will report all modifications done on Lens sources on your local disk and save them into the LENS-1234.patch file. Read the patch file. Make sure it includes ONLY the modifications required to fix a single issue.

Please do not:

  • reformat code unrelated to the bug being fixed: formatting changes should be separate patches/commits.
  • comment out code that is now obsolete: just remove it.
  • make things public which are not required by end users.

Please do:

  • Try to adhere to the coding style of files you edit.
  • Comment code whose function or rationale is not obvious.
  • Update documentation (e.g., package.html files, this wiki, etc.)

Naming your patch

Patches for master should be named after the JIRA: jira-xyz.patch, e.g. LENS-1234.patch.

Patches for a branch should be named jira-xyz-branch.patch, e.g. LENS-1234-branch-x.patch. The branch name suffix should be the exact name of a git branch.

It's OK to upload a new patch to JIRA with the same name as an existing patch. However, many contributors find it convenient to add a numeric suffix indicating the patch revision, e.g. LENS-1234.01.patch, LENS-1234.02.patch, etc.
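
The naming rules can be captured in a small helper. This is a sketch only: the function name lens_patch_name is hypothetical, and combining a branch suffix with a revision suffix in one name is an assumption, since the text above shows them only separately.

```shell
# Build a patch file name from a JIRA id, an optional branch name and an
# optional revision number (a plain integer such as 1 or 2).
#   lens_patch_name LENS-1234            prints LENS-1234.patch
#   lens_patch_name LENS-1234 branch-x   prints LENS-1234-branch-x.patch
#   lens_patch_name LENS-1234 "" 2       prints LENS-1234.02.patch
# lens_patch_name is a hypothetical helper for illustration only.
lens_patch_name() {
  name=$1
  if [ -n "${2:-}" ]; then
    name="$name-$2"
  fi
  if [ -n "${3:-}" ]; then
    name="$name.$(printf '%02d' "$3")"
  fi
  echo "$name.patch"
}
```

It can be combined with the diff command, for example git diff master..HEAD > "$(lens_patch_name LENS-1234)".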

Testing your patch

Before submitting your patch, make sure all tests pass by running mvn clean package. Upon successful completion of the build, you can upload the patch to the JIRA and mark the JIRA as patch available. Until an automatic Jenkins setup is available to verify patch-available issues, please post the test report on the JIRA. Once a committer reviews the change, it will be committed to the repo and the JIRA issue will be resolved.

Applying a patch

To apply a patch that you either generated or found on JIRA, run:

  git apply --check lens_patch.patch
  git apply lens_patch.patch

Review request

Install reviewboard command line tool:

  • Make sure you have python installed and easy_install is available.
  • $ easy_install -U setuptools
  • $ easy_install -U RBTools (RBTools is the package that provides the rbt command used below)
  • More help available at Reviewboard Documentation.

Posting a review request

Whenever you want to publish a new review request, commit all the changes you want to send to the request and do:

  rbt post --parent=master

Review Board is not foolproof. Sometimes this will create a review request but won't attach the patch. In that case, you have to create the patch manually and upload it at the URL of the request. To create the patch:

  git diff master..HEAD > ~/LENS-##.patch

For updating an existing review request:

  rbt post -u --parent=master

Changes you make to the review request from the command line are not published; they are only saved as a draft. So the second step is to open the review request URL and update the necessary info such as Title, Reviewer, Bug number, etc.

After the patch is merged

After the merge, make sure you close the review request, either by clicking close on the request's URL or from the command line:

  rbt close --close-type=<submitted|discarded> <request-id>

But it won't be merged until you follow one more step:

Submit patch

After the patch is reviewed on Review Board and validated by committers, they will post a "Ship It!" review. After that, you're expected to take the patch from Review Board, attach it to the JIRA and mark the JIRA "Patch Available". Only after that is done will it be committed.

Unifying the above

The above steps can get time-consuming after a few rounds. We recommend following them manually at least a few times to understand and internalize the process. There is also an automated way of doing the same: one of the Lens developers has written a command line tool for it, available at https://github.com/prongs/rbt-jira. The typical contributor workflow looks like the following:

  1. Make changes.
  2. Generate a patch using git diff.
  3. Post the patch to Review Board.
  4. Review happens.
  5. Go back to step 1 to make changes according to the review, or go to the next step if a Ship It! is given.
  6. Download the patch from Review Board.
  7. Attach the patch to the JIRA.
  8. Mark the JIRA patch available.
  9. A further review can still cause the patch to be cancelled; in that case go back to step 1. Otherwise go to the next step once it's committed.
  10. Done.

So there are multiple steps involved, some of them in loops. The rbt-jira tool aims to automate precisely these. Check the tool's documentation on its own GitHub page, as it is still in development and subject to change.

Quality improvements

Here are some guidelines for contributing to Apache Lens to improve the quality of the project:

  • Actively look at the Review Board and Patch Available queues for Lens and check whether the issues are verifiable, by looking for the following:
    • All the issues are updated with enough documentation to be used in verification.
    • All bug fixes have a unit test that illustrates the bug and can be added to the regression suite.
    • None of the patches degrade the quality of the project.
  • Add more code quality tools to the build, such as findbugs.
  • Verify resolved issues and close them, adding regression tests.
  • Improve the test suite by adding unit tests, smoke tests, regression tests, integration tests, etc.
  • Report issues you encountered while trying out a feature

Documentation

You can contribute to improving project documentation by reporting issues in existing documentation or proposing changes. Most doc changes are made in existing files under src/site; you can also add new files under src/site, or use the Confluence space for some of the documentation. Documentation changes in code are submitted the same way as code contributions. To update the Confluence space, request edit access for your account on the dev mailing list.

Below are some guidelines on updating documentation.

Project documentation

The design, architecture and feature documentation need to be updated in src/site of the parent module. The documentation is organized into four menus on the site.

Lens Menu

Has the main page, getting started page, and install and run pages. This is the place for all additions/changes that improve documentation for the whole project.

User Menu

The User menu is meant for end users who use Lens as a platform; it is the place to reach them. The documentation here should not discuss implementation details. For a new feature, the user guide needs to be updated with how an end user can use it. If a feature is only a server improvement or an admin feature that end users need not care about, it does not belong here. API documentation, client-side documentation and user-level configuration usually go here.

Admin Menu

The Admin menu is meant for administrators of the Lens platform. All documentation on server deployment, configuration and monitoring goes here. Any new feature or improvement that affects administrators should update this admin documentation with how an administrator can use it.

Developer Menu

The Developer menu is meant for developers to understand the Lens design, architecture and modules. Developer documentation contains the overall design and architecture docs, feature design docs, extension API docs, how to contribute/commit/release docs, etc.

Configuration documentation

Configuration pages should be linked from user guide and admin guide with respect to the configuration exposed to them. See Developer FAQ on how to update config docs.

REST api documentation

REST api documentation is auto generated through Enunciate. Once the javadoc for the resource api is updated correctly, the REST api doc should get updated.

If a new resource is added in the lens-server module, lens-server/enunciate.xml should be updated so that the REST API doc gets generated for it. If a new module with resources is added, the module's pom needs to be updated to use the enunciate plugin, and tools/scripts/generate-site-public.sh needs to be updated to generate and publish the module's docs.

Feature documentation

End users of Lens include :

  • Querying users : mostly unaware of system details and of the underlying data layout.
  • Schema designer : understands the data model and comes up with the schema for their data.
  • Server administrator : understands the server setup, how multiple execution engines can be added/removed and how to support multiple storages.
  • Developer : Developers can develop new drivers, new features on server or client or anything to do with code.

Whenever a new feature is added to Lens, the developer should understand to whom the feature applies and put it in the proper menu: "User Menu" for querying users and schema designers, "Admin Menu" for server administrators and "Developer Menu" for developer-related features.

The following details about the feature should be documented :

  • The use case : Why the feature is required.
  • The feature itself : What the feature is.
  • How to enable the feature, if applicable.
  • Who the users of the feature are and who are not, if applicable.
  • An illustration with an example would be very welcome.

The design documentation related to the feature can go in the developer documentation or Design docs. If the documentation is in Confluence, it should state the version in which the feature is available. Version tagging is also useful when behavior is modified or improved for existing features. Any defaults (in terms of config or behavior) assumed by the feature should also be highlighted. Configuration descriptions should be linked to the config apt files, so that they are always in sync with the code.

Confluence usage

Cwiki should be used for documentation that cannot go into code, i.e. ad hoc documentation. It falls into the following main categories:

  • Design docs for a feature that is only proposed or is under implementation. Once the feature is implemented, the doc should move into the project documentation.
  • Project roadmap
  • Discussions : Placeholder for discussions that cannot be done in jira
  • Presentations : Links to slideshare or google docs
  • Events : Developer/User Meetup minutes

Cwiki should not be used for documentation that is already present in code, namely:

  • Architecture documentation
  • Feature documentation
  • How to contribute/commit/release pages
  • Generated doc for REST API/Javadoc
  • Getting started pages

Review

Lens uses Review Board for review requests. If you are interested in reviewing changes put up by other contributors, actively watch the review board for new requests.

Some things that are important to check for in patches/review requests:

  • Code style as per the coding guidelines in the contributor guide
  • Correctness of the patch
  • Exception handling and thread safety
  • Log levels
  • Documentation (project documentation, javadoc, feature design docs)
  • Any assumptions made in the patch that might not be practical or that could be cumbersome to manage
  • Increase in complexity of installation, use, or operability

Becoming a committer

"What do I need to do in order to become a committer?" The simple (though frustrating) answer to this question is, "If you want to become a committer, behave like a committer." If you follow this advice, then rest assured that the PMC will notice, and committership will seek you out rather than the other way around. So besides continuing to contribute high-quality code and tests, there are many other things that you should naturally be undertaking as part of getting deeper into the project's life:

  • Help out users and other developers on the mailing lists, in JIRA, and in IRC
  • Review and test the patches submitted by others; this can help to offload the burden on existing committers, who will definitely appreciate your efforts
  • Participate in discussions about releases, roadmaps, architecture, and long-term plans
  • Help improve the website and the wiki
  • Participate in (or even initiate) real-world events such as user/developer meetups, papers/talks at conferences, etc
  • Improve project infrastructure in order to increase the efficiency of committers and other contributors
  • Help raise the project's quality bar (e.g. by setting up code coverage analysis)
  • As much as possible, keep your activity sustained rather than sporadic

Stay involved

Contributors should join the Lens mailing lists, in particular the commit list (to see changes as they are made), the dev list (to join discussions of changes) and the user list (to help others). Also refer to the Apache contributors guide and the Apache voting process.

Developer FAQ

How to update documentation?

New doc files can be added in src/site of the parent module, or doc changes can be made in existing files under src/site. Update the config files and run the TestGenerateConfigDoc test class to update the config docs. Once the changes are done, the contributor can run the command below.

  mvn site:run

This will start a local doc server at http://localhost:8080, which can be opened in a browser to validate the docs.

How to update the config docs?

Add, delete or modify the config in the resource config files (e.g. lens-client-default.xml for the Client resource). Then run TestGenerateConfigDoc, which automatically updates the config apt files under src/site.

Resource     Config file name         Location of config property file
Server       lensserver-default.xml   lens-server/src/main/resources/lensserver-default.xml
Client       lens-client-default.xml  lens-client/src/main/resources/lens-client-default.xml
Session      lenssession-default.xml  lens-server/src/main/resources/lenssession-default.xml
Hive Driver  hivedriver-default.xml   lens-driver-hive/src/main/resources/hivedriver-default.xml
JDBC Driver  jdbcdriver-default.xml   lens-driver-jdbc/src/main/resources/jdbcdriver-default.xml
Cube         olap-query-conf.xml      lens-cube/src/main/resources/olap-query-conf.xml

How to update CLI doc?

The CLI doc is auto-generated by reading the annotations in the CLI Java files. If you've added a new CLI command or modified an existing one, please make sure there is enough documentation in the help section of the command's annotations; look at other commands to get an idea. After any modification, run TestGenerateCLIUserDoc to regenerate the CLI doc automatically. Once you do that, start the site locally and verify the change is reflected at http://localhost:8080/user/cli.html. Also verify that it doesn't look out of place next to the documentation of neighbouring commands.

How to add license headers for newly added files?

Run the command mvn license:format. This adds license headers to all files automatically. If some files need to be excluded, they should be listed in the excludes section of the license-maven-plugin in the parent project pom.

How to check all licenses are fine?

Run the command mvn apache-rat:check. If the check needs to be skipped for any file, the file should be listed in the excludes section of the apache-rat-plugin in the parent project pom.

What is versioning strategy in Lens?

Lens follows three-number versioning: major.minor.revision. If the current release is 2.0.0, the next usual development release will be 2.1.0. If a separate release needs to be made on top of a released version rather than from the development branch (usually a critical patch release), it will be 2.0.1. If the next release is not compatible with the previous release, the major version is incremented and it becomes 3.0.0. This way all 2.x.x releases are compatible with one another, and incompatibility is clearly communicated to users through the major version number change.

The jira fix version for all the issues in 2.0.x is called 2.0, and 2.1.x is called 2.1. For patch releases from release branch, the jira fix version can be exact patch release version number.
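
The version arithmetic described above can be written out as a short sketch. The function name next_version and the release kind names (major, minor, patch) are hypothetical labels chosen for the example; the computed values follow the 2.0.0 examples in the text.

```shell
# Print the version that follows a major.minor.revision version:
#   minor  usual development release        2.0.0 becomes 2.1.0
#   patch  critical patch release           2.0.0 becomes 2.0.1
#   major  incompatible release             2.0.0 becomes 3.0.0
# next_version is a hypothetical helper for illustration only.
next_version() {
  version=$1
  kind=$2
  major=${version%%.*}     # text before the first dot
  rest=${version#*.}       # text after the first dot
  minor=${rest%%.*}
  revision=${rest#*.}
  case "$kind" in
    major) echo "$((major + 1)).0.0" ;;
    minor) echo "$major.$((minor + 1)).0" ;;
    patch) echo "$major.$minor.$((revision + 1))" ;;
    *) echo "unknown release kind: $kind" >&2; return 1 ;;
  esac
}
```

For instance, next_version 2.0.0 minor prints 2.1.0, matching the usual development release in the paragraph above.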

What is the branching strategy in Lens?

Lens has two main branches: master and current-release-line. All day-to-day development happens on the master branch. The current-release-line branch is used to make releases. When master is ready for a release (all improvements/features/bug fixes marked for the release are done and all tests are passing), master is merged into current-release-line and the version number on master is incremented to the next development version. Until the release is rolled, the only issues that can be cherry-picked into current-release-line are critical/blocker bugs, documentation changes and test case changes. Once all the issues are verified for the release, the release is triggered from current-release-line.

If a critical release (not pulling code from master) needs to be made, a new branch named after the release number is created by checking out the current-release-line branch, and the changes are put on that branch. Once the branch is ready, it is merged into current-release-line and released. Once the release is made, the changes should be cherry-picked back into master from current-release-line, resolving conflicts in master if any. Having two main branches means that all release tags are created on the current-release-line branch, and avoids the pile-up of old and stale branches that results from creating one branch per release.

For major version increments, current-release-line will be branched to a $major.x-line branch, and current-release-line and master will move to the next major version.

Feature branches can be created from master if a feature is not being developed directly on the master branch. To have a feature branch created, a contributor can start a discussion thread on the dev list to reach consensus on whether it is required.