|SIP | 2 |
|Title | Sqoop 1.0 release criteria and maintenance policy |
|Author | Aaron Kimball (aaron at cloudera dot com) |
|Created | April 29, 2010 |
|Status | Accepted |
|Discussion| "http://github.com/cloudera/sqoop/issues/issue/9":http://github.com/cloudera/sqoop/issues/issue/9 |

h2. Abstract

This SIP describes a proposal for creating the first officially-tagged release of Sqoop. It outlines the remaining features which would need to be implemented before creating the release, as well as the version maintenance policy to be adopted moving forward.

h2. Problem statement

Sqoop has been provided to users through ad-hoc releases with editions of Cloudera's Distribution for Hadoop. But no version of Sqoop available to date has been deemed a canonical Sqoop release. This SIP proposes to answer the following questions:

* What constitutes the first release.
* What support should be provided for this release going forward.
* What release policy should be adopted for subsequent releases.

The specification below addresses each of these issues in turn.

h2. Specification

h3. Sqoop 1.0.0 Release

The first Sqoop release, Sqoop 1.0.0, will include all features currently available in Sqoop. In addition, the following new features should be added:

* The command-line API refactoring (proposed in [[SIP-1]])
* A version information command
* Better support for exports of large volumes of data (with intermediate checkpointing)
* A file format for large object storage (proposed in [[SIP-3]])
* A backwards-compatible public API (proposed in [[SIP-4]])

The version information command should be straightforward to implement. Improved export support will be provided by adding an OutputFormat that uses batch @INSERT@ statements and incremental spills. For the file format and API refactoring, separate improvement proposals will be filed.

h3. Release Support

This release will be included in CDH3, Cloudera's Distribution for Hadoop, and marked as Sqoop 1.0.0.
Subsequent releases of Sqoop with a _1.y.z_ number should remain API-compatible with Sqoop 1.0.0. *API compatibility* is defined as the following:

* The command-line API will provide at least the same degree of functionality: command-line arguments will not be removed in the 1.0 line. (1.y releases may deprecate arguments as a signal that they will be removed in a 2.0 release, but these deprecated arguments will remain present in all 1.y releases.)
** New arguments with additional functionality may be added in 1.y releases.
* Any code generated by Sqoop 1.0 will link and interoperate with any Sqoop 1.y library.
** More generally, code generated by Sqoop 1.x will link and interoperate with any Sqoop 1.y so long as x <= y.
* No internal APIs are currently considered stable. *Internal APIs may change* between releases in the 1.y series. This includes all programmatic APIs except those declared public in [[SIP-4]] (e.g., the @public@ members of the @org.apache.hadoop.sqoop.lib@ package).

Code generated by Sqoop is used to interpret records materialized to HDFS. This brings up the issue of data compatibility. The following *data compatibility* guarantees will be provided:

* For data imported to HDFS by Sqoop 1.0, code generated by Sqoop 1.0 will be able to interpret this data in the same way, when used in conjunction with any subsequent Sqoop 1.y library.

h3. Release Policy

Sqoop should endeavor to release as frequently as is reasonable. Bugfixes should be distributed to clients of Sqoop in a timely fashion. New features should be provided incrementally to elicit feedback and demonstrate forward progress.

h4. Terminology

The following terminology is used throughout this section:

* Major version: the first digit in a version number. e.g., "1.3.7" has a major version of "1".
* Minor version: the first two digits in a version number. e.g., "1.3.7" has a minor version of "1.3".
* Major series: All versions with the same major version. e.g., "1.2.0" and "1.4.7" are in the same major series (series "1").
* Minor series: All versions with the same minor version. e.g., "1.2.0" and "1.2.3" are in the same minor series (series "1.2").
* Bugfix release: a fully specified version. e.g., 1.2.3.
* End-of-life: A major or minor series has reached end-of-life when no new bugfixes will be provided for it. Users are expected to immediately upgrade to the next major or minor series as appropriate.
* End-of-development: a major series has reached end-of-development when no new minor versions are planned in its series. All subsequent feature development will occur on the next major version. Bugfixes will still be provided on the last minor version in a major series that has reached end-of-development, until that major series later reaches end-of-life.

h4. Bugfixes

For any version @x.y.z@, the next bugfix release @x.y.(z+1)@ should be distributed when the following criteria are all met:

* A sufficient number of bugfixes have been provided on top of @x.y.z@, or a bugfix of sufficient criticality has been provided. "Sufficient" shall be determined on a case-by-case basis.
* A minimum amount of "soak time" for the previous release has been provided. This should be a period of at least two weeks, to ensure that a "z+2" release does not need to be provided the day after a "z+1" release, except in fatally crippled cases (e.g., z+1 proves impossible to compile or install, or introduces a data loss bug, etc).
* A "y+2" release is not yet available. e.g., the 1.0.z minor series will automatically be considered as end-of-life when Sqoop 1.2.0 is released. 1.0.z bugfix releases will be provided so long as the 1.1.z series remains the most current minor series. The last 1.y minor series will receive bugfix releases until end-of-life is proposed for the Sqoop 1 major series. This will occur sufficiently far in the future that Sqoop 2.y is considered stable.
** While end-of-life for a minor series is automatic when two new minor series are available that supersede it, end-of-life for a major series will be declared only by SIP. A major series will not reach end-of-life until the subsequent major series is ready for immediate migration by all clients. End-of-life for a major series will not take place without ample notice to allow clients to develop a migration plan.
* No four-digit versions will be provided.

When a new bugfix release is provided in a given minor series, the previous bugfix releases in that minor series are considered obsolete. Users are expected to run the most current bugfix release in a given minor series.

h4. New features

New features will be provided only in a new major or minor series. Bugfix releases do not include new features.

* New features that do not represent backwards-incompatible changes will be provided in a new minor version. e.g., Sqoop 1.1 will contain new features not present in Sqoop 1.0.
* A new feature release will include all bugfixes present in the previous minor release's latest bugfix release. e.g., Sqoop 1.1 will include all bugfixes present in a hypothetical Sqoop 1.0.4 release.
* Forwards compatibility is not guaranteed. Sqoop 1.1 may generate code or import data in a fashion incompatible with Sqoop 1.0-based tools. Documentation will be included to highlight these differences.
* No new features in a minor release will result in backwards incompatibility. A feature that is fundamentally backwards-incompatible with Sqoop 1.0 will be included (at the earliest) in Sqoop 2.0.
** When a sufficient number of incompatible changes have accrued, the 1.0 line will be terminated as end-of-life. A change which significantly impairs the ability to backport code in an engineer-efficient fashion will be taken as "sufficient." This will not occur without a SIP announcing this intent and proposing a date for the end-of-life to take effect.

h4. Deprecation of features

* Features, command-line arguments, API mechanisms, etc. which should eventually be removed will be preserved for a given major version. Any command-line arguments or public APIs present in Sqoop 1.0 will remain present in Sqoop 1.1, 1.2, etc. These may disappear in Sqoop 2.0.
** When deprecation is certain for a feature, API, etc., then any subsequently released minor series in the current major series will mark it as such. e.g., if @--some-flag@ is to be removed in Sqoop 2.0, and the current minor series is 1.2, then Sqoop 1.3, 1.4, etc. will post warnings about the deprecated nature of this flag when it is used.
* The introduction of a sufficient number of deprecations will be used as grounds for end-of-development for the current major version. End-of-development for a major series will be preceded by a SIP announcing this intent.
* No features will be removed without a deprecation marker in at least one minor version. The final minor version on a major series may be identical to its predecessor with the exception of the introduction of deprecation flags publicising all intended deprecations.

h2. Compatibility Issues

h3. Hadoop versions

Sqoop 1.0 is, and is intended to remain, compatible with the trunk of the Apache Hadoop project. Currently, Sqoop depends on no outstanding patches of Apache Hadoop; therefore it will likely be compatible with the planned Hadoop 0.21 release. Sqoop 1.0 should be compatible with future releases of Hadoop (e.g., 0.22) subject to Hadoop's API deprecation policy. Subsequent editions of Sqoop should target 0.21-based Apache Hadoop.

Sqoop 1.0 is currently intended to be compatible with the beta release of CDH3 available at the Sqoop 1.0 ship date. Sqoop is currently tested with the (unreleased) CDH 3 beta 2. Sqoop 1.0 will not be compatible with the currently-released CDH 3 beta 1.
Nightly snapshots of CDH3b2 are available from Cloudera's maven repository at "https://repository.cloudera.com/nexus":https://repository.cloudera.com/nexus

If Sqoop uncovers bugs in Hadoop's database interface APIs (or other aspects of Hadoop), then later versions of Hadoop (e.g., 0.21.1) may be required to provide full correctness.

h3. Prior editions of Sqoop

Sqoop 1.0 *will not* be API-compatible with unversioned editions of Sqoop present in CDH3 beta 1 or CDH2. These Sqoop releases did not have a version number and are not subject to the end-of-development/end-of-life policy articulated in this document. Upon release of Sqoop 1.0, all prior unversioned Sqoop editions in CDH3 (beta) will be immediately declared end-of-life. All prior unversioned Sqoop editions in CDH2 are already implicitly in the end-of-development phase. Bugfix patches to CDH2-based Sqoop releases will be provided with CDH2 updates in accordance with Cloudera's update policy.

The code generated by unversioned releases of Sqoop will be incompatible with Sqoop 1.0.0. Sqoop 1.0.0 will generate code that interprets data in a fashion compatible with existing generated code created by unversioned releases of Sqoop. This guarantee does not extend to future releases in the 1.x major series.

h2. Test Plan

Current Sqoop functional and unit tests are executed daily against CDH3 beta 2 and Apache Hadoop (development trunk). These tests will be run on any release candidate and will be required to pass. A release candidate will be made available to the community before tagging the release as official. When Apache Hadoop makes a branch for version 0.21, tests will be run against that platform as well.

h2. Discussion

Please provide feedback and comments at "http://github.com/cloudera/sqoop/issues/issue/9":http://github.com/cloudera/sqoop/issues/issue/9
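
h2. Appendix: illustrative sketch of the versioning rules

The terminology in the Release Policy section, together with two of its more mechanical rules, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of Sqoop: the names @Version@, @generated_code_compatible@ and @minor_series_auto_eol@ are hypothetical. Extending the "1.x works with 1.y when x <= y" rule to any single major series is also an assumption, since the text only states it for the 1.y series. End-of-life for a major series is decided by SIP and is deliberately not modelled.

```python
from typing import Iterable, NamedTuple


class Version(NamedTuple):
    """A fully specified (bugfix) release such as 1.2.3. Hypothetical helper."""
    major: int
    minor: int
    bugfix: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        parts = text.split(".")
        # The policy states that no four-digit versions will be provided.
        if len(parts) != 3:
            raise ValueError("expected a version of the form x.y.z: %r" % text)
        return cls(*(int(p) for p in parts))

    def minor_series(self):
        # "1.2.0" and "1.2.3" share the minor series (1, 2).
        return (self.major, self.minor)


def generated_code_compatible(generated_by: Version, library: Version) -> bool:
    """Code generated by 1.x should link against a 1.y library when x <= y.

    Assumption: the same rule is applied within any one major series.
    """
    return (generated_by.major == library.major
            and generated_by.minor <= library.minor)


def minor_series_auto_eol(series: Version, released: Iterable[Version]) -> bool:
    """True once a "y+2" minor series exists in the same major series."""
    newest_minor = max(
        (v.minor for v in released if v.major == series.major),
        default=series.minor,
    )
    return newest_minor >= series.minor + 2
```

For example, under these assumptions @minor_series_auto_eol(Version.parse("1.0.3"), [Version.parse("1.2.0")])@ evaluates to true, matching the statement that the 1.0.z series reaches end-of-life when Sqoop 1.2.0 is released.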