The interface taxonomy classification provided here is for guidance to developers and users of interfaces. The classification guides a developer to declare the targeted audience or users of an interface and also its stability.
Benefits to the user of an interface: Knows which interfaces to use or not use and their stability.
Benefits to the developer: to prevent accidental changes of interfaces and hence accidental impact on users or other components or system. This is particularly useful in large systems with many developers who may not all have a shared state/history of the project.
Hadoop adopts the following interface classification, this classification was derived from the OpenSolaris taxonomy and, to some extent, from taxonomy used inside Yahoo. Interfaces have two main attributes: Audience and Stability
Audience denotes the potential consumers of the interface. While many interfaces are internal/private to the implementation, other are public/external interfaces that are meant for wider consumption by applications and/or clients. For example, in posix, libc is an external or public interface, while large parts of the kernel are internal or private interfaces. Also, some interfaces are targeted towards other specific subsystems.
Identifying the audience of an interface helps define the impact of breaking it. For instance, it might be okay to break the compatibility of an interface whose audience is a small number of specific subsystems. On the other hand, it is probably not okay to break a protocol interface that millions of Internet users depend on.
Hadoop uses the following kinds of audience in order of increasing/wider visibility:
Hadoop doesn’t have a Company-Private classification, which is meant for APIs which are intended to be used by other projects within the company, since it doesn’t apply to opensource projects. Also, certain APIs are annotated as @VisibleForTesting (from com.google.common .annotations.VisibleForTesting) - these are meant to be used strictly for unit tests and should be treated as “Private” APIs.
A Private interface is for internal use within the project (such as HDFS or MapReduce) and should not be used by applications or by other projects. Most interfaces of a project are Private (also referred to as project-private). Unless an interface is intentionally exposed for external consumption, it should be marked Private.
A Limited-Private interface is used by a specified set of projects or systems (typically closely related projects). Other projects or systems should not use the interface. Changes to the interface will be communicated/negotiated with the specified projects. For example, in the Hadoop project, some interfaces are LimitedPrivate{HDFS, MapReduce} in that they are private to the HDFS and MapReduce projects.
Changes to an API fall into two broad categories: compatible and incompatible. A compatible change is a change that meets the following criteria:
Any change that does not meet these three criteria is an incompatible change. Stated simply a compatible change will not break existing clients. These examples are compatible changes:
These examples are incompatible changes:
Stability denotes how stable an interface is and when compatible and incompatible changes to the interface are allowed. Hadoop APIs have the following levels of stability.
A Stable interface is exposed as a preferred means of communication. A Stable interface is expected not to change incompatibly within a major release and hence serves as a safe development target. A Stable interface may evolve compatibly between minor releases.
Incompatible changes allowed: major (X.0.0) Compatible changes allowed: maintenance (x.Y.0)
An Evolving interface is typically exposed so that users or external code can make use of a feature before it has stabilized. The expectation that an interface should “eventually” stabilize and be promoted to Stable, however, is not a requirement for the interface to be labeled as Evolving.
Incompatible changes are allowed for Evolving interface only at minor releases.
Incompatible changes allowed: minor (x.Y.0) Compatible changes allowed: maintenance (x.y.Z)
An Unstable interface is one for which no compatibility guarantees are made. An Unstable interface is not necessarily unstable. An unstable interface is typically exposed because a user or external code needs to access an interface that is not intended for consumption. The interface is exposed as an Unstable interface to state clearly that even though the interface is exposed, it is not the preferred access path, and no compatibility guarantees are made for it.
Unless there is a reason to offer a compatibility guarantee on an interface, whether it is exposed or not, it should be labeled as Unstable. Private interfaces also should be Unstable in most cases.
Incompatible changes to Unstable interfaces are allowed at any time.
Incompatible changes allowed: maintenance (x.y.Z) Compatible changes allowed: maintenance (x.y.Z)
How will the classification be recorded for Hadoop APIs?
Each interface or class will have the audience and stability recorded using annotations in the org.apache.hadoop.classification package.
The javadoc generated by the maven target javadoc:javadoc lists only the public API.
One can derive the audience of java classes and java interfaces by the audience of the package in which they are contained. Hence it is useful to declare the audience of each java package as public or private (along with the private audience variations).
How will the classification be recorded for other interfaces, such as CLIs?
Why aren’t the java scopes (private, package private and public) good enough?
But I can easily access a Private interface if it is Java public. Where is the protection and control?
Why bother declaring the stability of a Private interface? Aren’t Private interfaces always Unstable?
What is the harm in applications using a Private interface that is Stable? How is it different from a Public Stable interface?
Why bother with Limited-Private? Isn’t it giving special treatment to some projects? That is not fair.
Let’s treat all Private interfaces as Limited-Private for all of Hadoop. What is the harm if projects in the Hadoop family have access to private classes?
Aren’t all Public interfaces Stable?