Apache Commons Statistics User Guide
Overview
Apache Commons Statistics provides utilities for statistical applications. The code
originated in the Apache Commons Math project.
Commons Statistics is divided into a number of submodules:
Example Modules
In addition to the modules above, the Commons Statistics source distribution contains example code demonstrating library functionality and/or providing useful development utilities. These modules are not part of the public API of the library, and no guarantees are made concerning backwards compatibility. The example module parent page contains a listing of available modules.
Probability Distributions
Overview
The API
The distribution framework provides the means to compute probability density, probability mass and cumulative probability functions for several well-known discrete (integer-valued) and continuous probability distributions. The API also allows for the computation of inverse cumulative probabilities and sampling from distributions.
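For an instance f of a distribution F and a domain value x, f.cumulativeProbability(x) computes \(P(X \le x)\), where X is a random variable distributed as F. Its complement, f.survivalProbability(x), computes \(P(X \gt x)\).

    TDistribution t = TDistribution.of(29);
    double lowerTail = t.cumulativeProbability(-2.656); // P(T(29) <= -2.656)
    double upperTail = t.survivalProbability(2.75);     // P(T(29) > 2.75)
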
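For discrete distributions, f.probability(x) computes the probability mass \(P(X = x)\); for continuous distributions, f.density(x) computes the probability density. Distributions also provide f.probability(x1, x2), which computes \(P(x_1 \lt X \le x_2)\).

    PoissonDistribution pd = PoissonDistribution.of(1.23);
    double p1 = pd.probability(5);    // P(X = 5)
    double p2 = pd.probability(5, 5); // P(5 < X <= 5)
    double p3 = pd.probability(4, 5); // P(4 < X <= 5)
    // p2 == 0
    // p1 == p3
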
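The relationship between f.probability(x) and f.probability(x1, x2) can be checked without the library. The sketch below uses a hypothetical stand-alone class, PoissonIntervalSketch (not part of Commons Statistics), that computes the Poisson probability mass directly and sums it over the integers in \((x_1, x_2]\). It assumes a small mean, so the direct formula neither overflows nor underflows.

```java
/**
 * Hypothetical sketch, not the Commons Statistics implementation:
 * the Poisson probability mass P(X = k) and the interval probability
 * P(x1 < X <= x2) obtained by summing the mass over x1 + 1, ..., x2.
 */
final class PoissonIntervalSketch {
    private PoissonIntervalSketch() {}

    /** P(X = k) = mean^k e^-mean / k!, evaluated in log space; 0 outside the support. */
    static double pmf(double mean, int k) {
        if (k < 0) {
            return 0.0;
        }
        double logP = k * Math.log(mean) - mean;
        for (long i = 2; i <= k; i++) {
            logP -= Math.log(i); // accumulates -log(k!)
        }
        return Math.exp(logP);
    }

    /** P(x1 < X <= x2); the interval is empty (probability 0) when x1 == x2. */
    static double probability(double mean, int x1, int x2) {
        if (x1 > x2) {
            throw new IllegalArgumentException("x1 > x2: " + x1 + " > " + x2);
        }
        double sum = 0.0;
        // long counter avoids int overflow when x2 == Integer.MAX_VALUE
        for (long k = x1 + 1L; k <= x2; k++) {
            sum += pmf(mean, (int) k);
        }
        return sum;
    }

    public static void main(String[] args) {
        double p1 = pmf(1.23, 5);
        double p2 = probability(1.23, 5, 5); // (5, 5] contains no integers
        double p3 = probability(1.23, 4, 5); // (4, 5] contains only 5
        System.out.println(p1 + " " + p2 + " " + p3);
    }
}
```

Because the interval \((4, 5]\) holds the single integer 5, the sum reduces to one term and p3 equals p1 exactly, while the empty interval \((5, 5]\) gives 0.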
Inverse distribution functions can be computed using the inverseCumulativeProbability and inverseSurvivalProbability methods. For a continuous distribution f and a probability p, f.inverseCumulativeProbability(p) returns
\[ x = \begin{cases} \inf \{ x \in \mathbb R : P(X \le x) \ge p\} & \text{for } 0 \lt p \le 1 \\ \inf \{ x \in \mathbb R : P(X \le x) \gt 0 \} & \text{for } p = 0 \end{cases} \]
where X is a random variable distributed as F. Likewise, f.inverseSurvivalProbability(p) returns
\[ x = \begin{cases} \inf \{ x \in \mathbb R : P(X \ge x) \le p\} & \text{for } 0 \le p \lt 1 \\ \inf \{ x \in \mathbb R : P(X \ge x) \lt 1 \} & \text{for } p = 1 \end{cases} \]

    NormalDistribution n = NormalDistribution.of(0, 1);
    double x1 = n.inverseCumulativeProbability(1e-300);
    double x2 = n.inverseSurvivalProbability(1e-300);
    // x1 == -x2 ~ -37.0471

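For an integer-valued distribution the infimum ranges over the integers, so the inverse can be found by walking up the support until the cumulative probability first reaches p. The hypothetical class below (DiscreteInverseSketch, not part of Commons Statistics) illustrates only the definition, not how the library computes it, for a Poisson distribution with a small mean. It accepts \(0 \le p \lt 1\) and shows that the \(\ge\) in the definition maps an attained value \(p = P(X \le x)\) back to x itself.

```java
/**
 * Hypothetical sketch of the discrete inverse cumulative probability
 * x = inf{x in Z : P(X <= x) >= p} for a Poisson distribution, using a
 * linear search up the support. Illustration only.
 */
final class DiscreteInverseSketch {
    private DiscreteInverseSketch() {}

    /** P(X <= x), accumulated with the recurrence P(X = k) = P(X = k - 1) * mean / k. */
    static double cumulativeProbability(double mean, int x) {
        if (x < 0) {
            return 0.0;
        }
        double pmf = Math.exp(-mean); // P(X = 0)
        double cdf = pmf;
        for (long k = 1; k <= x; k++) {
            pmf *= mean / k;
            cdf += pmf;
        }
        return cdf;
    }

    /** Smallest integer x with P(X <= x) >= p, for 0 <= p < 1. */
    static int inverseCumulativeProbability(double mean, double p) {
        if (!(p >= 0 && p < 1)) {
            throw new IllegalArgumentException("p must be in [0, 1): " + p);
        }
        if (p == 0) {
            return 0; // inf{x : P(X <= x) > 0} is the lower support bound
        }
        // Same recurrence and summation order as cumulativeProbability,
        // so an attained value p == P(X <= x) maps back to exactly x.
        double pmf = Math.exp(-mean);
        double cdf = pmf;
        int x = 0;
        while (cdf < p) {
            x++;
            pmf *= mean / x;
            cdf += pmf;
            if (pmf == 0.0) {
                break; // terms underflowed: the running sum can no longer grow
            }
        }
        return x;
    }

    public static void main(String[] args) {
        double mean = 1.23;
        double c2 = cumulativeProbability(mean, 2);
        System.out.println(inverseCumulativeProbability(mean, c2));               // 2
        System.out.println(inverseCumulativeProbability(mean, Math.nextUp(c2))); // 3
    }
}
```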
For discrete distributions the definitions are the same, with \(\mathbb Z\) (the integers) in place of \(\mathbb R\).
All distributions provide accessors for the parameters used to create the distribution, as well as the mean and variance. The return value when the mean or variance is undefined is noted in the class javadoc.

    ChiSquaredDistribution chi2 = ChiSquaredDistribution.of(42);
    double df = chi2.getDegreesOfFreedom(); // 42
    double mean = chi2.getMean();           // 42
    double variance = chi2.getVariance();   // 84

    CauchyDistribution cauchy = CauchyDistribution.of(1.23, 4.56);
    double location = cauchy.getLocation();   // 1.23
    double scale = cauchy.getScale();         // 4.56
    double undefined1 = cauchy.getMean();     // NaN
    double undefined2 = cauchy.getVariance(); // NaN

The supported domain of the distribution is provided by the getSupportLowerBound and getSupportUpperBound methods.

    BinomialDistribution b = BinomialDistribution.of(13, 0.15);
    int lower = b.getSupportLowerBound(); // 0
    int upper = b.getSupportUpperBound(); // 13

All distributions implement a createSampler method that takes a UniformRandomProvider, the source of randomness defined in Commons RNG, and returns a sampler for generating random values from the distribution.

    // From Commons RNG Simple
    UniformRandomProvider rng = RandomSource.KISS.create(123L);

    NormalDistribution n = NormalDistribution.of(0, 1);
    double x = n.createSampler(rng).sample();

    // Generate a number of samples
    GeometricDistribution g = GeometricDistribution.of(0.75);
    int[] k = g.createSampler(rng).samples(100).toArray();
    // k.length == 100

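One common way to build such a sampler is inverse transform sampling. The sketch below assumes the geometric distribution counts failures before the first success (support 0, 1, 2, ...) and uses java.util.SplittableRandom as a stand-in for a Commons RNG UniformRandomProvider, so that it runs with the JDK alone. GeometricSamplerSketch is a hypothetical class, not the library's sampler; each sample is \(\lfloor \ln U / \ln(1 - p) \rfloor\) for U uniform on (0, 1], which gives \(P(K \ge k) = (1 - p)^k\).

```java
import java.util.SplittableRandom;
import java.util.stream.IntStream;

/**
 * Hypothetical inverse-transform sampler for a geometric distribution
 * (failures before the first success). SplittableRandom stands in for a
 * Commons RNG UniformRandomProvider; like it, the generator is mutable and
 * not thread-safe, so each thread should use its own sampler instance.
 */
final class GeometricSamplerSketch {
    private final SplittableRandom rng;
    private final double log1mp; // log(1 - p), precomputed once

    GeometricSamplerSketch(double p, SplittableRandom rng) {
        if (!(p > 0 && p <= 1)) {
            throw new IllegalArgumentException("p must be in (0, 1]: " + p);
        }
        this.rng = rng;
        this.log1mp = Math.log1p(-p);
    }

    /** Returns floor(log(U) / log(1 - p)) with U uniform on (0, 1]. */
    int sample() {
        double u = 1.0 - rng.nextDouble(); // nextDouble() is in [0, 1), so u is in (0, 1]
        return (int) Math.floor(Math.log(u) / log1mp);
    }

    /** A sequential stream of n samples. */
    IntStream samples(long n) {
        return IntStream.generate(this::sample).limit(n);
    }

    public static void main(String[] args) {
        SplittableRandom root = new SplittableRandom(123L);
        // split() derives an independent generator, e.g. one per worker thread
        int[] k = new GeometricSamplerSketch(0.75, root.split()).samples(100).toArray();
        System.out.println(k.length); // 100
    }
}
```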
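Note that even when distributions are immutable, the sampler is not immutable as it
depends on the instance of the mutable UniformRandomProvider. A multi-threaded application should therefore use a separate UniformRandomProvider instance for each thread.
Implementation Details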
Instances are constructed using factory methods, typically a static method in the
distribution class named of.
Exceptions will be raised by the factory method when constructing the distribution using invalid parameters. See the class javadoc for exception conditions.
Unless otherwise noted, distribution instances are immutable. This allows sharing an instance between threads for computations.
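The construction pattern can be sketched as follows, assuming a hypothetical ExponentialSketch class (not a Commons Statistics type): a private constructor, a static factory named of that rejects invalid parameters, and only final fields, so a constructed instance is immutable and safe to share between threads. For an x below the support it returns a boundary value rather than throwing.

```java
/**
 * Hypothetical immutable distribution illustrating the factory-method
 * pattern; not part of Commons Statistics.
 */
final class ExponentialSketch {
    private final double mean; // final: the instance never changes after construction

    private ExponentialSketch(double mean) {
        this.mean = mean;
    }

    /** Factory method: validates the parameter before creating an instance. */
    static ExponentialSketch of(double mean) {
        // Rejects zero, negative, infinite and NaN means
        if (!(mean > 0 && mean < Double.POSITIVE_INFINITY)) {
            throw new IllegalArgumentException("mean must be positive and finite: " + mean);
        }
        return new ExponentialSketch(mean);
    }

    double getMean() {
        return mean;
    }

    /** P(X <= x); returns 0 below the support instead of throwing. */
    double cumulativeProbability(double x) {
        return x <= 0 ? 0.0 : -Math.expm1(-x / mean);
    }

    /** P(X > x); returns 1 below the support instead of throwing. */
    double survivalProbability(double x) {
        return x <= 0 ? 1.0 : Math.exp(-x / mean);
    }

    public static void main(String[] args) {
        ExponentialSketch e = ExponentialSketch.of(2.0);
        System.out.println(e.cumulativeProbability(-1.0)); // 0.0
        try {
            ExponentialSketch.of(-1.0);
        } catch (IllegalArgumentException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```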
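Exceptions will not be raised by distributions for an invalid x argument to the probability functions. For an argument outside the supported domain, the cumulative probability is 0 or 1 depending on which side of the support the argument falls, and the density or probability mass is 0.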
An exception will be raised by distributions for an invalid p argument to the inverse probability functions; the argument must be in the range [0, 1].
Complementary Probabilities
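The distributions provide the cumulative probability p and its complement, the survival probability q = 1 - p. When q is small, computing it from the cumulative probability as 1 - p can cause a dramatic loss of accuracy: representable double values are far denser near 0 than near 1, so a cumulative probability close to 1 cannot retain the small difference q.
The difference is illustrated with the result of computing the upper tail of a probability distribution.

    ChiSquaredDistribution chi2 = ChiSquaredDistribution.of(42);
    double q1 = 1 - chi2.cumulativeProbability(168);
    double q2 = chi2.survivalProbability(168);
    // q1 == 0
    // q2 != 0
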
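In this case the cumulative probability rounds to 1.0, so 1 - p evaluates to 0 and the upper tail probability is lost entirely; the survival function computes q directly and retains it.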
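The same effect can be reproduced with the JDK alone. The sketch below substitutes an exponential distribution with mean 1 for the chi-squared example, because both tails have closed forms: \(P(X \le x) = 1 - e^{-x}\) and \(P(X \gt x) = e^{-x}\). ComplementSketch is a hypothetical class written only for this illustration.

```java
/**
 * Hypothetical illustration of cancellation in 1 - p, using an exponential
 * distribution with mean 1 (closed-form lower and upper tails).
 */
final class ComplementSketch {
    private ComplementSketch() {}

    /** Lower tail P(X <= x) = 1 - exp(-x), evaluated naively. */
    static double cdf(double x) {
        return x <= 0 ? 0.0 : 1.0 - Math.exp(-x);
    }

    /** Upper tail P(X > x) = exp(-x), computed directly. */
    static double sf(double x) {
        return x <= 0 ? 1.0 : Math.exp(-x);
    }

    public static void main(String[] args) {
        double x = 40;
        double q1 = 1 - cdf(x); // cdf(40) rounds to 1.0, so q1 is 0
        double q2 = sf(x);      // about 4.25e-18
        System.out.println("q1 = " + q1 + ", q2 = " + q2);
        // Spacing of doubles just below 1 is 2^-53 (about 1.1e-16), far larger than q2
        System.out.println(Math.ulp(Math.nextDown(1.0)));
    }
}
```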
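Probability computations should use the appropriate cumulative or survival function
to calculate the lower or upper tail respectively. The same care should be applied
when inverting probability distributions. It is preferred to compute whichever of p or q is small without loss of accuracy, and then invert it, using the inverse cumulative probability for p or the inverse survival probability for q, to obtain x.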

    ChiSquaredDistribution chi2 = ChiSquaredDistribution.of(42);
    double q = 5.43e-17;
    // Incorrect: p = 1 - q == 1.0 !!!
    double x1 = chi2.inverseCumulativeProbability(1 - q);
    // Correct: invert q
    double x2 = chi2.inverseSurvivalProbability(q);
    // x1 == +infinity
    // x2 ~ 168.0

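The inversion pitfall can be reproduced in the same way. Assuming an exponential distribution with mean 1, the inverse cumulative probability is \(x = -\ln(1 - p)\) and the inverse survival probability is \(x = -\ln q\); the hypothetical InversionSketch class below evaluates both for the same small q.

```java
/**
 * Hypothetical illustration: inverting p = 1 - q versus inverting q directly
 * for an exponential distribution with mean 1.
 */
final class InversionSketch {
    private InversionSketch() {}

    /** x such that P(X <= x) = p, i.e. -log(1 - p); log1p keeps small p accurate. */
    static double inverseCdf(double p) {
        return -Math.log1p(-p);
    }

    /** x such that P(X > x) = q, i.e. -log(q). */
    static double inverseSf(double q) {
        return -Math.log(q);
    }

    public static void main(String[] args) {
        double q = 5.43e-17;
        double x1 = inverseCdf(1 - q); // 1 - q rounds to 1.0, so x1 = -log(0) = +infinity
        double x2 = inverseSf(q);      // finite, about 37.45
        System.out.println("x1 = " + x1 + ", x2 = " + x2);
    }
}
```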
Note: The survival probability functions were not present in the Commons Math distribution package (org.apache.commons.math3.distribution).