Container for running an evaluation
An Evaluation is the running of one or more metrics on one or more target datasets and a reference dataset, which is optional unless binary metrics are used. An Evaluation handles two types of metrics, unary and binary, and its validity depends on the number and types of its metrics and datasets.
A unary metric is a metric that runs over a single dataset. If you add a unary metric to the Evaluation, you are only required to add either a reference dataset or a target dataset. If there are multiple datasets in the Evaluation, the unary metric is run over all of them.
A binary metric is a metric that runs over a reference dataset and a target dataset. If you add a binary metric, you are required to add a reference dataset and at least one target dataset. Binary metrics are run over every (reference dataset, target dataset) pair in the Evaluation.
An Evaluation must have at least one metric to be valid.
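For illustration, the two metric types might be implemented as in the sketch below. This is a minimal sketch: the UnaryMetric and BinaryMetric base-class names, the run signatures, and the values attribute on Dataset are assumptions inferred from the descriptions above, not confirmed API.

```python
import numpy as np

import metrics  # the metrics module referenced on this page; import path assumed


class TemporalMean(metrics.UnaryMetric):  # base-class name assumed
    '''Unary metric: runs over a single dataset.'''

    def run(self, target_dataset):
        # Average the dataset's values over the time axis
        # (a `values` ndarray attribute on Dataset is assumed).
        return np.mean(target_dataset.values, axis=0)


class Bias(metrics.BinaryMetric):  # base-class name assumed
    '''Binary metric: runs over a (reference dataset, target dataset) pair.'''

    def run(self, ref_dataset, target_dataset):
        # Difference between each target value and the reference value.
        return target_dataset.values - ref_dataset.values
```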
Default Evaluation constructor.
Parameters: the reference Dataset, the target Dataset(s), and the metrics for the Evaluation.
Raises: ValueError – If an invalid dataset or metric is given.
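For example, a minimal construction sketch reusing the hypothetical Bias and TemporalMean metrics above; the positional argument order (reference, targets, metrics) and the import path are assumptions:

```python
from evaluation import Evaluation  # import path assumed

# ref_ds, target_a, and target_b are assumed to be dataset.Dataset instances.
evaluation = Evaluation(ref_ds, [target_a, target_b], [Bias(), TemporalMean()])
```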
Add a Dataset to the Evaluation.
A target Dataset is compared against the reference dataset when the Evaluation is run with one or more metrics.
Parameters: target_dataset (dataset.Dataset) – The target Dataset to add to the Evaluation.
Raises: ValueError – If the dataset to add isn't an instance of Dataset.
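Continuing the hypothetical evaluation above:

```python
evaluation.add_dataset(target_c)         # target_c is a dataset.Dataset
evaluation.add_dataset("not a dataset")  # raises ValueError
```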
Add multiple Datasets to the Evaluation.
Parameters: target_datasets (list of dataset.Dataset) – The list of Datasets to add to the Evaluation.
Raises: ValueError – If a dataset to add isn't an instance of Dataset.
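For example, adding several hypothetical target Datasets at once:

```python
evaluation.add_datasets([target_d, target_e])  # each must be a dataset.Dataset
```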
Add a metric to the Evaluation.
A metric is an instance of a class which inherits from metrics.Metric.
Parameters: metric (metrics.Metric) – The metric instance to add to the Evaluation.
Raises: ValueError – If the metric to add isn't an instance of a class that inherits from metrics.Metric.
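For example, with the hypothetical Bias metric from above:

```python
evaluation.add_metric(Bias())  # an instance of a metrics.Metric subclass
evaluation.add_metric(Bias)    # raises ValueError: a class, not an instance
```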
Add multiple metrics to the Evaluation.
A metric is an instance of a class which inherits from metrics.Metric.
Parameters: metrics (list of metrics.Metric) – The list of metric instances to add to the Evaluation.
Raises: ValueError – If a metric to add isn't an instance of a class that inherits from metrics.Metric.
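For example, adding one binary and one unary metric from the sketch above:

```python
evaluation.add_metrics([Bias(), TemporalMean()])
```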
The list of “binary” metrics (metrics which take two Datasets) that the Evaluation should use.
A list containing the results of running the binary metric evaluations. The shape of results is (num_target_datasets, num_metrics) if the user doesn't specify subregion information; otherwise the shape is (num_target_datasets, num_metrics, num_subregions).
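Given that shape, results can be indexed as sketched below after a run; the index order here is read directly from the stated shape:

```python
# Without subregions: results[i][j] is metric j run on target dataset i.
first_metric_on_first_target = evaluation.results[0][0]

# With subregions: results[i][j][k] adds the subregion index k.
```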
Run the evaluation.
There are two phases to a run of the Evaluation. First, if there are any “binary” metrics, they are run. Binary metrics are only run if there is a reference dataset and at least one target dataset.
If subregion information is provided, each dataset is subset before being run through the binary metrics.
Note: Only the binary metrics are subset with subregion information.
Next, if there are any “unary” metrics, they are run. Unary metrics are only run if there is at least one target dataset or a reference dataset.
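A sketch of a complete run; the results and unary_results attribute names come from this page, the rest is assumed:

```python
evaluation.run()

print(evaluation.results)        # binary metric results, if any binary metrics
print(evaluation.unary_results)  # unary metric results, if any unary metrics
```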
The target dataset(s) which should each be compared with the reference dataset when the evaluation is run.
The list of “unary” metrics (metrics which take one Dataset) that the Evaluation should use.
A list containing the results of running the unary metric evaluations. The shape of unary_results is (num_targets, num_metrics), where num_targets = num_target_datasets + (1 if ref_dataset is not None else 0).
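As a worked example of that shape, assuming hypothetical attribute names target_datasets and ref_dataset for the stored datasets:

```python
# The reference dataset, when present, contributes one extra entry.
num_targets = len(evaluation.target_datasets) + (
    1 if evaluation.ref_dataset is not None else 0
)
num_metrics = len(evaluation.unary_metrics)
# unary_results then holds one result per (target, metric) pair,
# i.e. a (num_targets, num_metrics) structure.
```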