04. Loading from and Saving to Disk

Note: the appearance of this notebook will depend on the environment and screen size you're using. If the tables are clipped or the figures look off, consider trying Google Colab or GitHub via the buttons below. This notebook was created in VSCode, and will likely look best locally.

[Open in Google Colab] [View on GitHub]

Setup

So far, we've created all of the confusion matrices on the fly, directly in the notebook. This isn't very realistic: usually, a confusion matrix is produced by some validation loop in a different environment than the one used to analyze the results. In this notebook, we'll load some confusion matrices from the filesystem, and we'll save the report configuration to the filesystem for easy reproducibility.

As an example, let's say we conducted an experiment on the MNIST digits dataset, comparing the performance of an MLP against that of an SVM. We performed 5-fold cross-validation, and saved each fold's test-split confusion matrix as a '{model}_{fold}.csv' file.
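For context, the following is a minimal sketch of how such files might be produced. It is not part of prob_conf_mat: it assumes scikit-learn and NumPy are available, and the models, hyperparameters, and output directory are purely illustrative.

# Hypothetical data-generation sketch (not part of prob_conf_mat):
# run 5-fold cross-validation and save each fold's confusion matrix as a CSV file.
import os

import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

os.makedirs("./mnist_digits", exist_ok=True)

X, y = load_digits(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

for fold, (train_idx, test_idx) in enumerate(cv.split(X, y)):
    for model_name, model in [("mlp", MLPClassifier(max_iter=500)), ("svm", SVC())]:
        model.fit(X[train_idx], y[train_idx])
        predictions = model.predict(X[test_idx])

        # Rows are true labels, columns are predicted labels
        cm = confusion_matrix(y[test_idx], predictions)

        np.savetxt(f"./mnist_digits/{model_name}_{fold}.csv", cm, fmt="%d", delimiter=",")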

In [1]:
# Shows the contents of the './mnist_digits' directory
%ls ./mnist_digits
mlp_0.csv  mlp_2.csv  mlp_4.csv  svm_1.csv  svm_3.csv
mlp_1.csv  mlp_3.csv  svm_0.csv  svm_2.csv  svm_4.csv

First, let's configure a Study, where we analyze the models based on their macro-averaged F1 scores, aggregated across folds.

In [2]:
import prob_conf_mat as pcm

# Configure the study: the RNG seed, the number of samples, and the credible interval probability
study = pcm.Study(seed=0, num_samples=10000, ci_probability=0.95)

# Track the macro-averaged F1 score, aggregated across experiments with the 'fe_gaussian' method
study.add_metric(metric="f1@macro", aggregation="fe_gaussian")

The Study object requires us to pass in a valid confusion matrix for each experiment. Luckily, some useful utilities are provided in the prob_conf_mat.io module. Using Python's standard pathlib module, we can iterate over all of the produced confusion matrices and create an experiment for each one.

Make sure to provide a prior for both the prevalence and the confusion; otherwise, we'll see a lot of warnings.

In [3]:
from pathlib import Path

from prob_conf_mat.io import load_csv

# Iterate over all found csv files
for file_path in sorted(Path("./mnist_digits").glob("*.csv")):
    # Split the file name to recover the model and fold
    model, fold = file_path.stem.split("_")

    # Load in the confusion matrix using the utility function
    confusion_matrix = load_csv(location=file_path)

    # Add the experiment to the study
    study.add_experiment(
        experiment_name=f"{model}/fold_{fold}",  # The name of the experiment group and experiment
        confusion_matrix=confusion_matrix,  # The confusion matrix
        prevalence_prior="ones",
        confusion_prior="zeros",
    )

When we inspect the study, we see that all the necessary experiments and metrics have been introduced.

In [4]:
study
Out[4]:
Study(experiments=['mlp/fold_0', 'mlp/fold_1', 'mlp/fold_2', 'mlp/fold_3', 'mlp/fold_4', 'svm/fold_0', 'svm/fold_1', 'svm/fold_2', 'svm/fold_3', 'svm/fold_4'], metrics=['f1@macro'])

Now let's pretend to run an analysis.

In [5]:
report_1 = study.report_aggregated_metric_summaries(metric="f1@macro")

Saving the Study

Let's say you wanted to share your analysis with a colleague. You could share all of the necessary files as a directory, or equivalently, you could just provide the study's configuration. We can access the configuration as a Python dict using the .to_dict() method.

In [6]:
from pprint import pprint

study_config = study.to_dict()

pprint(study_config, indent=2, width=100, depth=3, sort_dicts=False)
{ 'seed': 0,
  'num_samples': 10000,
  'ci_probability': 0.95,
  'experiments': { 'mlp': { 'fold_0': {...},
                            'fold_1': {...},
                            'fold_2': {...},
                            'fold_3': {...},
                            'fold_4': {...}},
                   'svm': { 'fold_0': {...},
                            'fold_1': {...},
                            'fold_2': {...},
                            'fold_3': {...},
                            'fold_4': {...}}},
  'metrics': {'f1@macro': {'aggregation': 'fe_gaussian'}}}

This is a standard Python dict, and can be saved however you like (e.g., as a human-readable YAML or TOML file, as a JSON file, or as a pickled object). This can then be shared with others to replicate your work.
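For instance, here is one way to round-trip the configuration through a JSON file. The file name is illustrative, and this assumes the values in the configuration dict are JSON-serializable (e.g., confusion matrices stored as nested lists).

# Save the configuration to disk as JSON (file name is illustrative)
import json

with open("study_config.json", "w") as f:
    json.dump(study_config, f, indent=2)

# Later, or on a colleague's machine, load it back
with open("study_config.json", "r") as f:
    study_config = json.load(f)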

Specifically, we can recreate a study by using the .from_dict classmethod:

In [7]:
new_study = pcm.Study.from_dict(study_config)

The new study will be configured in the same way as before:

In [8]:
new_study.report_aggregated_metric_summaries(metric="f1@macro")
Out[8]:
Group  Median  Mode    HDI               MU      Kurtosis  Skew     Var. Within  Var. Between  I2
mlp    0.8585  0.8576  [0.8495, 0.8686]  0.0191  -0.0121   -0.0161  0.0001       0.0001        36.65%
svm    0.8544  0.8543  [0.8450, 0.8642]  0.0192  0.0039    -0.0568  0.0001       0.0001        35.08%

Note that the state of the RNG is not saved. This means that, generally speaking, the results will not be exactly the same: the order in which experiments and metrics are added to the study, as well as the order in which report elements are generated, influences the RNG's state.

However, in this case, this ordering is the same as before, and thus the output is the same as well.

In [9]:
report_1
Out[9]:
Group  Median  Mode    HDI               MU      Kurtosis  Skew     Var. Within  Var. Between  I2
mlp    0.8585  0.8576  [0.8495, 0.8686]  0.0191  -0.0121   -0.0161  0.0001       0.0001        36.65%
svm    0.8544  0.8543  [0.8450, 0.8642]  0.0192  0.0039    -0.0568  0.0001       0.0001        35.08%

With a large enough num_samples parameter, the study's outcomes will be robust to minor RNG changes.

Next Steps

This was the last tutorial. You should now be able to use prob_conf_mat with some confidence. The other documentation sections will help deepen your understanding of the library, when you're ready for it.

For more on IO:

For more advanced material:

  • Check out the how-to guide on extending the library with your own metrics or your own averaging, experiment aggregation, or IO methods
  • Check out the replication case-study for some advanced use-cases and custom plots