Note

This page is reference documentation. It describes only the class signature, not how to use the class. For the big picture, refer to the user guide.

nidl.estimators.probes.ModelProbing

class nidl.estimators.probes.ModelProbing(embedding_estimator, probe, scoring=None, **kwargs)[source]

Bases: BaseEstimator

Estimator to probe the representation of an embedding estimator.

It has the following logic during fit:

  1. Embed the training data through the embedding estimator (handles multi-GPU forward passes).

  2. Fit the probe on the training embeddings (handles multi-CPU training).

The score and predict methods then evaluate the probe on a dataset using the same logic: the data are embedded first, and the probe is applied to the embeddings.
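
The two-step logic above can be sketched outside of nidl with NumPy and scikit-learn. This is only an illustration under stated assumptions, not the library's implementation: the `embed` helper, the random-projection `transform_step` function, and the `make_batches` generator are hypothetical stand-ins, and batches are plain lists of (X, y) NumPy arrays rather than a real dataloader.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))  # hypothetical frozen "embedding model"


def transform_step(X):
    """Hypothetical embedding function: maps a batch X to its embeddings."""
    return X @ W


def embed(batches, transform_step):
    """Step 1: embed every batch and stack embeddings and labels."""
    embeddings, labels = [], []
    for X, y in batches:
        embeddings.append(np.asarray(transform_step(X)))
        labels.append(np.asarray(y))
    return np.concatenate(embeddings), np.concatenate(labels)


def make_batches(n_batches, batch_size):
    """Hypothetical toy data: a list of (X, y) batches."""
    batches = []
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, 8))
        y = (X[:, 0] > 0).astype(int)
        batches.append((X, y))
    return batches


# Fit: embed the training batches, then fit the probe on the embeddings.
Z_train, y_train = embed(make_batches(5, 16), transform_step)
probe = LogisticRegression().fit(Z_train, y_train)

# Predict: same logic, embed first, then apply the fitted probe.
Z_test, y_test = embed(make_batches(2, 16), transform_step)
y_pred = probe.predict(Z_test)
```

The embedding function is never updated here; only the probe is trained, which is what makes the probe's score a measure of the representation.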

Parameters:
embedding_estimator: BaseEstimator

The estimator to be probed. It must implement the transform_step method that takes a batch of data X and returns the corresponding embeddings.

probe: sklearn.base.BaseEstimator

The probe model to be trained on the embeddings. It must implement fit and predict methods that operate on NumPy arrays.

scoring: str, callable, list, tuple, or dict, default=None

Strategy to evaluate the performance of the probe when calling the score method on this estimator.

If scoring represents a single score, one can use:

  • a single string naming the metric;

  • a callable that returns a single value.

If scoring represents multiple scores, one can use:

  • a list or tuple of unique strings;

  • a callable returning a dictionary where the keys are the metric names and the values are the metric scores;

  • a dictionary with metric names as keys and callables as values.
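
The scoring forms listed above can be illustrated with scikit-learn alone. The sketch below assumes the scikit-learn scorer convention, in which a callable is invoked as `scorer(estimator, X, y)`; this page does not confirm that ModelProbing uses the same convention. The `evaluate` helper and the `as_list`, `as_dict`, and `as_callable` names are hypothetical and exist only for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    get_scorer,
    make_scorer,
)

# Toy embeddings and labels standing in for embedded data.
rng = np.random.default_rng(0)
Z = rng.normal(size=(40, 4))
y = (Z[:, 0] > 0).astype(int)
probe = LogisticRegression().fit(Z, y)

# Multiple scores as a list of unique strings.
as_list = ["accuracy", "balanced_accuracy"]

# Multiple scores as a dictionary of metric names to callables.
as_dict = {
    "acc": make_scorer(accuracy_score),
    "bacc": make_scorer(balanced_accuracy_score),
}


# Multiple scores as one callable returning a dictionary.
def as_callable(estimator, X, y_true):
    y_pred = estimator.predict(X)
    return {
        "acc": accuracy_score(y_true, y_pred),
        "bacc": balanced_accuracy_score(y_true, y_pred),
    }


def evaluate(scoring, estimator, X, y_true):
    """Hypothetical helper: apply any of the scoring forms to a fitted probe."""
    if isinstance(scoring, str):
        return get_scorer(scoring)(estimator, X, y_true)
    if callable(scoring):
        return scoring(estimator, X, y_true)
    if isinstance(scoring, (list, tuple)):
        return {name: get_scorer(name)(estimator, X, y_true) for name in scoring}
    return {name: fn(estimator, X, y_true) for name, fn in scoring.items()}


single = evaluate("accuracy", probe, Z, y)   # one number
res_list = evaluate(as_list, probe, Z, y)    # dict keyed by metric string
res_dict = evaluate(as_dict, probe, Z, y)    # dict keyed by "acc", "bacc"
res_call = evaluate(as_callable, probe, Z, y)
```

A single string or single-value callable yields one number, while the three multi-metric forms yield a dictionary, which matches the float-or-dict return described for the score method.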

Examples

>>> from sklearn.linear_model import LogisticRegression
>>> from nidl.estimators.probes import ModelProbing
>>> from nidl.dummy import DummyEmbeddingEstimator
>>> probing = ModelProbing(
...     embedding_estimator=DummyEmbeddingEstimator(),
...     probe=LogisticRegression(),
...     scoring=["accuracy", "balanced_accuracy"],
... )
>>> probing.fit(train_dataloader)
ModelProbing(...)
>>> probing.score(test_dataloader)
{'accuracy': 0.85, 'balanced_accuracy': 0.83}

__init__(embedding_estimator, probe, scoring=None, **kwargs)[source]

fit(train_dataloader, val_dataloader=None)[source]

Fit the probe on the training data embeddings.

Parameters:
train_dataloader: torch.utils.data.DataLoader

Training dataloader yielding batches of the form (X, y). Each X is embedded, and the probe is then trained on the resulting embeddings and the labels y.

val_dataloader: torch.utils.data.DataLoader or None, default=None

Ignored.

Returns:
self: object

The fitted estimator.
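
A dataloader of the expected (X, y) form can be built with standard PyTorch utilities. The sketch below uses random tensors; the shapes, batch size, and variable names are assumptions chosen only for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data: 100 samples with 8 features each and binary labels.
X = torch.randn(100, 8)
y = torch.randint(0, 2, (100,))

# TensorDataset pairs each sample with its label, so every batch
# produced by the DataLoader unpacks as (X, y).
train_dataloader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=False)

batch_X, batch_y = next(iter(train_dataloader))
```

Each batch unpacks into a feature tensor and a label tensor with the same leading dimension, which is the form the fit, predict, and score methods document.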

predict(test_dataloader)[source]

Predict the labels on the test dataset.

Parameters:
test_dataloader: torch.utils.data.DataLoader

Testing dataloader yielding batches in the form (X, y). y is ignored here.

Returns:
y_pred: torch.Tensor

The predicted labels.

score(test_dataloader, scoring=None)[source]

Score the probe on the test dataset.

Parameters:
test_dataloader: torch.utils.data.DataLoader

Testing dataloader yielding batches in the form (X, y). y must have the same number of samples as X.

scoring: str, callable, list, tuple, or dict, default=None

Strategy to evaluate the performance of the probe. This overrides the default scoring strategy defined at initialization.

If scoring represents a single score, one can use:

  • a single string naming the metric;

  • a callable that returns a single value.

If scoring represents multiple scores, one can use:

  • a list or tuple of unique strings;

  • a callable returning a dictionary where the keys are the metric names and the values are the metric scores;

  • a dictionary with metric names as keys and callables as values.

Returns:
float or dict

The score(s) of the probe on the data embeddings. If a single score is used in the scoring strategy, returns a float. If multiple scores are defined, returns a dictionary with metric names as keys and metric scores as values.

Examples using nidl.estimators.probes.ModelProbing

Model probing of embedding estimators
