UnivariateAmputer

class ampute.UnivariateAmputer(strategy='mcar', subset=None, ratio_missingness=0.5, copy=True, random_state=None)

Ampute a dataset in a univariate manner.

Univariate amputation refers to the introduction of missing values, one feature at a time.

Parameters
strategy : str, default="mcar"

The missingness strategy to ampute the data. Possible choices are:

  • "mcar": missing completely at random. With this strategy, the values of a feature are amputated independently of the other features.
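As a rough illustration of what "missing completely at random" means, the sketch below amputates a NumPy array one feature at a time, using a mask that never looks at the data. It is not the library's implementation; the helper name `mcar_mask` and the use of a uniform draw per sample are assumptions made for the example.

```python
import numpy as np


def mcar_mask(n_samples, ratio, rng):
    """Return a boolean mask marking which samples of one feature go missing.

    Illustrative helper, not part of ampute: each entry is drawn
    independently of the data, which is what makes the mechanism MCAR.
    """
    return rng.uniform(size=n_samples) < ratio


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))

X_amputed = X.copy()
for j in range(X.shape[1]):
    # One feature at a time; the mask depends only on the random draw.
    X_amputed[mcar_mask(X.shape[0], 0.5, rng), j] = np.nan

# Fraction of missing values per feature, close to the requested ratio.
missing_fraction = np.isnan(X_amputed).mean(axis=0)
```

With independent draws the realized fraction per feature only approximates the requested ratio; an implementation that needs an exact count would sample indices without replacement instead.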

subset : list of {int, str}, int or float, default=None

The subset of the features to be amputated. The possible choices are:

  • None: all features are amputated.

  • list of {int, str}: the indices or names of the features to be amputated.

  • float: the ratio of features to be amputated.

  • int: the number of features to be amputated.
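The four accepted forms of subset can be pictured as different ways of arriving at a list of column indices. The helper below, `resolve_subset`, is hypothetical and only sketches that idea. It assumes that a float selects `int(subset * n_features)` columns and that int and float selections are drawn at random without replacement; neither detail is stated above.

```python
import numpy as np


def resolve_subset(subset, n_features, rng, feature_names=None):
    """Map a `subset` value to an array of column indices.

    Hypothetical helper, not ampute's code. Assumptions: a float selects
    int(subset * n_features) columns, and int/float selections are drawn
    at random without replacement.
    """
    if subset is None:
        return np.arange(n_features)
    if isinstance(subset, float):
        subset = int(subset * n_features)
    if isinstance(subset, (int, np.integer)):
        return np.sort(rng.choice(n_features, size=subset, replace=False))
    # Otherwise: a list mixing integer indices and feature names.
    indices = []
    for item in subset:
        if isinstance(item, str):
            if feature_names is None:
                raise ValueError("feature names are required to resolve strings")
            indices.append(list(feature_names).index(item))
        else:
            indices.append(int(item))
    return np.asarray(indices)


rng = np.random.default_rng(0)
names = ["age", "height", "weight", "income"]

all_idx = resolve_subset(None, 4, rng)
by_list = resolve_subset([0, "weight"], 4, rng, feature_names=names)
by_ratio = resolve_subset(0.5, 4, rng)
by_count = resolve_subset(3, 4, rng)
```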

ratio_missingness : float or array-like, default=0.5

The ratio representing the amount of missing data to be generated. If a float, all features to be amputated will have the same ratio. If an array-like, the ratio of missingness for each feature will be drawn from the array. It should be consistent with subset when a list is provided for subset.
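The sketch below shows one way to read an array-like ratio_missingness alongside a list-valued subset. It uses plain NumPy rather than the amputer, and it assumes that ratios are paired with the selected features by position, which is an interpretation of "consistent with subset" rather than something confirmed above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))

subset = [0, 2]                 # columns to amputate
ratio_missingness = [0.2, 0.7]  # assumed: one ratio per entry of `subset`

# Consistency check: as many ratios as selected features.
assert len(ratio_missingness) == len(subset)

X_amputed = X.copy()
for col, ratio in zip(subset, ratio_missingness):
    # Independent mask per feature, each with its own ratio.
    mask = rng.uniform(size=X.shape[0]) < ratio
    X_amputed[mask, col] = np.nan

# Column 1 is not in `subset`, so it keeps all of its values.
missing_fraction = np.isnan(X_amputed).mean(axis=0)
```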

copy : bool, default=True

Whether to perform the amputation in place or on a copy of the data. By default, a copy is made and the input is left unchanged.
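The difference between the two settings can be sketched with a small NumPy function. `amputate_first_column` is a hypothetical stand-in written for this example, not part of ampute; it only illustrates the general copy versus in-place contract.

```python
import numpy as np


def amputate_first_column(X, ratio, rng, copy=True):
    """Set a random share of the first column to NaN.

    Hypothetical stand-in for an amputer, used only to show the
    copy / in-place contract.
    """
    if copy:
        X = X.copy()
    mask = rng.uniform(size=X.shape[0]) < ratio
    X[mask, 0] = np.nan
    return X


rng = np.random.default_rng(0)
X = np.ones((100, 2))

# copy=True: the result holds the NaNs, the input stays intact.
X_out = amputate_first_column(X, 0.5, rng, copy=True)
original_untouched = not np.isnan(X).any()

# copy=False: the input array itself is modified and returned.
X_inplace = amputate_first_column(X, 0.5, rng, copy=False)
same_object = X_inplace is X
```

In-place operation saves memory on large arrays, but the original values are lost, so it only suits data that can be regenerated or reloaded.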

Attributes
amputated_features_indices_ : ndarray of shape (n_selected_features,)

The indices of the features that have been amputated.

Examples

>>> from ampute import UnivariateAmputer
>>> from numpy.random import default_rng
>>> rng = default_rng(0)
>>> n_samples, n_features = 5, 3
>>> X = rng.normal(size=(n_samples, n_features))

One can amputate values using the common transformer scikit-learn API:

>>> amputer = UnivariateAmputer(random_state=42)
>>> amputer.fit_transform(X)
array([[ 0.12573022, -0.13210486,  0.64042265],
       [        nan, -0.53566937,         nan],
       [        nan,         nan,         nan],
       [        nan,         nan,  0.04132598],
       [-2.32503077,         nan, -1.24591095]])

The amputer can be used in a scikit-learn Pipeline.

>>> from sklearn.impute import SimpleImputer
>>> from sklearn.pipeline import make_pipeline
>>> pipeline = make_pipeline(
...     UnivariateAmputer(random_state=42),
...     SimpleImputer(strategy="mean"),
... )
>>> pipeline.fit_transform(X)
array([[ 0.12573022, -0.13210486,  0.64042265],
       [-1.09965028, -0.53566937, -0.18805411],
       [-1.09965028, -0.33388712, -0.18805411],
       [-1.09965028, -0.33388712,  0.04132598],
       [-2.32503077, -0.33388712, -1.24591095]])

An amputer instance can also be called directly if you don’t need a sklearn.pipeline.Pipeline:

>>> from ampute import UnivariateAmputer
>>> UnivariateAmputer(random_state=42)(X)
array([[ 0.12573022, -0.13210486,  0.64042265],
       [        nan, -0.53566937,         nan],
       [        nan,         nan,         nan],
       [        nan,         nan,  0.04132598],
       [-2.32503077,         nan, -1.24591095]])

Methods

fit(X[, y])

Validate the parameters of the amputer.

fit_transform(X[, y])

Fit to data, then transform it.

get_params([deep])

Get parameters for this estimator.

set_params(**params)

Set the parameters of this estimator.

transform(X[, y])

Amputate the dataset X with missing values.

fit(X, y=None)

Validate the parameters of the amputer.

Parameters
X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

The dataset to be amputated.

y : Ignored

Present to follow the scikit-learn API.

Returns
self

The validated amputer.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
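The nested-parameter form can be tried on any scikit-learn pipeline. The sketch below uses only scikit-learn estimators to show the `<component>__<parameter>` convention; an amputer placed in the same pipeline would presumably be addressed through its own step name in the same way.

```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipeline = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())

# make_pipeline names each step after its lowercased class name,
# so the imputer is reachable as "simpleimputer".
pipeline.set_params(
    simpleimputer__strategy="median",
    standardscaler__with_mean=False,
)

# get_params (deep=True by default) exposes the same nested keys.
params = pipeline.get_params()
```

Because set_params returns the estimator itself, calls can be chained, which is how tools such as grid search reconfigure a pipeline between fits.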

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.

transform(X, y=None)

Amputate the dataset X with missing values.

Parameters
X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

The dataset to be amputated.

y : Ignored

Present to follow the scikit-learn API.

Returns
X_amputed : {ndarray, sparse matrix, dataframe} of shape (n_samples, n_features)

The dataset with missing values.
