UnivariateAmputer

class ampute.UnivariateAmputer(strategy='mcar', subset=None, ratio_missingness=0.5, copy=True, random_state=None)

Ampute a dataset in a univariate manner.

Univariate amputation refers to the introduction of missing values one feature at a time.
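To make the idea concrete, here is a minimal NumPy sketch of univariate MCAR amputation. It is not the ampute implementation: the helper `ampute_mcar` and its arguments are hypothetical, and it assumes that each entry of a selected feature is removed independently with probability `ratio` (a Bernoulli draw), so the number of missing values per feature varies around `ratio * n_samples`.

```python
import numpy as np


def ampute_mcar(X, ratio=0.5, columns=None, random_state=None):
    """Hypothetical sketch of univariate MCAR amputation (not ampute's code)."""
    rng = np.random.default_rng(random_state)
    # np.array copies X into a float array, so NaN can be stored.
    X_amputed = np.array(X, dtype=float)
    n_samples, n_features = X_amputed.shape
    if columns is None:
        columns = range(n_features)
    for col in columns:
        # One feature at a time: every entry of this column is dropped with
        # probability `ratio`, regardless of any value in X (MCAR).
        missing = rng.random(n_samples) < ratio
        X_amputed[missing, col] = np.nan
    return X_amputed


X = np.arange(12.0).reshape(4, 3)
X_amputed = ampute_mcar(X, ratio=0.5, columns=[0, 2], random_state=0)
```

Because each column is handled on its own and the random draws never look at the data, column 1 is left intact and the positions of the missing entries in columns 0 and 2 do not depend on any feature value.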

Parameters

strategy : str, default="mcar"

The missingness strategy used to ampute the data. Possible choices are:

"mcar": missing completely at random. With this strategy, the missing values of a feature are introduced without any dependence on the other features.

subset : list of {int, str}, int or float, default=None

The subset of the features to be amputated. The possible choices are:

None: all features are amputated.
list of {int, str}: the indices or names of the features to be amputated.
float: the ratio of features to be amputated.
int: the number of features to be amputated.

ratio_missingness : float or array-like, default=0.5

The ratio of missing values to generate. If a float, all features to be amputated share the same ratio. If an array-like, the ratio of missingness for each feature is drawn from the array; in that case, the array should be consistent with subset when subset is given as a list.

copy : bool, default=True

Whether to copy the data before amputation. If True (the default), a copy is made; if False, the amputation is performed in place.
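The sketch below shows one way the subset and ratio_missingness options could be turned into concrete column indices and per-feature ratios. Both helpers, `resolve_subset` and `per_feature_ratios`, are hypothetical. The rounding of a float subset, the uniform random choice of features, and the one-ratio-per-selected-feature reading of an array-like ratio_missingness are assumptions, not documented ampute behaviour.

```python
import numpy as np


def resolve_subset(subset, feature_names, random_state=None):
    # Hypothetical helper: maps the `subset` options to column indices.
    # Rounding and random selection are assumptions, not ampute's behaviour.
    n_features = len(feature_names)
    if subset is None:
        # None: every feature is amputated.
        return np.arange(n_features)
    if isinstance(subset, (list, tuple)):
        # List: names are looked up by position, integer indices are kept.
        return np.array(
            [feature_names.index(f) if isinstance(f, str) else f for f in subset]
        )
    if isinstance(subset, float):
        # Float: a ratio of the features, rounded to a whole count.
        n_selected = int(round(subset * n_features))
    else:
        # Int: an absolute number of features.
        n_selected = subset
    rng = np.random.default_rng(random_state)
    # Assumption: the selected features are drawn uniformly at random.
    return np.sort(rng.choice(n_features, size=n_selected, replace=False))


def per_feature_ratios(ratio_missingness, n_selected):
    # Hypothetical helper. Assumption: an array-like holds one ratio per
    # amputated feature, in the same order as the selected features.
    ratios = np.asarray(ratio_missingness, dtype=float)
    if ratios.ndim == 0:
        # A float: all amputated features share the same ratio.
        return np.full(n_selected, float(ratios))
    if ratios.shape != (n_selected,):
        raise ValueError(f"Expected {n_selected} ratios, got shape {ratios.shape}.")
    return ratios


names = ["age", "height", "weight"]
columns = resolve_subset(["age", 2], names)  # array([0, 2])
ratios = per_feature_ratios([0.1, 0.4], len(columns))
```

Checking the length of an array-like ratio against the resolved subset is one simple way to enforce the consistency requirement stated for ratio_missingness.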

Attributes

amputated_features_indices_ : ndarray of shape (n_selected_features,)

The indices of the features that have been amputated.

Examples

>>> from ampute import UnivariateAmputer
>>> from numpy.random import default_rng
>>> rng = default_rng(0)
>>> n_samples, n_features = 5, 3
>>> X = rng.normal(size=(n_samples, n_features))

Values can be amputated using the usual scikit-learn transformer API:

>>> amputer = UnivariateAmputer(random_state=42)
>>> amputer.fit_transform(X)
array([[ 0.12573022, -0.13210486,  0.64042265],
       [        nan, -0.53566937,         nan],
       [        nan,         nan,         nan],
       [        nan,         nan,  0.04132598],
       [-2.32503077,         nan, -1.24591095]])

The amputer can also be used inside a scikit-learn Pipeline:

>>> from sklearn.impute import SimpleImputer
>>> from sklearn.pipeline import make_pipeline
>>> pipeline = make_pipeline(
...     UnivariateAmputer(random_state=42),
...     SimpleImputer(strategy="mean"),
... )
>>> pipeline.fit_transform(X)
array([[ 0.12573022, -0.13210486,  0.64042265],
       [-1.09965028, -0.53566937, -0.18805411],
       [-1.09965028, -0.33388712, -0.18805411],
       [-1.09965028, -0.33388712,  0.04132598],
       [-2.32503077, -0.33388712, -1.24591095]])
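SimpleImputer(strategy="mean") fills each missing entry with the mean of the observed values in its column. The NumPy sketch below recomputes the pipeline output from the amputed array printed in the first example (values copied at their printed precision), which shows where numbers such as -1.09965028 ≈ (0.12573022 - 2.32503077) / 2 come from.

```python
import numpy as np

# Amputed array printed in the first example (values at printed precision).
X_amputed = np.array([
    [0.12573022, -0.13210486, 0.64042265],
    [np.nan, -0.53566937, np.nan],
    [np.nan, np.nan, np.nan],
    [np.nan, np.nan, 0.04132598],
    [-2.32503077, np.nan, -1.24591095],
])

# Mean of each column, ignoring the missing entries.
column_means = np.nanmean(X_amputed, axis=0)

# Replace every NaN by the mean of its own column (broadcast over rows).
X_imputed = np.where(np.isnan(X_amputed), column_means, X_amputed)
```

The result matches the pipeline output above to the printed precision, because the imputer only sees the values that survived amputation.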

If you do not need a sklearn.pipeline.Pipeline, the amputer can also be called directly:

>>> from ampute import UnivariateAmputer
>>> UnivariateAmputer(random_state=42)(X)
array([[ 0.12573022, -0.13210486,  0.64042265],
       [        nan, -0.53566937,         nan],
       [        nan,         nan,         nan],
       [        nan,         nan,  0.04132598],
       [-2.32503077,         nan, -1.24591095]])

Methods

fit(X[, y]): Validate the parameters of the amputer.
fit_transform(X[, y]): Fit to data, then transform it.
get_params([deep]): Get parameters for this estimator.
set_params(**params): Set the parameters of this estimator.
transform(X[, y]): Amputate the dataset X with missing values.

fit(X, y=None)

Validate the parameters of the amputer.

Parameters

X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

The dataset to be amputated.

y : Ignored

Present to follow the scikit-learn API.

Returns

self : UnivariateAmputer

The validated amputer.
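The kinds of checks a fit method like this one might run can be sketched as follows. The function `validate_params` is hypothetical, it only accepts the "mcar" strategy documented above, and the requirement that every ratio lies in [0, 1] is an assumption rather than a rule taken from ampute.

```python
import numpy as np

# Only strategy documented for UnivariateAmputer.
SUPPORTED_STRATEGIES = ("mcar",)


def validate_params(strategy, ratio_missingness):
    """Hypothetical parameter checks; ampute's actual rules may differ."""
    if strategy not in SUPPORTED_STRATEGIES:
        raise ValueError(
            f"strategy must be one of {SUPPORTED_STRATEGIES}, got {strategy!r}."
        )
    # Accept a float or an array-like and always work with a 1-D array.
    ratios = np.atleast_1d(np.asarray(ratio_missingness, dtype=float))
    # Assumption: a ratio of missing values must lie in [0, 1].
    if np.any((ratios < 0) | (ratios > 1)):
        raise ValueError("ratio_missingness values must lie in [0, 1].")
    return strategy, ratios
```

Validating in fit rather than in __init__ follows the scikit-learn convention that constructors only store their arguments, so set_params can change them later without bypassing the checks.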

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns

X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : dict

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params : dict

Estimator parameters.

Returns

self : estimator instance

Estimator instance.
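The <component>__<parameter> convention can be illustrated with a small routing helper. `split_nested_params` is hypothetical; it mirrors the way scikit-learn splits each key on its first "__", so a deeper key such as a__b__c is passed on as b__c to component a.

```python
def split_nested_params(params):
    """Hypothetical helper: route `<component>__<parameter>` keys."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first "__" only; deeper keys stay in `sub_key`.
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            # No "__": a parameter of the outer object itself.
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"memory": None, "univariateamputer__ratio_missingness": 0.2}
)
```

With the pipeline from the examples above, pipeline.set_params(univariateamputer__ratio_missingness=0.2) would update the amputer step, since make_pipeline names each step after its lowercased class name.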

transform(X, y=None)

Amputate the dataset X with missing values.

Parameters

X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

The dataset to be amputated.

y : Ignored

Present to follow the scikit-learn API.

Returns

X_amputed : {ndarray, sparse matrix, dataframe} of shape (n_samples, n_features)

The dataset with missing values.
