UnivariateAmputer
- class ampute.UnivariateAmputer(strategy='mcar', subset=None, ratio_missingness=0.5, copy=True, random_state=None)
Ampute a dataset in a univariate manner.
Univariate amputation refers to the introduction of missing values, one feature at a time.
- Parameters
- strategy : str, default="mcar"
The missingness strategy to ampute the data. Possible choices are:
"mcar": missing completely at random. With this strategy, missing values are introduced into a feature independently of the values of any other feature.
- subset : list of {int, str}, int or float, default=None
The subset of the features to be amputated. The possible choices are:
None: all features are amputated.
list of {int, str}: the indices or names of the features to be amputated.
float: the ratio of features to be amputated.
int: the number of features to be amputated.
- ratio_missingness : float or array-like, default=0.5
The ratio representing the amount of missing data to be generated. If a float, all features to be amputated will have the same ratio. If an array-like, the ratio of missingness for each feature will be drawn from the array. It should be consistent with subset when a list is provided for subset.
- copy : bool, default=True
Whether to perform the amputation in place or to trigger a copy. The default will trigger a copy.
- Attributes
- amputated_features_indices_ : ndarray of shape (n_selected_features,)
The indices of the features that have been amputated.
Examples
>>> from numpy.random import default_rng
>>> rng = default_rng(0)
>>> n_samples, n_features = 5, 3
>>> X = rng.normal(size=(n_samples, n_features))

One can amputate values using the common scikit-learn transformer API:

>>> from ampute import UnivariateAmputer
>>> amputer = UnivariateAmputer(random_state=42)
>>> amputer.fit_transform(X)
array([[ 0.12573022, -0.13210486,  0.64042265],
       [        nan, -0.53566937,         nan],
       [        nan,         nan,         nan],
       [        nan,         nan,  0.04132598],
       [-2.32503077,         nan, -1.24591095]])

The amputer can be used in a scikit-learn Pipeline.

>>> from sklearn.impute import SimpleImputer
>>> from sklearn.pipeline import make_pipeline
>>> pipeline = make_pipeline(
...     UnivariateAmputer(random_state=42),
...     SimpleImputer(strategy="mean"),
... )
>>> pipeline.fit_transform(X)
array([[ 0.12573022, -0.13210486,  0.64042265],
       [-1.09965028, -0.53566937, -0.18805411],
       [-1.09965028, -0.33388712, -0.18805411],
       [-1.09965028, -0.33388712,  0.04132598],
       [-2.32503077, -0.33388712, -1.24591095]])

You can use the class as a callable if you don't need to use a sklearn.pipeline.Pipeline:

>>> UnivariateAmputer(random_state=42)(X)
array([[ 0.12573022, -0.13210486,  0.64042265],
       [        nan, -0.53566937,         nan],
       [        nan,         nan,         nan],
       [        nan,         nan,  0.04132598],
       [-2.32503077,         nan, -1.24591095]])
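
To make the "mcar" strategy and the ratio_missingness parameter more concrete, the following is a minimal NumPy sketch of univariate MCAR amputation. It is not the ampute implementation: the helper name mcar_ampute_sketch is hypothetical, and masking an exact number of rows per feature is an assumption made for illustration (the library may draw missing entries differently).

```python
import numpy as np


def mcar_ampute_sketch(X, ratio_missingness=0.5, random_state=None):
    """Illustrative MCAR amputation; NOT the ampute implementation."""
    rng = np.random.default_rng(random_state)
    # np.array copies by default, so the caller's data is left untouched.
    X_out = np.array(X, dtype=float)
    n_samples, n_features = X_out.shape
    # A scalar ratio is broadcast to one ratio per feature.
    ratios = np.broadcast_to(
        np.asarray(ratio_missingness, dtype=float), (n_features,)
    )
    for j, ratio in enumerate(ratios):
        # Assumption: mask an exact count of rows for each feature.
        n_missing = int(round(ratio * n_samples))
        # Rows are drawn without looking at any feature values (MCAR).
        rows = rng.choice(n_samples, size=n_missing, replace=False)
        X_out[rows, j] = np.nan
    return X_out


X = np.arange(20, dtype=float).reshape(10, 2)
X_missing = mcar_ampute_sketch(X, ratio_missingness=[0.2, 0.5], random_state=0)
missing_per_feature = np.isnan(X_missing).sum(axis=0)
```

In this sketch, passing an array-like gives each feature its own ratio, while a single float applies the same ratio to every feature, mirroring the parameter description above.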
Methods
fit(X[, y]): Validate the parameters of the amputer.
fit_transform(X[, y]): Fit to data, then transform it.
get_params([deep]): Get parameters for this estimator.
set_params(**params): Set the parameters of this estimator.
transform(X[, y]): Introduce missing values into the dataset X.
- fit(X, y=None)
Validate the parameters of the amputer.
- Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
The dataset to be amputated.
- y : Ignored
Present to follow the scikit-learn API.
- Returns
- self
The validated amputer.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
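
As a small illustration of what "fit, then transform" means in the scikit-learn API, the sketch below compares the one-step and two-step calls. It uses StandardScaler as a deterministic stand-in rather than the amputer itself; for a randomized transformer such as an amputer, the two routes would presumably only agree when the same random_state is used, which is an assumption not verified here.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# One call: fit on X and return the transformed X.
one_step = StandardScaler().fit_transform(X)

# Two calls: fit returns the estimator itself, so the calls can be chained.
two_steps = StandardScaler().fit(X).transform(X)

same = np.allclose(one_step, two_steps)
```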
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params : dict
Parameter names mapped to their values.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters
- **params : dict
Estimator parameters.
- Returns
- self : estimator instance
Estimator instance.
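
The <component>__<parameter> form can be sketched with a small scikit-learn pipeline. The example below uses only scikit-learn estimators (SimpleImputer and StandardScaler) as stand-ins; an amputer placed in a pipeline would be expected to be addressed the same way, with the component name derived from its step name.

```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipeline = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())

# make_pipeline names each step after its lowercased class name,
# so nested parameters are addressed as <step name>__<parameter>.
pipeline.set_params(
    simpleimputer__strategy="median",
    standardscaler__with_mean=False,
)

# get_params(deep=True) exposes the nested parameters under the same keys.
params = pipeline.get_params(deep=True)
strategy = params["simpleimputer__strategy"]
with_mean = params["standardscaler__with_mean"]
```

Because set_params returns the estimator instance, calls can be chained, for example pipeline.set_params(...).fit(X).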
- transform(X, y=None)
Introduce missing values into the dataset X.
- Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
The dataset to be amputated.
- y : Ignored
Present to follow the scikit-learn API.
- Returns
- X_amputed : {ndarray, sparse matrix, dataframe} of shape (n_samples, n_features)
The dataset with missing values.