chariots.sklearn
The sklearn module provides support for the scikit-learn framework.
This module provides two main classes (SKSupervisedOp, SKUnsupervisedOp) that must be subclassed before use. To do so, set the model_class class attribute and, if needed, the model_parameters class attribute, which should be a VersionedFieldDict defining the parameters your model is initialized with. As with other machine learning ops, you can override the training_update_version class attribute to define which version is changed when the operation is retrained:
>>> class PCAOp(SKUnsupervisedOp):
...     training_update_version = VersionType.MAJOR
...     model_parameters = VersionedFieldDict(VersionType.MAJOR, {"n_components": 2})
...     model_class = VersionedField(PCA, VersionType.MAJOR)
Once your op class is defined, you can use it like any MLOp, choosing your MLMode to define the behavior of the operation (fit and/or predict):
>>> train_pca = Pipeline([
...     Node(IrisXDataSet(), output_nodes=["x"]),
...     Node(PCAOp(MLMode.FIT), input_nodes=["x"])
... ], 'train_pca')
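The subclassing pattern above can be illustrated without chariots or scikit-learn installed. The sketch below is a simplified stand-in, not the library's implementation: `ToyScaler`, `ToyOp`, `ScalerOp`, and `build_model` are hypothetical names, and it assumes the op creates its model by calling `model_class` with `model_parameters` as keyword arguments.

```python
class ToyScaler:
    """Hypothetical stand-in for a scikit-learn style estimator."""

    def __init__(self, factor=1.0):
        self.factor = factor

    def fit(self, X):
        # Nothing to learn in this toy model; return self like sklearn does.
        return self

    def transform(self, X):
        return [[value * self.factor for value in row] for row in X]


class ToyOp:
    """Hypothetical base op configured through class attributes."""

    model_class = None      # set by subclasses
    model_parameters = {}   # keyword arguments for model_class

    def build_model(self):
        # Assumption: the model is created as model_class(**model_parameters).
        if self.model_class is None:
            raise ValueError("model_class must be set on the subclass")
        return self.model_class(**self.model_parameters)


class ScalerOp(ToyOp):
    model_class = ToyScaler
    model_parameters = {"factor": 2.0}


model = ScalerOp().build_model()
scaled = model.fit([[1.0, 2.0]]).transform([[1.0, 2.0]])
```

Keeping the configuration in class attributes means a subclass only declares what differs from the base op; in chariots the attributes are additionally wrapped in versioned fields, which this sketch leaves out.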
class chariots.ml.sklearn.SKSupervisedOp(mode: chariots.ml._ml_mode.MLMode, op_callbacks: Optional[List[chariots.pipelines.callbacks._op_callback.OpCallBack]] = None)
Bases: chariots.ml.sklearn._base_sk_op.BaseSKOp
Op base class to create supervised models using the scikit-learn framework. If you use MLMode.FIT or MLMode.FIT_PREDICT, you will need to link this op to an X and a y upstream node:
>>> train_logistics = Pipeline([
...     Node(IrisFullDataSet(), output_nodes=["x", "y"]),
...     Node(PCAOp(MLMode.PREDICT), input_nodes=["x"], output_nodes="x_transformed"),
...     Node(LogisticOp(MLMode.FIT), input_nodes=["x_transformed", "y"])
... ], 'train_logistics')
If you use the op in MLMode.PREDICT mode, you only need to link it to an X upstream node:
>>> pred = Pipeline([
...     Node(IrisFullDataSet(), input_nodes=['__pipeline_input__'], output_nodes=["x"]),
...     Node(PCAOp(MLMode.PREDICT), input_nodes=["x"], output_nodes="x_transformed"),
...     Node(LogisticOp(MLMode.PREDICT), input_nodes=["x_transformed"], output_nodes=['__pipeline_output__'])
... ], 'pred')
To change the behavior of the op, you can:
- change the predict_function class attribute with a new VersionedField (to use predict_proba, for instance)
- change the fit_extra_parameters class attribute with a new VersionedFieldDict (to pass additional parameters during fitting)
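These two customisation points can be pictured as follows. This is a sketch under assumptions, not chariots code: it supposes that `predict_function` names the model method looked up at prediction time and that `fit_extra_parameters` is forwarded to the model's `fit` as keyword arguments. `ToyClassifier`, `ToySupervisedOp`, and `ProbaOp` are hypothetical, and plain strings and dicts stand in for the versioned fields.

```python
class ToyClassifier:
    """Hypothetical model exposing predict and predict_proba."""

    def fit(self, X, y, sample_weight=None):
        self.sample_weight = sample_weight
        # Remember the share of positive labels seen during training.
        self.positive_rate = sum(y) / len(y)
        return self

    def predict(self, X):
        return [int(self.positive_rate >= 0.5) for _ in X]

    def predict_proba(self, X):
        return [[1 - self.positive_rate, self.positive_rate] for _ in X]


class ToySupervisedOp:
    predict_function = "predict"
    fit_extra_parameters = {}

    def __init__(self):
        self.model = ToyClassifier()

    def fit(self, X, y):
        # Assumption: extra parameters are passed as keyword arguments to fit.
        self.model.fit(X, y, **self.fit_extra_parameters)

    def predict(self, X):
        # Assumption: the configured method name is resolved with getattr.
        return getattr(self.model, self.predict_function)(X)


class ProbaOp(ToySupervisedOp):
    predict_function = "predict_proba"
    fit_extra_parameters = {"sample_weight": [1.0, 1.0, 1.0, 1.0]}


default_op, proba_op = ToySupervisedOp(), ProbaOp()
for op in (default_op, proba_op):
    op.fit([[0], [1], [2], [3]], [0, 1, 1, 1])
```

With this arrangement, `default_op.predict(...)` returns class labels while `proba_op.predict(...)` returns probabilities, without either subclass redefining `fit` or `predict`.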
fit(X, y)
Method used by the operation to fit the underlying model.
DO NOT TRY TO OVERRIDE THIS METHOD.
- Parameters
X – the input that the underlying supervised model will fit on (the type must be compatible with scikit-learn, such as numpy arrays or pandas data frames)
y – the output that the underlying supervised model will fit on (the type must be compatible with scikit-learn, such as numpy arrays or pandas data frames)
fit_extra_parameters = <chariots.versioning._versioned_field_dict.VersionedFieldDict object>
predict(X) → Any
Method used internally by the op to predict with the underlying model.
DO NOT TRY TO OVERRIDE THIS METHOD.
- Parameters
X – the input the model has to predict on (the type must be compatible with scikit-learn, such as numpy arrays or pandas data frames)
predict_function = 'predict'
class chariots.ml.sklearn.SKUnsupervisedOp(mode: chariots.ml._ml_mode.MLMode, op_callbacks: Optional[List[chariots.pipelines.callbacks._op_callback.OpCallBack]] = None)
Bases: chariots.ml.sklearn._base_sk_op.BaseSKOp
Base class to create unsupervised models using the scikit-learn framework. Whatever the mode, you will need to link this op to a single upstream node:
>>> train_logistics = Pipeline([
...     Node(IrisFullDataSet(), output_nodes=["x", "y"]),
...     Node(PCAOp(MLMode.PREDICT), input_nodes=["x"], output_nodes="x_transformed"),
...     Node(LogisticOp(MLMode.FIT), input_nodes=["x_transformed", "y"])
... ], 'train_logistics')
>>> pred = Pipeline([
...     Node(IrisFullDataSet(), input_nodes=['__pipeline_input__'], output_nodes=["x"]),
...     Node(PCAOp(MLMode.PREDICT), input_nodes=["x"], output_nodes="x_transformed"),
...     Node(LogisticOp(MLMode.PREDICT), input_nodes=["x_transformed"], output_nodes=['__pipeline_output__'])
... ], 'pred')
fit(X)
Method used to fit the underlying unsupervised model.
DO NOT TRY TO OVERRIDE THIS METHOD.
- Parameters
X – the dataset (the type must be compatible with scikit-learn, such as pandas data frames or numpy arrays).
fit_extra_parameters = <chariots.versioning._versioned_field_dict.VersionedFieldDict object>
class chariots.ml.sklearn.BaseSKOp(mode: chariots.ml._ml_mode.MLMode, op_callbacks: Optional[List[chariots.pipelines.callbacks._op_callback.OpCallBack]] = None)
Bases: chariots.ml._base_ml_op.BaseMLOp
Base op class for all the supervised and unsupervised scikit-learn ops.
fit(*args, **kwargs)
Fits the inner model of the op on data (passed in args and kwargs). This method must not return any data; use the FIT_PREDICT mode to predict on the same data the op was trained on.
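The relationship between the three modes and the rule that fitting returns no data can be sketched as below. This is an illustration under assumptions rather than the way chariots dispatches internally: `Mode` is a local stand-in for MLMode, and `MeanModel` and `run_op` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    """Local stand-in for the three modes named in the text."""
    FIT = "fit"
    PREDICT = "predict"
    FIT_PREDICT = "fit_predict"


class MeanModel:
    """Hypothetical model that predicts the mean of its training data."""

    def fit(self, X):
        self.mean = sum(X) / len(X)

    def predict(self, X):
        return [self.mean for _ in X]


def run_op(mode, model, X):
    """Hypothetical dispatcher: fitting returns nothing, predicting returns data."""
    if mode in (Mode.FIT, Mode.FIT_PREDICT):
        model.fit(X)
    if mode in (Mode.PREDICT, Mode.FIT_PREDICT):
        return model.predict(X)
    return None


model = MeanModel()
fit_result = run_op(Mode.FIT, model, [1.0, 2.0, 3.0])
fit_predict_result = run_op(Mode.FIT_PREDICT, MeanModel(), [2.0, 4.0])
predict_result = run_op(Mode.PREDICT, model, [10.0])
```

Here the FIT call yields `None`, so a downstream node that needs predictions on the training data has to be fed by an op run in the fit-and-predict mode instead.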
model_class = None
model_parameters = <chariots.versioning._versioned_field_dict.VersionedFieldDict object>