chariots.sklearn

The sklearn module provides support for the scikit-learn framework.

This module provides two main classes, SKSupervisedOp and SKUnsupervisedOp, which must be subclassed before use. In your subclass, set the model_class class attribute and, if needed, the model_parameters class attribute. model_parameters should be a VersionedFieldDict that defines the parameters your model is initialized with. As with other machine learning ops, you can override the training_update_version class attribute to define which version is changed when the operation is retrained:

>>> class PCAOp(SKUnsupervisedOp):
...     training_update_version = VersionType.MAJOR
...     model_parameters = VersionedFieldDict(VersionType.MAJOR, {"n_components": 2,})
...     model_class = VersionedField(PCA, VersionType.MAJOR)
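
As a rough mental model (an assumption about the mechanism, not the chariots source), the two class attributes can be read as a constructor and its keyword arguments. The sketch below uses plain Python stand-ins: SketchOp, TinyModel, TinyPCAOp, build_model and the plain dict used in place of VersionedFieldDict are all hypothetical names introduced for illustration.

```python
class TinyModel:
    """Hypothetical stand-in for an sklearn estimator such as PCA."""

    def __init__(self, n_components=None):
        self.n_components = n_components


class SketchOp:
    """Hypothetical base class: NOT the chariots implementation."""

    model_class = None       # set by subclasses
    model_parameters = {}    # plain dict standing in for VersionedFieldDict

    def build_model(self):
        # Instantiate the configured class with the configured parameters.
        if self.model_class is None:
            raise ValueError("model_class must be set on the subclass")
        return self.model_class(**self.model_parameters)


class TinyPCAOp(SketchOp):
    model_class = TinyModel
    model_parameters = {"n_components": 2}


model = TinyPCAOp().build_model()
print(type(model).__name__, model.n_components)
```

Keeping the class and its parameters as declarative attributes, rather than building the model in a constructor, is presumably what lets each of them carry its own version.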

Once your op class is defined, you can use it like any other MLOp, choosing an MLMode to define the behavior of your operation (fit and/or predict):

>>> train_pca = Pipeline([Node(IrisXDataSet(), output_nodes=["x"]), Node(PCAOp(MLMode.FIT), input_nodes=["x"])],
...                      'train_pca')
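
The effect of the mode can be pictured with a small dispatch sketch. It is only an illustration under assumptions: Mode mirrors the MLMode member names used on this page, while RecordingOp and its execute method are hypothetical and do not reflect how chariots implements the dispatch.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical mirror of the MLMode members named on this page."""
    FIT = "fit"
    PREDICT = "predict"
    FIT_PREDICT = "fit_predict"


class RecordingOp:
    """Hypothetical op that records which steps ran."""

    def __init__(self, mode):
        self.mode = mode
        self.calls = []

    def fit(self, x):
        self.calls.append("fit")

    def predict(self, x):
        self.calls.append("predict")
        return [value * 2 for value in x]

    def execute(self, x):
        # FIT trains only and returns nothing; PREDICT infers only;
        # FIT_PREDICT trains, then predicts on the same data.
        if self.mode in (Mode.FIT, Mode.FIT_PREDICT):
            self.fit(x)
        if self.mode in (Mode.PREDICT, Mode.FIT_PREDICT):
            return self.predict(x)
        return None


fit_only = RecordingOp(Mode.FIT)
fit_result = fit_only.execute([1, 2])
both = RecordingOp(Mode.FIT_PREDICT)
both_result = both.execute([1, 2])
```
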
class chariots.ml.sklearn.SKSupervisedOp(mode: chariots.ml._ml_mode.MLMode, op_callbacks: Optional[List[chariots.pipelines.callbacks._op_callback.OpCallBack]] = None)

Bases: chariots.ml.sklearn._base_sk_op.BaseSKOp

Op base class to create supervised models using the scikit-learn framework. If using MLMode.FIT or MLMode.FIT_PREDICT, you will need to link this op to an X and a y upstream node:

>>> train_logistics = Pipeline([
...     Node(IrisFullDataSet(), output_nodes=["x", "y"]),
...     Node(PCAOp(MLMode.PREDICT), input_nodes=["x"], output_nodes="x_transformed"),
...     Node(LogisticOp(MLMode.FIT), input_nodes=["x_transformed", "y"])
... ], 'train_logistics')

If you are using the op in MLMode.PREDICT mode, you only need to link the op to an X upstream node:

>>> pred = Pipeline([
...     Node(IrisFullDataSet(), input_nodes=['__pipeline_input__'], output_nodes=["x"]),
...     Node(PCAOp(MLMode.PREDICT), input_nodes=["x"], output_nodes="x_transformed"),
...     Node(LogisticOp(MLMode.PREDICT), input_nodes=["x_transformed"], output_nodes=['__pipeline_output__'])
... ], 'pred')
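
To make the linking by node name more concrete, here is a toy runner that stores each node's outputs under a name and feeds them to later nodes on request. It is a sketch under assumptions: run_pipeline, load_data, halve and fit_model are hypothetical helpers, and the real Pipeline and Node classes do considerably more.

```python
def run_pipeline(nodes, initial=None):
    """Run (func, input_names, output_names) triples in order.

    Hypothetical helper: results are stored by name so that later
    nodes can request them through their input names.
    """
    results = dict(initial or {})
    for func, input_names, output_names in nodes:
        args = [results[name] for name in input_names]
        outputs = func(*args)
        if not output_names:
            continue                  # e.g. a fit-only node produces no data
        if len(output_names) == 1:
            outputs = (outputs,)
        for name, value in zip(output_names, outputs):
            results[name] = value
    return results


def load_data():
    return [[1.0, 2.0], [3.0, 4.0]], [0, 1]   # toy features and labels


def halve(x):
    return [[value / 2 for value in row] for row in x]


fitted = {}


def fit_model(x, y):
    fitted["n_rows"] = len(x)         # stands in for model.fit(x, y)
    fitted["labels"] = list(y)


results = run_pipeline([
    (load_data, [], ["x", "y"]),
    (halve, ["x"], ["x_transformed"]),
    (fit_model, ["x_transformed", "y"], []),
])
```

The fitting node asks for two names (the transformed X and y), whereas a predicting node would ask for the transformed X only.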

To change the behavior of the Op, you can:

  • change the predict_function class attribute with a new VersionedField (to use predict_proba for instance)

  • change the fit_extra_parameters class attribute with a new VersionedFieldDict (to pass some new parameters during fitting)
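
One plausible reading of these two attributes, given here as an assumption rather than as the chariots implementation, is that predict_function names the estimator method to call and fit_extra_parameters holds extra keyword arguments forwarded to the estimator's fit. ToyClassifier, SketchSupervisedOp and ProbaOp below are hypothetical, and plain Python values stand in for VersionedField and VersionedFieldDict.

```python
class ToyClassifier:
    """Hypothetical estimator exposing an sklearn-like interface."""

    def fit(self, X, y, sample_weight=None):
        self.sample_weight_ = sample_weight
        self.positive_rate_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [int(self.positive_rate_ >= 0.5) for _ in X]

    def predict_proba(self, X):
        p = self.positive_rate_
        return [[1 - p, p] for _ in X]


class SketchSupervisedOp:
    """Hypothetical op: NOT the chariots implementation."""

    predict_function = "predict"      # name of the method used to predict
    fit_extra_parameters = {}         # extra keyword arguments for fit

    def __init__(self):
        self.model = ToyClassifier()

    def fit(self, X, y):
        self.model.fit(X, y, **self.fit_extra_parameters)

    def predict(self, X):
        # Look up the configured method by name and call it.
        return getattr(self.model, self.predict_function)(X)


class ProbaOp(SketchSupervisedOp):
    predict_function = "predict_proba"
    fit_extra_parameters = {"sample_weight": [1.0, 1.0, 2.0, 2.0]}


op = ProbaOp()
op.fit([[0], [1], [2], [3]], [0, 1, 1, 1])
probabilities = op.predict([[5]])
```
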

fit(X, y)

Method used by the operation to fit the underlying model.

DO NOT TRY TO OVERRIDE THIS METHOD.

Parameters
  • X – the input that the underlying supervised model will fit on (type must be compatible with the sklearn lib such as numpy arrays or pandas data frames)

  • y – the output that the underlying supervised model will fit on (type must be compatible with the sklearn lib such as numpy arrays or pandas data frames)

fit_extra_parameters = <chariots.versioning._versioned_field_dict.VersionedFieldDict object>
predict(X) → Any

Method used internally by the op to predict with the underlying model.

DO NOT TRY TO OVERRIDE THIS METHOD.

Parameters

X – the input the model has to predict on (type must be compatible with the sklearn lib, such as numpy arrays or pandas data frames)

predict_function = 'predict'
class chariots.ml.sklearn.SKUnsupervisedOp(mode: chariots.ml._ml_mode.MLMode, op_callbacks: Optional[List[chariots.pipelines.callbacks._op_callback.OpCallBack]] = None)

Bases: chariots.ml.sklearn._base_sk_op.BaseSKOp

Base class to create unsupervised models using the scikit-learn framework. Whatever the mode, you will need to link this op to a single upstream node:

>>> train_logistics = Pipeline([
...     Node(IrisFullDataSet(), output_nodes=["x", "y"]),
...     Node(PCAOp(MLMode.PREDICT), input_nodes=["x"], output_nodes="x_transformed"),
...     Node(LogisticOp(MLMode.FIT), input_nodes=["x_transformed", "y"])
... ], 'train_logistics')

>>> pred = Pipeline([
...     Node(IrisFullDataSet(), input_nodes=['__pipeline_input__'], output_nodes=["x"]),
...     Node(PCAOp(MLMode.PREDICT), input_nodes=["x"], output_nodes="x_transformed"),
...     Node(LogisticOp(MLMode.PREDICT), input_nodes=["x_transformed"], output_nodes=['__pipeline_output__'])
... ], 'pred')
fit(X)

Method used to fit the underlying unsupervised model.

DO NOT TRY TO OVERRIDE THIS METHOD.

Parameters

X – the dataset to fit on (type must be compatible with the sklearn lib, such as pandas data frames or numpy arrays).

fit_extra_parameters = <chariots.versioning._versioned_field_dict.VersionedFieldDict object>
predict(X) → Any

Transforms the dataset using the underlying unsupervised model.

DO NOT TRY TO OVERRIDE THIS METHOD.

Parameters

X – the dataset to transform (type must be compatible with the sklearn library such as pandas data frames or numpy arrays).
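
For an unsupervised op, predicting means transforming the data with what was learned during fitting. The sketch below illustrates that idea with a toy centering transformer. ToyCenterer and SketchUnsupervisedOp are hypothetical, and delegating predict to the model's transform method is an assumption about the mechanism, not something this page states.

```python
class ToyCenterer:
    """Hypothetical transformer with an sklearn-like fit/transform API."""

    def fit(self, X):
        n_rows = len(X)
        n_cols = len(X[0])
        # Column means learned from the training data.
        self.means_ = [sum(row[j] for row in X) / n_rows for j in range(n_cols)]
        return self

    def transform(self, X):
        return [[value - mean for value, mean in zip(row, self.means_)]
                for row in X]


class SketchUnsupervisedOp:
    """Hypothetical op: NOT the chariots implementation."""

    def __init__(self):
        self.model = ToyCenterer()

    def fit(self, X):
        self.model.fit(X)             # learns from X only, returns nothing

    def predict(self, X):
        return self.model.transform(X)


op = SketchUnsupervisedOp()
fit_output = op.fit([[1.0, 10.0], [3.0, 30.0]])
centered = op.predict([[2.0, 20.0], [4.0, 40.0]])
```
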

class chariots.ml.sklearn.BaseSKOp(mode: chariots.ml._ml_mode.MLMode, op_callbacks: Optional[List[chariots.pipelines.callbacks._op_callback.OpCallBack]] = None)

Bases: chariots.ml._base_ml_op.BaseMLOp

Base op class for all supervised and unsupervised scikit-learn ops.

fit(*args, **kwargs)

Fits the inner model of the op on data (passed in args and kwargs). This method must not return any data; use the FIT_PREDICT mode to predict on the same data the op was trained on.

model_class = None
model_parameters = <chariots.versioning._versioned_field_dict.VersionedFieldDict object>
predict(*args, **kwargs) → Any

The method used to do predictions/inference once the model has been fitted or loaded.