Sklearn
The sklearn builder() converts a pipeline made of Nodes into a sklearn Pipeline.
Requirements
This requires sklearn, which is published on PyPI as scikit-learn and can be installed with pip.
Each kind of node corresponds to a different part of the end pipeline:
Fixed - The estimator will simply be cloned, allowing you to directly configure some object in a pipeline.
from sklearn.ensemble import RandomForestClassifier
from amltk.pipeline import Fixed
est = Fixed(RandomForestClassifier(n_estimators=25))
built_pipeline = est.build("sklearn")
Pipeline(steps=[('RandomForestClassifier', RandomForestClassifier(n_estimators=25))])
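To illustrate what "cloned" means for a Fixed node, here is a minimal sketch using sklearn's own `clone`. It assumes the builder clones in the sense of `sklearn.base.clone`, which the text above does not state explicitly.

```python
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier

# The object you would wrap in Fixed(...).
original = RandomForestClassifier(n_estimators=25)

# Assumption: the builder copies the estimator the way sklearn.base.clone does.
cloned = clone(original)

# The clone is a distinct, unfitted object carrying the same hyperparameters.
print(cloned is original)
print(cloned.get_params() == original.get_params())
```

Under that assumption, fitting the built pipeline would leave the object you passed to Fixed untouched.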
Component - The estimator will be built from the component's config. This is mostly useful to allow a space to be defined for the component.
from sklearn.ensemble import RandomForestClassifier
from amltk.pipeline import Component
est = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
# ... Likely get the configuration through an optimizer or sampling
configured_est = est.configure({"n_estimators": 25})
built_pipeline = configured_est.build("sklearn")
Pipeline(steps=[('RandomForestClassifier', RandomForestClassifier(n_estimators=25))])
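As a rough picture of what happens between defining a space and building, the sketch below samples a configuration by hand and instantiates the estimator from it. The sampling loop is a hypothetical stand-in for an optimizer or sampler, not amltk's own machinery.

```python
import random

from sklearn.ensemble import RandomForestClassifier

# Same shape as the space above: an integer range per hyperparameter.
space = {"n_estimators": (10, 100)}

# Hypothetical stand-in for an optimizer: draw one integer per range.
rng = random.Random(0)
config = {name: rng.randint(low, high) for name, (low, high) in space.items()}

# Building from the config amounts to calling the class with it.
estimator = RandomForestClassifier(**config)
print(estimator)
```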
Sequential - The sequential will be converted into a Pipeline, building whatever nodes are contained within it.
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from amltk.pipeline import Component, Sequential
pipeline = Sequential(
    PCA(n_components=3),
    Component(RandomForestClassifier, config={"n_estimators": 25}),
)
built_pipeline = pipeline.build("sklearn")
Pipeline(steps=[('PCA', PCA(n_components=3)), ('RandomForestClassifier', RandomForestClassifier(n_estimators=25))])
Split - The split will be converted into a ColumnTransformer, where each path and the data that should go through it is specified by the split's config. You can provide a ColumnTransformer directly as the item to the Split; if the item is left blank, it defaults to the standard sklearn one. You can use a Fixed with the special keyword "passthrough", as you might normally do with a ColumnTransformer.
By default, we provide two special keywords you can provide to a Split, namely "categorical" and "numerical", which will automatically configure a ColumnTransformer to pass the appropriate columns of a data-frame to the given paths.
from amltk.pipeline import Split, Component
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
categorical_pipeline = [
    SimpleImputer(strategy="constant", fill_value="missing"),
    Component(
        OneHotEncoder,
        space={
            "min_frequency": (0.01, 0.1),
            "handle_unknown": ["ignore", "infrequent_if_exist"],
        },
        config={"drop": "first"},
    ),
]
numerical_pipeline = [SimpleImputer(strategy="median"), StandardScaler()]
split = Split(
    {
        "categorical": categorical_pipeline,
        "numerical": numerical_pipeline,
    }
)
Split(Split-cEua0bAh)
    Sequential(categorical)
        Fixed(SimpleImputer)
            item    SimpleImputer(fill_valu… strategy='constant')
        ↓
        Component(OneHotEncoder)
            item    class OneHotEncoder(...)
            config  {'drop': 'first'}
            space   {'min_frequency': (0.01, 0.1), 'handle_unknown': ['ignore', 'infrequent_i…']}
    Sequential(numerical)
        Fixed(SimpleImputer)
            item    SimpleImputer(strategy='…
        ↓
        Fixed(StandardScaler)
            item    StandardScaler()
You can manually specify the column selectors if you prefer.
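For orientation, the sketch below builds the plain sklearn object that such a Split corresponds to, with the column selectors written out by hand. It uses only sklearn and pandas; the toy data-frame and its column names are made up for illustration, and how the selectors are passed to a Split on the amltk side is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (hypothetical): one categorical and one numerical column.
df = pd.DataFrame(
    {
        "colour": pd.Series(["red", "blue", "red", "green"], dtype=object),
        "height": [1.0, 2.0, 3.0, 4.0],
    }
)

transformer = ColumnTransformer(
    [
        (
            "categorical",
            make_pipeline(
                SimpleImputer(strategy="constant", fill_value="missing"),
                OneHotEncoder(handle_unknown="ignore"),
            ),
            # Explicit selector: every object-dtype column.
            make_column_selector(dtype_include=object),
        ),
        (
            "numerical",
            make_pipeline(SimpleImputer(strategy="median"), StandardScaler()),
            # Explicit selector: every numeric column.
            make_column_selector(dtype_include=np.number),
        ),
    ]
)

out = transformer.fit_transform(df)
print(out.shape)  # (4, 4): three one-hot columns plus one scaled column
```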
Join - The join will be converted into a FeatureUnion.
from amltk.pipeline import Join, Component
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
join = Join(PCA(n_components=2), SelectKBest(k=3), name="my_feature_union")
pipeline = join.build("sklearn")
Pipeline(steps=[('my_feature_union', FeatureUnion(transformer_list=[('PCA', PCA(n_components=2)), ('SelectKBest', SelectKBest(k=3))]))])
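To make the effect of a FeatureUnion concrete, this small sketch fits the same two transformers directly in sklearn on the iris dataset (chosen here only for illustration). Each transformer sees the full input and the outputs are concatenated column-wise.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion

X, y = load_iris(return_X_y=True)

# Same two branches as the Join above, expressed directly in sklearn.
union = FeatureUnion(
    [("PCA", PCA(n_components=2)), ("SelectKBest", SelectKBest(k=3))]
)

# 2 PCA components + 3 selected features = 5 output columns.
features = union.fit_transform(X, y)
print(features.shape)  # (150, 5)
```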
Choice - The estimator will be built from the chosen component's config. This is very similar to Component.
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from amltk.pipeline import Choice
# The choice here is usually provided during the `.configure()` step.
estimator_choice = Choice(
    RandomForestClassifier(),
    MLPClassifier(),
    config={"__choice__": "RandomForestClassifier"},
)
built_pipeline = estimator_choice.build("sklearn")
Pipeline(steps=[('RandomForestClassifier', RandomForestClassifier())])
def build(node, *, pipeline_type=SklearnPipeline, **pipeline_kwargs)
Build a pipeline into a usable object.
PARAMETER | DESCRIPTION |
---|---|
node | The node from which to build a pipeline. |
pipeline_type | The type of pipeline to build. Defaults to the standard sklearn pipeline but can be any derivative of that, e.g. ImbLearn's pipeline. |
**pipeline_kwargs | The kwargs to pass to the pipeline_type. |
RETURNS | DESCRIPTION |
---|---|
SklearnPipelineT | The built pipeline |
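The sketch below shows what a "derivative" of the sklearn pipeline could look like and how keyword arguments reach it. `NamedPipeline` is a hypothetical subclass, and constructing it as `pipeline_type(steps, **pipeline_kwargs)` is an assumption about how the builder uses these parameters, not something stated above.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline


class NamedPipeline(Pipeline):
    """Hypothetical Pipeline derivative with one extra helper method."""

    def describe(self) -> str:
        # Join the step names in execution order.
        return " -> ".join(name for name, _ in self.steps)


# Assumption: the builder ends up with a list of (name, estimator) steps
# and constructs the pipeline as pipeline_type(steps, **pipeline_kwargs).
pipeline_type = NamedPipeline
pipeline_kwargs = {"verbose": True}
steps = [("RandomForestClassifier", RandomForestClassifier(n_estimators=25))]

built = pipeline_type(steps, **pipeline_kwargs)
print(built.describe())  # RandomForestClassifier
```

Any keyword accepted by the chosen pipeline class (here sklearn's `verbose`) would be forwarded in the same way.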