Components
You can use the various node types to build a pipeline.
You can connect these nodes together either by using the constructors explicitly, as shown in the examples, or with the infix operators we provide:
>> - Connect nodes together to form a Sequential
& - Connect nodes together to form a Join
| - Connect nodes together to form a Choice
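In Python, infix operators like these are typically provided through the `__rshift__`, `__and__` and `__or__` special methods. The following is a minimal sketch of that mechanism using a hypothetical `ToyNode` class; it is an illustration only, not amltk's `Node` implementation, and the rule for extending an existing container is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToyNode:
    """Hypothetical stand-in for a pipeline node; not amltk's Node class."""

    kind: str  # "Leaf", "Sequential", "Join" or "Choice"
    name: str = ""
    children: tuple[ToyNode, ...] = ()

    def _combine(self, kind: str, other: ToyNode) -> ToyNode:
        # Extend a container of the same kind, otherwise wrap both in a new one.
        if self.kind == kind:
            return ToyNode(kind, self.name, (*self.children, other))
        return ToyNode(kind, "", (self, other))

    def __rshift__(self, other: ToyNode) -> ToyNode:  # a >> b
        return self._combine("Sequential", other)

    def __and__(self, other: ToyNode) -> ToyNode:  # a & b
        return self._combine("Join", other)

    def __or__(self, other: ToyNode) -> ToyNode:  # a | b
        return self._combine("Choice", other)


a, b, c = (ToyNode("Leaf", n) for n in "abc")
seq = a >> b >> c   # one Sequential holding a, b and c
join = a & b        # a Join of a and b
choice = a | b      # a Choice between a and b
```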
There is also another shorthand that you may find useful to know:
{comp1, comp2, comp3} - This will automatically be converted into a Choice between the given components.
(comp1, comp2, comp3) - This will automatically be converted into a Join between the given components.
[comp1, comp2, comp3] - This will automatically be converted into a Sequential between the given components.
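The mapping from literal type to node type can be pictured as a small recursive dispatch on the Python type. The sketch below uses a hypothetical `convert` function and plain `(kind, children)` tuples rather than real amltk nodes; sorting the members of a set is an assumption made here only to get a deterministic result, since Python sets are unordered.

```python
def convert(obj):
    """Turn nested list/tuple/set literals into (kind, children) pairs."""
    if isinstance(obj, list):
        return ("Sequential", [convert(o) for o in obj])
    if isinstance(obj, tuple):
        return ("Join", [convert(o) for o in obj])
    if isinstance(obj, (set, frozenset)):
        # Sets have no order, so sort for a reproducible result.
        return ("Choice", sorted((convert(o) for o in obj), key=repr))
    # Anything else is treated as a single leaf.
    return ("Leaf", obj)


tree = convert(["imputer", {"mlp", "rf"}])
# ('Sequential', [('Leaf', 'imputer'), ('Choice', [('Leaf', 'mlp'), ('Leaf', 'rf')])])
```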
For each of these components we will show examples using the "sklearn" builder.
The components are:
Component
A Component of the pipeline with a possible item and no children.
This is the basic building block of most pipelines. It accepts as its item= some function that will be called with build_item() to build that one part of the pipeline.
When build_item() is called, the .config on this node will be passed to the function to build the item.
A common pattern is to use a Component to wrap a constructor, specifying the space= and config= to be used when building the item.
from amltk.pipeline import Component
from sklearn.ensemble import RandomForestClassifier
rf = Component(
RandomForestClassifier,
config={"max_depth": 3},
space={"n_estimators": (10, 100)}
)
config = {"n_estimators": 50}  # e.g. a configuration sampled from the search space
configured_rf = rf.configure(config)
estimator = configured_rf.build_item()
╭─ Component(RandomForestClassifier) ──────╮
│ item class RandomForestClassifier(...) │
│ config {'max_depth': 3} │
│ space {'n_estimators': (10, 100)} │
╰──────────────────────────────────────────╯
RandomForestClassifier(max_depth=3, n_estimators=50)
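The output above suggests that configuring merges the new values into the existing config, and that building calls the wrapped constructor with the result. A minimal sketch of that idea in plain Python follows; `ToyComponent` and `make_forest` are hypothetical stand-ins, and the merge rule is an assumption inferred from the printed result rather than taken from amltk's source.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable


@dataclass(frozen=True)
class ToyComponent:
    """Hypothetical miniature of a component: a callable plus keyword arguments."""

    item: Callable[..., Any]
    config: dict[str, Any] = field(default_factory=dict)

    def configure(self, new_config: dict[str, Any]) -> "ToyComponent":
        # Return a copy whose config is the old one updated with the new values.
        return replace(self, config={**self.config, **new_config})

    def build_item(self) -> Any:
        # Building simply calls the wrapped function with the stored config.
        return self.item(**self.config)


def make_forest(max_depth: int = 1, n_estimators: int = 10) -> dict[str, int]:
    """Stand-in for a constructor such as RandomForestClassifier."""
    return {"max_depth": max_depth, "n_estimators": n_estimators}


rf = ToyComponent(make_forest, config={"max_depth": 3})
built = rf.configure({"n_estimators": 50}).build_item()
print(built)  # {'max_depth': 3, 'n_estimators': 50}
```

Returning a copy from `configure` keeps the original node reusable for other configurations, which is why the sketch uses a frozen dataclass.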
Whenever some other node sees a function/constructor, e.g. RandomForestClassifier, this will automatically be converted into a Component.
from amltk.pipeline import Sequential
from sklearn.ensemble import RandomForestClassifier
pipeline = Sequential(RandomForestClassifier, name="my_pipeline")
╭─ Sequential(my_pipeline) ──────────────────╮
│ ╭─ Component(RandomForestClassifier) ────╮ │
│ │ item class RandomForestClassifier(...) │ │
│ ╰────────────────────────────────────────╯ │
╰────────────────────────────────────────────╯
The default .name of a component is the name of the class/function that it will use. You can explicitly set the name= if you want to when constructing the component.
Like all Nodes, a Component accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.
Source code in src/amltk/pipeline/components.py
Sequential
A Sequential set of operations in a pipeline.
This indicates the different children in .nodes should act one after another, feeding the output of one into the next.
from amltk.pipeline import Component, Sequential
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
pipeline = Sequential(
PCA(n_components=3),
Component(RandomForestClassifier, space={"n_estimators": (10, 100)}),
name="my_pipeline"
)
space = pipeline.search_space("configspace")
configuration = space.sample_configuration()
configured_pipeline = pipeline.configure(configuration)
sklearn_pipeline = pipeline.build("sklearn")
╭─ Sequential(my_pipeline) ───────────────────╮
│ ╭─ Fixed(PCA) ─────────────╮ │
│ │ item PCA(n_components=3) │ │
│ ╰──────────────────────────╯ │
│ ↓ │
│ ╭─ Component(RandomForestClassifier) ─────╮ │
│ │ item class RandomForestClassifier(...) │ │
│ │ space {'n_estimators': (10, 100)} │ │
│ ╰─────────────────────────────────────────╯ │
╰─────────────────────────────────────────────╯
Configuration space object:
Hyperparameters:
my_pipeline:RandomForestClassifier:n_estimators, Type: UniformInteger,
Range: [10, 100], Default: 55
Configuration(values={
'my_pipeline:RandomForestClassifier:n_estimators': 84,
})
╭─ Sequential(my_pipeline) ────────────────────╮
│ ╭─ Fixed(PCA) ─────────────╮ │
│ │ item PCA(n_components=3) │ │
│ ╰──────────────────────────╯ │
│ ↓ │
│ ╭─ Component(RandomForestClassifier) ──────╮ │
│ │ item class RandomForestClassifier(...) │ │
│ │ config {'n_estimators': 84} │ │
│ │ space {'n_estimators': (10, 100)} │ │
│ ╰──────────────────────────────────────────╯ │
╰──────────────────────────────────────────────╯
Pipeline(steps=[('PCA', PCA(n_components=3)), ('RandomForestClassifier', RandomForestClassifier())])
You may also just chain together nodes using an infix operator >> if you prefer:
from amltk.pipeline import Component, Sequential
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
pipeline = (
Sequential(name="my_pipeline")
>> PCA(n_components=3)
>> Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
)
╭─ Sequential(my_pipeline) ───────────────────╮
│ ╭─ Fixed(PCA) ─────────────╮ │
│ │ item PCA(n_components=3) │ │
│ ╰──────────────────────────╯ │
│ ↓ │
│ ╭─ Component(RandomForestClassifier) ─────╮ │
│ │ item class RandomForestClassifier(...) │ │
│ │ space {'n_estimators': (10, 100)} │ │
│ ╰─────────────────────────────────────────╯ │
╰─────────────────────────────────────────────╯
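"Feeding the output of one into the next" is ordinary left-to-right function composition. The sketch below shows that behaviour with plain functions; `run_sequential`, `drop_missing` and `scale` are hypothetical names used for illustration and are not part of amltk or scikit-learn.

```python
from functools import reduce
from typing import Any, Callable, Sequence

Step = Callable[[Any], Any]


def run_sequential(steps: Sequence[Step], data: Any) -> Any:
    """Apply each step to the result of the previous one, left to right."""
    return reduce(lambda value, step: step(value), steps, data)


def drop_missing(values: list) -> list:
    # First step: remove missing entries.
    return [v for v in values if v is not None]


def scale(values: list) -> list:
    # Second step: divide by the largest remaining value.
    top = max(values)
    return [v / top for v in values]


result = run_sequential([drop_missing, scale], [2, None, 4])
print(result)  # [0.5, 1.0]
```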
Whenever some other node sees a list, e.g. [comp1, comp2, comp3], this will automatically be converted into a Sequential.
from amltk.pipeline import Choice
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
pipeline_choice = Choice(
[SimpleImputer(), RandomForestClassifier()],
[StandardScaler(), MLPClassifier()],
name="pipeline_choice"
)
╭─ Choice(pipeline_choice) ───────────────────────────────────────────────╮
│ ╭─ Sequential(Seq-qizvtpNM) ──────────╮ ╭─ Sequential(Seq-sXD6ZpNr) ──╮ │
│ │ ╭─ Fixed(SimpleImputer) ─╮ │ │ ╭─ Fixed(StandardScaler) ─╮ │ │
│ │ │ item SimpleImputer() │ │ │ │ item StandardScaler() │ │ │
│ │ ╰────────────────────────╯ │ │ ╰─────────────────────────╯ │ │
│ │ ↓ │ │ ↓ │ │
│ │ ╭─ Fixed(RandomForestClassifier) ─╮ │ │ ╭─ Fixed(MLPClassifier) ─╮ │ │
│ │ │ item RandomForestClassifier() │ │ │ │ item MLPClassifier() │ │ │
│ │ ╰─────────────────────────────────╯ │ │ ╰────────────────────────╯ │ │
│ ╰─────────────────────────────────────╯ ╰─────────────────────────────╯ │
╰─────────────────────────────────────────────────────────────────────────╯
Like all Nodes, a Sequential accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.
Choice
A Choice between different subcomponents.
This indicates that a choice should be made between the different children in .nodes, usually done when you configure() with some config from a search_space().
from amltk.pipeline import Choice, Component
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
mlp = Component(MLPClassifier, space={"activation": ["logistic", "relu", "tanh"]})
estimator_choice = Choice(rf, mlp, name="estimator")
space = estimator_choice.search_space("configspace")
config = space.sample_configuration()
configured_choice = estimator_choice.configure(config)
chosen_estimator = configured_choice.chosen()
estimator = chosen_estimator.build_item()
╭─ Choice(estimator) ──────────────────────────────────────────────────────────╮
│ ╭─ Component(MLPClassifier) ─────╮ ╭─ Component(RandomForestClassifier)─╮ │
│ │ item class MLPClassifier(...) │ │ item class │ │
│ │ space { │ │ RandomForestClassifier(...) │ │
│ │ 'activation': [ │ │ space {'n_estimators': (10, 100)} │ │
│ │ 'logistic', │ ╰────────────────────────────────────╯ │
│ │ 'relu', │ │
│ │ 'tanh' │ │
│ │ ] │ │
│ │ } │ │
│ ╰────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
Configuration space object:
Hyperparameters:
estimator:MLPClassifier:activation, Type: Categorical, Choices: {logistic,
relu, tanh}, Default: logistic
estimator:RandomForestClassifier:n_estimators, Type: UniformInteger, Range:
[10, 100], Default: 55
estimator:__choice__, Type: Categorical, Choices: {MLPClassifier,
RandomForestClassifier}, Default: MLPClassifier
Conditions:
estimator:MLPClassifier:activation | estimator:__choice__ == 'MLPClassifier'
estimator:RandomForestClassifier:n_estimators | estimator:__choice__ ==
'RandomForestClassifier'
Configuration(values={
'estimator:MLPClassifier:activation': 'relu',
'estimator:__choice__': 'MLPClassifier',
})
╭─ Choice(estimator) ──────────────────────────────────────────────────────────╮
│ config {'__choice__': 'MLPClassifier'} │
│ ╭─ Component(MLPClassifier) ──────╮ ╭─ Component(RandomForestClassifier)─╮ │
│ │ item class MLPClassifier(...) │ │ item class │ │
│ │ config {'activation': 'relu'} │ │ RandomForestClassifier(...) │ │
│ │ space { │ │ space {'n_estimators': (10, 100)} │ │
│ │ 'activation': [ │ ╰────────────────────────────────────╯ │
│ │ 'logistic', │ │
│ │ 'relu', │ │
│ │ 'tanh' │ │
│ │ ] │ │
│ │ } │ │
│ ╰─────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Component(MLPClassifier) ──────────────────────────╮
│ item class MLPClassifier(...) │
│ config {'activation': 'relu'} │
│ space {'activation': ['logistic', 'relu', 'tanh']} │
╰─────────────────────────────────────────────────────╯
MLPClassifier()
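The sampled configuration above uses flat keys of the form name:child:parameter, plus a special name:__choice__ key naming the selected child. The following sketch shows one way such a flat dictionary could be read; `select_choice` is a hypothetical helper and the parsing rule is an assumption based only on the printed keys, not on how amltk handles them internally.

```python
def select_choice(
    config: dict[str, object],
    choice_name: str,
    delimiter: str = ":",
) -> tuple[str, dict[str, object]]:
    """Return the chosen child's name and the parameters that belong to it."""
    chosen = str(config[f"{choice_name}{delimiter}__choice__"])
    prefix = f"{choice_name}{delimiter}{chosen}{delimiter}"
    # Keep only the keys under the chosen child and strip the prefix.
    sub_config = {
        key[len(prefix):]: value
        for key, value in config.items()
        if key.startswith(prefix)
    }
    return chosen, sub_config


sampled = {
    "estimator:MLPClassifier:activation": "relu",
    "estimator:__choice__": "MLPClassifier",
}
chosen, sub = select_choice(sampled, "estimator")
print(chosen, sub)  # MLPClassifier {'activation': 'relu'}
```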
You may also just add nodes to a Choice using an infix operator | if you prefer:
from amltk.pipeline import Choice, Component
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
mlp = Component(MLPClassifier, space={"activation": ["logistic", "relu", "tanh"]})
estimator_choice = (
Choice(name="estimator") | mlp | rf
)
╭─ Choice(estimator) ──────────────────────────────────────────────────────────╮
│ ╭─ Component(MLPClassifier) ─────╮ ╭─ Component(RandomForestClassifier)─╮ │
│ │ item class MLPClassifier(...) │ │ item class │ │
│ │ space { │ │ RandomForestClassifier(...) │ │
│ │ 'activation': [ │ │ space {'n_estimators': (10, 100)} │ │
│ │ 'logistic', │ ╰────────────────────────────────────╯ │
│ │ 'relu', │ │
│ │ 'tanh' │ │
│ │ ] │ │
│ │ } │ │
│ ╰────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
Whenever some other node sees a set, e.g. {comp1, comp2, comp3}, this will automatically be converted into a Choice.
from amltk.pipeline import Choice, Component, Sequential
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.impute import SimpleImputer
rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
mlp = Component(MLPClassifier, space={"activation": ["logistic", "relu", "tanh"]})
pipeline = Sequential(
SimpleImputer(fill_value=0),
{mlp, rf},
name="my_pipeline",
)
╭─ Sequential(my_pipeline) ────────────────────────────────────────────────────╮
│ ╭─ Fixed(SimpleImputer) ───────────╮ │
│ │ item SimpleImputer(fill_value=0) │ │
│ ╰──────────────────────────────────╯ │
│ ↓ │
│ ╭─ Choice(Choice-7CmvbHZT) ────────────────────────────────────────────────╮ │
│ │ ╭─ Component(MLPClassifier) ─────╮ ╭─ Component(RandomForestClassifie─╮ │ │
│ │ │ item class MLPClassifier(...) │ │ item class │ │ │
│ │ │ space { │ │ RandomForestClassifier(..… │ │ │
│ │ │ 'activation': [ │ │ space { │ │ │
│ │ │ 'logistic', │ │ 'n_estimators': ( │ │ │
│ │ │ 'relu', │ │ 10, │ │ │
│ │ │ 'tanh' │ │ 100 │ │ │
│ │ │ ] │ │ ) │ │ │
│ │ │ } │ │ } │ │ │
│ │ ╰────────────────────────────────╯ ╰──────────────────────────────────╯ │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
Like all Nodes, a Choice accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.
Order of nodes
The given nodes of a choice are always ordered according to their name, so indexing choice.nodes may not be reliable if modifying the choice dynamically.
Please use choice["name"] to access the nodes instead.
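To see why positional indexing is fragile when children are kept sorted by name, consider this plain-Python sketch. It uses a sorted list of strings as a stand-in for choice.nodes; `ordered_by_name` and the node names are illustrative assumptions, not amltk code.

```python
def ordered_by_name(names: list[str]) -> list[str]:
    """Return the names sorted alphabetically, mimicking name-ordered children."""
    return sorted(names)


nodes = ordered_by_name(["rf", "mlp"])
first_before = nodes[0]  # 'mlp'

# Adding a child whose name sorts earlier shifts every position.
nodes = ordered_by_name([*nodes, "knn"])
first_after = nodes[0]  # 'knn'

# A lookup by name is unaffected by the insertion.
lookup = {name: f"<node {name}>" for name in nodes}
by_name = lookup["mlp"]  # '<node mlp>'
```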
Split
A Split of data in a pipeline.
This indicates the different children in .nodes should act in parallel but on different subsets of data.
from amltk.pipeline import Component, Split
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_selector
categorical_pipeline = [
SimpleImputer(strategy="constant", fill_value="missing"),
OneHotEncoder(drop="first"),
]
numerical_pipeline = Component(SimpleImputer, space={"strategy": ["mean", "median"]})
preprocessor = Split(
{
"categories": categorical_pipeline,
"numerical": numerical_pipeline,
},
config={
# This is how you would configure the split for the sklearn builder in particular
"categories": make_column_selector(dtype_include="category"),
"numerical": make_column_selector(dtype_exclude="category"),
},
name="my_split"
)
space = preprocessor.search_space("configspace")
configuration = space.sample_configuration()
configured_preprocessor = preprocessor.configure(configuration)
built_preprocessor = configured_preprocessor.build("sklearn")
╭─ Split(my_split) ────────────────────────────────────────────────────────────╮
│ config { │
│ 'categories': │
│ <sklearn.compose._column_transformer.make_column_selector object at │
│ 0x7fdbd19051e0>, │
│ 'numerical': │
│ <sklearn.compose._column_transformer.make_column_selector object at │
│ 0x7fdbd1906320> │
│ } │
│ ╭─ Sequential(categories) ──────────╮ ╭─ Sequential(numerical) ────────────╮ │
│ │ ╭─ Fixed(SimpleImputer) ────────╮ │ │ ╭─ Component(SimpleImputer) ─────╮ │ │
│ │ │ item SimpleImputer(fill_valu… │ │ │ │ item class SimpleImputer(...) │ │ │
│ │ │ strategy='constant') │ │ │ │ space { │ │ │
│ │ ╰───────────────────────────────╯ │ │ │ 'strategy': [ │ │ │
│ │ ↓ │ │ │ 'mean', │ │ │
│ │ ╭─ Fixed(OneHotEncoder) ────────╮ │ │ │ 'median' │ │ │
│ │ │ item OneHotEncoder(drop='fir… │ │ │ │ ] │ │ │
│ │ ╰───────────────────────────────╯ │ │ │ } │ │ │
│ ╰───────────────────────────────────╯ │ ╰────────────────────────────────╯ │ │
│ ╰────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
Configuration space object:
Hyperparameters:
my_split:numerical:SimpleImputer:strategy, Type: Categorical, Choices:
{mean, median}, Default: mean
Configuration(values={
'my_split:numerical:SimpleImputer:strategy': 'median',
})
╭─ Split(my_split) ────────────────────────────────────────────────────────────╮
│ config { │
│ 'categories': │
│ <sklearn.compose._column_transformer.make_column_selector object at │
│ 0x7fdbd19051e0>, │
│ 'numerical': │
│ <sklearn.compose._column_transformer.make_column_selector object at │
│ 0x7fdbd1906320> │
│ } │
│ ╭─ Sequential(categories) ──────────╮ ╭─ Sequential(numerical) ────────────╮ │
│ │ ╭─ Fixed(SimpleImputer) ────────╮ │ │ ╭─ Component(SimpleImputer) ─────╮ │ │
│ │ │ item SimpleImputer(fill_valu… │ │ │ │ item class │ │ │
│ │ │ strategy='constant') │ │ │ │ SimpleImputer(...) │ │ │
│ │ ╰───────────────────────────────╯ │ │ │ config {'strategy': 'median'} │ │ │
│ │ ↓ │ │ │ space { │ │ │
│ │ ╭─ Fixed(OneHotEncoder) ────────╮ │ │ │ 'strategy': [ │ │ │
│ │ │ item OneHotEncoder(drop='fir… │ │ │ │ 'mean', │ │ │
│ │ ╰───────────────────────────────╯ │ │ │ 'median' │ │ │
│ ╰───────────────────────────────────╯ │ │ ] │ │ │
│ │ │ } │ │ │
│ │ ╰────────────────────────────────╯ │ │
│ ╰────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
Pipeline(steps=[('my_split', ColumnTransformer(transformers=[('categories', Pipeline(steps=[('SimpleImputer', SimpleImputer(fill_value='missing', strategy='constant')), ('OneHotEncoder', OneHotEncoder(drop='first'))]), <sklearn.compose._column_transformer.make_column_selector object at 0x7fdbd19051e0>), ('SimpleImputer', SimpleImputer(strategy='median'), <sklearn.compose._column_transformer.make_column_selector object at 0x7fdbd1906320>)]))])
The split is a slight oddity when compared to the other kinds of components in that it allows a dict as its first argument, where the keys are the names of the different paths through which data will go and the values are the actual nodes that will receive the data.
If nodes are passed in as they are for all other components, usually the name of the first node will be important for any builder trying to make sense of how to use the Split.
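The pairing of path names with selectors can be illustrated without any library. In the sketch below, each named path receives only the columns its selector accepts, and the results are merged back together. All names (`run_split`, `is_text`, `fill_mean` and so on) are hypothetical, and the column-wise data layout is an assumption chosen to keep the example short.

```python
from typing import Callable

Columns = dict[str, list]


def run_split(
    data: Columns,
    paths: dict[str, Callable[[Columns], Columns]],
    selectors: dict[str, Callable[[list], bool]],
) -> Columns:
    """Send each path the columns its selector accepts, then merge the outputs."""
    merged: Columns = {}
    for path_name, transform in paths.items():
        selected = {n: col for n, col in data.items() if selectors[path_name](col)}
        merged.update(transform(selected))
    return merged


def is_text(col: list) -> bool:
    return all(v is None or isinstance(v, str) for v in col)


def is_number(col: list) -> bool:
    return not is_text(col)


def fill_text(cols: Columns) -> Columns:
    # Replace missing text entries with a constant.
    return {n: ["missing" if v is None else v for v in c] for n, c in cols.items()}


def fill_mean(cols: Columns) -> Columns:
    # Replace missing numbers with the mean of the present values.
    filled: Columns = {}
    for n, c in cols.items():
        present = [v for v in c if v is not None]
        mean = sum(present) / len(present)
        filled[n] = [mean if v is None else v for v in c]
    return filled


data = {"colour": ["red", None, "blue"], "height": [1.0, None, 3.0]}
result = run_split(
    data,
    paths={"categories": fill_text, "numerical": fill_mean},
    selectors={"categories": is_text, "numerical": is_number},
)
print(result)  # {'colour': ['red', 'missing', 'blue'], 'height': [1.0, 2.0, 3.0]}
```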
Like all Nodes, a Split accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.
Join
Join together different parts of the pipeline.
This indicates the different children in .nodes should act in tandem with one another, for example, concatenating the outputs of the various members of the Join.
from amltk.pipeline import Join, Component
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
pca = Component(PCA, space={"n_components": (1, 3)})
kbest = Component(SelectKBest, space={"k": (1, 3)})
join = Join(pca, kbest, name="my_feature_union")
space = join.search_space("configspace")
pipeline = join.build("sklearn")
╭─ Join(my_feature_union) ────────────────────────────────────────────╮
│ ╭─ Component(PCA) ───────────────╮ ╭─ Component(SelectKBest) ─────╮ │
│ │ item class PCA(...) │ │ item class SelectKBest(...) │ │
│ │ space {'n_components': (1, 3)} │ │ space {'k': (1, 3)} │ │
│ ╰────────────────────────────────╯ ╰──────────────────────────────╯ │
╰─────────────────────────────────────────────────────────────────────╯
Configuration space object:
Hyperparameters:
my_feature_union:PCA:n_components, Type: UniformInteger, Range: [1, 3],
Default: 2
my_feature_union:SelectKBest:k, Type: UniformInteger, Range: [1, 3],
Default: 2
Pipeline(steps=[('my_feature_union',
FeatureUnion(transformer_list=[('PCA', PCA()),
('SelectKBest',
SelectKBest())]))])
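Concatenating the outputs of the members can be pictured as follows: every member sees the same input rows, and their per-row outputs are placed side by side. This is a plain-Python sketch with hypothetical helpers (`run_join`, `first_two`, `row_sum`), not the FeatureUnion that the "sklearn" builder produces.

```python
from itertools import chain
from typing import Callable, Sequence

Rows = list[list[float]]


def run_join(members: Sequence[Callable[[Rows], Rows]], rows: Rows) -> Rows:
    """Run every member on the same rows and concatenate their outputs row-wise."""
    outputs = [member(rows) for member in members]
    return [list(chain.from_iterable(parts)) for parts in zip(*outputs)]


def first_two(rows: Rows) -> Rows:
    # Keep the first two features of each row.
    return [row[:2] for row in rows]


def row_sum(rows: Rows) -> Rows:
    # Produce one derived feature per row.
    return [[sum(row)] for row in rows]


joined = run_join([first_two, row_sum], [[1, 2, 3], [4, 5, 6]])
print(joined)  # [[1, 2, 6], [4, 5, 15]]
```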
You may also just join together nodes using an infix operator & if you prefer:
from amltk.pipeline import Join, Component
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
pca = Component(PCA, space={"n_components": (1, 3)})
kbest = Component(SelectKBest, space={"k": (1, 3)})
# Can not parametrize or name the join
join = pca & kbest
# With a parametrized join
join = (
Join(name="my_feature_union") & pca & kbest
)
item = join.build("sklearn")
╭─ Join(Join-I2f1QH4p) ───────────────────────────────────────────────╮
│ ╭─ Component(PCA) ───────────────╮ ╭─ Component(SelectKBest) ─────╮ │
│ │ item class PCA(...) │ │ item class SelectKBest(...) │ │
│ │ space {'n_components': (1, 3)} │ │ space {'k': (1, 3)} │ │
│ ╰────────────────────────────────╯ ╰──────────────────────────────╯ │
╰─────────────────────────────────────────────────────────────────────╯
Pipeline(steps=[('my_feature_union', FeatureUnion(transformer_list=[('PCA', PCA()), ('SelectKBest', SelectKBest())]))])
Whenever some other node sees a tuple, e.g. (comp1, comp2, comp3), this will automatically be converted into a Join.
from amltk.pipeline import Sequential, Component
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.ensemble import RandomForestClassifier
pca = Component(PCA, space={"n_components": (1, 3)})
kbest = Component(SelectKBest, space={"k": (1, 3)})
# Can not parametrize or name the join
join = Sequential(
(pca, kbest),
RandomForestClassifier(n_estimators=5),
name="my_feature_union",
)
╭─ Sequential(my_feature_union) ──────────────────────────────────────────╮
│ ╭─ Join(Join-sAhetug0) ───────────────────────────────────────────────╮ │
│ │ ╭─ Component(PCA) ───────────────╮ ╭─ Component(SelectKBest) ─────╮ │ │
│ │ │ item class PCA(...) │ │ item class SelectKBest(...) │ │ │
│ │ │ space {'n_components': (1, 3)} │ │ space {'k': (1, 3)} │ │ │
│ │ ╰────────────────────────────────╯ ╰──────────────────────────────╯ │ │
│ ╰─────────────────────────────────────────────────────────────────────╯ │
│ ↓ │
│ ╭─ Fixed(RandomForestClassifier) ─────────────╮ │
│ │ item RandomForestClassifier(n_estimators=5) │ │
│ ╰─────────────────────────────────────────────╯ │
╰─────────────────────────────────────────────────────────────────────────╯
Like all Nodes, a Join accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.
Fixed
A Fixed part of the pipeline that represents something that cannot be configured and is used directly as is.
It consists of an .item that is fixed, non-configurable and non-searchable. It also has no children.
This is useful for representing parts of the pipeline that are fixed. For example, if you have a pipeline that is a Sequential of nodes, but you want to fix the first component to be a PCA with n_components=3, you can use a Fixed to represent that.
from amltk.pipeline import Component, Fixed, Sequential
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
pca = Fixed(PCA(n_components=3))
pipeline = Sequential(pca, rf, name="my_pipeline")
╭─ Sequential(my_pipeline) ───────────────────╮
│ ╭─ Fixed(PCA) ─────────────╮ │
│ │ item PCA(n_components=3) │ │
│ ╰──────────────────────────╯ │
│ ↓ │
│ ╭─ Component(RandomForestClassifier) ─────╮ │
│ │ item class RandomForestClassifier(...) │ │
│ │ space {'n_estimators': (10, 100)} │ │
│ ╰─────────────────────────────────────────╯ │
╰─────────────────────────────────────────────╯
Whenever some other node sees an instance of something, i.e. something that can't be called, this will automatically be converted into a Fixed.
from amltk.pipeline import Sequential
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
pipeline = Sequential(
PCA(n_components=3),
RandomForestClassifier(n_estimators=50),
name="my_pipeline",
)
╭─ Sequential(my_pipeline) ────────────────────────╮
│ ╭─ Fixed(PCA) ─────────────╮ │
│ │ item PCA(n_components=3) │ │
│ ╰──────────────────────────╯ │
│ ↓ │
│ ╭─ Fixed(RandomForestClassifier) ──────────────╮ │
│ │ item RandomForestClassifier(n_estimators=50) │ │
│ ╰──────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────╯
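The distinction between a constructor (wrapped as a Component) and an instance (wrapped as a Fixed) comes down to whether the object can be called. A minimal sketch of that test using Python's built-in `callable` follows; `node_kind`, `default_name` and `Scaler` are hypothetical, and how amltk treats instances that happen to define `__call__` is not covered by the text above.

```python
def node_kind(obj: object) -> str:
    """Classify an object the way the text describes: callable or not."""
    return "Component" if callable(obj) else "Fixed"


def default_name(obj: object) -> str:
    """Use the function/class name if there is one, else the instance's class name."""
    return getattr(obj, "__name__", type(obj).__name__)


class Scaler:
    """Stand-in for an estimator class such as PCA."""

    def __init__(self, factor: float = 1.0) -> None:
        self.factor = factor


kind_of_class = node_kind(Scaler)          # 'Component': a class can be called
kind_of_instance = node_kind(Scaler(2.0))  # 'Fixed': this instance cannot
names = (default_name(Scaler), default_name(Scaler(2.0)))  # ('Scaler', 'Scaler')
```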
The default .name of a component is the class name of the item that it will use. You can explicitly set the name= if you want to when constructing the component.
A Fixed accepts only an explicit name=, item= and meta=.
Searchable
A Searchable node of the pipeline which just represents a search space, with no item attached.
While not usually applicable to pipelines you want to build, this component is useful for creating a search space, especially if the real pipeline you want to optimize cannot be built directly. For example, if you are optimizing a script, you may wish to use a Searchable to represent the search space of that script.
from amltk.pipeline import Searchable
script_space = Searchable({"mode": ["orange", "blue", "red"], "n": (10, 100)})
╭─ Searchable(Searchable-0jLpL6X0) ─────────────────────────╮
│ space {'mode': ['orange', 'blue', 'red'], 'n': (10, 100)} │
╰───────────────────────────────────────────────────────────╯
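To make the idea of a search space for a script concrete, the sketch below draws a configuration from a dictionary shaped like the one above using only the standard library. The interpretation (a list is a categorical choice, a pair of integers is an inclusive integer range) is an assumption that matches the ConfigSpace output shown elsewhere on this page; `sample_space` is a hypothetical helper, not an amltk function.

```python
import random
from typing import Any


def sample_space(space: dict[str, Any], rng: random.Random) -> dict[str, Any]:
    """Draw one value per entry: lists are choices, 2-tuples are ranges."""
    config: dict[str, Any] = {}
    for name, domain in space.items():
        if isinstance(domain, list):
            config[name] = rng.choice(domain)
        elif isinstance(domain, tuple) and len(domain) == 2:
            low, high = domain
            if isinstance(low, int) and isinstance(high, int):
                config[name] = rng.randint(low, high)  # inclusive on both ends
            else:
                config[name] = rng.uniform(low, high)
        else:
            config[name] = domain  # treat anything else as a constant
    return config


space = {"mode": ["orange", "blue", "red"], "n": (10, 100)}
config = sample_space(space, random.Random(0))
# The resulting dictionary could then be passed to the script being optimized.
```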
A Searchable explicitly does not allow for item= to be set, nor can it have any children. A Searchable accepts an explicit name=, config=, space=, fidelities=, config_transform= and meta=.
class Join(*nodes, name=None, item=None, config=None, space=None, fidelities=None, config_transform=None, meta=None)
dataclass
Join together different parts of the pipeline.
This indicates the different children in .nodes should act in tandem with one another, for example, concatenating the outputs of the various members of the Join.
from amltk.pipeline import Join, Component
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
pca = Component(PCA, space={"n_components": (1, 3)})
kbest = Component(SelectKBest, space={"k": (1, 3)})
join = Join(pca, kbest, name="my_feature_union")
space = join.search_space("configspace")
pipeline = join.build("sklearn")
╭─ Join(my_feature_union) ────────────────────────────────────────────╮
│ ╭─ Component(PCA) ───────────────╮ ╭─ Component(SelectKBest) ─────╮ │
│ │ item class PCA(...) │ │ item class SelectKBest(...) │ │
│ │ space {'n_components': (1, 3)} │ │ space {'k': (1, 3)} │ │
│ ╰────────────────────────────────╯ ╰──────────────────────────────╯ │
╰─────────────────────────────────────────────────────────────────────╯
Configuration space object:
Hyperparameters:
my_feature_union:PCA:n_components, Type: UniformInteger, Range: [1, 3],
Default: 2
my_feature_union:SelectKBest:k, Type: UniformInteger, Range: [1, 3],
Default: 2
Pipeline(steps=[('my_feature_union',
FeatureUnion(transformer_list=[('PCA', PCA()),
('SelectKBest',
SelectKBest())]))])
You may also just join together nodes using an infix operator & if you prefer:
from amltk.pipeline import Join, Component
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
pca = Component(PCA, space={"n_components": (1, 3)})
kbest = Component(SelectKBest, space={"k": (1, 3)})
# Can not parametrize or name the join
join = pca & kbest
# With a parametrized join
join = (
Join(name="my_feature_union") & pca & kbest
)
item = join.build("sklearn")
╭─ Join(Join-7ayXUEwB) ───────────────────────────────────────────────╮
│ ╭─ Component(PCA) ───────────────╮ ╭─ Component(SelectKBest) ─────╮ │
│ │ item class PCA(...) │ │ item class SelectKBest(...) │ │
│ │ space {'n_components': (1, 3)} │ │ space {'k': (1, 3)} │ │
│ ╰────────────────────────────────╯ ╰──────────────────────────────╯ │
╰─────────────────────────────────────────────────────────────────────╯
Pipeline(steps=[('my_feature_union', FeatureUnion(transformer_list=[('PCA', PCA()), ('SelectKBest', SelectKBest())]))])
Whenever some other node sees a tuple, e.g. (comp1, comp2, comp3), this will automatically be converted into a Join.
from amltk.pipeline import Sequential, Component
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.ensemble import RandomForestClassifier
pca = Component(PCA, space={"n_components": (1, 3)})
kbest = Component(SelectKBest, space={"k": (1, 3)})
# Can not parametrize or name the join
join = Sequential(
(pca, kbest),
RandomForestClassifier(n_estimators=5),
name="my_feature_union",
)
╭─ Sequential(my_feature_union) ──────────────────────────────────────────╮
│ ╭─ Join(Join-gfZDjsg6) ───────────────────────────────────────────────╮ │
│ │ ╭─ Component(PCA) ───────────────╮ ╭─ Component(SelectKBest) ─────╮ │ │
│ │ │ item class PCA(...) │ │ item class SelectKBest(...) │ │ │
│ │ │ space {'n_components': (1, 3)} │ │ space {'k': (1, 3)} │ │ │
│ │ ╰────────────────────────────────╯ ╰──────────────────────────────╯ │ │
│ ╰─────────────────────────────────────────────────────────────────────╯ │
│ ↓ │
│ ╭─ Fixed(RandomForestClassifier) ─────────────╮ │
│ │ item RandomForestClassifier(n_estimators=5) │ │
│ ╰─────────────────────────────────────────────╯ │
╰─────────────────────────────────────────────────────────────────────────╯
Like all Nodes, a Join accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.
nodes: tuple[Node, ...]
attr
The nodes that this node leads to.
class Choice(*nodes, name=None, item=None, config=None, space=None, fidelities=None, config_transform=None, meta=None)
dataclass
A Choice between different subcomponents.
This indicates that a choice should be made between the different children in .nodes, usually done when you configure() with some config from a search_space().
from amltk.pipeline import Choice, Component
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
mlp = Component(MLPClassifier, space={"activation": ["logistic", "relu", "tanh"]})
estimator_choice = Choice(rf, mlp, name="estimator")
space = estimator_choice.search_space("configspace")
config = space.sample_configuration()
configured_choice = estimator_choice.configure(config)
chosen_estimator = configured_choice.chosen()
estimator = chosen_estimator.build_item()
╭─ Choice(estimator) ──────────────────────────────────────────────────────────╮
│ ╭─ Component(MLPClassifier) ─────╮ ╭─ Component(RandomForestClassifier)─╮ │
│ │ item class MLPClassifier(...) │ │ item class │ │
│ │ space { │ │ RandomForestClassifier(...) │ │
│ │ 'activation': [ │ │ space {'n_estimators': (10, 100)} │ │
│ │ 'logistic', │ ╰────────────────────────────────────╯ │
│ │ 'relu', │ │
│ │ 'tanh' │ │
│ │ ] │ │
│ │ } │ │
│ ╰────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
Configuration space object:
Hyperparameters:
estimator:MLPClassifier:activation, Type: Categorical, Choices: {logistic,
relu, tanh}, Default: logistic
estimator:RandomForestClassifier:n_estimators, Type: UniformInteger, Range:
[10, 100], Default: 55
estimator:__choice__, Type: Categorical, Choices: {MLPClassifier,
RandomForestClassifier}, Default: MLPClassifier
Conditions:
estimator:MLPClassifier:activation | estimator:__choice__ == 'MLPClassifier'
estimator:RandomForestClassifier:n_estimators | estimator:__choice__ ==
'RandomForestClassifier'
Configuration(values={
'estimator:MLPClassifier:activation': 'logistic',
'estimator:__choice__': 'MLPClassifier',
})
╭─ Choice(estimator) ──────────────────────────────────────────────────────────╮
│ config {'__choice__': 'MLPClassifier'} │
│ ╭─ Component(MLPClassifier) ────────╮ ╭─ Component(RandomForestClassifier)─╮ │
│ │ item class MLPClassifier(...) │ │ item class │ │
│ │ config {'activation': 'logistic'} │ │ RandomForestClassifier(...) │ │
│ │ space { │ │ space {'n_estimators': (10, 100)} │ │
│ │ 'activation': [ │ ╰────────────────────────────────────╯ │
│ │ 'logistic', │ │
│ │ 'relu', │ │
│ │ 'tanh' │ │
│ │ ] │ │
│ │ } │ │
│ ╰───────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Component(MLPClassifier) ──────────────────────────╮
│ item class MLPClassifier(...) │
│ config {'activation': 'logistic'} │
│ space {'activation': ['logistic', 'relu', 'tanh']} │
╰─────────────────────────────────────────────────────╯
MLPClassifier(activation='logistic')
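The sampled configuration above stores the selected branch under the `estimator:__choice__` key and the branch's own hyperparameters under `estimator:<branch>:...`. The following is a minimal plain-Python sketch of how such a flat configuration could be resolved into a chosen branch and its local config. The helper `resolve_choice` is hypothetical and only mirrors the key layout visible in the printed output; it is not amltk's implementation.

```python
def resolve_choice(flat_config: dict, choice_name: str) -> tuple[str, dict]:
    """Return (chosen child name, that child's local config).

    Hypothetical helper: assumes keys are ':'-delimited, as in the
    printed configuration above.
    """
    chosen = flat_config[f"{choice_name}:__choice__"]
    prefix = f"{choice_name}:{chosen}:"
    # Keep only the keys belonging to the chosen child, stripping the prefix.
    child_config = {
        key[len(prefix):]: value
        for key, value in flat_config.items()
        if key.startswith(prefix)
    }
    return chosen, child_config


flat = {
    "estimator:MLPClassifier:activation": "logistic",
    "estimator:__choice__": "MLPClassifier",
}
chosen, child_config = resolve_choice(flat, "estimator")
print(chosen, child_config)
```

Keys belonging to branches that were not chosen simply never match the prefix, which is consistent with the conditional hyperparameters listed under "Conditions" in the printed space.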
You may also just add nodes to a Choice using an infix operator | if you prefer:
from amltk.pipeline import Choice, Component
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
mlp = Component(MLPClassifier, space={"activation": ["logistic", "relu", "tanh"]})
estimator_choice = (
Choice(name="estimator") | mlp | rf
)
╭─ Choice(estimator) ──────────────────────────────────────────────────────────╮
│ ╭─ Component(MLPClassifier) ─────╮ ╭─ Component(RandomForestClassifier)─╮ │
│ │ item class MLPClassifier(...) │ │ item class │ │
│ │ space { │ │ RandomForestClassifier(...) │ │
│ │ 'activation': [ │ │ space {'n_estimators': (10, 100)} │ │
│ │ 'logistic', │ ╰────────────────────────────────────╯ │
│ │ 'relu', │ │
│ │ 'tanh' │ │
│ │ ] │ │
│ │ } │ │
│ ╰────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
Whenever some other node sees a set, e.g. {comp1, comp2, comp3}, this will automatically be converted into a Choice.
from amltk.pipeline import Component, Sequential
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.impute import SimpleImputer
rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
mlp = Component(MLPClassifier, space={"activation": ["logistic", "relu", "tanh"]})
pipeline = Sequential(
SimpleImputer(fill_value=0),
{mlp, rf},
name="my_pipeline",
)
╭─ Sequential(my_pipeline) ────────────────────────────────────────────────────╮
│ ╭─ Fixed(SimpleImputer) ───────────╮ │
│ │ item SimpleImputer(fill_value=0) │ │
│ ╰──────────────────────────────────╯ │
│ ↓ │
│ ╭─ Choice(Choice-Rsb4oblH) ────────────────────────────────────────────────╮ │
│ │ ╭─ Component(MLPClassifier) ─────╮ ╭─ Component(RandomForestClassifie─╮ │ │
│ │ │ item class MLPClassifier(...) │ │ item class │ │ │
│ │ │ space { │ │ RandomForestClassifier(..… │ │ │
│ │ │ 'activation': [ │ │ space { │ │ │
│ │ │ 'logistic', │ │ 'n_estimators': ( │ │ │
│ │ │ 'relu', │ │ 10, │ │ │
│ │ │ 'tanh' │ │ 100 │ │ │
│ │ │ ] │ │ ) │ │ │
│ │ │ } │ │ } │ │ │
│ │ ╰────────────────────────────────╯ ╰──────────────────────────────────╯ │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
Like all Nodes, a Choice accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.
Order of nodes
The given nodes of a choice are always ordered according to their name, so positional indexing into choice.nodes may not be reliable if you modify the choice dynamically. Please use choice["name"] to access the nodes instead.
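To illustrate why positional indexing is fragile here, the sketch below uses a toy stand-in for a choice that keeps its children sorted by name. `ToyNode` and `ToyChoice` are hypothetical classes written for this example, not amltk types; they only assume the name-based ordering described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyNode:
    name: str


class ToyChoice:
    """Toy container that, like the note above describes, sorts children by name."""

    def __init__(self, *nodes: ToyNode) -> None:
        self.nodes = tuple(sorted(nodes, key=lambda node: node.name))

    def __getitem__(self, name: str) -> ToyNode:
        # Name-based lookup is unaffected by how many siblings exist.
        for node in self.nodes:
            if node.name == name:
                return node
        raise KeyError(name)

    def __or__(self, other: ToyNode) -> "ToyChoice":
        # Adding a node re-sorts everything, which can shift positions.
        return ToyChoice(*self.nodes, other)


choice = ToyChoice(ToyNode("mlp"), ToyNode("rf"))
first_before = choice.nodes[0].name

choice = choice | ToyNode("knn")
first_after = choice.nodes[0].name

print(first_before, first_after, choice["mlp"].name)
```

After adding a node whose name sorts first, index 0 refers to a different child, while lookup by name still returns the same node.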
nodes: tuple[Node, ...]
attr
The nodes that this node leads to.
def chosen()
The chosen branch.
RETURNS | DESCRIPTION
---|---
Node | The chosen branch
class Sequential(*nodes, name=None, item=None, config=None, space=None, fidelities=None, config_transform=None, meta=None)
dataclass
A Sequential set of operations in a pipeline.
This indicates the different children in .nodes should act one after another, feeding the output of one into the next.
from amltk.pipeline import Component, Sequential
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
pipeline = Sequential(
PCA(n_components=3),
Component(RandomForestClassifier, space={"n_estimators": (10, 100)}),
name="my_pipeline"
)
space = pipeline.search_space("configspace")
configuration = space.sample_configuration()
configured_pipeline = pipeline.configure(configuration)
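# Note: this builds the original, unconfigured `pipeline`; call
# configured_pipeline.build("sklearn") instead to apply the sampled configuration.
sklearn_pipeline = pipeline.build("sklearn")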
╭─ Sequential(my_pipeline) ───────────────────╮
│ ╭─ Fixed(PCA) ─────────────╮ │
│ │ item PCA(n_components=3) │ │
│ ╰──────────────────────────╯ │
│ ↓ │
│ ╭─ Component(RandomForestClassifier) ─────╮ │
│ │ item class RandomForestClassifier(...) │ │
│ │ space {'n_estimators': (10, 100)} │ │
│ ╰─────────────────────────────────────────╯ │
╰─────────────────────────────────────────────╯
Configuration space object:
Hyperparameters:
my_pipeline:RandomForestClassifier:n_estimators, Type: UniformInteger,
Range: [10, 100], Default: 55
Configuration(values={
'my_pipeline:RandomForestClassifier:n_estimators': 28,
})
╭─ Sequential(my_pipeline) ────────────────────╮
│ ╭─ Fixed(PCA) ─────────────╮ │
│ │ item PCA(n_components=3) │ │
│ ╰──────────────────────────╯ │
│ ↓ │
│ ╭─ Component(RandomForestClassifier) ──────╮ │
│ │ item class RandomForestClassifier(...) │ │
│ │ config {'n_estimators': 28} │ │
│ │ space {'n_estimators': (10, 100)} │ │
│ ╰──────────────────────────────────────────╯ │
╰──────────────────────────────────────────────╯
Pipeline(steps=[('PCA', PCA(n_components=3)), ('RandomForestClassifier', RandomForestClassifier())])
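In the search space printed above, each hyperparameter name is prefixed with the names of the nodes that contain it, joined by ":" (for example `my_pipeline:RandomForestClassifier:n_estimators`). A minimal sketch of that naming scheme, assuming a nested-dict stand-in for the pipeline, could look like this. The `flatten` helper is hypothetical and is not how amltk parses spaces internally.

```python
def flatten(tree: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into ':'-joined keys (hypothetical helper)."""
    flat = {}
    for key, value in tree.items():
        full_name = f"{prefix}:{key}" if prefix else key
        if isinstance(value, dict):
            # A nested dict stands in for a child node with its own space.
            flat.update(flatten(value, full_name))
        else:
            flat[full_name] = value
    return flat


nested = {
    "my_pipeline": {
        "PCA": {},  # fixed step: contributes no hyperparameters
        "RandomForestClassifier": {"n_estimators": (10, 100)},
    }
}
flat_space = flatten(nested)
print(flat_space)
```

The fixed `PCA` step contributes nothing to the flattened result, matching the single hyperparameter shown in the printed configuration space.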
You may also just chain together nodes using an infix operator >> if you prefer:
from amltk.pipeline import Component, Sequential
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
pipeline = (
Sequential(name="my_pipeline")
>> PCA(n_components=3)
>> Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
)
╭─ Sequential(my_pipeline) ───────────────────╮
│ ╭─ Fixed(PCA) ─────────────╮ │
│ │ item PCA(n_components=3) │ │
│ ╰──────────────────────────╯ │
│ ↓ │
│ ╭─ Component(RandomForestClassifier) ─────╮ │
│ │ item class RandomForestClassifier(...) │ │
│ │ space {'n_estimators': (10, 100)} │ │
│ ╰─────────────────────────────────────────╯ │
╰─────────────────────────────────────────────╯
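For readers curious how a `>>` chain like the one above can work in Python, here is a toy sketch based on `__rshift__`. `ToySequential` is a hypothetical class for illustration only; it is not amltk's `Sequential`, and its steps are plain functions rather than nodes.

```python
class ToySequential:
    """Toy chain: `seq >> step` returns a new chain with `step` appended."""

    def __init__(self, *steps, name: str = "seq") -> None:
        self.name = name
        self.steps = tuple(steps)

    def __rshift__(self, step) -> "ToySequential":
        # Return a new object instead of mutating, so chains can be reused.
        return ToySequential(*self.steps, step, name=self.name)

    def run(self, value):
        # Feed the output of each step into the next one.
        for step in self.steps:
            value = step(value)
        return value


pipeline = (
    ToySequential(name="my_pipeline")
    >> (lambda x: x + 1)
    >> (lambda x: x * 10)
)
result = pipeline.run(2)  # (2 + 1) * 10
print(pipeline.name, len(pipeline.steps), result)
```

Starting from an empty, named chain and appending with `>>` mirrors the shape of the example above, where the name is set once on the initial `Sequential`.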
Whenever some other node sees a list, e.g. [comp1, comp2, comp3], this will automatically be converted into a Sequential.
from amltk.pipeline import Choice
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
pipeline_choice = Choice(
[SimpleImputer(), RandomForestClassifier()],
[StandardScaler(), MLPClassifier()],
name="pipeline_choice"
)
╭─ Choice(pipeline_choice) ───────────────────────────────────────────────╮
│ ╭─ Sequential(Seq-2X0w2zYN) ──────────╮ ╭─ Sequential(Seq-Ma80bjp8) ──╮ │
│ │ ╭─ Fixed(SimpleImputer) ─╮ │ │ ╭─ Fixed(StandardScaler) ─╮ │ │
│ │ │ item SimpleImputer() │ │ │ │ item StandardScaler() │ │ │
│ │ ╰────────────────────────╯ │ │ ╰─────────────────────────╯ │ │
│ │ ↓ │ │ ↓ │ │
│ │ ╭─ Fixed(RandomForestClassifier) ─╮ │ │ ╭─ Fixed(MLPClassifier) ─╮ │ │
│ │ │ item RandomForestClassifier() │ │ │ │ item MLPClassifier() │ │ │
│ │ ╰─────────────────────────────────╯ │ │ ╰────────────────────────╯ │ │
│ ╰─────────────────────────────────────╯ ╰─────────────────────────────╯ │
╰─────────────────────────────────────────────────────────────────────────╯
Like all Nodes, a Sequential accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.
nodes: tuple[Node, ...]
attr
The nodes in series.
tail: Node
prop
The last step in the pipeline.
def __len__()
def walk(path=None)
Walk the nodes in this chain.
PARAMETER | DESCRIPTION
---|---
path | The current path to this node
YIELDS | DESCRIPTION
---|---
list[Node] | The parents of the node and the node itself
class Split(*nodes, name=None, item=None, config=None, space=None, fidelities=None, config_transform=None, meta=None)
dataclass
A Split of data in a pipeline.
This indicates the different children in .nodes should act in parallel but on different subsets of data.
from amltk.pipeline import Component, Split
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_selector
categorical_pipeline = [
SimpleImputer(strategy="constant", fill_value="missing"),
OneHotEncoder(drop="first"),
]
numerical_pipeline = Component(SimpleImputer, space={"strategy": ["mean", "median"]})
preprocessor = Split(
{
"categories": categorical_pipeline,
"numerical": numerical_pipeline,
},
config={
# This is how you would configure the split for the sklearn builder in particular
"categories": make_column_selector(dtype_include="category"),
"numerical": make_column_selector(dtype_exclude="category"),
},
name="my_split"
)
space = preprocessor.search_space("configspace")
configuration = space.sample_configuration()
configured_preprocessor = preprocessor.configure(configuration)
built_preprocessor = configured_preprocessor.build("sklearn")
╭─ Split(my_split) ────────────────────────────────────────────────────────────╮
│ config { │
│ 'categories': │
│ <sklearn.compose._column_transformer.make_column_selector object at │
│ 0x7fdbd1944af0>, │
│ 'numerical': │
│ <sklearn.compose._column_transformer.make_column_selector object at │
│ 0x7fdbd1946c20> │
│ } │
│ ╭─ Sequential(categories) ──────────╮ ╭─ Sequential(numerical) ────────────╮ │
│ │ ╭─ Fixed(SimpleImputer) ────────╮ │ │ ╭─ Component(SimpleImputer) ─────╮ │ │
│ │ │ item SimpleImputer(fill_valu… │ │ │ │ item class SimpleImputer(...) │ │ │
│ │ │ strategy='constant') │ │ │ │ space { │ │ │
│ │ ╰───────────────────────────────╯ │ │ │ 'strategy': [ │ │ │
│ │ ↓ │ │ │ 'mean', │ │ │
│ │ ╭─ Fixed(OneHotEncoder) ────────╮ │ │ │ 'median' │ │ │
│ │ │ item OneHotEncoder(drop='fir… │ │ │ │ ] │ │ │
│ │ ╰───────────────────────────────╯ │ │ │ } │ │ │
│ ╰───────────────────────────────────╯ │ ╰────────────────────────────────╯ │ │
│ ╰────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
Configuration space object:
Hyperparameters:
my_split:numerical:SimpleImputer:strategy, Type: Categorical, Choices:
{mean, median}, Default: mean
Configuration(values={
'my_split:numerical:SimpleImputer:strategy': 'mean',
})
╭─ Split(my_split) ────────────────────────────────────────────────────────────╮
│ config { │
│ 'categories': │
│ <sklearn.compose._column_transformer.make_column_selector object at │
│ 0x7fdbd1944af0>, │
│ 'numerical': │
│ <sklearn.compose._column_transformer.make_column_selector object at │
│ 0x7fdbd1946c20> │
│ } │
│ ╭─ Sequential(categories) ──────────╮ ╭─ Sequential(numerical) ────────────╮ │
│ │ ╭─ Fixed(SimpleImputer) ────────╮ │ │ ╭─ Component(SimpleImputer) ─────╮ │ │
│ │ │ item SimpleImputer(fill_valu… │ │ │ │ item class │ │ │
│ │ │ strategy='constant') │ │ │ │ SimpleImputer(...) │ │ │
│ │ ╰───────────────────────────────╯ │ │ │ config {'strategy': 'mean'} │ │ │
│ │ ↓ │ │ │ space { │ │ │
│ │ ╭─ Fixed(OneHotEncoder) ────────╮ │ │ │ 'strategy': [ │ │ │
│ │ │ item OneHotEncoder(drop='fir… │ │ │ │ 'mean', │ │ │
│ │ ╰───────────────────────────────╯ │ │ │ 'median' │ │ │
│ ╰───────────────────────────────────╯ │ │ ] │ │ │
│ │ │ } │ │ │
│ │ ╰────────────────────────────────╯ │ │
│ ╰────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
Pipeline(steps=[('my_split', ColumnTransformer(transformers=[('categories', Pipeline(steps=[('SimpleImputer', SimpleImputer(fill_value='missing', strategy='constant')), ('OneHotEncoder', OneHotEncoder(drop='first'))]), <sklearn.compose._column_transformer.make_column_selector object at 0x7fdbd1944af0>), ('SimpleImputer', SimpleImputer(), <sklearn.compose._column_transformer.make_column_selector object at 0x7fdbd1946c20>)]))])
The Split is a slight oddity compared to the other kinds of components in that it allows a dict as its first argument, where the keys are the names of the different paths through which data will go and the values are the actual nodes that will receive the data.
If nodes are passed in as they are for all other components, usually the name of the first node will be important for any builder trying to make sense of how to use the Split.
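In the example above, the keys of the dict passed to the Split ("categories", "numerical") are the same keys used in config= to say which columns each path receives. A minimal sketch of how a builder might pair the two by name is shown below. `pair_paths` is a hypothetical helper and the string selectors are placeholders; this is not the sklearn builder's actual code.

```python
def pair_paths(paths: dict, selectors: dict) -> list[tuple]:
    """Pair each named path with the selector configured under the same name.

    Hypothetical helper: returns (name, steps, selector) triples, similar
    in shape to the transformers listed in the built output above.
    """
    missing = sorted(set(paths) - set(selectors))
    if missing:
        raise KeyError(f"No selector configured for paths: {missing}")
    return [(name, steps, selectors[name]) for name, steps in paths.items()]


# Placeholder values standing in for real nodes and column selectors.
paths = {"categories": ["impute", "one_hot"], "numerical": ["impute"]}
selectors = {
    "categories": "dtype == category",
    "numerical": "dtype != category",
}
transformers = pair_paths(paths, selectors)
print(transformers)
```

Raising early when a path has no matching selector is a design choice of this sketch; it makes a mismatch between the path names and the config keys visible immediately.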
Like all Nodes, a Split accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.
nodes: tuple[Node, ...]
attr
The nodes that this node leads to.
class Component(item, *, name=None, config=None, space=None, fidelities=None, config_transform=None, meta=None)
dataclass
A Component of the pipeline with a possible item and no children.
This is the basic building block of most pipelines. It accepts as its item= some function that will be called with build_item() to build that one part of the pipeline.
When build_item() is called, the .config on this node will be passed to the function to build the item.
A common pattern is to use a Component to wrap a constructor, specifying the space= and config= to be used when building the item.
from amltk.pipeline import Component
from sklearn.ensemble import RandomForestClassifier
rf = Component(
RandomForestClassifier,
config={"max_depth": 3},
space={"n_estimators": (10, 100)}
)
config = {"n_estimators": 50} # Sample from some space or something
configured_rf = rf.configure(config)
estimator = configured_rf.build_item()
╭─ Component(RandomForestClassifier) ──────╮
│ item class RandomForestClassifier(...) │
│ config {'max_depth': 3} │
│ space {'n_estimators': (10, 100)} │
╰──────────────────────────────────────────╯
RandomForestClassifier(max_depth=3, n_estimators=50)
Whenever some other node sees a function/constructor, e.g. RandomForestClassifier, this will automatically be converted into a Component.
from amltk.pipeline import Sequential
from sklearn.ensemble import RandomForestClassifier
pipeline = Sequential(RandomForestClassifier, name="my_pipeline")
╭─ Sequential(my_pipeline) ──────────────────╮
│ ╭─ Component(RandomForestClassifier) ────╮ │
│ │ item class RandomForestClassifier(...) │ │
│ ╰────────────────────────────────────────╯ │
╰────────────────────────────────────────────╯
The default .name of a component is the name of the class/function that it will use. You can explicitly set the name= if you want to when constructing the component.
Like all Nodes, a Component accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.
item: Callable[..., Item]
attr
A node which constructs an item in the pipeline.
nodes: tuple[]
attr
A component has no children.
class Searchable(space=None, *, name=None, config=None, fidelities=None, config_transform=None, meta=None)
dataclass
A Searchable node of the pipeline which just represents a search space, no item attached.
While not usually applicable to pipelines you want to build, this component is useful for creating a search space, especially if the real pipeline you want to optimize cannot be built directly. For example, if you are optimizing a script, you may wish to use a Searchable to represent the search space of that script.
from amltk.pipeline import Searchable
script_space = Searchable({"mode": ["orange", "blue", "red"], "n": (10, 100)})
╭─ Searchable(Searchable-CCaI9sO3) ─────────────────────────╮
│ space {'mode': ['orange', 'blue', 'red'], 'n': (10, 100)} │
╰───────────────────────────────────────────────────────────╯
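To make the script use case concrete, the sketch below samples one configuration from a space shaped like the one above and hands it to a function standing in for the script. The `sample` helper is a toy written for this example: it assumes lists are categorical choices and two-integer tuples are inclusive integer ranges, which matches how the spaces in this section are printed, but it is not amltk's or ConfigSpace's parser. `run_script` is likewise hypothetical.

```python
import random


def sample(space: dict, rng: random.Random) -> dict:
    """Draw one configuration from a simple space (toy sampler)."""
    config = {}
    for name, domain in space.items():
        if isinstance(domain, list):
            config[name] = rng.choice(domain)  # categorical choice
        elif isinstance(domain, tuple):
            low, high = domain
            config[name] = rng.randint(low, high)  # inclusive integer range
        else:
            config[name] = domain  # anything else is treated as a constant
    return config


def run_script(mode: str, n: int) -> str:
    # Placeholder for launching the real script with these settings.
    return f"mode={mode} n={n}"


space = {"mode": ["orange", "blue", "red"], "n": (10, 100)}
config = sample(space, random.Random(0))
output = run_script(**config)
print(output)
```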
A Searchable explicitly does not allow for item= to be set, nor can it have any children. A Searchable accepts an explicit name=, config=, space=, fidelities=, config_transform= and meta=.
class Fixed(item, *, name=None, config=None, space=None, fidelities=None, config_transform=None, meta=None)
dataclass
A Fixed part of the pipeline that represents something that cannot be configured and is used directly as is.
It consists of an .item that is fixed, non-configurable and non-searchable. It also has no children.
This is useful for representing parts of the pipeline that are fixed. For example, if you have a pipeline that is a Sequential of nodes, but you want to fix the first component to be a PCA with n_components=3, you can use a Fixed to represent that.
from amltk.pipeline import Component, Fixed, Sequential
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
pca = Fixed(PCA(n_components=3))
pipeline = Sequential(pca, rf, name="my_pipeline")
╭─ Sequential(my_pipeline) ───────────────────╮
│ ╭─ Fixed(PCA) ─────────────╮ │
│ │ item PCA(n_components=3) │ │
│ ╰──────────────────────────╯ │
│ ↓ │
│ ╭─ Component(RandomForestClassifier) ─────╮ │
│ │ item class RandomForestClassifier(...) │ │
│ │ space {'n_estimators': (10, 100)} │ │
│ ╰─────────────────────────────────────────╯ │
╰─────────────────────────────────────────────╯
Whenever some other node sees an instance of something, i.e. something that can't be called, this will automatically be converted into a Fixed.
from amltk.pipeline import Sequential
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
pipeline = Sequential(
PCA(n_components=3),
RandomForestClassifier(n_estimators=50),
name="my_pipeline",
)
╭─ Sequential(my_pipeline) ────────────────────────╮
│ ╭─ Fixed(PCA) ─────────────╮ │
│ │ item PCA(n_components=3) │ │
│ ╰──────────────────────────╯ │
│ ↓ │
│ ╭─ Fixed(RandomForestClassifier) ──────────────╮ │
│ │ item RandomForestClassifier(n_estimators=50) │ │
│ ╰──────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────╯
The default .name of a component is the class name of the item that it will use. You can explicitly set the name= if you want to when constructing the component.
A Fixed accepts only an explicit name=, item= and meta=.
item: Item
classvar
attr
The fixed item that this node represents.
space: None
classvar
attr
A frozen node has no search space.
fidelities: None
classvar
attr
A frozen node has no fidelities.
config: None
classvar
attr
A frozen node has no config.
config_transform: None
classvar
attr
A frozen node has no config so no transform.
nodes: tuple[]
classvar
attr
A frozen node has no children.
def as_node(thing, name=None)
Convert a node, pipeline, set or tuple into a component, copying anything in the process and removing all links to other nodes.
PARAMETER | DESCRIPTION
---|---
thing | The thing to convert
name | The name of the node. If it is already a node, it will be renamed to that one. TYPE:
RETURNS | DESCRIPTION |
---|---|
Node | Choice | Join | Sequential | Fixed[Item]
|
The component |
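The conversion rules described throughout this page (set to Choice, tuple to Join, list to Sequential, callable to Component, anything else to Fixed) can be summarized as a small dispatch function. The sketch below returns only the name of the node kind; `node_kind` and `Scaler` are hypothetical and the branch order is an assumption of this example, not a copy of `as_node`.

```python
def node_kind(thing) -> str:
    """Name the node kind a value would map to under the rules on this page."""
    if isinstance(thing, set):
        return "Choice"
    if isinstance(thing, tuple):
        return "Join"
    if isinstance(thing, list):
        return "Sequential"
    if callable(thing):
        # Classes and functions can be called later to build the item.
        return "Component"
    # Anything that cannot be called is used as-is.
    return "Fixed"


class Scaler:
    """Stand-in for an estimator class; its instances are not callable."""


kinds = [
    node_kind(Scaler),    # the class itself
    node_kind(Scaler()),  # an instance
    node_kind({1, 2}),
    node_kind((1, 2)),
    node_kind([1, 2]),
]
print(kinds)
```

Checking the container types before `callable` keeps the shorthand forms from being mistaken for items; a class is callable, whereas an instance without `__call__` is not, which is what separates the Component and Fixed cases.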