Pipeline
Pieces of a Pipeline#
A pipeline is a collection of Nodes that are connected together to form a directed acyclic graph, where the nodes follow a parent-child relationship. The purpose of these is to form an abstract representation of what you want to search over/optimize and then build into a concrete object.
These Nodes allow you to specify the function/object that will be used there, its search space and any configuration you want to explicitly apply.
There are various components, listed below, which give these nodes extra syntactic meaning, e.g. a Choice represents a choice between its children, while a Sequential indicates that each child follows one after the other.
Once a pipeline is created, you can perform three critical operations on it:
search_space(parser=...) - This will return the search space of the pipeline, as defined by its nodes. See the reference for the available parsers and search spaces.
configure(config=...) - This will return a new pipeline where each node is configured according to the given config.
build(builder=...) - This will return some concrete object from a configured pipeline. See the reference for the available builders.
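The three operations are normally applied in that order. The sketch below mimics the flow with a toy class that uses only the standard library. `ToyNode` and its methods are hypothetical stand-ins chosen for illustration and are not amltk's actual implementation.

```python
import random
from dataclasses import dataclass, field, replace
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class ToyNode:
    """Hypothetical stand-in for a pipeline node (not amltk's Node)."""

    item: Callable[..., Any]
    space: Dict[str, Tuple[int, int]] = field(default_factory=dict)
    config: Dict[str, Any] = field(default_factory=dict)

    def search_space(self) -> Dict[str, Tuple[int, int]]:
        # Step 1: report what may be searched over.
        return dict(self.space)

    def configure(self, config: Dict[str, Any]) -> "ToyNode":
        # Step 2: return a new node, leaving the original untouched.
        return replace(self, config={**self.config, **config})

    def build(self) -> Any:
        # Step 3: turn the configured description into a concrete object.
        return self.item(**self.config)


node = ToyNode(item=dict, space={"depth": (1, 5)}, config={"seed": 0})
rng = random.Random(0)
sampled = {
    name: rng.randint(low, high)
    for name, (low, high) in node.search_space().items()
}
configured = node.configure(sampled)
built = configured.build()  # a plain dict holding "seed" and the sampled "depth"
```

Note that `configure` does not mutate `node`, which matches the description of `configure(config=...)` returning a new pipeline.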
Components#
You can use the various node types to build a pipeline.
You can connect these nodes together using the constructors explicitly, as shown in the examples. We also provide some infix operators:
>> - Connect nodes together to form a Sequential
& - Connect nodes together to form a Join
| - Connect nodes together to form a Choice
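If you are curious how such operators can be provided in Python, the following is a minimal sketch using operator overloading. The `Toy` class, its `kind` field and the rule that a group of the same kind is extended rather than nested are all assumptions made for illustration, not a description of amltk's internals.

```python
class Toy:
    """Hypothetical node used only to illustrate operator chaining."""

    def __init__(self, kind, *children, name=None):
        self.kind = kind
        self.children = list(children)
        self.name = name

    def _combine(self, kind, other):
        # Extend an existing group of the same kind, otherwise start a new one.
        if self.kind == kind:
            return Toy(kind, *self.children, other, name=self.name)
        return Toy(kind, self, other)

    def __rshift__(self, other):  # a >> b
        return self._combine("Sequential", other)

    def __and__(self, other):  # a & b
        return self._combine("Join", other)

    def __or__(self, other):  # a | b
        return self._combine("Choice", other)


a = Toy("Component", name="a")
b = Toy("Component", name="b")
c = Toy("Component", name="c")

seq = a >> b >> c
join = a & b
choice = Toy("Choice", name="estimator") | a | b
```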
There is also another shorthand that you may find useful to know:
{comp1, comp2, comp3} - This will automatically be converted into a Choice between the given components.
(comp1, comp2, comp3) - This will automatically be converted into a Join between the given components.
[comp1, comp2, comp3] - This will automatically be converted into a Sequential between the given components.
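The shorthand rules above amount to a dispatch on the container type. A rough sketch of that idea, using plain tuples of `(kind, payload)` in place of real nodes, could look like the following. The function name `as_toy_node` and the `"Leaf"` label are hypothetical.

```python
def as_toy_node(thing):
    """Hypothetical converter mirroring the shorthand rules listed above."""
    if isinstance(thing, set):
        return ("Choice", [as_toy_node(child) for child in thing])
    if isinstance(thing, tuple):
        return ("Join", [as_toy_node(child) for child in thing])
    if isinstance(thing, list):
        return ("Sequential", [as_toy_node(child) for child in thing])
    return ("Leaf", thing)  # anything else is left as a single node


# Strings stand in for real components here.
kind, children = as_toy_node(["imputer", {"rf", "mlp"}, ("pca", "kbest")])
```

Because a set is unordered, the sketch makes no promise about the order of the children inside the resulting Choice.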
For each of these components we will show examples using the "sklearn" builder.
The components are:
Component#
A Component of the pipeline with a possible item and no children.
This is the basic building block of most pipelines. It accepts as its item= some function that will be called with build_item() to build that one part of the pipeline.
When build_item() is called, the .config on this node will be passed to the function to build the item.
A common pattern is to use a Component to wrap a constructor, specifying the space= and config= to be used when building the item.
from amltk.pipeline import Component
from sklearn.ensemble import RandomForestClassifier
rf = Component(
    RandomForestClassifier,
    config={"max_depth": 3},
    space={"n_estimators": (10, 100)},
)
config = {"n_estimators": 50} # Sample from some space or something
configured_rf = rf.configure(config)
estimator = configured_rf.build_item()
╭─ Component(RandomForestClassifier) ──────╮
│ item class RandomForestClassifier(...) │
│ config {'max_depth': 3} │
│ space {'n_estimators': (10, 100)} │
╰──────────────────────────────────────────╯
RandomForestClassifier(max_depth=3, n_estimators=50)
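The built estimator above carries both the explicit `max_depth=3` and the later `n_estimators=50`. A minimal way to picture this is a dictionary merge followed by a call with keyword arguments. In the sketch below, `ToyForest` is a hypothetical stand-in for the real estimator, and letting the later config win on a key conflict is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyForest:
    """Hypothetical estimator standing in for RandomForestClassifier."""

    n_estimators: int = 100
    max_depth: Optional[int] = None


explicit_config = {"max_depth": 3}  # given as config= at construction
sampled_config = {"n_estimators": 50}  # given later to configure()

# Assumption: later values take precedence if both dicts share a key.
merged = {**explicit_config, **sampled_config}
estimator = ToyForest(**merged)
```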
Whenever some other node sees a function/constructor, e.g. RandomForestClassifier, this will automatically be converted into a Component.
from amltk.pipeline import Sequential
from sklearn.ensemble import RandomForestClassifier
pipeline = Sequential(RandomForestClassifier, name="my_pipeline")
╭─ Sequential(my_pipeline) ──────────────────╮
│ ╭─ Component(RandomForestClassifier) ────╮ │
│ │ item class RandomForestClassifier(...) │ │
│ ╰────────────────────────────────────────╯ │
╰────────────────────────────────────────────╯
The default .name of a component is the name of the class/function that it will use. You can explicitly set the name= when constructing the component if you want to.
Like all Nodes, a Component accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.
Source code in src/amltk/pipeline/components.py
Sequential#
A Sequential set of operations in a pipeline.
This indicates the different children in .nodes should act one after another, feeding the output of one into the next.
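Conceptually, "feeding the output of one into the next" is function composition. The following sketch shows the idea with plain functions. `run_sequential` is a hypothetical helper and says nothing about how a builder actually executes a Sequential.

```python
from functools import reduce


def run_sequential(steps, value):
    """Apply each step to the output of the previous one (illustrative only)."""
    return reduce(lambda current, step: step(current), steps, value)


# strip whitespace, then upper-case, then measure the length
result = run_sequential([str.strip, str.upper, len], "  abc ")
```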
from amltk.pipeline import Component, Sequential
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
pipeline = Sequential(
    PCA(n_components=3),
    Component(RandomForestClassifier, space={"n_estimators": (10, 100)}),
    name="my_pipeline",
)
space = pipeline.search_space("configspace")
configuration = space.sample_configuration()
configured_pipeline = pipeline.configure(configuration)
sklearn_pipeline = pipeline.build("sklearn")
╭─ Sequential(my_pipeline) ───────────────────╮
│ ╭─ Fixed(PCA) ─────────────╮ │
│ │ item PCA(n_components=3) │ │
│ ╰──────────────────────────╯ │
│ ↓ │
│ ╭─ Component(RandomForestClassifier) ─────╮ │
│ │ item class RandomForestClassifier(...) │ │
│ │ space {'n_estimators': (10, 100)} │ │
│ ╰─────────────────────────────────────────╯ │
╰─────────────────────────────────────────────╯
Configuration space object:
Hyperparameters:
my_pipeline:RandomForestClassifier:n_estimators, Type: UniformInteger,
Range: [10, 100], Default: 55
Configuration(values={
'my_pipeline:RandomForestClassifier:n_estimators': 20,
})
╭─ Sequential(my_pipeline) ────────────────────╮
│ ╭─ Fixed(PCA) ─────────────╮ │
│ │ item PCA(n_components=3) │ │
│ ╰──────────────────────────╯ │
│ ↓ │
│ ╭─ Component(RandomForestClassifier) ──────╮ │
│ │ item class RandomForestClassifier(...) │ │
│ │ config {'n_estimators': 20} │ │
│ │ space {'n_estimators': (10, 100)} │ │
│ ╰──────────────────────────────────────────╯ │
╰──────────────────────────────────────────────╯
Pipeline(steps=[('PCA', PCA(n_components=3)), ('RandomForestClassifier', RandomForestClassifier())])
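The sampled configuration above uses flat keys such as `my_pipeline:RandomForestClassifier:n_estimators`, where the colon-separated prefix names the path to the node. One way to picture how such keys relate to individual nodes is to group them by that prefix. `route_config` below is a hypothetical helper written for illustration and is not how amltk routes configurations internally.

```python
def route_config(flat):
    """Group a flat, colon-delimited config by its node path (illustrative only)."""
    routed = {}
    for key, value in flat.items():
        # Everything before the last ":" is the path, the rest is the parameter.
        *path, param = key.split(":")
        routed.setdefault(tuple(path), {})[param] = value
    return routed


flat = {"my_pipeline:RandomForestClassifier:n_estimators": 20}
routed = route_config(flat)
```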
You may also just chain together nodes using the infix operator >> if you prefer:
from amltk.pipeline import Join, Component, Sequential
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
pipeline = (
    Sequential(name="my_pipeline")
    >> PCA(n_components=3)
    >> Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
)
╭─ Sequential(my_pipeline) ───────────────────╮
│ ╭─ Fixed(PCA) ─────────────╮ │
│ │ item PCA(n_components=3) │ │
│ ╰──────────────────────────╯ │
│ ↓ │
│ ╭─ Component(RandomForestClassifier) ─────╮ │
│ │ item class RandomForestClassifier(...) │ │
│ │ space {'n_estimators': (10, 100)} │ │
│ ╰─────────────────────────────────────────╯ │
╰─────────────────────────────────────────────╯
Whenever some other node sees a list, e.g. [comp1, comp2, comp3], this will automatically be converted into a Sequential.
from amltk.pipeline import Choice
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
pipeline_choice = Choice(
    [SimpleImputer(), RandomForestClassifier()],
    [StandardScaler(), MLPClassifier()],
    name="pipeline_choice",
)
╭─ Choice(pipeline_choice) ───────────────────────────────────────────────╮
│ ╭─ Sequential(Seq-Hwvgzhxa) ──╮ ╭─ Sequential(Seq-Zto1VuPG) ──────────╮ │
│ │ ╭─ Fixed(StandardScaler) ─╮ │ │ ╭─ Fixed(SimpleImputer) ─╮ │ │
│ │ │ item StandardScaler() │ │ │ │ item SimpleImputer() │ │ │
│ │ ╰─────────────────────────╯ │ │ ╰────────────────────────╯ │ │
│ │ ↓ │ │ ↓ │ │
│ │ ╭─ Fixed(MLPClassifier) ─╮ │ │ ╭─ Fixed(RandomForestClassifier) ─╮ │ │
│ │ │ item MLPClassifier() │ │ │ │ item RandomForestClassifier() │ │ │
│ │ ╰────────────────────────╯ │ │ ╰─────────────────────────────────╯ │ │
│ ╰─────────────────────────────╯ ╰─────────────────────────────────────╯ │
╰─────────────────────────────────────────────────────────────────────────╯
Like all Nodes, a Sequential accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.
Choice#
A Choice between different subcomponents.
This indicates that a choice should be made between the different children in .nodes, usually done when you configure() with some config from a search_space().
from amltk.pipeline import Choice, Component
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
mlp = Component(MLPClassifier, space={"activation": ["logistic", "relu", "tanh"]})
estimator_choice = Choice(rf, mlp, name="estimator")
space = estimator_choice.search_space("configspace")
config = space.sample_configuration()
configured_choice = estimator_choice.configure(config)
chosen_estimator = configured_choice.chosen()
estimator = chosen_estimator.build_item()
╭─ Choice(estimator) ──────────────────────────────────────────────────────────╮
│ ╭─ Component(MLPClassifier) ─────╮ ╭─ Component(RandomForestClassifier)─╮ │
│ │ item class MLPClassifier(...) │ │ item class │ │
│ │ space { │ │ RandomForestClassifier(...) │ │
│ │ 'activation': [ │ │ space {'n_estimators': (10, 100)} │ │
│ │ 'logistic', │ ╰────────────────────────────────────╯ │
│ │ 'relu', │ │
│ │ 'tanh' │ │
│ │ ] │ │
│ │ } │ │
│ ╰────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
Configuration space object:
Hyperparameters:
estimator:MLPClassifier:activation, Type: Categorical, Choices: {logistic,
relu, tanh}, Default: logistic
estimator:RandomForestClassifier:n_estimators, Type: UniformInteger, Range:
[10, 100], Default: 55
estimator:__choice__, Type: Categorical, Choices: {MLPClassifier,
RandomForestClassifier}, Default: MLPClassifier
Conditions:
estimator:MLPClassifier:activation | estimator:__choice__ == 'MLPClassifier'
estimator:RandomForestClassifier:n_estimators | estimator:__choice__ ==
'RandomForestClassifier'
Configuration(values={
'estimator:MLPClassifier:activation': 'relu',
'estimator:__choice__': 'MLPClassifier',
})
╭─ Choice(estimator) ──────────────────────────────────────────────────────────╮
│ config {'__choice__': 'MLPClassifier'} │
│ ╭─ Component(MLPClassifier) ──────╮ ╭─ Component(RandomForestClassifier)─╮ │
│ │ item class MLPClassifier(...) │ │ item class │ │
│ │ config {'activation': 'relu'} │ │ RandomForestClassifier(...) │ │
│ │ space { │ │ space {'n_estimators': (10, 100)} │ │
│ │ 'activation': [ │ ╰────────────────────────────────────╯ │
│ │ 'logistic', │ │
│ │ 'relu', │ │
│ │ 'tanh' │ │
│ │ ] │ │
│ │ } │ │
│ ╰─────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Component(MLPClassifier) ──────────────────────────╮
│ item class MLPClassifier(...) │
│ config {'activation': 'relu'} │
│ space {'activation': ['logistic', 'relu', 'tanh']} │
╰─────────────────────────────────────────────────────╯
MLPClassifier()
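In the sampled configuration above, the `estimator:__choice__` entry names the selected child, and only that child's hyperparameters are present. The sketch below shows how such a flat configuration could be read. `chosen_child` and the string placeholders for nodes are hypothetical and only illustrate the key layout visible in the output.

```python
def chosen_child(choice_name, children, flat_config):
    """Pick the child named by ``__choice__`` and collect its parameters."""
    selected = flat_config[f"{choice_name}:__choice__"]
    prefix = f"{choice_name}:{selected}:"
    child_config = {
        key[len(prefix):]: value
        for key, value in flat_config.items()
        if key.startswith(prefix)
    }
    return children[selected], child_config


# Strings stand in for the real child nodes.
children = {"MLPClassifier": "mlp-node", "RandomForestClassifier": "rf-node"}
flat_config = {
    "estimator:MLPClassifier:activation": "relu",
    "estimator:__choice__": "MLPClassifier",
}
node, node_config = chosen_child("estimator", children, flat_config)
```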
You may also just add nodes to a Choice using the infix operator | if you prefer:
from amltk.pipeline import Choice, Component
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
mlp = Component(MLPClassifier, space={"activation": ["logistic", "relu", "tanh"]})
estimator_choice = (
    Choice(name="estimator") | mlp | rf
)
╭─ Choice(estimator) ──────────────────────────────────────────────────────────╮
│ ╭─ Component(MLPClassifier) ─────╮ ╭─ Component(RandomForestClassifier)─╮ │
│ │ item class MLPClassifier(...) │ │ item class │ │
│ │ space { │ │ RandomForestClassifier(...) │ │
│ │ 'activation': [ │ │ space {'n_estimators': (10, 100)} │ │
│ │ 'logistic', │ ╰────────────────────────────────────╯ │
│ │ 'relu', │ │
│ │ 'tanh' │ │
│ │ ] │ │
│ │ } │ │
│ ╰────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
Whenever some other node sees a set, e.g. {comp1, comp2, comp3}, this will automatically be converted into a Choice.
from amltk.pipeline import Choice, Component, Sequential
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.impute import SimpleImputer
rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
mlp = Component(MLPClassifier, space={"activation": ["logistic", "relu", "tanh"]})
pipeline = Sequential(
    SimpleImputer(fill_value=0),
    {mlp, rf},
    name="my_pipeline",
)
╭─ Sequential(my_pipeline) ────────────────────────────────────────────────────╮
│ ╭─ Fixed(SimpleImputer) ───────────╮ │
│ │ item SimpleImputer(fill_value=0) │ │
│ ╰──────────────────────────────────╯ │
│ ↓ │
│ ╭─ Choice(Choice-GGzo9UAM) ────────────────────────────────────────────────╮ │
│ │ ╭─ Component(MLPClassifier) ─────╮ ╭─ Component(RandomForestClassifie─╮ │ │
│ │ │ item class MLPClassifier(...) │ │ item class │ │ │
│ │ │ space { │ │ RandomForestClassifier(..… │ │ │
│ │ │ 'activation': [ │ │ space { │ │ │
│ │ │ 'logistic', │ │ 'n_estimators': ( │ │ │
│ │ │ 'relu', │ │ 10, │ │ │
│ │ │ 'tanh' │ │ 100 │ │ │
│ │ │ ] │ │ ) │ │ │
│ │ │ } │ │ } │ │ │
│ │ ╰────────────────────────────────╯ ╰──────────────────────────────────╯ │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
Like all Nodes, a Choice accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.
Order of nodes
The given nodes of a choice are always ordered according to their name, so indexing choice.nodes may not be reliable if modifying the choice dynamically.
Please use choice["name"] to access the nodes instead.
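To see why positional indexing is fragile when children are kept sorted by name, consider this small sketch. `ToyChoice` is a hypothetical class that stores plain strings instead of nodes. It only demonstrates the ordering effect and is not amltk's Choice.

```python
class ToyChoice:
    """Hypothetical choice that keeps its children sorted by name."""

    def __init__(self, *names):
        self.nodes = tuple(sorted(names))

    def __getitem__(self, name):
        # Look a child up by name instead of by position.
        for node in self.nodes:
            if node == name:
                return node
        raise KeyError(name)


before = ToyChoice("rf", "mlp")
after = ToyChoice("rf", "mlp", "knn")  # the same choice with one more child

first_before = before.nodes[0]  # "mlp"
first_after = after.nodes[0]  # "knn": index 0 now refers to a different child
by_name = after["mlp"]  # name-based access is unaffected
```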
Split#
A Split of data in a pipeline.
This indicates the different children in .nodes should act in parallel but on different subsets of data.
from amltk.pipeline import Component, Split
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_selector
categorical_pipeline = [
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(drop="first"),
]
numerical_pipeline = Component(SimpleImputer, space={"strategy": ["mean", "median"]})
preprocessor = Split(
    {
        "categories": categorical_pipeline,
        "numerical": numerical_pipeline,
    },
    config={
        # This is how you would configure the split for the sklearn builder in particular
        "categories": make_column_selector(dtype_include="category"),
        "numerical": make_column_selector(dtype_exclude="category"),
    },
    name="my_split",
)
space = preprocessor.search_space("configspace")
configuration = space.sample_configuration()
configured_preprocessor = preprocessor.configure(configuration)
built_preprocessor = configured_preprocessor.build("sklearn")
╭─ Split(my_split) ────────────────────────────────────────────────────────────╮
│ config { │
│ 'categories': │
│ <sklearn.compose._column_transformer.make_column_selector object at │
│ 0x7f415b41e470>, │
│ 'numerical': │
│ <sklearn.compose._column_transformer.make_column_selector object at │
│ 0x7f415b41f3a0> │
│ } │
│ ╭─ Sequential(categories) ──────────╮ ╭─ Sequential(numerical) ────────────╮ │
│ │ ╭─ Fixed(SimpleImputer) ────────╮ │ │ ╭─ Component(SimpleImputer) ─────╮ │ │
│ │ │ item SimpleImputer(fill_valu… │ │ │ │ item class SimpleImputer(...) │ │ │
│ │ │ strategy='constant') │ │ │ │ space { │ │ │
│ │ ╰───────────────────────────────╯ │ │ │ 'strategy': [ │ │ │
│ │ ↓ │ │ │ 'mean', │ │ │
│ │ ╭─ Fixed(OneHotEncoder) ────────╮ │ │ │ 'median' │ │ │
│ │ │ item OneHotEncoder(drop='fir… │ │ │ │ ] │ │ │
│ │ ╰───────────────────────────────╯ │ │ │ } │ │ │
│ ╰───────────────────────────────────╯ │ ╰────────────────────────────────╯ │ │
│ ╰────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
Configuration space object:
Hyperparameters:
my_split:numerical:SimpleImputer:strategy, Type: Categorical, Choices:
{mean, median}, Default: mean
Configuration(values={
'my_split:numerical:SimpleImputer:strategy': 'mean',
})
╭─ Split(my_split) ────────────────────────────────────────────────────────────╮
│ config { │
│ 'categories': │
│ <sklearn.compose._column_transformer.make_column_selector object at │
│ 0x7f415b41e470>, │
│ 'numerical': │
│ <sklearn.compose._column_transformer.make_column_selector object at │
│ 0x7f415b41f3a0> │
│ } │
│ ╭─ Sequential(categories) ──────────╮ ╭─ Sequential(numerical) ────────────╮ │
│ │ ╭─ Fixed(SimpleImputer) ────────╮ │ │ ╭─ Component(SimpleImputer) ─────╮ │ │
│ │ │ item SimpleImputer(fill_valu… │ │ │ │ item class │ │ │
│ │ │ strategy='constant') │ │ │ │ SimpleImputer(...) │ │ │
│ │ ╰───────────────────────────────╯ │ │ │ config {'strategy': 'mean'} │ │ │
│ │ ↓ │ │ │ space { │ │ │
│ │ ╭─ Fixed(OneHotEncoder) ────────╮ │ │ │ 'strategy': [ │ │ │
│ │ │ item OneHotEncoder(drop='fir… │ │ │ │ 'mean', │ │ │
│ │ ╰───────────────────────────────╯ │ │ │ 'median' │ │ │
│ ╰───────────────────────────────────╯ │ │ ] │ │ │
│ │ │ } │ │ │
│ │ ╰────────────────────────────────╯ │ │
│ ╰────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
Pipeline(steps=[('my_split', ColumnTransformer(transformers=[('categories', Pipeline(steps=[('SimpleImputer', SimpleImputer(fill_value='missing', strategy='constant')), ('OneHotEncoder', OneHotEncoder(drop='first'))]), <sklearn.compose._column_transformer.make_column_selector object at 0x7f415b41e470>), ('SimpleImputer', SimpleImputer(), <sklearn.compose._column_transformer.make_column_selector object at 0x7f415b41f3a0>)]))])
The Split is a slight oddity when compared to the other kinds of components in that it allows a dict as its first argument, where the keys are the names of the different paths through which data will go and the values are the actual nodes that will receive the data.
If nodes are passed in as they are for all other components, usually the name of the first node will be important for any builder trying to make sense of how to use the Split.
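The pairing of path names with nodes on one side and path names with selectors on the other can be pictured with plain dictionaries. The sketch below routes the values of a single row to the path whose selector accepts them. `apply_split`, the lambda transforms and the type-based selectors are all hypothetical and merely mirror the role that the dict argument and config= play in the example.

```python
def apply_split(row, paths, selectors):
    """Send each column through the path whose selector accepts it."""
    out = {}
    for path_name, transform in paths.items():
        accepts = selectors[path_name]
        for column, value in row.items():
            if accepts(value):
                out[column] = transform(value)
    return out


# Keys name the paths; values are what each path does with its share of the data.
paths = {"categories": str.upper, "numerical": lambda value: value * 2}
# One selector per path name, keyed the same way as the paths.
selectors = {
    "categories": lambda value: isinstance(value, str),
    "numerical": lambda value: isinstance(value, float),
}
transformed = apply_split({"colour": "red", "size": 1.5}, paths, selectors)
```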
Like all Nodes, a Split accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.
Join#
Join together different parts of the pipeline.
This indicates the different children in .nodes should act in tandem with one another, for example, concatenating the outputs of the various members of the Join.
from amltk.pipeline import Join, Component
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
pca = Component(PCA, space={"n_components": (1, 3)})
kbest = Component(SelectKBest, space={"k": (1, 3)})
join = Join(pca, kbest, name="my_feature_union")
space = join.search_space("configspace")
pipeline = join.build("sklearn")
╭─ Join(my_feature_union) ────────────────────────────────────────────╮
│ ╭─ Component(PCA) ───────────────╮ ╭─ Component(SelectKBest) ─────╮ │
│ │ item class PCA(...) │ │ item class SelectKBest(...) │ │
│ │ space {'n_components': (1, 3)} │ │ space {'k': (1, 3)} │ │
│ ╰────────────────────────────────╯ ╰──────────────────────────────╯ │
╰─────────────────────────────────────────────────────────────────────╯
Configuration space object:
Hyperparameters:
my_feature_union:PCA:n_components, Type: UniformInteger, Range: [1, 3],
Default: 2
my_feature_union:SelectKBest:k, Type: UniformInteger, Range: [1, 3],
Default: 2
Pipeline(steps=[('my_feature_union', FeatureUnion(transformer_list=[('PCA', PCA()), ('SelectKBest', SelectKBest())]))])
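With the "sklearn" builder, the Join above becomes a FeatureUnion. Stripped of any library, the underlying idea of running every member on the same input and concatenating the results can be sketched as follows. The helper names are hypothetical.

```python
def join_outputs(transforms, row):
    """Run every transform on the same input and concatenate the results."""
    joined = []
    for transform in transforms:
        joined.extend(transform(row))
    return joined


def first_two(row):
    # Keep only the first two features.
    return row[:2]


def total(row):
    # Produce a single derived feature.
    return [sum(row)]


features = join_outputs([first_two, total], [1, 2, 3])
```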
You may also just join together nodes using the infix operator & if you prefer:
from amltk.pipeline import Join, Component
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
pca = Component(PCA, space={"n_components": (1, 3)})
kbest = Component(SelectKBest, space={"k": (1, 3)})
# Cannot parametrize or name the join
join = pca & kbest

# With a parametrized join
join = (
    Join(name="my_feature_union") & pca & kbest
)
item = join.build("sklearn")
╭─ Join(Join-CoYLbF9z) ───────────────────────────────────────────────╮
│ ╭─ Component(PCA) ───────────────╮ ╭─ Component(SelectKBest) ─────╮ │
│ │ item class PCA(...) │ │ item class SelectKBest(...) │ │
│ │ space {'n_components': (1, 3)} │ │ space {'k': (1, 3)} │ │
│ ╰────────────────────────────────╯ ╰──────────────────────────────╯ │
╰─────────────────────────────────────────────────────────────────────╯
Pipeline(steps=[('my_feature_union', FeatureUnion(transformer_list=[('PCA', PCA()), ('SelectKBest', SelectKBest())]))])
Whenever some other node sees a tuple, e.g. (comp1, comp2, comp3), this will automatically be converted into a Join.
from amltk.pipeline import Sequential, Component
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.ensemble import RandomForestClassifier
pca = Component(PCA, space={"n_components": (1, 3)})
kbest = Component(SelectKBest, space={"k": (1, 3)})
# The tuple shorthand cannot parametrize or name the join
join = Sequential(
    (pca, kbest),
    RandomForestClassifier(n_estimators=5),
    name="my_feature_union",
)
╭─ Sequential(my_feature_union) ──────────────────────────────────────────╮
│ ╭─ Join(Join-krvsYYkt) ───────────────────────────────────────────────╮ │
│ │ ╭─ Component(PCA) ───────────────╮ ╭─ Component(SelectKBest) ─────╮ │ │
│ │ │ item class PCA(...) │ │ item class SelectKBest(...) │ │ │
│ │ │ space {'n_components': (1, 3)} │ │ space {'k': (1, 3)} │ │ │
│ │ ╰────────────────────────────────╯ ╰──────────────────────────────╯ │ │
│ ╰─────────────────────────────────────────────────────────────────────╯ │
│ ↓ │
│ ╭─ Fixed(RandomForestClassifier) ─────────────╮ │
│ │ item RandomForestClassifier(n_estimators=5) │ │
│ ╰─────────────────────────────────────────────╯ │
╰─────────────────────────────────────────────────────────────────────────╯
Like all Nodes, a Join accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.
Fixed#
A Fixed part of the pipeline that represents something that cannot be configured and is used directly as is.
It consists of an .item that is fixed, non-configurable and non-searchable. It also has no children.
This is useful for representing parts of the pipeline that are fixed. For example, if you have a pipeline that is a Sequential of nodes, but you want to fix the first component to be a PCA with n_components=3, you can use a Fixed to represent that.
from amltk.pipeline import Component, Fixed, Sequential
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
pca = Fixed(PCA(n_components=3))
pipeline = Sequential(pca, rf, name="my_pipeline")
╭─ Sequential(my_pipeline) ───────────────────╮
│ ╭─ Fixed(PCA) ─────────────╮ │
│ │ item PCA(n_components=3) │ │
│ ╰──────────────────────────╯ │
│ ↓ │
│ ╭─ Component(RandomForestClassifier) ─────╮ │
│ │ item class RandomForestClassifier(...) │ │
│ │ space {'n_estimators': (10, 100)} │ │
│ ╰─────────────────────────────────────────╯ │
╰─────────────────────────────────────────────╯
Whenever some other node sees an instance of something, i.e. something that can't be called, this will automatically be converted into a Fixed.
from amltk.pipeline import Sequential
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
pipeline = Sequential(
    PCA(n_components=3),
    RandomForestClassifier(n_estimators=50),
    name="my_pipeline",
)
╭─ Sequential(my_pipeline) ────────────────────────╮
│ ╭─ Fixed(PCA) ─────────────╮ │
│ │ item PCA(n_components=3) │ │
│ ╰──────────────────────────╯ │
│ ↓ │
│ ╭─ Fixed(RandomForestClassifier) ──────────────╮ │
│ │ item RandomForestClassifier(n_estimators=50) │ │
│ ╰──────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────╯
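The distinction between a constructor (wrapped as a Component) and an instance (wrapped as a Fixed) comes down to whether the object can be called. The sketch below expresses that rule with Python's built-in `callable()`. Whether amltk performs exactly this check is an assumption, and `wrap_kind` is a hypothetical helper.

```python
from fractions import Fraction


def wrap_kind(thing):
    """Name the node type a bare object would be wrapped in (illustrative only)."""
    return "Component" if callable(thing) else "Fixed"


kinds = [
    wrap_kind(Fraction),  # a class can be called to construct an instance
    wrap_kind(Fraction(1, 2)),  # an instance of Fraction cannot be called
]
```

An instance whose class defines `__call__` would count as callable under this rule, so such objects would need explicit wrapping in this sketch.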
The default .name of a component is the class name of the item that it will use. You can explicitly set the name= when constructing the component if you want to.
A Fixed accepts only an explicit name=, item= and meta=.
Searchable#
A Searchable node of the pipeline which just represents a search space, with no item attached.
While not usually applicable to pipelines you want to build, this component is useful for creating a search space, especially if the real pipeline you want to optimize cannot be built directly. For example, if you are optimizing a script, you may wish to use a Searchable to represent the search space of that script.
from amltk.pipeline import Searchable
script_space = Searchable({"mode": ["orange", "blue", "red"], "n": (10, 100)})
╭─ Searchable(Searchable-LujBEqnp) ─────────────────────────╮
│ space {'mode': ['orange', 'blue', 'red'], 'n': (10, 100)} │
╰───────────────────────────────────────────────────────────╯
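To make the script use case more concrete, here is a sketch of drawing one configuration from a space written in the same dict form. It reads a list as a set of categorical options and a two-element integer tuple as an inclusive range, which matches how the parsed spaces are displayed elsewhere on this page. `sample_space` itself is hypothetical and bypasses amltk's parsers entirely.

```python
import random


def sample_space(space, rng):
    """Draw one value per hyperparameter from a dict-style space."""
    config = {}
    for name, domain in space.items():
        if isinstance(domain, list):
            # A list is read as a set of categorical options.
            config[name] = rng.choice(domain)
        else:
            # A (low, high) tuple is read as an inclusive integer range.
            low, high = domain
            config[name] = rng.randint(low, high)
    return config


space = {"mode": ["orange", "blue", "red"], "n": (10, 100)}
config = sample_space(space, random.Random(0))
```

The resulting `config` could then be handed to the script being optimized, for example as command-line arguments.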
A Searchable explicitly does not allow for item= to be set, nor can it have any children. A Searchable accepts an explicit name=, config=, space=, fidelities=, config_transform= and meta=.
Node#
A pipeline consists of Nodes, which hold the various attributes required to build a pipeline, such as the .item, its .space, its .config and so on.
The Nodes are connected to each other in a parent-child relationship, where the children are simply the .nodes that the parent leads to.
To give these attributes and relations meaning, there are various subclasses of Node which give different syntactic meanings when you want to construct something like a search_space() or build() some concrete object out of the pipeline.
For example, a Sequential node gives the meaning that each of its children in .nodes should follow one another, while something like a Choice gives the meaning that only one of its children should be chosen.
You will likely never have to create a Node directly, but instead use the various components to create the pipeline.
Hashing
When hashing a node, e.g. to put it in a set or use it as a key in a dict, only the name of the node and the hash of its children are used. This means that two nodes with the same name and connectivity will be hashed equally.
Equality
When considering equality, this will be done by comparing all the fields of the node. This includes even the parent and branches fields. This means two nodes are considered equal if they look the same and they are connected to nodes that also look the same.
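The practical consequence is that two nodes can share a hash while still comparing unequal. The sketch below reproduces that behaviour with a small dataclass. `ToyNode` is hypothetical and has far fewer fields than a real Node (no parent, for instance), so treat it as an illustration of the principle only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical node: equality uses every field, hashing only name and children."""

    name: str
    config: dict = field(default_factory=dict)
    nodes: tuple = ()

    def __hash__(self):
        # Only the name and the children contribute to the hash.
        return hash((self.name, self.nodes))


a = ToyNode("rf", config={"n_estimators": 10})
b = ToyNode("rf", config={"n_estimators": 50})

same_hash = hash(a) == hash(b)  # True: same name, same (empty) children
equal = a == b  # False: the configs differ
distinct_in_set = len({a, b})  # 2: equal hashes alone do not merge entries
```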