Components

You can use the following node types to build a pipeline.

You can connect these nodes together either by calling the constructors explicitly, as shown in the examples, or by using the infix operators we provide:

  • >> - Connect nodes together to form a Sequential
  • & - Connect nodes together to form a Join
  • | - Connect nodes together to form a Choice

There is also a shorthand based on built-in Python containers that you may find useful:

  • {comp1, comp2, comp3} - This will automatically be converted into a Choice between the given components.
  • (comp1, comp2, comp3) - This will automatically be converted into a Join between the given components.
  • [comp1, comp2, comp3] - This will automatically be converted into a Sequential between the given components.
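
The conversion rules listed above can be sketched as a small dispatch function. This is a toy model, not amltk's own conversion code: `shorthand_kind` is a hypothetical helper that only reports which node type a plain Python object would become. Its last two branches reflect the Component and Fixed conversions described further down this page.

```python
from typing import Any


def shorthand_kind(obj: Any) -> str:
    """Name the node type a plain Python object would be converted into.

    Hypothetical helper for illustration only.
    """
    if isinstance(obj, list):
        return "Sequential"
    if isinstance(obj, tuple):
        return "Join"
    if isinstance(obj, set):
        return "Choice"
    if callable(obj):
        # A class or function, e.g. a constructor, is wrapped in a Component.
        return "Component"
    # Anything else is treated as an already-built instance and used as is.
    return "Fixed"


print(shorthand_kind(["impute", "scale"]))  # Sequential
print(shorthand_kind(("pca", "kbest")))     # Join
print(shorthand_kind({"rf", "mlp"}))        # Choice
print(shorthand_kind(dict))                 # Component
print(shorthand_kind(3.14))                 # Fixed
```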

For each of these components we show examples using the "sklearn" builder.

The components are:

Component

Bases: Node[Item, Space]

A Component of the pipeline with a possible item and no children.

This is the basic building block of most pipelines. It accepts as its item= some function that will be called by build_item() to build that one part of the pipeline.

When build_item() is called, the .config on this node will be passed to the function to build the item.

A common pattern is to use a Component to wrap a constructor, specifying the space= and config= to be used when building the item.

from amltk.pipeline import Component
from sklearn.ensemble import RandomForestClassifier

rf = Component(
    RandomForestClassifier,
    config={"max_depth": 3},
    space={"n_estimators": (10, 100)}
)

config = {"n_estimators": 50}  # Sample from some space or something
configured_rf = rf.configure(config)

estimator = configured_rf.build_item()

╭─ Component(RandomForestClassifier) ──────╮
 item   class RandomForestClassifier(...) 
 config {'max_depth': 3}                  
 space  {'n_estimators': (10, 100)}       
╰──────────────────────────────────────────╯

RandomForestClassifier(max_depth=3, n_estimators=50)
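
The output above shows the configured values ending up alongside the original config= when the item is built. A minimal sketch of that behaviour follows, using only the standard library. `ToyComponent` and `Forest` are hypothetical stand-ins, and both the merge order on conflicting keys and the fact that `configure()` returns a new object are assumptions rather than documented amltk behaviour.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class ToyComponent:
    """Hypothetical stand-in for a Component: an item plus a config."""

    item: Callable[..., Any]
    config: Dict[str, Any] = field(default_factory=dict)

    def configure(self, config: Dict[str, Any]) -> "ToyComponent":
        # Assumption: new values are layered over the existing config
        # and a new node is returned, leaving the original untouched.
        return replace(self, config={**self.config, **config})

    def build_item(self) -> Any:
        # The stored config becomes the keyword arguments of the item.
        return self.item(**self.config)


@dataclass
class Forest:
    """Hypothetical estimator standing in for RandomForestClassifier."""

    n_estimators: int = 100
    max_depth: Optional[int] = None


rf = ToyComponent(Forest, config={"max_depth": 3})
configured = rf.configure({"n_estimators": 50})

print(configured.build_item())  # Forest(n_estimators=50, max_depth=3)
print(rf.config)                # {'max_depth': 3}
```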

Whenever some other node sees a function/constructor, e.g. RandomForestClassifier, this will automatically be converted into a Component.

from amltk.pipeline import Sequential
from sklearn.ensemble import RandomForestClassifier

pipeline = Sequential(RandomForestClassifier, name="my_pipeline")

╭─ Sequential(my_pipeline) ──────────────────╮
 ╭─ Component(RandomForestClassifier) ────╮ 
  item class RandomForestClassifier(...)  
 ╰────────────────────────────────────────╯ 
╰────────────────────────────────────────────╯

The default .name of a component is the name of the class/function that it will use. You can explicitly set the name= if you want to when constructing the component.

Like all Nodes, a Component accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.

See Also
Source code in src/amltk/pipeline/components.py
def __init__(
    self,
    item: Callable[..., Item],
    *,
    name: str | None = None,
    config: Config | None = None,
    space: Space | None = None,
    fidelities: Mapping[str, Any] | None = None,
    config_transform: Callable[[Config, Any], Config] | None = None,
    meta: Mapping[str, Any] | None = None,
):
    """See [`Node`][amltk.pipeline.node.Node] for details."""
    super().__init__(
        name=name if name is not None else entity_name(item),
        item=item,
        config=config,
        space=space,
        fidelities=fidelities,
        config_transform=config_transform,
        meta=meta,
    )

Sequential

Bases: Node[Item, Space]

A Sequential set of operations in a pipeline.

This indicates the different children in .nodes should act one after another, feeding the output of one into the next.

from amltk.pipeline import Component, Sequential
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

pipeline = Sequential(
    PCA(n_components=3),
    Component(RandomForestClassifier, space={"n_estimators": (10, 100)}),
    name="my_pipeline"
)

space = pipeline.search_space("configspace")

configuration = space.sample_configuration()

configured_pipeline = pipeline.configure(configuration)

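# Note: this builds the unconfigured `pipeline`, so the estimator in the final
# output keeps its defaults. Call `configured_pipeline.build("sklearn")` instead
# to apply the sampled configuration.
sklearn_pipeline = pipeline.build("sklearn")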

╭─ Sequential(my_pipeline) ───────────────────╮
 ╭─ Fixed(PCA) ─────────────╮                
  item PCA(n_components=3)                 
 ╰──────────────────────────╯                
  
 ╭─ Component(RandomForestClassifier) ─────╮ 
  item  class RandomForestClassifier(...)  
  space {'n_estimators': (10, 100)}        
 ╰─────────────────────────────────────────╯ 
╰─────────────────────────────────────────────╯

Configuration space object:
  Hyperparameters:
    my_pipeline:RandomForestClassifier:n_estimators, Type: UniformInteger, 
Range: [10, 100], Default: 55


Configuration(values={
  'my_pipeline:RandomForestClassifier:n_estimators': 84,
})

╭─ Sequential(my_pipeline) ────────────────────╮
 ╭─ Fixed(PCA) ─────────────╮                 
  item PCA(n_components=3)                  
 ╰──────────────────────────╯                 
  
 ╭─ Component(RandomForestClassifier) ──────╮ 
  item   class RandomForestClassifier(...)  
  config {'n_estimators': 84}               
  space  {'n_estimators': (10, 100)}        
 ╰──────────────────────────────────────────╯ 
╰──────────────────────────────────────────────╯

Pipeline(steps=[('PCA', PCA(n_components=3)),
                ('RandomForestClassifier', RandomForestClassifier())])
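
In the output above, the sampled configuration uses one flat key per hyperparameter, prefixed with the names of the enclosing nodes and separated by ":", while the configured Component only holds {'n_estimators': 84}. A hedged sketch of how such a flat key could be mapped back to a single node is given here. `config_for` is a hypothetical helper, not part of amltk, and the ":" delimiter is taken from the printed output only.

```python
def config_for(path: str, flat_config: dict, delimiter: str = ":") -> dict:
    """Select the entries of a flat configuration that belong directly to one node."""
    prefix = path + delimiter
    selected = {}
    for key, value in flat_config.items():
        if key.startswith(prefix):
            remainder = key[len(prefix):]
            # Keys with a further delimiter belong to a nested child, not this node.
            if delimiter not in remainder:
                selected[remainder] = value
    return selected


flat = {"my_pipeline:RandomForestClassifier:n_estimators": 84}

print(config_for("my_pipeline:RandomForestClassifier", flat))  # {'n_estimators': 84}
print(config_for("my_pipeline", flat))                         # {}
```
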

You can also chain nodes together with the infix operator >> if you prefer:

from amltk.pipeline import Component, Sequential
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

pipeline = (
    Sequential(name="my_pipeline")
    >> PCA(n_components=3)
    >> Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
)

╭─ Sequential(my_pipeline) ───────────────────╮
 ╭─ Fixed(PCA) ─────────────╮                
  item PCA(n_components=3)                 
 ╰──────────────────────────╯                
  
 ╭─ Component(RandomForestClassifier) ─────╮ 
  item  class RandomForestClassifier(...)  
  space {'n_estimators': (10, 100)}        
 ╰─────────────────────────────────────────╯ 
╰─────────────────────────────────────────────╯
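
For readers unfamiliar with how an infix operator like >> can build up a pipeline, here is a minimal sketch using Python's `__rshift__` hook. `ToySequential` is a hypothetical stand-in that stores plain strings as steps; it only illustrates the operator mechanism and makes no claim about how amltk implements it.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class ToySequential:
    """Hypothetical stand-in for Sequential: a name and an ordered tuple of steps."""

    name: str
    nodes: Tuple[Any, ...] = ()

    def __rshift__(self, other: Any) -> "ToySequential":
        # `seq >> step` returns a new sequence with `step` appended at the end.
        return ToySequential(self.name, (*self.nodes, other))


pipeline = ToySequential("my_pipeline") >> "PCA" >> "RandomForestClassifier"

print(pipeline.nodes)  # ('PCA', 'RandomForestClassifier')
```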

Whenever some other node sees a list, e.g. [comp1, comp2, comp3], this will automatically be converted into a Sequential.

from amltk.pipeline import Choice
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

pipeline_choice = Choice(
    [SimpleImputer(), RandomForestClassifier()],
    [StandardScaler(), MLPClassifier()],
    name="pipeline_choice"
)

╭─ Choice(pipeline_choice) ───────────────────────────────────────────────╮
 ╭─ Sequential(Seq-qizvtpNM) ──────────╮ ╭─ Sequential(Seq-sXD6ZpNr) ──╮ 
  ╭─ Fixed(SimpleImputer) ─╮            ╭─ Fixed(StandardScaler) ─╮  
   item SimpleImputer()                item StandardScaler()     
  ╰────────────────────────╯            ╰─────────────────────────╯  
       
  ╭─ Fixed(RandomForestClassifier) ─╮   ╭─ Fixed(MLPClassifier) ─╮   
   item RandomForestClassifier()       item MLPClassifier()      
  ╰─────────────────────────────────╯   ╰────────────────────────╯   
 ╰─────────────────────────────────────╯ ╰─────────────────────────────╯ 
╰─────────────────────────────────────────────────────────────────────────╯

Like all Nodes, a Sequential accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.

See Also
Source code in src/amltk/pipeline/components.py
def __init__(
    self,
    *nodes: Node | NodeLike,
    name: str | None = None,
    item: Item | Callable[[Item], Item] | None = None,
    config: Config | None = None,
    space: Space | None = None,
    fidelities: Mapping[str, Any] | None = None,
    config_transform: Callable[[Config, Any], Config] | None = None,
    meta: Mapping[str, Any] | None = None,
):
    """See [`Node`][amltk.pipeline.node.Node] for details."""
    _nodes = tuple(as_node(n) for n in nodes)

    # Perhaps we need to do a deeper check on this...
    if not all_unique(_nodes, key=lambda node: node.name):
        raise DuplicateNamesError(self)

    if name is None:
        name = f"Seq-{randuid(8)}"

    super().__init__(
        *_nodes,
        name=name,
        item=item,
        config=config,
        space=space,
        fidelities=fidelities,
        config_transform=config_transform,
        meta=meta,
    )

Choice

Bases: Node[Item, Space]

A Choice between different subcomponents.

This indicates that a choice should be made between the different children in .nodes, usually done when you configure() with some config from a search_space().

from amltk.pipeline import Choice, Component
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
mlp = Component(MLPClassifier, space={"activation": ["logistic", "relu", "tanh"]})

estimator_choice = Choice(rf, mlp, name="estimator")

space = estimator_choice.search_space("configspace")

config = space.sample_configuration()

configured_choice = estimator_choice.configure(config)

chosen_estimator = configured_choice.chosen()

estimator = chosen_estimator.build_item()

╭─ Choice(estimator) ──────────────────────────────────────────────────────────╮
 ╭─ Component(MLPClassifier) ─────╮ ╭─ Component(RandomForestClassifier)─╮    
  item  class MLPClassifier(...)   item  class                            
  space {                                RandomForestClassifier(...)      
            'activation': [        space {'n_estimators': (10, 100)}      
                'logistic',       ╰────────────────────────────────────╯    
                'relu',                                                     
                'tanh'                                                      
            ]                                                               
        }                                                                   
 ╰────────────────────────────────╯                                           
╰──────────────────────────────────────────────────────────────────────────────╯

Configuration space object:
  Hyperparameters:
    estimator:MLPClassifier:activation, Type: Categorical, Choices: {logistic, 
relu, tanh}, Default: logistic
    estimator:RandomForestClassifier:n_estimators, Type: UniformInteger, Range: 
[10, 100], Default: 55
    estimator:__choice__, Type: Categorical, Choices: {MLPClassifier, 
RandomForestClassifier}, Default: MLPClassifier
  Conditions:
    estimator:MLPClassifier:activation | estimator:__choice__ == 'MLPClassifier'
    estimator:RandomForestClassifier:n_estimators | estimator:__choice__ == 
'RandomForestClassifier'


Configuration(values={
  'estimator:MLPClassifier:activation': 'relu',
  'estimator:__choice__': 'MLPClassifier',
})

╭─ Choice(estimator) ──────────────────────────────────────────────────────────╮
 config {'__choice__': 'MLPClassifier'}                                       
 ╭─ Component(MLPClassifier) ──────╮ ╭─ Component(RandomForestClassifier)─╮   
  item   class MLPClassifier(...)   item  class                           
  config {'activation': 'relu'}           RandomForestClassifier(...)     
  space  {                          space {'n_estimators': (10, 100)}     
             'activation': [       ╰────────────────────────────────────╯   
                 'logistic',                                                
                 'relu',                                                    
                 'tanh'                                                     
             ]                                                              
         }                                                                  
 ╰─────────────────────────────────╯                                          
╰──────────────────────────────────────────────────────────────────────────────╯

╭─ Component(MLPClassifier) ──────────────────────────╮
 item   class MLPClassifier(...)                     
 config {'activation': 'relu'}                       
 space  {'activation': ['logistic', 'relu', 'tanh']} 
╰─────────────────────────────────────────────────────╯

MLPClassifier()
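
In the output above, the configured Choice carries a `__choice__` entry naming the selected child, and chosen() returns that child. The lookup can be sketched as follows. `pick_chosen` is a hypothetical helper working on a plain dict of children, and the strings stand in for real nodes.

```python
def pick_chosen(children: dict, config: dict):
    """Return the child named by the `__choice__` entry of a Choice's config."""
    name = config["__choice__"]
    if name not in children:
        raise KeyError(f"{name!r} is not one of {sorted(children)}")
    return children[name]


# Placeholder strings stand in for the two Component nodes.
children = {"MLPClassifier": "mlp-node", "RandomForestClassifier": "rf-node"}

print(pick_chosen(children, {"__choice__": "MLPClassifier"}))  # mlp-node
```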

You can also add nodes to a Choice with the infix operator | if you prefer:

from amltk.pipeline import Choice, Component
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
mlp = Component(MLPClassifier, space={"activation": ["logistic", "relu", "tanh"]})

estimator_choice = (
    Choice(name="estimator") | mlp | rf
)

╭─ Choice(estimator) ──────────────────────────────────────────────────────────╮
 ╭─ Component(MLPClassifier) ─────╮ ╭─ Component(RandomForestClassifier)─╮    
  item  class MLPClassifier(...)   item  class                            
  space {                                RandomForestClassifier(...)      
            'activation': [        space {'n_estimators': (10, 100)}      
                'logistic',       ╰────────────────────────────────────╯    
                'relu',                                                     
                'tanh'                                                      
            ]                                                               
        }                                                                   
 ╰────────────────────────────────╯                                           
╰──────────────────────────────────────────────────────────────────────────────╯

Whenever some other node sees a set, e.g. {comp1, comp2, comp3}, this will automatically be converted into a Choice.

from amltk.pipeline import Choice, Component, Sequential
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.impute import SimpleImputer

rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
mlp = Component(MLPClassifier, space={"activation": ["logistic", "relu", "tanh"]})

pipeline = Sequential(
    SimpleImputer(fill_value=0),
    {mlp, rf},
    name="my_pipeline",
)

╭─ Sequential(my_pipeline) ────────────────────────────────────────────────────╮
 ╭─ Fixed(SimpleImputer) ───────────╮                                         
  item SimpleImputer(fill_value=0)                                          
 ╰──────────────────────────────────╯                                         
  
 ╭─ Choice(Choice-7CmvbHZT) ────────────────────────────────────────────────╮ 
  ╭─ Component(MLPClassifier) ─────╮ ╭─ Component(RandomForestClassifie─╮   
   item  class MLPClassifier(...)   item  class                         
   space {                                RandomForestClassifier(..…    
             'activation': [        space {                             
                 'logistic',                  'n_estimators': (         
                 'relu',                          10,                   
                 'tanh'                           100                   
             ]                                )                         
         }                                }                             
  ╰────────────────────────────────╯ ╰──────────────────────────────────╯   
 ╰──────────────────────────────────────────────────────────────────────────╯ 
╰──────────────────────────────────────────────────────────────────────────────╯

Like all Nodes, a Choice accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.

Order of nodes

The given nodes of a choice are always ordered according to their name, so indexing choice.nodes may not be reliable if modifying the choice dynamically.

Please use choice["name"] to access the nodes instead.
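
To see why positional indexing is fragile, consider this small sketch. It models children as simple objects with a `.name` and sorts them by name. The `order_children` helper and the extra `KNeighborsClassifier` child are hypothetical, chosen only to show a position shifting.

```python
from types import SimpleNamespace

rf = SimpleNamespace(name="RandomForestClassifier")
mlp = SimpleNamespace(name="MLPClassifier")


def order_children(*nodes):
    """Order children by name, regardless of the order they were given in."""
    return tuple(sorted(nodes, key=lambda n: n.name))


before = order_children(rf, mlp)
print([n.name for n in before])  # ['MLPClassifier', 'RandomForestClassifier']

# Adding a child whose name sorts earlier shifts every position.
knn = SimpleNamespace(name="KNeighborsClassifier")
after = order_children(rf, mlp, knn)
print(after[0].name)  # KNeighborsClassifier

# Looking up by name is unaffected by the change.
by_name = {n.name: n for n in after}
print(by_name["MLPClassifier"] is mlp)  # True
```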

See Also
Source code in src/amltk/pipeline/components.py
def __init__(
    self,
    *nodes: Node | NodeLike,
    name: str | None = None,
    item: Item | Callable[[Item], Item] | None = None,
    config: Config | None = None,
    space: Space | None = None,
    fidelities: Mapping[str, Any] | None = None,
    config_transform: Callable[[Config, Any], Config] | None = None,
    meta: Mapping[str, Any] | None = None,
):
    """See [`Node`][amltk.pipeline.node.Node] for details."""
    _nodes: tuple[Node, ...] = tuple(
        sorted((as_node(n) for n in nodes), key=lambda n: n.name),
    )
    if not all_unique(_nodes, key=lambda node: node.name):
        raise ValueError(
            f"Can't handle nodes as we can not generate a __choice__ for {nodes=}."
            "\nAll nodes must have a unique name. Please provide a `name=` to them",
        )

    if name is None:
        name = f"Choice-{randuid(8)}"

    super().__init__(
        *_nodes,
        name=name,
        item=item,
        config=config,
        space=space,
        fidelities=fidelities,
        config_transform=config_transform,
        meta=meta,
    )

Split

Bases: Node[Item, Space]

A Split of data in a pipeline.

This indicates the different children in .nodes should act in parallel but on different subsets of data.

from amltk.pipeline import Component, Split
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_selector

categorical_pipeline = [
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(drop="first"),
]
numerical_pipeline = Component(SimpleImputer, space={"strategy": ["mean", "median"]})

preprocessor = Split(
    {
        "categories": categorical_pipeline,
        "numerical": numerical_pipeline,
    },
    config={
        # This is how you would configure the split for the sklearn builder in particular
        "categories": make_column_selector(dtype_include="category"),
        "numerical": make_column_selector(dtype_exclude="category"),
    },
    name="my_split"
)

space = preprocessor.search_space("configspace")

configuration = space.sample_configuration()

configured_preprocessor = preprocessor.configure(configuration)

built_preprocessor = configured_preprocessor.build("sklearn")

╭─ Split(my_split) ────────────────────────────────────────────────────────────╮
 config {                                                                     
            'categories':                                                     
        <sklearn.compose._column_transformer.make_column_selector object at   
        0x7fdbd19051e0>,                                                      
            'numerical':                                                      
        <sklearn.compose._column_transformer.make_column_selector object at   
        0x7fdbd1906320>                                                       
        }                                                                     
 ╭─ Sequential(categories) ──────────╮ ╭─ Sequential(numerical) ────────────╮ 
  ╭─ Fixed(SimpleImputer) ────────╮   ╭─ Component(SimpleImputer) ─────╮  
   item SimpleImputer(fill_valu…     item  class SimpleImputer(...)   
        strategy='constant')         space {                          
  ╰───────────────────────────────╯              'strategy': [          
                    'mean',            
  ╭─ Fixed(OneHotEncoder) ────────╮                  'median'           
   item OneHotEncoder(drop='fir…               ]                      
  ╰───────────────────────────────╯          }                          
 ╰───────────────────────────────────╯  ╰────────────────────────────────╯  
                                       ╰────────────────────────────────────╯ 
╰──────────────────────────────────────────────────────────────────────────────╯

Configuration space object:
  Hyperparameters:
    my_split:numerical:SimpleImputer:strategy, Type: Categorical, Choices: 
{mean, median}, Default: mean


Configuration(values={
  'my_split:numerical:SimpleImputer:strategy': 'median',
})

╭─ Split(my_split) ────────────────────────────────────────────────────────────╮
 config {                                                                     
            'categories':                                                     
        <sklearn.compose._column_transformer.make_column_selector object at   
        0x7fdbd19051e0>,                                                      
            'numerical':                                                      
        <sklearn.compose._column_transformer.make_column_selector object at   
        0x7fdbd1906320>                                                       
        }                                                                     
 ╭─ Sequential(categories) ──────────╮ ╭─ Sequential(numerical) ────────────╮ 
  ╭─ Fixed(SimpleImputer) ────────╮   ╭─ Component(SimpleImputer) ─────╮  
   item SimpleImputer(fill_valu…     item   class                     
        strategy='constant')                SimpleImputer(...)        
  ╰───────────────────────────────╯    config {'strategy': 'median'}    
      space  {                         
  ╭─ Fixed(OneHotEncoder) ────────╮               'strategy': [         
   item OneHotEncoder(drop='fir…                    'mean',           
  ╰───────────────────────────────╯                   'median'          
 ╰───────────────────────────────────╯              ]                     
                                                }                         
                                        ╰────────────────────────────────╯  
                                       ╰────────────────────────────────────╯ 
╰──────────────────────────────────────────────────────────────────────────────╯

Pipeline(steps=[('my_split',
                 ColumnTransformer(transformers=[('categories',
                                                  Pipeline(steps=[('SimpleImputer',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='constant')),
                                                                  ('OneHotEncoder',
                                                                   OneHotEncoder(drop='first'))]),
                                                  <sklearn.compose._column_transformer.make_column_selector object at 0x7fdbd19051e0>),
                                                 ('SimpleImputer',
                                                  SimpleImputer(strategy='median'),
                                                  <sklearn.compose._column_transformer.make_column_selector object at 0x7fdbd1906320>)]))])

The Split is a slight oddity compared to the other kinds of components in that it allows a dict as its first argument, where the keys are the names of the different paths through which data will go and the values are the actual nodes that will receive the data.

If nodes are instead passed in positionally, as they are for all other components, the name of the first node is usually important for any builder trying to make sense of how to use the Split.
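
How each dict value becomes a named path can be sketched with a toy function that mirrors the branching in the constructor source shown further down. `path_to_node` is hypothetical and returns descriptive tuples instead of real nodes; the strings stand in for transformers.

```python
def path_to_node(key, value):
    """Describe the node a `{key: value}` entry of a Split dict would become."""
    if isinstance(value, list):
        # A list becomes a Sequential named after the key.
        return ("Sequential", key, tuple(value))
    if isinstance(value, tuple):
        # A tuple becomes a Join named after the key.
        return ("Join", key, value)
    if isinstance(value, set):
        # A set becomes a Choice named after the key.
        return ("Choice", key, tuple(sorted(value)))
    # A single node is wrapped in a one-step Sequential named after the key.
    return ("Sequential", key, (value,))


paths = {
    "categories": ["SimpleImputer", "OneHotEncoder"],
    "numerical": "SimpleImputer",
}
for key, value in paths.items():
    print(path_to_node(key, value))
# ('Sequential', 'categories', ('SimpleImputer', 'OneHotEncoder'))
# ('Sequential', 'numerical', ('SimpleImputer',))
```

This matches the rendered example above, where the single numerical Component appears inside a Sequential named "numerical".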

Like all Nodes, a Split accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.

See Also
Source code in src/amltk/pipeline/components.py
def __init__(
    self,
    *nodes: Node | NodeLike | dict[str, Node | NodeLike],
    name: str | None = None,
    item: Item | Callable[[Item], Item] | None = None,
    config: Config | None = None,
    space: Space | None = None,
    fidelities: Mapping[str, Any] | None = None,
    config_transform: Callable[[Config, Any], Config] | None = None,
    meta: Mapping[str, Any] | None = None,
):
    """See [`Node`][amltk.pipeline.node.Node] for details."""
    if any(isinstance(n, dict) for n in nodes):
        if len(nodes) > 1:
            raise ValueError(
                "Can't handle multiple nodes with a dictionary as a node.\n"
                f"{nodes=}",
            )
        _node = nodes[0]
        assert isinstance(_node, dict)

        def _construct(key: str, value: Node | NodeLike) -> Node:
            match value:
                case list():
                    return Sequential(*value, name=key)
                case set() | tuple():
                    return as_node(value, name=key)
                case _:
                    return Sequential(value, name=key)

        _nodes = tuple(_construct(key, value) for key, value in _node.items())
    else:
        _nodes = tuple(as_node(n) for n in nodes)

    if not all_unique(_nodes, key=lambda node: node.name):
        raise ValueError(
            f"Can't handle nodes they do not all contain unique names, {nodes=}."
            "\nAll nodes must have a unique name. Please provide a `name=` to them",
        )

    if name is None:
        name = f"Split-{randuid(8)}"

    super().__init__(
        *_nodes,
        name=name,
        item=item,
        config=config,
        space=space,
        fidelities=fidelities,
        config_transform=config_transform,
        meta=meta,
    )

Join

Bases: Node[Item, Space]

Join together different parts of the pipeline.

This indicates the different children in .nodes should act in tandem with one another, for example, concatenating the outputs of the various members of the Join.

from amltk.pipeline import Join, Component
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

pca = Component(PCA, space={"n_components": (1, 3)})
kbest = Component(SelectKBest, space={"k": (1, 3)})

join = Join(pca, kbest, name="my_feature_union")

space = join.search_space("configspace")

pipeline = join.build("sklearn")

╭─ Join(my_feature_union) ────────────────────────────────────────────╮
 ╭─ Component(PCA) ───────────────╮ ╭─ Component(SelectKBest) ─────╮ 
  item  class PCA(...)             item  class SelectKBest(...)  
  space {'n_components': (1, 3)}   space {'k': (1, 3)}           
 ╰────────────────────────────────╯ ╰──────────────────────────────╯ 
╰─────────────────────────────────────────────────────────────────────╯

Configuration space object:
  Hyperparameters:
    my_feature_union:PCA:n_components, Type: UniformInteger, Range: [1, 3], 
Default: 2
    my_feature_union:SelectKBest:k, Type: UniformInteger, Range: [1, 3], 
Default: 2


Pipeline(steps=[('my_feature_union',
                 FeatureUnion(transformer_list=[('PCA', PCA()),
                                                ('SelectKBest',
                                                 SelectKBest())]))])
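
As an illustration of "concatenating the outputs", the sketch below applies several transformations to the same input row and joins their results end to end, which is conceptually what the FeatureUnion in the output above does with feature columns. The `join_outputs`, `first_two` and `total` functions are hypothetical and operate on plain lists.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]


def join_outputs(transformers: Sequence[Callable[[Row], List[float]]], row: Row) -> List[float]:
    """Apply every transformer to the same row and concatenate the results."""
    combined: List[float] = []
    for transform in transformers:
        combined.extend(transform(row))
    return combined


def first_two(row: Row) -> List[float]:
    # Keep only the first two features.
    return list(row[:2])


def total(row: Row) -> List[float]:
    # Derive a single new feature: the sum of all inputs.
    return [sum(row)]


print(join_outputs([first_two, total], [1.0, 2.0, 3.0]))  # [1.0, 2.0, 6.0]
```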

You can also join nodes together with the infix operator & if you prefer:

from amltk.pipeline import Join, Component
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

pca = Component(PCA, space={"n_components": (1, 3)})
kbest = Component(SelectKBest, space={"k": (1, 3)})

# Can not parametrize or name the join
join = pca & kbest

# With a parametrized join
join = (
    Join(name="my_feature_union") & pca & kbest
)
item = join.build("sklearn")

╭─ Join(Join-I2f1QH4p) ───────────────────────────────────────────────╮
 ╭─ Component(PCA) ───────────────╮ ╭─ Component(SelectKBest) ─────╮ 
  item  class PCA(...)             item  class SelectKBest(...)  
  space {'n_components': (1, 3)}   space {'k': (1, 3)}           
 ╰────────────────────────────────╯ ╰──────────────────────────────╯ 
╰─────────────────────────────────────────────────────────────────────╯

Pipeline(steps=[('my_feature_union',
                 FeatureUnion(transformer_list=[('PCA', PCA()),
                                                ('SelectKBest',
                                                 SelectKBest())]))])

Whenever some other node sees a tuple, e.g. (comp1, comp2, comp3), this will automatically be converted into a Join.

from amltk.pipeline import Sequential, Component
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.ensemble import RandomForestClassifier

pca = Component(PCA, space={"n_components": (1, 3)})
kbest = Component(SelectKBest, space={"k": (1, 3)})

# Can not parametrize or name the join
join = Sequential(
    (pca, kbest),
    RandomForestClassifier(n_estimators=5),
    name="my_feature_union",
)

╭─ Sequential(my_feature_union) ──────────────────────────────────────────╮
│ ╭─ Join(Join-sAhetug0) ───────────────────────────────────────────────╮ │
│ │ ╭─ Component(PCA) ───────────────╮ ╭─ Component(SelectKBest) ─────╮ │ │
│ │ │ item  class PCA(...)           │ │ item  class SelectKBest(...) │ │ │
│ │ │ space {'n_components': (1, 3)} │ │ space {'k': (1, 3)}          │ │ │
│ │ ╰────────────────────────────────╯ ╰──────────────────────────────╯ │ │
│ ╰─────────────────────────────────────────────────────────────────────╯ │
│                                    ↓                                    │
│ ╭─ Fixed(RandomForestClassifier) ─────────────╮                         │
│ │ item RandomForestClassifier(n_estimators=5) │                         │
│ ╰─────────────────────────────────────────────╯                         │
╰─────────────────────────────────────────────────────────────────────────╯

Like all Nodes, a Join accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.

See Also
Source code in src/amltk/pipeline/components.py
def __init__(
    self,
    *nodes: Node | NodeLike,
    name: str | None = None,
    item: Item | Callable[[Item], Item] | None = None,
    config: Config | None = None,
    space: Space | None = None,
    fidelities: Mapping[str, Any] | None = None,
    config_transform: Callable[[Config, Any], Config] | None = None,
    meta: Mapping[str, Any] | None = None,
):
    """See [`Node`][amltk.pipeline.node.Node] for details."""
    _nodes = tuple(as_node(n) for n in nodes)
    if not all_unique(_nodes, key=lambda node: node.name):
        raise ValueError(
            f"Can't handle nodes they do not all contain unique names, {nodes=}."
            "\nAll nodes must have a unique name. Please provide a `name=` to them",
        )

    if name is None:
        name = f"Join-{randuid(8)}"

    super().__init__(
        *_nodes,
        name=name,
        item=item,
        config=config,
        space=space,
        fidelities=fidelities,
        config_transform=config_transform,
        meta=meta,
    )

Fixed

Bases: Node[Item, None]

A Fixed part of the pipeline that represents something that cannot be configured and is used directly as is.

It consists of an .item that is fixed, non-configurable and non-searchable. It also has no children.

This is useful for representing parts of the pipeline that are fixed. For example, if you have a pipeline that is a Sequential of nodes but want the first component to always be a PCA with n_components=3, you can use a Fixed to represent that.

from amltk.pipeline import Component, Fixed, Sequential
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA

rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
pca = Fixed(PCA(n_components=3))

pipeline = Sequential(pca, rf, name="my_pipeline")

╭─ Sequential(my_pipeline) ───────────────────╮
 ╭─ Fixed(PCA) ─────────────╮                
  item PCA(n_components=3)                 
 ╰──────────────────────────╯                
  
 ╭─ Component(RandomForestClassifier) ─────╮ 
  item  class RandomForestClassifier(...)  
  space {'n_estimators': (10, 100)}        
 ╰─────────────────────────────────────────╯ 
╰─────────────────────────────────────────────╯

Whenever some other node sees an instance of something, i.e. something that can't be called, this will automatically be converted into a Fixed.

from amltk.pipeline import Sequential
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA

pipeline = Sequential(
    PCA(n_components=3),
    RandomForestClassifier(n_estimators=50),
    name="my_pipeline",
)

╭─ Sequential(my_pipeline) ────────────────────────╮
 ╭─ Fixed(PCA) ─────────────╮                     
  item PCA(n_components=3)                      
 ╰──────────────────────────╯                     
  
 ╭─ Fixed(RandomForestClassifier) ──────────────╮ 
  item RandomForestClassifier(n_estimators=50)  
 ╰──────────────────────────────────────────────╯ 
╰──────────────────────────────────────────────────╯

The default .name of a component is the class name of the item that it will use. You can explicitly set the name= if you want to when constructing the component.

A Fixed accepts only an explicit name=, item=, meta=.
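The wrapping and naming rules described above can be sketched without amltk. The snippet below is a minimal stand-in under stated assumptions: `MiniNode`, `default_name` and `wrap` are hypothetical names for illustration only, not part of amltk's API. It treats callables as configurable components, everything else as fixed items, and derives the default name from the class name.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class MiniNode:
    """Hypothetical stand-in for a pipeline node (not the amltk class)."""

    kind: str  # "Component" or "Fixed"
    name: str
    item: Any


def default_name(item: Any) -> str:
    """Classes and functions carry __name__; instances fall back to their class name."""
    return getattr(item, "__name__", type(item).__name__)


def wrap(item: Any, name: str | None = None) -> MiniNode:
    """Callables become configurable components, everything else is fixed."""
    kind = "Component" if callable(item) else "Fixed"
    return MiniNode(kind=kind, name=name or default_name(item), item=item)


class Scaler:
    """Toy item used only for this illustration."""

    def __init__(self, factor: float = 1.0) -> None:
        self.factor = factor


as_class = wrap(Scaler)                 # a class is callable
as_instance = wrap(Scaler(factor=2.0))  # an instance is not
print(as_class.kind, as_class.name)
print(as_instance.kind, as_instance.name)
```

Both nodes end up with the same default name, which is one reason an explicit name= can be needed when the same class appears twice in a pipeline.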

See Also
Source code in src/amltk/pipeline/components.py
def __init__(
    self,
    item: Item,
    *,
    name: str | None = None,
    config: None = None,
    space: None = None,
    fidelities: None = None,
    config_transform: None = None,
    meta: Mapping[str, Any] | None = None,
):
    """See [`Node`][amltk.pipeline.node.Node] for details."""
    super().__init__(
        name=name if name is not None else entity_name(item),
        item=item,
        config=config,
        space=space,
        fidelities=fidelities,
        config_transform=config_transform,
        meta=meta,
    )

Searchable#

Bases: Node[None, Space]

A Searchable node of the pipeline which just represents a search space, no item attached.

While not usually applicable to pipelines you want to build, this component is useful for creating a search space, especially if the real pipeline you want to optimize cannot be built directly. For example, if you are optimizing a script, you may wish to use a Searchable to represent the search space of that script.

from amltk.pipeline import Searchable

script_space = Searchable({"mode": ["orange", "blue", "red"], "n": (10, 100)})

╭─ Searchable(Searchable-0jLpL6X0) ─────────────────────────╮
 space {'mode': ['orange', 'blue', 'red'], 'n': (10, 100)} 
╰───────────────────────────────────────────────────────────╯

A Searchable explicitly does not allow for item= to be set, nor can it have any children. A Searchable accepts an explicit name=, config=, space=, fidelities=, config_transform= and meta=.
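To make the script use case concrete, here is a rough sketch of how a sampled configuration from such a space might be handed to a script. It does not use amltk: `sample_space` and `run_script` are hypothetical helpers, and the interpretation of the space (a list is a set of categorical choices, a pair of integers is an inclusive range) is an assumption made for this example.

```python
from __future__ import annotations

import random
from typing import Any


def sample_space(space: dict[str, Any], rng: random.Random) -> dict[str, Any]:
    """Draw one value per hyperparameter (assumed semantics, see lead-in)."""
    config: dict[str, Any] = {}
    for key, domain in space.items():
        if isinstance(domain, list):
            # Assumed: a list enumerates categorical choices.
            config[key] = rng.choice(domain)
        elif (
            isinstance(domain, tuple)
            and len(domain) == 2
            and all(isinstance(bound, int) for bound in domain)
        ):
            # Assumed: a pair of ints is an inclusive integer range.
            low, high = domain
            config[key] = rng.randint(low, high)
        else:
            raise TypeError(f"Unsupported domain for {key!r}: {domain!r}")
    return config


def run_script(mode: str, n: int) -> str:
    """Placeholder for the script being optimized."""
    return f"mode={mode}, n={n}"


space = {"mode": ["orange", "blue", "red"], "n": (10, 100)}
config = sample_space(space, random.Random(0))
result = run_script(**config)
print(result)
```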

See Also
Source code in src/amltk/pipeline/components.py
def __init__(
    self,
    space: Space | None = None,
    *,
    name: str | None = None,
    config: Config | None = None,
    fidelities: Mapping[str, Any] | None = None,
    config_transform: Callable[[Config, Any], Config] | None = None,
    meta: Mapping[str, Any] | None = None,
):
    """See [`Node`][amltk.pipeline.node.Node] for details."""
    if name is None:
        name = f"Searchable-{randuid(8)}"

    super().__init__(
        name=name,
        config=config,
        space=space,
        fidelities=fidelities,
        config_transform=config_transform,
        meta=meta,
    )

class Join(*nodes, name=None, item=None, config=None, space=None, fidelities=None, config_transform=None, meta=None)
dataclass
#

Bases: Node[Item, Space]

Join together different parts of the pipeline.

This indicates the different children in .nodes should act in tandem with one another, for example, concatenating the outputs of the various members of the Join.

from amltk.pipeline import Join, Component
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

pca = Component(PCA, space={"n_components": (1, 3)})
kbest = Component(SelectKBest, space={"k": (1, 3)})

join = Join(pca, kbest, name="my_feature_union")

space = join.search_space("configspace")

pipeline = join.build("sklearn")

╭─ Join(my_feature_union) ────────────────────────────────────────────╮
 ╭─ Component(PCA) ───────────────╮ ╭─ Component(SelectKBest) ─────╮ 
  item  class PCA(...)             item  class SelectKBest(...)  
  space {'n_components': (1, 3)}   space {'k': (1, 3)}           
 ╰────────────────────────────────╯ ╰──────────────────────────────╯ 
╰─────────────────────────────────────────────────────────────────────╯

Configuration space object:
  Hyperparameters:
    my_feature_union:PCA:n_components, Type: UniformInteger, Range: [1, 3], 
Default: 2
    my_feature_union:SelectKBest:k, Type: UniformInteger, Range: [1, 3], 
Default: 2


Pipeline(steps=[('my_feature_union',
                 FeatureUnion(transformer_list=[('PCA', PCA()),
                                                ('SelectKBest',
                                                 SelectKBest())]))])
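What "acting in tandem" means depends on the builder. For the sklearn builder the output above shows a FeatureUnion, which concatenates the outputs of its transformers. The plain-Python sketch below illustrates that idea only; `join_outputs`, `first_two` and `total` are hypothetical names and nothing here is amltk or sklearn code.

```python
from __future__ import annotations

from typing import Callable, Sequence

Transform = Callable[[Sequence[float]], list]


def join_outputs(transforms: Sequence[Transform], row: Sequence[float]) -> list:
    """Apply every transform to the same input and concatenate the results."""
    combined: list = []
    for transform in transforms:
        combined.extend(transform(row))
    return combined


def first_two(row: Sequence[float]) -> list:
    """Toy 'selector': keep the first two values."""
    return list(row[:2])


def total(row: Sequence[float]) -> list:
    """Toy 'reducer': a single summary feature."""
    return [sum(row)]


features = join_outputs([first_two, total], [1.0, 2.0, 3.0])
print(features)  # [1.0, 2.0, 6.0]
```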

You may also just join together nodes using an infix operator & if you prefer:

from amltk.pipeline import Join, Component
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

pca = Component(PCA, space={"n_components": (1, 3)})
kbest = Component(SelectKBest, space={"k": (1, 3)})

# Can not parametrize or name the join
join = pca & kbest

# With a parametrized join
join = (
    Join(name="my_feature_union") & pca & kbest
)
item = join.build("sklearn")

╭─ Join(Join-7ayXUEwB) ───────────────────────────────────────────────╮
 ╭─ Component(PCA) ───────────────╮ ╭─ Component(SelectKBest) ─────╮ 
  item  class PCA(...)             item  class SelectKBest(...)  
  space {'n_components': (1, 3)}   space {'k': (1, 3)}           
 ╰────────────────────────────────╯ ╰──────────────────────────────╯ 
╰─────────────────────────────────────────────────────────────────────╯

Pipeline(steps=[('my_feature_union',
                 FeatureUnion(transformer_list=[('PCA', PCA()),
                                                ('SelectKBest',
                                                 SelectKBest())]))])
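The reason a bare `pca & kbest` cannot carry a name, while `Join(name=...) & pca & kbest` can, is easier to see in a toy model of the operator. This is a sketch under assumptions, not amltk's implementation: `MiniJoin` is a hypothetical class, and whether amltk returns a new object or mutates in place is not shown in this document.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MiniJoin:
    """Hypothetical stand-in that only tracks a name and child names."""

    name: str
    nodes: tuple[str, ...] = ()

    def __and__(self, other: str) -> MiniJoin:
        # Assumed behaviour: keep this join's name and append the new child.
        return MiniJoin(self.name, (*self.nodes, other))


# Starting from a named, empty join lets the name survive the chaining.
join = MiniJoin("my_feature_union") & "PCA" & "SelectKBest"
print(join)
```

An operator has no slot for keyword arguments, so any parameters have to live on the left-most operand.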

Whenever some other node sees a tuple, i.e. (comp1, comp2, comp3), this will automatically be converted into a Join.

from amltk.pipeline import Sequential, Component
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.ensemble import RandomForestClassifier

pca = Component(PCA, space={"n_components": (1, 3)})
kbest = Component(SelectKBest, space={"k": (1, 3)})

# Can not parametrize or name the join
join = Sequential(
    (pca, kbest),
    RandomForestClassifier(n_estimators=5),
    name="my_feature_union",
)

╭─ Sequential(my_feature_union) ──────────────────────────────────────────╮
│ ╭─ Join(Join-gfZDjsg6) ───────────────────────────────────────────────╮ │
│ │ ╭─ Component(PCA) ───────────────╮ ╭─ Component(SelectKBest) ─────╮ │ │
│ │ │ item  class PCA(...)           │ │ item  class SelectKBest(...) │ │ │
│ │ │ space {'n_components': (1, 3)} │ │ space {'k': (1, 3)}          │ │ │
│ │ ╰────────────────────────────────╯ ╰──────────────────────────────╯ │ │
│ ╰─────────────────────────────────────────────────────────────────────╯ │
│                                    ↓                                    │
│ ╭─ Fixed(RandomForestClassifier) ─────────────╮                         │
│ │ item RandomForestClassifier(n_estimators=5) │                         │
│ ╰─────────────────────────────────────────────╯                         │
╰─────────────────────────────────────────────────────────────────────────╯

Like all Nodes, a Join accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.

See Also
Source code in src/amltk/pipeline/components.py
def __init__(
    self,
    *nodes: Node | NodeLike,
    name: str | None = None,
    item: Item | Callable[[Item], Item] | None = None,
    config: Config | None = None,
    space: Space | None = None,
    fidelities: Mapping[str, Any] | None = None,
    config_transform: Callable[[Config, Any], Config] | None = None,
    meta: Mapping[str, Any] | None = None,
):
    """See [`Node`][amltk.pipeline.node.Node] for details."""
    _nodes = tuple(as_node(n) for n in nodes)
    if not all_unique(_nodes, key=lambda node: node.name):
        raise ValueError(
            f"Can't handle nodes they do not all contain unique names, {nodes=}."
            "\nAll nodes must have a unique name. Please provide a `name=` to them",
        )

    if name is None:
        name = f"Join-{randuid(8)}"

    super().__init__(
        *_nodes,
        name=name,
        item=item,
        config=config,
        space=space,
        fidelities=fidelities,
        config_transform=config_transform,
        meta=meta,
    )
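The unique-name check in the constructor above matters in practice because default names come from the wrapped class, so two components wrapping the same class collide unless one is renamed. The following stdlib-only sketch mirrors the idea; `duplicate_names` and `check_names` are hypothetical helpers, not the `all_unique` function used in the source.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def duplicate_names(names: Iterable[str]) -> list[str]:
    """Return the names that occur more than once, sorted for stable output."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


def check_names(names: Iterable[str]) -> None:
    """Raise if any name is repeated, in the spirit of the constructor check."""
    dupes = duplicate_names(names)
    if dupes:
        raise ValueError(
            f"Duplicate node names: {dupes}. Provide a distinct `name=` for each.",
        )


check_names(["PCA", "SelectKBest"])  # passes silently

try:
    check_names(["PCA", "PCA"])  # e.g. two components with the default name
except ValueError as error:
    message = str(error)
    print(message)
```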

nodes: tuple[Node, ...]
attr
#

The nodes that this node leads to.

class Choice(*nodes, name=None, item=None, config=None, space=None, fidelities=None, config_transform=None, meta=None)
dataclass
#

Bases: Node[Item, Space]

A Choice between different subcomponents.

This indicates that a choice should be made between the different children in .nodes, usually done when you configure() with some config from a search_space().

from amltk.pipeline import Choice, Component
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
mlp = Component(MLPClassifier, space={"activation": ["logistic", "relu", "tanh"]})

estimator_choice = Choice(rf, mlp, name="estimator")

space = estimator_choice.search_space("configspace")

config = space.sample_configuration()

configured_choice = estimator_choice.configure(config)

chosen_estimator = configured_choice.chosen()

estimator = chosen_estimator.build_item()

╭─ Choice(estimator) ──────────────────────────────────────────────────────────╮
 ╭─ Component(MLPClassifier) ─────╮ ╭─ Component(RandomForestClassifier)─╮    
  item  class MLPClassifier(...)   item  class                            
  space {                                RandomForestClassifier(...)      
            'activation': [        space {'n_estimators': (10, 100)}      
                'logistic',       ╰────────────────────────────────────╯    
                'relu',                                                     
                'tanh'                                                      
            ]                                                               
        }                                                                   
 ╰────────────────────────────────╯                                           
╰──────────────────────────────────────────────────────────────────────────────╯

Configuration space object:
  Hyperparameters:
    estimator:MLPClassifier:activation, Type: Categorical, Choices: {logistic, 
relu, tanh}, Default: logistic
    estimator:RandomForestClassifier:n_estimators, Type: UniformInteger, Range: 
[10, 100], Default: 55
    estimator:__choice__, Type: Categorical, Choices: {MLPClassifier, 
RandomForestClassifier}, Default: MLPClassifier
  Conditions:
    estimator:MLPClassifier:activation | estimator:__choice__ == 'MLPClassifier'
    estimator:RandomForestClassifier:n_estimators | estimator:__choice__ == 
'RandomForestClassifier'


Configuration(values={
  'estimator:MLPClassifier:activation': 'logistic',
  'estimator:__choice__': 'MLPClassifier',
})

╭─ Choice(estimator) ──────────────────────────────────────────────────────────╮
 config {'__choice__': 'MLPClassifier'}                                       
 ╭─ Component(MLPClassifier) ────────╮ ╭─ Component(RandomForestClassifier)─╮ 
  item   class MLPClassifier(...)     item  class                         
  config {'activation': 'logistic'}         RandomForestClassifier(...)   
  space  {                            space {'n_estimators': (10, 100)}   
             'activation': [         ╰────────────────────────────────────╯ 
                 'logistic',                                                
                 'relu',                                                    
                 'tanh'                                                     
             ]                                                              
         }                                                                  
 ╰───────────────────────────────────╯                                        
╰──────────────────────────────────────────────────────────────────────────────╯

╭─ Component(MLPClassifier) ──────────────────────────╮
 item   class MLPClassifier(...)                     
 config {'activation': 'logistic'}                   
 space  {'activation': ['logistic', 'relu', 'tanh']} 
╰─────────────────────────────────────────────────────╯

MLPClassifier(activation='logistic')
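The printed Configuration uses flat keys such as `estimator:MLPClassifier:activation`, while the configured nodes each show only their own part. The sketch below shows one way such a flat mapping could be split per node. It is an illustration only: `split_config` is a hypothetical helper, and the `:` separator is inferred from the output above rather than taken from amltk's code.

```python
from __future__ import annotations

from typing import Any


def split_config(
    flat: dict[str, Any],
    root: str,
) -> tuple[dict[str, Any], dict[str, dict[str, Any]]]:
    """Split flat 'root:child:param' keys into the root's own config and per-child configs."""
    own: dict[str, Any] = {}
    children: dict[str, dict[str, Any]] = {}
    prefix = f"{root}:"
    for key, value in flat.items():
        if not key.startswith(prefix):
            continue  # belongs to some other node
        rest = key[len(prefix):]
        if ":" in rest:
            child, param = rest.split(":", 1)
            children.setdefault(child, {})[param] = value
        else:
            own[rest] = value
    return own, children


flat = {
    "estimator:MLPClassifier:activation": "logistic",
    "estimator:__choice__": "MLPClassifier",
}
own, children = split_config(flat, "estimator")
chosen_name = own["__choice__"]
print(own)
print(children[chosen_name])
```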

You may also just add nodes to a Choice using an infix operator | if you prefer:

from amltk.pipeline import Choice, Component
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
mlp = Component(MLPClassifier, space={"activation": ["logistic", "relu", "tanh"]})

estimator_choice = (
    Choice(name="estimator") | mlp | rf
)

╭─ Choice(estimator) ──────────────────────────────────────────────────────────╮
 ╭─ Component(MLPClassifier) ─────╮ ╭─ Component(RandomForestClassifier)─╮    
  item  class MLPClassifier(...)   item  class                            
  space {                                RandomForestClassifier(...)      
            'activation': [        space {'n_estimators': (10, 100)}      
                'logistic',       ╰────────────────────────────────────╯    
                'relu',                                                     
                'tanh'                                                      
            ]                                                               
        }                                                                   
 ╰────────────────────────────────╯                                           
╰──────────────────────────────────────────────────────────────────────────────╯

Whenever some other node sees a set, i.e. {comp1, comp2, comp3}, this will automatically be converted into a Choice.

from amltk.pipeline import Component, Sequential
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.impute import SimpleImputer

rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
mlp = Component(MLPClassifier, space={"activation": ["logistic", "relu", "tanh"]})

pipeline = Sequential(
    SimpleImputer(fill_value=0),
    {mlp, rf},
    name="my_pipeline",
)

╭─ Sequential(my_pipeline) ────────────────────────────────────────────────────╮
 ╭─ Fixed(SimpleImputer) ───────────╮                                         
  item SimpleImputer(fill_value=0)                                          
 ╰──────────────────────────────────╯                                         
  
 ╭─ Choice(Choice-Rsb4oblH) ────────────────────────────────────────────────╮ 
  ╭─ Component(MLPClassifier) ─────╮ ╭─ Component(RandomForestClassifie─╮   
   item  class MLPClassifier(...)   item  class                         
   space {                                RandomForestClassifier(..…    
             'activation': [        space {                             
                 'logistic',                  'n_estimators': (         
                 'relu',                          10,                   
                 'tanh'                           100                   
             ]                                )                         
         }                                }                             
  ╰────────────────────────────────╯ ╰──────────────────────────────────╯   
 ╰──────────────────────────────────────────────────────────────────────────╯ 
╰──────────────────────────────────────────────────────────────────────────────╯

Like all Nodes, a Choice accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.

Order of nodes

The given nodes of a choice are always ordered according to their name, so indexing choice.nodes may not be reliable if modifying the choice dynamically.

Please use choice["name"] to access the nodes instead.
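A small stand-in shows why positional indexing is fragile when children are kept sorted by name. `make_choice` is a hypothetical helper that only models the sorting step seen in the constructor; it is not amltk code.

```python
from __future__ import annotations


def make_choice(*names: str) -> tuple[str, ...]:
    """Model only the ordering: children are stored sorted by name."""
    return tuple(sorted(names))


before = make_choice("rf", "mlp")
after = make_choice("rf", "mlp", "knn")  # adding a child shifts positions

print(before[0])  # mlp
print(after[0])   # knn

# Looking up by name is unaffected by the insertion.
by_name = {name: position for position, name in enumerate(after)}
print(by_name["mlp"])  # 1
```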

See Also
Source code in src/amltk/pipeline/components.py
def __init__(
    self,
    *nodes: Node | NodeLike,
    name: str | None = None,
    item: Item | Callable[[Item], Item] | None = None,
    config: Config | None = None,
    space: Space | None = None,
    fidelities: Mapping[str, Any] | None = None,
    config_transform: Callable[[Config, Any], Config] | None = None,
    meta: Mapping[str, Any] | None = None,
):
    """See [`Node`][amltk.pipeline.node.Node] for details."""
    _nodes: tuple[Node, ...] = tuple(
        sorted((as_node(n) for n in nodes), key=lambda n: n.name),
    )
    if not all_unique(_nodes, key=lambda node: node.name):
        raise ValueError(
            f"Can't handle nodes as we can not generate a __choice__ for {nodes=}."
            "\nAll nodes must have a unique name. Please provide a `name=` to them",
        )

    if name is None:
        name = f"Choice-{randuid(8)}"

    super().__init__(
        *_nodes,
        name=name,
        item=item,
        config=config,
        space=space,
        fidelities=fidelities,
        config_transform=config_transform,
        meta=meta,
    )

nodes: tuple[Node, ...]
attr
#

The nodes that this node leads to.

def chosen() #

The chosen branch.

RETURNS DESCRIPTION
Node

The chosen branch

Source code in src/amltk/pipeline/components.py
def chosen(self) -> Node:
    """The chosen branch.

    Returns:
        The chosen branch
    """
    match self.config:
        case {"__choice__": choice}:
            chosen = first_true(
                self.nodes,
                pred=lambda node: node.name == choice,
                default=None,
            )
            if chosen is None:
                raise NodeNotFoundError(choice, self.name)

            return chosen
        case _:
            raise NoChoiceMadeError(self.name)

class Sequential(*nodes, name=None, item=None, config=None, space=None, fidelities=None, config_transform=None, meta=None)
dataclass
#

Bases: Node[Item, Space]

A Sequential set of operations in a pipeline.

This indicates the different children in .nodes should act one after another, feeding the output of one into the next.

from amltk.pipeline import Component, Sequential
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

pipeline = Sequential(
    PCA(n_components=3),
    Component(RandomForestClassifier, space={"n_estimators": (10, 100)}),
    name="my_pipeline"
)

space = pipeline.search_space("configspace")

configuration = space.sample_configuration()

configured_pipeline = pipeline.configure(configuration)

sklearn_pipeline = pipeline.build("sklearn")

╭─ Sequential(my_pipeline) ───────────────────╮
 ╭─ Fixed(PCA) ─────────────╮                
  item PCA(n_components=3)                 
 ╰──────────────────────────╯                
  
 ╭─ Component(RandomForestClassifier) ─────╮ 
  item  class RandomForestClassifier(...)  
  space {'n_estimators': (10, 100)}        
 ╰─────────────────────────────────────────╯ 
╰─────────────────────────────────────────────╯

Configuration space object:
  Hyperparameters:
    my_pipeline:RandomForestClassifier:n_estimators, Type: UniformInteger, 
Range: [10, 100], Default: 55


Configuration(values={
  'my_pipeline:RandomForestClassifier:n_estimators': 28,
})

╭─ Sequential(my_pipeline) ────────────────────╮
 ╭─ Fixed(PCA) ─────────────╮                 
  item PCA(n_components=3)                  
 ╰──────────────────────────╯                 
  
 ╭─ Component(RandomForestClassifier) ──────╮ 
  item   class RandomForestClassifier(...)  
  config {'n_estimators': 28}               
  space  {'n_estimators': (10, 100)}        
 ╰──────────────────────────────────────────╯ 
╰──────────────────────────────────────────────╯

Pipeline(steps=[('PCA', PCA(n_components=3)),
                ('RandomForestClassifier', RandomForestClassifier())])
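"Feeding the output of one into the next" is ordinary left-to-right function composition. The sketch below illustrates this with plain functions; `run_sequential`, `add_one` and `double` are hypothetical names and the snippet does not involve amltk or sklearn.

```python
from __future__ import annotations

from functools import reduce
from typing import Any, Callable, Sequence


def run_sequential(steps: Sequence[Callable[[Any], Any]], value: Any) -> Any:
    """Apply each step to the result of the previous one, in order."""
    return reduce(lambda acc, step: step(acc), steps, value)


def add_one(x: int) -> int:
    return x + 1


def double(x: int) -> int:
    return x * 2


result = run_sequential([add_one, double], 3)
print(result)  # (3 + 1) * 2 = 8
```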

You may also just chain together nodes using an infix operator >> if you prefer:

from amltk.pipeline import Component, Sequential
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

pipeline = (
    Sequential(name="my_pipeline")
    >> PCA(n_components=3)
    >> Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
)

╭─ Sequential(my_pipeline) ───────────────────╮
 ╭─ Fixed(PCA) ─────────────╮                
  item PCA(n_components=3)                 
 ╰──────────────────────────╯                
  
 ╭─ Component(RandomForestClassifier) ─────╮ 
  item  class RandomForestClassifier(...)  
  space {'n_estimators': (10, 100)}        
 ╰─────────────────────────────────────────╯ 
╰─────────────────────────────────────────────╯

Whenever some other node sees a list, i.e. [comp1, comp2, comp3], this will automatically be converted into a Sequential.

from amltk.pipeline import Choice
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

pipeline_choice = Choice(
    [SimpleImputer(), RandomForestClassifier()],
    [StandardScaler(), MLPClassifier()],
    name="pipeline_choice"
)

╭─ Choice(pipeline_choice) ───────────────────────────────────────────────╮
 ╭─ Sequential(Seq-2X0w2zYN) ──────────╮ ╭─ Sequential(Seq-Ma80bjp8) ──╮ 
  ╭─ Fixed(SimpleImputer) ─╮            ╭─ Fixed(StandardScaler) ─╮  
   item SimpleImputer()                item StandardScaler()     
  ╰────────────────────────╯            ╰─────────────────────────╯  
       
  ╭─ Fixed(RandomForestClassifier) ─╮   ╭─ Fixed(MLPClassifier) ─╮   
   item RandomForestClassifier()       item MLPClassifier()      
  ╰─────────────────────────────────╯   ╰────────────────────────╯   
 ╰─────────────────────────────────────╯ ╰─────────────────────────────╯ 
╰─────────────────────────────────────────────────────────────────────────╯

Like all Nodes, a Sequential accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.

See Also
Source code in src/amltk/pipeline/components.py
def __init__(
    self,
    *nodes: Node | NodeLike,
    name: str | None = None,
    item: Item | Callable[[Item], Item] | None = None,
    config: Config | None = None,
    space: Space | None = None,
    fidelities: Mapping[str, Any] | None = None,
    config_transform: Callable[[Config, Any], Config] | None = None,
    meta: Mapping[str, Any] | None = None,
):
    """See [`Node`][amltk.pipeline.node.Node] for details."""
    _nodes = tuple(as_node(n) for n in nodes)

    # Perhaps we need to do a deeper check on this...
    if not all_unique(_nodes, key=lambda node: node.name):
        raise DuplicateNamesError(self)

    if name is None:
        name = f"Seq-{randuid(8)}"

    super().__init__(
        *_nodes,
        name=name,
        item=item,
        config=config,
        space=space,
        fidelities=fidelities,
        config_transform=config_transform,
        meta=meta,
    )

nodes: tuple[Node, ...]
attr
#

The nodes in series.

tail: Node
prop
#

The last step in the pipeline.

def __len__() #

Get the number of nodes in the pipeline.

Source code in src/amltk/pipeline/components.py
def __len__(self) -> int:
    """Get the number of nodes in the pipeline."""
    return len(self.nodes)

def walk(path=None) #

Walk the nodes in this chain.

PARAMETER DESCRIPTION
path

The current path to this node

TYPE: Sequence[Node] | None DEFAULT: None

YIELDS DESCRIPTION
list[Node]

The parents of the node and the node itself

Source code in src/amltk/pipeline/components.py
@override
def walk(
    self,
    path: Sequence[Node] | None = None,
) -> Iterator[tuple[list[Node], Node]]:
    """Walk the nodes in this chain.

    Args:
        path: The current path to this node

    Yields:
        The parents of the node and the node itself
    """
    path = list(path) if path is not None else []
    yield path, self

    path = [*path, self]
    for node in self.nodes:
        yield from node.walk(path=path)

        # Append the previous node so that the next node in the sequence is
        # lead to from the previous node
        path = [*path, node]
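The path handling in `walk` is subtle: within a sequence, each node's path includes the siblings that came before it. The stand-in below copies the same control flow onto two toy classes, `Leaf` and `Seq`, which are hypothetical and carry none of the real Node behaviour, so the resulting paths can be inspected.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterator, Sequence


@dataclass
class Leaf:
    """Toy node without children."""

    name: str

    def walk(self, path: Sequence[Any] | None = None) -> Iterator[tuple[list, Any]]:
        yield (list(path) if path is not None else []), self


@dataclass
class Seq:
    """Toy sequence using the same path logic as the listing above."""

    name: str
    nodes: tuple = ()

    def walk(self, path: Sequence[Any] | None = None) -> Iterator[tuple[list, Any]]:
        path = list(path) if path is not None else []
        yield path, self

        path = [*path, self]
        for node in self.nodes:
            yield from node.walk(path=path)
            # The next sibling is reached through this node.
            path = [*path, node]


seq = Seq("pipe", (Leaf("a"), Leaf("b"), Leaf("c")))
trail = {node.name: [p.name for p in parents] for parents, node in seq.walk()}
print(trail)
```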

class Split(*nodes, name=None, item=None, config=None, space=None, fidelities=None, config_transform=None, meta=None)
dataclass
#

Bases: Node[Item, Space]

A Split of data in a pipeline.

This indicates the different children in .nodes should act in parallel but on different subsets of data.

from amltk.pipeline import Component, Split
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_selector

categorical_pipeline = [
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(drop="first"),
]
numerical_pipeline = Component(SimpleImputer, space={"strategy": ["mean", "median"]})

preprocessor = Split(
    {
        "categories": categorical_pipeline,
        "numerical": numerical_pipeline,
    },
    config={
        # This is how you would configure the split for the sklearn builder in particular
        "categories": make_column_selector(dtype_include="category"),
        "numerical": make_column_selector(dtype_exclude="category"),
    },
    name="my_split"
)

space = preprocessor.search_space("configspace")

configuration = space.sample_configuration()

configured_preprocessor = preprocessor.configure(configuration)

built_preprocessor = configured_preprocessor.build("sklearn")

╭─ Split(my_split) ────────────────────────────────────────────────────────────╮
 config {                                                                     
            'categories':                                                     
        <sklearn.compose._column_transformer.make_column_selector object at   
        0x7fdbd1944af0>,                                                      
            'numerical':                                                      
        <sklearn.compose._column_transformer.make_column_selector object at   
        0x7fdbd1946c20>                                                       
        }                                                                     
 ╭─ Sequential(categories) ──────────╮ ╭─ Sequential(numerical) ────────────╮ 
  ╭─ Fixed(SimpleImputer) ────────╮   ╭─ Component(SimpleImputer) ─────╮  
   item SimpleImputer(fill_valu…     item  class SimpleImputer(...)   
        strategy='constant')         space {                          
  ╰───────────────────────────────╯              'strategy': [          
                    'mean',            
  ╭─ Fixed(OneHotEncoder) ────────╮                  'median'           
   item OneHotEncoder(drop='fir…               ]                      
  ╰───────────────────────────────╯          }                          
 ╰───────────────────────────────────╯  ╰────────────────────────────────╯  
                                       ╰────────────────────────────────────╯ 
╰──────────────────────────────────────────────────────────────────────────────╯

Configuration space object:
  Hyperparameters:
    my_split:numerical:SimpleImputer:strategy, Type: Categorical, Choices: 
{mean, median}, Default: mean


Configuration(values={
  'my_split:numerical:SimpleImputer:strategy': 'mean',
})

╭─ Split(my_split) ────────────────────────────────────────────────────────────╮
 config {                                                                     
            'categories':                                                     
        <sklearn.compose._column_transformer.make_column_selector object at   
        0x7fdbd1944af0>,                                                      
            'numerical':                                                      
        <sklearn.compose._column_transformer.make_column_selector object at   
        0x7fdbd1946c20>                                                       
        }                                                                     
 ╭─ Sequential(categories) ──────────╮ ╭─ Sequential(numerical) ────────────╮ 
  ╭─ Fixed(SimpleImputer) ────────╮   ╭─ Component(SimpleImputer) ─────╮  
   item SimpleImputer(fill_valu…     item   class                     
        strategy='constant')                SimpleImputer(...)        
  ╰───────────────────────────────╯    config {'strategy': 'mean'}      
      space  {                         
  ╭─ Fixed(OneHotEncoder) ────────╮               'strategy': [         
   item OneHotEncoder(drop='fir…                    'mean',           
  ╰───────────────────────────────╯                   'median'          
 ╰───────────────────────────────────╯              ]                     
                                                }                         
                                        ╰────────────────────────────────╯  
                                       ╰────────────────────────────────────╯ 
╰──────────────────────────────────────────────────────────────────────────────╯

Pipeline(steps=[('my_split',
                 ColumnTransformer(transformers=[('categories',
                                                  Pipeline(steps=[('SimpleImputer',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='constant')),
                                                                  ('OneHotEncoder',
                                                                   OneHotEncoder(drop='first'))]),
                                                  <sklearn.compose._column_transformer.make_column_selector object at 0x7fdbd1944af0>),
                                                 ('SimpleImputer',
                                                  SimpleImputer(),
                                                  <sklearn.compose._column_transformer.make_column_selector object at 0x7fdbd1946c20>)]))])
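The idea of running children in parallel on different subsets of the data can be shown without sklearn. In the sketch below, each named route pairs a selector with a transform, and only the selected columns reach that transform. `split_apply` and the route definitions are hypothetical; the sklearn builder itself produces a ColumnTransformer, as the output above shows.

```python
from __future__ import annotations

from typing import Any, Callable

Selector = Callable[[Any], bool]
Transform = Callable[[dict[str, Any]], dict[str, Any]]


def split_apply(
    row: dict[str, Any],
    routes: dict[str, tuple[Selector, Transform]],
) -> dict[str, dict[str, Any]]:
    """Send each route only the columns its selector accepts."""
    out: dict[str, dict[str, Any]] = {}
    for name, (select, transform) in routes.items():
        subset = {key: value for key, value in row.items() if select(value)}
        out[name] = transform(subset)
    return out


routes: dict[str, tuple[Selector, Transform]] = {
    "categories": (
        lambda value: isinstance(value, str),
        lambda cols: {key: value.upper() for key, value in cols.items()},
    ),
    "numerical": (
        lambda value: isinstance(value, float),
        lambda cols: {key: value * 2 for key, value in cols.items()},
    ),
}

row = {"colour": "red", "size": 2.5, "weight": 4.0}
result = split_apply(row, routes)
print(result)
```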

The Split is a slight oddity compared to the other kinds of components in that it allows a dict as its first argument, where the keys are the names of the different paths through which data will go and the values are the actual nodes that will receive the data.

If nodes are passed in positionally, as they are for all other components, the name of the first node will usually be important for any builder trying to make sense of how to use the Split.

Like all Nodes, a Split accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.
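The dict form described above follows a simple rule per value, which the constructor listing further down implements with a `match` statement. The sketch below restates that rule with plain tuples instead of nodes. `construct` is a hypothetical helper; mapping a set to a Choice and a tuple to a Join follows the shorthand described earlier in this document, and the real code delegates those two cases to `as_node`.

```python
from __future__ import annotations

from typing import Any


def construct(key: str, value: Any) -> tuple[str, str, tuple]:
    """Describe the node a dict entry would become as (kind, name, children)."""
    if isinstance(value, list):
        # A list becomes a sequence named after the key.
        return ("Sequential", key, tuple(value))
    if isinstance(value, set):
        # Sorted here only to make the illustration deterministic.
        return ("Choice", key, tuple(sorted(value)))
    if isinstance(value, tuple):
        return ("Join", key, value)
    # Anything else is wrapped in a one-step sequence named after the key.
    return ("Sequential", key, (value,))


paths = {
    "categories": ["impute", "one_hot"],
    "numerical": "impute_mean",
    "text": {"tfidf", "hashing"},
}
children = [construct(key, value) for key, value in paths.items()]
for child in children:
    print(child)
```

Wrapping single values in a sequence means every path gets the key as its name, which is what lets the config refer to paths by those keys.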

See Also
Source code in src/amltk/pipeline/components.py
def __init__(
    self,
    *nodes: Node | NodeLike | dict[str, Node | NodeLike],
    name: str | None = None,
    item: Item | Callable[[Item], Item] | None = None,
    config: Config | None = None,
    space: Space | None = None,
    fidelities: Mapping[str, Any] | None = None,
    config_transform: Callable[[Config, Any], Config] | None = None,
    meta: Mapping[str, Any] | None = None,
):
    """See [`Node`][amltk.pipeline.node.Node] for details."""
    if any(isinstance(n, dict) for n in nodes):
        if len(nodes) > 1:
            raise ValueError(
                "Can't handle multiple nodes with a dictionary as a node.\n"
                f"{nodes=}",
            )
        _node = nodes[0]
        assert isinstance(_node, dict)

        def _construct(key: str, value: Node | NodeLike) -> Node:
            match value:
                case list():
                    return Sequential(*value, name=key)
                case set() | tuple():
                    return as_node(value, name=key)
                case _:
                    return Sequential(value, name=key)

        _nodes = tuple(_construct(key, value) for key, value in _node.items())
    else:
        _nodes = tuple(as_node(n) for n in nodes)

    if not all_unique(_nodes, key=lambda node: node.name):
        raise ValueError(
            f"Can't handle nodes they do not all contain unique names, {nodes=}."
            "\nAll nodes must have a unique name. Please provide a `name=` to them",
        )

    if name is None:
        name = f"Split-{randuid(8)}"

    super().__init__(
        *_nodes,
        name=name,
        item=item,
        config=config,
        space=space,
        fidelities=fidelities,
        config_transform=config_transform,
        meta=meta,
    )
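
As a rough illustration of the dict handling in the constructor above, the sketch below mirrors the `_construct` branching with plain Python values. It does not import amltk: `wrap_split_path` is a hypothetical helper, the strings stand in for real nodes, and the returned tuples only name the node type that would be created.

```python
def wrap_split_path(key, value):
    """Return (node kind, node name, number of children) for one dict entry.

    Hypothetical stand-in that mirrors the branching of `_construct`.
    """
    if isinstance(value, list):
        # A list becomes a Sequential named after the dict key.
        return ("Sequential", key, len(value))
    if isinstance(value, set):
        # Sets go through as_node(), which builds a Choice.
        return ("Choice", key, len(value))
    if isinstance(value, tuple):
        # Tuples go through as_node(), which builds a Join.
        return ("Join", key, len(value))
    # Anything else is wrapped in a single-step Sequential.
    return ("Sequential", key, 1)


# Strings are placeholders for real nodes such as imputers or encoders.
paths = {
    "categorical": ["imputer", "encoder"],
    "numerical": "scaler",
    "either": {"pca", "select_k_best"},
}
wrapped = [wrap_split_path(key, value) for key, value in paths.items()]
for kind, name, n_children in wrapped:
    print(f"{name}: {kind} with {n_children} child node(s)")
```

Because each path takes its name from a dict key, the path names are unique by construction, which satisfies the uniqueness check that follows in the constructor.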

nodes: tuple[Node, ...]
attr
#

The nodes that this node leads to.

class Component(item, *, name=None, config=None, space=None, fidelities=None, config_transform=None, meta=None)
dataclass
#

Bases: Node[Item, Space]

A Component of the pipeline with a possible item and no children.

This is the basic building block of most pipelines. It accepts as its item= some function that will be called by build_item() to build that one part of the pipeline.

When build_item() is called, the .config on this node will be passed to the function to build the item.

A common pattern is to use a Component to wrap a constructor, specifying the space= and config= to be used when building the item.

from amltk.pipeline import Component
from sklearn.ensemble import RandomForestClassifier

rf = Component(
    RandomForestClassifier,
    config={"max_depth": 3},
    space={"n_estimators": (10, 100)}
)

config = {"n_estimators": 50}  # e.g. a configuration sampled from the search space
configured_rf = rf.configure(config)

estimator = configured_rf.build_item()

╭─ Component(RandomForestClassifier) ──────╮
 item   class RandomForestClassifier(...) 
 config {'max_depth': 3}                  
 space  {'n_estimators': (10, 100)}       
╰──────────────────────────────────────────╯

RandomForestClassifier(max_depth=3, n_estimators=50)

Whenever some other node sees a function or constructor, e.g. RandomForestClassifier, it will automatically be converted into a Component.

from amltk.pipeline import Sequential
from sklearn.ensemble import RandomForestClassifier

pipeline = Sequential(RandomForestClassifier, name="my_pipeline")

╭─ Sequential(my_pipeline) ──────────────────╮
 ╭─ Component(RandomForestClassifier) ────╮ 
  item class RandomForestClassifier(...)  
 ╰────────────────────────────────────────╯ 
╰────────────────────────────────────────────╯

The default .name of a Component is the name of the class or function that it will use. You can explicitly set name= when constructing the component.

Like all Nodes, a Component accepts an explicit name=, item=, config=, space=, fidelities=, config_transform= and meta=.

Source code in src/amltk/pipeline/components.py
def __init__(
    self,
    item: Callable[..., Item],
    *,
    name: str | None = None,
    config: Config | None = None,
    space: Space | None = None,
    fidelities: Mapping[str, Any] | None = None,
    config_transform: Callable[[Config, Any], Config] | None = None,
    meta: Mapping[str, Any] | None = None,
):
    """See [`Node`][amltk.pipeline.node.Node] for details."""
    super().__init__(
        name=name if name is not None else entity_name(item),
        item=item,
        config=config,
        space=space,
        fidelities=fidelities,
        config_transform=config_transform,
        meta=meta,
    )

item: Callable[..., Item]
attr
#

A node which constructs an item in the pipeline.

nodes: tuple[]
attr
#

A component has no children.

def build_item(**kwargs) #

Build the item attached to this component.

PARAMETER DESCRIPTION
**kwargs

Any additional arguments to pass to the item

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
Item

The built item

Source code in src/amltk/pipeline/components.py
def build_item(self, **kwargs: Any) -> Item:
    """Build the item attached to this component.

    Args:
        **kwargs: Any additional arguments to pass to the item

    Returns:
        Item
            The built item
    """
    config = self.config or {}
    return self.item(**{**config, **kwargs})
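
The merge in the last line means that keyword arguments passed to build_item() take precedence over entries in the node's config. The sketch below reproduces only that merge with plain Python; `build_item` here is a free-function stand-in and `make_model` is a hypothetical constructor that simply returns the parameters it received.

```python
def build_item(item, config=None, **kwargs):
    """Stand-in for Component.build_item(): merge config and kwargs, then call item."""
    config = config or {}
    # Later keys win, so kwargs override entries with the same name in config.
    return item(**{**config, **kwargs})


def make_model(**params):
    """Hypothetical constructor that echoes back its parameters."""
    return params


built = build_item(
    make_model,
    {"max_depth": 3, "n_estimators": 50},
    n_estimators=200,
    random_state=0,
)
print(built)

# With no config and no kwargs, the item is called without arguments.
empty = build_item(make_model)
```

This makes build_item() a convenient place to inject values that are not part of the search space, such as a random seed, without touching the config.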

class Searchable(space=None, *, name=None, config=None, fidelities=None, config_transform=None, meta=None)
dataclass
#

Bases: Node[None, Space]

A Searchable node of the pipeline which just represents a search space, no item attached.

While not usually applicable to pipelines you want to build, this component is useful for creating a search space, especially if the real pipeline you want to optimize cannot be built directly. For example, if you are optimizing a script, you may wish to use a Searchable to represent the search space of that script.

from amltk.pipeline import Searchable

script_space = Searchable({"mode": ["orange", "blue", "red"], "n": (10, 100)})

╭─ Searchable(Searchable-CCaI9sO3) ─────────────────────────╮
 space {'mode': ['orange', 'blue', 'red'], 'n': (10, 100)} 
╰───────────────────────────────────────────────────────────╯

A Searchable explicitly does not allow for item= to be set, nor can it have any children. A Searchable accepts an explicit name=, config=, space=, fidelities=, config_transform= and meta=.
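
One way to use such a space when the optimization target is a script is to draw a configuration and hand it to the script as arguments. The sketch below is a stand-in written with the standard library only. `sample_space` is a hypothetical helper, and reading a list as a set of categorical choices and a tuple of two integers as an inclusive integer range is an assumption made for illustration. How amltk actually interprets a space depends on the space parser in use.

```python
import random


def sample_space(space, seed=None):
    """Draw one configuration from a dict-style space (illustrative only)."""
    rng = random.Random(seed)
    config = {}
    for name, domain in space.items():
        if isinstance(domain, list):
            # Assumption: a list is a set of categorical choices.
            config[name] = rng.choice(domain)
        elif isinstance(domain, tuple) and len(domain) == 2:
            # Assumption: a 2-tuple of ints is an inclusive integer range.
            low, high = domain
            config[name] = rng.randint(low, high)
        else:
            # Anything else is treated as a constant.
            config[name] = domain
    return config


space = {"mode": ["orange", "blue", "red"], "n": (10, 100)}
config = sample_space(space, seed=0)

# Turn the configuration into command-line arguments for a hypothetical script.
args = [f"--{name}={value}" for name, value in config.items()]
print(args)
```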

Source code in src/amltk/pipeline/components.py
def __init__(
    self,
    space: Space | None = None,
    *,
    name: str | None = None,
    config: Config | None = None,
    fidelities: Mapping[str, Any] | None = None,
    config_transform: Callable[[Config, Any], Config] | None = None,
    meta: Mapping[str, Any] | None = None,
):
    """See [`Node`][amltk.pipeline.node.Node] for details."""
    if name is None:
        name = f"Searchable-{randuid(8)}"

    super().__init__(
        name=name,
        config=config,
        space=space,
        fidelities=fidelities,
        config_transform=config_transform,
        meta=meta,
    )

item: None
classvar attr
#

A searchable has no item.

nodes: tuple[]
classvar attr
#

A searchable has no children.

class Fixed(item, *, name=None, config=None, space=None, fidelities=None, config_transform=None, meta=None)
dataclass
#

Bases: Node[Item, None]

A Fixed part of the pipeline that represents something that cannot be configured and is used directly as is.

It consists of an .item that is fixed, non-configurable and non-searchable. It also has no children.

This is useful for representing parts of the pipeline that are fixed. For example, if you have a pipeline that is a Sequential of nodes but want to fix the first component to be a PCA with n_components=3, you can use a Fixed to represent that.

from amltk.pipeline import Component, Fixed, Sequential
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA

rf = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
pca = Fixed(PCA(n_components=3))

pipeline = Sequential(pca, rf, name="my_pipeline")

╭─ Sequential(my_pipeline) ───────────────────╮
 ╭─ Fixed(PCA) ─────────────╮                
  item PCA(n_components=3)                 
 ╰──────────────────────────╯                
  
 ╭─ Component(RandomForestClassifier) ─────╮ 
  item  class RandomForestClassifier(...)  
  space {'n_estimators': (10, 100)}        
 ╰─────────────────────────────────────────╯ 
╰─────────────────────────────────────────────╯

Whenever some other node sees an instance of something, e.g. an already-constructed object rather than a class or function, it will automatically be converted into a Fixed.

from amltk.pipeline import Sequential
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA

pipeline = Sequential(
    PCA(n_components=3),
    RandomForestClassifier(n_estimators=50),
    name="my_pipeline",
)

╭─ Sequential(my_pipeline) ────────────────────────╮
 ╭─ Fixed(PCA) ─────────────╮                     
  item PCA(n_components=3)                      
 ╰──────────────────────────╯                     
  
 ╭─ Fixed(RandomForestClassifier) ──────────────╮ 
  item RandomForestClassifier(n_estimators=50)  
 ╰──────────────────────────────────────────────╯ 
╰──────────────────────────────────────────────────╯

The default .name of a Fixed is the class name of the item that it holds. You can explicitly set name= when constructing it.

A Fixed accepts only an explicit name=, item= and meta=.

Source code in src/amltk/pipeline/components.py
def __init__(
    self,
    item: Item,
    *,
    name: str | None = None,
    config: None = None,
    space: None = None,
    fidelities: None = None,
    config_transform: None = None,
    meta: Mapping[str, Any] | None = None,
):
    """See [`Node`][amltk.pipeline.node.Node] for details."""
    super().__init__(
        name=name if name is not None else entity_name(item),
        item=item,
        config=config,
        space=space,
        fidelities=fidelities,
        config_transform=config_transform,
        meta=meta,
    )

item: Item
classvar attr
#

The fixed item that this node represents.

space: None
classvar attr
#

A frozen node has no search space.

fidelities: None
classvar attr
#

A frozen node has no fidelities.

config: None
classvar attr
#

A frozen node has no config.

config_transform: None
classvar attr
#

A frozen node has no config so no transform.

nodes: tuple[]
classvar attr
#

A fixed node has no children.

def as_node(thing, name=None) #

Convert a node, pipeline, set or tuple into a component, copying anything in the process and removing all linking to other nodes.

PARAMETER DESCRIPTION
thing

The thing to convert

TYPE: Node | NodeLike[Item]

name

The name of the node. If it is already a node, it will be renamed to this name.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
Node | Choice | Join | Sequential | Fixed[Item]

The component

Source code in src/amltk/pipeline/components.py
def as_node(  # noqa: PLR0911
    thing: Node | NodeLike[Item],
    name: str | None = None,
) -> Node | Choice | Join | Sequential | Fixed[Item]:
    """Convert a node, pipeline, set or tuple into a component, copying anything
    in the process and removing all linking to other nodes.

    Args:
        thing: The thing to convert
        name: The name of the node. If it already a node, it will be renamed to that
            one.

    Returns:
        The component
    """
    match thing:
        case set():
            return Choice(*thing, name=name)
        case tuple():
            return Join(*thing, name=name)
        case list():
            return Sequential(*thing, name=name)
        case Node():
            name = thing.name if name is None else name
            return thing.mutate(name=name)
        case type():
            return Component(thing, name=name)
        case thing if (inspect.isfunction(thing) or inspect.ismethod(thing)):
            return Component(thing, name=name)
        case _:
            return Fixed(thing, name=name)
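
To make the dispatch order above easier to follow, the sketch below restates it with plain `isinstance` checks and returns the name of the node type instead of building one. It does not import amltk: `node_kind` and the `Scaler` class are hypothetical, and the branch for existing Node instances is left out because it needs the real class.

```python
import inspect


def node_kind(thing):
    """Name the node type that the dispatch in as_node() would pick."""
    if isinstance(thing, set):
        return "Choice"
    if isinstance(thing, tuple):
        return "Join"
    if isinstance(thing, list):
        return "Sequential"
    if isinstance(thing, type):
        # Classes are treated as constructors.
        return "Component"
    if inspect.isfunction(thing) or inspect.ismethod(thing):
        return "Component"
    # Everything else, including instances, is kept as-is.
    return "Fixed"


class Scaler:
    """Hypothetical item whose instances happen to be callable."""

    def __call__(self, x):
        return x


def make_scaler():
    return Scaler()


kinds = {
    "class": node_kind(Scaler),
    "function": node_kind(make_scaler),
    "instance": node_kind(Scaler()),
    "set": node_kind({Scaler, make_scaler}),
    "tuple": node_kind((Scaler, make_scaler)),
    "list": node_kind([Scaler, make_scaler]),
}
print(kinds)
```

Note that under these checks an instance is classified as Fixed even when it defines `__call__`, since only classes, functions and methods reach the Component branches.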