Configspace

amltk.pipeline.parsers.configspace

ConfigSpace is a library for representing and sampling configurations for hyperparameter optimization. It features a straightforward API for defining hyperparameters, their ranges and even conditional dependencies.

It is flexible enough for complex use cases, including the large-scale hyperparameter spaces of AutoSklearn and AutoPyTorch, where entire pipelines are optimized at once.

Requirements

This requires ConfigSpace, which can be installed with:

pip install "amltk[configspace]"

# Or directly
pip install ConfigSpace

In general, you should have the ConfigSpace documentation ready to consult for a full understanding of how to construct hyperparameter spaces with AMLTK.

Basic Usage

You can use the parser() function directly and pass it to the search_space() method of a Node. For simplicity, you can also just provide search_space(parser="configspace", ...).

from amltk.pipeline import Component, Choice, Sequential
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

my_pipeline = (
    Sequential(name="Pipeline")
    >> Component(PCA, space={"n_components": (1, 3)})
    >> Choice(
        Component(
            SVC,
            space={"C": (0.1, 10.0)}
        ),
        Component(
            RandomForestClassifier,
            space={"n_estimators": (10, 100), "criterion": ["gini", "log_loss"]},
        ),
        Component(
            MLPClassifier,
            space={
                "activation": ["identity", "logistic", "relu"],
                "alpha": (0.0001, 0.1),
                "learning_rate": ["constant", "invscaling", "adaptive"],
            },
        ),
        name="estimator"
    )
)

space = my_pipeline.search_space("configspace")
print(space)

Configuration space object:
  Hyperparameters:
    Pipeline:PCA:n_components, Type: UniformInteger, Range: [1, 3], Default: 2
    Pipeline:estimator:MLPClassifier:activation, Type: Categorical, Choices: {identity, logistic, relu}, Default: identity
    Pipeline:estimator:MLPClassifier:alpha, Type: UniformFloat, Range: [0.0001, 0.1], Default: 0.05005
    Pipeline:estimator:MLPClassifier:learning_rate, Type: Categorical, Choices: {constant, invscaling, adaptive}, Default: constant
    Pipeline:estimator:RandomForestClassifier:criterion, Type: Categorical, Choices: {gini, log_loss}, Default: gini
    Pipeline:estimator:RandomForestClassifier:n_estimators, Type: UniformInteger, Range: [10, 100], Default: 55
    Pipeline:estimator:SVC:C, Type: UniformFloat, Range: [0.1, 10.0], Default: 5.05
    Pipeline:estimator:__choice__, Type: Categorical, Choices: {MLPClassifier, RandomForestClassifier, SVC}, Default: MLPClassifier
  Conditions:
    Pipeline:estimator:MLPClassifier:activation | Pipeline:estimator:__choice__ == 'MLPClassifier'
    Pipeline:estimator:MLPClassifier:alpha | Pipeline:estimator:__choice__ == 'MLPClassifier'
    Pipeline:estimator:MLPClassifier:learning_rate | Pipeline:estimator:__choice__ == 'MLPClassifier'
    Pipeline:estimator:RandomForestClassifier:criterion | Pipeline:estimator:__choice__ == 'RandomForestClassifier'
    Pipeline:estimator:RandomForestClassifier:n_estimators | Pipeline:estimator:__choice__ == 'RandomForestClassifier'
    Pipeline:estimator:SVC:C | Pipeline:estimator:__choice__ == 'SVC'

Here we have an example of a few different kinds of hyperparameters:

PCA:n_components is an integer with a range of 1 to 3 and a uniform distribution, as specified by its integer bounds in a tuple.
SVC:C is a float with a range of 0.1 to 10.0 and a uniform distribution, as specified by its float bounds in a tuple.
RandomForestClassifier:criterion is a categorical hyperparameter, with two choices, "gini" and "log_loss".
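
The shorthand rules in the list above can be mimicked in a few lines of plain Python. This is only a sketch of the mapping as described here, not AMLTK's or ConfigSpace's implementation, and `infer_kind` is a hypothetical helper introduced for illustration.

```python
def infer_kind(domain):
    """Classify a shorthand domain following the rules listed above.

    Hypothetical helper for illustration; not part of AMLTK or ConfigSpace.
    """
    # A list of values is treated as a categorical hyperparameter.
    if isinstance(domain, list):
        return "categorical"
    # A 2-tuple is treated as (lower, upper) bounds.
    if isinstance(domain, tuple) and len(domain) == 2:
        low, high = domain
        if isinstance(low, int) and isinstance(high, int):
            return "integer"
        if isinstance(low, (int, float)) and isinstance(high, (int, float)):
            return "float"
    raise TypeError(f"Unrecognised shorthand: {domain!r}")


space = {
    "n_components": (1, 3),             # integer bounds
    "C": (0.1, 10.0),                   # float bounds
    "criterion": ["gini", "log_loss"],  # list of choices
}
kinds = {name: infer_kind(domain) for name, domain in space.items()}
print(kinds)
```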

There is also a Choice node, which is a special node that indicates that we could choose from one of these estimators. This leads to the conditionals that you can see in the printed out space.
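
To make the link between a Choice and the printed conditions concrete, the sketch below rebuilds the condition lines from a choice name and its children using plain strings. It only reproduces the naming pattern visible in the output above; `choice_conditions` is a hypothetical helper, not an AMLTK function.

```python
def choice_conditions(choice_name, children, delim=":"):
    """Build the `__choice__` name and one condition string per child hyperparameter.

    `children` maps each child node name to its hyperparameter names.
    Hypothetical helper that mirrors the printed format only.
    """
    choice_hp = f"{choice_name}{delim}__choice__"
    conditions = [
        f"{choice_name}{delim}{child}{delim}{hp} | {choice_hp} == '{child}'"
        for child, hps in children.items()
        for hp in hps
    ]
    return choice_hp, conditions


children = {
    "SVC": ["C"],
    "RandomForestClassifier": ["criterion", "n_estimators"],
}
choice_hp, conditions = choice_conditions("Pipeline:estimator", children)
for line in conditions:
    print(line)
```

Each child hyperparameter is active only when the `__choice__` hyperparameter takes the name of its parent component.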

If an Optimizer does not support conditionals, or you want to drop them for other reasons, you can remove them all by passing conditionals=False to the parser() function.

print(my_pipeline.search_space("configspace", conditionals=False))

Configuration space object:
  Hyperparameters:
    Pipeline:PCA:n_components, Type: UniformInteger, Range: [1, 3], Default: 2
    Pipeline:estimator:MLPClassifier:activation, Type: Categorical, Choices: {identity, logistic, relu}, Default: identity
    Pipeline:estimator:MLPClassifier:alpha, Type: UniformFloat, Range: [0.0001, 0.1], Default: 0.05005
    Pipeline:estimator:MLPClassifier:learning_rate, Type: Categorical, Choices: {constant, invscaling, adaptive}, Default: constant
    Pipeline:estimator:RandomForestClassifier:criterion, Type: Categorical, Choices: {gini, log_loss}, Default: gini
    Pipeline:estimator:RandomForestClassifier:n_estimators, Type: UniformInteger, Range: [10, 100], Default: 55
    Pipeline:estimator:SVC:C, Type: UniformFloat, Range: [0.1, 10.0], Default: 5.05
    Pipeline:estimator:__choice__, Type: Categorical, Choices: {MLPClassifier, RandomForestClassifier, SVC}, Default: MLPClassifier

Likewise, you can remove all hierarchy from the space, which may make downstream tasks easier, by passing flat=True to the parser() function.

print(my_pipeline.search_space("configspace", flat=True))

Configuration space object:
  Hyperparameters:
    MLPClassifier:activation, Type: Categorical, Choices: {identity, logistic, relu}, Default: identity
    MLPClassifier:alpha, Type: UniformFloat, Range: [0.0001, 0.1], Default: 0.05005
    MLPClassifier:learning_rate, Type: Categorical, Choices: {constant, invscaling, adaptive}, Default: constant
    PCA:n_components, Type: UniformInteger, Range: [1, 3], Default: 2
    RandomForestClassifier:criterion, Type: Categorical, Choices: {gini, log_loss}, Default: gini
    RandomForestClassifier:n_estimators, Type: UniformInteger, Range: [10, 100], Default: 55
    SVC:C, Type: UniformFloat, Range: [0.1, 10.0], Default: 5.05
    estimator:__choice__, Type: Categorical, Choices: {MLPClassifier, RandomForestClassifier, SVC}, Default: MLPClassifier
  Conditions:
    MLPClassifier:activation | estimator:__choice__ == 'MLPClassifier'
    MLPClassifier:alpha | estimator:__choice__ == 'MLPClassifier'
    MLPClassifier:learning_rate | estimator:__choice__ == 'MLPClassifier'
    RandomForestClassifier:criterion | estimator:__choice__ == 'RandomForestClassifier'
    RandomForestClassifier:n_estimators | estimator:__choice__ == 'RandomForestClassifier'
    SVC:C | estimator:__choice__ == 'SVC'
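
One downstream task that flat names may simplify is grouping a configuration by component. The sketch below assumes a configuration is available as a plain dict keyed by the flat names shown above. The dict is written by hand here rather than sampled, and `group_by_node` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_node(config, delim=":"):
    """Group `node<delim>param` keys into a dict of per-node parameters."""
    grouped = defaultdict(dict)
    for key, value in config.items():
        # With flat names there is exactly one prefix: the owning node.
        node, _, param = key.partition(delim)
        grouped[node][param] = value
    return dict(grouped)


# Hand-written stand-in for a sampled configuration (assumption).
flat_config = {
    "PCA:n_components": 2,
    "estimator:__choice__": "SVC",
    "SVC:C": 5.05,
}
grouped = group_by_node(flat_config)
print(grouped)
```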

More Specific Hyperparameters

You'll often want to be more specific with your hyperparameters. Here we show a few examples of how you'd couple your pipelines more tightly to ConfigSpace.

from ConfigSpace import Float, Categorical, Normal
from amltk.pipeline import Searchable

s = Searchable(
    space={
        "lr": Float("lr", bounds=(1e-5, 1.), log=True, default=0.3),
        "balance": Float("balance", bounds=(-1.0, 1.0), distribution=Normal(0.0, 0.5)),
        "color": Categorical("color", ["red", "green", "blue"], weights=[2, 1, 1], default="blue"),
    },
    name="Something-To-Search",
)
print(s.search_space("configspace"))

Configuration space object:
  Hyperparameters:
    Something-To-Search:balance, Type: NormalFloat, Mu: 0.0, Sigma: 0.5, Range: [-1.0, 1.0], Default: 0.0
    Something-To-Search:color, Type: Categorical, Choices: {red, green, blue}, Default: blue, Probabilities: [0.5  0.25 0.25]
    Something-To-Search:lr, Type: UniformFloat, Range: [1e-05, 1.0], Default: 0.3, on log-scale
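
Two details of the output above can be checked by hand: the weights [2, 1, 1] appear as probabilities [0.5, 0.25, 0.25], and `lr` is marked as being on a log scale. The sketch below illustrates both ideas with the standard library only. It is not how ConfigSpace implements them, and `normalise` and `sample_log_uniform` are hypothetical helpers.

```python
import math
import random


def normalise(weights):
    """Turn non-negative weights into probabilities that sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


probabilities = normalise([2, 1, 1])
print(probabilities)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
lr_samples = [sample_log_uniform(1e-5, 1.0, rng) for _ in range(5)]
print(lr_samples)
```

Sampling on a log scale spreads draws evenly across orders of magnitude, which is why it is a common choice for learning rates.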

Conditionals and Advanced Usage

We refer you to the ConfigSpace documentation for how to construct these. Once you've constructed a ConfigurationSpace and added any forbidden clauses and conditionals, you can simply set it as the .space attribute.

from amltk.pipeline import Component, Choice, Sequential
from ConfigSpace import ConfigurationSpace, EqualsCondition, InCondition

myspace = ConfigurationSpace({"A": ["red", "green", "blue"], "B": (1, 10), "C": (-100.0, 0.0)})
myspace.add_conditions([
    EqualsCondition(myspace["B"], myspace["A"], "red"),  # B is active when A is red
    InCondition(myspace["C"], myspace["A"], ["green", "blue"]), # C is active when A is green or blue
])

component = Component(object, space=myspace, name="MyThing")

parsed_space = component.search_space("configspace")
print(parsed_space)

Configuration space object:
  Hyperparameters:
    MyThing:A, Type: Categorical, Choices: {red, green, blue}, Default: red
    MyThing:B, Type: UniformInteger, Range: [1, 10], Default: 6
    MyThing:C, Type: UniformFloat, Range: [-100.0, 0.0], Default: -50.0
  Conditions:
    MyThing:B | MyThing:A == 'red'
    MyThing:C | MyThing:A in {'green', 'blue'}
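
To see what the two conditions above mean for a concrete value of A, the following sketch evaluates them over plain dicts. It handles only single-level conditions like the ones in this example and is not ConfigSpace's own logic. `CONDITIONS` and `active_names` are names made up for illustration.

```python
# Child hyperparameter -> (parent hyperparameter, parent values that activate it).
CONDITIONS = {
    "MyThing:B": ("MyThing:A", {"red"}),            # B is active when A is red
    "MyThing:C": ("MyThing:A", {"green", "blue"}),  # C is active when A is green or blue
}
ALL_NAMES = ["MyThing:A", "MyThing:B", "MyThing:C"]


def active_names(parent_values, all_names, conditions):
    """Return the hyperparameters that are active for the given parent values."""
    active = []
    for name in all_names:
        if name not in conditions:
            # Unconditioned hyperparameters are always active.
            active.append(name)
            continue
        parent, allowed = conditions[name]
        if parent_values.get(parent) in allowed:
            active.append(name)
    return active


for colour in ["red", "green", "blue"]:
    print(colour, active_names({"MyThing:A": colour}, ALL_NAMES, CONDITIONS))
```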

parser

parser(
    node: Node,
    *,
    seed: int | None = None,
    flat: bool = False,
    conditionals: bool = True,
    delim: str = ":"
) -> ConfigurationSpace

Parse a Node and its children into a ConfigurationSpace.

PARAMETER	DESCRIPTION
`node`	The Node to parse TYPE: `Node`
`seed`	The seed to use for the ConfigurationSpace TYPE: `int \| None` DEFAULT: `None`
`flat`	Whether to drop the hierarchical naming scheme for nodes and their children, so that names are not prefixed by their parents. TYPE: `bool` DEFAULT: `False`
`conditionals`	Whether to include conditionals in the space from a `Choice`. If this is `False`, this will also remove all forbidden clauses and other conditional clauses. The primary use of this functionality is that some optimizers do not support these features. TYPE: `bool` DEFAULT: `True`
`delim`	The delimiter to use for the names of the hyperparameters TYPE: `str` DEFAULT: `':'`

Source code in src/amltk/pipeline/parsers/configspace.py

def parser(
    node: Node,
    *,
    seed: int | None = None,
    flat: bool = False,
    conditionals: bool = True,
    delim: str = ":",
) -> ConfigurationSpace:
    """Parse a Node and its children into a ConfigurationSpace.

    Args:
        node: The Node to parse
        seed: The seed to use for the ConfigurationSpace
        flat: Whether to drop the hierarchical naming scheme for nodes and
            their children.
        conditionals: Whether to include conditionals in the space from a
            [`Choice`][amltk.pipeline.Choice]. If this is `False`, this will
            also remove all forbidden clauses and other conditional clauses.
            The primary use of this functionality is that some optimizers do not
            support these features.
        delim: The delimiter to use for the names of the hyperparameters
    """
    space = ConfigurationSpace(seed=seed)
    space.add_configuration_space(
        prefix=node.name,
        delimiter=delim,
        configuration_space=_parse_space(node, seed=seed, conditionals=conditionals),
    )

    children = node.nodes

    choice = None
    if isinstance(node, Choice) and any(children):
        choice = Categorical(
            name=f"{node.name}{delim}__choice__",
            items=[child.name for child in children],
        )
        space.add_hyperparameter(choice)

    for child in children:
        space.add_configuration_space(
            prefix=node.name if not flat else "",
            delimiter=delim if not flat else "",
            configuration_space=parser(
                child,
                seed=seed,
                flat=flat,
                conditionals=conditionals,
                delim=delim,
            ),
            parent_hyperparameter=(
                {"parent": choice, "value": child.name}
                if choice and conditionals
                else None
            ),
        )

    return space
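
The recursion above can be followed more easily on a toy model that tracks only hyperparameter names. The sketch below mirrors the prefixing logic (`prefix=node.name if not flat else ""`) with plain dicts and strings. `ToyNode` and `toy_parse` are hypothetical stand-ins, not AMLTK classes, and conditions and seeds are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Minimal stand-in for a pipeline node (illustration only)."""
    name: str
    space: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    is_choice: bool = False


def toy_parse(node, *, flat=False, delim=":"):
    """Collect prefixed hyperparameter names for a node and its children."""
    # The node's own hyperparameters are always prefixed with its name.
    names = {f"{node.name}{delim}{hp}": dom for hp, dom in node.space.items()}
    # A choice node gains a `__choice__` hyperparameter over its children.
    if node.is_choice and node.children:
        names[f"{node.name}{delim}__choice__"] = [c.name for c in node.children]
    for child in node.children:
        for name, dom in toy_parse(child, flat=flat, delim=delim).items():
            # With flat=True the parent prefix is dropped.
            key = name if flat else f"{node.name}{delim}{name}"
            names[key] = dom
    return names


pipeline = ToyNode("Pipeline", children=[
    ToyNode("PCA", space={"n_components": (1, 3)}),
    ToyNode("estimator", is_choice=True, children=[
        ToyNode("SVC", space={"C": (0.1, 10.0)}),
        ToyNode(
            "RandomForestClassifier",
            space={"n_estimators": (10, 100), "criterion": ["gini", "log_loss"]},
        ),
    ]),
])

nested = sorted(toy_parse(pipeline))
flat_names = sorted(toy_parse(pipeline, flat=True))
print(nested)
print(flat_names)
```

The resulting names follow the same pattern as the hierarchical and flat outputs printed in the Basic Usage section.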