Configspace
amltk.pipeline.parsers.configspace
#
ConfigSpace is a library for representing and sampling configurations for hyperparameter optimization. It features a straightforward API for defining hyperparameters, their ranges and even conditional dependencies.
It is flexible enough for more complex use cases, including the pipelines of AutoSklearn and AutoPyTorch, which define large-scale hyperparameter spaces for optimizing entire pipelines at once.
Requirements
This requires ConfigSpace, which can be installed with:
In general, you should have the ConfigSpace documentation ready to consult for a full understanding of how to construct hyperparameter spaces with AMLTK.
Basic Usage#
You can directly use the parser() function and pass it into the search_space() method of a Node. However, you can also simply provide search_space(parser="configspace", ...) for simplicity.
from amltk.pipeline import Component, Choice, Sequential
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

my_pipeline = (
    Sequential(name="Pipeline")
    >> Component(PCA, space={"n_components": (1, 3)})
    >> Choice(
        Component(
            SVC,
            space={"C": (0.1, 10.0)}
        ),
        Component(
            RandomForestClassifier,
            space={"n_estimators": (10, 100), "criterion": ["gini", "log_loss"]},
        ),
        Component(
            MLPClassifier,
            space={
                "activation": ["identity", "logistic", "relu"],
                "alpha": (0.0001, 0.1),
                "learning_rate": ["constant", "invscaling", "adaptive"],
            },
        ),
        name="estimator"
    )
)

space = my_pipeline.search_space("configspace")
print(space)
Configuration space object:
Hyperparameters:
Pipeline:PCA:n_components, Type: UniformInteger, Range: [1, 3], Default: 2
Pipeline:estimator:MLPClassifier:activation, Type: Categorical, Choices: {identity, logistic, relu}, Default: identity
Pipeline:estimator:MLPClassifier:alpha, Type: UniformFloat, Range: [0.0001, 0.1], Default: 0.05005
Pipeline:estimator:MLPClassifier:learning_rate, Type: Categorical, Choices: {constant, invscaling, adaptive}, Default: constant
Pipeline:estimator:RandomForestClassifier:criterion, Type: Categorical, Choices: {gini, log_loss}, Default: gini
Pipeline:estimator:RandomForestClassifier:n_estimators, Type: UniformInteger, Range: [10, 100], Default: 55
Pipeline:estimator:SVC:C, Type: UniformFloat, Range: [0.1, 10.0], Default: 5.05
Pipeline:estimator:__choice__, Type: Categorical, Choices: {MLPClassifier, RandomForestClassifier, SVC}, Default: MLPClassifier
Conditions:
Pipeline:estimator:MLPClassifier:activation | Pipeline:estimator:__choice__ == 'MLPClassifier'
Pipeline:estimator:MLPClassifier:alpha | Pipeline:estimator:__choice__ == 'MLPClassifier'
Pipeline:estimator:MLPClassifier:learning_rate | Pipeline:estimator:__choice__ == 'MLPClassifier'
Pipeline:estimator:RandomForestClassifier:criterion | Pipeline:estimator:__choice__ == 'RandomForestClassifier'
Pipeline:estimator:RandomForestClassifier:n_estimators | Pipeline:estimator:__choice__ == 'RandomForestClassifier'
Pipeline:estimator:SVC:C | Pipeline:estimator:__choice__ == 'SVC'
Here we have an example of a few different kinds of hyperparameters:
- PCA:n_components is an integer with a range of 1 to 3 and a uniform distribution, as specified by its integer bounds in a tuple.
- SVC:C is a float with a range of 0.1 to 10.0 and a uniform distribution, as specified by its float bounds in a tuple.
- RandomForestClassifier:criterion is a categorical hyperparameter with two choices, "gini" and "log_loss".
There is also a Choice node, which is a special node that indicates that we could choose from one of these estimators. This leads to the conditionals that you can see in the printed out space.
You may wish to remove all conditionals if an Optimizer does not support them, or you may wish to remove them for other reasons. You can do this by passing conditionals=False to the parser() function.
Configuration space object:
Hyperparameters:
Pipeline:PCA:n_components, Type: UniformInteger, Range: [1, 3], Default: 2
Pipeline:estimator:MLPClassifier:activation, Type: Categorical, Choices: {identity, logistic, relu}, Default: identity
Pipeline:estimator:MLPClassifier:alpha, Type: UniformFloat, Range: [0.0001, 0.1], Default: 0.05005
Pipeline:estimator:MLPClassifier:learning_rate, Type: Categorical, Choices: {constant, invscaling, adaptive}, Default: constant
Pipeline:estimator:RandomForestClassifier:criterion, Type: Categorical, Choices: {gini, log_loss}, Default: gini
Pipeline:estimator:RandomForestClassifier:n_estimators, Type: UniformInteger, Range: [10, 100], Default: 55
Pipeline:estimator:SVC:C, Type: UniformFloat, Range: [0.1, 10.0], Default: 5.05
Pipeline:estimator:__choice__, Type: Categorical, Choices: {MLPClassifier, RandomForestClassifier, SVC}, Default: MLPClassifier
Likewise, you can also remove all hierarchy from the space, which may make downstream tasks easier, by passing flat=True to the parser() function.
Configuration space object:
Hyperparameters:
MLPClassifier:activation, Type: Categorical, Choices: {identity, logistic, relu}, Default: identity
MLPClassifier:alpha, Type: UniformFloat, Range: [0.0001, 0.1], Default: 0.05005
MLPClassifier:learning_rate, Type: Categorical, Choices: {constant, invscaling, adaptive}, Default: constant
PCA:n_components, Type: UniformInteger, Range: [1, 3], Default: 2
RandomForestClassifier:criterion, Type: Categorical, Choices: {gini, log_loss}, Default: gini
RandomForestClassifier:n_estimators, Type: UniformInteger, Range: [10, 100], Default: 55
SVC:C, Type: UniformFloat, Range: [0.1, 10.0], Default: 5.05
estimator:__choice__, Type: Categorical, Choices: {MLPClassifier, RandomForestClassifier, SVC}, Default: MLPClassifier
Conditions:
MLPClassifier:activation | estimator:__choice__ == 'MLPClassifier'
MLPClassifier:alpha | estimator:__choice__ == 'MLPClassifier'
MLPClassifier:learning_rate | estimator:__choice__ == 'MLPClassifier'
RandomForestClassifier:criterion | estimator:__choice__ == 'RandomForestClassifier'
RandomForestClassifier:n_estimators | estimator:__choice__ == 'RandomForestClassifier'
SVC:C | estimator:__choice__ == 'SVC'
More Specific Hyperparameters#
You'll often want to be a bit more specific with your hyperparameters. Here we show a few examples of how you can couple your pipelines more closely to ConfigSpace.
from ConfigSpace import Float, Categorical, Normal
from amltk.pipeline import Searchable

s = Searchable(
    space={
        "lr": Float("lr", bounds=(1e-5, 1.), log=True, default=0.3),
        "balance": Float("balance", bounds=(-1.0, 1.0), distribution=Normal(0.0, 0.5)),
        "color": Categorical("color", ["red", "green", "blue"], weights=[2, 1, 1], default="blue"),
    },
    name="Something-To-Search",
)
print(s.search_space("configspace"))
Configuration space object:
Hyperparameters:
Something-To-Search:balance, Type: NormalFloat, Mu: 0.0, Sigma: 0.5, Range: [-1.0, 1.0], Default: 0.0
Something-To-Search:color, Type: Categorical, Choices: {red, green, blue}, Default: blue, Probabilities: [0.5 0.25 0.25]
Something-To-Search:lr, Type: UniformFloat, Range: [1e-05, 1.0], Default: 0.3, on log-scale
Conditionals and Advanced Usage#
We will refer you to the ConfigSpace documentation for the construction of these. However, once you've constructed a ConfigurationSpace and added any forbiddens and conditionals, you may simply set that as the .space attribute.
from amltk.pipeline import Component, Choice, Sequential
from ConfigSpace import ConfigurationSpace, EqualsCondition, InCondition

myspace = ConfigurationSpace({"A": ["red", "green", "blue"], "B": (1, 10), "C": (-100.0, 0.0)})
myspace.add_conditions([
    EqualsCondition(myspace["B"], myspace["A"], "red"),  # B is active when A is red
    InCondition(myspace["C"], myspace["A"], ["green", "blue"]),  # C is active when A is green or blue
])

component = Component(object, space=myspace, name="MyThing")
parsed_space = component.search_space("configspace")
print(parsed_space)
Configuration space object:
Hyperparameters:
MyThing:A, Type: Categorical, Choices: {red, green, blue}, Default: red
MyThing:B, Type: UniformInteger, Range: [1, 10], Default: 6
MyThing:C, Type: UniformFloat, Range: [-100.0, 0.0], Default: -50.0
Conditions:
MyThing:B | MyThing:A == 'red'
MyThing:C | MyThing:A in {'green', 'blue'}
parser
#
parser(
    node: Node,
    *,
    seed: int | None = None,
    flat: bool = False,
    conditionals: bool = True,
    delim: str = ":"
) -> ConfigurationSpace
Parse a Node and its children into a ConfigurationSpace.
PARAMETER | DESCRIPTION |
---|---|
node | The Node to parse. TYPE: Node |
seed | The seed to use for the ConfigurationSpace. TYPE: int \| None, DEFAULT: None |
flat | Whether to flatten the naming scheme, dropping the hierarchical prefixes of nodes and their children. TYPE: bool, DEFAULT: False |
conditionals | Whether to include conditionals in the space, such as those arising from a Choice. TYPE: bool, DEFAULT: True |
delim | The delimiter to use for the names of the hyperparameters. TYPE: str, DEFAULT: ":" |