Optuna

Optuna parser for parsing out a search_space() from a pipeline.

Requirements

This requires Optuna, which can be installed with:

pip install amltk[optuna]

# Or directly
pip install optuna

Limitations

Optuna features a very dynamic search space (define-by-run), where users typically sample from a trial object and use ordinary Python control flow to define conditionality.

This means we cannot trivially represent this conditionality in a static search space. While band-aids are possible, they do not sit well with the static output of a parser.

As such, our parser does not support conditionals or choices! Users may still use define-by-run within the optimization function itself.

If you have experience with Optuna and have any suggestions, please feel free to open an issue or PR on GitHub!

Usage

The typical way to represent a search space for Optuna is a dictionary whose keys are the hyperparameter names and whose values are either integer/float tuples giving the bounds or a list of discrete values. When you need more control over a distribution, the value can instead be an Optuna BaseDistribution directly.

from amltk.pipeline import Component
from optuna.distributions import FloatDistribution

c = Component(
    object,
    space={
        "myint": (1, 10),
        "myfloat": (1.0, 10.0),
        "mycategorical": ["a", "b", "c"],
        "log-scale-custom": FloatDistribution(1e-10, 1e-2, log=True),
    },
    name="name",
)

space = c.search_space(parser="optuna")

{
    'name:myint': IntDistribution(high=10, log=False, low=1, step=1),
    'name:myfloat': FloatDistribution(high=10.0, log=False, low=1.0, step=None),
    'name:mycategorical': CategoricalDistribution(choices=('a', 'b', 'c')),
    'name:log-scale-custom': FloatDistribution(high=0.01, log=True, low=1e-10, step=None)
}

You may also pass the parser function directly as parser= if preferred.

from amltk.pipeline.parsers.optuna import parser as optuna_parser

space = c.search_space(parser=optuna_parser)

{
    'name:myint': IntDistribution(high=10, log=False, low=1, step=1),
    'name:myfloat': FloatDistribution(high=10.0, log=False, low=1.0, step=None),
    'name:mycategorical': CategoricalDistribution(choices=('a', 'b', 'c')),
    'name:log-scale-custom': FloatDistribution(high=0.01, log=True, low=1e-10, step=None)
}

When using search_space() on nested structures, you may want to flatten the names of the hyperparameters. For this, use flat=True.

from amltk.pipeline import Searchable, Sequential

seq = Sequential(
    Searchable({"myint": (1, 10)}, name="nested_1"),
    Searchable({"myfloat": (1.0, 10.0)}, name="nested_2"),
    name="seq"
)

hierarchical_space = seq.search_space(parser="optuna", flat=False)  # Default

flat_space = seq.search_space(parser="optuna", flat=True)

{
    'seq:nested_1:myint': IntDistribution(high=10, log=False, low=1, step=1),
    'seq:nested_2:myfloat': FloatDistribution(high=10.0, log=False, low=1.0, step=None)
}

{
    'nested_1:myint': IntDistribution(high=10, log=False, low=1, step=1),
    'nested_2:myfloat': FloatDistribution(high=10.0, log=False, low=1.0, step=None)
}

`def parser(node, *, flat=False, conditionals=False, delim=':')`

Parse a Node and its children into an Optuna search space (a dictionary of hyperparameter names to distributions).

PARAMETER	DESCRIPTION
`node`	The Node to parse TYPE: `Node`
`flat`	Whether to flatten the hyperparameter names. If `False`, names are hierarchical: the hyperparameters of each child are additionally prefixed with the names of its parent nodes. TYPE: `bool` DEFAULT: `False`
`conditionals`	Whether to include conditionals in the space from a `Choice`. If this is `False`, all forbidden clauses and other conditional clauses are also removed. This option exists primarily because some optimizers do not support these features. Not yet supported: conditionals cannot be encoded into a static Optuna search space, so passing `True` raises `NotImplementedError`. TYPE: `bool` DEFAULT: `False`
`delim`	The delimiter to use for the names of the hyperparameters. TYPE: `str` DEFAULT: `':'`

Source code in src/amltk/pipeline/parsers/optuna.py

def parser(
    node: Node,
    *,
    flat: bool = False,
    conditionals: bool = False,
    delim: str = ":",
) -> OptunaSearchSpace:
    """Parse a Node and its children into a ConfigurationSpace.

    Args:
        node: The Node to parse
        flat: Whether to have a hierarchical naming scheme for nodes and their children.
        conditionals: Whether to include conditionals in the space from a
            [`Choice`][amltk.pipeline.Choice]. If this is `False`, this will
            also remove all forbidden clauses and other conditional clauses.
            The primary use of this functionality is that some optimizers do not
            support these features.

            !!! TODO "Not yet supported"

                This functionality is not yet supported as we can't encode this into
                a static Optuna search space.

        delim: The delimiter to use for the names of the hyperparameters.
    """
    if conditionals:
        raise NotImplementedError("Conditionals are not yet supported with Optuna.")

    space = prefix_keys(_parse_space(node), prefix=f"{node.name}{delim}")

    for child in node.nodes:
        subspace = parser(child, flat=flat, conditionals=conditionals, delim=delim)
        if not flat:
            subspace = prefix_keys(subspace, prefix=f"{node.name}{delim}")

        for name, hp in subspace.items():
            if name in space:
                raise ValueError(
                    f"Duplicate name {name} already in space from space of {node.name}",
                    f"\nCurrently parsed space: {space}",
                )
            space[name] = hp

    return space
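The naming behaviour of the source above can be reproduced on plain dictionaries, which makes the effect of `flat` and `delim` easy to check without installing anything. In this sketch `toy_parser`, the local `prefix_keys`, and the dict-based node layout (`name`, `space`, `nodes` keys) are hypothetical stand-ins that mirror the recursion shown; they are not the amltk API.

```python
from typing import Any


def prefix_keys(d: dict[str, Any], prefix: str) -> dict[str, Any]:
    """Return a copy of `d` with `prefix` prepended to every key."""
    return {prefix + key: value for key, value in d.items()}


def toy_parser(
    node: dict[str, Any],
    *,
    flat: bool = False,
    delim: str = ":",
) -> dict[str, Any]:
    # A node's own hyperparameters are always prefixed with its own name.
    space = prefix_keys(node.get("space", {}), f"{node['name']}{delim}")

    for child in node.get("nodes", []):
        subspace = toy_parser(child, flat=flat, delim=delim)
        # Only the hierarchical scheme also adds the parent's name.
        if not flat:
            subspace = prefix_keys(subspace, f"{node['name']}{delim}")

        for name, hp in subspace.items():
            if name in space:
                raise ValueError(
                    f"Duplicate name {name!r} in space of {node['name']!r}"
                )
            space[name] = hp

    return space


seq = {
    "name": "seq",
    "nodes": [
        {"name": "nested_1", "space": {"myint": (1, 10)}},
        {"name": "nested_2", "space": {"myfloat": (1.0, 10.0)}},
    ],
}

hierarchical = toy_parser(seq)
flat = toy_parser(seq, flat=True)
custom = toy_parser(seq, delim="__")

print(sorted(hierarchical))  # ['seq:nested_1:myint', 'seq:nested_2:myfloat']
print(sorted(flat))          # ['nested_1:myint', 'nested_2:myfloat']
print(sorted(custom))        # ['seq__nested_1__myint', 'seq__nested_2__myfloat']
```

With `flat=True` only the immediate node name is kept, so two children that share both a name and a hyperparameter name produce the same key and trigger the duplicate-name error.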
