Optimizers#
An Optimizer's goal is to achieve the optimal value for a given Metric or Metrics using repeated Trials.
What differentiates AMLTK from other optimization libraries is that we rely solely on optimizers that support an "Ask-and-Tell" interface. This means we can "Ask" an optimizer for its next suggested Trial, and we can "Tell" it a Report when we have one. In fact, here's the required interface.
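A minimal sketch of that interface, assuming only the two methods described on this page (the actual Optimizer base class in amltk.optimization defines the full contract):

from amltk.optimization import Trial

class MyOptimizer:
    """Sketch: the two methods every AMLTK optimizer must provide."""

    def ask(self) -> Trial:
        """Return the next suggested Trial to evaluate."""
        ...

    def tell(self, report: Trial.Report) -> None:
        """Update the optimizer's internal state from a finished Trial's Report."""
        ...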
We do require optimizers to implement these ask() and tell() methods, correctly filling in a Trial and parsing results out of the Report, as this will be different for every optimizer.
Why only Ask and Tell Optimizers?
- Easy Parallelization: Many optimizers handle running the function to optimize and hence roll out their own parallelization schemes and store data in various different ways. By taking this responsibility away from the optimizer and giving it to the user, we can easily parallelize however we wish.
- API maintenance: Many optimizers are research code and hence a bit unstable with respect to their API, so wrapping around them can be difficult. By requiring this "Ask-and-Tell" interface, we reduce the complexity of what is required of both the "Optimizer" and wrapping it.
- Full Integration: We can fully hook into the life cycle of a running optimizer. We are not relying on the optimizer to support callbacks at every step of its hot-loop and, as such, we can fully leverage all the other systems of AutoML-toolkit.
- Easy Integration: It makes developing and integrating new optimizers easy. You only have to worry that the internal state of the optimizer is updated according to these two "Ask" and "Tell" events, and that's it.
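Concretely, and stripping away any parallelism, the Ask-and-Tell loop boils down to the following sketch, using the optimizer, pipeline and target_function names from the examples below:

# Sequential sketch of the Ask-and-Tell loop; the examples below run this same
# loop through a Scheduler so several Trials can be evaluated in parallel.
for _ in range(10):
    trial = optimizer.ask()                    # "Ask" for the next suggested Trial
    report = target_function(trial, pipeline)  # evaluate it however you like
    optimizer.tell(report)                     # "Tell" the optimizer the Report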
For a reference on implementing an optimizer you can refer to any of the following:
SMAC#
The SMACOptimizer is a wrapper around the smac optimizer.
Requirements
This requires smac, which can be installed with:
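pip install smac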
This uses ConfigSpace as its search_space() to optimize.
Users should report results using trial.success().
Visit their documentation for what you can pass to SMACOptimizer.create().
The below example shows how you can use SMAC to optimize an sklearn pipeline.
from __future__ import annotations

import logging

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from amltk.optimization.optimizers.smac import SMACOptimizer
from amltk.scheduling import Scheduler
from amltk.optimization import History, Trial, Metric
from amltk.pipeline import Component, Node

logging.basicConfig(level=logging.INFO)


def target_function(trial: Trial, pipeline: Node) -> Trial.Report:
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    clf = pipeline.configure(trial.config).build("sklearn")

    with trial.begin():
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        return trial.success(accuracy=accuracy)

    # Only reached if an exception was caught inside `trial.begin()`
    return trial.fail()


pipeline = Component(
    RandomForestClassifier,
    space={"n_estimators": (10, 100), "max_samples": (0.1, 0.9)},
)
metric = Metric("accuracy", minimize=False, bounds=(0, 1))
optimizer = SMACOptimizer.create(space=pipeline, metrics=metric, bucket="smac-doc-example")

N_WORKERS = 2
scheduler = Scheduler.with_processes(N_WORKERS)
task = scheduler.task(target_function)
history = History()


@scheduler.on_start(repeat=N_WORKERS)
def on_start():
    # Ask for one Trial per worker when the Scheduler starts
    trial = optimizer.ask()
    task.submit(trial, pipeline)


@task.on_result
def tell_and_launch_trial(_, report: Trial.Report):
    # Tell the optimizer the Report, then ask for and submit the next Trial
    if scheduler.running():
        optimizer.tell(report)
        trial = optimizer.ask()
        task.submit(trial, pipeline)


@task.on_result
def add_to_history(_, report: Trial.Report):
    history.add(report)


scheduler.run(timeout=3, wait=False)

print(history.df())
status ... time:unit
name ...
config_id=2_seed=1740526444_budget=None_instanc... success ... seconds
config_id=3_seed=1740526444_budget=None_instanc... success ... seconds
config_id=1_seed=1740526444_budget=None_instanc... success ... seconds
config_id=4_seed=1740526444_budget=None_instanc... success ... seconds
config_id=5_seed=1740526444_budget=None_instanc... success ... seconds
config_id=6_seed=1740526444_budget=None_instanc... success ... seconds
config_id=8_seed=1740526444_budget=None_instanc... success ... seconds
config_id=7_seed=1740526444_budget=None_instanc... success ... seconds
config_id=9_seed=1740526444_budget=None_instanc... success ... seconds
config_id=10_seed=1740526444_budget=None_instan... success ... seconds
config_id=12_seed=1740526444_budget=None_instan... success ... seconds
config_id=11_seed=1740526444_budget=None_instan... success ... seconds
config_id=14_seed=1740526444_budget=None_instan... success ... seconds
config_id=13_seed=1740526444_budget=None_instan... success ... seconds
config_id=16_seed=1740526444_budget=None_instan... success ... seconds
config_id=15_seed=1740526444_budget=None_instan... success ... seconds
config_id=18_seed=1740526444_budget=None_instan... success ... seconds
config_id=17_seed=1740526444_budget=None_instan... success ... seconds
config_id=20_seed=1740526444_budget=None_instan... success ... seconds
config_id=19_seed=1740526444_budget=None_instan... success ... seconds
config_id=21_seed=1740526444_budget=None_instan... success ... seconds
config_id=22_seed=1740526444_budget=None_instan... success ... seconds
config_id=23_seed=1740526444_budget=None_instan... success ... seconds
config_id=24_seed=1740526444_budget=None_instan... success ... seconds
config_id=25_seed=1740526444_budget=None_instan... success ... seconds
config_id=26_seed=1740526444_budget=None_instan... success ... seconds
config_id=27_seed=1740526444_budget=None_instan... success ... seconds
config_id=28_seed=1740526444_budget=None_instan... success ... seconds
config_id=29_seed=1740526444_budget=None_instan... success ... seconds
config_id=30_seed=1740526444_budget=None_instan... success ... seconds
config_id=31_seed=1740526444_budget=None_instan... success ... seconds
config_id=32_seed=1740526444_budget=None_instan... success ... seconds
config_id=33_seed=1740526444_budget=None_instan... success ... seconds
config_id=34_seed=1740526444_budget=None_instan... success ... seconds
config_id=35_seed=1740526444_budget=None_instan... success ... seconds
config_id=36_seed=1740526444_budget=None_instan... success ... seconds
config_id=37_seed=1740526444_budget=None_instan... success ... seconds
config_id=38_seed=1740526444_budget=None_instan... success ... seconds
config_id=39_seed=1740526444_budget=None_instan... success ... seconds
config_id=40_seed=1740526444_budget=None_instan... success ... seconds
config_id=41_seed=1740526444_budget=None_instan... success ... seconds
config_id=43_seed=1740526444_budget=None_instan... success ... seconds
config_id=42_seed=1740526444_budget=None_instan... success ... seconds
config_id=44_seed=1740526444_budget=None_instan... success ... seconds
config_id=45_seed=1740526444_budget=None_instan... success ... seconds
config_id=46_seed=1740526444_budget=None_instan... success ... seconds
config_id=47_seed=1740526444_budget=None_instan... success ... seconds
config_id=48_seed=1740526444_budget=None_instan... success ... seconds
config_id=49_seed=1740526444_budget=None_instan... success ... seconds
config_id=50_seed=1740526444_budget=None_instan... success ... seconds
config_id=51_seed=1740526444_budget=None_instan... success ... seconds
config_id=52_seed=1740526444_budget=None_instan... success ... seconds
[52 rows x 20 columns]
NePs#
The NEPSOptimizer is a wrapper around the NePs optimizer.
Requirements
This requires neps, which can be installed with:
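pip install neural-pipeline-search  # assumed PyPI name of the neps package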
NePs is still in development
NePs is still in development and is not yet stable. There are likely going to be issues. Please report any issues to NePs or in AMLTK.
This uses ConfigSpace as its search_space() to optimize.
Users should report results using trial.success(loss=...), where loss= is a scalar value to minimize. Optionally, you can also return a cost=, which is used by more budget-aware algorithms. Again, please see NePs' documentation for more.
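For example, a rough sketch of reporting both, where duration is a hypothetical timing you measure yourself around fitting (e.g. with time.perf_counter()):

# Sketch only: `duration` is an assumed variable measured by the user.
loss = 1 - accuracy
return trial.success(loss=loss, accuracy=accuracy, cost=duration)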
Conditionals in ConfigSpace
NePs does not support conditionals in its search space. This is accounted for when using the preferred_parser() during search space creation: it will simply remove all conditionals from the search space, which may not be ideal for the given problem at hand.
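For example, a conditional hyperparameter like the following sketch (hypothetical names, plain ConfigSpace API) would simply be dropped when the space is parsed for NePs, so C would always be sampled:

from ConfigSpace import ConfigurationSpace, EqualsCondition

# Hypothetical space: C should only be active when model == "svm".
cs = ConfigurationSpace({"model": ["svm", "rf"], "C": (0.1, 10.0)})
cs.add_condition(EqualsCondition(cs["C"], cs["model"], "svm"))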
Visit their documentation for what you can pass to NEPSOptimizer.create().
The below example shows how you can use neps to optimize an sklearn pipeline.
from __future__ import annotations

import logging

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from amltk.optimization.optimizers.neps import NEPSOptimizer
from amltk.scheduling import Scheduler
from amltk.optimization import History, Trial, Metric
from amltk.pipeline import Component, Node

logging.basicConfig(level=logging.INFO)


def target_function(trial: Trial, pipeline: Node) -> Trial.Report:
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    clf = pipeline.configure(trial.config).build("sklearn")

    with trial.begin():
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        loss = 1 - accuracy  # NePs minimizes, so report a loss as well
        return trial.success(loss=loss, accuracy=accuracy)

    # Only reached if an exception was caught inside `trial.begin()`
    return trial.fail()


pipeline = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
metric = Metric("accuracy", minimize=False, bounds=(0, 1))
optimizer = NEPSOptimizer.create(space=pipeline, metrics=metric, bucket="neps-doc-example")

N_WORKERS = 2
scheduler = Scheduler.with_processes(N_WORKERS)
task = scheduler.task(target_function)
history = History()


@scheduler.on_start(repeat=N_WORKERS)
def on_start():
    trial = optimizer.ask()
    task.submit(trial, pipeline)


@task.on_result
def tell_and_launch_trial(_, report: Trial.Report):
    if scheduler.running():
        optimizer.tell(report)
        trial = optimizer.ask()
        task.submit(trial, pipeline)


@task.on_result
def add_to_history(_, report: Trial.Report):
    history.add(report)


scheduler.run(timeout=3, wait=False)

print(history.df())
Deep Learning
TODO: Write an example demonstrating NePs with continuations.
Graph Search Spaces
TODO: Write an example demonstrating NePs with its graph search spaces.
Optuna#
Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning.
Requirements
This requires Optuna, which can be installed with:
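pip install optuna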
We provide a thin wrapper called OptunaOptimizer with which you can integrate Optuna into your workflow.
This uses an Optuna-like search_space() for its optimization.
Users should report results using trial.success() with either cost= or values=, depending on the optimization directions given to the underlying optimizer created. Please see their documentation for more.
Visit their documentation for what you can pass to OptunaOptimizer.create(), which is forwarded to optuna.create_study().
from __future__ import annotations

import logging

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from amltk.optimization.optimizers.optuna import OptunaOptimizer
from amltk.scheduling import Scheduler
from amltk.optimization import History, Trial, Metric
from amltk.pipeline import Component, Node

logging.basicConfig(level=logging.INFO)


def target_function(trial: Trial, pipeline: Node) -> Trial.Report:
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    clf = pipeline.configure(trial.config).build("sklearn")

    with trial.begin():
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        return trial.success(accuracy=accuracy_score(y_test, y_pred))

    # Only reached if an exception was caught inside `trial.begin()`
    return trial.fail()


pipeline = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
accuracy_metric = Metric("accuracy", minimize=False, bounds=(0, 1))
optimizer = OptunaOptimizer.create(space=pipeline, metrics=accuracy_metric, bucket="optuna-doc-example")

N_WORKERS = 2
scheduler = Scheduler.with_processes(N_WORKERS)
task = scheduler.task(target_function)
history = History()


@scheduler.on_start(repeat=N_WORKERS)
def on_start():
    trial = optimizer.ask()
    task.submit(trial, pipeline)


@task.on_result
def tell_and_launch_trial(_, report: Trial.Report):
    if scheduler.running():
        optimizer.tell(report)
        trial = optimizer.ask()
        task.submit(trial, pipeline)


@task.on_result
def add_to_history(_, report: Trial.Report):
    history.add(report)


scheduler.run(timeout=3, wait=False)

print(history.df())
status trial_seed ... time:kind time:unit
name ...
trial_number=1 success 220526275 ... wall seconds
trial_number=0 success 220526275 ... wall seconds
trial_number=3 success 220526275 ... wall seconds
trial_number=2 success 220526275 ... wall seconds
trial_number=4 success 220526275 ... wall seconds
... ... ... ... ... ...
trial_number=75 success 220526275 ... wall seconds
trial_number=76 success 220526275 ... wall seconds
trial_number=77 success 220526275 ... wall seconds
trial_number=79 success 220526275 ... wall seconds
trial_number=78 success 220526275 ... wall seconds
[80 rows x 19 columns]
TODO: Some more documentation. Sorry!
Integrating your own#
The base Optimizer class defines the API we require optimizers to implement.
- ask() - Ask the optimizer for a new Trial to evaluate.
- tell() - Tell the optimizer the result of the sampled config. This comes in the form of a Trial.Report.
Additionally, to aid users in switching between optimizers, the preferred_parser() method should return either a parser function or a string that can be used with node.search_space(parser=...) to extract the search space for the optimizer.
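To make the required behaviour concrete, here is a self-contained toy of the Ask-and-Tell pattern on plain dictionaries and floats. It is not the AMLTK Optimizer base class; a real integration would subclass it and work with Trial and Trial.Report objects as described above.

import random

class RandomSearch:
    """Toy Ask-and-Tell optimizer: uniformly samples from box bounds."""

    def __init__(self, space: dict[str, tuple[float, float]], seed: int = 0):
        self.space = space
        self.rng = random.Random(seed)
        self.history: list[tuple[dict[str, float], float]] = []

    def ask(self) -> dict[str, float]:
        """Sample the next configuration to evaluate."""
        return {k: self.rng.uniform(lo, hi) for k, (lo, hi) in self.space.items()}

    def tell(self, config: dict[str, float], result: float) -> None:
        """Record the result; random search keeps no model, only a history."""
        self.history.append((config, result))

opt = RandomSearch({"x": (-5.0, 5.0)})
for _ in range(20):
    config = opt.ask()
    opt.tell(config, result=(config["x"] - 2.0) ** 2)  # minimize (x - 2)^2

print(min(opt.history, key=lambda t: t[1]))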