Optimizers#
An Optimizer's goal is to achieve the optimal value for a given Metric or Metrics using repeated Trials.
What differentiates AMLTK from other optimization libraries is that we rely solely on optimizers that support an "Ask-and-Tell" interface. This means we can "Ask" an optimizer for its next suggested Trial, and we can "Tell" it a Report when we have one. In fact, here's the required interface.
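A minimal sketch of that interface, assuming only the two methods described on this page (the actual Optimizer base class in amltk.optimization defines the full contract):

from amltk.optimization import Trial

class MyOptimizer:
    """Sketch: the two methods every AMLTK optimizer must provide."""

    def ask(self) -> Trial:
        """Return the next suggested Trial to evaluate."""
        ...

    def tell(self, report: Trial.Report) -> None:
        """Update the optimizer's internal state from a finished Trial's Report."""
        ...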
We do require optimizers to implement these ask() and tell() methods, correctly filling in a Trial and parsing results out of the Report, as this will be different for every optimizer.
Why only Ask and Tell Optimizers?
- Easy Parallelization: Many optimizers handle running the function to optimize and hence roll out their own parallelization schemes and store data in various different ways. By taking this responsibility away from the optimizer and giving it to the user, we can easily parallelize however we wish.
- API maintenance: Many optimizers are research code and hence a bit unstable with respect to their API, so wrapping around them can be difficult. By requiring this "Ask-and-Tell" interface, we reduce the complexity of what is required of both the "Optimizer" and wrapping it.
- Full Integration: We can fully hook into the life cycle of a running optimizer. We are not relying on the optimizer to support callbacks at every step of its hot-loop and, as such, we can fully leverage all the other systems of AutoML-toolkit.
- Easy Integration: It makes developing and integrating new optimizers easy. You only have to worry that the internal state of the optimizer is updated according to these two "Ask" and "Tell" events, and that's it.
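Concretely, and stripping away any parallelism, the Ask-and-Tell loop boils down to the following sketch, using the optimizer, pipeline and target_function names from the examples below:

# Sequential sketch of the Ask-and-Tell loop; the examples below run this same
# loop through a Scheduler so several Trials can be evaluated in parallel.
for _ in range(10):
    trial = optimizer.ask()                    # "Ask" for the next suggested Trial
    report = target_function(trial, pipeline)  # evaluate it however you like
    optimizer.tell(report)                     # "Tell" the optimizer the Report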
For a reference on implementing an optimizer you can refer to any of the following:
SMAC#
The SMACOptimizer is a wrapper around the smac optimizer.
Requirements
This requires smac, which can be installed with:
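pip install smac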
This uses ConfigSpace as its search_space() to optimize.
Users should report results using trial.success().
Visit their documentation for what you can pass to SMACOptimizer.create().
The below example shows how you can use SMAC to optimize an sklearn pipeline.
from __future__ import annotations

import logging

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from amltk.optimization.optimizers.smac import SMACOptimizer
from amltk.scheduling import Scheduler
from amltk.optimization import History, Trial, Metric
from amltk.pipeline import Component, Node

logging.basicConfig(level=logging.INFO)


def target_function(trial: Trial, pipeline: Node) -> Trial.Report:
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    clf = pipeline.configure(trial.config).build("sklearn")

    with trial.begin():
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        return trial.success(accuracy=accuracy)

    # Only reached if an exception was caught inside `trial.begin()`
    return trial.fail()


pipeline = Component(
    RandomForestClassifier,
    space={"n_estimators": (10, 100), "max_samples": (0.1, 0.9)},
)
metric = Metric("accuracy", minimize=False, bounds=(0, 1))
optimizer = SMACOptimizer.create(space=pipeline, metrics=metric, bucket="smac-doc-example")

N_WORKERS = 2
scheduler = Scheduler.with_processes(N_WORKERS)
task = scheduler.task(target_function)
history = History()


@scheduler.on_start(repeat=N_WORKERS)
def on_start():
    # Ask for one Trial per worker when the Scheduler starts
    trial = optimizer.ask()
    task.submit(trial, pipeline)


@task.on_result
def tell_and_launch_trial(_, report: Trial.Report):
    # Tell the optimizer the Report, then ask for and submit the next Trial
    if scheduler.running():
        optimizer.tell(report)
        trial = optimizer.ask()
        task.submit(trial, pipeline)


@task.on_result
def add_to_history(_, report: Trial.Report):
    history.add(report)


scheduler.run(timeout=3, wait=False)

print(history.df())
status ... time:unit
name ...
config_id=2_seed=1740526444_budget=None_instanc... success ... seconds
config_id=3_seed=1740526444_budget=None_instanc... success ... seconds
config_id=1_seed=1740526444_budget=None_instanc... success ... seconds
config_id=4_seed=1740526444_budget=None_instanc... success ... seconds
config_id=5_seed=1740526444_budget=None_instanc... success ... seconds
config_id=6_seed=1740526444_budget=None_instanc... success ... seconds
config_id=8_seed=1740526444_budget=None_instanc... success ... seconds
config_id=7_seed=1740526444_budget=None_instanc... success ... seconds
config_id=9_seed=1740526444_budget=None_instanc... success ... seconds
config_id=10_seed=1740526444_budget=None_instan... success ... seconds
config_id=12_seed=1740526444_budget=None_instan... success ... seconds
config_id=11_seed=1740526444_budget=None_instan... success ... seconds
config_id=14_seed=1740526444_budget=None_instan... success ... seconds
config_id=13_seed=1740526444_budget=None_instan... success ... seconds
config_id=16_seed=1740526444_budget=None_instan... success ... seconds
config_id=15_seed=1740526444_budget=None_instan... success ... seconds
config_id=18_seed=1740526444_budget=None_instan... success ... seconds
config_id=17_seed=1740526444_budget=None_instan... success ... seconds
config_id=20_seed=1740526444_budget=None_instan... success ... seconds
config_id=19_seed=1740526444_budget=None_instan... success ... seconds
config_id=21_seed=1740526444_budget=None_instan... success ... seconds
config_id=22_seed=1740526444_budget=None_instan... success ... seconds
config_id=23_seed=1740526444_budget=None_instan... success ... seconds
config_id=24_seed=1740526444_budget=None_instan... success ... seconds
config_id=25_seed=1740526444_budget=None_instan... success ... seconds
config_id=26_seed=1740526444_budget=None_instan... success ... seconds
config_id=27_seed=1740526444_budget=None_instan... success ... seconds
config_id=28_seed=1740526444_budget=None_instan... success ... seconds
config_id=29_seed=1740526444_budget=None_instan... success ... seconds
config_id=30_seed=1740526444_budget=None_instan... success ... seconds
config_id=31_seed=1740526444_budget=None_instan... success ... seconds
config_id=32_seed=1740526444_budget=None_instan... success ... seconds
config_id=33_seed=1740526444_budget=None_instan... success ... seconds
config_id=34_seed=1740526444_budget=None_instan... success ... seconds
config_id=35_seed=1740526444_budget=None_instan... success ... seconds
config_id=36_seed=1740526444_budget=None_instan... success ... seconds
config_id=37_seed=1740526444_budget=None_instan... success ... seconds
config_id=38_seed=1740526444_budget=None_instan... success ... seconds
config_id=39_seed=1740526444_budget=None_instan... success ... seconds
config_id=40_seed=1740526444_budget=None_instan... success ... seconds
config_id=41_seed=1740526444_budget=None_instan... success ... seconds
config_id=43_seed=1740526444_budget=None_instan... success ... seconds
config_id=42_seed=1740526444_budget=None_instan... success ... seconds
config_id=44_seed=1740526444_budget=None_instan... success ... seconds
config_id=45_seed=1740526444_budget=None_instan... success ... seconds
config_id=46_seed=1740526444_budget=None_instan... success ... seconds
config_id=47_seed=1740526444_budget=None_instan... success ... seconds
config_id=48_seed=1740526444_budget=None_instan... success ... seconds
config_id=49_seed=1740526444_budget=None_instan... success ... seconds
config_id=50_seed=1740526444_budget=None_instan... success ... seconds
config_id=51_seed=1740526444_budget=None_instan... success ... seconds
config_id=52_seed=1740526444_budget=None_instan... success ... seconds
[52 rows x 20 columns]
NePs#
The NEPSOptimizer is a wrapper around the NePs optimizer.
Requirements
This requires neps, which can be installed with:
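pip install neural-pipeline-search  # assumed PyPI name of the neps package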
NePs is still in development
NePs is still in development and is not yet stable. There are likely going to be issues. Please report any issues to NePs or in AMLTK.
This uses ConfigSpace as its search_space() to optimize.
Users should report results using trial.success(loss=...), where loss= is a scalar value to minimize. Optionally, you can also return a cost=, which is used by more budget-aware algorithms. Again, please see NePs' documentation for more.
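For example, a rough sketch of reporting both, where duration is a hypothetical timing you measure yourself around fitting (e.g. with time.perf_counter()):

# Sketch only: `duration` is an assumed variable measured by the user.
loss = 1 - accuracy
return trial.success(loss=loss, accuracy=accuracy, cost=duration)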
Conditionals in ConfigSpace
NePs does not support conditionals in its search space. This is accounted for when using the preferred_parser() during search space creation: it will simply remove all conditionals from the search space, which may not be ideal for the given problem at hand.
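For example, a conditional hyperparameter like the following sketch (hypothetical names, plain ConfigSpace API) would simply be dropped when the space is parsed for NePs, so C would always be sampled:

from ConfigSpace import ConfigurationSpace, EqualsCondition

# Hypothetical space: C should only be active when model == "svm".
cs = ConfigurationSpace({"model": ["svm", "rf"], "C": (0.1, 10.0)})
cs.add_condition(EqualsCondition(cs["C"], cs["model"], "svm"))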
Visit their documentation for what you can pass to NEPSOptimizer.create().
The below example shows how you can use neps to optimize an sklearn pipeline.
from __future__ import annotations

import logging

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from amltk.optimization.optimizers.neps import NEPSOptimizer
from amltk.scheduling import Scheduler
from amltk.optimization import History, Trial, Metric
from amltk.pipeline import Component, Node

logging.basicConfig(level=logging.INFO)


def target_function(trial: Trial, pipeline: Node) -> Trial.Report:
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    clf = pipeline.configure(trial.config).build("sklearn")

    with trial.begin():
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        loss = 1 - accuracy  # NePs minimizes, so report a loss as well
        return trial.success(loss=loss, accuracy=accuracy)

    # Only reached if an exception was caught inside `trial.begin()`
    return trial.fail()


pipeline = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
metric = Metric("accuracy", minimize=False, bounds=(0, 1))
optimizer = NEPSOptimizer.create(space=pipeline, metrics=metric, bucket="neps-doc-example")

N_WORKERS = 2
scheduler = Scheduler.with_processes(N_WORKERS)
task = scheduler.task(target_function)
history = History()


@scheduler.on_start(repeat=N_WORKERS)
def on_start():
    trial = optimizer.ask()
    task.submit(trial, pipeline)


@task.on_result
def tell_and_launch_trial(_, report: Trial.Report):
    if scheduler.running():
        optimizer.tell(report)
        trial = optimizer.ask()
        task.submit(trial, pipeline)


@task.on_result
def add_to_history(_, report: Trial.Report):
    history.add(report)


scheduler.run(timeout=3, wait=False)

print(history.df())
Deep Learning
TODO: Write an example demonstrating NePs with continuations.
Graph Search Spaces
TODO: Write an example demonstrating NePs with its graph search spaces.
Optuna#
Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning.
Requirements
This requires Optuna, which can be installed with:
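pip install optuna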
We provide a thin wrapper called OptunaOptimizer with which you can integrate Optuna into your workflow.
This uses an Optuna-like search_space() for its optimization.
Users should report results using trial.success() with either cost= or values=, depending on the optimization directions given to the underlying optimizer created. Please see their documentation for more.
Visit their documentation for what you can pass to OptunaOptimizer.create(), which is forwarded to optuna.create_study().
from __future__ import annotations

import logging

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from amltk.optimization.optimizers.optuna import OptunaOptimizer
from amltk.scheduling import Scheduler
from amltk.optimization import History, Trial, Metric
from amltk.pipeline import Component, Node

logging.basicConfig(level=logging.INFO)


def target_function(trial: Trial, pipeline: Node) -> Trial.Report:
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    clf = pipeline.configure(trial.config).build("sklearn")

    with trial.begin():
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        return trial.success(accuracy=accuracy_score(y_test, y_pred))

    # Only reached if an exception was caught inside `trial.begin()`
    return trial.fail()


pipeline = Component(RandomForestClassifier, space={"n_estimators": (10, 100)})
accuracy_metric = Metric("accuracy", minimize=False, bounds=(0, 1))
optimizer = OptunaOptimizer.create(space=pipeline, metrics=accuracy_metric, bucket="optuna-doc-example")

N_WORKERS = 2
scheduler = Scheduler.with_processes(N_WORKERS)
task = scheduler.task(target_function)
history = History()


@scheduler.on_start(repeat=N_WORKERS)
def on_start():
    trial = optimizer.ask()
    task.submit(trial, pipeline)


@task.on_result
def tell_and_launch_trial(_, report: Trial.Report):
    if scheduler.running():
        optimizer.tell(report)
        trial = optimizer.ask()
        task.submit(trial, pipeline)


@task.on_result
def add_to_history(_, report: Trial.Report):
    history.add(report)


scheduler.run(timeout=3, wait=False)

print(history.df())
status trial_seed ... time:kind time:unit
name ...
trial_number=1 success 220526275 ... wall seconds
trial_number=0 success 220526275 ... wall seconds
trial_number=3 success 220526275 ... wall seconds
trial_number=2 success 220526275 ... wall seconds
trial_number=4 success 220526275 ... wall seconds
... ... ... ... ... ...
trial_number=75 success 220526275 ... wall seconds
trial_number=76 success 220526275 ... wall seconds
trial_number=77 success 220526275 ... wall seconds
trial_number=79 success 220526275 ... wall seconds
trial_number=78 success 220526275 ... wall seconds
[80 rows x 19 columns]
TODO: Some more documentation. Sorry!
Integrating your own#
The base Optimizer class defines the API we require optimizers to implement.
- ask() - Ask the optimizer for a new Trial to evaluate.
- tell() - Tell the optimizer the result of the sampled config. This comes in the form of a Trial.Report.
Additionally, to aid users in switching between optimizers, the preferred_parser() method should return either a parser function or a string that can be used with node.search_space(parser=...) to extract the search space for the optimizer.
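To make the required behaviour concrete, here is a self-contained toy of the Ask-and-Tell pattern on plain dictionaries and floats. It is not the AMLTK Optimizer base class; a real integration would subclass it and work with Trial and Trial.Report objects as described above.

import random

class RandomSearch:
    """Toy Ask-and-Tell optimizer: uniformly samples from box bounds."""

    def __init__(self, space: dict[str, tuple[float, float]], seed: int = 0):
        self.space = space
        self.rng = random.Random(seed)
        self.history: list[tuple[dict[str, float], float]] = []

    def ask(self) -> dict[str, float]:
        """Sample the next configuration to evaluate."""
        return {k: self.rng.uniform(lo, hi) for k, (lo, hi) in self.space.items()}

    def tell(self, config: dict[str, float], result: float) -> None:
        """Record the result; random search keeps no model, only a history."""
        self.history.append((config, result))

opt = RandomSearch({"x": (-5.0, 5.0)})
for _ in range(20):
    config = opt.ask()
    opt.tell(config, result=(config["x"] - 2.0) ** 2)  # minimize (x - 2)^2

print(min(opt.history, key=lambda t: t[1]))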