Trials
Trial#
A `Trial` is typically the output of `Optimizer.ask()`, indicating what the optimizer would like to evaluate next. We provide a host of convenience methods attached to the `Trial` to make it easy to save results, store artifacts, and more.
Paired with the `Trial` is the `Trial.Report` class, providing an easy way to report back to the optimizer's `tell()` with a simple `trial.success(cost=...)` or `trial.fail(cost=...)` call.
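Putting these together, a typical ask/evaluate/tell loop looks roughly like the sketch below; the `optimizer` and `target_function` names are placeholders for whatever optimizer and evaluation function you actually use.

```python
# A rough sketch of the ask/evaluate/tell loop. Here `optimizer` is assumed to be
# any amltk Optimizer and `target_function` any callable that takes a Trial and
# returns a Trial.Report (ending in trial.success(...) or trial.fail(...)).
for _ in range(10):
    trial = optimizer.ask()          # what the optimizer would like evaluated next
    report = target_function(trial)  # evaluate and build a Trial.Report
    optimizer.tell(report)           # report the result back to the optimizer
```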
Trial#
Bases: `RichRenderable`, `Generic[I]`
A `Trial` encapsulates some configuration that needs to be evaluated. Typically this is what is generated by an `Optimizer.ask()` call.
Usage
To begin a trial, you can use `trial.begin()`, which will catch exceptions/tracebacks and profile the block of code.
If all went smoothly, your trial was successful and you can use `trial.success()` to generate a success `Report`, typically passing what your chosen optimizer expects, e.g. `"loss"` or `"cost"`.
If your trial failed, you can instead use `trial.fail()` to generate a failure `Report`, where any caught exception will be attached to it. Each `Optimizer` will take care of what to do from here.
from amltk.optimization import Trial, Metric
from amltk.store import PathBucket

cost = Metric("cost", minimize=True)

def target_function(trial: Trial) -> Trial.Report:
    x = trial.config["x"]
    y = trial.config["y"]
    with trial.begin():
        cost = x**2 - y

    if trial.exception:
        return trial.fail()

    return trial.success(cost=cost)

# ... usually obtained from an optimizer
trial = Trial(name="some-unique-name", config={"x": 1, "y": 2}, metrics=[cost])
report = target_function(trial)
print(report.df())
status trial_seed ... time:kind time:unit
name ...
some-unique-name success
What you can return with `trial.success()` or `trial.fail()` depends on the `metrics` of the trial. Typically an optimizer will provide the trial with the list of metrics.
Metrics
A metric with a given name, optimal direction, and possible bounds.
Some important properties of a trial are that it has a unique `.name` within the optimization run, a candidate `.config` to evaluate, a possible `.seed` to use, and an `.info` object holding optimizer-specific information, should you need it.
If using `Plugins`, they may insert some extra objects into the `.extras` dict.
To profile your trial, you can wrap the logic you'd like to check with `trial.begin()`, which will automatically catch any errors, record the traceback, and profile the block of code in terms of time and memory.
You can access the profiled time and memory using the `.time` and `.memory` attributes. If you've `profile()`'ed any other intervals, you can access them by name through `trial.profiles`. Please see the `Profiler` for more.
Profiling with a trial.
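As a minimal sketch of what this can look like (the interval name "scoring" and the toy workload are purely illustrative):

```python
from amltk.optimization import Trial

trial = Trial(name="profile-demo", config={"x": 1})

with trial.begin():                    # profiles the whole trial in time and memory
    with trial.profile("scoring"):     # additionally profile a named sub-interval
        total = sum(i * i for i in range(10_000))  # stand-in for real work

print(trial.time)                      # overall time interval of the trial
print(trial.profiler["scoring"].time)  # time interval of the "scoring" block
```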
You can also record anything you'd like into the `.summary`, a plain dict, or use `trial.store()` to store artifacts related to the trial.
What to put in `.summary`?
For large items, e.g. predictions or models, it is highly advised to `.store()` them to disk, especially if using a `Task` for multiprocessing. Further, if serializing the report using `report.df()`, which returns a single row, or a `History` with `history.df()` for a dataframe consisting of many reports, then you'd likely only want to put things in the summary that are scalar and can be serialised to disk by a pandas DataFrame.
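For instance, a hedged sketch of keeping small scalars in `.summary` while sending a larger artifact to storage (the file name and values are purely illustrative):

```python
from amltk.optimization import Trial, Metric
from amltk.store import PathBucket

accuracy = Metric("accuracy", minimize=False)
trial = Trial(
    name="summary-demo",
    config={"x": 1},
    metrics=[accuracy],
    bucket=PathBucket("summary-demo-results"),
)

with trial.begin():
    predictions = {"y_pred": [0, 1, 1, 0]}          # stand-in for a large artifact
    trial.store({"predictions.json": predictions})  # large object -> storage
    trial.summary["n_predictions"] = 4              # small scalar -> summary
    report = trial.success(accuracy=0.75)

print(report.df().columns)  # summary entries show up under "summary:" prefixed columns
```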
Report#
Bases: `RichRenderable`, `Generic[I2]`
The `Trial.Report` encapsulates a `Trial`, its status and any metrics/exceptions that may have occurred.
Typically you will not create these yourself, but instead use `trial.success()` or `trial.fail()` to generate them.
from amltk.optimization import Trial, Metric

loss = Metric("loss", minimize=True)

trial = Trial(name="trial", config={"x": 1}, metrics=[loss])

with trial.begin():
    # Do some work
    # ...
    report: Trial.Report = trial.success(loss=1)

print(report.df())
These reports are used to report metrics back to an `Optimizer` with `Optimizer.tell()`, but can also be stored for your own uses.
You can access the original trial with the `.trial` attribute, and the `Status` of the trial with the `.status` attribute.
You may also want to check out the `History` class for storing a collection of `Report`s, allowing for an easier time converting them to a dataframe or performing some common hyperparameter-optimization parsing of metrics.
class Trial dataclass #
Bases: `RichRenderable`, `Generic[I]`
A `Trial` encapsulates some configuration that needs to be evaluated. See the overview above for usage details and examples.
name: str
attr
#
The unique name of the trial.
config: Mapping[str, Any]
attr
#
The config of the trial provided by the optimizer.
bucket: PathBucket
classvar
attr
#
The bucket to store trial related output to.
info: I | None
classvar
attr
#
The info of the trial provided by the optimizer.
metrics: Sequence[Metric]
classvar
attr
#
The metrics associated with the trial.
seed: int | None
classvar
attr
#
The seed to use if suggested by the optimizer.
fidelities: dict[str, Any] | None
classvar
attr
#
The fidelities at which to evaluate the trial, if any.
time: Timer.Interval
classvar
attr
#
The time taken by the trial, once ended.
memory: Memory.Interval
classvar
attr
#
The memory used by the trial, once ended.
profiler: Profiler
classvar
attr
#
A profiler for this trial.
summary: dict[str, Any]
classvar
attr
#
The summary of the trial. These are for summary statistics of a trial and are single values.
exception: BaseException | None
classvar
attr
#
The exception raised by the trial, if any.
traceback: str | None
classvar
attr
#
The traceback of the exception, if any.
storage: set[Any]
classvar
attr
#
Anything stored in the trial; the elements of this set are keys that can be used to retrieve them later, such as a Path.
extras: dict[str, Any]
classvar
attr
#
Any extras attached to the trial.
profiles: Mapping[str, Profile.Interval]
prop
#
The profiles of the trial.
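As a small sketch of how these attributes come together when constructing a trial by hand (normally an optimizer fills them in for you; the exact values here are illustrative):

```python
from amltk.optimization import Trial, Metric
from amltk.store import PathBucket

loss = Metric("loss", minimize=True)

trial = Trial(
    name="attribute-demo",
    config={"lr": 0.01},
    metrics=[loss],
    seed=42,                   # seed suggested for the evaluation, if any
    fidelities={"epochs": 5},  # fidelities to evaluate at, if any
    bucket=PathBucket("attribute-demo-results"),
)

print(trial.name, trial.seed, trial.fidelities)
```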
class Status
#
class Report dataclass #
Bases: `RichRenderable`, `Generic[I2]`
The `Trial.Report` encapsulates a `Trial`, its status and any metrics/exceptions that may have occurred. See the `Report` overview above for usage details and examples.
trial: Trial[I2]
attr
#
The trial that was run.
status: Trial.Status
attr
#
The status of the trial.
metrics: dict[str, float]
classvar
attr
#
The metric values of the trial.
metric_values: tuple[Metric.Value, ...]
classvar
attr
#
The metric values of the trial, linked to their corresponding metric definitions.
metric_defs: dict[str, Metric]
classvar
attr
#
A lookup from metric name to its metric definition.
metric_names: tuple[str, ...]
classvar
attr
#
The names of the metrics.
exception: BaseException | None
prop
#
The exception of the trial, if any.
traceback: str | None
prop
#
The traceback of the trial, if any.
name: str
prop
#
The name of the trial.
config: Mapping[str, Any]
prop
#
The config of the trial.
profiles: Mapping[str, Profile.Interval]
prop
#
The profiles of the trial.
summary: dict[str, Any]
prop
#
The summary of the trial.
storage: set[str]
prop
#
The storage of the trial.
time: Timer.Interval
prop
#
The time of the trial.
memory: Memory.Interval
prop
#
The memory of the trial.
bucket: PathBucket
prop
#
The bucket attached to the trial.
info: I2 | None
prop
#
The info of the trial, specific to the optimizer that issued it.
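A brief sketch of reading these properties off a report produced by `trial.success()` or `trial.fail()`:

```python
from amltk.optimization import Trial, Metric

loss = Metric("loss", minimize=True)
trial = Trial(name="report-attribute-demo", config={"x": 1}, metrics=[loss])

with trial.begin():
    report = trial.success(loss=0.5)

print(report.status)     # e.g. Status.SUCCESS
print(report.metrics)    # {"loss": 0.5}
print(report.name)       # same name as the originating trial
print(report.exception)  # None for a successful trial
```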
def df(*, profiles=True, configs=True, summary=True, metrics=True) #
Get a dataframe of the trial.
Prefixes
- summary: Entries will be prefixed with "summary:"
- config: Entries will be prefixed with "config:"
- storage: Entries will be prefixed with "storage:"
- metrics: Entries will be prefixed with "metrics:"
- profile:<name>: Entries will be prefixed with "profile:<name>:"

PARAMETER | DESCRIPTION
---|---
profiles | Whether to include the profiles.
configs | Whether to include the configs.
summary | Whether to include the summary.
metrics | Whether to include the metrics.
Source code in src/amltk/optimization/trial.py
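Because entries are prefixed as described above, one way (sketched here) to pull out just one group of columns from the single-row dataframe is to filter on the prefix:

```python
# Assumes `report` is a Trial.Report, e.g. created as in the examples above.
row = report.df()  # a single-row pandas DataFrame
config_columns = [c for c in row.columns if c.startswith("config:")]
print(row[config_columns])
```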
def retrieve(key, *, where=None, check=None) #
Retrieve items related to the trial.
Same argument for where=
Use the same argument for where= as you did for store().
from amltk.optimization import Trial
from amltk.store import PathBucket

bucket = PathBucket("results")

trial = Trial(name="trial", config={"x": 1}, bucket=bucket)

trial.store({"config.json": trial.config})

with trial.begin():
    report = trial.success()

config = report.retrieve("config.json")
print(config)
PARAMETER | DESCRIPTION
---|---
key | The key of the item to retrieve, as given to store().
check | If provided, will check that the retrieved item is of the provided type. If not, will raise a TypeError.
where | Where to retrieve the items from.

RETURNS | DESCRIPTION
---|---
R \| Any | The retrieved item.

RAISES | DESCRIPTION
---|---
TypeError | If check= was provided and the retrieved item was not of that type.
Source code in src/amltk/optimization/trial.py
def store(items, *, where=None) #
Store items related to the trial.
def from_df(df) classmethod #
Create a report from a dataframe.
Source code in src/amltk/optimization/trial.py
def from_dict(d) classmethod #
Create a report from a dictionary.
Prefixes
Please see .df() for information on what the prefixes should be for certain fields.

PARAMETER | DESCRIPTION
---|---
d | The dictionary to create the report from.

RETURNS | DESCRIPTION
---|---
Report | The created report.

Source code in src/amltk/optimization/trial.py
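A hedged round-trip sketch, serializing a report to its single-row dataframe and reconstructing it again:

```python
from amltk.optimization import Trial, Metric

loss = Metric("loss", minimize=True)
trial = Trial(name="roundtrip-demo", config={"x": 1}, metrics=[loss])

with trial.begin():
    report = trial.success(loss=1.0)

df = report.df()                     # single-row DataFrame representation
restored = Trial.Report.from_df(df)  # rebuild the report from that row
print(restored.metrics)
```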
def rich_renderables()
#
The renderables for rich for this report.
Source code in src/amltk/optimization/trial.py
def begin(time=None, memory_unit=None) #
Begin the trial with a contextmanager.
Will begin timing the trial in the with block, attaching the profiled time and memory to the trial once completed, under the .time and .memory attributes.
If an exception is raised, it will be attached to the trial under .exception, with the traceback attached to the actual error message, such that it can be pickled and sent back to the main process loop.
from amltk.optimization import Trial

trial = Trial(name="trial", config={"x": 1})

with trial.begin():
    # Do some work
    pass

print(trial.memory)
print(trial.time)

from amltk.optimization import Trial

trial = Trial(name="trial", config={"x": -1})

with trial.begin():
    raise ValueError("x must be positive")

print(trial.exception)
print(trial.traceback)
print(trial.memory)
print(trial.time)
x must be positive
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/amltk/optimization/trial.py", line 301, in begin
yield
File "<code block: n173; title begin-fail>", line 6, in <module>
ValueError: x must be positive
Memory.Interval(start_vms=1993273344.0, start_rss=414703616.0, end_vms=1993273344, end_rss=414703616, unit=bytes)
Timer.Interval(start=1704900066.335906, end=1704900066.3360069, kind=wall, unit=seconds)
PARAMETER | DESCRIPTION
---|---
time | The timer kind to use for the trial. Defaults to the default timer kind of the profiler.
memory_unit | The memory unit to use for the trial. Defaults to the default memory unit of the profiler.
Source code in src/amltk/optimization/trial.py
def profile(name, *, time=None, memory_unit=None, summary=False) #
Measure some interval in the trial.
The results of the profiling will be available in the .summary attribute with the name of the interval as the key.
from amltk.optimization import Trial
import time

trial = Trial(name="trial", config={"x": 1})

with trial.profile("some_interval"):
    # Do some work
    time.sleep(1)

print(trial.profiler["some_interval"].time)
PARAMETER | DESCRIPTION
---|---
name | The name of the interval.
time | The timer kind to use for the trial. Defaults to the default timer kind of the profiler.
memory_unit | The memory unit to use for the trial. Defaults to the default memory unit of the profiler.
summary | Whether to add the interval to the summary.

YIELDS | DESCRIPTION
---|---
Iterator[None] | The interval measured. Values will be nan until the with block is finished.
Source code in src/amltk/optimization/trial.py
def success(**metrics)
#
Generate a success report.
from amltk.optimization import Trial, Metric

loss_metric = Metric("loss", minimize=True)

trial = Trial(name="trial", config={"x": 1}, metrics=[loss_metric])

with trial.begin():
    # Do some work
    report = trial.success(loss=1)

print(report)
Trial.Report(trial=Trial(name='trial', config={'x': 1}, bucket=PathBucket(PosixPath('unknown-trial-bucket')), metrics=[Metric(name='loss', minimize=True, bounds=None)], seed=None, fidelities=None, summary={}, exception=None, storage=set(), extras={}), status=<Status.SUCCESS: 'success'>, metrics={'loss': 1.0}, metric_values=(Metric.Value(metric=Metric(name='loss', minimize=True, bounds=None), value=1.0),), metric_defs={'loss': Metric(name='loss', minimize=True, bounds=None)}, metric_names=('loss',))
PARAMETER | DESCRIPTION
---|---
**metrics | The metrics of the trial, where the key is the name of the metric and the value is the metric value.

RETURNS | DESCRIPTION
---|---
Report[I] | The report of the trial.
Source code in src/amltk/optimization/trial.py
def fail(**metrics) #
Generate a failure report.
Non-specified metrics
If you do not specify metrics, this will use the .metrics of the trial to determine the .worst value of each metric, using that as the reported result.
from amltk.optimization import Trial, Metric

loss = Metric("loss", minimize=True, bounds=(0, 1_000))

trial = Trial(name="trial", config={"x": 1}, metrics=[loss])

with trial.begin():
    raise ValueError("This is an error")  # Something went wrong

if trial.exception:  # You can check for an exception of the trial here
    report = trial.fail()

print(report.metrics)
print(report)
{'loss': 1000.0}
Trial.Report(trial=Trial(name='trial', config={'x': 1}, bucket=PathBucket(PosixPath('unknown-trial-bucket')), metrics=[Metric(name='loss', minimize=True, bounds=(0.0, 1000.0))], seed=None, fidelities=None, summary={}, exception=ValueError('This is an error'), storage=set(), extras={}), status=<Status.FAIL: 'fail'>, metrics={'loss': 1000.0}, metric_values=(Metric.Value(metric=Metric(name='loss', minimize=True, bounds=(0.0, 1000.0)), value=1000.0),), metric_defs={'loss': Metric(name='loss', minimize=True, bounds=(0.0, 1000.0))}, metric_names=('loss',))
RETURNS | DESCRIPTION
---|---
Report[I] | The result of the trial.
Source code in src/amltk/optimization/trial.py
def crashed(exception=None, traceback=None) #
Generate a crash report.
Note
You will typically not create these manually. Instead, if we don't receive a report from a target function evaluation, but only an error, we assume something crashed and generate a crash report for you.
Non-specified metrics
We will use the .metrics of the trial to determine the .worst value of each metric, using that as the reported metrics.
PARAMETER | DESCRIPTION
---|---
exception | The exception that caused the crash. If not provided, the exception will be taken from the trial.
traceback | The traceback of the exception. If not provided, the traceback will be taken from the trial if there is one there.

RETURNS | DESCRIPTION
---|---
Report[I] | The report of the trial.
Source code in src/amltk/optimization/trial.py
def store(items, *, where=None)
#
Store items related to the trial.
from amltk.optimization import Trial
from amltk.store import PathBucket
trial = Trial(name="trial", config={"x": 1}, bucket=PathBucket("results"))
trial.store({"config.json": trial.config})
print(trial.storage)
You could also specify where=
exactly to store the thing
from amltk.optimization import Trial
trial = Trial(name="trial", config={"x": 1})
trial.store({"config.json": trial.config}, where="./results")
print(trial.storage)
PARAMETER | DESCRIPTION
---|---
items | The items to store, a dict from the key to store it under to the item itself.
where | Where to store the items.
Source code in src/amltk/optimization/trial.py
def delete_from_storage(items, *, where=None)
#
Delete items related to the trial.
from amltk.optimization import Trial
from amltk.store import PathBucket
bucket = PathBucket("results")
trial = Trial(name="trial", config={"x": 1}, info={}, bucket=bucket)
trial.store({"config.json": trial.config})
trial.delete_from_storage(items=["config.json"])
print(trial.storage)
PARAMETER | DESCRIPTION
---|---
items | The items to delete, an iterable of keys.
where | Where the items are stored.

RETURNS | DESCRIPTION
---|---
dict[str, bool] | A dict from the key to whether it was deleted or not.
Source code in src/amltk/optimization/trial.py
def copy()
#
def retrieve(key, *, where=None, check=None) #
Retrieve items related to the trial.
Same argument for where=
Use the same argument for where= as you did for store().
from amltk.optimization import Trial
from amltk.store import PathBucket
bucket = PathBucket("results")
# Create a trial, normally done by an optimizer
trial = Trial(name="trial", config={"x": 1}, bucket=bucket)
trial.store({"config.json": trial.config})
config = trial.retrieve("config.json")
print(config)
You could also manually specify where something gets stored and retrieved:
from amltk.optimization import Trial
from amltk.store import PathBucket
path = "./config_path"
trial = Trial(name="trial", config={"x": 1})
trial.store({"config.json": trial.config}, where=path)
config = trial.retrieve("config.json", where=path)
print(config)
PARAMETER | DESCRIPTION
---|---
key | The key of the item to retrieve, as given to store().
check | If provided, will check that the retrieved item is of the provided type. If not, will raise a TypeError.
where | Where to retrieve the items from.

RETURNS | DESCRIPTION
---|---
R \| Any | The retrieved item.

RAISES | DESCRIPTION
---|---
TypeError | If check= was provided and the retrieved item was not of that type.
Source code in src/amltk/optimization/trial.py
def attach_extra(name, plugin_item)
#
def rich_renderables()
#
The renderables for rich for this report.
Source code in src/amltk/optimization/trial.py
History#
The `History` is used to keep a structured record of what occurred with `Trial`s and their associated `Report`s.
Usage
from amltk.optimization import Trial, History, Metric
from amltk.store import PathBucket

loss = Metric("loss", minimize=True)

def target_function(trial: Trial) -> Trial.Report:
    x = trial.config["x"]
    y = trial.config["y"]

    trial.store({"config.json": trial.config})
    with trial.begin():
        loss = x**2 - y

    if trial.exception:
        return trial.fail()

    return trial.success(loss=loss)

# ... usually obtained from an optimizer
bucket = PathBucket("all-trial-results")
history = History()

for x, y in zip([1, 2, 3], [4, 5, 6]):
    trial = Trial(name="some-unique-name", config={"x": x, "y": y}, bucket=bucket, metrics=[loss])
    report = target_function(trial)
    history.add(report)

print(history.df())
status trial_seed ... time:kind time:unit
name ...
some-unique-name success
You'll often need to perform some operations on a History, so we provide some utility functions here (see the sketch after this list):
- filter(key=...) - Filters the history by some predicate, e.g. history.filter(lambda report: report.status == "success")
- groupby(key=...) - Groups the history by some key, e.g. history.groupby(lambda report: report.config["x"] < 5)
- sortby(key=...) - Sorts the history by some key, e.g. history.sortby(lambda report: report.time.end)
There are also some serialization capabilities built in, to allow you to store your reports and load them back in later:
- df(...) - Output a pd.DataFrame of all the information available.
- from_df(...) - Create a History from a pd.DataFrame.
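For example, a sketch of a round trip through a dataframe on disk (parquet is just one choice of on-disk format):

```python
import pandas as pd

from amltk.optimization import History

# Assumes `history` already holds some reports, as in the usage example above.
history.df().to_parquet("history.parquet")

loaded = History.from_df(pd.read_parquet("history.parquet"))
print(loaded.df())
```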
You can also retrieve individual reports from the history by using their name, e.g. `history["some-unique-name"]`, or iterate through the history with `for report in history: ...`.