Trial
amltk.optimization.trial #

A Trial is typically the output of Optimizer.ask(), indicating what the optimizer would like to evaluate next. We provide a host of convenience methods attached to the Trial to make it easy to save results, store artifacts, and more.

Paired with the Trial is the Trial.Report class, providing an easy way to report back to the optimizer's tell() with a simple trial.success(cost=...) or trial.fail(cost=...) call.
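To make the round trip concrete, the usual flow looks roughly like the sketch below, assuming an optimizer object exposing the ask()/tell() interface described on this page and a target_function like the one defined further down:

# A minimal sketch of the ask/evaluate/tell loop; `optimizer` and `target_function`
# are placeholders for your own optimizer instance and evaluation function.
for _ in range(10):
    trial = optimizer.ask()              # what the optimizer wants evaluated next
    report = target_function(trial)      # ends in trial.success(...) or trial.fail(...)
    optimizer.tell(report)               # feed the result back to the optimizer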
Trial #
amltk.optimization.trial.Trial
dataclass
Bases: RichRenderable, Generic[I]
A Trial encapsulates some configuration that needs to be evaluated. Typically, this is what is generated by an Optimizer.ask() call.
Usage

To begin a trial, you can use trial.begin(), which will catch exceptions/tracebacks and profile the block of code.

If all went smoothly, your trial was successful and you can use trial.success() to generate a success Report, typically passing what your chosen optimizer expects, e.g., "loss" or "cost".

If your trial failed, you can instead use trial.fail() to generate a failure Report, to which any caught exception will be attached. Each Optimizer will take care of what to do from here.
from amltk.optimization import Trial, Metric
from amltk.store import PathBucket

cost = Metric("cost", minimize=True)

def target_function(trial: Trial) -> Trial.Report:
    x = trial.config["x"]
    y = trial.config["y"]
    with trial.begin():
        cost = x**2 - y

    if trial.exception:
        return trial.fail()

    return trial.success(cost=cost)

# ... usually obtained from an optimizer
trial = Trial(name="some-unique-name", config={"x": 1, "y": 2}, metrics=[cost])

report = target_function(trial)
print(report.df())
status trial_seed ... time:kind time:unit
name ...
some-unique-name success
What you can return with trial.success() or trial.fail() depends on the metrics of the trial. Typically, an optimizer will provide the trial with the list of metrics.
Metrics
amltk.optimization.metric.Metric
dataclass #
A metric with a given name, optimal direction, and possible bounds.
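For instance, declaring metrics looks roughly like the following (a minimal sketch; the bounds= argument also appears in the fail() example further down this page):

from amltk.optimization import Metric

accuracy = Metric("accuracy", minimize=False, bounds=(0.0, 1.0))  # maximized, bounded in [0, 1]
loss = Metric("loss", minimize=True)                              # minimized, unbounded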
Some important properties of a Trial are that it has a unique .name for the given optimization run, a candidate .config to evaluate, a possible .seed to use, and an .info object holding optimizer-specific information, should you require it.
Reporting success (or failure)

When using the success() or fail() method, make sure to provide values for all metrics specified in the .metrics attribute. Usually these are set by the optimizer generating the Trial.

Each metric has a unique name, and it's crucial to use the correct names when reporting success, otherwise an error will occur.
Reporting success for metrics

For example:

from amltk.optimization import Trial, Metric

# Gotten from some optimizer usually, i.e. via `optimizer.ask()`
trial = Trial(
    name="example_trial",
    config={"param": 42},
    metrics=[Metric(name="accuracy", minimize=False)],
)

# Incorrect usage (will raise an error)
try:
    report = trial.success(invalid_metric=0.95)
except ValueError as error:
    print(error)

# Correct usage
report = trial.success(accuracy=0.95)
If using Plugins, they may insert some extra objects into the .extras dict.
To profile your trial, you can wrap the logic you'd like to check with trial.begin(), which will automatically catch any errors, record the traceback, and profile the block of code in terms of time and memory.

You can access the profiled time and memory using the .time and .memory attributes. If you've profile()'ed any other intervals, you can access them by name through trial.profiles. Please see the Profiler for more.
Profiling with a trial.
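A minimal sketch of such profiling, mirroring the profile() example later on this page (the interval name "loading" is just an illustrative placeholder):

from amltk.optimization import Trial
import time

trial = Trial(name="trial", config={"x": 1})

with trial.profile("loading"):
    time.sleep(0.1)  # stand-in for the work you want to measure

print(trial.profiler["loading"].time)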
You can also record anything you'd like into the .summary, a plain dict, or use trial.store() to store artifacts related to the trial.

What to put in .summary?
For large items, e.g. predictions or models, it is highly advised to .store() these to disk, especially if using a Task for multiprocessing.

Further, if serializing the report using report.df(), returning a single row, or a History with history.df() for a dataframe consisting of many of the reports, then you'd likely only want to store things that are scalar and can be serialised to disk by a pandas DataFrame.
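As a rough illustration of that split (a sketch only; the summary key "n_features" is hypothetical):

from amltk.optimization import Trial
from amltk.store import PathBucket

trial = Trial(name="trial", config={"x": 1}, bucket=PathBucket("results"))

trial.summary["n_features"] = 20            # a scalar: fine to keep in the summary / dataframe row
trial.store({"config.json": trial.config})  # anything larger (models, predictions, ...) goes to storage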
bucket class-attribute instance-attribute #
bucket: PathBucket = field(
    default_factory=lambda: PathBucket("unknown-trial-bucket")
)
The bucket to store trial-related output to.

config instance-attribute #
The config of the trial provided by the optimizer.

exception class-attribute instance-attribute #
exception: BaseException | None = field(repr=True, default=None)
The exception raised by the trial, if any.

extras class-attribute instance-attribute #
Any extras attached to the trial.

fidelities class-attribute instance-attribute #
The fidelities at which to evaluate the trial, if any.

info class-attribute instance-attribute #
info: I | None = field(default=None, repr=False)
The info of the trial provided by the optimizer.

memory class-attribute instance-attribute #
The memory used by the trial, once ended.

metrics class-attribute instance-attribute #
The metrics associated with the trial.

profiler class-attribute instance-attribute #
profiler: Profiler = field(
    repr=False,
    default_factory=lambda: Profiler(memory_unit="B", time_kind="wall"),
)
A profiler for this trial.

seed class-attribute instance-attribute #
seed: int | None = None
The seed to use, if suggested by the optimizer.

storage class-attribute instance-attribute #
Anything stored in the trial; the elements of this set are keys that can be used to retrieve them later, such as a Path.

summary class-attribute instance-attribute #
The summary of the trial. These are for summary statistics of a trial and are single values.

time class-attribute instance-attribute #
The time taken by the trial, once ended.

traceback class-attribute instance-attribute #
The traceback of the exception, if any.
Report #
amltk.optimization.trial.Trial.Report
dataclass
Bases: RichRenderable, Generic[I2]
The Trial.Report encapsulates a Trial, its status, and any metrics/exceptions that may have occurred.

Typically you will not create these yourself, but instead use trial.success() or trial.fail() to generate them.
from amltk.optimization import Trial, Metric

loss = Metric("loss", minimize=True)
trial = Trial(name="trial", config={"x": 1}, metrics=[loss])

with trial.begin():
    # Do some work
    # ...
    report: Trial.Report = trial.success(loss=1)

print(report.df())
These reports are used to report metrics back to an Optimizer with Optimizer.tell(), but they can also be stored for your own uses.

You can access the original trial with the .trial attribute, and the Status of the trial with the .status attribute.

You may also want to check out the History class for storing a collection of Reports, making it easier to convert them to a dataframe or perform some common hyperparameter-optimization parsing of metrics.
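A rough sketch of that pattern, assuming History is importable from amltk.optimization and exposes an add() method (check the History documentation for the exact API):

from amltk.optimization import History

history = History()
for report in reports:    # `reports` is a hypothetical collection of Trial.Report objects
    history.add(report)

print(history.df())       # one row per report, as with report.df()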
metric_defs class-attribute instance-attribute #
A lookup from metric name to its Metric definition.

metric_names class-attribute instance-attribute #
The names of the metrics.

metric_values class-attribute instance-attribute #
The metric values of the trial, linked to their Metric definitions.

metrics class-attribute instance-attribute #
The metric values of the trial.
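For example, given a report like the one produced above, the reported values can be read back as follows (a small sketch, consistent with the report reprs shown later on this page):

print(report.status)               # e.g. Status.SUCCESS
print(report.metrics["loss"])      # the plain float value that was reported
print(report.metric_defs["loss"])  # the Metric definition it was reported against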
df #
df(
    *,
    profiles: bool = True,
    configs: bool = True,
    summary: bool = True,
    metrics: bool = True
) -> DataFrame
Get a dataframe of the trial.

Prefixes
summary: entries will be prefixed with "summary:"
config: entries will be prefixed with "config:"
storage: entries will be prefixed with "storage:"
metrics: entries will be prefixed with "metrics:"
profile:<name>: entries will be prefixed with "profile:<name>:"

PARAMETER | DESCRIPTION
---|---
profiles | Whether to include the profiles. TYPE: bool, DEFAULT: True
configs | Whether to include the configs. TYPE: bool, DEFAULT: True
summary | Whether to include the summary. TYPE: bool, DEFAULT: True
metrics | Whether to include the metrics. TYPE: bool, DEFAULT: True

Source code in src/amltk/optimization/trial.py
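As a quick illustration of those prefixes (a sketch only; the exact set of columns depends on what the trial recorded):

row = report.df()            # a single-row DataFrame
config_columns = [c for c in row.columns if c.startswith("config:")]
print(config_columns)        # e.g. ['config:x'] for the trial above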
from_df classmethod #
Create a report from a dataframe.

Source code in src/amltk/optimization/trial.py
from_dict classmethod #
Create a report from a dictionary.

Prefixes
Please see .df() for information on what the prefixes should be for certain fields.

PARAMETER | DESCRIPTION
---|---
d | The dictionary to create the report from.

RETURNS | DESCRIPTION
---|---
Report | The created report.

Source code in src/amltk/optimization/trial.py
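A hedged sketch of rebuilding a report from its serialized form, assuming the dictionary carries the same prefixed keys that .df() produces:

row = report.df()                                         # single-row dataframe with prefixed columns
restored = Trial.Report.from_df(row)                      # rebuild from the dataframe...
restored = Trial.Report.from_dict(row.iloc[0].to_dict())  # ...or from a plain dict with the same keys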
retrieve #
retrieve(
    key: str,
    *,
    where: str | Path | Bucket[str, Any] | None = None,
    check: type[R] | None = None
) -> R | Any
Retrieve items related to the trial.

Same argument for where=
Use the same argument for where= as you did for store().

from amltk.optimization import Trial
from amltk.store import PathBucket

bucket = PathBucket("results")
trial = Trial(name="trial", config={"x": 1}, bucket=bucket)

trial.store({"config.json": trial.config})

with trial.begin():
    report = trial.success()

config = report.retrieve("config.json")
print(config)
PARAMETER | DESCRIPTION
---|---
key | The key of the item to retrieve, as recorded in .storage. TYPE: str
check | If provided, will check that the retrieved item is of the provided type. If not, will raise a TypeError. TYPE: type[R] or None
where | Where to retrieve the items from. TYPE: str, Path, Bucket, or None

RETURNS | DESCRIPTION
---|---
R or Any | The retrieved item.

RAISES | DESCRIPTION
---|---
TypeError | If check= is provided and the retrieved item is not of that type.

Source code in src/amltk/optimization/trial.py
rich_renderables #
rich_renderables() -> Iterable[RenderableType]
The renderables for rich for this report.
Source code in src/amltk/optimization/trial.py
Status #
The status of a trial.

UNKNOWN class-attribute instance-attribute #
The status of the trial is unknown.
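Reports expose one of these statuses through their .status attribute, so you can, for example, split a collection of reports (a small sketch; `reports` is hypothetical and the enum is assumed to be reachable as Trial.Status, matching the Status.SUCCESS / Status.FAIL values in the report reprs on this page):

successes = [r for r in reports if r.status is Trial.Status.SUCCESS]
failures = [r for r in reports if r.status is Trial.Status.FAIL]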
attach_extra #
begin #
begin(
    time: Kind | Literal["wall", "cpu", "process"] | None = None,
    memory_unit: Unit | Literal["B", "KB", "MB", "GB"] | None = None,
) -> Iterator[None]
Begin the trial with a contextmanager.

Will begin timing the trial in the with block, attaching the profiled time and memory to the trial once completed, under the .profile.time and .profile.memory attributes.

If an exception is raised, it will be attached to the trial under .exception, with the traceback attached to the actual error message, such that it can be pickled and sent back to the main process loop.

from amltk.optimization import Trial

trial = Trial(name="trial", config={"x": 1})

with trial.begin():
    # Do some work
    pass

print(trial.memory)
print(trial.time)

from amltk.optimization import Trial

trial = Trial(name="trial", config={"x": -1})

with trial.begin():
    raise ValueError("x must be positive")

print(trial.exception)
print(trial.traceback)
print(trial.memory)
print(trial.time)

x must be positive
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/amltk/optimization/trial.py", line 336, in begin
    yield
  File "<code block: n33; title begin-fail>", line 6, in <module>
ValueError: x must be positive
Memory.Interval(start_vms=1550761984.0, start_rss=297996288.0, end_vms=1550761984, end_rss=297996288, unit=bytes)
Timer.Interval(start=1706255424.1034796, end=1706255424.1037235, kind=wall, unit=seconds)

PARAMETER | DESCRIPTION
---|---
time | The timer kind to use for the trial. Defaults to the default timer kind of the profiler. TYPE: Kind, "wall", "cpu", "process", or None
memory_unit | The memory unit to use for the trial. Defaults to the default memory unit of the profiler. TYPE: Unit, "B", "KB", "MB", "GB", or None

Source code in src/amltk/optimization/trial.py
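If you want different profiling behaviour for a particular trial, the parameters above can be passed directly (a small sketch using the literals from the signature):

with trial.begin(time="cpu", memory_unit="MB"):
    ...  # this block is timed with the CPU clock and memory is reported in megabytes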
copy #
crashed #
crashed(
    exception: BaseException | None = None,
    traceback: str | None = None,
) -> Report[I]
Generate a crash report.

Note
You will typically not create these manually. Instead, if we don't receive a report from a target function evaluation, but only an error, we assume something crashed and generate a crash report for you.

Non-specified metrics
We will use the .metrics to determine the .worst value of each metric, using that as the reported metrics.

PARAMETER | DESCRIPTION
---|---
exception | The exception that caused the crash. If not provided, the exception will be taken from the trial. TYPE: BaseException or None
traceback | The traceback of the exception. If not provided, the traceback will be taken from the trial if there is one there. TYPE: str or None

RETURNS | DESCRIPTION
---|---
Report[I] | The report of the trial.

Source code in src/amltk/optimization/trial.py
delete_from_storage #
delete_from_storage(
    items: Iterable[str],
    *,
    where: str | Path | Bucket | Callable[[str, Iterable[str]], dict[str, bool]] | None = None
) -> dict[str, bool]
Delete items related to the trial.

from amltk.optimization import Trial
from amltk.store import PathBucket

bucket = PathBucket("results")
trial = Trial(name="trial", config={"x": 1}, info={}, bucket=bucket)

trial.store({"config.json": trial.config})
trial.delete_from_storage(items=["config.json"])

print(trial.storage)
PARAMETER | DESCRIPTION
---|---
items | The items to delete, an iterable of keys.
where | Where the items are stored. TYPE: str, Path, Bucket, Callable, or None

RETURNS | DESCRIPTION
---|---
dict[str, bool] | A dict from the key to whether it was deleted or not.

Source code in src/amltk/optimization/trial.py
fail #
Generate a failure report.

Non-specified metrics
If you do not specify metrics, this will use the .metrics to determine the .worst value of each metric, using that as the reported result.

from amltk.optimization import Trial, Metric

loss = Metric("loss", minimize=True, bounds=(0, 1_000))
trial = Trial(name="trial", config={"x": 1}, metrics=[loss])

with trial.begin():
    raise ValueError("This is an error")  # Something went wrong

if trial.exception:  # You can check for an exception of the trial here
    report = trial.fail()
    print(report.metrics)
    print(report)

{'loss': 1000.0}
Trial.Report(trial=Trial(name='trial', config={'x': 1}, bucket=PathBucket(PosixPath('unknown-trial-bucket')), metrics=[Metric(name='loss', minimize=True, bounds=(0.0, 1000.0))], seed=None, fidelities=None, summary={}, exception=ValueError('This is an error'), storage=set(), extras={}), status=<Status.FAIL: 'fail'>, metrics={'loss': 1000.0}, metric_values=(Metric.Value(metric=Metric(name='loss', minimize=True, bounds=(0.0, 1000.0)), value=1000.0),), metric_defs={'loss': Metric(name='loss', minimize=True, bounds=(0.0, 1000.0))}, metric_names=('loss',))

RETURNS | DESCRIPTION
---|---
Report[I] | The result of the trial.

Source code in src/amltk/optimization/trial.py
profile #
profile(
    name: str,
    *,
    time: Kind | Literal["wall", "cpu", "process"] | None = None,
    memory_unit: Unit | Literal["B", "KB", "MB", "GB"] | None = None,
    summary: bool = False
) -> Iterator[None]
Measure some interval in the trial.

The results of the profiling will be available in the .summary attribute with the name of the interval as the key.

from amltk.optimization import Trial
import time

trial = Trial(name="trial", config={"x": 1})

with trial.profile("some_interval"):
    # Do some work
    time.sleep(1)

print(trial.profiler["some_interval"].time)

PARAMETER | DESCRIPTION
---|---
name | The name of the interval. TYPE: str
time | The timer kind to use for the trial. Defaults to the default timer kind of the profiler. TYPE: Kind, "wall", "cpu", "process", or None
memory_unit | The memory unit to use for the trial. Defaults to the default memory unit of the profiler. TYPE: Unit, "B", "KB", "MB", "GB", or None
summary | Whether to add the interval to the summary. TYPE: bool

YIELDS | DESCRIPTION
---|---
Iterator[None] | The interval measured. Values will be nan until the with block is finished.

Source code in src/amltk/optimization/trial.py
retrieve #
retrieve(
    key: str,
    *,
    where: str | Path | Bucket[str, Any] | None = None,
    check: type[R] | None = None
) -> R | Any
Retrieve items related to the trial.

Same argument for where=
Use the same argument for where= as you did for store().

from amltk.optimization import Trial
from amltk.store import PathBucket

bucket = PathBucket("results")

# Create a trial, normally done by an optimizer
trial = Trial(name="trial", config={"x": 1}, bucket=bucket)

trial.store({"config.json": trial.config})
config = trial.retrieve("config.json")
print(config)
You could also manually specify where something gets stored and retrieved:

from amltk.optimization import Trial

path = "./config_path"
trial = Trial(name="trial", config={"x": 1})

trial.store({"config.json": trial.config}, where=path)
config = trial.retrieve("config.json", where=path)
print(config)
PARAMETER | DESCRIPTION
---|---
key | The key of the item to retrieve, as recorded in .storage. TYPE: str
check | If provided, will check that the retrieved item is of the provided type. If not, will raise a TypeError. TYPE: type[R] or None
where | Where to retrieve the items from. TYPE: str, Path, Bucket, or None

RETURNS | DESCRIPTION
---|---
R or Any | The retrieved item.

RAISES | DESCRIPTION
---|---
TypeError | If check= is provided and the retrieved item is not of that type.

Source code in src/amltk/optimization/trial.py
rich_renderables #
rich_renderables() -> Iterable[RenderableType]
The renderables for rich for this report.
Source code in src/amltk/optimization/trial.py
store #
store(
    items: Mapping[str, T],
    *,
    where: str | Path | Bucket | Callable[[str, Mapping[str, T]], None] | None = None
) -> None
Store items related to the trial.

from amltk.optimization import Trial
from amltk.store import PathBucket

trial = Trial(name="trial", config={"x": 1}, bucket=PathBucket("results"))
trial.store({"config.json": trial.config})

print(trial.storage)

You could also specify exactly where= to store the items:

from amltk.optimization import Trial

trial = Trial(name="trial", config={"x": 1})
trial.store({"config.json": trial.config}, where="./results")

print(trial.storage)

PARAMETER | DESCRIPTION
---|---
items | The items to store, a dict from the key to store it under to the item itself.
where | Where to store the items. TYPE: str, Path, Bucket, Callable, or None

Source code in src/amltk/optimization/trial.py
success #
Generate a success report.

from amltk.optimization import Trial, Metric

loss_metric = Metric("loss", minimize=True)
trial = Trial(name="trial", config={"x": 1}, metrics=[loss_metric])

with trial.begin():
    # Do some work
    report = trial.success(loss=1)

print(report)

Trial.Report(trial=Trial(name='trial', config={'x': 1}, bucket=PathBucket(PosixPath('unknown-trial-bucket')), metrics=[Metric(name='loss', minimize=True, bounds=None)], seed=None, fidelities=None, summary={}, exception=None, storage=set(), extras={}), status=<Status.SUCCESS: 'success'>, metrics={'loss': 1.0}, metric_values=(Metric.Value(metric=Metric(name='loss', minimize=True, bounds=None), value=1.0),), metric_defs={'loss': Metric(name='loss', minimize=True, bounds=None)}, metric_names=('loss',))

PARAMETER | DESCRIPTION
---|---
**metrics | The metrics of the trial, where the key is the name of the metric and the value is the metric value.

RETURNS | DESCRIPTION
---|---
Report[I] | The report of the trial.