Trials
Trial#
amltk.optimization.trial#

A Trial is typically the output of Optimizer.ask(), indicating what the optimizer would like to evaluate next. We provide a host of convenience methods attached to the Trial to make it easy to save results, store artifacts, and more.

Paired with the Trial is the Trial.Report class, providing an easy way to report back to the optimizer's tell() with a simple trial.success(cost=...) or trial.fail(cost=...) call.
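To make the ask-and-tell flow concrete, here is a minimal sketch. Only Trial.create(), trial.success()/trial.fail() and the report attributes used below are taken from this page; in a real run the trial would come from optimizer.ask() and the report would go back via optimizer.tell().

from amltk.optimization import Trial, Metric

cost = Metric("cost", minimize=True)

def evaluate(trial: Trial) -> Trial.Report:
    try:
        x = trial.config["x"]
        value = x**2  # the actual (cheap) evaluation
    except Exception as e:
        return trial.fail(e)
    return trial.success(cost=value)

# Normally `trial = optimizer.ask()` and `optimizer.tell(report)`;
# here we create trials by hand to keep the sketch self-contained.
for i, x in enumerate([1, 2, 3]):
    trial = Trial.create(name=f"trial-{i}", config={"x": x}, metrics=[cost])
    report = evaluate(trial)
    print(report.status, report.values)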
Trial#
amltk.optimization.trial.Trial dataclass#
Trial(
*,
name: str,
config: Mapping[str, Any],
bucket: PathBucket,
info: I | None,
metrics: MetricCollection,
created_at: datetime,
seed: int | None = None,
fidelities: Mapping[str, Any],
profiler: Profiler,
summary: MutableMapping[str, Any],
storage: set[Any],
extras: MutableMapping[str, Any]
)
Bases: RichRenderable, Generic[I]
A Trial encapsulates some configuration that needs to be evaluated. Typically, this is what is generated by an Optimizer.ask() call.

Usage

If all went smoothly, your trial was successful and you can use trial.success() to generate a success Report, typically passing what your chosen optimizer expects, e.g. "loss" or "cost".

If your trial failed, you can instead use trial.fail() to generate a failure Report. If you pass an exception to fail(), it will be attached to the report along with any traceback it can deduce. Each Optimizer will take care of what to do from here.
from amltk.optimization import Trial, Metric
from amltk.store import PathBucket
cost = Metric("cost", minimize=True)
def target_function(trial: Trial) -> Trial.Report:
    x = trial.config["x"]
    y = trial.config["y"]
    with trial.profile("expensive-calculation"):
        cost = x**2 - y
    return trial.success(cost=cost)

# ... usually obtained from an optimizer
trial = Trial.create(
    name="some-unique-name",
    config={"x": 1, "y": 2},
    metrics=[cost]
)
report = target_function(trial)
print(report.df())
                   status  ...  profile:expensive-calculation:time:unit
name                       ...
some-unique-name  success  ...                                   seconds

[1 rows x 22 columns]
What you can return with trial.success() or trial.fail() depends on the metrics of the trial. Typically, an optimizer will provide the trial with the list of metrics.

Metrics

Some important properties are that a Trial has a unique .name for the optimization run, a candidate .config to evaluate, a possible .seed to use, and an .info object, which holds optimizer-specific information, if required by you.
Reporting success (or failure)

When using the success() method, make sure to provide values for all metrics specified in the .metrics attribute. Usually these are set by the optimizer generating the Trial.

If you instead report using fail(), any metric not specified will be set to the .worst value of the metric.

Each metric has a unique name, and it's crucial to use the correct names when reporting success, otherwise an error will occur.
Reporting success for metrics
For example:
from amltk.optimization import Trial, Metric

# Gotten from some optimizer usually, i.e. via `optimizer.ask()`
trial = Trial.create(
    name="example_trial",
    config={"param": 42},
    metrics=[Metric(name="accuracy", minimize=False)]
)

# Incorrect usage (will raise an error)
try:
    report = trial.success(invalid_metric=0.95)
except ValueError as error:
    print(error)

# Correct usage
report = trial.success(accuracy=0.95)
If using Plugins, they may insert some extra objects in the .extras dict.

To profile your trial, you can wrap the logic you'd like to check with trial.profile(), which will automatically profile the block of code for memory before and after, as well as time taken.

If you've profile()'ed any intervals, you can access them by name through trial.profiles. Please see the Profiler for more.
Profiling with a trial.
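As a rough sketch (the .time and .memory attribute names on the returned interval are assumptions based on the Profile.Interval description below; trial.profile() and trial.profiles themselves are taken directly from this page):

from amltk.optimization import Trial

trial = Trial.create(name="profile-example", config={"x": 1})

with trial.profile("expensive-step"):
    total = sum(i * i for i in range(100_000))  # stand-in for real work

interval = trial.profiles["expensive-step"]
print(interval.time)    # Timer.Interval with the timing information
print(interval.memory)  # Memory.Interval with memory usage before/after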
You can also record anything you'd like into the .summary, a plain dict, or use trial.store() to store artifacts related to the trial.

What to put in .summary?

For large items, e.g. predictions or models, you are highly advised to .store() them to disk, especially if using a Task for multiprocessing. Further, if serializing the report using report.df(), returning a single row, or a History with history.df() for a dataframe consisting of many of the reports, then you'd likely only want to put things in the summary that are scalar and can be serialised to disk by a pandas DataFrame.
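A short sketch of that split: scalars go into .summary (and show up as columns of report.df()), while larger artifacts go to the trial's bucket via .store(). The bucket path and keys here are purely illustrative.

from amltk.optimization import Trial, Metric
from amltk.store import PathBucket

acc = Metric("accuracy", minimize=False)
trial = Trial.create(
    name="store-example",
    config={"alpha": 0.5},
    metrics=[acc],
    bucket=PathBucket("store-example-results"),
)

# Small scalar facts go into the summary.
trial.summary["n_features"] = 20

# Larger artifacts are stored to disk under the trial's bucket.
trial.store({"config.json": trial.config})

report = trial.success(accuracy=0.9)
print(report.df().filter(like="summary:"))  # summary entries become prefixed columns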
Report#
amltk.optimization.trial.Trial.Report dataclass#
Report(
trial: Trial[I2],
status: Status,
reported_at: datetime = datetime.now(),
exception: BaseException | None = None,
traceback: str | None = None,
values: Mapping[str, float] = dict(),
)
Bases: RichRenderable, Generic[I2]
The Trial.Report encapsulates a Trial, its status and any metrics/exceptions that may have occurred.

Typically you will not create these yourself, but instead use trial.success() or trial.fail() to generate them.
from amltk.optimization import Trial, Metric
loss = Metric("loss", minimize=True)
trial = Trial.create(name="trial", config={"x": 1}, metrics=[loss])
with trial.profile("fitting"):
    # Do some work
    # ...
    pass

report = trial.success(loss=1)
print(report.df())
These reports are used to report back metrics to an Optimizer with Optimizer.tell() but can also be stored for your own uses.

You can access the original trial with the .trial attribute, and the Status of the trial with the .status attribute.

You may also want to check out the History class for storing a collection of Reports, making it easier to convert them to a dataframe or perform some common hyperparameter optimization parsing of metrics.
Trial dataclass#
Trial(
*,
name: str,
config: Mapping[str, Any],
bucket: PathBucket,
info: I | None,
metrics: MetricCollection,
created_at: datetime,
seed: int | None = None,
fidelities: Mapping[str, Any],
profiler: Profiler,
summary: MutableMapping[str, Any],
storage: set[Any],
extras: MutableMapping[str, Any]
)
Bases: RichRenderable, Generic[I]
config instance-attribute#
The config of the trial provided by the optimizer.

fidelities instance-attribute#
The fidelities at which to evaluate the trial, if any.

info class-attribute instance-attribute#
info: I | None = field(repr=False)
The info of the trial provided by the optimizer.

metrics instance-attribute#
metrics: MetricCollection
The metrics associated with the trial.
You can access the metrics by name, e.g. trial.metrics["loss"].

profiler class-attribute instance-attribute#
A profiler for this trial.

profiles property#
The profiles of the trial.
These are indexed by the name of the profile, i.e. the name given to trial.profile(). The values are Profile.Intervals, each containing a Memory.Interval and a Timer.Interval. Please see the respective documentation for more.
seed class-attribute instance-attribute#
seed: int | None = None
The seed to use if suggested by the optimizer.

storage instance-attribute#
Anything stored in the trial. The elements of this set are keys that can be used to retrieve them later, such as a Path.

summary instance-attribute#
summary: MutableMapping[str, Any]
The summary of the trial. These are for summary statistics of a trial and are single values.
Report dataclass#
Report(
trial: Trial[I2],
status: Status,
reported_at: datetime = datetime.now(),
exception: BaseException | None = None,
traceback: str | None = None,
values: Mapping[str, float] = dict(),
)
Bases: RichRenderable, Generic[I2]
exception class-attribute instance-attribute#
exception: BaseException | None = None
The exception reported if any.

reported_at class-attribute instance-attribute#
When this Report was generated.
This will primarily be None if there was no corresponding key when loading this report from a serialized form, such as with from_df() or from_dict().

traceback class-attribute instance-attribute#
The traceback reported if any.

values class-attribute instance-attribute#
The reported metric values of the trial.
df#
df(
*,
profiles: bool = True,
configs: bool = True,
summary: bool = True,
metrics: bool = True
) -> DataFrame
Get a dataframe of the trial.

Prefixes

- summary: Entries will be prefixed with "summary:"
- config: Entries will be prefixed with "config:"
- storage: Entries will be prefixed with "storage:"
- metrics: Entries will be prefixed with "metrics:"
- profile:<name>: Entries will be prefixed with "profile:<name>:"

Parameters:

- profiles: Whether to include the profiles.
- configs: Whether to include the configs.
- summary: Whether to include the summary.
- metrics: Whether to include the metrics.
Source code in src/amltk/optimization/trial.py
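As a quick sketch of how those prefixes appear in practice (the exact column names, e.g. "metrics:accuracy", are assumptions following the prefix rules above; pandas' filter(like=...) is just one way to select them):

from amltk.optimization import Trial, Metric

acc = Metric("accuracy", minimize=False)
trial = Trial.create(name="df-example", config={"alpha": 0.1}, metrics=[acc])
trial.summary["n_rows"] = 100
report = trial.success(accuracy=0.9)

df = report.df()
print(df.filter(like="config:"))   # e.g. "config:alpha"
print(df.filter(like="summary:"))  # e.g. "summary:n_rows"
print(df.filter(like="metrics:"))  # e.g. "metrics:accuracy"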
from_df classmethod#
Create a report from a dataframe.
See Also
Source code in src/amltk/optimization/trial.py
from_dict classmethod#
Create a report from a dictionary.
Prefixes

Please see .df() for information on what the prefixes should be for certain fields.

Parameters:

- d: The dictionary to create the report from.

Returns:

- Report: The created report.
Source code in src/amltk/optimization/trial.py
retrieve#
Retrieve items related to the trial.
from amltk.optimization import Trial
from amltk.store import PathBucket
bucket = PathBucket("results")
trial = Trial.create(name="trial", config={"x": 1}, bucket=bucket)
trial.store({"config.json": trial.config})
report = trial.success()
config = report.retrieve("config.json")
print(config)
Parameters:

- key: The key of the item to retrieve, as used when it was store()'d.
- check: If provided, will check that the retrieved item is of the provided type. If not, will raise a TypeError.

Returns:

- R | Any: The retrieved item.

Raises:

- TypeError: If check is provided and the retrieved item is not of that type.
Source code in src/amltk/optimization/trial.py
rich_renderables#
rich_renderables() -> Iterable[RenderableType]
The renderables for rich for this report.
Source code in src/amltk/optimization/trial.py
store#
Store items related to the trial.
See Also
Status#
The status of a trial.

UNKNOWN class-attribute instance-attribute#
The status of the trial is unknown.
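Only the UNKNOWN member is listed above, but the report reprs elsewhere on this page show SUCCESS and FAIL members, and the History example below compares the status against its string value. A small sketch using that comparison:

from amltk.optimization import Trial, Metric

loss = Metric("loss", minimize=True)
trial = Trial.create(name="status-example", config={"x": 1}, metrics=[loss])
report = trial.success(loss=0.5)

# Status values compare equal to their string form, e.g. "success" or "fail".
if report.status == "success":
    print("evaluated fine:", report.values)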
attach_extra#
copy#
crashed#
Generate a crash report.
Note

You will typically not create these manually; instead, if we don't receive a report from a target function evaluation, but only an error, we assume something crashed and generate a crash report for you.

Non-specified metrics

We will use the .metrics to determine the .worst value of each metric, using that as the reported metrics.

Parameters:

- exception: The exception that caused the crash. If not provided, the exception will be taken from the trial. If this is still …
- traceback: The traceback of the exception. If not provided, the traceback will be taken from the trial if there is one there.

Returns:

- Report[I]: The report of the trial.
Source code in src/amltk/optimization/trial.py
create classmethod#
create(
name: str,
config: Mapping[str, Any] | None = None,
*,
metrics: (
Metric
| Iterable[Metric]
| Mapping[str, Metric]
| None
) = None,
info: I | None = None,
seed: int | None = None,
fidelities: Mapping[str, Any] | None = None,
created_at: datetime | None = None,
profiler: Profiler | None = None,
bucket: str | Path | PathBucket | None = None,
summary: MutableMapping[str, Any] | None = None,
storage: set[Hashable] | None = None,
extras: MutableMapping[str, Any] | None = None
) -> Trial[I]
Create a trial.
Parameters:

- name: The name of the trial.
- metrics: The metrics of the trial.
- config: The config of the trial.
- info: The info of the trial.
- seed: The seed of the trial.
- fidelities: The fidelities of the trial.
- bucket: The bucket of the trial.
- created_at: When the trial was created.
- profiler: The profiler of the trial.
- summary: The summary of the trial.
- storage: The storage of the trial.
- extras: The extras of the trial.

Returns:

- Trial[I]: The trial.
Source code in src/amltk/optimization/trial.py
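Putting the signature together, a small sketch creating a trial by hand with a few of the optional fields; the values are purely illustrative.

from datetime import datetime
from amltk.optimization import Trial, Metric

loss = Metric("loss", minimize=True)
trial = Trial.create(
    name="manual-trial",
    config={"lr": 0.01},
    metrics=[loss],
    seed=42,
    fidelities={"epochs": 10},
    created_at=datetime.now(),
)
print(trial.name, trial.seed, trial.fidelities)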
delete_from_storage#
Delete items related to the trial.
from amltk.optimization import Trial
from amltk.store import PathBucket
bucket = PathBucket("results")
trial = Trial.create(name="trial", config={"x": 1}, info={}, bucket=bucket)
trial.store({"config.json": trial.config})
trial.delete_from_storage(items=["config.json"])
print(trial.storage)
Parameters:

- items: The items to delete, an iterable of keys.

Returns:

- dict[str, bool]: A dict from the key to whether it was deleted or not.
Source code in src/amltk/optimization/trial.py
dump_exception#
dump_exception(
exception: BaseException, *, name: str | None = None
) -> None
Dump an exception to the trial.
Parameters:

- exception: The exception to dump.
- name: The name of the file to dump to. If …
Source code in src/amltk/optimization/trial.py
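A small sketch of dumping a caught exception alongside failing the trial; the bucket path is illustrative, and the default filename used when name is omitted is not documented here.

from amltk.optimization import Trial
from amltk.store import PathBucket

trial = Trial.create(
    name="dump-example",
    config={"x": 1},
    bucket=PathBucket("dump-example-results"),
)

try:
    raise ValueError("something broke")
except ValueError as err:
    trial.dump_exception(err)  # persists the exception to the trial's bucket
    report = trial.fail(err)

print(report.exception)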
fail#
fail(
exception: Exception | None = None,
traceback: str | None = None,
/,
**metrics: float | int,
) -> Report[I]
Generate a failure report.
Non-specified metrics

If you do not specify metrics, this will use the .metrics to determine the .worst value of each metric, using that as the reported result.
from amltk.optimization import Trial, Metric
loss = Metric("loss", minimize=True, bounds=(0, 1_000))
trial = Trial.create(name="trial", config={"x": 1}, metrics=[loss])
try:
    raise ValueError("This is an error")  # Something went wrong
except Exception as error:
    report = trial.fail(error)
print(report.values)
print(report)
{}
Trial.Report(trial=Trial(name='trial', config={'x': 1}, bucket=PathBucket(PosixPath('trial-trial-2024-04-24T14:12:08.904852')), metrics=MetricCollection(metrics={'loss': Metric(name='loss', minimize=True, bounds=(0.0, 1000.0), fn=None)}), created_at=datetime.datetime(2024, 4, 24, 14, 12, 8, 904849), seed=None, fidelities={}, summary={}, storage=set(), extras={}), status=<Status.FAIL: 'fail'>, reported_at=datetime.datetime(2024, 4, 24, 14, 12, 8, 905024), exception=ValueError('This is an error'), values={})
Returns:

- Report[I]: The result of the trial.
Source code in src/amltk/optimization/trial.py
profile#
profile(
name: str,
*,
time: (
Kind | Literal["wall", "cpu", "process"] | None
) = None,
memory_unit: (
Unit | Literal["B", "KB", "MB", "GB"] | None
) = None,
summary: bool = False
) -> Iterator[None]
Measure some interval in the trial.
The results of the profiling will be available in the .summary attribute with the name of the interval as the key.
from amltk.optimization import Trial
import time
trial = Trial.create(name="trial", config={"x": 1})
with trial.profile("some_interval"):
    # Do some work
    time.sleep(1)
print(trial.profiler["some_interval"].time)
Parameters:

- name: The name of the interval.
- time: The timer kind to use for the trial. Defaults to the default timer kind of the profiler.
- memory_unit: The memory unit to use for the trial. Defaults to the default memory unit of the profiler.
- summary: Whether to add the interval to the summary.

Yields:

- Iterator[None]: The interval measured. Values will be nan until the with block is finished.
Source code in src/amltk/optimization/trial.py
retrieve#
Retrieve items related to the trial.
from amltk.optimization import Trial
from amltk.store import PathBucket
bucket = PathBucket("results")
# Create a trial, normally done by an optimizer
trial = Trial.create(name="trial", config={"x": 1}, bucket=bucket)
trial.store({"config.json": trial.config})
config = trial.retrieve("config.json")
print(config)
Parameters:

- key: The key of the item to retrieve, as used when it was store()'d.
- check: If provided, will check that the retrieved item is of the provided type. If not, will raise a TypeError.

Returns:

- R | Any: The retrieved item.

Raises:

- TypeError: If check is provided and the retrieved item is not of that type.
Source code in src/amltk/optimization/trial.py
rich_renderables#
rich_renderables() -> Iterable[RenderableType]
The renderables for rich for this report.
Source code in src/amltk/optimization/trial.py
store#
Store items related to the trial.
from amltk.optimization import Trial
from amltk.store import PathBucket
trial = Trial.create(name="trial", config={"x": 1}, bucket=PathBucket("my-trial"))
trial.store({"config.json": trial.config})
print(trial.storage)
Parameters:

- items: The items to store, a dict from the key to store it under to the item itself. If using a …
Source code in src/amltk/optimization/trial.py
success#
Generate a success report.
from amltk.optimization import Trial, Metric
loss_metric = Metric("loss", minimize=True)
trial = Trial.create(name="trial", config={"x": 1}, metrics=[loss_metric])
report = trial.success(loss=1)
print(report)
Trial.Report(trial=Trial(name='trial', config={'x': 1}, bucket=PathBucket(PosixPath('trial-trial-2024-04-24T14:12:09.965253')), metrics=MetricCollection(metrics={'loss': Metric(name='loss', minimize=True, bounds=None, fn=None)}), created_at=datetime.datetime(2024, 4, 24, 14, 12, 9, 965250), seed=None, fidelities={}, summary={}, storage=set(), extras={}), status=<Status.SUCCESS: 'success'>, reported_at=datetime.datetime(2024, 4, 24, 14, 12, 9, 965346), exception=None, values={'loss': 1})
Parameters:

- **metrics: The metrics of the trial, where the key is the name of the metric and the value is the metric value.

Returns:

- Report[I]: The report of the trial.
Source code in src/amltk/optimization/trial.py
History#
amltk.optimization.history#

The History is used to keep a structured record of what occurred with Trials and their associated Reports.
Usage
from amltk.optimization import Trial, History, Metric
from amltk.store import PathBucket
loss = Metric("loss", minimize=True)
def target_function(trial: Trial) -> Trial.Report:
    x = trial.config["x"]
    y = trial.config["y"]
    trial.store({"config.json": trial.config})
    loss = x**2 - y
    return trial.success(loss=loss)

# ... usually obtained from an optimizer
bucket = PathBucket("all-trial-results")
history = History()

for x, y in zip([1, 2, 3], [4, 5, 6]):
    name = f"trial_{x}_{y}"
    trial = Trial.create(name=name, config={"x": x, "y": y}, bucket=bucket / name, metrics=[loss])
    report = target_function(trial)
    history.add(report)

print(history.df())
status trial_seed ... config:x config:y
name ...
trial_1_4 success
You'll often need to perform some operations on a History, so we provide some utility functions here:

- filter(key=...) - Filters the history by some predicate, e.g. history.filter(lambda report: report.status == "success")
- groupby(key=...) - Groups the history by some key, e.g. history.groupby(lambda report: report.config["x"] < 5)
- sortby(key=...) - Sorts the history by some key, e.g. history.sortby(lambda report: report.profiles["trial"].time.end)

There are also some serialization capabilities built in, to allow you to store your reports and load them back in later:

- df(...) - Output a pd.DataFrame of all the information available.
- from_df(...) - Create a History from a pd.DataFrame.

You can also retrieve individual reports from the history by using their name, e.g. history.reports["some-unique-name"], or iterate through the history with for report in history: ....
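Tying those utilities together, a short sketch; it assumes filter() and sortby() return something you can iterate over for reports, as the examples above suggest.

from amltk.optimization import Trial, History, Metric

loss = Metric("loss", minimize=True)
history = History()
for x in [3, 1, 2]:
    trial = Trial.create(name=f"trial-{x}", config={"x": x}, metrics=[loss])
    history.add(trial.success(loss=float(x)))

# Keep only successful reports (here, all of them).
successes = history.filter(lambda report: report.status == "success")
print([report.trial.name for report in successes])

# Sort by the reported loss value.
by_loss = history.sortby(lambda report: report.values["loss"])
print([report.trial.name for report in by_loss])

# Serialize to a dataframe and back.
df = history.df()
restored = History.from_df(df)
print(restored.df().shape)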