# Quickstart

Make sure you have followed the setup guide first.
We will use the synthetic MFHartmann benchmark for this tutorial, as it requires no downloads to run.
In general, the only import you should need for generic use is `import mfpbench`, and `mfpbench.get(...)` to get a benchmark.
There are also some nuances when working with tabular data that are worth mentioning; see the Tabular Benchmarks section below for more information.
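For example, a minimal sketch of the workflow covered in the rest of this page (`'mfh3'` is one of the synthetic MFHartmann benchmark names listed further down):

```python
import mfpbench

# The synthetic 3-dimensional MFHartmann benchmark; no data download required.
benchmark = mfpbench.get("mfh3")

config = benchmark.sample()                          # sample a random config
result = benchmark.query(config, at=benchmark.end)   # evaluate at the maximum fidelity
print(result.error)
```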
**Quick Reference**

**Useful Properties**

* `.space` - The space of the benchmark.
* `.start` - The starting fidelity of the benchmark.
* `.end` - The end fidelity of the benchmark.
* `.fidelity_name` - The name of the fidelity.
* `.table` - The table backing a `TabularBenchmark`.
* `.Config` - The type of config used by the benchmark; attached to the benchmark object.
* `.Result` - The type of result used by the benchmark; attached to the benchmark object.

**Main Methods**

* `sample(n)` - Sample one or many configs from a benchmark.
* `query(config, at)` - Query a benchmark at a given fidelity.
* `trajectory(config)` - Get the full trajectory curve of a config.

**Other**

* `load()` - Load a benchmark into memory if not already loaded.
## Getting a benchmark

We try to make the normal use case of a benchmark as simple as possible.
For this we use `get()`. Each benchmark comes with its own `**kwargs`,
which you can find in the API documentation of `get()`.
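As a hedged sketch of how those `**kwargs` are passed through (this assumes the lcbench data has already been downloaded as described in the setup guide; `'3945'` is one of the task ids listed below):

```python
import mfpbench

# YAHPO-Gym's lcbench needs a task_id; other benchmarks take different kwargs.
benchmark = mfpbench.get("lcbench", task_id="3945", seed=1)
```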
**API**

`get()` - Get a benchmark.

| PARAMETER | DESCRIPTION |
|---|---|
| `name` | The name of the benchmark. |
| `prior` | The prior to use for the benchmark. `str` - if it ends in `.json`, `.yaml` or `.yml`, it will be converted to a path and used as a path to a config; otherwise it is treated as a preset. `Path` - a path to a file. `Config` - a `Config` object. `None` - use the default if available. |
| `preload` | Whether to preload the benchmark data. |
| `**kwargs` | Extra arguments, optional or required for other benchmarks. Please look up the associated benchmarks. |
For the `**kwargs`, please see the benchmarks listed below by `name=`.

**`name='lcbench'` (YAHPO-GYM)**

Possible `task_id=` values:
('3945', '7593', '34539', '126025', '126026', '126029', '146212', '167104', '167149', '167152', '167161', '167168', '167181', '167184', '167185', '167190', '167200', '167201', '168329', '168330', '168331', '168335', '168868', '168908', '168910', '189354', '189862', '189865', '189866', '189873', '189905', '189906', '189908', '189909')
| PARAMETER | DESCRIPTION |
|---|---|
| `task_id` | The task id to choose. |
| `seed` | The seed to use. |
| `datadir` | The path to where mfpbench stores its data. If left to `None`, the default data directory is used. |
| `seed` | The seed for the benchmark instance. |
| `prior` | The prior to use for the benchmark. If `None`, no prior is used. If a `str`, will check the local location first for a prior specific to this benchmark, otherwise assumes it to be a `Path`. If a `Path`, will load the prior from the path. If a `Mapping`, will be used directly. |
| `perturb_prior` | If given, will perturb the prior by this amount. Only used if a prior is given. |
| `session` | The onnxruntime session to use. If `None`, will create a new one. Not for the faint of heart: this is only a backdoor for onnx compatibility issues with YahpoGym. You are advised not to use this unless you know what you are doing. |
**`name='lm1b_transformer_2048'` (PD1)**

| PARAMETER | DESCRIPTION |
|---|---|
| `datadir` | Path to the data directory. |
| `seed` | The seed to use for the space. |
| `prior` | Any prior to use for the benchmark. |
| `perturb_prior` | Whether to perturb the prior. If specified, this is interpreted as the std of a normal from which to perturb numerical hyperparameters of the prior, and the raw probability of swapping a categorical value. |
**`name='uniref50_transformer_128'` (PD1)**

| PARAMETER | DESCRIPTION |
|---|---|
| `datadir` | Path to the data directory. |
| `seed` | The seed to use for the space. |
| `prior` | Any prior to use for the benchmark. |
| `perturb_prior` | Whether to perturb the prior. If specified, this is interpreted as the std of a normal from which to perturb numerical hyperparameters of the prior, and the raw probability of swapping a categorical value. |
**`name='cifar100_wideresnet_2048'` (PD1)**

| PARAMETER | DESCRIPTION |
|---|---|
| `datadir` | Path to the data directory. |
| `seed` | The seed to use for the space. |
| `prior` | Any prior to use for the benchmark. |
| `perturb_prior` | Whether to perturb the prior. If specified, this is interpreted as the std of a normal from which to perturb numerical hyperparameters of the prior, and the raw probability of swapping a categorical value. |
**`name='imagenet_resnet_512'` (PD1)**

| PARAMETER | DESCRIPTION |
|---|---|
| `datadir` | Path to the data directory. |
| `seed` | The seed to use for the space. |
| `prior` | Any prior to use for the benchmark. |
| `perturb_prior` | Whether to perturb the prior. If specified, this is interpreted as the std of a normal from which to perturb numerical hyperparameters of the prior, and the raw probability of swapping a categorical value. |
**`name='jahs'`**

Possible `task_id=` values:

| PARAMETER | DESCRIPTION |
|---|---|
| `task_id` | The specific task to use. |
| `datadir` | The path to where mfpbench stores its data. If left to `None`, the default data directory is used. |
| `seed` | The seed to give this benchmark instance. |
| `prior` | The prior to use for the benchmark. |
| `perturb_prior` | If given, will perturb the prior by this amount. Only used if a prior is given. |
**`name='mfh3'`**

| PARAMETER | DESCRIPTION |
|---|---|
| `seed` | The seed to use. |
| `bias` | How much bias to introduce. |
| `noise` | How much noise to introduce. |
| `prior` | The prior to use for the benchmark. |
| `perturb_prior` | If not `None`, will perturb the prior by this amount. For numericals, this is interpreted as the standard deviation of a normal distribution, while for categoricals, this is interpreted as the probability of swapping the value for a random one. |
**`name='mfh6'`**

| PARAMETER | DESCRIPTION |
|---|---|
| `seed` | The seed to use. |
| `bias` | How much bias to introduce. |
| `noise` | How much noise to introduce. |
| `prior` | The prior to use for the benchmark. |
| `perturb_prior` | If not `None`, will perturb the prior by this amount. For numericals, this is interpreted as the standard deviation of a normal distribution, while for categoricals, this is interpreted as the probability of swapping the value for a random one. |
**`name='lcbench_tabular'`**

Possible `task_id=` values:
('adult', 'airlines', 'albert', 'Amazon_employee_access', 'APSFailure', 'Australian', 'bank-marketing', 'blood-transfusion-service-center', 'car', 'christine', 'cnae-9', 'connect-4', 'covertype', 'credit-g', 'dionis', 'fabert', 'Fashion-MNIST', 'helena', 'higgs', 'jannis', 'jasmine', 'jungle_chess_2pcs_raw_endgame_complete', 'kc1', 'KDDCup09_appetency', 'kr-vs-kp', 'mfeat-factors', 'MiniBooNE', 'nomao', 'numerai28.6', 'phoneme', 'segment', 'shuttle', 'sylvine', 'vehicle', 'volkert')
| PARAMETER | DESCRIPTION |
|---|---|
| `task_id` | The task to benchmark on. |
| `datadir` | The directory to look for the data in. If left to `None`, the default data directory is used. |
| `remove_constants` | Whether to remove constant config columns from the data or not. |
| `seed` | The seed to use. |
| `prior` | The prior to use for the benchmark. If `None`, no prior is used. If a `str`, will check the local location first for a prior specific to this benchmark, otherwise assumes it to be a `Path`. If a `Path`, will load the prior from the path. If a `Mapping`, will be used directly. |
| `perturb_prior` | If not `None`, will perturb the prior by this amount. For numericals, this is interpreted as the standard deviation of a normal distribution, while for categoricals, this is interpreted as the probability of swapping the value for a random one. |
**Preloading benchmarks**

By default, benchmarks will not load in their required data or surrogate models. To have these ready and in memory, you can pass in `preload=True`.
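For example, a minimal sketch using the synthetic `'mfh3'` benchmark again:

```python
import mfpbench

# Load any required data or surrogate models up front rather than lazily.
benchmark = mfpbench.get("mfh3", preload=True)
```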
## Properties of Benchmarks

All benchmarks inherit from `Benchmark`, which has some useful properties we might want to know about:
print(f"Benchmark fidelity starts at: {benchmark.start}")
print(f"Benchmark fidelity ends at: {benchmark.end}")
print(f"Benchmark fidelity is called: {benchmark.fidelity_name}")
print(f"Benchmark has conditionals: {benchmark.has_conditionals}")
print("Benchmark has the following space")
print(benchmark.space)
Benchmark fidelity starts at: 3
Benchmark fidelity ends at: 100
Benchmark fidelity is called: z
Benchmark has conditionals: False
Benchmark has the following space
Configuration space object:
Hyperparameters:
mfh3
X_0, Type: UniformFloat, Range: [0.0, 1.0], Default: 0.5
X_1, Type: UniformFloat, Range: [0.0, 1.0], Default: 0.5
X_2, Type: UniformFloat, Range: [0.0, 1.0], Default: 0.5
## Sampling from a benchmark

To sample from a benchmark, we use the `sample()` method.
This method takes in a number of samples to return and returns a list of configs; called with no arguments, it returns a single config.

```python
config = benchmark.sample()
print(config)

configs = benchmark.sample(10, seed=2)
```
## Querying a benchmark

To query a benchmark, we use the `query()` method.
This method takes in a config and a fidelity to query at and returns the `Result` of the benchmark at that fidelity.
By default, this will query at the maximum fidelity, but you can pass `at=` to query at a different fidelity.

```python
value = benchmark.query(config)
print(value)

value = benchmark.query(config, at=benchmark.start)
print(value)
```
When querying a benchmark, you can get the entire trajectory curve of a config with `trajectory()`. This will be a `list[Result]`, one for each fidelity available.

```python
trajectory = benchmark.trajectory(config)
print(len(trajectory))

errors = [r.error for r in trajectory]

trajectory = benchmark.trajectory(config, frm=benchmark.start, to=benchmark.end // 2)
print(len(trajectory))
```
**Tip**

The `query()` and `trajectory()` functions can take in a `Config` object or anything that looks like a mapping.
## Working with `Config` objects

When interacting with a `Benchmark`, you will always be returned `Config` objects. These contain some simple methods to make working with them easier.
They behave like an immutable dictionary, so you can use them just as you would a read-only `dict`.
```python
config = benchmark.sample()

print("index", config["X_1"])
print("get", config.get("X_1231", 42))

for key, value in config.items():
    print(key, value)

print("contains", "X_1" in config)
print("len", len(config))
print("dict", dict(config))
```
**How is that done?**

This is done by inheriting from Python's `Mapping` class and implementing its methods, namely `__getitem__()`, `__iter__()` and `__len__()`. You can also implement things to look like lists, containers and other pythonic things!
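As an illustration only (this is not the actual `Config` implementation), a minimal read-only mapping looks like this:

```python
from collections.abc import Mapping


class FrozenConfig(Mapping):
    """Illustrative sketch of a read-only, dict-like config."""

    def __init__(self, values: dict[str, float]):
        self._values = dict(values)

    def __getitem__(self, key: str) -> float:
        return self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self) -> int:
        return len(self._values)


c = FrozenConfig({"X_0": 0.5, "X_1": 0.2})
print("X_0" in c, len(c), dict(c))  # True 2 {'X_0': 0.5, 'X_1': 0.2}
```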
* `Config.dict()` returns a dictionary of the config. This is useful for working with the config in other libraries.
* `Config.copy()` returns a new config with the same values.
* `Config.mutate()` takes in a dictionary of keys to values and returns a new config with those values changed.
* `Config.perturb()` takes in the space the config is from, a standard deviation and/or a categorical swap chance, and returns a new config with the values perturbed by a normal distribution with the given standard deviation and/or the categorical swap chance.
* `Config.save()` and `Config.from_file()` are used to save and load configs to and from disk.
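A hedged usage sketch of the methods above (the exact call styles are assumptions; for instance, whether `mutate()` takes keyword arguments or a dict, and the keyword name for the standard deviation in `perturb()`, may differ, so check the `Config` API reference):

```python
config = benchmark.sample()

as_dict = config.dict()    # plain dict, handy for other libraries
clone = config.copy()      # a new config with the same values

# Assumed call styles, for illustration only:
mutated = config.mutate(X_1=0.9)                   # change one value
noisy = config.perturb(benchmark.space, std=0.1)   # perturb within the space

config.save("my_config.yaml")                      # write to disk
restored = benchmark.Config.from_file("my_config.yaml")
```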
## Working with `Result` objects

When interacting with a `Benchmark`, all results will be communicated back with `Result` objects. These contain some simple methods to make working with them easier. Every benchmark has a different set of results available, but in general we try to make at least an `error` and a `score` available. We also make a `cost` available, which is often something like the time taken to train the specific config. The `error` and `score` attributes are usually validation errors and scores. Some benchmarks also provide a `test_error` and `test_score`, which are the test errors and scores, but not all of them do.
```python
config = benchmark.sample()
result = benchmark.query(config)

print("error", result.error)
print("cost", result.cost)
print(result)
```
These share the `dict()` and `from_dict()` methods with `Config` objects but do not behave like dictionaries.

The most notable property of `Result` objects is that they also carry the `fidelity` at which they were evaluated and the `config` that was evaluated to generate the result.
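For example (50 is just an arbitrary fidelity inside the `[start, end]` range of the MFHartmann benchmark used above):

```python
result = benchmark.query(config, at=50)
print(result.fidelity)  # the fidelity the result was evaluated at, i.e. 50
print(result.config)    # the config that produced this result
```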
## Tabular Benchmarks

Some benchmarks are tabular in nature, meaning they have a table of results that can be queried. These benchmarks inherit from `TabularBenchmark` and have a `table` property that is the ground source of truth for the benchmark. This table is a `pandas.DataFrame` and can be queried as such.
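For instance, given some tabular benchmark bound to a hypothetical `tabular_benchmark` variable, a sketch of poking at the raw table directly; the column and index names here follow the `GenericTabularBenchmark` example further below and will differ per benchmark:

```python
df = tabular_benchmark.table  # a pandas.DataFrame

# Standard pandas operations apply; in the example further below the frame
# is indexed by (config id, fidelity):
print(df.loc["a"])            # all fidelities of config "a"
print(df["error"].idxmin())   # (config id, fidelity) of the row with the lowest error
```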
In general, tabular benchmarks will construct themselves using the base `TabularBenchmark`. This requires the following arguments, which are used to normalize the table for efficient indexing and usage. Predefined tabular benchmarks, e.g. `LCBenchTabularBenchmark`, will fill these in for you.
**Required arguments for a `TabularBenchmark`**

The main required arguments are `.config_name`, `.fidelity_name`, `.config_keys` and `.result_keys`.
| PARAMETER | DESCRIPTION |
|---|---|
| `name` | The name of this benchmark. |
| `table` | The table to use for the benchmark. |
| `config_name` | The column in the table that contains the config id. |
| `fidelity_name` | The column in the table that contains the fidelity. |
| `result_keys` | The columns in the table that contain the results. |
| `config_keys` | The columns in the table that contain the config values. |
| `remove_constants` | Whether to remove constant config columns from the data or not. |
| `space` | The configuration space to use for the benchmark. If `None`, will just be an empty space. |
| `prior` | The prior to use for the benchmark. If `None`, no prior is used. If a string, will be treated as a prior specific to this benchmark if it can be found, otherwise assumed to be a `Path`. If a `Path`, will load the prior from the path. If a dict or `Configuration`, will be used directly. |
| `perturb_prior` | If not `None`, will perturb the prior by this amount. For numericals, this is interpreted as the standard deviation of a normal distribution, while for categoricals, this is interpreted as the probability of swapping the value for a random one. |
| `seed` | The seed to use for the benchmark. |
### Difference for `Config`

When working with tabular benchmarks, the config type used is a `TabularConfig`.
The one difference is that it includes an `.id` property that is used to identify the config in the table. This is what's used to retrieve results from the table.
If this is missing when doing a `query()`, we'll do our best to match the config to the table and get the correct id, but this is not guaranteed.
When using `dict()`, this `id` is not included in the dictionary.
In general, you should either store the `config` object itself, or at least `config.id`, so that you can include it back in before calling `query()`.
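A hedged sketch of that advice, assuming `benchmark` is some `TabularBenchmark` instance (how an id would be re-attached to a plain dict depends on the `TabularConfig` API; the safest route is to keep the config object around and pass it to `query()` directly):

```python
config = benchmark.sample()
stored_id = config.id        # keep the id alongside anything you serialize
raw = config.dict()          # note: does not contain the id

# Safest: query with the config object itself, which still carries its id.
result = benchmark.query(config, at=benchmark.end)
```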
### Using your own Tabular Data

To facilitate the use of your own tabular data, we provide a `GenericTabularBenchmark` that can be used to load in and use your own tabular data.
```python
import pandas as pd

from mfpbench import GenericTabularBenchmark

# Create some fake data
df = pd.DataFrame(
    {
        "config": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
        "fidelity": [1, 2, 3, 1, 2, 3, 1, 2, 3],
        "balanced_accuracy": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
        "color": ["red", "red", "red", "blue", "blue", "blue", "green", "green", "green"],
        "shape": ["circle", "circle", "circle", "square", "square", "square", "triangle", "triangle", "triangle"],
        "kind": ["mlp", "mlp", "mlp", "mlp", "mlp", "mlp", "mlp", "mlp", "mlp"],
    }
)
print(df)
print()
print()

benchmark = GenericTabularBenchmark(
    df,
    name="mydata",
    config_name="config",
    fidelity_name="fidelity",
    config_keys=["color", "shape"],
    result_keys=["balanced_accuracy"],
    result_mapping={
        "error": lambda df: 1 - df["balanced_accuracy"],
        "score": lambda df: df["balanced_accuracy"],
    },
    remove_constants=True,
)
print(benchmark.table)
```
```
  config  fidelity  balanced_accuracy  color     shape kind
0      a         1                0.1    red    circle  mlp
1      a         2                0.2    red    circle  mlp
2      a         3                0.3    red    circle  mlp
3      b         1                0.4   blue    square  mlp
4      b         2                0.5   blue    square  mlp
5      b         3                0.6   blue    square  mlp
6      c         1                0.7  green  triangle  mlp
7      c         2                0.8  green  triangle  mlp
8      c         3                0.9  green  triangle  mlp


                 balanced_accuracy  color     shape  error  score
config fidelity
a      1                       0.1    red    circle    0.9    0.1
       2                       0.2    red    circle    0.8    0.2
       3                       0.3    red    circle    0.7    0.3
b      1                       0.4   blue    square    0.6    0.4
       2                       0.5   blue    square    0.5    0.5
       3                       0.6   blue    square    0.4    0.6
c      1                       0.7  green  triangle    0.3    0.7
       2                       0.8  green  triangle    0.2    0.8
       3                       0.9  green  triangle    0.1    0.9
```
You can then operate on this benchmark as expected.
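For example, a short sketch using only the methods shown earlier on this page, applied to the `benchmark` constructed above:

```python
config = benchmark.sample()
result = benchmark.query(config, at=2)   # fidelities 1, 2 and 3 exist in the table
print(result.error, result.score)

trajectory = benchmark.trajectory(config)
print([r.error for r in trajectory])
```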
**API for `GenericTabularBenchmark`**

| PARAMETER | DESCRIPTION |
|---|---|
| `table` | The table to use for the benchmark. |
| `name` | The name of the benchmark. If `None`, a default name will be used. |
| `fidelity_name` | The column in the table that contains the fidelity. |
| `config_name` | The column in the table that contains the config id. |
| `result_keys` | The columns in the table that contain the results. |
| `config_keys` | The columns in the table that contain the config values. |
| `result_mapping` | A mapping from the result keys to the table keys. If a string, will be used as the key in the table. If a callable, will be called with the table and the result will be used as the value. |
| `remove_constants` | Whether to remove constant config columns from the data or not. |
| `space` | The configuration space to use for the benchmark. If `None`, will just be an empty space. |
| `seed` | The seed to use. |
| `prior` | The prior to use for the benchmark. If `None`, no prior is used. If a `str`, will check the local location first for a prior specific to this benchmark, otherwise assumes it to be a `Path`. If a `Path`, will load the prior from the path. If a `Mapping`, will be used directly. |
| `perturb_prior` | If not `None`, will perturb the prior by this amount. For numericals, this is interpreted as the standard deviation of a normal distribution, while for categoricals, this is interpreted as the probability of swapping the value for a random one. |