Configuring and Running Optimizations
The neps.run function is the core of the NePS optimization process, where the search for the best hyperparameters
and architectures takes place. This document outlines the arguments and options available within this function,
providing a detailed guide to customize the optimization process to your specific needs.
Search Strategy
By default, NePS selects a search strategy based on the search space you define in pipeline_space.
The characteristics of that search space determine which optimizer NePS chooses, so that the strategy fits the
hyperparameter and/or architecture optimization problem at hand. You can also manually select a specific or
custom optimizer that better matches your needs. For more information, refer here.
Arguments
Mandatory Arguments
run_pipeline
(function): The objective function that NePS minimizes by evaluating various configurations. It receives a configuration as input and should return either a dictionary or a single loss value. For correct setup instructions, refer to here.
pipeline_space
(dict | yaml | configspace): This defines the search space for the configurations from which the optimizer samples. It accepts either a dictionary with the configuration names as keys, a path to a YAML configuration file, or a ConfigSpace.ConfigurationSpace object. For comprehensive information and examples, please refer to the detailed guide available here.
root_directory
(str): The directory path where information about the optimization and its progress is stored. It is also used to synchronize multiple calls to neps.run for parallelization.
Budget: To define a budget, provide either or both of the following parameters:
max_evaluations_total
(int, default: None): Specifies the total number of evaluations to conduct before halting the optimization process.
max_cost_total
(int, default: None): Prevents the initiation of new evaluations once this cost threshold is surpassed. This requires adding a cost value to the output of the run_pipeline function, for example, return {'loss': loss, 'cost': cost}. For more details, please refer here.
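The mandatory arguments above can be put together roughly as follows. This is a minimal sketch, not an official template: the toy objective, the hyperparameter names, and the results/example_run directory are hypothetical, and the parameter classes neps.FloatParameter and neps.IntegerParameter are an assumption about the NePS release in use (other releases may name them differently). It also assumes NePS passes each sampled hyperparameter to run_pipeline as a keyword argument.

```python
import time


def run_pipeline(learning_rate: float, num_layers: int) -> dict:
    """Toy objective (hypothetical); assumes hyperparameters arrive as keyword arguments."""
    start = time.perf_counter()
    # Stand-in for training and validating a model; replace with real code.
    loss = (learning_rate - 0.01) ** 2 + 0.001 * num_layers
    # The cost is only needed when max_cost_total is used; here it is elapsed seconds.
    cost = time.perf_counter() - start
    return {"loss": loss, "cost": cost}


def main() -> None:
    # Imported lazily so run_pipeline can be tested without NePS installed.
    import neps

    # Assumption: these parameter class names match the installed NePS release.
    pipeline_space = {
        "learning_rate": neps.FloatParameter(lower=1e-5, upper=1e-1, log=True),
        "num_layers": neps.IntegerParameter(lower=1, upper=8),
    }
    neps.run(
        run_pipeline=run_pipeline,
        pipeline_space=pipeline_space,
        root_directory="results/example_run",  # hypothetical path
        max_evaluations_total=25,
    )
```

Calling main() starts the optimization; run_pipeline can be checked on its own beforehand by calling it with a hand-picked configuration.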
Optional Arguments
Further Monitoring Options
overwrite_working_directory
(bool, default: False): When set to True, the working directory specified by root_directory is cleared at the beginning of the run. This is useful, for example, when debugging a run_pipeline function.
post_run_summary
(bool, default: False): When enabled, this option generates a summary CSV file upon completion of the optimization process. The summary includes details of the optimization procedure, such as the best configuration, the number of errors that occurred, and the final performance metrics.
development_stage_id
(int | float | str, default: None): An optional identifier used when working with multiple development stages. Instead of creating new root directories, use this identifier to save the results of an optimization run in a separate dev_id folder within the root_directory.
task_id
(int | float | str, default: None): An optional identifier used when the optimization process involves multiple tasks. This functions similarly to development_stage_id, but it creates a folder named after the task_id instead of dev_id, providing an organized way to separate results for different tasks within the root_directory.
Parallelization Setup
max_evaluations_per_run
(int, default: None): Limits the number of evaluations for this specific call of neps.run.
continue_until_max_evaluation_completed
(bool, default: False): In parallel setups, pending evaluations normally count towards max_evaluations_total, so no new evaluations are started once this limit is reached. Setting this to True keeps sampling new evaluations until the number of completed ones reaches max_evaluations_total, which can make better use of resources in time-sensitive scenarios.
For an overview and further resources on how NePS supports parallelization in distributed systems, refer to the Parallelization Overview.
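The effect of continue_until_max_evaluation_completed on the budget check can be illustrated with a small stand-alone function. This is only a sketch of the behaviour described above, not NePS's internal implementation; the function name and its completed and pending counters are hypothetical.

```python
def may_start_new_evaluation(
    completed: int,
    pending: int,
    max_evaluations_total: int,
    continue_until_max_evaluation_completed: bool = False,
) -> bool:
    """Return True if a worker may sample another configuration (illustrative only)."""
    if continue_until_max_evaluation_completed:
        # Only finished evaluations count against the budget.
        counted = completed
    else:
        # Default: evaluations still running on other workers count as well.
        counted = completed + pending
    return counted < max_evaluations_total
```

With 8 completed and 2 pending evaluations and a budget of 10, the default setting stops sampling, while enabling the flag lets idle workers keep starting evaluations until 10 have completed.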
Handling Errors
loss_value_on_error
(float, default: None): When set, an error raised while evaluating a configuration does not halt the process; instead, the specified loss value is used for that configuration.
cost_value_on_error
(float, default: None): Similar to loss_value_on_error, but for the cost value.
ignore_errors
(bool, default: False): If True, errors encountered during the evaluation of configurations are ignored and the optimization continues. Note: These failed configurations still count towards max_evaluations_total.
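The fallback behaviour of loss_value_on_error and cost_value_on_error can be pictured with the following stand-alone sketch. It mimics the semantics described above and is not how NePS implements them; evaluate_with_fallback and flaky_pipeline are hypothetical names introduced for illustration.

```python
from typing import Callable, Optional


def evaluate_with_fallback(
    run_pipeline: Callable[..., dict],
    config: dict,
    loss_value_on_error: Optional[float] = None,
    cost_value_on_error: Optional[float] = None,
) -> dict:
    """Evaluate one configuration, substituting fallback values if it raises."""
    try:
        return run_pipeline(**config)
    except Exception:
        if loss_value_on_error is None:
            # No fallback configured: the error propagates and stops the run.
            raise
        result = {"loss": loss_value_on_error}
        if cost_value_on_error is not None:
            result["cost"] = cost_value_on_error
        return result


def flaky_pipeline(x: float) -> dict:
    """Hypothetical objective that fails for negative inputs."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return {"loss": x ** 0.5}
```

A large loss_value_on_error steers the search away from failing regions of the search space while still letting the run finish.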
Search Strategy Customization
searcher
(Literal["bayesian_optimization", "hyperband", ...] | BaseOptimizer, default: "default"): Manually specifies which optimization strategy to use. Provide a string identifying one of the built-in search strategies or an instance of a custom BaseOptimizer.
searcher_path
(Path | str, default: None): A path to a custom searcher implementation.
**searcher_kwargs
: Additional keyword arguments to be passed to the searcher.
For more information about the available searchers and how to customize your own, refer here.
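Selecting a built-in strategy by name might look like the sketch below. The toy objective, the helper run_with_searcher, and the results directory are hypothetical, and neps.FloatParameter is an assumption about the installed NePS release; only the searcher strings listed above are taken from this page.

```python
def run_pipeline(x: float) -> float:
    """Toy objective (hypothetical) returning a sole loss value."""
    return (x - 0.5) ** 2


def run_with_searcher(searcher: str = "bayesian_optimization") -> None:
    # Imported lazily so run_pipeline can be tested without NePS installed.
    import neps

    neps.run(
        run_pipeline=run_pipeline,
        # Assumption: this parameter class name matches the installed NePS release.
        pipeline_space={"x": neps.FloatParameter(lower=0.0, upper=1.0)},
        root_directory=f"results/{searcher}",  # hypothetical path, one folder per strategy
        max_evaluations_total=20,
        searcher=searcher,
        # Further keyword arguments placed here would be forwarded to the searcher.
    )
```

Using a separate root_directory per strategy keeps the results of different searchers from mixing.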
Others
pre_load_hooks
(Iterable, default: None): A list of hook functions to be called before loading results.
Parallelization
neps.run can be called multiple times, from multiple processes or machines, to parallelize the optimization process.
Ensure that root_directory points to a location shared by all instances so that they can synchronize their optimization efforts.
For more information look here
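On a single machine, several workers sharing one root_directory could be started as sketched here with Python's multiprocessing module. This is one possible setup under stated assumptions, not a prescribed one: the objective, the ROOT_DIRECTORY path, and the worker and launch_workers helpers are hypothetical, and neps.FloatParameter is an assumption about the installed NePS release. Starting the same script several times, for example as separate cluster jobs, follows the same idea.

```python
from multiprocessing import Process

# Hypothetical path; it must be reachable by every worker.
ROOT_DIRECTORY = "results/shared_run"


def run_pipeline(x: float) -> float:
    """Toy objective (hypothetical) returning a sole loss value."""
    return (x - 0.5) ** 2


def worker() -> None:
    # Imported lazily so the module can be loaded without NePS installed.
    import neps

    # Every worker issues the same call; the shared root_directory is what
    # lets the calls synchronize with each other.
    neps.run(
        run_pipeline=run_pipeline,
        # Assumption: this parameter class name matches the installed NePS release.
        pipeline_space={"x": neps.FloatParameter(lower=0.0, upper=1.0)},
        root_directory=ROOT_DIRECTORY,
        max_evaluations_total=20,
    )


def launch_workers(num_workers: int = 2) -> None:
    """Start num_workers processes and wait for all of them to finish."""
    processes = [Process(target=worker) for _ in range(num_workers)]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
```

When run as a script, launch_workers() should be called under an `if __name__ == "__main__":` guard, which multiprocessing requires on platforms that spawn new interpreter processes.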
Customization
The neps.run function allows for extensive customization through its arguments, enabling you to adapt the
optimization process to the complexities of your specific problems.
For a deeper understanding of how to use neps.run in a practical scenario, take a look at our
examples and templates.