Implementation of Succesive Halving supporting multi-fidelity, multi-objective, and multi-processing.
Internally, a tracker keeps track of configurations and their bracket and stage.
The behaviour of this intensifier is as follows:
First, adds configurations from the runhistory to the tracker. The first stage is always filled-up. For example,
the user provided 4 configs with the tell-method but the first stage requires 8 configs: 4 new configs are
sampled and added together with the provided configs as a group to the tracker.
While loop:
If a trial in the tracker has not been yielded yet, yield it.
If we are running out of trials, we simply add a new batch of configurations to the first stage.
eta : int, defaults to 3
Input that controls the proportion of configurations discarded in each round of Successive Halving.
n_seeds : int, defaults to 1
How many seeds to use for each instance.
instance_seed_order : str, defaults to "shuffle_once"
How to order the instance-seed pairs. Can be set to:
- `None`: No shuffling at all and use the instance-seed order provided by the user.
- `shuffle_once`: Shuffle the instance-seed keys once and use the same order across all runs.
- `shuffle`: Shuffles the instance-seed keys for each bracket individually.
incumbent_selection : str, defaults to "highest_observed_budget"
How to select the incumbent when using budgets. Can be set to:
- `any_budget`: Incumbent is the best on any budget i.e., best performance regardless of budget.
- `highest_observed_budget`: Incumbent is the best in the highest budget run so far.
- `highest_budget`: Incumbent is selected only based on the highest budget.
max_incumbents : int, defaults to 10
How many incumbents to keep track of in the case of multi-objective.
seed : int, defaults to None
Internal seed used for random events like shuffle seeds.
Source code in smac/intensifier/successive_halving.py
def__init__(self,scenario:Scenario,eta:int=3,n_seeds:int=1,instance_seed_order:str|None="shuffle_once",max_incumbents:int=10,incumbent_selection:str="highest_observed_budget",seed:int|None=None,):super().__init__(scenario=scenario,n_seeds=n_seeds,max_incumbents=max_incumbents,seed=seed,)self._eta=etaself._instance_seed_order=instance_seed_orderself._incumbent_selection=incumbent_selectionself._highest_observed_budget_only=Falseifincumbent_selection=="any_budget"elseTrue# Global variables derived from scenarioself._min_budget=self._scenario.min_budgetself._max_budget=self._scenario.max_budget
def__post_init__(self)->None:"""Post initialization steps after the runhistory has been set."""super().__post_init__()# We generate our instance seed pairs onceis_keys=self.get_instance_seed_keys_of_interest()# Budgets, followed by lots of sanity-checkingeta=self._etamin_budget=self._min_budgetmax_budget=self._max_budgetifmax_budgetisnotNoneandmin_budgetisnotNoneandmax_budget<min_budget:raiseValueError("Max budget has to be larger than min budget.")ifself.uses_instances:ifisinstance(min_budget,float)orisinstance(max_budget,float):raiseValueError("Successive Halving requires integer budgets when using instances.")min_budget=min_budgetifmin_budgetisnotNoneelse1max_budget=max_budgetifmax_budgetisnotNoneelselen(is_keys)ifmax_budget>len(is_keys):raiseValueError(f"Max budget of {max_budget} can not be greater than the number of instance-seed "f"keys ({len(is_keys)}).")ifmax_budget<len(is_keys):logger.warning(f"Max budget {max_budget} does not include all instance seed "f"pairs ({len(is_keys)}).")else:ifmin_budgetisNoneormax_budgetisNone:raiseValueError("Successive Halving requires the parameters min_budget and max_budget defined in the scenario.")iflen(is_keys)!=1:raiseValueError("Successive Halving supports only one seed when using budgets.")ifmin_budgetisNoneormin_budget<=0:raiseValueError("Min budget has to be larger than 0.")budget_type="INSTANCES"ifself.uses_instanceselse"BUDGETS"logger.info(f"Successive Halving uses budget type {budget_type} with eta {eta}, "f"min budget {min_budget}, and max budget {max_budget}.")# Pre-computing Successive Halving variablesmax_iter=self._get_max_iterations(eta,max_budget,min_budget)budgets,n_configs=self._compute_configs_and_budgets_for_stages(eta,max_budget,max_iter)# Global variablesself._min_budget=min_budgetself._max_budget=max_budget# Stage variables, depending on the bracket (0 is the bracket here since SH only has one bracket)self._max_iterations:dict[int,int]={0:max_iter+1}self._n_configs_in_stage:dict[int,list]={0:n_configs}self._budgets_in_stage:dict[int,list]={0:budgets}
The intensifier makes use of a callback to efficiently update the incumbent based on the runhistory
(every time new information is available). Moreover, incorporating the callback here allows developers
more options in the future.
Source code in smac/intensifier/abstract_intensifier.py
defget_callback(self)->Callback:"""The intensifier makes use of a callback to efficiently update the incumbent based on the runhistory (every time new information is available). Moreover, incorporating the callback here allows developers more options in the future. """classRunHistoryCallback(Callback):def__init__(self,intensifier:AbstractIntensifier):self.intensifier=intensifierdefon_tell_end(self,smbo:smac.main.smbo.SMBO,info:TrialInfo,value:TrialValue)->None:self.intensifier.update_incumbents(info.config)returnRunHistoryCallback(self)
defget_incumbent(self)->Configuration|None:"""Returns the current incumbent in a single-objective setting."""ifself._scenario.count_objectives()>1:raiseValueError("Cannot get a single incumbent for multi-objective optimization.")iflen(self._incumbents)==0:returnNoneassertlen(self._incumbents)==1returnself._incumbents[0]
There are situations in which incumbents are evaluated on more trials than others. This method returns the
instances that are not part of the lowest intersection of instances for all incumbents.
Source code in smac/intensifier/abstract_intensifier.py
defget_incumbent_instance_seed_budget_key_differences(self,compare:bool=False)->list[InstanceSeedBudgetKey]:"""There are situations in which incumbents are evaluated on more trials than others. This method returns the instances that are not part of the lowest intersection of instances for all incumbents. """incumbents=self.get_incumbents()iflen(incumbents)>0:# We want to calculate the differences so that we can evaluate the other incumbents on the same instancesincumbent_isb_keys=[self.get_instance_seed_budget_keys(incumbent,compare)forincumbentinincumbents]iflen(incumbent_isb_keys)<=1:return[]# Compute the actual differencesintersection_isb_keys=set.intersection(*map(set,incumbent_isb_keys))# type: ignoreunion_isb_keys=set.union(*map(set,incumbent_isb_keys))# type: ignoreincumbent_isb_keys=list(union_isb_keys-intersection_isb_keys)# type: ignoreiflen(incumbent_isb_keys)==0:return[]returnincumbent_isb_keys# type: ignorereturn[]
defget_incumbent_instance_seed_budget_keys(self,compare:bool=False)->list[InstanceSeedBudgetKey]:"""Find the lowest intersection of instance-seed-budget keys for all incumbents."""incumbents=self.get_incumbents()iflen(incumbents)>0:# We want to calculate the smallest set of trials that is used by all incumbents# Reason: We can not fairly compare otherwiseincumbent_isb_keys=[self.get_instance_seed_budget_keys(incumbent,compare)forincumbentinincumbents]instances=list(set.intersection(*map(set,incumbent_isb_keys)))# type: ignorereturninstances# type: ignorereturn[]
Returns the incumbents (points on the pareto front) of the runhistory as copy. In case of a single-objective
optimization, only one incumbent (if is) is returned.
configs : list[Configuration]
The configs of the Pareto front.
sort_by : str, defaults to None
Sort the trials by cost (lowest cost first) or num_trials (config with lowest number of trials
first).
Source code in smac/intensifier/abstract_intensifier.py
defget_incumbents(self,sort_by:str|None=None)->list[Configuration]:"""Returns the incumbents (points on the pareto front) of the runhistory as copy. In case of a single-objective optimization, only one incumbent (if is) is returned. Returns ------- configs : list[Configuration] The configs of the Pareto front. sort_by : str, defaults to None Sort the trials by ``cost`` (lowest cost first) or ``num_trials`` (config with lowest number of trials first). """rh=self.runhistoryifsort_by=="cost":returnlist(sorted(self._incumbents,key=lambdaconfig:rh._cost_per_config[rh.get_config_id(config)]))elifsort_by=="num_trials":returnlist(sorted(self._incumbents,key=lambdaconfig:len(rh.get_trials(config))))elifsort_byisNone:returnlist(self._incumbents)else:raiseValueError(f"Unknown sort_by value: {sort_by}.")
Returns the instance-seed-budget keys for a given configuration. This method supports highest_budget,
which only returns the instance-seed-budget keys for the highest budget (if specified). In this case, the
incumbents in update_incumbents are only changed if the costs on the highest budget are lower.
config: Configuration
The Configuration to be queried
compare : bool, defaults to False
Get rid of the budget information for comparing if the configuration was evaluated on the same
instance-seed keys.
Source code in smac/intensifier/successive_halving.py
defget_instance_seed_budget_keys(self,config:Configuration,compare:bool=False)->list[InstanceSeedBudgetKey]:"""Returns the instance-seed-budget keys for a given configuration. This method supports ``highest_budget``, which only returns the instance-seed-budget keys for the highest budget (if specified). In this case, the incumbents in ``update_incumbents`` are only changed if the costs on the highest budget are lower. Parameters ---------- config: Configuration The Configuration to be queried compare : bool, defaults to False Get rid of the budget information for comparing if the configuration was evaluated on the same instance-seed keys. """isb_keys=self.runhistory.get_instance_seed_budget_keys(config,highest_observed_budget_only=self._highest_observed_budget_only)# If incumbent should only be changed on the highest budget, we have to kick out all budgets below the highestifself.uses_budgetsandself._incumbent_selection=="highest_budget":isb_keys=[keyforkeyinisb_keysifkey.budget==self._max_budget]ifcompare:# Get rid of duplicatesisb_keys=list(set([InstanceSeedBudgetKey(instance=key.instance,seed=key.seed,budget=None)forkeyinisb_keys]))returnisb_keys
Returns a list of instance-seed keys. Considers seeds and instances from the
runhistory (self._tf_seeds and self._tf_instances). If no seeds or instances were found, new
seeds and instances are generated based on the global intensifier seed.
validate : bool, defaults to False
Whether to get validation trials or training trials. The only difference lies in different seeds.
seed : int | None, defaults to None
The seed used for the validation trials.
defget_instance_seed_keys_of_interest(self,*,validate:bool=False,seed:int|None=None,)->list[InstanceSeedKey]:"""Returns a list of instance-seed keys. Considers seeds and instances from the runhistory (``self._tf_seeds`` and ``self._tf_instances``). If no seeds or instances were found, new seeds and instances are generated based on the global intensifier seed. Warning ------- The passed seed is only used for validation. For training, the global intensifier seed is used. Parameters ---------- validate : bool, defaults to False Whether to get validation trials or training trials. The only difference lies in different seeds. seed : int | None, defaults to None The seed used for the validation trials. Returns ------- instance_seed_keys : list[InstanceSeedKey] Instance-seed keys of interest. """ifself._runhistoryisNone:raiseRuntimeError("Please set the runhistory before calling this method.")iflen(self._tf_instances)==0:raiseRuntimeError("Please call __post_init__ before calling this method.")ifseedisNone:seed=0# We cache the instance-seed keys for efficiency and consistency reasonsif(self._instance_seed_keysisNoneandnotvalidate)or(self._instance_seed_keys_validationisNoneandvalidate):instance_seed_keys:list[InstanceSeedKey]=[]ifvalidate:rng=np.random.RandomState(seed)else:rng=self._rngi=0whileTrue:found_enough_configs=(self._max_config_callsisnotNoneandlen(instance_seed_keys)>=self._max_config_calls)used_enough_seeds=self._n_seedsisnotNoneandi>=self._n_seedsiffound_enough_configsorused_enough_seeds:breakifvalidate:next_seed=int(rng.randint(low=0,high=MAXINT,size=1)[0])else:try:next_seed=self._tf_seeds[i]logger.info(f"Added existing seed {next_seed} from runhistory to the intensifier.")exceptIndexError:# Use global random generator for a new seed and mark it so it will be reused for another confignext_seed=int(rng.randint(low=0,high=MAXINT,size=1)[0])# This line here is really important because we don't want to add the same seed twiceifnext_seedinself._tf_seeds:continueself._tf_seeds.append(next_seed)logger.debug(f"Added a new random seed {next_seed} to the intensifier.")# If no instances are used, tf_instances includes Noneforinstanceinself._tf_instances:instance_seed_keys.append(InstanceSeedKey(instance,next_seed))# Only use one seed in deterministic caseifself._scenario.deterministic:logger.info("Using only one seed for deterministic scenario.")break# Seed counteri+=1# Now we cut so that we only have max_config_calls instance_seed_keys# We favor instances over seeds here: That makes sure we always work with the same instance/seed pairsifself._max_config_callsisnotNone:iflen(instance_seed_keys)>self._max_config_calls:instance_seed_keys=instance_seed_keys[:self._max_config_calls]logger.info(f"Cut instance-seed keys to {self._max_config_calls} entries.")# Set it globallyifnotvalidate:self._instance_seed_keys=instance_seed_keyselse:self._instance_seed_keys_validation=instance_seed_keysifnotvalidate:assertself._instance_seed_keysisnotNoneinstance_seed_keys=self._instance_seed_keyselse:assertself._instance_seed_keys_validationisnotNoneinstance_seed_keys=self._instance_seed_keys_validationreturninstance_seed_keys.copy()
defget_rejected_configs(self)->list[Configuration]:"""Returns rejected configurations when racing against the incumbent failed."""configs=[]forrejected_config_idinself._rejected_config_ids:configs.append(self.runhistory._ids_config[rejected_config_id])returnconfigs
defload(self,filename:str|Path)->None:"""Loads the latest state of the intensifier including the incumbents and trajectory."""ifisinstance(filename,str):filename=Path(filename)try:withopen(filename)asfp:data=json.load(fp)exceptExceptionase:logger.warning(f"Encountered exception {e} while reading runhistory from {filename}. Not adding any trials!")return# We reset the intensifier and then reset the runhistoryself.reset()ifself._runhistoryisnotNone:self.runhistory=self._runhistoryself._incumbents=[self.runhistory.get_config(config_id)forconfig_idindata["incumbent_ids"]]self._incumbents_changed=data["incumbents_changed"]self._rejected_config_ids=data["rejected_config_ids"]self._trajectory=[TrajectoryItem(**item)foritemindata["trajectory"]]self.set_state(data["state"])
defprint_tracker(self)->None:"""Prints the number of configurations in each bracket/stage."""messages=[]for(bracket,stage),othersinself._tracker.items():counter=0for_,config_idsinothers:counter+=len(config_ids)ifcounter>0:messages.append(f"--- Bracket {bracket} / Stage {stage}: {counter} configs")iflen(messages)>0:logger.debug(f"{self.__class__.__name__} statistics:")formessageinmessages:logger.debug(message)
defreset(self)->None:"""Reset the internal variables of the intensifier including the tracker."""super().reset()# States# dict[tuple[bracket, stage], list[tuple[seed to shuffle instance-seed keys, list[config_id]]]self._tracker:dict[tuple[int,int],list[tuple[int|None,list[Configuration]]]]=defaultdict(list)
defsave(self,filename:str|Path)->None:"""Saves the current state of the intensifier. In addition to the state (retrieved by ``get_state``), this method also saves the incumbents and trajectory. """ifisinstance(filename,str):filename=Path(filename)assertstr(filename).endswith(".json")filename.parent.mkdir(parents=True,exist_ok=True)data={"incumbent_ids":[self.runhistory.get_config_id(config)forconfiginself._incumbents],"rejected_config_ids":self._rejected_config_ids,"incumbents_changed":self._incumbents_changed,"trajectory":[dataclasses.asdict(item)foriteminself._trajectory],"state":self.get_state(),}withopen(filename,"w")asfp:json.dump(data,fp,indent=2,cls=NumpyEncoder)
Updates the incumbents. This method is called everytime a trial is added to the runhistory. Since only
the affected config and the current incumbents are used, this method is very efficient. Furthermore, a
configuration is only considered incumbent if it has a better performance on all incumbent instances.
Crucially, if there is no incumbent (at the start) then, the first configuration assumes
incumbent status. For the next configuration, we need to check if the configuration
is better on all instances that have been evaluated for the incumbent. If this is the
case, then we can replace the incumbent. Otherwise, a) we need to requeue the config to
obtain the missing instance-seed-budget combination or b) mark this configuration as
inferior ("rejected") to not consider it again. The comparison behaviour is controlled by
self.get_instance_seed_budget_keys() and self.get_incumbent_instance_seed_budget_keys().
Notably, this method is written to support both multi-fidelity and multi-objective
optimization. While the get_instance_seed_budget_keys() method and
self.get_incumbent_instance_seed_budget_keys() are used for the multi-fidelity behaviour,
calculate_pareto_front() is used as a hard coded way to support multi-objective
optimization, including the single objective as special case. calculate_pareto_front()
is called on the set of all (in case of MO) incumbents amended with the challenger
configuration, provided it has a sufficient overlap in seed-instance-budget combinations.
Lastly, if we have a self._max_incumbents and the pareto front provides more than this
specified amount, we cut the incumbents using crowding distance.
Source code in smac/intensifier/abstract_intensifier.py
defupdate_incumbents(self,config:Configuration)->None:"""Updates the incumbents. This method is called everytime a trial is added to the runhistory. Since only the affected config and the current incumbents are used, this method is very efficient. Furthermore, a configuration is only considered incumbent if it has a better performance on all incumbent instances. Crucially, if there is no incumbent (at the start) then, the first configuration assumes incumbent status. For the next configuration, we need to check if the configuration is better on all instances that have been evaluated for the incumbent. If this is the case, then we can replace the incumbent. Otherwise, a) we need to requeue the config to obtain the missing instance-seed-budget combination or b) mark this configuration as inferior ("rejected") to not consider it again. The comparison behaviour is controlled by self.get_instance_seed_budget_keys() and self.get_incumbent_instance_seed_budget_keys(). Notably, this method is written to support both multi-fidelity and multi-objective optimization. While the get_instance_seed_budget_keys() method and self.get_incumbent_instance_seed_budget_keys() are used for the multi-fidelity behaviour, calculate_pareto_front() is used as a hard coded way to support multi-objective optimization, including the single objective as special case. calculate_pareto_front() is called on the set of all (in case of MO) incumbents amended with the challenger configuration, provided it has a sufficient overlap in seed-instance-budget combinations. Lastly, if we have a self._max_incumbents and the pareto front provides more than this specified amount, we cut the incumbents using crowding distance. """rh=self.runhistory# What happens if a config was rejected, but it appears again? Give it another try even if it# has already been evaluated? Yes!# Associated trials and idconfig_isb_keys=self.get_instance_seed_budget_keys(config)config_id=rh.get_config_id(config)config_hash=get_config_hash(config)# We skip updating incumbents if no instances are available# Note: This is especially the case if trials of a config are still running# because if trials are running, the runhistory does not update the trials in the fast data structureiflen(config_isb_keys)==0:logger.debug(f"No relevant instances evaluated for config {config_hash}. Updating incumbents is skipped.")return# Now we get the incumbents and see which trials have been usedincumbents=self.get_incumbents()incumbent_ids=[rh.get_config_id(c)forcinincumbents]# Find the lowest intersection of instance-seed-budget keys for all incumbents.incumbent_isb_keys=self.get_incumbent_instance_seed_budget_keys()# Save for laterprevious_incumbents=incumbents.copy()previous_incumbent_ids=incumbent_ids.copy()# Little sanity check here for consistencyiflen(incumbents)>0:assertincumbent_isb_keysisnotNoneassertlen(incumbent_isb_keys)>0# If there are no incumbents at all, we just use the new config as new incumbent# Problem: We can add running incumbentsiflen(incumbents)==0:# incumbent_isb_keys is None and len(incumbents) == 0:logger.info(f"Added config {config_hash} as new incumbent because there are no incumbents yet.")self._update_trajectory([config])# Nothing else to doreturn# Comparison keys# This one is a bit tricky: We would have problems if we compare with budgets because we might have different# scenarios (depending on the incumbent selection specified in Successive Halving).# 1) Any budget/highest observed budget: We want to get rid of the budgets because if we know it is calculated# on the same instance-seed already then we are ready to go. Imagine we would check for the same budgets,# then the configs can not be compared although the user does not care on which budgets configurations have# been evaluated.# 2) Highest budget: We only want to compare the configs if they are evaluated on the highest budget.# Here we do actually care about the budgets. Please see the ``get_instance_seed_budget_keys`` method from# Successive Halving to get more information.# Noitce: compare=True only takes effect when subclass implemented it. -- e.g. in SH it# will remove the budgets from the keys.config_isb_comparison_keys=self.get_instance_seed_budget_keys(config,compare=True)# Find the lowest intersection of instance-seed-budget keys for all incumbents.config_incumbent_isb_comparison_keys=self.get_incumbent_instance_seed_budget_keys(compare=True)# Now we have to check if the new config has been evaluated on the same keys as the incumbentsifnotall([keyinconfig_isb_comparison_keysforkeyinconfig_incumbent_isb_comparison_keys]):# We can not tell if the new config is better/worse than the incumbents because it has not been# evaluated on the necessary trialslogger.debug(f"Could not compare config {config_hash} with incumbents because it's evaluated on "f"different trials.")# The config has to go to a queue now as it is a challenger and a potential incumbentreturnelse:# If all instances are available and the config is incumbent and even evaluated on more trials# then there's nothing we can doifconfiginincumbentsandlen(config_isb_keys)>len(incumbent_isb_keys):logger.debug("Config is already an incumbent but can not be compared to other incumbents because ""the others are missing trials.")return# Add config to incumbents so that we compare only the new config and existing incumbentsifconfignotinincumbents:incumbents.append(config)incumbent_ids.append(config_id)# Now we get all instance-seed-budget keys for each incumbent (they might be different when using budgets)all_incumbent_isb_keys=[]forincumbentinincumbents:all_incumbent_isb_keys.append(self.get_instance_seed_budget_keys(incumbent))# We compare the incumbents now and only return the ones on the pareto frontnew_incumbents=calculate_pareto_front(rh,incumbents,all_incumbent_isb_keys)new_incumbent_ids=[rh.get_config_id(c)forcinnew_incumbents]iflen(previous_incumbents)==len(new_incumbents):ifprevious_incumbents==new_incumbents:# No changes in the incumbents, we need this clause because we can't use set difference thenifconfig_idinnew_incumbent_ids:self._remove_rejected_config(config_id)else:# config worse than incumbents and thus rejectedself._add_rejected_config(config_id)returnelse:# In this case, we have to determine which config replaced which incumbent and reject itremoved_incumbent_id=list(set(previous_incumbent_ids)-set(new_incumbent_ids))[0]removed_incumbent_hash=get_config_hash(rh.get_config(removed_incumbent_id))self._add_rejected_config(removed_incumbent_id)ifremoved_incumbent_id==config_id:logger.debug(f"Rejected config {config_hash} because it is not better than the incumbents on "f"{len(config_isb_keys)} instances.")else:self._remove_rejected_config(config_id)logger.info(f"Added config {config_hash} and rejected config {removed_incumbent_hash} as incumbent because "f"it is not better than the incumbents on {len(config_isb_keys)} instances: ")print_config_changes(rh.get_config(removed_incumbent_id),config,logger=logger)eliflen(previous_incumbents)<len(new_incumbents):# Config becomes a new incumbent; nothing is rejected in this caseself._remove_rejected_config(config_id)logger.info(f"Config {config_hash} is a new incumbent. "f"Total number of incumbents: {len(new_incumbents)}.")else:# There might be situations that the incumbents might be removed because of updated cost information of# configforincumbentinprevious_incumbents:ifincumbentnotinnew_incumbents:self._add_rejected_config(incumbent)logger.debug(f"Removed incumbent {get_config_hash(incumbent)} because of the updated costs from config "f"{config_hash}.")# Cut incumbents: We only want to keep a specific number of incumbents# We use the crowding distance for thatiflen(new_incumbents)>self._max_incumbents:new_incumbents=sort_by_crowding_distance(rh,new_incumbents,all_incumbent_isb_keys)new_incumbents=new_incumbents[:self._max_incumbents]# or random?# idx = self._rng.randint(0, len(new_incumbents))# del new_incumbents[idx]# del new_incumbent_ids[idx]logger.info(f"Removed one incumbent using crowding distance because more than {self._max_incumbents} are ""available.")self._update_trajectory(new_incumbents)