Portfolio
A portfolio in meta-learning is a set (ordered or not) of configurations chosen to maximize some notion of coverage across datasets or tasks. The intuition is that a portfolio which covers many seen datasets is likely to cover a new, unseen dataset as well.
Suppose we are given the performances of some configurations across some datasets:
```python
import pandas as pd

performances = {
    "c1": [90, 60, 20, 10],
    "c2": [20, 10, 90, 20],
    "c3": [10, 20, 40, 90],
    "c4": [90, 10, 10, 10],
}
portfolio = pd.DataFrame(performances, index=["dataset_1", "dataset_2", "dataset_3", "dataset_4"])
print(portfolio)
```
If we could only choose `k=3` of these configurations to try on some new dataset, which ones would you choose, and in what priority? This is where `portfolio_selection()` comes in!
The idea is to pick the subset of these configurations that maximizes some measure of utility for the portfolio. Starting from the empty portfolio, we greedily add one configuration at a time until we reach `k`.
Let's see this in action!
```python
import pandas as pd
from amltk.metalearning import portfolio_selection

performances = {
    "c1": [90, 60, 20, 10],
    "c2": [20, 10, 90, 20],
    "c3": [10, 20, 40, 90],
    "c4": [90, 10, 10, 10],
}
portfolio = pd.DataFrame(performances, index=["dataset_1", "dataset_2", "dataset_3", "dataset_4"])

selected_portfolio, trajectory = portfolio_selection(
    portfolio,
    k=3,
    scaler="minmax",
)
print(selected_portfolio)
print()
print(trajectory)
```
The trajectory tells us which configuration was added at each step, along with the utility of the portfolio once that configuration was added. However, we haven't yet specified how exactly the utility of a given portfolio is defined. We can define our own function to do so:
```python
import pandas as pd
from amltk.metalearning import portfolio_selection

performances = {
    "c1": [90, 60, 20, 10],
    "c2": [20, 10, 90, 20],
    "c3": [10, 20, 40, 90],
    "c4": [90, 10, 10, 10],
}
portfolio = pd.DataFrame(performances, index=["dataset_1", "dataset_2", "dataset_3", "dataset_4"])

def my_function(p: pd.DataFrame) -> float:
    # Take the maximum score for each dataset and then take the mean across them.
    return p.max(axis=1).mean()

selected_portfolio, trajectory = portfolio_selection(
    portfolio,
    k=3,
    scaler="minmax",
    portfolio_value=my_function,
)
print(selected_portfolio)
print()
print(trajectory)
```
This pattern of reducing across all configurations for a dataset (a row) and then aggregating these reductions is common enough that you can also supply the two operations directly, and `portfolio_selection()` performs the rest.
```python
import pandas as pd
import numpy as np
from amltk.metalearning import portfolio_selection

performances = {
    "c1": [90, 60, 20, 10],
    "c2": [20, 10, 90, 20],
    "c3": [10, 20, 40, 90],
    "c4": [90, 10, 10, 10],
}
portfolio = pd.DataFrame(performances, index=["dataset_1", "dataset_2", "dataset_3", "dataset_4"])

selected_portfolio, trajectory = portfolio_selection(
    portfolio,
    k=3,
    scaler="minmax",
    row_reducer=np.max,  # This is actually the default
    aggregator=np.mean,  # This is actually the default
)
print(selected_portfolio)
print()
print(trajectory)
```
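To make the greedy procedure concrete, here is an illustrative re-implementation of the selection loop in plain pandas. This is only a sketch, assuming `minmax` scaling per row and the default value function (mean of per-dataset maxima); `greedy_portfolio` is a hypothetical helper, not part of amltk.

```python
import pandas as pd

# Illustrative re-implementation of greedy portfolio selection, NOT amltk's code.
# Assumes min-max scaling per row and that no row is constant (range > 0).
def greedy_portfolio(df: pd.DataFrame, k: int) -> list[str]:
    # Min-max scale each row (dataset) so scores are comparable across datasets.
    lo, hi = df.min(axis=1), df.max(axis=1)
    scaled = df.sub(lo, axis=0).div(hi - lo, axis=0)

    chosen: list[str] = []
    for _ in range(k):
        candidates = [c for c in scaled.columns if c not in chosen]
        # Portfolio value: best scaled score per dataset, averaged over datasets.
        best = max(candidates, key=lambda c: scaled[chosen + [c]].max(axis=1).mean())
        chosen.append(best)
    return chosen

performances = {
    "c1": [90, 60, 20, 10],
    "c2": [20, 10, 90, 20],
    "c3": [10, 20, 40, 90],
    "c4": [90, 10, 10, 10],
}
df = pd.DataFrame(performances, index=[f"dataset_{i}" for i in range(1, 5)])
print(greedy_portfolio(df, k=3))  # ['c1', 'c3', 'c2']
```

Note how `c4` is never picked: every dataset it does well on is already covered by `c1`, so adding it never improves the portfolio's coverage.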
```python
def portfolio_selection(
    items,
    k,
    *,
    row_reducer=np.max,
    aggregator=np.mean,
    portfolio_value=None,
    maximize=True,
    scaler="minmax",
    with_replacement=False,
    stop_if_worse=False,
    seed=None,
)
```
Selects a portfolio of `k` items from `items`.

A portfolio is a subset of the items, selected by maximizing the `portfolio_value` function in a greedy selection approach. At each iteration `0 <= i < k`, the `portfolio_value` function is calculated for each portfolio obtained by adding one of the remaining items to the current portfolio. The item that maximizes the `portfolio_value` function is then added to the portfolio for the next iteration.

The `portfolio_value` can often be defined by a row-wise reduction (`row_reducer=`) followed by some aggregation over these reductions (`aggregator=`). You can also supply your own value function if desired (`portfolio_value=`).
A Single Iteration
This example uses `row_reducer=np.max` and `aggregator=np.mean` to calculate the value of a portfolio. In this case, we have 4 datasets and our current portfolio consists of `config_1` and `config_2`. We are going to calculate the value of adding `config_try` to the current best portfolio.
|           | config_1 | config_2 | config_try |
|-----------|----------|----------|------------|
| dataset_1 | 1        | 0        | 0          |
| dataset_2 | 0        | 0.5      | 1          |
| dataset_3 | 0        | 0.5      | 0.5        |
| dataset_4 | 1        | 1        | 0          |
1. Apply `row_reducer` to each row, in this case `np.max`.
2. Apply `aggregator` to the reduced rows, in this case `np.mean`.
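Applying these two steps to the table above with plain pandas (this mirrors the worked example, not the library internals):

```python
import pandas as pd

# The candidate portfolio from the table: config_1, config_2 plus config_try.
candidate = pd.DataFrame(
    {
        "config_1": [1.0, 0.0, 0.0, 1.0],
        "config_2": [0.0, 0.5, 0.5, 1.0],
        "config_try": [0.0, 1.0, 0.5, 0.0],
    },
    index=["dataset_1", "dataset_2", "dataset_3", "dataset_4"],
)

row_best = candidate.max(axis=1)  # step 1: reduce each row -> [1, 1, 0.5, 1]
value = row_best.mean()           # step 2: aggregate the reduced rows
print(value)  # 0.875
```

This `0.875` would then be compared against the value of adding each other remaining candidate, and the best one joins the portfolio.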
| PARAMETER | DESCRIPTION |
|---|---|
| `items` | A dictionary of items to select from. |
| `k` | The number of items to select. |
| `row_reducer` | A function to aggregate the rows of the portfolio. This is applied to a potential portfolio, for example to calculate the max score of all configs, for a given dataset (row). |
| `aggregator` | A function to take all the single values reduced by `row_reducer` and aggregate them into a single value for the portfolio. |
| `portfolio_value` | A custom function to calculate the value of a portfolio. This will take precedence over `row_reducer` and `aggregator`. |
| `maximize` | Whether to maximize or minimize the portfolio value. |
| `scaler` | A scaler to use to scale the portfolio values. Applied across the rows. |
| `with_replacement` | Whether to select items with replacement. |
| `stop_if_worse` | Whether to stop if the portfolio value is worse than the current best. |
| `seed` | The seed to use for breaking ties. |
| RETURNS | DESCRIPTION |
|---|---|
| `DataFrame` | The final portfolio. |
| `Series` | The trajectory, where each entry is the portfolio value once that item was added. |
Source code in src/amltk/metalearning/portfolio.py