Regression

class pyrfr.regression.SwigPyIterator(*args, **kwargs)[source]

Bases: object

advance(n)[source]
copy()[source]
decr(n=1)[source]
distance(x)[source]
equal(x)[source]
incr(n=1)[source]
next()[source]
previous()[source]
thisown

The membership flag

value()[source]
class pyrfr.regression.base_tree(*args, **kwargs)[source]

Bases: object

depth()[source]

depth() const =0 -> index_t

fit(*args)[source]
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data,
rfr::trees::tree_options< num_t, response_t, index_t > tree_opts, const std::vector< num_t > &sample_weights, rng_type &rng)=0`

fits a (possibly randomized) decision tree to a subset of the data

At each node, if it is ‘splitworthy’, a random subset of all features is considered for the split. Depending on the split_type provided, greedy or randomized choices can be made. Just make sure to set max_features in tree_opts to a number smaller than the number of features!

  • data :

    the container holding the training data

  • tree_opts :

    a tree_options object that controls certain aspects of “growing” the tree

  • sample_weights :

    vector containing the weights of all datapoints, can be used for subsampling (no checks are done here!)

  • rng :

    a (pseudo) random number generator

leaf_entries(feature_vector)[source]
`leaf_entries(const std::vector< num_t > &feature_vector) const =0 ->
std::vector< response_t > const &`

returns all response values in the leaf into which the given feature vector falls

  • feature_vector :

    an array containing a valid (in terms of size and values!) feature vector

std::vector<response_t> all response values in that leaf

number_of_leafs()[source]

number_of_leafs() const =0 -> index_t

number_of_nodes()[source]

number_of_nodes() const =0 -> index_t

predict(feature_vector)[source]

predict(const std::vector< num_t > &feature_vector) const =0 -> response_t

predicts the response value for a single feature vector

  • feature_vector :

    an array containing a valid (in terms of size and values!) feature vector

num_t the prediction of the response value (usually the mean of all responses in the corresponding leaf)
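Independent of pyrfr, the leaf prediction described above can be sketched in plain Python (the helper name is hypothetical, not part of the pyrfr API):

```python
def leaf_mean_prediction(leaf_responses):
    """Mean of all response values in the leaf the query point falls into."""
    return sum(leaf_responses) / len(leaf_responses)

# A leaf holding the responses 1.0, 2.0 and 3.0 predicts their mean.
print(leaf_mean_prediction([1.0, 2.0, 3.0]))  # -> 2.0
```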

save_latex_representation(filename)[source]

save_latex_representation(const char *filename) const =0

creates a LaTeX document visualizing the tree

thisown

The membership flag

class pyrfr.regression.binary_full_tree_rss[source]

Bases: pyrfr.regression.base_tree

check_split_fractions(epsilon=1e-06)[source]

check_split_fractions(num_t epsilon=1e-6) const -> bool

depth()[source]

depth() const -> index_t

find_leaf_index(feature_vector)[source]

find_leaf_index(const std::vector< num_t > &feature_vector) const -> index_t

fit(data, tree_opts, sample_weights, rng)[source]
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data,
rfr::trees::tree_options< num_t, response_t, index_t > tree_opts, const std::vector< num_t > &sample_weights, rng_type &rng)`

fits a randomized decision tree to a subset of the data

At each node, if it is ‘splitworthy’, a random subset of all features is considered for the split. Depending on the split_type provided, greedy or randomized choices can be made. Just make sure to set max_features in tree_opts to a number smaller than the number of features!

  • data :

    the container holding the training data

  • tree_opts :

    a tree_options object that controls certain aspects of “growing” the tree

  • sample_weights :

    vector containing the weights of all allowed datapoints (set individual entries to zero for subsampling); no checks are done here!

  • rng :

    the random number generator to be used

get_leaf(feature_vector)[source]
`get_leaf(const std::vector< num_t > &feature_vector) const -> const node_type
&`
leaf_entries(feature_vector)[source]
`leaf_entries(const std::vector< num_t > &feature_vector) const -> std::vector<
response_t > const &`

returns all response values in the leaf into which the given feature vector falls

  • feature_vector :

    an array containing a valid (in terms of size and values!) feature vector

std::vector<response_t> all response values in that leaf

leaf_statistic(feature_vector)[source]
`leaf_statistic(const std::vector< num_t > &feature_vector) const ->
rfr::util::weighted_running_statistics< num_t > const &`
marginalized_mean_prediction(feature_vector, node_index=0)[source]
`marginalized_mean_prediction(const std::vector< num_t > &feature_vector,
index_t node_index=0) const -> num_t`
number_of_leafs()[source]

number_of_leafs() const -> index_t

number_of_nodes()[source]

number_of_nodes() const -> index_t

partition(pcs)[source]
`partition(std::vector< std::vector< num_t > > pcs) const -> std::vector<
std::vector< std::vector< num_t > > >`
partition_recursor(the_partition, subspace, node_index)[source]
`partition_recursor(std::vector< std::vector< std::vector< num_t > > >
&the_partition, std::vector< std::vector< num_t > > &subspace, num_t node_index) const`
predict(feature_vector)[source]

predict(const std::vector< num_t > &feature_vector) const -> response_t

predicts the response value for a single feature vector

  • feature_vector :

    an array containing a valid (in terms of size and values!) feature vector

num_t the prediction of the response value (usually the mean of all responses in the corresponding leaf)

print_info()[source]

print_info() const

pseudo_downdate(features, response, weight)[source]
`pseudo_downdate(std::vector< num_t > features, response_t response, num_t
weight)`
pseudo_update(features, response, weight)[source]
`pseudo_update(std::vector< num_t > features, response_t response, num_t
weight)`
save_latex_representation(filename)[source]

save_latex_representation(const char *filename) const

creates a visualization by generating a LaTeX document that can be compiled

  • filename :

    Name of the file that will be used. Note that any existing file will be silently overwritten!

thisown

The membership flag

total_weight_in_subtree(node_index)[source]

total_weight_in_subtree(index_t node_index) const -> num_t

class pyrfr.regression.binary_rss_forest(*args)[source]

Bases: object

  • options : forest_options< num_t, response_t, index_t >
all_leaf_values(feature_vector)[source]
`all_leaf_values(const std::vector< num_t > &feature_vector) const ->
std::vector< std::vector< num_t > >`
ascii_string_representation()[source]

ascii_string_representation() -> std::string

covariance(f1, f2)[source]
`covariance(const std::vector< num_t > &f1, const std::vector< num_t > &f2) ->
num_t`
fit(data, rng)[source]
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data,
rng_type &rng)`

growing the random forest for a given data set

  • data :

    a filled data container

  • rng :

    the random number generator to be used

kernel(f1, f2)[source]
`kernel(const std::vector< num_t > &f1, const std::vector< num_t > &f2) ->
num_t`
load_from_ascii_string(str)[source]

load_from_ascii_string(std::string const &str)

load_from_binary_file(filename)[source]

load_from_binary_file(const std::string filename)

num_trees()[source]

num_trees() -> unsigned int

options
out_of_bag_error()[source]

out_of_bag_error() -> num_t

predict(feature_vector)[source]

predict(const std::vector< num_t > &feature_vector) const -> response_t

predict_mean_var(feature_vector, weighted_data=False)[source]
`predict_mean_var(const std::vector< num_t > &feature_vector, bool
weighted_data=false) -> std::pair< num_t, num_t >`
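What predict_mean_var aggregates can be sketched independently of pyrfr: the forest combines the per-tree predictions into a mean and a spread. This simplified sketch uses the population variance across trees; the library's actual estimator may differ (e.g. for weighted data):

```python
def forest_mean_var(tree_predictions):
    """Mean and (population) variance of the individual tree predictions."""
    n = len(tree_predictions)
    mean = sum(tree_predictions) / n
    var = sum((p - mean) ** 2 for p in tree_predictions) / n
    return mean, var

m, v = forest_mean_var([1.0, 2.0, 3.0])
print(m, v)  # -> 2.0 0.666...
```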
print_info()[source]

print_info()

pseudo_downdate(features, response, weight)[source]
`pseudo_downdate(std::vector< num_t > features, response_t response, num_t
weight)`
pseudo_update(features, response, weight)[source]
`pseudo_update(std::vector< num_t > features, response_t response, num_t
weight)`
save_latex_representation(filename_template)[source]

save_latex_representation(const std::string filename_template)

save_to_binary_file(filename)[source]

save_to_binary_file(const std::string filename)

thisown

The membership flag

class pyrfr.regression.data_base(*args, **kwargs)[source]

Bases: object

The interface for any data container with the minimal functionality.

C++ includes: data_container.hpp

add_data_point(features, response, weight)[source]
`add_data_point(std::vector< num_t > features, response_t response, num_t
weight)=0`

method to add a single data point

  • features :

    a vector containing the features

  • response :

    the corresponding response value

  • weight :

    the weight of the data point

feature(feature_index, sample_index)[source]

feature(index_t feature_index, index_t sample_index) const =0 -> num_t

Function for accessing a single feature value, consistency checks might be omitted for performance.

  • feature_index :

    The index of the feature requested

  • sample_index :

    The index of the data point.

the stored value

features(feature_index, sample_indices)[source]
`features(index_t feature_index, const std::vector< index_t > &sample_indices)
const =0 -> std::vector< num_t >`

member function for accessing the feature values of multiple data points at once, consistency checks might be omitted for performance

  • feature_index :

    The index of the feature requested

  • sample_indices :

    The indices of the data point.

the stored values

get_bounds_of_feature(feature_index)[source]
`get_bounds_of_feature(index_t feature_index) const =0 -> std::pair< num_t,
num_t >`

query the allowed interval for a feature; applies only to continuous variables

  • feature_index :

    the index of the feature

std::pair<num_t,num_t> interval of allowed values

get_type_of_feature(feature_index)[source]

get_type_of_feature(index_t feature_index) const =0 -> index_t

query the type of a feature

  • feature_index :

    the index of the feature

int type of the feature: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}

get_type_of_response()[source]

get_type_of_response() const =0 -> index_t

query the type of the response

index_t type of the response: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}

num_data_points()[source]

num_data_points() const =0 -> index_t

the number of data points in the container

num_features()[source]

num_features() const =0 -> index_t

the number of features of every datapoint in the container

response(sample_index)[source]

response(index_t sample_index) const =0 -> response_t

member function to query a single response value, consistency checks might be omitted for performance

  • sample_index :

    the response of which data point

the response value

retrieve_data_point(index)[source]

retrieve_data_point(index_t index) const =0 -> std::vector< num_t >

method to retrieve a data point

  • index :

    index of the datapoint to extract

std::vector<num_t> the features of the data point

set_bounds_of_feature(feature_index, min, max)[source]

set_bounds_of_feature(index_t feature_index, num_t min, num_t max)=0

specifies the interval of allowed values for a feature

To marginalize out certain feature dimensions using non-i.i.d. data, the numerical bounds on each variable have to be known. This only applies to numerical features.

Note: The forest will not check if a datapoint is consistent with the specified bounds!

  • feature_index :

    the index of the feature

  • min :

    the smallest value for the feature

  • max :

    the largest value for the feature

set_type_of_feature(feature_index, feature_type)[source]

set_type_of_feature(index_t feature_index, index_t feature_type)=0

specifying the type of a feature

  • feature_index :

    the index of the feature whose type is specified

  • feature_type :

    the actual type (0 - numerical; a value n > 0 means categorical with values from {0,1,...,n-1})

set_type_of_response(response_type)[source]

set_type_of_response(index_t response_type)=0

specifying the type of the response

  • response_type :

    the actual type (0 - numerical; a value n > 0 means categorical with values from {0,1,...,n-1})

thisown

The membership flag

weight(sample_index)[source]

weight(index_t sample_index) const =0 -> num_t

function to access the weight attributed to a single data point

  • sample_index :

    which data point

the weight of that sample

class pyrfr.regression.default_data_container(num_f)[source]

Bases: pyrfr.regression.data_base

A data container for mostly continuous data.

It might happen that only a small fraction of all features is categorical. In that case it would be wasteful to store the type of every feature separately. Instead, this data_container only stores the non-continuous ones in a hash-map.

C++ includes: default_data_container.hpp
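The hash-map idea can be sketched in plain Python: only the categorical exceptions are stored, and every other feature defaults to type 0 (numerical). The class and names here are illustrative, not part of the pyrfr API:

```python
class FeatureTypes:
    """Store only the categorical exceptions; all other features are numerical."""

    def __init__(self):
        self._categorical = {}  # feature_index -> number of categories

    def set_type_of_feature(self, index, n_categories):
        if n_categories == 0:
            self._categorical.pop(index, None)  # back to numerical
        else:
            self._categorical[index] = n_categories

    def get_type_of_feature(self, index):
        return self._categorical.get(index, 0)  # 0 = numerical


types = FeatureTypes()
types.set_type_of_feature(3, 4)  # feature 3: categorical with values {0,1,2,3}
print(types.get_type_of_feature(3))  # -> 4
print(types.get_type_of_feature(0))  # -> 0 (untouched features are numerical)
```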

add_data_point(features, response, weight=1)[source]
`add_data_point(std::vector< num_t > features, response_t response, num_t
weight=1)`

method to add a single data point

  • features :

    a vector containing the features

  • response :

    the corresponding response value

  • weight :

    the weight of the data point

check_consistency()[source]

check_consistency() -> bool

feature(feature_index, sample_index)[source]

feature(index_t feature_index, index_t sample_index) const -> num_t

Function for accessing a single feature value, consistency checks might be omitted for performance.

  • feature_index :

    The index of the feature requested

  • sample_index :

    The index of the data point.

the stored value

features(feature_index, sample_indices)[source]
`features(index_t feature_index, const std::vector< index_t > &sample_indices)
const -> std::vector< num_t >`

member function for accessing the feature values of multiple data points at once, consistency checks might be omitted for performance

  • feature_index :

    The index of the feature requested

  • sample_indices :

    The indices of the data point.

the stored values

get_bounds_of_feature(feature_index)[source]
`get_bounds_of_feature(index_t feature_index) const -> std::pair< num_t, num_t
>`

query the allowed interval for a feature; applies only to continuous variables

  • feature_index :

    the index of the feature

std::pair<num_t,num_t> interval of allowed values

get_min_max_of_feature(feature_index)[source]
`get_min_max_of_feature(index_t feature_index) const -> std::pair< num_t, num_t
>`
get_type_of_feature(feature_index)[source]

get_type_of_feature(index_t feature_index) const -> index_t

query the type of a feature

  • feature_index :

    the index of the feature

int type of the feature: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}

As most features are assumed to be numerical, only the categorical exceptions are stored in a hash-map. Type = 0 means continuous; Type = n >= 1 means categorical with values in {0,1,...,n-1}.

get_type_of_response()[source]

get_type_of_response() const -> index_t

query the type of the response

index_t type of the response: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}

guess_bounds_from_data()[source]

guess_bounds_from_data()

import_csv_files(*args)[source]
`import_csv_files(const std::string &feature_file, const std::string
&response_file, std::string weight_file="") -> int`
init_protected(num_f)[source]

init_protected(index_t num_f)

num_data_points()[source]

num_data_points() const -> index_t

the number of data points in the container

num_features()[source]

num_features() const -> index_t

the number of features of every datapoint in the container

print_data()[source]

print_data()

response(sample_index)[source]

response(index_t sample_index) const -> response_t

member function to query a single response value, consistency checks might be omitted for performance

  • sample_index :

    the response of which data point

the response value

retrieve_data_point(index)[source]

retrieve_data_point(index_t index) const -> std::vector< num_t >

method to retrieve a data point

  • index :

    index of the datapoint to extract

std::vector<num_t> the features of the data point

set_bounds_of_feature(feature_index, min, max)[source]

set_bounds_of_feature(index_t feature_index, num_t min, num_t max)

specifies the interval of allowed values for a feature

To marginalize out certain feature dimensions using non-i.i.d. data, the numerical bounds on each variable have to be known. This only applies to numerical features.

Note: The forest will not check if a datapoint is consistent with the specified bounds!

  • feature_index :

    the index of the feature

  • min :

    the smallest value for the feature

  • max :

    the largest value for the feature

set_type_of_feature(index, type)[source]

set_type_of_feature(index_t index, index_t type)

specifying the type of a feature

  • feature_index :

    the index of the feature whose type is specified

  • feature_type :

    the actual type (0 - numerical; a value n > 0 means categorical with values from {0,1,...,n-1})

set_type_of_response(resp_t)[source]

set_type_of_response(index_t resp_t)

specifying the type of the response

  • response_type :

    the actual type (0 - numerical; a value n > 0 means categorical with values from {0,1,...,n-1})

thisown

The membership flag

weight(sample_index)[source]

weight(index_t sample_index) const -> num_t

function to access the weight attributed to a single data point

  • sample_index :

    which data point

the weight of that sample

class pyrfr.regression.default_data_container_with_instances(*args)[source]

Bases: pyrfr.regression.data_base

A data container for mostly continuous data with instances.

Similar to the mostly_continuous_data container, but with the capability to handle instance features.

C++ includes: default_data_container_with_instances.hpp

add_configuration(config_features)[source]

add_configuration(const std::vector< num_t > &config_features) -> index_t

add_data_point(*args)[source]
`add_data_point(index_t config_index, index_t instance_index, response_t r,
num_t weight=1)`
add_instance(instance_features)[source]

add_instance(const std::vector< num_t > instance_features) -> index_t

check_consistency()[source]

check_consistency()

feature(feature_index, sample_index)[source]

feature(index_t feature_index, index_t sample_index) const -> num_t

Function for accessing a single feature value, consistency checks might be omitted for performance.

  • feature_index :

    The index of the feature requested

  • sample_index :

    The index of the data point.

the stored value

features(feature_index, sample_indices)[source]
`features(index_t feature_index, const std::vector< index_t > &sample_indices)
const -> std::vector< num_t >`

member function for accessing the feature values of multiple data points at once, consistency checks might be omitted for performance

  • feature_index :

    The index of the feature requested

  • sample_indices :

    The indices of the data point.

the stored values

get_bounds_of_feature(feature_index)[source]
`get_bounds_of_feature(index_t feature_index) const -> std::pair< num_t, num_t
>`

query the allowed interval for a feature; applies only to continuous variables

  • feature_index :

    the index of the feature

std::pair<num_t,num_t> interval of allowed values

get_configuration_set(configuration_index)[source]

get_configuration_set(num_t configuration_index) -> std::vector< num_t >

get_features_by_configuration_and_instance(configuration_index, instance_index)[source]
`get_features_by_configuration_and_instance(num_t configuration_index, num_t
instance_index) -> std::vector< num_t >`
get_instance_set()[source]

get_instance_set() -> std::vector< num_t >

method to get instance as set_feature for predict_mean_var_of_mean_response_on_set method in regression forest

get_type_of_feature(feature_index)[source]

get_type_of_feature(index_t feature_index) const -> index_t

query the type of a feature

  • feature_index :

    the index of the feature

int type of the feature: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}

get_type_of_response()[source]

get_type_of_response() const -> index_t

query the type of the response

index_t type of the response: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}

num_configurations()[source]

num_configurations() -> index_t

num_data_points()[source]

num_data_points() const -> index_t

the number of data points in the container

num_features()[source]

num_features() const -> index_t

the number of features of every datapoint in the container

num_instances()[source]

num_instances() -> index_t

response(sample_index)[source]

response(index_t sample_index) const -> response_t

member function to query a single response value, consistency checks might be omitted for performance

  • sample_index :

    the response of which data point

the response value

retrieve_data_point(index)[source]

retrieve_data_point(index_t index) const -> std::vector< num_t >

method to retrieve a data point

  • index :

    index of the datapoint to extract

std::vector<num_t> the features of the data point

set_bounds_of_feature(feature_index, min, max)[source]

set_bounds_of_feature(index_t feature_index, num_t min, num_t max)

specifies the interval of allowed values for a feature

To marginalize out certain feature dimensions using non-i.i.d. data, the numerical bounds on each variable have to be known. This only applies to numerical features.

Note: The forest will not check if a datapoint is consistent with the specified bounds!

  • feature_index :

    the index of the feature

  • min :

    the smallest value for the feature

  • max :

    the largest value for the feature

set_type_of_configuration_feature(index, type)[source]

set_type_of_configuration_feature(index_t index, index_t type)

set_type_of_feature(index, type)[source]

set_type_of_feature(index_t index, index_t type)

specifying the type of a feature

  • feature_index :

    the index of the feature whose type is specified

  • feature_type :

    the actual type (0 - numerical; a value n > 0 means categorical with values from {0,1,...,n-1})

set_type_of_instance_feature(index, type)[source]

set_type_of_instance_feature(index_t index, index_t type)

set_type_of_response(resp_t)[source]

set_type_of_response(index_t resp_t)

specifying the type of the response

  • response_type :

    the actual type (0 - numerical; a value n > 0 means categorical with values from {0,1,...,n-1})

thisown

The membership flag

weight(sample_index)[source]

weight(index_t sample_index) const -> num_t

function to access the weight attributed to a single data point

  • sample_index :

    which data point

the weight of that sample

class pyrfr.regression.default_random_engine(*args)[source]

Bases: object

seed(arg2)[source]
thisown

The membership flag

class pyrfr.regression.fanova_forest(*args)[source]

Bases: pyrfr.regression.fanova_forest_prototype

all_leaf_values(feature_vector)
`all_leaf_values(const std::vector< num_t > &feature_vector) const ->
std::vector< std::vector< num_t > >`
all_split_values()[source]

all_split_values() -> std::vector< std::vector< std::vector< num_t > > >

ascii_string_representation()

ascii_string_representation() -> std::string

covariance(f1, f2)
`covariance(const std::vector< num_t > &f1, const std::vector< num_t > &f2) ->
num_t`
fit(data, rng)[source]
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data, rng_t
&rng)`

growing the random forest for a given data set

  • data :

    a filled data container

  • rng :

    the random number generator to be used

get_cutoffs()[source]

get_cutoffs() -> std::pair< num_t, num_t >

get_trees_total_variances()[source]

get_trees_total_variances() -> std::vector< num_t >

kernel(f1, f2)
`kernel(const std::vector< num_t > &f1, const std::vector< num_t > &f2) ->
num_t`
load_from_ascii_string(str)

load_from_ascii_string(std::string const &str)

load_from_binary_file(filename)

load_from_binary_file(const std::string filename)

marginal_mean_prediction(feature_vector)[source]

marginal_mean_prediction(const std::vector< num_t > &feature_vector) -> num_t

marginal_mean_variance_prediction(feature_vector)[source]
`marginal_mean_variance_prediction(const std::vector< num_t > &feature_vector)
-> std::pair< num_t, num_t >`
marginal_prediction_stat_of_tree(tree_index, feature_vector)[source]
`marginal_prediction_stat_of_tree(index_t tree_index, const std::vector< num_t >
&feature_vector) -> rfr::util::weighted_running_statistics< num_t >`
num_trees()

num_trees() -> unsigned int

options
out_of_bag_error()

out_of_bag_error() -> num_t

precompute_marginals()[source]

precompute_marginals()

predict(feature_vector)

predict(const std::vector< num_t > &feature_vector) const -> response_t

predict_mean_var(feature_vector, weighted_data=False)
`predict_mean_var(const std::vector< num_t > &feature_vector, bool
weighted_data=false) -> std::pair< num_t, num_t >`
print_info()

print_info()

pseudo_downdate(features, response, weight)
`pseudo_downdate(std::vector< num_t > features, response_t response, num_t
weight)`
pseudo_update(features, response, weight)
`pseudo_update(std::vector< num_t > features, response_t response, num_t
weight)`
save_latex_representation(filename_template)

save_latex_representation(const std::string filename_template)

save_to_binary_file(filename)

save_to_binary_file(const std::string filename)

set_cutoffs(lower, upper)[source]

set_cutoffs(num_t lower, num_t upper)

thisown

The membership flag

class pyrfr.regression.fanova_forest_prototype(*args)[source]

Bases: object

  • options : forest_options< num_t, response_t, index_t >
all_leaf_values(feature_vector)[source]
`all_leaf_values(const std::vector< num_t > &feature_vector) const ->
std::vector< std::vector< num_t > >`
ascii_string_representation()[source]

ascii_string_representation() -> std::string

covariance(f1, f2)[source]
`covariance(const std::vector< num_t > &f1, const std::vector< num_t > &f2) ->
num_t`
fit(data, rng)[source]
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data,
rng_type &rng)`

growing the random forest for a given data set

  • data :

    a filled data container

  • rng :

    the random number generator to be used

kernel(f1, f2)[source]
`kernel(const std::vector< num_t > &f1, const std::vector< num_t > &f2) ->
num_t`
load_from_ascii_string(str)[source]

load_from_ascii_string(std::string const &str)

load_from_binary_file(filename)[source]

load_from_binary_file(const std::string filename)

num_trees()[source]

num_trees() -> unsigned int

options
out_of_bag_error()[source]

out_of_bag_error() -> num_t

predict(feature_vector)[source]

predict(const std::vector< num_t > &feature_vector) const -> response_t

predict_mean_var(feature_vector, weighted_data=False)[source]
`predict_mean_var(const std::vector< num_t > &feature_vector, bool
weighted_data=false) -> std::pair< num_t, num_t >`
print_info()[source]

print_info()

pseudo_downdate(features, response, weight)[source]
`pseudo_downdate(std::vector< num_t > features, response_t response, num_t
weight)`
pseudo_update(features, response, weight)[source]
`pseudo_update(std::vector< num_t > features, response_t response, num_t
weight)`
save_latex_representation(filename_template)[source]

save_latex_representation(const std::string filename_template)

save_to_binary_file(filename)[source]

save_to_binary_file(const std::string filename)

thisown

The membership flag

class pyrfr.regression.forest_opts(*args)[source]

Bases: object

  • num_trees : index_t

    number of trees in the forest

  • num_data_points_per_tree : index_t

    number of datapoints used in each tree

  • do_bootstrapping : bool

    flag to toggle bootstrapping

  • compute_oob_error : bool

    flag to enable/disable computing the out-of-bag error

  • tree_opts : rfr::trees::tree_options< num_t, response_t, index_t >

    the options for each tree

adjust_limits_to_data(data)[source]
`adjust_limits_to_data(const rfr::data_containers::base< num_t, response_t,
index_t > &data)`

adjusts all relevant variables to the data

compute_oob_error
do_bootstrapping
num_data_points_per_tree
num_trees
set_default_values()[source]

set_default_values()

(Re)set to default values for the forest.

thisown

The membership flag

tree_opts
class pyrfr.regression.idx_vector(*args)[source]

Bases: object

append(x)[source]
assign(n, x)[source]
back()[source]
begin()[source]
capacity()[source]
clear()[source]
empty()[source]
end()[source]
erase(*args)[source]
front()[source]
get_allocator()[source]
insert(*args)[source]
iterator()[source]
pop()[source]
pop_back()[source]
push_back(x)[source]
rbegin()[source]
rend()[source]
reserve(n)[source]
resize(*args)[source]
size()[source]
swap(v)[source]
thisown

The membership flag

class pyrfr.regression.num_num_pair(*args)[source]

Bases: object

first
second
thisown

The membership flag

class pyrfr.regression.num_vector(*args)[source]

Bases: object

append(x)[source]
assign(n, x)[source]
back()[source]
begin()[source]
capacity()[source]
clear()[source]
empty()[source]
end()[source]
erase(*args)[source]
front()[source]
get_allocator()[source]
insert(*args)[source]
iterator()[source]
pop()[source]
pop_back()[source]
push_back(x)[source]
rbegin()[source]
rend()[source]
reserve(n)[source]
resize(*args)[source]
size()[source]
swap(v)[source]
thisown

The membership flag

class pyrfr.regression.num_vector_vector(*args)[source]

Bases: object

append(x)[source]
assign(n, x)[source]
back()[source]
begin()[source]
capacity()[source]
clear()[source]
empty()[source]
end()[source]
erase(*args)[source]
front()[source]
get_allocator()[source]
insert(*args)[source]
iterator()[source]
pop()[source]
pop_back()[source]
push_back(x)[source]
rbegin()[source]
rend()[source]
reserve(n)[source]
resize(*args)[source]
size()[source]
swap(v)[source]
thisown

The membership flag

class pyrfr.regression.num_vector_vector_vector(*args)[source]

Bases: object

append(x)[source]
assign(n, x)[source]
back()[source]
begin()[source]
capacity()[source]
clear()[source]
empty()[source]
end()[source]
erase(*args)[source]
front()[source]
get_allocator()[source]
insert(*args)[source]
iterator()[source]
pop()[source]
pop_back()[source]
push_back(x)[source]
rbegin()[source]
rend()[source]
reserve(n)[source]
resize(*args)[source]
size()[source]
swap(v)[source]
thisown

The membership flag

class pyrfr.regression.qr_forest(*args)[source]

Bases: pyrfr.regression.binary_rss_forest

all_leaf_values(feature_vector)
`all_leaf_values(const std::vector< num_t > &feature_vector) const ->
std::vector< std::vector< num_t > >`
ascii_string_representation()

ascii_string_representation() -> std::string

covariance(f1, f2)
`covariance(const std::vector< num_t > &f1, const std::vector< num_t > &f2) ->
num_t`
fit(data, rng)
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data,
rng_type &rng)`

growing the random forest for a given data set

  • data :

    a filled data container

  • rng :

    the random number generator to be used

kernel(f1, f2)
`kernel(const std::vector< num_t > &f1, const std::vector< num_t > &f2) ->
num_t`
load_from_ascii_string(str)

load_from_ascii_string(std::string const &str)

load_from_binary_file(filename)

load_from_binary_file(const std::string filename)

num_trees()

num_trees() -> unsigned int

options
out_of_bag_error()

out_of_bag_error() -> num_t

predict(feature_vector)

predict(const std::vector< num_t > &feature_vector) const -> response_t

predict_mean_var(feature_vector, weighted_data=False)
`predict_mean_var(const std::vector< num_t > &feature_vector, bool
weighted_data=false) -> std::pair< num_t, num_t >`
predict_quantiles(feature_vector, quantiles)[source]
`predict_quantiles(const std::vector< num_t > &feature_vector, std::vector<
num_t > quantiles) const -> std::vector< num_t >`
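A quantile regression forest derives quantiles from the response values pooled across the leaves each tree routes the query point into. A plain-Python sketch using nearest-rank quantiles (pyrfr's exact interpolation may differ):

```python
def pooled_quantiles(all_leaf_values, quantiles):
    """Nearest-rank quantiles of the responses pooled across all trees' leaves."""
    pooled = sorted(v for leaf in all_leaf_values for v in leaf)
    n = len(pooled)
    return [pooled[min(int(q * n), n - 1)] for q in quantiles]

# two trees' leaf responses, pooled and sorted: [1.0, 2.0, 3.0, 4.0]
print(pooled_quantiles([[1.0, 3.0], [2.0, 4.0]], [0.0, 0.5, 1.0]))  # -> [1.0, 3.0, 4.0]
```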
print_info()

print_info()

pseudo_downdate(features, response, weight)
`pseudo_downdate(std::vector< num_t > features, response_t response, num_t
weight)`
pseudo_update(features, response, weight)
`pseudo_update(std::vector< num_t > features, response_t response, num_t
weight)`
save_latex_representation(filename_template)

save_latex_representation(const std::string filename_template)

save_to_binary_file(filename)

save_to_binary_file(const std::string filename)

thisown

The membership flag

class pyrfr.regression.tree_opts(*args)[source]

Bases: object

  • max_features : index_type

    number of features to consider for each split

  • max_depth : index_type

    maximum depth for the tree

  • min_samples_to_split : index_type

    minimum number of samples to try splitting

  • min_samples_in_leaf : index_type

    minimum number of samples in a leaf

  • min_weight_in_leaf : num_type

    minimum total sample weights in a leaf

  • max_num_nodes : index_type

    maximum total number of nodes in the tree

  • max_num_leaves : index_type

    maximum total number of leaves in the tree

  • epsilon_purity : response_type

    minimum difference between two response values to be considered different

adjust_limits_to_data(data)[source]
`adjust_limits_to_data(const rfr::data_containers::base< num_type,
response_type, index_type > &data)`
epsilon_purity
max_depth
max_features
max_num_leaves
max_num_nodes
min_samples_in_leaf
min_samples_to_split
min_weight_in_leaf
set_default_values()[source]

set_default_values()

(Re)set to default values with no limits on the size of the tree

If nothing is known about the data, this member can be used to get a valid setting for the tree_options struct. But beware: depending on the amount of data, this setting can lead to a huge tree. There is no limit on the size, and nodes are split until the leaves are pure. For each split, every feature is considered! This not only slows training down, but also makes the tree deterministic!

thisown

The membership flag