Regression

class pyrfr.regression.SwigPyIterator(*args, **kwargs)
Bases: object

thisown
The membership flag
class pyrfr.regression.base_tree(*args, **kwargs)
Bases: object

fit(*args)
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data, rfr::trees::tree_options< num_t, response_t, index_t > tree_opts, const std::vector< num_t > &sample_weights, rng_type &rng)=0`
Fits a (possibly randomized) decision tree to a subset of the data.
At each node, if it is 'splitworthy', a random subset of all features is considered for the split. Depending on the split_type provided, greedy or randomized choices can be made. Just make sure to set max_features in tree_opts to a number smaller than the number of features!
- data : the container holding the training data
- tree_opts : a tree_options object that controls certain aspects of "growing" the tree
- sample_weights : vector containing the weights of all datapoints, can be used for subsampling (no checks are done here!)
- rng : a (pseudo) random number generator
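The per-node feature subsampling described above can be sketched in plain Python. This is an illustrative stand-in, not pyrfr's C++ implementation, and the helper name `choose_split_feature` is hypothetical:

```python
import random

def choose_split_feature(num_features, max_features, rng):
    """Pick the feature subset considered at one node, as described above.

    max_features must be smaller than num_features for the split choice to
    be randomized at all; with max_features == num_features every split is
    a greedy search over all features.
    """
    candidates = rng.sample(range(num_features), k=max_features)
    # A real tree would now score each candidate split (e.g. by RSS
    # reduction) and keep the best one; here we just return the subset.
    return candidates

rng = random.Random(42)
subset = choose_split_feature(num_features=10, max_features=3, rng=rng)
```

Because the subset is redrawn at every node, two trees grown on the same data can still differ, which is what makes the forest's variance estimate meaningful.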
leaf_entries(feature_vector)
`leaf_entries(const std::vector< num_t > &feature_vector) const =0 -> std::vector< response_t > const &`
Returns all response values in the leaf into which the given feature vector falls.
- feature_vector : an array containing a valid (in terms of size and values!) feature vector
Returns std::vector<response_t>: all response values in that leaf

predict(feature_vector)
`predict(const std::vector< num_t > &feature_vector) const =0 -> response_t`
Predicts the response value for a single feature vector.
- feature_vector : an array containing a valid (in terms of size and values!) feature vector
Returns num_t: the prediction of the response value (usually the mean of all responses in the corresponding leaf)
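As stated above, a tree's prediction is usually the mean of the responses stored in the leaf the query falls into. A minimal plain-Python stand-in (not the library's code) makes the relationship between leaf_entries and predict concrete:

```python
def leaf_entries(leaf):
    """Stand-in for base_tree.leaf_entries: all responses stored in a leaf."""
    return leaf["responses"]

def predict(leaf):
    """Stand-in for base_tree.predict: mean of the leaf's response values."""
    responses = leaf_entries(leaf)
    return sum(responses) / len(responses)

leaf = {"responses": [1.0, 2.0, 3.0]}
print(predict(leaf))  # → 2.0, the mean of the stored responses
```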
save_latex_representation(filename)
`save_latex_representation(const char *filename) const =0`
Creates a LaTeX document visualizing the tree.

thisown
The membership flag
class pyrfr.regression.binary_full_tree_rss
Bases: pyrfr.regression.base_tree

check_split_fractions(epsilon=1e-06)
`check_split_fractions(num_t epsilon=1e-6) const -> bool`

find_leaf_index(feature_vector)
`find_leaf_index(const std::vector< num_t > &feature_vector) const -> index_t`

fit(data, tree_opts, sample_weights, rng)
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data, rfr::trees::tree_options< num_t, response_t, index_t > tree_opts, const std::vector< num_t > &sample_weights, rng_type &rng)`
Fits a randomized decision tree to a subset of the data.
At each node, if it is 'splitworthy', a random subset of all features is considered for the split. Depending on the split_type provided, greedy or randomized choices can be made. Just make sure to set max_features in tree_opts to a number smaller than the number of features!
- data : the container holding the training data
- tree_opts : a tree_options object that controls certain aspects of "growing" the tree
- sample_weights : vector containing the weights of all allowed datapoints (set individual entries to zero for subsampling); no checks are done here!
- rng : the random number generator to be used

get_leaf(feature_vector)
`get_leaf(const std::vector< num_t > &feature_vector) const -> const node_type &`

leaf_entries(feature_vector)
`leaf_entries(const std::vector< num_t > &feature_vector) const -> std::vector< response_t > const &`
Returns all response values in the leaf into which the given feature vector falls.
- feature_vector : an array containing a valid (in terms of size and values!) feature vector
Returns std::vector<response_t>: all response values in that leaf

leaf_statistic(feature_vector)
`leaf_statistic(const std::vector< num_t > &feature_vector) const -> rfr::util::weighted_running_statistics< num_t > const &`

marginalized_mean_prediction(feature_vector, node_index=0)
`marginalized_mean_prediction(const std::vector< num_t > &feature_vector, index_t node_index=0) const -> num_t`

partition(pcs)
`partition(std::vector< std::vector< num_t > > pcs) const -> std::vector< std::vector< std::vector< num_t > > >`

partition_recursor(the_partition, subspace, node_index)
`partition_recursor(std::vector< std::vector< std::vector< num_t > > > &the_partition, std::vector< std::vector< num_t > > &subspace, num_t node_index) const`

predict(feature_vector)
`predict(const std::vector< num_t > &feature_vector) const -> response_t`
Predicts the response value for a single feature vector.
- feature_vector : an array containing a valid (in terms of size and values!) feature vector
Returns num_t: the prediction of the response value (usually the mean of all responses in the corresponding leaf)

pseudo_downdate(features, response, weight)
`pseudo_downdate(std::vector< num_t > features, response_t response, num_t weight)`

pseudo_update(features, response, weight)
`pseudo_update(std::vector< num_t > features, response_t response, num_t weight)`

save_latex_representation(filename)
`save_latex_representation(const char *filename) const`
Creates a visualization by generating a LaTeX document that can be compiled.
- filename : name of the file that will be used. Note that any existing file will be silently overwritten!

thisown
The membership flag
class pyrfr.regression.binary_rss_forest(*args)
Bases: object

- options : forest_options< num_t, response_t, index_t >

all_leaf_values(feature_vector)
`all_leaf_values(const std::vector< num_t > &feature_vector) const -> std::vector< std::vector< num_t > >`

covariance(f1, f2)
`covariance(const std::vector< num_t > &f1, const std::vector< num_t > &f2) -> num_t`

fit(data, rng)
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data, rng_type &rng)`
Grows the random forest for a given data set.
- data : a filled data container
- rng : the random number generator to be used

kernel(f1, f2)
`kernel(const std::vector< num_t > &f1, const std::vector< num_t > &f2) -> num_t`

options

predict(feature_vector)
`predict(const std::vector< num_t > &feature_vector) const -> response_t`

predict_mean_var(feature_vector, weighted_data=False)
`predict_mean_var(const std::vector< num_t > &feature_vector, bool weighted_data=false) -> std::pair< num_t, num_t >`

pseudo_downdate(features, response, weight)
`pseudo_downdate(std::vector< num_t > features, response_t response, num_t weight)`

pseudo_update(features, response, weight)
`pseudo_update(std::vector< num_t > features, response_t response, num_t weight)`

save_latex_representation(filename_template)
`save_latex_representation(const std::string filename_template)`

thisown
The membership flag
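To make the fit / predict / predict_mean_var call sequence above concrete, here is a toy pure-Python forest with the same method names. It is an illustrative stand-in under strong simplifying assumptions (each "tree" just averages one bootstrap sample of the responses); it is not pyrfr's implementation:

```python
import random

class ToyForest:
    """Mimics the binary_rss_forest interface: fit, predict, predict_mean_var."""

    def __init__(self, num_trees=10, do_bootstrapping=True):
        self.num_trees = num_trees
        self.do_bootstrapping = do_bootstrapping
        self.tree_means = []

    def fit(self, data, rng):
        # data: list of (feature_vector, response) pairs.
        # Each "tree" here is just the mean response of one bootstrap sample.
        responses = [y for _, y in data]
        self.tree_means = []
        for _ in range(self.num_trees):
            sample = (rng.choices(responses, k=len(responses))
                      if self.do_bootstrapping else responses)
            self.tree_means.append(sum(sample) / len(sample))

    def predict(self, feature_vector):
        # Forest prediction: mean over the per-tree predictions.
        return sum(self.tree_means) / len(self.tree_means)

    def predict_mean_var(self, feature_vector):
        # Variance across trees, the forest's uncertainty estimate.
        mean = self.predict(feature_vector)
        var = sum((t - mean) ** 2 for t in self.tree_means) / len(self.tree_means)
        return mean, var

data = [([0.0], 1.0), ([1.0], 2.0), ([2.0], 3.0)]
forest = ToyForest(num_trees=25)
forest.fit(data, random.Random(0))
mean, var = forest.predict_mean_var([1.5])
```

The real forest predicts per leaf rather than globally, but the shape of the workflow (fill a container, fit with an rng, then query mean and variance) is the same.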
class pyrfr.regression.data_base(*args, **kwargs)
Bases: object

The interface for any data container with the minimal functionality.
C++ includes: data_container.hpp

add_data_point(features, response, weight)
`add_data_point(std::vector< num_t > features, response_t response, num_t weight)=0`
Method to add a single data point.
- features : a vector containing the features
- response : the corresponding response value
- weight : the weight of the data point

feature(feature_index, sample_index)
`feature(index_t feature_index, index_t sample_index) const =0 -> num_t`
Function for accessing a single feature value; consistency checks might be omitted for performance.
- feature_index : the index of the feature requested
- sample_index : the index of the data point
Returns: the stored value

features(feature_index, sample_indices)
`features(index_t feature_index, const std::vector< index_t > &sample_indices) const =0 -> std::vector< num_t >`
Member function for accessing the feature values of multiple data points at once; consistency checks might be omitted for performance.
- feature_index : the index of the feature requested
- sample_indices : the indices of the data points
Returns: the stored values

get_bounds_of_feature(feature_index)
`get_bounds_of_feature(index_t feature_index) const =0 -> std::pair< num_t, num_t >`
Queries the allowed interval for a feature; applies only to continuous variables.
- feature_index : the index of the feature
Returns std::pair<num_t,num_t>: interval of allowed values

get_type_of_feature(feature_index)
`get_type_of_feature(index_t feature_index) const =0 -> index_t`
Queries the type of a feature.
- feature_index : the index of the feature
Returns int: type of the feature: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}
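The type encoding just described (0 for numerical, n > 0 for a categorical feature with values {0, 1, ..., n-1}) can be sketched as a small validity check. This is plain Python for illustration only; `is_valid_value` is a hypothetical helper, not part of pyrfr:

```python
def is_valid_value(feature_type, value):
    """Check a value against the type encoding described above.

    feature_type == 0 -> numerical: any float/int is allowed
    feature_type == n -> categorical with allowed values {0, 1, ..., n-1}
    """
    if feature_type == 0:
        return isinstance(value, (int, float))
    return value in range(feature_type)

# A categorical feature with 3 levels accepts exactly 0, 1 and 2:
print([is_valid_value(3, v) for v in (0, 2, 3)])  # → [True, True, False]
```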
get_type_of_response()
`get_type_of_response() const =0 -> index_t`
Queries the type of the response.
Returns index_t: type of the response: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}

num_data_points()
`num_data_points() const =0 -> index_t`
The number of data points in the container.

num_features()
`num_features() const =0 -> index_t`
The number of features of every datapoint in the container.

response(sample_index)
`response(index_t sample_index) const =0 -> response_t`
Member function to query a single response value; consistency checks might be omitted for performance.
- sample_index : the index of the data point whose response is requested
Returns: the response value

retrieve_data_point(index)
`retrieve_data_point(index_t index) const =0 -> std::vector< num_t >`
Method to retrieve a data point.
- index : index of the datapoint to extract
Returns std::vector<num_t>: the features of the data point

set_bounds_of_feature(feature_index, min, max)
`set_bounds_of_feature(index_t feature_index, num_t min, num_t max)=0`
Specifies the interval of allowed values for a feature.
To marginalize out certain feature dimensions using non-i.i.d. data, the numerical bounds on each variable have to be known. This only applies to numerical features.
Note: The forest will not check if a datapoint is consistent with the specified bounds!
- feature_index : the index of the feature
- min : the smallest value for the feature
- max : the largest value for the feature

set_type_of_feature(feature_index, feature_type)
`set_type_of_feature(index_t feature_index, index_t feature_type)=0`
Specifies the type of a feature.
- feature_index : the index of the feature whose type is specified
- feature_type : the actual type (0 - numerical; a value n>0 - categorical with values from {0,1,...,n-1})

set_type_of_response(response_type)
`set_type_of_response(index_t response_type)=0`
Specifies the type of the response.
- response_type : the actual type (0 - numerical; a value n>0 - categorical with values from {0,1,...,n-1})

thisown
The membership flag
class pyrfr.regression.default_data_container(num_f)
Bases: pyrfr.regression.data_base

A data container for mostly continuous data.
It might happen that only a small fraction of all features is categorical. In that case it would be wasteful to store the type of every feature separately. Instead, this data_container only stores the non-continuous ones in a hash-map.
C++ includes: default_data_container.hpp

add_data_point(features, response, weight=1)
`add_data_point(std::vector< num_t > features, response_t response, num_t weight=1)`
Method to add a single data point.
- features : a vector containing the features
- response : the corresponding response value
- weight : the weight of the data point
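The role of per-point weights (as described for fit's sample_weights, where setting an entry to zero excludes that point) can be illustrated with a weighted mean. Plain Python, conceptual only:

```python
def weighted_mean(responses, weights):
    """Weighted mean of responses. Points with weight 0 contribute nothing,
    which is how subsampling via sample_weights works conceptually."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero")
    return sum(w * r for w, r in zip(weights, responses)) / total

responses = [1.0, 2.0, 10.0]
# A weight of 0 drops the point at 10.0 from the estimate:
print(weighted_mean(responses, [1.0, 1.0, 0.0]))  # → 1.5
```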
feature(feature_index, sample_index)
`feature(index_t feature_index, index_t sample_index) const -> num_t`
Function for accessing a single feature value; consistency checks might be omitted for performance.
- feature_index : the index of the feature requested
- sample_index : the index of the data point
Returns: the stored value

features(feature_index, sample_indices)
`features(index_t feature_index, const std::vector< index_t > &sample_indices) const -> std::vector< num_t >`
Member function for accessing the feature values of multiple data points at once; consistency checks might be omitted for performance.
- feature_index : the index of the feature requested
- sample_indices : the indices of the data points
Returns: the stored values

get_bounds_of_feature(feature_index)
`get_bounds_of_feature(index_t feature_index) const -> std::pair< num_t, num_t >`
Queries the allowed interval for a feature; applies only to continuous variables.
- feature_index : the index of the feature
Returns std::pair<num_t,num_t>: interval of allowed values

get_min_max_of_feature(feature_index)
`get_min_max_of_feature(index_t feature_index) const -> std::pair< num_t, num_t >`

get_type_of_feature(feature_index)
`get_type_of_feature(index_t feature_index) const -> index_t`
Queries the type of a feature.
As most features are assumed to be numerical, it is actually beneficial to store only the categorical exceptions in a hash-map. Type = 0 means continuous, and Type = n >= 1 means categorical with n possible values from {0,1,...,n-1}.
- feature_index : the index of the feature
Returns int: type of the feature: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}

get_type_of_response()
`get_type_of_response() const -> index_t`
Queries the type of the response.
Returns index_t: type of the response: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}

import_csv_files(*args)
`import_csv_files(const std::string &feature_file, const std::string &response_file, std::string weight_file="") -> int`

num_data_points()
`num_data_points() const -> index_t`
The number of data points in the container.

num_features()
`num_features() const -> index_t`
The number of features of every datapoint in the container.

response(sample_index)
`response(index_t sample_index) const -> response_t`
Member function to query a single response value; consistency checks might be omitted for performance.
- sample_index : the index of the data point whose response is requested
Returns: the response value

retrieve_data_point(index)
`retrieve_data_point(index_t index) const -> std::vector< num_t >`
Method to retrieve a data point.
- index : index of the datapoint to extract
Returns std::vector<num_t>: the features of the data point

set_bounds_of_feature(feature_index, min, max)
`set_bounds_of_feature(index_t feature_index, num_t min, num_t max)`
Specifies the interval of allowed values for a feature.
To marginalize out certain feature dimensions using non-i.i.d. data, the numerical bounds on each variable have to be known. This only applies to numerical features.
Note: The forest will not check if a datapoint is consistent with the specified bounds!
- feature_index : the index of the feature
- min : the smallest value for the feature
- max : the largest value for the feature

set_type_of_feature(index, type)
`set_type_of_feature(index_t index, index_t type)`
Specifies the type of a feature.
- index : the index of the feature whose type is specified
- type : the actual type (0 - numerical; a value n>0 - categorical with values from {0,1,...,n-1})

set_type_of_response(resp_t)
`set_type_of_response(index_t resp_t)`
Specifies the type of the response.
- resp_t : the actual type (0 - numerical; a value n>0 - categorical with values from {0,1,...,n-1})

thisown
The membership flag
-
class
pyrfr.regression.
default_data_container_with_instances
(*args)[source]¶ Bases:
pyrfr.regression.data_base
A data container for mostly continuous data with instances.
Similar to the mostly_continuous_data container, but with the capability to handle instance features.
C++ includes: default_data_container_with_instances.hpp
-
add_configuration
(config_features)[source]¶ add_configuration(const std::vector< num_t > &config_features) -> index_t
-
add_data_point
(*args)[source]¶ - `add_data_point(index_t config_index, index_t instance_index, response_t r,
- num_t weight=1)`
-
add_instance
(instance_features)[source]¶ add_instance(const std::vector< num_t > instance_features) -> index_t
-
feature
(feature_index, sample_index)[source]¶ feature(index_t feature_index, index_t sample_index) const -> num_t
Function for accessing a single feature value, consistency checks might be omitted for performance.
- feature_index :
The index of the feature requested
- sample_index :
The index of the data point.
the stored value
-
features
(feature_index, sample_indices)[source]¶ - `features(index_t feature_index, const std::vector< index_t > &sample_indices)
- const -> std::vector< num_t >`
member function for accessing the feature values of multiple data points at once, consistency checks might be omitted for performance
- feature_index :
The index of the feature requested
- sample_indices :
The indices of the data point.
the stored values
-
get_bounds_of_feature
(feature_index)[source]¶ - `get_bounds_of_feature(index_t feature_index) const -> std::pair< num_t, num_t
- >`
query the allowed interval for a feature; applies only to continuous variables
- feature_index :
the index of the feature
std::pair<num_t,num_t> interval of allowed values
-
get_configuration_set
(configuration_index)[source]¶ get_configuration_set(num_t configuration_index) -> std::vector< num_t >
-
get_features_by_configuration_and_instance
(configuration_index, instance_index)[source]¶ - `get_features_by_configuration_and_instance(num_t configuration_index, num_t
- instance_index) -> std::vector< num_t >`
-
get_instance_set
()[source]¶ get_instance_set() -> std::vector< num_t >
method to get instance as set_feature for predict_mean_var_of_mean_response_on_set method in regression forest
-
get_type_of_feature
(feature_index)[source]¶ get_type_of_feature(index_t feature_index) const -> index_t
query the type of a feature
- feature_index :
the index of the feature
int type of the feature: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}
-
get_type_of_response
()[source]¶ get_type_of_response() const -> index_t
query the type of the response
index_t type of the response: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}
-
num_data_points
()[source]¶ num_data_points() const -> index_t
the number of data points in the container
-
num_features
()[source]¶ num_features() const -> index_t
the number of features of every datapoint in the container
-
response
(sample_index)[source]¶ response(index_t sample_index) const -> response_t
member function to query a single response value, consistency checks might be omitted for performance
- sample_index :
the response of which data point
the response value
-
retrieve_data_point
(index)[source]¶ retrieve_data_point(index_t index) const -> std::vector< num_t >
method to retrieve a data point
- index :
index of the datapoint to extract
std::vector<num_t> the features of the data point
-
set_bounds_of_feature
(feature_index, min, max)[source]¶ set_bounds_of_feature(index_t feature_index, num_t min, num_t max)
specifies the interval of allowed values for a feature
To marginalize out certain feature dimensions using non-i.i.d. data, the numerical bounds on each variable have to be known. This only applies to numerical features.
Note: The forest will not check if a datapoint is consistent with the specified bounds!
- feature_index :
feature_index the index of the feature
- min :
the smallest value for the feature
- max :
the largest value for the feature
-
set_type_of_configuration_feature
(index, type)[source]¶ set_type_of_configuration_feature(index_t index, index_t type)
-
set_type_of_feature
(index, type)[source]¶ set_type_of_feature(index_t index, index_t type)
specifying the type of a feature
- feature_index :
the index of the feature whose type is specified
- feature_type :
the actual type (0 - numerical, value >0 catergorical with values from {0,1,...value-1}
-
set_type_of_instance_feature
(index, type)[source]¶ set_type_of_instance_feature(index_t index, index_t type)
-
set_type_of_response
(resp_t)[source]¶ set_type_of_response(index_t resp_t)
specifying the type of the response
- response_type :
the actual type (0 - numerical, value >0 catergorical with values from {0,1,...value-1}
-
thisown
¶ The membership flag
-
class pyrfr.regression.default_random_engine(*args)
Bases: object

thisown
The membership flag
class pyrfr.regression.fanova_forest(*args)
Bases: pyrfr.regression.fanova_forest_prototype

all_leaf_values(feature_vector)
`all_leaf_values(const std::vector< num_t > &feature_vector) const -> std::vector< std::vector< num_t > >`

all_split_values()
`all_split_values() -> std::vector< std::vector< std::vector< num_t > > >`

ascii_string_representation()
`ascii_string_representation() -> std::string`

covariance(f1, f2)
`covariance(const std::vector< num_t > &f1, const std::vector< num_t > &f2) -> num_t`

fit(data, rng)
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data, rng_t &rng)`
Grows the random forest for a given data set.
- data : a filled data container
- rng : the random number generator to be used

load_from_ascii_string(str)
`load_from_ascii_string(std::string const &str)`

load_from_binary_file(filename)
`load_from_binary_file(const std::string filename)`

marginal_mean_prediction(feature_vector)
`marginal_mean_prediction(const std::vector< num_t > &feature_vector) -> num_t`

marginal_mean_variance_prediction(feature_vector)
`marginal_mean_variance_prediction(const std::vector< num_t > &feature_vector) -> std::pair< num_t, num_t >`

marginal_prediction_stat_of_tree(tree_index, feature_vector)
`marginal_prediction_stat_of_tree(index_t tree_index, const std::vector< num_t > &feature_vector) -> rfr::util::weighted_running_statistics< num_t >`

num_trees()
`num_trees() -> unsigned int`

options

out_of_bag_error()
`out_of_bag_error() -> num_t`

predict(feature_vector)
`predict(const std::vector< num_t > &feature_vector) const -> response_t`

predict_mean_var(feature_vector, weighted_data=False)
`predict_mean_var(const std::vector< num_t > &feature_vector, bool weighted_data=false) -> std::pair< num_t, num_t >`

print_info()
`print_info()`

pseudo_downdate(features, response, weight)
`pseudo_downdate(std::vector< num_t > features, response_t response, num_t weight)`

pseudo_update(features, response, weight)
`pseudo_update(std::vector< num_t > features, response_t response, num_t weight)`

save_latex_representation(filename_template)
`save_latex_representation(const std::string filename_template)`

save_to_binary_file(filename)
`save_to_binary_file(const std::string filename)`

thisown
The membership flag
class pyrfr.regression.fanova_forest_prototype(*args)
Bases: object

- options : forest_options< num_t, response_t, index_t >

all_leaf_values(feature_vector)
`all_leaf_values(const std::vector< num_t > &feature_vector) const -> std::vector< std::vector< num_t > >`

covariance(f1, f2)
`covariance(const std::vector< num_t > &f1, const std::vector< num_t > &f2) -> num_t`

fit(data, rng)
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data, rng_type &rng)`
Grows the random forest for a given data set.
- data : a filled data container
- rng : the random number generator to be used

kernel(f1, f2)
`kernel(const std::vector< num_t > &f1, const std::vector< num_t > &f2) -> num_t`

options

predict(feature_vector)
`predict(const std::vector< num_t > &feature_vector) const -> response_t`

predict_mean_var(feature_vector, weighted_data=False)
`predict_mean_var(const std::vector< num_t > &feature_vector, bool weighted_data=false) -> std::pair< num_t, num_t >`

pseudo_downdate(features, response, weight)
`pseudo_downdate(std::vector< num_t > features, response_t response, num_t weight)`

pseudo_update(features, response, weight)
`pseudo_update(std::vector< num_t > features, response_t response, num_t weight)`

save_latex_representation(filename_template)
`save_latex_representation(const std::string filename_template)`

thisown
The membership flag
class pyrfr.regression.forest_opts(*args)
Bases: object

- num_trees : index_t
  number of trees in the forest
- num_data_points_per_tree : index_t
  number of datapoints used in each tree
- do_bootstrapping : bool
  flag to toggle bootstrapping
- compute_oob_error : bool
  flag to enable/disable computing the out-of-bag error
- tree_opts : rfr::trees::tree_options< num_t, response_t, index_t >
  the options for each tree

adjust_limits_to_data(data)
`adjust_limits_to_data(const rfr::data_containers::base< num_t, response_t, index_t > &data)`
Adjusts all relevant variables to the data.

compute_oob_error

do_bootstrapping

num_data_points_per_tree

num_trees

thisown
The membership flag

tree_opts
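A sketch of how these options are typically filled in before fitting, using a plain stand-in object rather than a real forest_opts instance. Only the field names are taken from this page; the `adjust_limits_to_data` analogue encodes an assumption about what "adjusting to the data" means, not pyrfr's exact behaviour:

```python
from types import SimpleNamespace

# Stand-in for a forest_opts instance; attribute names mirror the fields above.
options = SimpleNamespace(
    num_trees=64,                 # trees in the forest
    num_data_points_per_tree=0,   # 0 here meaning "decide later from the data"
    do_bootstrapping=True,        # resample the data for each tree
    compute_oob_error=False,      # skip out-of-bag error for speed
)

def adjust_limits_to_data(options, num_data_points):
    """Conceptual analogue of forest_opts.adjust_limits_to_data:
    tie data-dependent limits to the container's size."""
    if options.num_data_points_per_tree == 0:
        options.num_data_points_per_tree = num_data_points

adjust_limits_to_data(options, num_data_points=100)
print(options.num_data_points_per_tree)  # → 100
```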
class pyrfr.regression.num_num_pair(*args)
Bases: object

first

second

thisown
The membership flag
class pyrfr.regression.num_vector_vector_vector(*args)
Bases: object

thisown
The membership flag
class pyrfr.regression.qr_forest(*args)
Bases: pyrfr.regression.binary_rss_forest

all_leaf_values(feature_vector)
`all_leaf_values(const std::vector< num_t > &feature_vector) const -> std::vector< std::vector< num_t > >`

ascii_string_representation()
`ascii_string_representation() -> std::string`

covariance(f1, f2)
`covariance(const std::vector< num_t > &f1, const std::vector< num_t > &f2) -> num_t`

fit(data, rng)
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data, rng_type &rng)`
Grows the random forest for a given data set.
- data : a filled data container
- rng : the random number generator to be used

load_from_ascii_string(str)
`load_from_ascii_string(std::string const &str)`

load_from_binary_file(filename)
`load_from_binary_file(const std::string filename)`

num_trees()
`num_trees() -> unsigned int`

options

out_of_bag_error()
`out_of_bag_error() -> num_t`

predict(feature_vector)
`predict(const std::vector< num_t > &feature_vector) const -> response_t`

predict_mean_var(feature_vector, weighted_data=False)
`predict_mean_var(const std::vector< num_t > &feature_vector, bool weighted_data=false) -> std::pair< num_t, num_t >`

predict_quantiles(feature_vector, quantiles)
`predict_quantiles(const std::vector< num_t > &feature_vector, std::vector< num_t > quantiles) const -> std::vector< num_t >`

print_info()
`print_info()`

pseudo_downdate(features, response, weight)
`pseudo_downdate(std::vector< num_t > features, response_t response, num_t weight)`

pseudo_update(features, response, weight)
`pseudo_update(std::vector< num_t > features, response_t response, num_t weight)`

save_latex_representation(filename_template)
`save_latex_representation(const std::string filename_template)`

save_to_binary_file(filename)
`save_to_binary_file(const std::string filename)`

thisown
The membership flag
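predict_quantiles carries no docstring here; the general idea behind quantile regression forests is to read quantiles off the pooled leaf values rather than their mean. A plain-Python empirical-quantile sketch of that idea (an illustration of the concept, not pyrfr's algorithm):

```python
def empirical_quantile(values, q):
    """Empirical q-quantile (nearest-rank) of pooled leaf values."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(values)
    # Nearest-rank definition: pick the value whose rank covers mass q.
    rank = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[rank]

pooled_leaf_values = [3.0, 1.0, 2.0, 4.0, 5.0]
print([empirical_quantile(pooled_leaf_values, q) for q in (0.0, 0.5, 1.0)])
# → [1.0, 3.0, 5.0]
```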
class pyrfr.regression.tree_opts(*args)
Bases: object

- max_features : index_type
  number of features to consider for each split
- max_depth : index_type
  maximum depth of the tree
- min_samples_to_split : index_type
  minimum number of samples to try splitting
- min_samples_in_leaf : index_type
  minimum number of samples in a leaf
- min_weight_in_leaf : num_type
  minimum total sample weight in a leaf
- max_num_nodes : index_type
  maximum total number of nodes in the tree
- max_num_leaves : index_type
  maximum total number of leaves in the tree
- epsilon_purity : response_type
  minimum difference between two response values to be considered different
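The epsilon_purity field above can be read as an approximate-equality threshold: responses closer together than epsilon count as the same value when deciding whether a node is pure. A plain-Python sketch of that reading (`is_pure` is a hypothetical helper, not part of pyrfr):

```python
def is_pure(responses, epsilon_purity):
    """A node is 'pure' if all its responses differ by less than epsilon_purity."""
    return max(responses) - min(responses) < epsilon_purity

print(is_pure([1.0000, 1.0005, 0.9999], epsilon_purity=1e-3))  # → True
print(is_pure([1.0, 2.0], epsilon_purity=1e-3))                # → False
```

A pure node needs no further splitting, so a larger epsilon_purity makes trees stop growing earlier on nearly constant responses.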
adjust_limits_to_data(data)
`adjust_limits_to_data(const rfr::data_containers::base< num_type, response_type, index_type > &data)`

epsilon_purity

max_depth

max_features

max_num_leaves

max_num_nodes

min_samples_in_leaf

min_samples_to_split

min_weight_in_leaf

set_default_values()
`set_default_values()`
(Re)sets to default values with no limits on the size of the tree.
If nothing is known about the data, this member can be used to get a valid setting for the tree_options struct. But beware: this setting could lead to a huge tree depending on the amount of data. There is no limit to the size, and nodes are split into pure leaves. For each split, every feature is considered! This not only slows the training down, but also makes the tree deterministic!

thisown
The membership flag