Regression

class pyrfr.regression.SwigPyIterator(*args, **kwargs)
Bases: object

thisown
The membership flag
class pyrfr.regression.base_tree(*args, **kwargs)
Bases: object

fit(*args)
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data, rfr::trees::tree_options< num_t, response_t, index_t > tree_opts, const std::vector< num_t > &sample_weights, rng_type &rng)=0`
Fits a (possibly randomized) decision tree to a subset of the data.
At each node, if it is 'splitworthy', a random subset of all features is considered for the split. Depending on the split_type provided, greedy or randomized choices can be made. Just make sure to set max_features in tree_opts to a number smaller than the number of features!
- data : the container holding the training data
- tree_opts : a tree_options object that controls certain aspects of "growing" the tree
- sample_weights : vector containing the weights of all datapoints, can be used for subsampling (no checks are done here!)
- rng : a (pseudo) random number generator
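The per-node feature subsampling described above can be sketched in plain Python. This is an illustrative stand-in, not pyrfr's C++ implementation, and the helper name `choose_split_feature` is hypothetical:

```python
import random

def choose_split_feature(num_features, max_features, rng):
    """Pick the feature subset considered at one node, as described above.

    max_features must be smaller than num_features for the split choice to
    be randomized at all; with max_features == num_features every split is
    a greedy search over all features.
    """
    candidates = rng.sample(range(num_features), k=max_features)
    # A real tree would now score each candidate split (e.g. by RSS
    # reduction) and keep the best one; here we just return the subset.
    return candidates

rng = random.Random(42)
subset = choose_split_feature(num_features=10, max_features=3, rng=rng)
```

Because the subset is redrawn at every node, two trees grown on the same data can still differ, which is what makes the forest's variance estimate meaningful.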
leaf_entries(feature_vector)
`leaf_entries(const std::vector< num_t > &feature_vector) const =0 -> std::vector< response_t > const &`
Returns all response values in the leaf into which the given feature vector falls.
- feature_vector : an array containing a valid (in terms of size and values!) feature vector
Returns std::vector<response_t>: all response values in that leaf

predict(feature_vector)
`predict(const std::vector< num_t > &feature_vector) const =0 -> response_t`
Predicts the response value for a single feature vector.
- feature_vector : an array containing a valid (in terms of size and values!) feature vector
Returns num_t: the prediction of the response value (usually the mean of all responses in the corresponding leaf)
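As stated above, a tree's prediction is usually the mean of the responses stored in the leaf the query falls into. A minimal plain-Python stand-in (not the library's code) makes the relationship between leaf_entries and predict concrete:

```python
def leaf_entries(leaf):
    """Stand-in for base_tree.leaf_entries: all responses stored in a leaf."""
    return leaf["responses"]

def predict(leaf):
    """Stand-in for base_tree.predict: mean of the leaf's response values."""
    responses = leaf_entries(leaf)
    return sum(responses) / len(responses)

leaf = {"responses": [1.0, 2.0, 3.0]}
print(predict(leaf))  # → 2.0, the mean of the stored responses
```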
save_latex_representation(filename)
`save_latex_representation(const char *filename) const =0`
Creates a LaTeX document visualizing the tree.

thisown
The membership flag
class pyrfr.regression.binary_full_tree_rss
Bases: pyrfr.regression.base_tree

check_split_fractions(epsilon=1e-06)
`check_split_fractions(num_t epsilon=1e-6) const -> bool`

find_leaf_index(feature_vector)
`find_leaf_index(const std::vector< num_t > &feature_vector) const -> index_t`

fit(data, tree_opts, sample_weights, rng)
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data, rfr::trees::tree_options< num_t, response_t, index_t > tree_opts, const std::vector< num_t > &sample_weights, rng_type &rng)`
Fits a randomized decision tree to a subset of the data.
At each node, if it is 'splitworthy', a random subset of all features is considered for the split. Depending on the split_type provided, greedy or randomized choices can be made. Just make sure to set max_features in tree_opts to a number smaller than the number of features!
- data : the container holding the training data
- tree_opts : a tree_options object that controls certain aspects of "growing" the tree
- sample_weights : vector containing the weights of all allowed datapoints (set individual entries to zero for subsampling); no checks are done here!
- rng : the random number generator to be used

get_leaf(feature_vector)
`get_leaf(const std::vector< num_t > &feature_vector) const -> const node_type &`

leaf_entries(feature_vector)
`leaf_entries(const std::vector< num_t > &feature_vector) const -> std::vector< response_t > const &`
Returns all response values in the leaf into which the given feature vector falls.
- feature_vector : an array containing a valid (in terms of size and values!) feature vector
Returns std::vector<response_t>: all response values in that leaf

leaf_statistic(feature_vector)
`leaf_statistic(const std::vector< num_t > &feature_vector) const -> rfr::util::weighted_running_statistics< num_t > const &`

marginalized_mean_prediction(feature_vector, node_index=0)
`marginalized_mean_prediction(const std::vector< num_t > &feature_vector, index_t node_index=0) const -> num_t`

partition(pcs)
`partition(std::vector< std::vector< num_t > > pcs) const -> std::vector< std::vector< std::vector< num_t > > >`

partition_recursor(the_partition, subspace, node_index)
`partition_recursor(std::vector< std::vector< std::vector< num_t > > > &the_partition, std::vector< std::vector< num_t > > &subspace, num_t node_index) const`

predict(feature_vector)
`predict(const std::vector< num_t > &feature_vector) const -> response_t`
Predicts the response value for a single feature vector.
- feature_vector : an array containing a valid (in terms of size and values!) feature vector
Returns num_t: the prediction of the response value (usually the mean of all responses in the corresponding leaf)

pseudo_downdate(features, response, weight)
`pseudo_downdate(std::vector< num_t > features, response_t response, num_t weight)`

pseudo_update(features, response, weight)
`pseudo_update(std::vector< num_t > features, response_t response, num_t weight)`

save_latex_representation(filename)
`save_latex_representation(const char *filename) const`
Creates a visualization by generating a LaTeX document that can be compiled.
- filename : name of the file that will be used. Note that any existing file will be silently overwritten!

thisown
The membership flag
class pyrfr.regression.binary_rss_forest(*args)
Bases: object

- options : forest_options< num_t, response_t, index_t >

all_leaf_values(feature_vector)
`all_leaf_values(const std::vector< num_t > &feature_vector) const -> std::vector< std::vector< num_t > >`

covariance(f1, f2)
`covariance(const std::vector< num_t > &f1, const std::vector< num_t > &f2) -> num_t`

fit(data, rng)
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data, rng_type &rng)`
Grows the random forest for a given data set.
- data : a filled data container
- rng : the random number generator to be used

kernel(f1, f2)
`kernel(const std::vector< num_t > &f1, const std::vector< num_t > &f2) -> num_t`

options

predict(feature_vector)
`predict(const std::vector< num_t > &feature_vector) const -> response_t`

predict_mean_var(feature_vector, weighted_data=False)
`predict_mean_var(const std::vector< num_t > &feature_vector, bool weighted_data=false) -> std::pair< num_t, num_t >`

pseudo_downdate(features, response, weight)
`pseudo_downdate(std::vector< num_t > features, response_t response, num_t weight)`

pseudo_update(features, response, weight)
`pseudo_update(std::vector< num_t > features, response_t response, num_t weight)`

save_latex_representation(filename_template)
`save_latex_representation(const std::string filename_template)`

thisown
The membership flag
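To make the fit / predict / predict_mean_var call sequence above concrete, here is a toy pure-Python forest with the same method names. It is an illustrative stand-in under strong simplifying assumptions (each "tree" just averages one bootstrap sample of the responses); it is not pyrfr's implementation:

```python
import random

class ToyForest:
    """Mimics the binary_rss_forest interface: fit, predict, predict_mean_var."""

    def __init__(self, num_trees=10, do_bootstrapping=True):
        self.num_trees = num_trees
        self.do_bootstrapping = do_bootstrapping
        self.tree_means = []

    def fit(self, data, rng):
        # data: list of (feature_vector, response) pairs.
        # Each "tree" here is just the mean response of one bootstrap sample.
        responses = [y for _, y in data]
        self.tree_means = []
        for _ in range(self.num_trees):
            sample = (rng.choices(responses, k=len(responses))
                      if self.do_bootstrapping else responses)
            self.tree_means.append(sum(sample) / len(sample))

    def predict(self, feature_vector):
        # Forest prediction: mean over the per-tree predictions.
        return sum(self.tree_means) / len(self.tree_means)

    def predict_mean_var(self, feature_vector):
        # Variance across trees, the forest's uncertainty estimate.
        mean = self.predict(feature_vector)
        var = sum((t - mean) ** 2 for t in self.tree_means) / len(self.tree_means)
        return mean, var

data = [([0.0], 1.0), ([1.0], 2.0), ([2.0], 3.0)]
forest = ToyForest(num_trees=25)
forest.fit(data, random.Random(0))
mean, var = forest.predict_mean_var([1.5])
```

The real forest predicts per leaf rather than globally, but the shape of the workflow (fill a container, fit with an rng, then query mean and variance) is the same.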
class pyrfr.regression.data_base(*args, **kwargs)
Bases: object

The interface for any data container with the minimal functionality.
C++ includes: data_container.hpp

add_data_point(features, response, weight)
`add_data_point(std::vector< num_t > features, response_t response, num_t weight)=0`
Method to add a single data point.
- features : a vector containing the features
- response : the corresponding response value
- weight : the weight of the data point

feature(feature_index, sample_index)
`feature(index_t feature_index, index_t sample_index) const =0 -> num_t`
Function for accessing a single feature value; consistency checks might be omitted for performance.
- feature_index : the index of the feature requested
- sample_index : the index of the data point
Returns: the stored value

features(feature_index, sample_indices)
`features(index_t feature_index, const std::vector< index_t > &sample_indices) const =0 -> std::vector< num_t >`
Member function for accessing the feature values of multiple data points at once; consistency checks might be omitted for performance.
- feature_index : the index of the feature requested
- sample_indices : the indices of the data points
Returns: the stored values

get_bounds_of_feature(feature_index)
`get_bounds_of_feature(index_t feature_index) const =0 -> std::pair< num_t, num_t >`
Queries the allowed interval for a feature; applies only to continuous variables.
- feature_index : the index of the feature
Returns std::pair<num_t,num_t>: interval of allowed values

get_type_of_feature(feature_index)
`get_type_of_feature(index_t feature_index) const =0 -> index_t`
Queries the type of a feature.
- feature_index : the index of the feature
Returns int: type of the feature: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}
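The type encoding just described (0 for numerical, n > 0 for a categorical feature with values {0, 1, ..., n-1}) can be sketched as a small validity check. This is plain Python for illustration only; `is_valid_value` is a hypothetical helper, not part of pyrfr:

```python
def is_valid_value(feature_type, value):
    """Check a value against the type encoding described above.

    feature_type == 0 -> numerical: any float/int is allowed
    feature_type == n -> categorical with allowed values {0, 1, ..., n-1}
    """
    if feature_type == 0:
        return isinstance(value, (int, float))
    return value in range(feature_type)

# A categorical feature with 3 levels accepts exactly 0, 1 and 2:
print([is_valid_value(3, v) for v in (0, 2, 3)])  # → [True, True, False]
```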
get_type_of_response()
`get_type_of_response() const =0 -> index_t`
Queries the type of the response.
Returns index_t: type of the response: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}

num_data_points()
`num_data_points() const =0 -> index_t`
The number of data points in the container.

num_features()
`num_features() const =0 -> index_t`
The number of features of every datapoint in the container.

response(sample_index)
`response(index_t sample_index) const =0 -> response_t`
Member function to query a single response value; consistency checks might be omitted for performance.
- sample_index : the index of the data point whose response is requested
Returns: the response value

retrieve_data_point(index)
`retrieve_data_point(index_t index) const =0 -> std::vector< num_t >`
Method to retrieve a data point.
- index : index of the datapoint to extract
Returns std::vector<num_t>: the features of the data point

set_bounds_of_feature(feature_index, min, max)
`set_bounds_of_feature(index_t feature_index, num_t min, num_t max)=0`
Specifies the interval of allowed values for a feature.
To marginalize out certain feature dimensions using non-i.i.d. data, the numerical bounds on each variable have to be known. This only applies to numerical features.
Note: The forest will not check if a datapoint is consistent with the specified bounds!
- feature_index : the index of the feature
- min : the smallest value for the feature
- max : the largest value for the feature

set_type_of_feature(feature_index, feature_type)
`set_type_of_feature(index_t feature_index, index_t feature_type)=0`
Specifies the type of a feature.
- feature_index : the index of the feature whose type is specified
- feature_type : the actual type (0 - numerical; a value n>0 - categorical with values from {0,1,...,n-1})

set_type_of_response(response_type)
`set_type_of_response(index_t response_type)=0`
Specifies the type of the response.
- response_type : the actual type (0 - numerical; a value n>0 - categorical with values from {0,1,...,n-1})

thisown
The membership flag
class pyrfr.regression.default_data_container(num_f)
Bases: pyrfr.regression.data_base

A data container for mostly continuous data.
It might happen that only a small fraction of all features is categorical. In that case it would be wasteful to store the type of every feature separately. Instead, this data_container only stores the non-continuous ones in a hash-map.
C++ includes: default_data_container.hpp

add_data_point(features, response, weight=1)
`add_data_point(std::vector< num_t > features, response_t response, num_t weight=1)`
Method to add a single data point.
- features : a vector containing the features
- response : the corresponding response value
- weight : the weight of the data point
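The role of per-point weights (as described for fit's sample_weights, where setting an entry to zero excludes that point) can be illustrated with a weighted mean. Plain Python, conceptual only:

```python
def weighted_mean(responses, weights):
    """Weighted mean of responses. Points with weight 0 contribute nothing,
    which is how subsampling via sample_weights works conceptually."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero")
    return sum(w * r for w, r in zip(weights, responses)) / total

responses = [1.0, 2.0, 10.0]
# A weight of 0 drops the point at 10.0 from the estimate:
print(weighted_mean(responses, [1.0, 1.0, 0.0]))  # → 1.5
```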
feature(feature_index, sample_index)
`feature(index_t feature_index, index_t sample_index) const -> num_t`
Function for accessing a single feature value; consistency checks might be omitted for performance.
- feature_index : the index of the feature requested
- sample_index : the index of the data point
Returns: the stored value

features(feature_index, sample_indices)
`features(index_t feature_index, const std::vector< index_t > &sample_indices) const -> std::vector< num_t >`
Member function for accessing the feature values of multiple data points at once; consistency checks might be omitted for performance.
- feature_index : the index of the feature requested
- sample_indices : the indices of the data points
Returns: the stored values

get_bounds_of_feature(feature_index)
`get_bounds_of_feature(index_t feature_index) const -> std::pair< num_t, num_t >`
Queries the allowed interval for a feature; applies only to continuous variables.
- feature_index : the index of the feature
Returns std::pair<num_t,num_t>: interval of allowed values

get_min_max_of_feature(feature_index)
`get_min_max_of_feature(index_t feature_index) const -> std::pair< num_t, num_t >`

get_type_of_feature(feature_index)
`get_type_of_feature(index_t feature_index) const -> index_t`
Queries the type of a feature.
As most features are assumed to be numerical, it is actually beneficial to store only the categorical exceptions in a hash-map. Type = 0 means continuous, and Type = n >= 1 means categorical with n possible values from {0,1,...,n-1}.
- feature_index : the index of the feature
Returns int: type of the feature: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}

get_type_of_response()
`get_type_of_response() const -> index_t`
Queries the type of the response.
Returns index_t: type of the response: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}

import_csv_files(*args)
`import_csv_files(const std::string &feature_file, const std::string &response_file, std::string weight_file="") -> int`

num_data_points()
`num_data_points() const -> index_t`
The number of data points in the container.

num_features()
`num_features() const -> index_t`
The number of features of every datapoint in the container.

response(sample_index)
`response(index_t sample_index) const -> response_t`
Member function to query a single response value; consistency checks might be omitted for performance.
- sample_index : the index of the data point whose response is requested
Returns: the response value

retrieve_data_point(index)
`retrieve_data_point(index_t index) const -> std::vector< num_t >`
Method to retrieve a data point.
- index : index of the datapoint to extract
Returns std::vector<num_t>: the features of the data point

set_bounds_of_feature(feature_index, min, max)
`set_bounds_of_feature(index_t feature_index, num_t min, num_t max)`
Specifies the interval of allowed values for a feature.
To marginalize out certain feature dimensions using non-i.i.d. data, the numerical bounds on each variable have to be known. This only applies to numerical features.
Note: The forest will not check if a datapoint is consistent with the specified bounds!
- feature_index : the index of the feature
- min : the smallest value for the feature
- max : the largest value for the feature

set_type_of_feature(index, type)
`set_type_of_feature(index_t index, index_t type)`
Specifies the type of a feature.
- index : the index of the feature whose type is specified
- type : the actual type (0 - numerical; a value n>0 - categorical with values from {0,1,...,n-1})

set_type_of_response(resp_t)
`set_type_of_response(index_t resp_t)`
Specifies the type of the response.
- resp_t : the actual type (0 - numerical; a value n>0 - categorical with values from {0,1,...,n-1})

thisown
The membership flag
-
class
pyrfr.regression.
default_data_container_with_instances
(*args)[source]¶ Bases:
pyrfr.regression.data_base
A data container for mostly continuous data with instances.
Similar to the mostly_continuous_data container, but with the capability to handle instance features.
C++ includes: default_data_container_with_instances.hpp
-
add_configuration
(config_features)[source]¶ add_configuration(const std::vector< num_t > &config_features) -> index_t
-
add_data_point
(*args)[source]¶ - `add_data_point(index_t config_index, index_t instance_index, response_t r,
- num_t weight=1)`
-
add_instance
(instance_features)[source]¶ add_instance(const std::vector< num_t > instance_features) -> index_t
-
feature
(feature_index, sample_index)[source]¶ feature(index_t feature_index, index_t sample_index) const -> num_t
Function for accessing a single feature value, consistency checks might be omitted for performance.
- feature_index :
The index of the feature requested
- sample_index :
The index of the data point.
the stored value
-
features
(feature_index, sample_indices)[source]¶ - `features(index_t feature_index, const std::vector< index_t > &sample_indices)
- const -> std::vector< num_t >`
member function for accessing the feature values of multiple data points at once, consistency checks might be omitted for performance
- feature_index :
The index of the feature requested
- sample_indices :
The indices of the data point.
the stored values
-
get_bounds_of_feature
(feature_index)[source]¶ - `get_bounds_of_feature(index_t feature_index) const -> std::pair< num_t, num_t
- >`
query the allowed interval for a feature; applies only to continuous variables
- feature_index :
the index of the feature
std::pair<num_t,num_t> interval of allowed values
-
get_configuration_set
(configuration_index)[source]¶ get_configuration_set(num_t configuration_index) -> std::vector< num_t >
-
get_features_by_configuration_and_instance
(configuration_index, instance_index)[source]¶ - `get_features_by_configuration_and_instance(num_t configuration_index, num_t
- instance_index) -> std::vector< num_t >`
-
get_instance_set
()[source]¶ get_instance_set() -> std::vector< num_t >
method to get instance as set_feature for predict_mean_var_of_mean_response_on_set method in regression forest
-
get_type_of_feature
(feature_index)[source]¶ get_type_of_feature(index_t feature_index) const -> index_t
query the type of a feature
- feature_index :
the index of the feature
int type of the feature: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}
-
get_type_of_response
()[source]¶ get_type_of_response() const -> index_t
query the type of the response
index_t type of the response: 0 - numerical value (float or int); n>0 - categorical value with n different values {0,1,...,n-1}
-
num_data_points
()[source]¶ num_data_points() const -> index_t
the number of data points in the container
-
num_features
()[source]¶ num_features() const -> index_t
the number of features of every datapoint in the container
-
response
(sample_index)[source]¶ response(index_t sample_index) const -> response_t
member function to query a single response value, consistency checks might be omitted for performance
- sample_index :
the response of which data point
the response value
-
retrieve_data_point
(index)[source]¶ retrieve_data_point(index_t index) const -> std::vector< num_t >
method to retrieve a data point
- index :
index of the datapoint to extract
std::vector<num_t> the features of the data point
-
set_bounds_of_feature
(feature_index, min, max)[source]¶ set_bounds_of_feature(index_t feature_index, num_t min, num_t max)
specifies the interval of allowed values for a feature
To marginalize out certain feature dimensions using non-i.i.d. data, the numerical bounds on each variable have to be known. This only applies to numerical features.
Note: The forest will not check if a datapoint is consistent with the specified bounds!
- feature_index :
feature_index the index of the feature
- min :
the smallest value for the feature
- max :
the largest value for the feature
-
set_type_of_configuration_feature
(index, type)[source]¶ set_type_of_configuration_feature(index_t index, index_t type)
-
set_type_of_feature
(index, type)[source]¶ set_type_of_feature(index_t index, index_t type)
specifying the type of a feature
- feature_index :
the index of the feature whose type is specified
- feature_type :
the actual type (0 - numerical, value >0 catergorical with values from {0,1,...value-1}
-
set_type_of_instance_feature
(index, type)[source]¶ set_type_of_instance_feature(index_t index, index_t type)
-
set_type_of_response
(resp_t)[source]¶ set_type_of_response(index_t resp_t)
specifying the type of the response
- response_type :
the actual type (0 - numerical, value >0 catergorical with values from {0,1,...value-1}
-
thisown
¶ The membership flag
-
class pyrfr.regression.default_random_engine(*args)
Bases: object

thisown
The membership flag
class pyrfr.regression.fanova_forest(*args)
Bases: pyrfr.regression.fanova_forest_prototype

all_leaf_values(feature_vector)
`all_leaf_values(const std::vector< num_t > &feature_vector) const -> std::vector< std::vector< num_t > >`

all_split_values()
`all_split_values() -> std::vector< std::vector< std::vector< num_t > > >`

ascii_string_representation()
`ascii_string_representation() -> std::string`

covariance(f1, f2)
`covariance(const std::vector< num_t > &f1, const std::vector< num_t > &f2) -> num_t`

fit(data, rng)
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data, rng_t &rng)`
Grows the random forest for a given data set.
- data : a filled data container
- rng : the random number generator to be used

load_from_ascii_string(str)
`load_from_ascii_string(std::string const &str)`

load_from_binary_file(filename)
`load_from_binary_file(const std::string filename)`

marginal_mean_prediction(feature_vector)
`marginal_mean_prediction(const std::vector< num_t > &feature_vector) -> num_t`

marginal_mean_variance_prediction(feature_vector)
`marginal_mean_variance_prediction(const std::vector< num_t > &feature_vector) -> std::pair< num_t, num_t >`

marginal_prediction_stat_of_tree(tree_index, feature_vector)
`marginal_prediction_stat_of_tree(index_t tree_index, const std::vector< num_t > &feature_vector) -> rfr::util::weighted_running_statistics< num_t >`

num_trees()
`num_trees() -> unsigned int`

options

out_of_bag_error()
`out_of_bag_error() -> num_t`

predict(feature_vector)
`predict(const std::vector< num_t > &feature_vector) const -> response_t`

predict_mean_var(feature_vector, weighted_data=False)
`predict_mean_var(const std::vector< num_t > &feature_vector, bool weighted_data=false) -> std::pair< num_t, num_t >`

print_info()
`print_info()`

pseudo_downdate(features, response, weight)
`pseudo_downdate(std::vector< num_t > features, response_t response, num_t weight)`

pseudo_update(features, response, weight)
`pseudo_update(std::vector< num_t > features, response_t response, num_t weight)`

save_latex_representation(filename_template)
`save_latex_representation(const std::string filename_template)`

save_to_binary_file(filename)
`save_to_binary_file(const std::string filename)`

thisown
The membership flag
class pyrfr.regression.fanova_forest_prototype(*args)
Bases: object

- options : forest_options< num_t, response_t, index_t >

all_leaf_values(feature_vector)
`all_leaf_values(const std::vector< num_t > &feature_vector) const -> std::vector< std::vector< num_t > >`

covariance(f1, f2)
`covariance(const std::vector< num_t > &f1, const std::vector< num_t > &f2) -> num_t`

fit(data, rng)
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data, rng_type &rng)`
Grows the random forest for a given data set.
- data : a filled data container
- rng : the random number generator to be used

kernel(f1, f2)
`kernel(const std::vector< num_t > &f1, const std::vector< num_t > &f2) -> num_t`

options

predict(feature_vector)
`predict(const std::vector< num_t > &feature_vector) const -> response_t`

predict_mean_var(feature_vector, weighted_data=False)
`predict_mean_var(const std::vector< num_t > &feature_vector, bool weighted_data=false) -> std::pair< num_t, num_t >`

pseudo_downdate(features, response, weight)
`pseudo_downdate(std::vector< num_t > features, response_t response, num_t weight)`

pseudo_update(features, response, weight)
`pseudo_update(std::vector< num_t > features, response_t response, num_t weight)`

save_latex_representation(filename_template)
`save_latex_representation(const std::string filename_template)`

thisown
The membership flag
class pyrfr.regression.forest_opts(*args)
Bases: object

- num_trees : index_t
  number of trees in the forest
- num_data_points_per_tree : index_t
  number of datapoints used in each tree
- do_bootstrapping : bool
  flag to toggle bootstrapping
- compute_oob_error : bool
  flag to enable/disable computing the out-of-bag error
- tree_opts : rfr::trees::tree_options< num_t, response_t, index_t >
  the options for each tree

adjust_limits_to_data(data)
`adjust_limits_to_data(const rfr::data_containers::base< num_t, response_t, index_t > &data)`
Adjusts all relevant variables to the data.

compute_oob_error

do_bootstrapping

num_data_points_per_tree

num_trees

thisown
The membership flag

tree_opts
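A sketch of how these options are typically filled in before fitting, using a plain stand-in object rather than a real forest_opts instance. Only the field names are taken from this page; the `adjust_limits_to_data` analogue encodes an assumption about what "adjusting to the data" means, not pyrfr's exact behaviour:

```python
from types import SimpleNamespace

# Stand-in for a forest_opts instance; attribute names mirror the fields above.
options = SimpleNamespace(
    num_trees=64,                 # trees in the forest
    num_data_points_per_tree=0,   # 0 here meaning "decide later from the data"
    do_bootstrapping=True,        # resample the data for each tree
    compute_oob_error=False,      # skip out-of-bag error for speed
)

def adjust_limits_to_data(options, num_data_points):
    """Conceptual analogue of forest_opts.adjust_limits_to_data:
    tie data-dependent limits to the container's size."""
    if options.num_data_points_per_tree == 0:
        options.num_data_points_per_tree = num_data_points

adjust_limits_to_data(options, num_data_points=100)
print(options.num_data_points_per_tree)  # → 100
```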
class pyrfr.regression.num_num_pair(*args)
Bases: object

first

second

thisown
The membership flag
class pyrfr.regression.num_vector_vector_vector(*args)
Bases: object

thisown
The membership flag
class pyrfr.regression.qr_forest(*args)
Bases: pyrfr.regression.binary_rss_forest

all_leaf_values(feature_vector)
`all_leaf_values(const std::vector< num_t > &feature_vector) const -> std::vector< std::vector< num_t > >`

ascii_string_representation()
`ascii_string_representation() -> std::string`

covariance(f1, f2)
`covariance(const std::vector< num_t > &f1, const std::vector< num_t > &f2) -> num_t`

fit(data, rng)
`fit(const rfr::data_containers::base< num_t, response_t, index_t > &data, rng_type &rng)`
Grows the random forest for a given data set.
- data : a filled data container
- rng : the random number generator to be used

load_from_ascii_string(str)
`load_from_ascii_string(std::string const &str)`

load_from_binary_file(filename)
`load_from_binary_file(const std::string filename)`

num_trees()
`num_trees() -> unsigned int`

options

out_of_bag_error()
`out_of_bag_error() -> num_t`

predict(feature_vector)
`predict(const std::vector< num_t > &feature_vector) const -> response_t`

predict_mean_var(feature_vector, weighted_data=False)
`predict_mean_var(const std::vector< num_t > &feature_vector, bool weighted_data=false) -> std::pair< num_t, num_t >`

predict_quantiles(feature_vector, quantiles)
`predict_quantiles(const std::vector< num_t > &feature_vector, std::vector< num_t > quantiles) const -> std::vector< num_t >`

print_info()
`print_info()`

pseudo_downdate(features, response, weight)
`pseudo_downdate(std::vector< num_t > features, response_t response, num_t weight)`

pseudo_update(features, response, weight)
`pseudo_update(std::vector< num_t > features, response_t response, num_t weight)`

save_latex_representation(filename_template)
`save_latex_representation(const std::string filename_template)`

save_to_binary_file(filename)
`save_to_binary_file(const std::string filename)`

thisown
The membership flag
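predict_quantiles carries no docstring here; the general idea behind quantile regression forests is to read quantiles off the pooled leaf values rather than their mean. A plain-Python empirical-quantile sketch of that idea (an illustration of the concept, not pyrfr's algorithm):

```python
def empirical_quantile(values, q):
    """Empirical q-quantile (nearest-rank) of pooled leaf values."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(values)
    # Nearest-rank definition: pick the value whose rank covers mass q.
    rank = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[rank]

pooled_leaf_values = [3.0, 1.0, 2.0, 4.0, 5.0]
print([empirical_quantile(pooled_leaf_values, q) for q in (0.0, 0.5, 1.0)])
# → [1.0, 3.0, 5.0]
```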
class pyrfr.regression.tree_opts(*args)
Bases: object

- max_features : index_type
  number of features to consider for each split
- max_depth : index_type
  maximum depth of the tree
- min_samples_to_split : index_type
  minimum number of samples to try splitting
- min_samples_in_leaf : index_type
  minimum number of samples in a leaf
- min_weight_in_leaf : num_type
  minimum total sample weight in a leaf
- max_num_nodes : index_type
  maximum total number of nodes in the tree
- max_num_leaves : index_type
  maximum total number of leaves in the tree
- epsilon_purity : response_type
  minimum difference between two response values to be considered different
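The epsilon_purity field above can be read as an approximate-equality threshold: responses closer together than epsilon count as the same value when deciding whether a node is pure. A plain-Python sketch of that reading (`is_pure` is a hypothetical helper, not part of pyrfr):

```python
def is_pure(responses, epsilon_purity):
    """A node is 'pure' if all its responses differ by less than epsilon_purity."""
    return max(responses) - min(responses) < epsilon_purity

print(is_pure([1.0000, 1.0005, 0.9999], epsilon_purity=1e-3))  # → True
print(is_pure([1.0, 2.0], epsilon_purity=1e-3))                # → False
```

A pure node needs no further splitting, so a larger epsilon_purity makes trees stop growing earlier on nearly constant responses.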
adjust_limits_to_data(data)
`adjust_limits_to_data(const rfr::data_containers::base< num_type, response_type, index_type > &data)`

epsilon_purity

max_depth

max_features

max_num_leaves

max_num_nodes

min_samples_in_leaf

min_samples_to_split

min_weight_in_leaf

set_default_values()
`set_default_values()`
(Re)sets to default values with no limits on the size of the tree.
If nothing is known about the data, this member can be used to get a valid setting for the tree_options struct. But beware: this setting could lead to a huge tree depending on the amount of data. There is no limit to the size, and nodes are split into pure leaves. For each split, every feature is considered! This not only slows the training down, but also makes the tree deterministic!

thisown
The membership flag