Distances
Distance functions.
This module contains functions for calculating the distance between two vectors.
DistanceMetric: TypeAlias
module-attribute
A metric used for calculating distances.
Takes two array-like objects and returns a float.
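To illustrate the shape of such a metric, here is a minimal sketch of a custom callable that takes two array-likes and returns a float. The `Metric` alias and the `mismatch_fraction` function are hypothetical names for this illustration and are not part of amltk; the library's own alias may be typed differently.

```python
from collections.abc import Callable, Sequence

# Hypothetical stand-in for the alias: two array-likes in, one float out.
Metric = Callable[[Sequence[float], Sequence[float]], float]


def mismatch_fraction(x: Sequence[float], y: Sequence[float]) -> float:
    """Fraction of positions at which x and y differ (illustrative only)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not x:
        return 0.0
    return sum(a != b for a, b in zip(x, y)) / len(x)


metric: Metric = mismatch_fraction
d = metric([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 3.0, 0.0])  # 2 of 4 positions differ
```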
l1_distance
module-attribute
Calculates the l1 distance between each column in x and y.
The l1 distance is defined as:
`||x - y||_1 = sum_i(|x_i - y_i|)`
This is the sum of the absolute differences between corresponding elements of x and y.
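As a worked instance of the formula, the following plain-Python sketch computes the l1 distance by hand. The helper `l1_sketch` is a hypothetical name used only here and is not amltk's implementation.

```python
def l1_sketch(x, y):
    """Sum of absolute element-wise differences between x and y."""
    return float(sum(abs(a - b) for a, b in zip(x, y)))


x = [1.0, 2.0, 3.0]
y = [4.0, 0.0, 3.0]
d = l1_sketch(x, y)  # |1-4| + |2-0| + |3-3| = 3 + 2 + 0 = 5
```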
l2_distance
module-attribute
Calculates the l2 distance between each column in x and y.
The l2 distance is defined as:
`||x - y||_2 = sqrt(sum_i(|x_i - y_i|^2))`
This is the square root of the sum of the squared differences between corresponding elements of x and y.
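A small worked instance of the formula, written in plain Python. The helper `l2_sketch` is a hypothetical name for this illustration and is independent of amltk's implementation.

```python
import math


def l2_sketch(x, y):
    """Square root of the sum of squared element-wise differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
d = l2_sketch(x, y)  # sqrt(3^2 + 4^2 + 0^2) = sqrt(25) = 5
```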
linf_distance
module-attribute
Calculates the linf distance between each column in x and y.
The linf distance is defined as:
`||x - y||_inf = max_i(|x_i - y_i|)`
This is the largest absolute difference between corresponding elements of x and y.
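A small worked instance of the formula, written in plain Python. The helper `linf_sketch` is a hypothetical name for this illustration and is independent of amltk's implementation.

```python
def linf_sketch(x, y):
    """Largest absolute element-wise difference between x and y."""
    return float(max(abs(a - b) for a, b in zip(x, y)))


x = [1.0, 2.0, 3.0]
y = [4.0, 0.0, 3.5]
d = linf_sketch(x, y)  # max(|1-4|, |2-0|, |3-3.5|) = max(3, 2, 0.5) = 3
```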
euclidean_distance
module-attribute
Calculates the euclidean distance between each column in x and y.
Same as l2_distance().
NamedDistance: TypeAlias
module-attribute
Predefined distance metrics.
Possible values are:
"l1": l1_distance()
"l2": l2_distance()
"euclidean": euclidean_distance()
"cosine": cosine_distance()
"max": linf_distance()
class NearestNeighborsDistance(**nn_kwargs)
Uses sklearn.neighbors.NearestNeighbors to calculate the distance.
PARAMETER | DESCRIPTION |
---|---|
**nn_kwargs | Keyword arguments to pass to sklearn.neighbors.NearestNeighbors. |
Source code in src/amltk/distances.py
def __call__(x, y)
Calculates the distance between each column in x and y.
PARAMETER | DESCRIPTION |
---|---|
x | An array-like with columns being the features and rows being the samples. |
y | An array with the same index as x. |
RETURNS | DESCRIPTION |
---|---|
NDArray[floating] | An array with the same index as x. |
def pnorm(x, y, p=2)
Calculates the p-norm between each column in x and y.
The p-norm is defined as:
`||x - y||_p = (sum_i(|x_i - y_i|^p))^(1/p)`
The common values for p are 1, 2 and infinity.
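To make the formula concrete, here is a plain-Python sketch that evaluates it for a few values of p. The helper `pnorm_sketch` is a hypothetical name and is not amltk's implementation; treating p = infinity as the maximum absolute difference is the standard limit of the formula.

```python
import math


def pnorm_sketch(x, y, p=2):
    """Evaluate (sum_i |x_i - y_i|^p)^(1/p), with p = inf handled as a max."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        # Limit of the formula as p grows: the largest absolute difference.
        return float(max(diffs))
    return float(sum(d ** p for d in diffs) ** (1 / p))


x, y = [0.0, 0.0], [3.0, 4.0]
d1 = pnorm_sketch(x, y, p=1)           # 3 + 4
d2 = pnorm_sketch(x, y, p=2)           # sqrt(9 + 16)
dinf = pnorm_sketch(x, y, p=math.inf)  # max(3, 4)
```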
Using a partial
To use this function with dataset_distance(), you can wrap it in functools.partial().

```python
from functools import partial

from amltk.metalearning import dataset_distance
from amltk.distances import pnorm

# `target` and `dataset_metafeatures` are assumed to be defined elsewhere.
dataset_distance(
    target,
    dataset_metafeatures,
    method=partial(pnorm, p=3),  # partial() creates a new function with the p argument set to 3.
)
```
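The effect of functools.partial() can be checked in isolation. The sketch below uses a locally defined stand-in, `pnorm_sketch` (a hypothetical name, not the amltk function), to show that the wrapped callable takes only x and y and behaves as if p=3 had been passed.

```python
from functools import partial


def pnorm_sketch(x, y, p=2):
    """Stand-in p-norm for finite p: (sum_i |x_i - y_i|^p)^(1/p)."""
    return float(sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p))


# Fix p=3 so that the result is a two-argument metric.
p3_distance = partial(pnorm_sketch, p=3)

x, y = [0.0, 0.0], [3.0, 4.0]
d = p3_distance(x, y)  # same value as pnorm_sketch(x, y, p=3)
```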
PARAMETER | DESCRIPTION |
---|---|
x | The vector to compare. |
y | The vector to compute the distance to. |
p | The p in p-norm. |
RETURNS | DESCRIPTION |
---|---|
float | The p-norm distance between x and y. |
def cosine_distance(x, y)
Calculates the cosine distance between each column in x and y.
The cosine distance is defined as 1 - cosine_similarity. This means the distance is 0 when the vectors are identical, 1 when orthogonal and 2 when they are opposite.
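The three reference values can be reproduced with a short plain-Python sketch. The helper `cosine_distance_sketch` is a hypothetical name and is not amltk's implementation; rejecting zero-length vectors is a choice made for this example.

```python
import math


def cosine_distance_sketch(x, y):
    """Return 1 - cosine_similarity(x, y) for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_x * norm_y)


same = cosine_distance_sketch([1.0, 0.0], [1.0, 0.0])        # 0.0
orthogonal = cosine_distance_sketch([1.0, 0.0], [0.0, 1.0])  # 1.0
opposite = cosine_distance_sketch([1.0, 0.0], [-1.0, 0.0])   # 2.0
```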
PARAMETER | DESCRIPTION |
---|---|
x | A dataframe with columns being the features and rows being the samples. |
y | A series with the same index as x. |
RETURNS | DESCRIPTION |
---|---|
float | The cosine distance between x and y. |