Distances

Distance functions.

This module contains functions for calculating the distance between two vectors.

`DistanceMetric: TypeAlias`
`module-attribute` #

A metric used for calculating distances.

Takes two arrays-like objects and returns a float.

`l1_distance`
`module-attribute` #

Calculates the l1 distance between each column in x and y.

The l1 distance is defined as:

`||x - y||_1 = sum_i(|x_i - y_i|)`

This is the sum of the absolute differences between each element in x and y.

`l2_distance`
`module-attribute` #

Calculates the l2 distance between each column in x and y.

The l2 distance is defined as:

`||x - y||_2 = sqrt(sum_i(|x_i - y_i|^2))`

This is the square root of the sum of the squared differences between each element in x and y.

`linf_distance`
`module-attribute` #

Calculates the linf distance between each column in x and y.

The linf distance is defined as:

`||x - y||_inf = max_i(|x_i - y_i|)`

This is the maximum absolute difference between each element in x and y.

`euclidean_distance`
`module-attribute` #

Calculates the euclidean distance between each column in x and y.

Same as l2_distance().

`NamedDistance: TypeAlias`
`module-attribute` #

Predefined distance metrics.

Possible values are:

"l1": l1_distance()
"l2": l2_distance()
"euclidean": euclidean_distance()
"cosine": cosine_distance()
"max": linf_distance()

`class NearestNeighborsDistance(**nn_kwargs)` #

Uses sklearn.neighbors.NearestNeighbors to calculate the distance.

PARAMETER	DESCRIPTION
`**nn_kwargs`	Keyword arguments to pass to sklearn.neighbors.NearestNeighbors. TYPE: `Any` DEFAULT: `{}`

Source code in src/amltk/distances.py

def __init__(self, **nn_kwargs: Any):
    """Creates a new NearestNeighborsDistance.

    Args:
        **nn_kwargs: Keyword arguments to pass to
            [sklearn.neighbors.NearestNeighbors][].
    """
    super().__init__()
    self.nn_kwargs = nn_kwargs

`def call(x, y)` #

Calculates the distance between each column in x and y.

PARAMETER	DESCRIPTION
`x`	An array-like with columns being the features and rows being the samples. TYPE: `ArrayLike`
`y`	A array with the same index as x. TYPE: `ArrayLike`

RETURNS	DESCRIPTION
`NDArray[floating]`	An array with the same index as x.

Source code in src/amltk/distances.py

def __call__(
    self,
    x: npt.ArrayLike,
    y: npt.ArrayLike,
) -> npt.NDArray[np.floating]:
    """Calculates the distance between each column in x and y.

    Args:
        x: An array-like with columns being the features and rows being the samples.
        y: A array with the same index as x.

    Returns:
        An array with the same index as x.
    """
    from sklearn.neighbors import NearestNeighbors

    self.nn = NearestNeighbors(**self.nn_kwargs)

    _x = np.asarray(x)
    _y = np.asarray(y)

    if _y.ndim != 1:
        raise ValueError(f"y must be a 1-dimensional array. Got shape {_y.shape}")

    _y = _y.reshape(1, -1)
    _x = _x.T

    if _x.ndim == 1:
        _x = np.asarray([_x])

    self.nn.fit(_x)
    distances, _ = self.nn.kneighbors(
        _y,
        n_neighbors=len(_x),
        return_distance=True,
    )
    return np.asarray(distances.reshape(-1), dtype=float)

`def pnorm(x, y, p=2)` #

Calculates the p-norm between each column in x and y.

The p-norm is defined as:

`||x - y||_p = (sum_i(|x_i - y_i|^p))^(1/p)`

The common values for p are 1, 2 and infinity.

l1_distance()
l2_distance()
linf_distance()

Using a partial

To use this function with dataset_distance(), you can wrap this in functools.partial().

from functools import partial
from amltk.metalearning import dataset_distance
from amltk.distances import pnorm

dataset_distance(
    target,
    dataset_metafeatures,
    method=partial(pnorm, p=3), # (1)!
)

partial() creates a new function with the p argument set to 3.

PARAMETER	DESCRIPTION
`x`	The vector to compare. TYPE: `ArrayLike`
`y`	The vector to compute the distance to TYPE: `ArrayLike`
`p`	The p in p-norm. TYPE: `int \| float` DEFAULT: `2`

RETURNS	DESCRIPTION
`float`	A series with the same index as x.

Source code in src/amltk/distances.py

def pnorm(
    x: npt.ArrayLike,
    y: npt.ArrayLike,
    p: int | float = 2,
) -> float:
    """Calculates the p-norm between each column in x and y.

    The p-norm is defined as:

        `||x - y||_p = (sum_i(|x_i - y_i|^p))^(1/p)`

    The common values for p are 1, 2 and infinity.

    * [`l1_distance()`][amltk.distances.l1_distance]
    * [`l2_distance()`][amltk.distances.l2_distance]
    * [`linf_distance()`][amltk.distances.linf_distance]

    !!! tip "Using a `partial`"

        To use this function with
        [`dataset_distance()`][amltk.metalearning.dataset_distance],
        you can wrap this in [`functools.partial()`][functools.partial].

        ```python
        from functools import partial
        from amltk.metalearning import dataset_distance
        from amltk.distances import pnorm

        dataset_distance(
            target,
            dataset_metafeatures,
            method=partial(pnorm, p=3), # (1)!
        )
        ```

        1. [`partial()`][functools.partial] creates a new function with the
        `p` argument set to 3.

    Args:
        x: The vector to compare.
        y: The vector to compute the distance to
        p: The p in p-norm.

    Returns:
        A series with the same index as x.
    """
    x = np.asarray(x)
    y = np.asarray(y)

    if p is np.inf:
        return float(np.max(np.abs(x - y)))

    return float(np.linalg.norm(x - y, ord=p))

`def cosine_distance(x, y)` #

Calculates the cosine distance between each column in x and y.

The cosine distance is defined as 1 - cosine_similarity. This means the distance is 0 when the vectors are identical, 1 when orthogonal and 2 when they are opposite.

PARAMETER	DESCRIPTION
`x`	A dataframe with columns being the features and rows being the samples. TYPE: `ArrayLike`
`y`	A series with the same index as x. TYPE: `ArrayLike`

RETURNS	DESCRIPTION
`float`	A series with the same index as x.

Source code in src/amltk/distances.py

def cosine_distance(x: npt.ArrayLike, y: npt.ArrayLike) -> float:
    """Calculates the cosine distance between each column in x and y.

    The cosine distance is defined as 1 - cosine_similarity. This means
    the distance is 0 when the vectors are identical, 1 when orthogonal
    and 2 when they are opposite.

    Args:
        x: A dataframe with columns being the features and rows being the samples.
        y: A series with the same index as x.

    Returns:
        A series with the same index as x.
    """
    x = np.asarray(x)
    y = np.asarray(y)

    cosine_similarity = np.dot(x, y) / (_norm(x) * _norm(y))
    return float(1 - cosine_similarity)

Distances

DistanceMetric: TypeAlias module-attribute #

l1_distance module-attribute #

l2_distance module-attribute #

linf_distance module-attribute #

euclidean_distance module-attribute #

NamedDistance: TypeAlias module-attribute #

class NearestNeighborsDistance(**nn_kwargs) #

def __call__(x, y) #

def pnorm(x, y, p=2) #

def cosine_distance(x, y) #

`DistanceMetric: TypeAlias`
`module-attribute` #

`l1_distance`
`module-attribute` #

`l2_distance`
`module-attribute` #

`linf_distance`
`module-attribute` #

`euclidean_distance`
`module-attribute` #

`NamedDistance: TypeAlias`
`module-attribute` #

`class NearestNeighborsDistance(**nn_kwargs)` #

`def call(x, y)` #

`def pnorm(x, y, p=2)` #

`def cosine_distance(x, y)` #