Profiler

Whether for debugging, building an AutoML system, or optimizing code, we provide a Profiler, which can generate a Profile for different sections of code. This is particularly useful with Trials, so much so that we attach one to every Trial as trial.profiler.

When done profiling, you can export all generated profiles as a dataframe using profiler.df().

from amltk.profiling import Profiler
import numpy as np

profiler = Profiler()

with profiler("loading-data"):
    X = np.random.rand(1000, 1000)

with profiler("training-model"):
    model = np.linalg.inv(X)

with profiler("predicting"):
    y = model @ X

print(profiler.df())
                memory:start_vms  memory:end_vms  ...  time:kind  time:unit
loading-data        1.953935e+09      1961938944  ...       wall    seconds
training-model      1.961939e+09      2023387136  ...       wall    seconds
predicting          2.023387e+09      2031386624  ...       wall    seconds

[3 rows x 12 columns]

You'll find these profiles as keys in the Profiler, e.g. profiler["loading-data"].

This measures both the time spent within the block and the memory in use when the block starts and when it finishes, which gives you an estimate of the memory the block consumed.

Memory, vms vs rss

While not entirely accurate, this should be enough information for most use cases.

Suppose the main process uses 2GB of memory and then spawns a new process in which you are profiling, as you might do from a Task. If the new process uses another 2GB on top of that, then:

  • The virtual memory size (vms) will show 4GB, as the new process shares the 2GB with the main process and has its own 2GB.

  • The resident set size (rss) will show 2GB, as the new process only has 2GB of its own memory.
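
The arithmetic of this scenario can be written out as a short sketch. The figures are the hypothetical 2GB values from the example above, not measurements, and the variable names are purely illustrative.

```python
GB = 1024**3

# Hypothetical figures from the scenario above, not real measurements.
parent_memory = 2 * GB     # memory used by the main process
child_own_memory = 2 * GB  # memory the spawned process allocates itself

# vms counts everything mapped into the child's address space,
# including the memory it shares with the main process.
child_vms = parent_memory + child_own_memory

# rss, in the simplified picture described above, counts only the
# memory that belongs to the child itself.
child_rss = child_own_memory

print(f"vms: {child_vms / GB:.0f}GB, rss: {child_rss / GB:.0f}GB")
```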

If you need to profile some iterator, like a for loop, you can use Profiler.each() which will measure the entire loop but also each individual iteration. This can be useful for iterating batches of a deep-learning model, splits of a cross-validator or really any loop with work you want to profile.

from amltk.profiling import Profiler
import numpy as np

profiler = Profiler()

for i in profiler.each(range(3), name="for-loop"):
    X = np.random.rand(1000, 1000)

print(profiler.df())
            memory:start_vms  memory:end_vms  ...  time:kind  time:unit
for-loop        2.015379e+09      2023378944  ...       wall    seconds
for-loop:0      2.015379e+09      2015379456  ...       wall    seconds
for-loop:1      2.015379e+09      2023378944  ...       wall    seconds
for-loop:2      2.023379e+09      2023378944  ...       wall    seconds

[4 rows x 12 columns]

Lastly, to disable profiling without editing much code, you can always use Profiler.disable() and Profiler.enable() to toggle profiling on and off.

class Profile
dataclass

A profiler for measuring statistics between two events.

class Interval
dataclass

A class for representing a profiled interval.

def to_dict(*, prefix='')

Convert the profile interval to a dictionary.

Source code in src/amltk/profiling/profiler.py
def to_dict(self, *, prefix: str = "") -> dict[str, Any]:
    """Convert the profile interval to a dictionary."""
    _prefix = "" if prefix == "" else f"{prefix}:"
    return {
        **self.memory.to_dict(prefix=f"{_prefix}memory:"),
        **self.time.to_dict(prefix=f"{_prefix}time:"),
    }

def from_dict(d)
classmethod

Create a profile interval from a dictionary.

PARAMETER DESCRIPTION
d

The dictionary to create from.

TYPE: Mapping[str, Any]

RETURNS DESCRIPTION
Interval

The profile interval.

Source code in src/amltk/profiling/profiler.py
@classmethod
def from_dict(cls, d: Mapping[str, Any]) -> Profile.Interval:
    """Create a profile interval from a dictionary.

    Args:
        d: The dictionary to create from.

    Returns:
        The profile interval.
    """
    return Profile.Interval(
        memory=Memory.from_dict(mapping_select(d, "memory:")),
        time=Timer.from_dict(mapping_select(d, "time:")),
    )
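
The prefix round trip performed by to_dict and from_dict can be sketched with the standard library alone. The helper `select_prefix` below is a hypothetical stand-in for `mapping_select`, whose implementation is not shown on this page; it is assumed to keep only the keys that start with the given prefix and to strip that prefix from them. The keys and values are made up for illustration.

```python
from typing import Any, Mapping


def add_prefix(d: Mapping[str, Any], prefix: str) -> dict[str, Any]:
    """Flatten step: put `prefix` in front of every key."""
    return {f"{prefix}{k}": v for k, v in d.items()}


def select_prefix(d: Mapping[str, Any], prefix: str) -> dict[str, Any]:
    """Assumed behaviour of `mapping_select`: keep the keys that start
    with `prefix` and strip it from them."""
    return {k[len(prefix):]: v for k, v in d.items() if k.startswith(prefix)}


# Made-up measurements, flattened into one dictionary with prefixed keys.
flat = {
    **add_prefix({"start_vms": 100.0, "end_vms": 150.0}, "memory:"),
    **add_prefix({"start": 0.0, "end": 1.5, "kind": "wall"}, "time:"),
}

# Splitting the flat dictionary back into its two groups.
memory_part = select_prefix(flat, "memory:")
time_part = select_prefix(flat, "time:")

print(sorted(flat))
print(memory_part)
print(time_part)
```

Keeping everything in one flat dictionary is what lets each profile become a single dataframe row, with columns such as memory:start_vms and time:kind.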

def measure(*, memory_unit='B', time_kind='wall')
classmethod

Profile a block of code.

Note

  • See Memory for more information on memory.
  • See Timer for more information on timing.
PARAMETER DESCRIPTION
memory_unit

The unit of memory to use.

TYPE: Unit | Literal['B', 'KB', 'MB', 'GB'] DEFAULT: 'B'

time_kind

The type of timer to use.

TYPE: Kind | Literal['wall', 'cpu', 'process'] DEFAULT: 'wall'

YIELDS DESCRIPTION
Interval

The Profiler Interval. Memory and Timings will not be valid until the context manager is exited.

Source code in src/amltk/profiling/profiler.py
@classmethod
@contextmanager
def measure(
    cls,
    *,
    memory_unit: Memory.Unit | Literal["B", "KB", "MB", "GB"] = "B",
    time_kind: Timer.Kind | Literal["wall", "cpu", "process"] = "wall",
) -> Iterator[Profile.Interval]:
    """Profile a block of code.

    !!! note

        * See [`Memory`][amltk.profiling.Memory] for more information on memory.
        * See [`Timer`][amltk.profiling.Timer] for more information on timing.

    Args:
        memory_unit: The unit of memory to use.
        time_kind: The type of timer to use.

    Yields:
        The Profiler Interval. Memory and Timings will not be valid until
        the context manager is exited.
    """
    with Memory.measure(unit=memory_unit) as memory, Timer.time(
        kind=time_kind,
    ) as timer:
        yield Profile.Interval(memory=memory, time=timer)
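
The note that measurements are not valid until the context manager exits follows from how such a context manager is built: the object is handed out first and completed afterwards. Below is a minimal, time-only sketch of that pattern. `TimeInterval` and this `measure` function are hypothetical names, and the NaN placeholder is an assumption of the sketch; what the real Memory and Timer objects hold before exit is not shown on this page.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator


@dataclass
class TimeInterval:
    """Hypothetical stand-in for an interval; `end` is NaN until exit."""

    start: float = float("nan")
    end: float = float("nan")

    @property
    def duration(self) -> float:
        return self.end - self.start


@contextmanager
def measure() -> Iterator[TimeInterval]:
    interval = TimeInterval(start=time.perf_counter())
    try:
        yield interval  # the caller holds the object while the block runs
    finally:
        interval.end = time.perf_counter()  # only filled in on exit


with measure() as interval:
    inside = interval.duration  # NaN here: the end time is not recorded yet
    sum(range(100_000))

print(inside != inside)        # NaN is never equal to itself
print(interval.duration >= 0)  # valid once the block has exited
```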

def start(memory_unit='B', time_kind='wall')
classmethod

Start a profile, which tracks both memory and time.

Note

  • See Memory for more information on memory.
  • See Timer for more information on timing.
PARAMETER DESCRIPTION
memory_unit

The unit of memory to use.

TYPE: Unit | Literal['B', 'KB', 'MB', 'GB'] DEFAULT: 'B'

time_kind

The type of timer to use.

TYPE: Kind | Literal['wall', 'cpu', 'process'] DEFAULT: 'wall'

RETURNS DESCRIPTION
Profile

The started Profile.

Source code in src/amltk/profiling/profiler.py
@classmethod
def start(
    cls,
    memory_unit: Memory.Unit | Literal["B", "KB", "MB", "GB"] = "B",
    time_kind: Timer.Kind | Literal["wall", "cpu", "process"] = "wall",
) -> Profile:
    """Start a memory tracker.

    !!! note

        * See [`Memory`][amltk.profiling.Memory] for more information on memory.
        * See [`Timer`][amltk.profiling.Timer] for more information on timing.

    Args:
        memory_unit: The unit of memory to use.
        time_kind: The type of timer to use.

    Returns:
        The Memory tracker.
    """
    return Profile(
        timer=Timer.start(kind=time_kind),
        memory=Memory.start(unit=memory_unit),
    )

def stop()

Stop the profile.

RETURNS DESCRIPTION
Interval

The profile interval, holding both the memory and time measurements.

Source code in src/amltk/profiling/profiler.py
def stop(self) -> Profile.Interval:
    """Stop the memory tracker.

    Returns:
        The memory interval.
    """
    return Profile.Interval(
        memory=self.memory.stop(),
        time=self.timer.stop(),
    )
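
An explicit start()/stop() pair can be handy when the region you want to measure does not fit into a single with block, for example when it begins in one function and ends in another. The sketch below shows the same shape with a hypothetical, time-only `Stopwatch` class; it is an illustration of the pattern, not the library's implementation.

```python
import time
from dataclasses import dataclass


@dataclass
class Stopwatch:
    """Hypothetical tracker following the same start()/stop() shape."""

    started_at: float

    @classmethod
    def start(cls) -> "Stopwatch":
        """Create a running tracker."""
        return cls(started_at=time.perf_counter())

    def stop(self) -> float:
        """Return the seconds elapsed since `start()`."""
        return time.perf_counter() - self.started_at


# Start in one place ...
watch = Stopwatch.start()
total = sum(i * i for i in range(50_000))
# ... and stop somewhere else, e.g. in a callback or another function.
elapsed = watch.stop()

print(f"elapsed: {elapsed:.6f}s")
```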

def na()
classmethod

Create a profile interval that represents NA.

Source code in src/amltk/profiling/profiler.py
@classmethod
def na(cls) -> Profile.Interval:
    """Create a profile interval that represents NA."""
    return Profile.Interval(memory=Memory.na(), time=Timer.na())

class Profiler
dataclass

Bases: Mapping[str, Interval]

Profile and record various events.

Note

  • See Memory for more information on memory.
  • See Timer for more information on timing.
PARAMETER DESCRIPTION
memory_unit

The default unit of memory to use.

TYPE: Unit | Literal['B', 'KB', 'MB', 'GB'] DEFAULT: 'B'

time_kind

The default type of timer to use.

TYPE: Kind | Literal['wall', 'cpu', 'process'] DEFAULT: 'wall'

def __getitem__(key)

Get a profile interval.

Source code in src/amltk/profiling/profiler.py
@override
def __getitem__(self, key: str) -> Profile.Interval:
    """Get a profile interval."""
    return self.profiles[key]

def __iter__()

Iterate over the profile names.

Source code in src/amltk/profiling/profiler.py
@override
def __iter__(self) -> Iterator[str]:
    """Iterate over the profile names."""
    return iter(self.profiles)

def __len__()

Get the number of profiles.

Source code in src/amltk/profiling/profiler.py
@override
def __len__(self) -> int:
    """Get the number of profiles."""
    return len(self.profiles)
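
Because the class derives from Mapping, these three methods are enough to obtain the rest of the read-only dictionary interface. The sketch below demonstrates this with a hypothetical `Records` class built on `collections.abc.Mapping`; the stored durations are made up.

```python
from collections.abc import Mapping
from typing import Iterator


class Records(Mapping):
    """Hypothetical container exposing a dict through the Mapping interface."""

    def __init__(self) -> None:
        self.profiles: dict[str, float] = {}

    def __getitem__(self, key: str) -> float:
        return self.profiles[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self.profiles)

    def __len__(self) -> int:
        return len(self.profiles)


records = Records()
records.profiles["loading-data"] = 0.12   # made-up durations
records.profiles["training-model"] = 3.40

# Membership tests, keys(), items() and get() are all provided by Mapping.
print("loading-data" in records)
print(list(records.keys()))
print(dict(records.items()))
print(records.get("missing", "n/a"))
```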

def disable()

Disable the profiler.

Source code in src/amltk/profiling/profiler.py
def disable(self) -> None:
    """Disable the profiler."""
    self.disabled = True

def enable()

Enable the profiler.

Source code in src/amltk/profiling/profiler.py
def enable(self) -> None:
    """Enable the profiler."""
    self.disabled = False

def each(itr, *, name, itr_name=None)

Profile each item in an iterable.

PARAMETER DESCRIPTION
itr

The iterable to profile.

TYPE: Iterable[T]

name

The name of the profile that lasts until iteration is complete.

TYPE: str

itr_name

The name of the profile for each iteration. If a function is provided, it will be called with each item's index and the item. It should return a string. If None is provided, just the index will be used.

TYPE: Callable[[int, T], str] | None DEFAULT: None

YIELDS DESCRIPTION
T

The items of the iterable.

Source code in src/amltk/profiling/profiler.py
def each(
    self,
    itr: Iterable[T],
    *,
    name: str,
    itr_name: Callable[[int, T], str] | None = None,
) -> Iterator[T]:
    """Profile each item in an iterable.

    Args:
        itr: The iterable to profile.
        name: The name of the profile that lasts until iteration is complete
        itr_name: The name of the profile for each iteration.
            If a function is provided, it will be called with each item's index
            and the item. It should return a string. If `None` is provided,
            just the index will be used.

    Yields:
        The items
    """
    if itr_name is None:
        itr_name = lambda i, _: str(i)
    with self.measure(name=name):
        for i, item in enumerate(itr):
            with self.measure(name=itr_name(i, item)):
                yield item
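
The generator pattern used by each() can be tried out without the library. The sketch below is a simplified, time-only imitation: the module-level `durations` dictionary and this standalone `each` function are hypothetical, and entries are recorded when a step finishes, which may differ from the order in which the real Profiler stores them.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")
durations: dict[str, float] = {}  # hypothetical store: name -> seconds


def each(
    itr: Iterable[T],
    *,
    name: str,
    itr_name: Optional[Callable[[int, T], str]] = None,
) -> Iterator[T]:
    """Time the whole loop under `name` and each step under `name:<step>`."""
    if itr_name is None:
        itr_name = lambda i, _: str(i)
    loop_start = time.perf_counter()
    for i, item in enumerate(itr):
        step_start = time.perf_counter()
        yield item  # the caller's loop body runs while we are suspended here
        durations[f"{name}:{itr_name(i, item)}"] = time.perf_counter() - step_start
    durations[name] = time.perf_counter() - loop_start


for batch in each(["a", "b"], name="epoch", itr_name=lambda i, b: f"batch-{b}"):
    time.sleep(0.001)  # stand-in for real work on the batch

print(list(durations))
```

Passing itr_name lets you label iterations by something more meaningful than their index, such as a batch or split identifier. In this sketch, breaking out of the loop early means the interrupted step and the overall loop are never recorded.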

def __call__(name, *, memory_unit=None, time_kind=None)

Profile a block of code. Store the result on this object.

Note

  • See Memory for more information on memory.
  • See Timer for more information on timing.
PARAMETER DESCRIPTION
name

The name of the profile.

TYPE: str

memory_unit

The unit of memory to use. Overwrites the default.

TYPE: Unit | Literal['B', 'KB', 'MB', 'GB'] | None DEFAULT: None

time_kind

The type of timer to use. Overwrites the default.

TYPE: Kind | Literal['wall', 'cpu', 'process'] | None DEFAULT: None

Source code in src/amltk/profiling/profiler.py
@contextmanager
def __call__(
    self,
    name: str,
    *,
    memory_unit: Memory.Unit | Literal["B", "KB", "MB", "GB"] | None = None,
    time_kind: Timer.Kind | Literal["wall", "cpu", "process"] | None = None,
) -> Iterator[None]:
    """::: amltk.profiling.Profiler.measure"""  # noqa: D415
    with self.measure(name, memory_unit=memory_unit, time_kind=time_kind):
        yield

def measure(name, *, memory_unit=None, time_kind=None)

Profile a block of code. Store the result on this object.

Note

  • See Memory for more information on memory.
  • See Timer for more information on timing.
PARAMETER DESCRIPTION
name

The name of the profile.

TYPE: str

memory_unit

The unit of memory to use. Overwrites the default.

TYPE: Unit | Literal['B', 'KB', 'MB', 'GB'] | None DEFAULT: None

time_kind

The type of timer to use. Overwrites the default.

TYPE: Kind | Literal['wall', 'cpu', 'process'] | None DEFAULT: None

Source code in src/amltk/profiling/profiler.py
@contextmanager
def measure(
    self,
    name: str,
    *,
    memory_unit: Memory.Unit | Literal["B", "KB", "MB", "GB"] | None = None,
    time_kind: Timer.Kind | Literal["wall", "cpu", "process"] | None = None,
) -> Iterator[None]:
    """Profile a block of code. Store the result on this object.

    !!! note

        * See [`Memory`][amltk.profiling.Memory] for more information on memory.
        * See [`Timer`][amltk.profiling.Timer] for more information on timing.

    Args:
        name: The name of the profile.
        memory_unit: The unit of memory to use. Overwrites the default.
        time_kind: The type of timer to use. Overwrites the default.
    """
    if self.disabled:
        yield
        return

    memory_unit = memory_unit or self.memory_unit
    time_kind = time_kind or self.time_kind

    self._running.append(name)
    entry_name = ":".join(self._running)

    with Profile.measure(memory_unit=memory_unit, time_kind=time_kind) as profile:
        self.profiles[entry_name] = profile
        yield

    self._running.pop()
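
Two behaviours of measure() are easy to miss when reading the source: nested blocks are stored under names joined with ":", and a disabled profiler still runs the block but records nothing. The sketch below imitates just that logic with a hypothetical, time-only `TinyProfiler`; it is not the library's class.

```python
import time
from contextlib import contextmanager
from typing import Iterator


class TinyProfiler:
    """Hypothetical, time-only imitation of the naming and disable logic."""

    def __init__(self) -> None:
        self.disabled = False
        self.durations: dict[str, float] = {}
        self._running: list[str] = []

    @contextmanager
    def measure(self, name: str) -> Iterator[None]:
        if self.disabled:
            yield  # run the block, record nothing
            return
        self._running.append(name)
        entry_name = ":".join(self._running)  # e.g. "fit:preprocess"
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations[entry_name] = time.perf_counter() - start
            self._running.pop()


profiler = TinyProfiler()

with profiler.measure("fit"):
    with profiler.measure("preprocess"):  # stored as "fit:preprocess"
        sum(range(10_000))

profiler.disabled = True
with profiler.measure("ignored"):  # executed, but not recorded
    sum(range(10_000))

print(sorted(profiler.durations))
```

The try/finally is an addition of this sketch, so that the name stack is restored even if the block raises; the source shown above pops the name after the with block instead.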

def df()

Convert the profiler to a dataframe.

Source code in src/amltk/profiling/profiler.py
def df(self) -> pd.DataFrame:
    """Convert the profiler to a dataframe."""
    return pd.DataFrame.from_dict(
        {k: v.to_dict() for k, v in self.profiles.items()},
        orient="index",
    )

def __rich__()

Render the profiler.

Source code in src/amltk/profiling/profiler.py
def __rich__(self) -> RenderableType:
    """Render the profiler."""
    from amltk._richutil import df_to_table

    _df = self.df()
    return df_to_table(_df, title="Profiler", index_style="bold")