Path bucket
A module containing a concrete implementation of a Bucket that uses the Path API to store objects.
class PathBucket(path, *, loaders=None, create=True, clean=False, exists_ok=True)
A bucket that uses the Path API to store objects.
This bucket is a key-value lookup backed by a filesystem.
Assigning to the bucket stores the object on the filesystem.
The values you get back, however, are Drop objects that can be used to perform operations on the stored object, such as load, get and remove.
Drop methods
- Drop.load - Load the object from the bucket.
- Drop.get - Load the object from the bucket with a default if something fails.
- Drop.put - Store an object in the bucket.
- Drop.remove - Remove the object from the bucket.
- Drop.exists - Check if the object exists in the bucket.
from amltk.store.paths import PathBucket
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
bucket = PathBucket("path/to/bucket")
array = np.array([1, 2, 3])
dataframe = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
model = LinearRegression()
# Store things
bucket["myarray.npy"] = array # (1)!
bucket["df.csv"] = dataframe # (2)!
bucket["model.pkl"].put(model)
bucket["config.json"] = {"hello": "world"}
assert bucket["config.json"].exists()
bucket["config.json"].remove()
# Load things
array = bucket["myarray.npy"].load()
maybe_df = bucket["df.csv"].get() # (3)!
model: LinearRegression = bucket["model.pkl"].get(check=LinearRegression) # (4)!
# Create subdirectories
model_bucket = bucket / "my_model" # (5)!
model_bucket["model.pkl"] = model
# Assumes `model` has been fitted and `X` is a feature matrix defined elsewhere
model_bucket["predictions.npy"] = model.predict(X)
# Acts like a mapping
assert "myarray.npy" in bucket
assert len(bucket) == 3
for key, item in bucket.items():
    print(key, item.load())
del bucket["model.pkl"]
1. The = is a shortcut for bucket["myarray.npy"].put(array).
2. The extension is used to determine which PathLoader to use and how to save it.
3. The get method acts like the dict.get method: it returns a default if loading fails.
4. The get method can be used to check the type of the loaded object. If the type does not match, a TypeError is raised.
5. Uses the familiar Path API to create subdirectories.
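The extension-based dispatch and the get behaviour described in the notes above can be sketched with the standard library alone. This is a minimal illustration, not amltk's implementation: `LOADERS`, `SketchDrop`, and the set of exceptions treated as a "failed load" are all hypothetical.

```python
from __future__ import annotations

import json
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable

# Hypothetical registry: file extension -> (save, load) functions.
# amltk's real PathLoader classes are not reproduced here.
LOADERS: dict[str, tuple[Callable, Callable]] = {
    ".json": (
        lambda obj, p: p.write_text(json.dumps(obj)),
        lambda p: json.loads(p.read_text()),
    ),
    ".pkl": (
        lambda obj, p: p.write_bytes(pickle.dumps(obj)),
        lambda p: pickle.loads(p.read_bytes()),
    ),
}


class SketchDrop:
    """A handle to one file; nothing is read until load() or get() is called."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def put(self, obj: Any) -> None:
        save, _ = LOADERS[self.path.suffix]  # the extension picks the loader
        save(obj, self.path)

    def load(self, check: type | None = None) -> Any:
        _, load = LOADERS[self.path.suffix]
        obj = load(self.path)
        if check is not None and not isinstance(obj, check):
            raise TypeError(f"Expected {check.__name__}, got {type(obj).__name__}")
        return obj

    def get(self, default: Any = None, check: type | None = None) -> Any:
        # Like dict.get: fall back to the default when the file is missing or
        # unreadable. A failed type check still raises TypeError.
        try:
            return self.load(check=check)
        except (OSError, KeyError, ValueError, pickle.UnpicklingError):
            return default


with tempfile.TemporaryDirectory() as tmp:
    drop = SketchDrop(Path(tmp) / "config.json")
    missing = drop.get(default={})  # the file does not exist yet
    drop.put({"hello": "world"})
    loaded = drop.load(check=dict)
    try:
        drop.get(check=list)
        type_error_raised = False
    except TypeError:
        type_error_raised = True
```

Returning a handle rather than the loaded object keeps lookups cheap: indexing never touches the file, and the caller decides whether a failure should raise (load) or fall back to a default (get).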
PARAMETER | DESCRIPTION |
---|---|
path | The path to the bucket. |
loaders | A sequence of loaders to use when loading objects. These are prepended to the default loaders and tried first. |
create | If True, the base path will be created if it does not exist. |
clean | If True, the base path will be deleted if it exists. |
exists_ok | If False, an error will be raised if the base path already exists. |
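One way the create, clean and exists_ok flags could interact is sketched below. The helper `prepare_base_path` is hypothetical, and both the order of the checks and the use of `FileExistsError` are assumptions; the documentation above only says that "an error" is raised.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_base_path(path, *, create=True, clean=False, exists_ok=True):
    """Hypothetical helper mirroring the three flags; not amltk code."""
    path = Path(path)
    # Assumed order: clean first, then the existence check, then creation.
    if clean and path.exists():
        shutil.rmtree(path)
    if not exists_ok and path.exists():
        raise FileExistsError(f"{path} already exists")
    if create:
        path.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "bucket"
    prepare_base_path(base)  # creates the directory
    (base / "old.txt").write_text("stale")

    prepare_base_path(base, clean=True)  # wipes and recreates it
    was_cleaned = not (base / "old.txt").exists()

    try:
        prepare_base_path(base, exists_ok=False)
        refused = False
    except FileExistsError:
        refused = True
```

Under this assumed ordering, combining clean=True with exists_ok=False would never raise, because the path is removed before the existence check runs.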
Source code in src/amltk/store/paths/path_bucket.py
def sizes()
Get the sizes of all the files in the bucket.
Files only
This method only returns the sizes of the files in the bucket. It does not include directories, their sizes, or their contents.
RETURNS | DESCRIPTION |
---|---|
dict[str, int] | A dictionary mapping the keys to the sizes of the files. |
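The "files only" behaviour can be illustrated with plain pathlib. This is a sketch under assumptions, not the library's code: `file_sizes` is a hypothetical function, and reporting sizes in bytes (what `st_size` gives) is an assumption about the units.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def file_sizes(directory: Path) -> dict[str, int]:
    """Map each file name directly inside `directory` to its size in bytes."""
    # Directories are skipped entirely, so nothing nested is counted.
    return {p.name: p.stat().st_size for p in directory.iterdir() if p.is_file()}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_bytes(b"12345")
    (root / "nested").mkdir()
    (root / "nested" / "b.txt").write_bytes(b"ignored")
    sizes = file_sizes(root)
```

Here `sizes` contains an entry for `a.txt` only; the `nested` directory and the file inside it are left out.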
def add_loader(loader)
Add a loader to the bucket.
PARAMETER | DESCRIPTION |
---|---|
loader | The loader to add. |
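A rough sketch of how adding a loader might change which loader handles a file. `SketchLoader` and `LoaderRegistry` are hypothetical, and giving the newly added loader priority is an assumption modelled on the constructor's documented behaviour of trying user-supplied loaders first.

```python
from typing import NamedTuple


class SketchLoader(NamedTuple):
    """Hypothetical stand-in for a loader: a name and the suffix it handles."""

    name: str
    suffix: str


class LoaderRegistry:
    def __init__(self, defaults):
        self._loaders = list(defaults)

    def add_loader(self, loader):
        # Assumption: the newest loader goes to the front and is tried first.
        self._loaders.insert(0, loader)

    def pick(self, filename):
        # The first loader whose suffix matches wins.
        for loader in self._loaders:
            if filename.endswith(loader.suffix):
                return loader
        raise ValueError(f"No loader registered for {filename!r}")


registry = LoaderRegistry([SketchLoader("default-json", ".json")])
before = registry.pick("config.json").name

registry.add_loader(SketchLoader("custom-json", ".json"))
after = registry.pick("config.json").name
```

With first-match-wins ordering, a custom loader can override a default for the same extension without the default having to be removed.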
def sub(key, *, create=None)
Create a subdirectory of the bucket.
PARAMETER | DESCRIPTION |
---|---|
key | The name of the subdirectory. |
create | Whether the subdirectory will be created if it does not exist. If None, the default, the value of |
RETURNS | DESCRIPTION |
---|---|
Self | A new bucket with the same loaders as the current bucket. |
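The relationship between sub and the `/` operator used earlier can be sketched as follows. `SketchBucket` is a hypothetical class, not the library's implementation, and create is treated as a plain boolean here; the None fallback is not modelled.

```python
import tempfile
from pathlib import Path


class SketchBucket:
    """Hypothetical bucket holding only a base path and a list of loaders."""

    def __init__(self, path, loaders=()):
        self.path = Path(path)
        self.loaders = list(loaders)

    def sub(self, key, *, create=True):
        sub_path = self.path / key
        if create:
            sub_path.mkdir(parents=True, exist_ok=True)
        # The child is the same type and carries the same loaders as the parent.
        return type(self)(sub_path, loaders=self.loaders)

    def __truediv__(self, key):
        # `bucket / "name"` is shorthand for `bucket.sub("name")`.
        return self.sub(key)


with tempfile.TemporaryDirectory() as tmp:
    bucket = SketchBucket(tmp, loaders=["json-loader"])
    child = bucket / "my_model"
    child_exists = child.path.is_dir()
    same_loaders = child.loaders == bucket.loaders
    child_name = child.path.name
```

Returning `type(self)` rather than a fixed class matches the documented `Self` return type, so subclasses get sub-buckets of their own type.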