arlbench.core.algorithms.prioritised_item_buffer

Prioritised replay buffer.

Functions

create_prioritised_item_buffer(max_length, ...)

Creates a prioritised trajectory buffer that acts as an independent item buffer.

make_prioritised_item_buffer(max_length, ...)

Makes a prioritised trajectory buffer act as a independent item buffer.

arlbench.core.algorithms.prioritised_item_buffer.create_prioritised_item_buffer(max_length, min_length, sample_batch_size, add_sequences, add_batches, priority_exponent, device)[source]

Creates a prioritised trajectory buffer that acts as an independent item buffer.

Parameters:
  • max_length (int) – The maximum length of the buffer.

  • min_length (int) – The minimum length of the buffer.

  • sample_batch_size (int) – The batch size of the samples.

  • add_sequences (Optional[bool], optional) – Whether data is being added in sequences to the buffer. If False, single items are being added each time add is called. Defaults to False.

  • add_batches (bool) – (Optional[bool], optional): Whether adding data in batches to the buffer. If False, single items (or single sequences of items) are being added each time add is called. Defaults to False.

  • priority_exponent (float) – Priority exponent for sampling. Equivalent to alpha in the PER paper.

  • device (str) – “tpu”, “gpu” or “cpu”. Depending on chosen device, more optimal functions will be used to perform the buffer operations.

Return type:

PrioritisedTrajectoryBuffer

Returns: The buffer.

arlbench.core.algorithms.prioritised_item_buffer.make_prioritised_item_buffer(max_length, min_length, sample_batch_size, add_sequences=False, add_batches=False, priority_exponent=0.6, device='cpu')[source]

Makes a prioritised trajectory buffer act as a independent item buffer.

Parameters:
  • max_length (int) – The maximum length of the buffer.

  • min_length (int) – The minimum length of the buffer.

  • sample_batch_size (int) – The batch size of the samples.

  • add_sequences (Optional[bool], optional) – Whether data is being added in sequences to the buffer. If False, single items are being added each time add is called. Defaults to False.

  • add_batches (bool) – (Optional[bool], optional): Whether adding data in batches to the buffer. If False, single transitions or single sequences are being added each time add is called. Defaults to False.

  • priority_exponent (float) – Priority exponent for sampling. Equivalent to alpha in the PER paper.

  • device (str) – “tpu”, “gpu” or “cpu”. Depending on chosen device, more optimal functions will be used to perform the buffer operations.

Return type:

PrioritisedTrajectoryBuffer

Returns: The buffer.