Bases: object

These helpers yield groups of DocumentArray from a source DocumentArray or DocumentArrayMemmap.


Split the DocumentArray into multiple DocumentArrays according to the tag value of each Document.


tag (str) – the name of the tag, stored in each Document's tags, to split on.

Return type

Dict[Any, ForwardRef]


a dict in which Documents with the same value for tag are grouped together. Within each group, Documents keep the order they had in the original DocumentArray.


If the tags of the Documents do not contain the specified tag, an empty dict is returned.
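The grouping described above can be sketched in plain Python. This is only an illustration of the documented behavior, not the library's implementation: Documents are modeled as ordinary dicts with a `tags` key, and `split_by_tag` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Any, Dict, List


def split_by_tag(docs: List[dict], tag: str) -> Dict[Any, List[dict]]:
    """Group docs by the value stored under doc['tags'][tag].

    Hypothetical stand-in for the documented split: docs are plain dicts,
    not real Document objects.
    """
    groups: Dict[Any, List[dict]] = defaultdict(list)
    for doc in docs:  # iterating in order preserves the original ordering
        tags = doc.get('tags', {})
        if tag in tags:  # docs without the tag are skipped in this sketch
            groups[tags[tag]].append(doc)
    return dict(groups)


docs = [
    {'id': 'a', 'tags': {'category': 'c'}},
    {'id': 'b', 'tags': {'category': 'c'}},
    {'id': 'c', 'tags': {'category': 'b'}},
    {'id': 'd', 'tags': {'category': 'c'}},
]

groups = split_by_tag(docs, 'category')
ids_per_group = {key: [d['id'] for d in group] for key, group in groups.items()}
missing = split_by_tag(docs, 'no_such_tag')  # no doc carries this tag
```

Here `ids_per_group` maps each tag value to the ids of its Documents in their original order, and `missing` is an empty dict because no Document carries the requested tag.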

batch(batch_size, shuffle=False)[source]

Creates a Generator that yields DocumentArrays of size batch_size until docs is fully traversed along the traversal_path. None docs are filtered out, and the docs can optionally be filtered by checking for the existence of a Document attribute. Note that the last batch might be smaller than batch_size.

  • batch_size (int) – Size of each generated batch (except the last one, which might be smaller, default: 32)

  • shuffle (bool) – If set, shuffle the Documents before dividing into minibatches.


a Generator of DocumentArrays, each of length batch_size except possibly the last, which may be smaller

Return type

Generator[ForwardRef, None, None]
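The batching behavior can be illustrated with a small generator over a plain list. This is a minimal sketch under stated assumptions, not the library's code: `batch_items` is a hypothetical name, the input is an ordinary list rather than a DocumentArray, and the `seed` parameter is added here only to make the shuffle reproducible.

```python
import random
from typing import Iterator, List, Optional, Sequence


def batch_items(
    items: Sequence,
    batch_size: int,
    shuffle: bool = False,
    seed: Optional[int] = None,  # hypothetical, not part of the documented signature
) -> Iterator[List]:
    """Yield consecutive chunks of at most batch_size items."""
    indices = list(range(len(items)))
    if shuffle:
        # Shuffle once up front, then cut the shuffled order into batches.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [items[i] for i in indices[start:start + batch_size]]


items = list(range(10))
batches = list(batch_items(items, batch_size=4))
shuffled = list(batch_items(items, batch_size=4, shuffle=True, seed=0))
```

With ten items and a batch size of four, the generator yields two full batches and one final batch of two. Shuffling changes which items land in which batch, but every item still appears exactly once.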