jina.types.arrays.memmap module

class jina.types.arrays.memmap.DocumentArrayMemmap(path, key_length=36)

Bases: jina.types.arrays.traversable.TraversableSequence, jina.types.arrays.document.DocumentArrayGetAttrMixin, jina.types.arrays.search_ops.DocumentArraySearchOpsMixin, collections.abc.Iterable

Create a memory map to a DocumentArray stored in binary files on disk.

Memory-mapped files give access to individual Documents of a large on-disk DocumentArray without reading the entire file into memory.

The DocumentArrayMemmap on-disk storage consists of two files:
  • header.bin: stores id, offset, length and boundary info of each Document in body.bin;

  • body.bin: stores the Documents contiguously.

Loading a DocumentArrayMemmap reads only the content of header.bin into memory; all body.bin data stays on disk. Because header.bin is often much smaller than body.bin, this saves memory.
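The two-file layout described above can be sketched with the standard library alone. This is a minimal illustration, not Jina's actual on-disk format: the record layout (HEADER_FMT), the helper names (write_store, load_header, read_doc) and the raw-bytes payloads are all assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size header record: 36-byte id, 8-byte offset, 8-byte length.
HEADER_FMT = "<36sQQ"
HEADER_SIZE = struct.calcsize(HEADER_FMT)


def write_store(folder, docs):
    """Write {id: payload bytes} as a header.bin / body.bin pair."""
    header_path = os.path.join(folder, "header.bin")
    body_path = os.path.join(folder, "body.bin")
    with open(header_path, "wb") as header, open(body_path, "wb") as body:
        for doc_id, payload in docs.items():
            offset = body.tell()  # where this payload starts in body.bin
            body.write(payload)
            header.write(struct.pack(HEADER_FMT, doc_id.encode(), offset, len(payload)))


def load_header(folder):
    """Read only header.bin into memory and return {id: (offset, length)}."""
    index = {}
    with open(os.path.join(folder, "header.bin"), "rb") as header:
        while True:
            chunk = header.read(HEADER_SIZE)
            if not chunk:
                break
            raw_id, offset, length = struct.unpack(HEADER_FMT, chunk)
            index[raw_id.rstrip(b"\x00").decode()] = (offset, length)
    return index


def read_doc(folder, index, doc_id):
    """Fetch one payload through a memory map, without reading the whole body."""
    offset, length = index[doc_id]
    with open(os.path.join(folder, "body.bin"), "rb") as body:
        with mmap.mmap(body.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


folder = tempfile.mkdtemp()
write_store(folder, {"doc-a": b"hello", "doc-b": b"memory-mapped world"})
index = load_header(folder)
doc_b = read_doc(folder, index, "doc-b")
```

Only the small index lives in memory; a lookup slices the requested byte range out of the mapped body file.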

This class is designed to work similarly to DocumentArray but differs in the following aspects:

To convert between a DocumentArrayMemmap and a DocumentArray:

from jina import DocumentArray
from jina.types.arrays.memmap import DocumentArrayMemmap

# convert from DocumentArrayMemmap to DocumentArray
dam = DocumentArrayMemmap('./tmp')
...

da = DocumentArray(dam)

# convert from DocumentArray to DocumentArrayMemmap
dam2 = DocumentArrayMemmap('./tmp')
dam2.extend(da)

reload()

Reload header of this object from the disk.

This function is useful when another thread or process has modified the on-disk storage and the change is not yet reflected in this DocumentArrayMemmap object.

This function only reloads the header, not the body.
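The stale-header situation can be illustrated with a toy reader. The sketch below does not use Jina; HeaderView, append_entry and the JSON-lines header file are hypothetical stand-ins chosen only to show why a cached header needs an explicit reload.

```python
import json
import os
import tempfile


class HeaderView:
    """Toy reader that caches a header file in memory (not Jina's class)."""

    def __init__(self, header_path):
        self.header_path = header_path
        self.ids = []
        self.reload()

    def reload(self):
        """Re-read the header from disk, replacing the cached copy."""
        if os.path.exists(self.header_path):
            with open(self.header_path) as fp:
                self.ids = [json.loads(line)["id"] for line in fp]
        else:
            self.ids = []


def append_entry(header_path, doc_id):
    """Stand-in for a second thread or process appending a header entry."""
    with open(header_path, "a") as fp:
        fp.write(json.dumps({"id": doc_id}) + "\n")


header_path = os.path.join(tempfile.mkdtemp(), "header.jsonl")
append_entry(header_path, "doc-1")

view = HeaderView(header_path)
append_entry(header_path, "doc-2")  # written after the view was loaded

stale_ids = list(view.ids)  # cached copy, taken before the second write
view.reload()
fresh_ids = list(view.ids)  # cached copy after re-reading the header
```

Until reload is called, the view only knows about entries that existed when it was constructed.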

extend(values)

Extend the DocumentArrayMemmap by appending all the items from the iterable.

Parameters

values (Iterable[Document]) – the iterable of Documents to extend this array with

Return type

None

clear()

Clear the on-disk data of the DocumentArrayMemmap.

Return type

None

append(doc, flush=True)

Append doc to the DocumentArrayMemmap.

Parameters
  • doc (Document) – The Document to append.

  • flush (bool) – If True, flush to disk once the append is done.

Return type

None
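What flushing means for a buffered file can be seen with plain Python file objects. This sketch makes no claim about how append is implemented internally; the file name and payload are made up, and the example only shows that a small write may sit in a user-space buffer until flush is called.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "body.bin")

with open(path, "ab") as body:
    body.write(b"doc-1")  # small write, normally held in Python's buffer
    size_before_flush = os.path.getsize(path)
    body.flush()  # hand the buffered bytes to the operating system
    size_after_flush = os.path.getsize(path)
```

With the default buffer size, size_before_flush is usually 0 and size_after_flush is 5. Skipping the flush on every append and flushing once at the end trades immediate visibility for fewer write calls.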

prune()

Prune deleted Documents from this object, which yields smaller on-disk storage.

Return type

None
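Why pruning shrinks the storage can be sketched as a compaction pass over a body file. This is an assumption-laden illustration rather than Jina's algorithm: compact, the {id: (offset, length)} index and the sample bytes are hypothetical, and deleted records are modeled as byte ranges that no index entry points to.

```python
import os
import tempfile


def compact(body_path, index):
    """Rewrite body_path keeping only the records listed in index.

    index maps id -> (offset, length); bytes it does not reference are dropped.
    Returns a new index with offsets into the rewritten file.
    """
    tmp_path = body_path + ".tmp"
    new_index = {}
    with open(body_path, "rb") as src, open(tmp_path, "wb") as dst:
        for doc_id, (offset, length) in index.items():
            src.seek(offset)
            new_index[doc_id] = (dst.tell(), length)
            dst.write(src.read(length))
    os.replace(tmp_path, body_path)  # swap the compacted file into place
    return new_index


body_path = os.path.join(tempfile.mkdtemp(), "body.bin")
with open(body_path, "wb") as fp:
    fp.write(b"aaaa" + b"bbbbbb" + b"cc")

# Record "b" was deleted: its index entry is gone, but its 6 bytes remain on disk.
live = {"a": (0, 4), "c": (10, 2)}
size_before = os.path.getsize(body_path)
live = compact(body_path, live)
size_after = os.path.getsize(body_path)
```

After compaction the surviving records sit back to back and the unreferenced bytes are gone, so the file is smaller.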