- class jina.types.arrays.mixins.text.TextToolsMixin
Helper functions used in NLP for DocumentArray (DA) and DocumentArrayMemmap (DAM).
- get_vocabulary(min_freq=1, text_attrs=('text',))
Build the text vocabulary from all Documents as a dict that maps each word to its index.
- Parameters
text_attrs (Tuple[str, …]) – the textual attributes from which the vocabulary is derived.
min_freq (int) – the minimum frequency a word needs to be included in the vocabulary.
- Returns
a vocabulary as a dictionary whose keys are words and whose values are indices. Indices start at 2: 0 is reserved for padding and 1 is reserved for the unknown token.
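
The indexing scheme described above can be illustrated with a minimal standalone sketch. It does not use Jina and is not the library's implementation: `build_vocabulary` is a hypothetical helper, and both the tokenization (lowercasing, then splitting on non-word characters) and the first-seen ordering of indices are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def build_vocabulary(texts: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    """Hypothetical helper: map each sufficiently frequent word to an index >= 2."""
    counts = Counter()
    for text in texts:
        # Assumed tokenization: lowercase, then split on non-word characters.
        counts.update(re.findall(r"\w+", text.lower()))
    # Keep words seen at least min_freq times, in first-seen order (an assumption).
    kept = (word for word, freq in counts.items() if freq >= min_freq)
    # Start at 2 so that 0 (padding) and 1 (unknown token) stay reserved.
    return {word: index for index, word in enumerate(kept, start=2)}


vocab = build_vocabulary(["hello world", "hello jina", "goodbye world"], min_freq=2)
print(vocab)  # {'hello': 2, 'world': 3}
```

With min_freq=2, "jina" and "goodbye" occur only once and are dropped; when encoding text with such a vocabulary, they would typically map to the unknown-token index 1.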