Jina 101: First Things to Learn About Jina
Want a general introduction to neural search and how it’s different to regular old symbolic search? Check out our explainer blog post to learn more!
Document & Chunk
When most people think of search, they think of a bar you type words into, like Google. But search is much more than that: as well as text, you may want to search for a song, recipe, video, genetic sequence, scientific paper, or location.
In Jina, we call all of these things Documents. In short, a Document is anything you want to search for, as well as the query you search with.
Documents can be huge, though. How can we search for the right part? We do this by breaking a Document into Chunks. A Chunk is a small semantic unit of a Document, like a sentence, a 64×64 pixel image patch, or a pair of coordinates.
You can think of a Document like a chocolate bar. Documents come in different formats and with different ingredients, and you can break them into chunks any way you like. In the end, what you buy and store are the chocolate bars, and what you eat and digest are the chunks. You don’t want to swallow the whole bar, but you don’t want to grind it into powder either: by doing that, you lose the flavor (i.e. the semantics).
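A minimal sketch of sentence-level chunking in plain Python may help make this concrete. The `Document` and `Chunk` classes and the `split_into_sentences` helper are hypothetical stand-ins written for illustration, not Jina's own API, and the regex rule for where a sentence ends is an assumption chosen for simplicity.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chunk:
    """Hypothetical Chunk: a small semantic unit of a Document (here, one sentence)."""
    text: str
    offset: int  # position of the Chunk inside its parent Document


@dataclass
class Document:
    """Hypothetical Document: anything you want to search for, or the query itself."""
    text: str
    chunks: List[Chunk] = field(default_factory=list)


def split_into_sentences(doc: Document) -> Document:
    """Break a Document into sentence-level Chunks.

    Assumed rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    sentences = [p.strip() for p in re.split(r"(?<=[.!?])\s+", doc.text) if p.strip()]
    doc.chunks = [Chunk(text=s, offset=i) for i, s in enumerate(sentences)]
    return doc


doc = split_into_sentences(
    Document("A Document can be huge. Chunks are small semantic units! Which part matches?")
)
for chunk in doc.chunks:
    print(chunk.offset, chunk.text)
# 0 A Document can be huge.
# 1 Chunks are small semantic units!
# 2 Which part matches?
```

Sentences sit at the granularity the chocolate analogy asks for: smaller than the whole bar, but large enough to keep their meaning.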
YAML Config
Every part of Jina is configured with YAML files. YAML files offer customization, allowing you to change the behavior of an object without touching its code. Jina can build a very complicated object directly from a simple YAML file, or save an object into a YAML file.
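The idea of building an object from a config file, and saving it back, can be sketched with a small class registry. This is an illustration under assumptions, not Jina's YAML schema: the `class`/`with` keys, the `REGISTRY`, and the `SentenceSplitter` class are all hypothetical. To stay dependency-free, the sketch starts from the mapping that a YAML parser such as PyYAML's `yaml.safe_load` would return, rather than parsing YAML itself.

```python
from typing import Any, Dict, Type

# Hypothetical registry: maps a class name, as written in a config file,
# to the Python class that should be built.
REGISTRY: Dict[str, Type] = {}


def register(cls: Type) -> Type:
    REGISTRY[cls.__name__] = cls
    return cls


@register
class SentenceSplitter:
    """Hypothetical object whose behavior is controlled entirely by config."""

    def __init__(self, min_length: int = 1, lowercase: bool = False):
        self.min_length = min_length
        self.lowercase = lowercase


def build_from_config(config: Dict[str, Any]) -> Any:
    """Build an object from a mapping such as a YAML loader would return."""
    cls = REGISTRY[config["class"]]
    return cls(**config.get("with", {}))


def save_to_config(obj: Any) -> Dict[str, Any]:
    """Dump an object back into a YAML-ready mapping."""
    return {"class": type(obj).__name__, "with": dict(vars(obj))}


# The mapping a YAML loader might produce for a file like:
#   class: SentenceSplitter
#   with:
#     min_length: 5
#     lowercase: true
config = {"class": "SentenceSplitter", "with": {"min_length": 5, "lowercase": True}}

splitter = build_from_config(config)
print(splitter.min_length, splitter.lowercase)  # 5 True
```

Because the constructor arguments live in the config, changing `min_length` means editing the file, not the code.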
Executor
How do we break a Document down into Chunks, and what happens next? Executors do all of this hard work, and each one represents an algorithmic unit: encoding images into vectors, storing vectors on disk, ranking results, and so on. Each has a simple interface, so you can concentrate on the algorithm without getting lost in the weeds. Executors come with persistence, scheduling, chaining, grouping, and parallelization out of the box. An Executor's properties are stored in a YAML file, and the two always go hand in hand.
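The "simple interface" and "chaining" ideas can be sketched as an abstract base class with a single method. `BaseExecutor`, `apply`, and the two toy Executors below are hypothetical names for illustration; the sketch covers chaining only and leaves out persistence, scheduling, grouping, and parallelization.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class BaseExecutor(ABC):
    """Hypothetical minimal Executor: one algorithmic unit behind one method."""

    @abstractmethod
    def apply(self, data: Any) -> Any:
        """Run this Executor's algorithm on the incoming data."""


class Lowercaser(BaseExecutor):
    def apply(self, data: List[str]) -> List[str]:
        return [s.lower() for s in data]


class Deduplicator(BaseExecutor):
    def apply(self, data: List[str]) -> List[str]:
        # dict.fromkeys keeps the first occurrence of each item and preserves order
        return list(dict.fromkeys(data))


def chain(executors: List[BaseExecutor], data: Any) -> Any:
    """Chaining: feed each Executor's output into the next one."""
    for executor in executors:
        data = executor.apply(data)
    return data


print(chain([Lowercaser(), Deduplicator()], ["Jina", "jina", "Flow"]))
# ['jina', 'flow']
```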
The Executor Family
Executors are a big family, and each member focuses on one important aspect of the search system. Let’s meet them:
Crafter: for crafting/segmenting/transforming the Documents and Chunks;
Encoder: for representing a Chunk as a vector;
Indexer: for saving and retrieving vectors and key-value information from storage;
Ranker: for sorting results.
Got a new algorithm in mind? No problem, this family always welcomes new members!
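To see how the four family members cooperate, here is a toy end-to-end pipeline. Every class is a hypothetical stand-in: the Crafter splits on periods, the Encoder uses a bag-of-words count over a fixed vocabulary in place of a neural model, the Indexer keeps vectors in memory rather than on disk, and the Ranker sorts by cosine similarity. None of these choices is claimed to be how Jina's own Executors work.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


class Crafter:
    """Hypothetical Crafter: segments a text Document into sentence Chunks."""
    def craft(self, text: str) -> List[str]:
        return [s.strip() for s in text.split(".") if s.strip()]


class Encoder:
    """Hypothetical Encoder: a toy bag-of-words vector over a fixed vocabulary."""
    def __init__(self, vocabulary: List[str]):
        self.vocabulary = vocabulary

    def encode(self, chunk: str) -> List[float]:
        counts = Counter(chunk.lower().split())
        return [float(counts[word]) for word in self.vocabulary]


class Indexer:
    """Hypothetical Indexer: stores vectors plus key-value info in memory."""
    def __init__(self):
        self.store: Dict[int, Tuple[List[float], str]] = {}

    def add(self, key: int, vector: List[float], chunk: str) -> None:
        self.store[key] = (vector, chunk)

    def items(self):
        return self.store.items()


class Ranker:
    """Hypothetical Ranker: sorts stored Chunks by cosine similarity to a query."""
    @staticmethod
    def cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def rank(self, query: List[float], indexer: Indexer, top_k: int = 2):
        scored = [(self.cosine(query, vec), chunk) for _, (vec, chunk) in indexer.items()]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


vocab = ["chocolate", "bar", "search", "vector", "chunk"]
crafter, encoder, indexer, ranker = Crafter(), Encoder(vocab), Indexer(), Ranker()

doc = "A chocolate bar is a Document. A chunk is one piece of the bar. Search turns a chunk into a vector"
for key, chunk in enumerate(crafter.craft(doc)):
    indexer.add(key, encoder.encode(chunk), chunk)

for score, chunk in ranker.rank(encoder.encode("chunk vector"), indexer):
    print(f"{score:.2f}  {chunk}")
# 0.82  Search turns a chunk into a vector
# 0.50  A chunk is one piece of the bar
```

The query "chunk vector" ranks the third sentence first because it contains both query words, while the first sentence shares no words with the query and scores 0.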
Driver
Executors do all the hard work, but they’re not great at talking to each other. A Driver helps them do this by defining how an Executor responds to network requests. It translates network traffic into a format the Executor can understand, for example turning Protobuf messages into NumPy arrays.
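A Driver's job, turning network bytes into something the Executor understands and back again, can be sketched with the standard library alone. Here Python's `struct` packing stands in for Protobuf and a plain list of floats stands in for a NumPy array; `Driver`, `encode_message`, `decode_message`, and the wire layout are all hypothetical.

```python
import struct
from typing import Callable, List


def encode_message(values: List[float]) -> bytes:
    """Hypothetical wire format: a 4-byte count, then little-endian float64 values."""
    return struct.pack(f"<I{len(values)}d", len(values), *values)


def decode_message(payload: bytes) -> List[float]:
    (count,) = struct.unpack_from("<I", payload, 0)
    return list(struct.unpack_from(f"<{count}d", payload, 4))


class Driver:
    """Hypothetical Driver: translates network bytes into what the Executor expects."""

    def __init__(self, executor_fn: Callable[[List[float]], List[float]]):
        self.executor_fn = executor_fn

    def handle(self, request: bytes) -> bytes:
        data = decode_message(request)   # network format -> Executor format
        result = self.executor_fn(data)  # the Executor never sees raw bytes
        return encode_message(result)    # Executor format -> network format


def normalize(vector: List[float]) -> List[float]:
    """A toy Executor: scales a vector to unit length."""
    total = sum(v * v for v in vector) ** 0.5
    return [v / total for v in vector] if total else vector


driver = Driver(normalize)
reply = driver.handle(encode_message([3.0, 4.0]))
print(decode_message(reply))  # [0.6, 0.8]
```

Keeping the translation in the Driver means `normalize` stays a pure function with no knowledge of the network.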
Pea
All healthy families need to communicate, and the Executor clan is no different. They talk to each other via Peas.
While a Driver translates data for an Executor, a Pea wraps an Executor and lets it exchange data over a network or with other Peas. Peas can also run in Docker, which packages all dependencies and context in one place.
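A Pea can be sketched as a thread that wraps one Executor and talks to other Peas through message queues. The `Pea` class, the `STOP` sentinel, and the use of in-process `queue.Queue` objects in place of real network connections are all assumptions made to keep the example small and runnable; Docker isolation is not shown.

```python
import queue
import threading
from typing import Any, Callable

STOP = object()  # hypothetical sentinel that tells a Pea to shut down


class Pea:
    """Hypothetical Pea: wraps one Executor in its own thread and exchanges
    messages through queues, which stand in for real network sockets."""

    def __init__(self, executor_fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue):
        self.executor_fn = executor_fn
        self.inbox = inbox
        self.outbox = outbox
        self.thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self) -> None:
        while True:
            message = self.inbox.get()
            if message is STOP:
                self.outbox.put(STOP)  # forward the shutdown signal downstream
                break
            self.outbox.put(self.executor_fn(message))

    def start(self) -> "Pea":
        self.thread.start()
        return self


# Two Peas talking to each other: the first upper-cases text, the second counts words.
q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
Pea(str.upper, q_in, q_mid).start()
Pea(lambda text: len(text.split()), q_mid, q_out).start()

for text in ["hello jina", "peas talk over queues"]:
    q_in.put(text)
q_in.put(STOP)

results = []
while (item := q_out.get()) is not STOP:
    results.append(item)
print(results)  # [2, 4]
```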
Pod
So now you’ve got lots of Peas talking to each other and rolling all over the place. How can you organize them? Nature uses Pods, and so do we.
A Pod is a group of Peas with the same property, running in parallel on a local host or over the network. A Pod provides a single network interface for its Peas, making them look like one single Pea from the outside. Beyond that, a Pod adds further control, scheduling, and context management to the Peas.
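The key property of a Pod, several identical Peas hidden behind a single interface, can be sketched with a thread pool. `Pod`, `send`, and the contiguous sharding strategy are hypothetical; note that `ThreadPoolExecutor` is Python's standard `concurrent.futures` class and is unrelated to Jina's Executors.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


class Pod:
    """Hypothetical Pod: several identical replicas (Peas) behind one interface."""

    def __init__(self, executor_fn: Callable[[List], List], replicas: int = 3):
        self.executor_fn = executor_fn
        self.replicas = replicas
        self.pool = ThreadPoolExecutor(max_workers=replicas)

    def send(self, batch: List) -> List:
        """Looks like a single Pea from outside: one batch in, one result out."""
        if not batch:
            return []
        # Split the batch into contiguous shards, at most one per replica.
        size = -(-len(batch) // self.replicas)  # ceiling division
        shards = [batch[i:i + size] for i in range(0, len(batch), size)]
        # Process shards in parallel; map() yields results in submission order.
        results = self.pool.map(self.executor_fn, shards)
        return [item for shard in results for item in shard]


def square_all(values: List[int]) -> List[int]:
    return [v * v for v in values]


pod = Pod(square_all, replicas=3)
print(pod.send([1, 2, 3, 4, 5, 6, 7]))  # [1, 4, 9, 16, 25, 36, 49]
```

The caller only ever sees `pod.send`, so it cannot tell whether one replica or ten did the work.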
Flow
Now we’ve got a garden full of Pods, with each Pod full of Peas. That’s a lot to manage! Say hello to Flow! Flow is like a Pea plant. Just as a plant manages nutrient flow and growth rate for its branches, Flow manages the states and context of a group of Pods, orchestrating them to accomplish one task. Whether a Pod is remote or running in Docker, one Flow rules them all!
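Finally, a Flow can be sketched as an object that chains Pods in order and tracks the state of each one. The `Flow` class, its chainable `add` method, and the string states are hypothetical and should not be read as Jina's `Flow` API; each "Pod" is reduced to a plain function here.

```python
from typing import Any, Callable, Dict, List, Tuple


class Flow:
    """Hypothetical Flow: chains Pods and tracks each Pod's state for one task."""

    def __init__(self):
        self.pods: List[Tuple[str, Callable[[Any], Any]]] = []
        self.state: Dict[str, str] = {}

    def add(self, name: str, pod: Callable[[Any], Any]) -> "Flow":
        self.pods.append((name, pod))
        self.state[name] = "idle"
        return self  # returning self allows Flow().add(...).add(...)

    def run(self, data: Any) -> Any:
        for name, pod in self.pods:
            self.state[name] = "running"
            data = pod(data)
            self.state[name] = "done"
        return data


flow = (
    Flow()
    .add("crafter", lambda text: text.split())
    .add("encoder", lambda words: [len(w) for w in words])
    .add("ranker", lambda lengths: sorted(lengths, reverse=True))
)
print(flow.run("one flow rules them all"))  # [5, 4, 4, 3, 3]
print(flow.state)  # {'crafter': 'done', 'encoder': 'done', 'ranker': 'done'}
```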
From Micro to Macro
Jina is a happy family. You can feel the harmony when you use Jina.
You can design at the micro level and scale up to the macro level. YAMLs become algorithms, threads become processes, Pods become Flows. The patterns and logic always remain the same. This is the beauty of Jina.
✨Unleash your curiosity and happy searching! 🔍
The look and feel of this document (“Jina 101: First Things to Learn About Jina”) is copyright © Jina AI Limited. All rights reserved. Customer may not duplicate, copy, or reuse any portion of the visual design elements or concepts without express written permission from Jina AI Limited.