jina.executors.rankers

class jina.executors.rankers.BaseRanker(*args, **kwargs)

Bases: jina.executors.BaseExecutor

The base class for all Rankers.

score(*args, **kwargs)

class jina.executors.rankers.Chunk2DocRanker(*args, **kwargs)

Bases: jina.executors.rankers.BaseRanker

A Chunk2DocRanker translates the chunk-wise score (distance) to the doc-wise score.

At query time, a Chunk2DocRanker is almost always required, because in the end we want to retrieve the top-k documents for a given query document, not the top-k chunks for the given query chunks. The purpose of Chunk2DocRanker is to aggregate the existing top-k chunks into documents.

The key function here is score().

See also

jina.drivers.handlers.score

required_keys = {'text'}

a set of str: the keys whose values are extracted from the chunk-level protobuf message

COL_MATCH_PARENT_HASH = 'match_parent_hash'
COL_MATCH_HASH = 'match_hash'
COL_DOC_CHUNK_HASH = 'doc_chunk_hash'
COL_SCORE = 'score'

score(match_idx, query_chunk_meta, match_chunk_meta)

Translate the chunk-level top-k results into doc-level top-k results. Some score functions may use meta information of the query, so the meta information of both the query chunks and the matched chunks is passed as arguments.

Parameters
  • match_idx (ndarray) –

    a [N x 4] numpy ndarray, column-wise:

    • match_idx[:, 0]: doc_id of the matched chunks, integer

    • match_idx[:, 1]: chunk_id of the matched chunks, integer

    • match_idx[:, 2]: chunk_id of the query chunks, integer

    • match_idx[:, 3]: distance/metric/score between the query and matched chunks, float

  • query_chunk_meta (Dict) – the meta information of the query chunks, where each key is a query chunk’s chunk_id and each value holds the fields extracted according to required_keys.

  • match_chunk_meta (Dict) – the meta information of the matched chunks, where each key is a matched chunk’s chunk_id and each value holds the fields extracted according to required_keys.

Return type

ndarray

Returns

a [N x 2] numpy ndarray, where the first column is the matched documents’ doc_id (integer) and the second column is the score/distance/metric between the matched document and the query document (float).
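The contract of score() can be illustrated with plain NumPy. The sketch below is not Jina's implementation: the function name max_score_chunk2doc is hypothetical, and it assumes that a larger value in match_idx[:, 3] means a better match and that a document's score is the best score among its matched chunks. Because a plain ndarray has a single dtype, the doc_ids come back as floats.

```python
import numpy as np

# Column positions of match_idx, following the documented [N x 4] layout
DOC_ID, MATCH_CHUNK_ID, QUERY_CHUNK_ID, SCORE = range(4)


def max_score_chunk2doc(match_idx, query_chunk_meta=None, match_chunk_meta=None):
    """Hypothetical Chunk2DocRanker-style scoring (not the Jina API).

    A document's score is the largest chunk score among its matched chunks.
    The meta arguments mirror the documented signature but are unused here.
    Returns an [M x 2] array of (doc_id, score), one row per distinct doc_id,
    sorted by descending score.
    """
    doc_ids = match_idx[:, DOC_ID]
    unique_ids = np.unique(doc_ids)
    # Aggregate the chunk-level scores into one score per document
    doc_scores = np.array([match_idx[doc_ids == d, SCORE].max() for d in unique_ids])
    order = np.argsort(-doc_scores, kind="stable")
    return np.stack([unique_ids[order], doc_scores[order]], axis=1)


match_idx = np.array([
    [1, 10, 100, 0.2],  # doc 1, chunk 10 matched query chunk 100
    [1, 11, 100, 0.9],  # doc 1, chunk 11 matched query chunk 100
    [2, 20, 101, 0.5],  # doc 2, chunk 20 matched query chunk 101
])
print(max_score_chunk2doc(match_idx))
```

Here doc 1 ranks first with score 0.9 and doc 2 second with 0.5. If the fourth column holds a distance, where smaller is better, replace .max() with .min() and sort in ascending order instead.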

group_by_doc_id(match_idx)

Group the rows of match_idx by doc_id.

Returns

an iterator over the groups
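One way to implement such grouping in NumPy is sketched below. It assumes match_idx follows the [N x 4] layout documented for score(), with doc_id in column 0; it is an illustration, not necessarily how Jina implements group_by_doc_id.

```python
from typing import Iterator

import numpy as np


def group_by_doc_id(match_idx: np.ndarray) -> Iterator[np.ndarray]:
    """Yield one sub-array of match_idx per distinct doc_id (column 0)."""
    # A stable sort brings rows with the same doc_id together
    # while keeping their original relative order
    order = np.argsort(match_idx[:, 0], kind="stable")
    sorted_idx = match_idx[order]
    # A group boundary is wherever the doc_id changes between consecutive rows
    boundaries = np.flatnonzero(np.diff(sorted_idx[:, 0])) + 1
    yield from np.split(sorted_idx, boundaries)


match_idx = np.array([
    [2, 20, 101, 0.5],
    [1, 10, 100, 0.2],
    [1, 11, 100, 0.9],
])
for group in group_by_doc_id(match_idx):
    print(int(group[0, 0]), group.shape[0])
```

This yields a group of two rows for doc 1, followed by a group of one row for doc 2. Sorting first keeps the cost at O(N log N) rather than scanning the whole array once per doc_id.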

static sort_doc_by_score(r)

Sort a list of (doc_id, score) tuples by score.

Returns

an np.ndarray of shape [N x 2], where N is the length of the input list.
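The documentation does not state the sort direction. The sketch below assumes that higher scores are better and sorts in descending order; for distances, drop the negation. The function mirrors the documented name but is a standalone illustration, not Jina's code.

```python
from typing import List, Tuple

import numpy as np


def sort_doc_by_score(r: List[Tuple[int, float]]) -> np.ndarray:
    """Sort (doc_id, score) tuples by score, highest first (assumed order)."""
    # Build an [N x 2] array: column 0 = doc_id, column 1 = score;
    # reshape keeps the [N x 2] shape even for an empty input list
    arr = np.array(r, dtype=float).reshape(-1, 2)
    # A stable sort on the negated scores gives a descending order
    return arr[np.argsort(-arr[:, 1], kind="stable")]


print(sort_doc_by_score([(1, 0.2), (2, 0.9), (3, 0.5)]))
```

With this input, the rows come back ordered as doc 2, doc 3, doc 1.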

get_doc_id(match_with_same_doc_id)

class jina.executors.rankers.Match2DocRanker(*args, **kwargs)

Bases: jina.executors.rankers.BaseRanker

Re-scores the matches of a document. This Ranker only calculates the new scores; it does not sort the matches. Sorting is handled by the respective Matches2DocRankDriver. Possible implementations:

  • ReverseRanker (reverses the scores of all matches)

  • BucketShuffleRanker (first buckets the matches, then sorts each bucket)

COL_MATCH_HASH = 'match_hash'
COL_SCORE = 'score'

score(query_meta, old_match_scores, match_meta)

Calculate the new scores for the matches and return them.

Parameters
  • query_meta (Dict) – a dictionary containing all the query meta information requested by the required_keys class variable.

  • old_match_scores (Dict) – the old scores, in the format {match_id: score}.

  • match_meta (Dict) – a dictionary containing the meta information of all matches requested by the required_keys class variable. Format: {match_id: {attribute: attribute_value}}, e.g. {5: {"length": 3}}.

Return type

ndarray

Returns

an np.ndarray of shape [N x 2], where N is the length of old_match_scores. Each row has the form [match_id, new_score].
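As a sketch of this contract, the hypothetical function below re-scores matches using a "length" attribute of the kind shown in the match_meta format example. The penalty rule (old score divided by length) is an illustrative assumption and does not describe any documented Jina ranker. Rows are returned in input order because sorting is the driver's job.

```python
from typing import Dict

import numpy as np


def length_penalized_score(query_meta: Dict, old_match_scores: Dict[int, float],
                           match_meta: Dict[int, Dict]) -> np.ndarray:
    """Hypothetical Match2DocRanker-style re-scoring (not a Jina class).

    Each match's old score is divided by its "length" attribute, so longer
    matches are penalized. query_meta is accepted to mirror the documented
    signature but is unused. Rows are returned unsorted, in input order.
    """
    rows = []
    for match_id, old_score in old_match_scores.items():
        # Fall back to a length of 1 when the attribute is missing or < 1
        length = max(match_meta.get(match_id, {}).get("length", 1), 1)
        rows.append((match_id, old_score / length))
    # reshape keeps the [N x 2] shape even when there are no matches
    return np.array(rows, dtype=float).reshape(-1, 2)


new_scores = length_penalized_score(
    query_meta={},
    old_match_scores={5: 1.5, 7: 0.25},
    match_meta={5: {"length": 3}, 7: {"length": 1}},
)
print(new_scores)
```

Match 5 drops from 1.5 to 0.5 because of its length of 3, while match 7 keeps its score of 0.25.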