A multi-stage metasearch engine, stages:
Retrieval: use sparse and dense retrieval methods (sparse finds exact matches, dense handles semantic meaning). Before indexing, split the contents of long documents into overlapping windows of 150 words with a stride of 75 words.
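A minimal sketch of the windowing step, assuming plain whitespace tokenization (the notes do not say how words are counted). The function name `split_into_windows` and its parameters are made up for illustration.

```python
def split_into_windows(text, window_size=150, stride=75):
    """Split text into overlapping word windows.

    Assumption: "words" are whitespace-separated tokens.
    """
    words = text.split()
    if not words:
        return []
    windows = []
    start = 0
    while True:
        # Each window holds at most window_size words.
        windows.append(" ".join(words[start:start + window_size]))
        # Stop once the current window reaches the end of the document.
        if start + window_size >= len(words):
            break
        start += stride
    return windows
```

With a stride of half the window size, every word except those in the first and last 75 words lands in two windows, so a passage cut at one window boundary still appears whole in the neighboring window.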
Reranking: merge all candidates into a single list, then rank those documents with a reranker (mMiniLM reranker; mMARCO, MS MARCO, mT5).
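A rough sketch of the merge-then-rerank step. The notes name a neural reranker, but the scorer here is a simple term-overlap function swapped in as a stand-in so the example runs without any model. All names (`merge_candidates`, `rerank`, `overlap_score`) and the `(doc_id, text)` result format are assumptions.

```python
def merge_candidates(*result_lists):
    """Union several retrieval result lists, dropping duplicate doc ids.

    Assumption: each result is a (doc_id, text) pair.
    """
    merged = {}
    for results in result_lists:
        for doc_id, text in results:
            # Keep the first occurrence of each document.
            merged.setdefault(doc_id, text)
    return list(merged.items())


def overlap_score(query, text):
    """Stand-in scorer: number of query terms present in the text.

    A real pipeline would call a trained reranker here instead.
    """
    query_terms = set(query.lower().split())
    return len(query_terms & set(text.lower().split()))


def rerank(query, candidates, score_fn=overlap_score):
    """Order candidates by descending relevance score."""
    return sorted(candidates, key=lambda c: score_fn(query, c[1]), reverse=True)
```

Usage: `rerank(query, merge_candidates(sparse_hits, dense_hits))` returns one list ordered by score. Passing a different `score_fn` is the point where a model-based reranker would plug in.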
Highlighting: with results ordered by relevance, take the first ten ranked documents and perform a highlighting step that takes the long text and selects the most important sentences to show.
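A possible sketch of the highlighting step. The notes do not say how "most important" sentences are chosen, so this example assumes a simple heuristic: score each sentence by how many query terms it contains and keep the best ones in their original order. The function names and the `max_sentences` parameter are hypothetical.

```python
import re


def highlight(query, text, max_sentences=2):
    """Pick the sentences that share the most terms with the query.

    Assumption: sentence importance = query-term overlap (a heuristic,
    not necessarily what the referenced systems do).
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    query_terms = set(re.findall(r"\w+", query.lower()))
    scored = [
        (len(query_terms & set(re.findall(r"\w+", s.lower()))), i)
        for i, s in enumerate(sentences)
    ]
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:max_sentences]
    # Restore document order so the snippet reads naturally.
    return [sentences[i] for i in sorted(i for _, i in top)]


def highlight_top_results(query, ranked_texts, top_k=10, max_sentences=2):
    """Apply highlighting to the first top_k ranked documents only."""
    return [highlight(query, t, max_sentences) for t in ranked_texts[:top_k]]
```

Restricting highlighting to the first ten results keeps the more expensive sentence-level pass off documents the user is unlikely to see.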
https://arxiv.org/pdf/2210.14837 https://dl.acm.org/doi/10.1145/3447548.3469053