A basic RAG system consists of indexing, retrieval, and generation. Several other steps, such as storage, prompt construction, and translation, can be integrated with these to build an advanced RAG system. When SportsBuddy generates a response, it displays only the top result from a list of responses. You can adjust the retriever's k search argument to return more documents. This top-results ranking is something the vector store does for you automatically.
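To make the effect of k concrete, here's a minimal sketch, not the SportsBuddy code: a toy retriever over pre-scored documents, where the scores stand in for the vector store's similarity ranking. The names retrieve and docs are hypothetical. (In LangChain-style code, the equivalent is typically passing search_kwargs={"k": 3} to as_retriever.)

```python
def retrieve(scored_docs, k=1):
    """Return the top-k documents, ranked by similarity score (highest first)."""
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _score in ranked[:k]]

# Toy corpus: (document, similarity score to the query)
docs = [("doc-a", 0.91), ("doc-b", 0.72), ("doc-c", 0.55)]

print(retrieve(docs, k=1))  # ['doc-a'] -- only the single best match
print(retrieve(docs, k=3))  # ['doc-a', 'doc-b', 'doc-c'] -- a wider net
```

With k=1 you delegate everything to the store's ranking; a larger k surfaces more candidates for later steps (like re-ranking) to work with.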
The starter notebook has a few modifications to the basic RAG implementation. It doesn't use the RAG prompt from "rlm/rag-prompt" because that prompt conditions the response, which wouldn't serve the purpose here. Because the response will be long, the notebook shows the response's length instead of the response itself; you can print the full response yourself if desired. Finally, chunk_overlap has been reduced because you're working with a relatively small dataset. With these changes, a quick test reveals that four documents are retrieved for the given query.
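If chunk_overlap is unfamiliar, this toy character-based splitter shows what it controls; it's a simplified sketch, not the splitter the notebook uses, and split_with_overlap is a hypothetical name. Each chunk starts chunk_size - chunk_overlap characters after the previous one, so consecutive chunks share some text and sentences aren't cut off at chunk boundaries.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Naive splitter: consecutive chunks overlap by chunk_overlap characters."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

On a small dataset, a large overlap means many near-duplicate chunks, which is why the notebook reduces it.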
It could just as well be three or more, depending on the query. How, then, were you getting the kinds of responses you saw earlier? Mainly because of the prompt, but there's more to it. The RAG prompt forced the LLM to condense the output into the best possible answer that fit. This can be undesirable in some cases because it might leave out plenty of good information. The implementation might also set the retriever's k argument to 1, delegating to the vector store the responsibility of determining the single best response. Although the vector store's search capabilities are good, they're not optimized to always return the best results.
Assessing the response again, you'll notice that the document, although the most relevant to the given query, also contains irrelevant information. So apart from some retrieved documents being irrelevant to the query, even the relevant documents can contain irrelevant text. There are a few ways to tackle this, and contextual compression is one of them.
Contextual Compression
Contextual compression is a technique that compresses the retrieved documents based on the query, filtering out irrelevant content. It's essentially smoothing out rough edges: even though the retrieved documents are the best matches for the query, contextual compression adds a post-processing phase that removes the noise, resulting in a better response.
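The idea can be sketched in a few lines of plain Python. This is a deliberately crude stand-in for a real compressor (production systems typically use an LLM or an embedding model for this step): it keeps only the sentences of a document that share enough words with the query. The function name compress and the threshold parameter are hypothetical.

```python
def compress(document, query, threshold=1):
    """Keep only sentences that share at least `threshold` words with the query."""
    query_words = set(query.lower().split())
    kept = []
    for sentence in document.split(". "):
        overlap = query_words & set(sentence.lower().rstrip(".").split())
        if len(overlap) >= threshold:
            kept.append(sentence.rstrip("."))
    return ". ".join(kept) + "."

doc = ("Basketball finals start in June. Ticket prices vary by city. "
       "The finals feature the top two teams.")
query = "when do basketball finals start"

print(compress(doc, query, threshold=2))
# Basketball finals start in June.
```

The retrieved document was the best match overall, yet two of its three sentences were noise for this query; post-processing strips them out before generation.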
It combines the query and retrieval to produce more refined results. You'll get compressed results back: typically one or two, as opposed to the three, four, or more documents that the default similarity search provides. Additionally, the content of these documents isn't simply a part of the original documents; they're reduced sentences based on the given query.
Introducing Re-ranking
Contextual compression is at the core of many re-ranking techniques. Vector databases, by default, assign a score by which responses can be ranked based on relevance. Re-ranking strategies do something similar, but they work with the given query and the list of retrieved documents. That, combined with contextual compression strategies and other fine-tuning techniques, produces more accurate responses.
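The shape of a re-ranker can be shown with a short sketch, again using word overlap as a stand-in for the learned relevance model a real re-ranker would use; rerank, term_overlap, and top_n are hypothetical names. The key point is the interface: it takes the query plus the already-retrieved documents and reorders them.

```python
def rerank(query, documents, score_fn, top_n=2):
    """Re-score retrieved documents against the query and return the best top_n."""
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _score in scored[:top_n]]

def term_overlap(query, doc):
    """Crude relevance proxy: fraction of query words appearing in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

# Documents in the order the vector store returned them
retrieved = [
    "ticket sales open friday",
    "the basketball finals start in june",
    "june weather is warm",
]

top = rerank("when do the basketball finals start", retrieved, term_overlap, top_n=2)
print(top[0])  # the basketball finals start in june
```

Note that the re-ranker promoted the second retrieved document to first place: retrieval order and relevance order need not agree, which is exactly the gap re-ranking closes.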
A popular re-ranking tool is the Cohere Reranking API. It improves the quality of your semantic search by applying various re-ranking techniques. It has integrations for many platforms like Elasticsearch, OpenSearch, and vector stores. It's a platform-agnostic tool. It doesn't rely on the quality of your prompts. It provides relevance scores to help you gauge how relevant the returned documents and responses are. When it receives a list of responses, it automatically ranks them by computed relevance.
Next, you'll see some of the re-ranking strategies at work.
This content was released on Nov 12 2024. The official support period is 6 months from this date.