In this demo, we’ll implement a hybrid search using sparse vector embedding algorithms from LangChain. Start a notebook and add the following code:
import os

from langchain_chroma import Chroma
from langchain_community.document_loaders import WikipediaLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Set up a User Agent for this session
os.environ['USER_AGENT'] = 'sports-buddy-advanced'

llm = ChatOpenAI(model="gpt-4o-mini")
loader = WikipediaLoader("2024_Summer_Olympics")
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000,
                                               chunk_overlap=0)
splits = text_splitter.split_documents(docs)
database = Chroma.from_documents(documents=splits,
                                 embedding=OpenAIEmbeddings())
retriever = database.as_retriever()
There's nothing new here. It's the same initial setup for a basic RAG.
Create a cell. In that cell, create a retriever based on the Best Match 25 (BM25) algorithm. This will allow you to do a sparse-vector search.
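The cell that builds the BM25 retriever and the two chains isn't reproduced in this excerpt, so here's a minimal sketch. It assumes LangChain's BM25Retriever (which requires the rank_bm25 package) and RetrievalQA chains built over the splits and retriever from the setup code; the names normal_chain and sparse_chain match the calls that follow:

from langchain.chains import RetrievalQA
from langchain_community.retrievers import BM25Retriever  # needs: pip install rank_bm25

# Keyword-based (sparse) retriever built from the same document splits.
sparse_retriever = BM25Retriever.from_documents(splits)

# One chain per retriever: dense (Chroma) versus sparse (BM25).
normal_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
sparse_chain = RetrievalQA.from_chain_type(llm=llm, retriever=sparse_retriever)

With both chains in place, query the normal chain first: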
normal_response = normal_chain.invoke(
    "What happened at the opening ceremony of the 2024 Summer Olympics")
print(normal_response['result'])
Observe the output:
The opening ceremony of the 2024 Summer Olympics was held outside
of a stadium for the first time in modern Olympic history.
Athletes were paraded by boat along the Seine River in Paris.
Finally, run the sparse_chain:
sparse_response = sparse_chain.invoke(
    "What happened at the opening ceremony of the 2024 Summer Olympics")
print(sparse_response['result'])
And note its output:
The opening ceremony of the 2024 Summer Olympics took place
outside of a stadium for the first time in modern Olympic
history, with athletes being paraded by boat along the
Seine River in Paris. This unique setting was part of the
ceremony, making it a distinctive and memorable event in
Olympic history.
Notice how the contents of the query contribute to a more elaborate response in the sparse search.
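To blend the strengths of both approaches into a single hybrid search, you can combine the two retrievers. Here's a sketch using LangChain's EnsembleRetriever; the 0.5/0.5 weights are illustrative, and hybrid_retriever and hybrid_chain are hypothetical names reusing the RetrievalQA import from the sketch above:

from langchain.retrievers import EnsembleRetriever

# Blend dense (Chroma) and sparse (BM25) rankings; the weights are illustrative.
hybrid_retriever = EnsembleRetriever(
    retrievers=[retriever, sparse_retriever],
    weights=[0.5, 0.5],
)
hybrid_chain = RetrievalQA.from_chain_type(llm=llm, retriever=hybrid_retriever)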
Citing in RAG
Citations add source information to your responses, so you know where the answers came from. Open a new notebook to learn how to add citations to SportsBuddy. In the notebook, start with the following code:
from langchain_community.retrievers import WikipediaRetriever
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
import os
llm = ChatOpenAI(model="gpt-4o-mini")
system_prompt = (
    "You're a helpful AI assistant. Given a user question "
    "and some Wikipedia article snippets, answer the user "
    "question. If none of the articles answer the question, "
    "just say you don't know."
    "\n\nHere are the Wikipedia articles: "
    "{context}"
)
retriever = WikipediaRetriever(top_k_results=6, doc_content_chars_max=2000)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)
This prompt instructs the model to answer from the snippets that the WikipediaRetriever fetches based on the given question. Heads up: it may not always return results. Because it's limited to Wikipedia articles, it only finds articles or chunks that match the semantic understanding of how you word your query.
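If you want to see what the retriever fetches before wiring up the chain, you can invoke it directly. This is an optional check, not one of the book's cells:

# Optional: peek at the retrieved Documents and their Wikipedia metadata.
docs = retriever.invoke("2024 Summer Olympics")
for doc in docs:
    print(doc.metadata["title"])

In the next cell, create a chain: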
from typing import List

from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs: List[Document]):
    return "\n\n".join(doc.page_content for doc in docs)
# Format the retrieved documents, then pass everything to the prompt and LLM.
rag_chain = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)

# Pull the question out of the input dict and hand it to the retriever.
retrieve_docs = (lambda x: x["input"]) | retriever

# Keep the retrieved documents in the output alongside the generated answer.
chain = RunnablePassthrough.assign(context=retrieve_docs).assign(
    answer=rag_chain
)
Execute a sample query and examine the structure of the response object:
result = chain.invoke(
    {"input": "How did the USA fare at the 2024 Summer Olympics"})
print(result.keys())
dict_keys(['input', 'context', 'answer'])
The response contains input (the query), context (the retrieved documents), and answer. This information is accessible thanks to OpenAI's tool-calling support. We can funnel this output into a citation model.
Let's use the CitedAnswer model:
from typing import List

from langchain_core.pydantic_v1 import BaseModel, Field

class CitedAnswer(BaseModel):
    """Answer the user question based only on the given sources, and cite
    the sources used."""

    answer: str = Field(
        ...,
        description="The answer to the user question, which is based only "
        "on the given sources.",
    )
    citations: List[int] = Field(
        ...,
        description="The integer IDs of the SPECIFIC sources which justify "
        "the answer.",
    )
To use the citation model, query it with the following:
structured_llm = llm.with_structured_output(CitedAnswer)
query = """How did the USA fare at the 2024 Summer Olympics"""
result = structured_llm.invoke(query)
result
The model assigns the values for answer and citations by interpreting the field descriptions. The response is wrapped in a CitedAnswer object.
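Because with_structured_output parses the reply into the Pydantic class, the fields are plain attributes you can read directly. A small usage sketch:

# Access the parsed fields on the CitedAnswer instance.
print(result.answer)     # the generated answer text
print(result.citations)  # the list of integer source IDs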
You could modify the citations to reference source URLs instead of integer IDs:
citations: List[str] = Field(
    ...,
    description="The string URLs of the SPECIFIC sources which justify "
    "the answer.",
)
However, take note that because the documents aren't retrieved verbatim, these URLs might be generated by the model and lead to 404 errors. If you instead want to cite a portion of the retrieved documents, consider using a model like this:
class Citation(BaseModel):
    source_id: int = Field(
        ...,
        description="The integer ID of a SPECIFIC source which "
        "justifies the answer.",
    )
    quote: str = Field(
        ...,
        description="The VERBATIM quote from the specified source that "
        "justifies the answer.",
    )

class QuotedAnswer(BaseModel):
    """Answer the user question based only on the given sources, and
    cite the sources used."""

    answer: str = Field(
        ...,
        description="The answer to the user question, which is based "
        "only on the given sources.",
    )
    citations: List[Citation] = Field(
        ...,
        description="Citations from the given sources that "
        "justify the answer.",
    )
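The chain below calls a format_docs_with_id helper that isn't shown in this excerpt. It needs to label each snippet with the integer ID the model will cite; a minimal sketch, reusing List and Document from the earlier imports and assuming each Wikipedia Document carries a title in its metadata, might look like this:

def format_docs_with_id(docs: List[Document]) -> str:
    # Prefix each snippet with the Source ID the model should cite.
    formatted = [
        f"Source ID: {i}\nArticle Title: {doc.metadata['title']}\n"
        f"Article Snippet: {doc.page_content}"
        for i, doc in enumerate(docs)
    ]
    return "\n\n".join(formatted)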
You can now use QuotedAnswer in the retrieval chain:
rag_chain = (
    RunnablePassthrough.assign(
        context=(lambda x: format_docs_with_id(x["context"]))
    )
    | prompt
    | llm.with_structured_output(QuotedAnswer)
)
retrieve_docs = (lambda x: x["input"]) | retriever
chain = RunnablePassthrough.assign(context=retrieve_docs).assign(
    answer=rag_chain
)
chain.invoke(
    {"input": "How did the USA fare at the 2024 Summer Olympics"})