How a history-aware retriever works?

Published on 2024-11-08

The history-aware retriever discussed in this post is the one returned by the create_history_aware_retriever function from the LangChain package. This function receives the following inputs:

An LLM (a language model that receives a query and returns an answer);
A vector store retriever (a component that receives a query and returns a list of relevant documents);
A prompt (a chat prompt template with placeholders for the chat history and the user's latest input).

The chat history itself (a list of message interactions, typically between a human and an AI) is not passed to the function; it is supplied every time the retriever is invoked. When invoked, the history-aware retriever takes the user query and the chat history as input and outputs a list of relevant documents, selected based on the query combined with the context provided by the chat history.

At the end, I summarize its workflow.

Setting it up

from langchain.chains import create_history_aware_retriever
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_chroma import Chroma
from dotenv import load_dotenv
import bs4

load_dotenv() # To get OPENAI_API_KEY

def create_vectorstore_retriever():
    """
    Returns a vector store retriever based on the text of a specific web page.
    """
    URL = r'https://lilianweng.github.io/posts/2023-06-23-agent/'
    loader = WebBaseLoader(
        web_paths=(URL,),
        bs_kwargs=dict(
            # Keep only the post's title, header and content.
            parse_only=bs4.SoupStrainer(class_=("post-content", "post-title", "post-header"))
        ))
    docs = loader.load()
    # Split into chunks of up to 500 characters with no overlap; add_start_index
    # records each chunk's position in the original text in its metadata.
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0, add_start_index=True)
    splits = text_splitter.split_documents(docs)
    # Embed every chunk with OpenAI embeddings and index the vectors in Chroma.
    vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
    return vectorstore.as_retriever()

def create_prompt():
    """
    Returns a prompt instructed to produce a rephrased question based on the user's
    last question, but referencing previous messages (chat history).
    """
    system_instruction = """Given a chat history and the latest user question \
        which might reference context in the chat history, formulate a standalone question \
        which can be understood without the chat history. Do NOT answer the question, \
        just reformulate it if needed and otherwise return it as is."""

    prompt = ChatPromptTemplate.from_messages([
        ("system", system_instruction),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}")])
    return prompt

llm = ChatOpenAI(model='gpt-4o-mini')
vectorstore_retriever = create_vectorstore_retriever()
prompt = create_prompt()

history_aware_retriever = create_history_aware_retriever(
    llm,
    vectorstore_retriever,
    prompt
)

Using it

Here, the question is asked with an empty chat history, so the retriever responds only with documents relevant to that question.

chat_history = []

docs = history_aware_retriever.invoke({'input': 'what is planning?', 'chat_history': chat_history})
for i, doc in enumerate(docs):
    print(f'Chunk {i + 1}:')
    print(doc.page_content)
    print()

Chunk 1:
Planning is essentially in order to optimize believability at the moment vs in time.
Prompt template: {Intro of an agent X}. Here is X's plan today in broad strokes: 1)
Relationships between agents and observations of one agent by another are all taken into consideration for planning and reacting.
Environment information is present in a tree structure.

Chunk 2:
language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.

Chunk 3:
Another quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural

Chunk 4:
Planning

Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.


Memory
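In this first call the chat history is empty. In LangChain's implementation, create_history_aware_retriever checks for that case: when chat_history is empty or missing, the input is sent straight to the vector store retriever without calling the LLM, and only a non-empty history goes through the prompt and the LLM. The check can be modeled in plain Python as below. The function names and the `rephrase` stand-in are hypothetical, not part of LangChain's API.

```python
from typing import Callable, Dict, List


def needs_rephrasing(inputs: Dict) -> bool:
    """Model of the branch condition: only a non-empty chat_history triggers the LLM."""
    # An empty list, an empty string and a missing key are all falsy.
    return bool(inputs.get("chat_history"))


def query_for_retriever(inputs: Dict, rephrase: Callable[[Dict], str]) -> str:
    """Return the text that would be sent to the vector store retriever.

    `rephrase` stands in for the prompt -> LLM -> string-output chain and is
    only called when there is chat history to take into account.
    """
    if needs_rephrasing(inputs):
        return rephrase(inputs)
    return inputs["input"]


calls: List[Dict] = []

def fake_rephrase(inputs: Dict) -> str:
    calls.append(inputs)  # record every time the "LLM" would be called
    return "standalone question"

print(query_for_retriever({"input": "what is planning?", "chat_history": []}, fake_rephrase))
print(len(calls))  # 0: the LLM stand-in was never called
```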

Now the chat history says that, whenever the user asks about planning, they also want to know about task decomposition. Because the LLM rephrases the question with that history in view, the retriever responds with chunks of text that reference both themes.

chat_history = [
    ('human', 'when I ask about planning I want to know about Task Decomposition too.')]

docs = history_aware_retriever.invoke({'input': 'what is planning?', 'chat_history': chat_history})
for i, doc in enumerate(docs):
    print(f'Chunk {i + 1}:')
    print(doc.page_content)
    print()

Chunk 1:
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Chunk 2:
Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#

Chunk 3:
Planning

Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.


Memory

Chunk 4:
Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.

Now the question only makes sense in light of the chat history: "It" refers to ReAct, which was discussed in the previous exchange. The retrieved chunks all reference ReAct, so the rephrasing step resolved "It" to the correct concept.

chat_history = [
    ('human', 'What is ReAct?'),
    ('ai', 'ReAct integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space')]

docs = history_aware_retriever.invoke({'input': 'It is a way of doing what?', 'chat_history': chat_history})
for i, doc in enumerate(docs):
    print(f'Chunk {i + 1}:')
    print(doc.page_content)
    print()

Chunk 1:
ReAct (Yao et al. 2023) integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables LLM to interact with the environment (e.g. use Wikipedia search API), while the latter prompting LLM to generate reasoning traces in natural language.
The ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as:
Thought: ...
Action: ...
Observation: ...

Chunk 2:
Fig. 2. Examples of reasoning trajectories for knowledge-intensive tasks (e.g. HotpotQA, FEVER) and decision-making tasks (e.g. AlfWorld Env, WebShop). (Image source: Yao et al. 2023).
In both experiments on knowledge-intensive tasks and decision-making tasks, ReAct works better than the Act-only baseline where Thought: … step is removed.

Chunk 3:
The LLM is provided with a list of tool names, descriptions of their utility, and details about the expected input/output.
It is then instructed to answer a user-given prompt using the tools provided when necessary. The instruction suggests the model to follow the ReAct format - Thought, Action, Action Input, Observation.

Chunk 4:
Case Studies#
Scientific Discovery Agent#
ChemCrow (Bran et al. 2023) is a domain-specific example in which LLM is augmented with 13 expert-designed tools to accomplish tasks across organic synthesis, drug discovery, and materials design. The workflow, implemented in LangChain, reflects what was previously described in the ReAct and MRKLs and combines CoT reasoning with tools relevant to the tasks:
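Why does the rephrasing step matter here? On its own, "It is a way of doing what?" contains almost nothing a vector store can match against. The sketch below uses word overlap as a crude stand-in for embedding similarity (an assumption made only so the example runs offline; real embeddings compare meaning, not just shared words) and scores a few chunk excerpts against the raw question and against a hypothetical standalone rewrite. The rewrite is illustrative only; the post does not show the LLM's actual output.

```python
import re
from typing import List, Set

# Words too common to say anything about a chunk's topic.
STOPWORDS = {"a", "an", "and", "be", "by", "can", "do", "doing", "in", "is", "it",
             "of", "the", "to", "what", "which", "with", "within"}

# Short excerpts of chunks from the vector store.
CHUNKS = [
    "ReAct integrates reasoning and acting within LLM by extending the action space.",
    "Task decomposition can be done by LLM with simple prompting.",
    "Planning is essentially in order to optimize believability.",
]


def content_words(text: str) -> Set[str]:
    """Lowercase the text, split it into words and drop stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def overlap_scores(query: str, chunks: List[str]) -> List[int]:
    """Score each chunk by how many content words it shares with the query."""
    query_words = content_words(query)
    return [len(query_words & content_words(chunk)) for chunk in chunks]


raw_question = "It is a way of doing what?"
# Hypothetical standalone rewrite; in the real pipeline it comes from the LLM.
standalone_question = "What approach is ReAct, which integrates reasoning and acting?"

print(overlap_scores(raw_question, CHUNKS))         # [0, 0, 0]
print(overlap_scores(standalone_question, CHUNKS))  # [4, 0, 0]
```

With the raw question every chunk ties at zero in this toy scorer, so nothing singles out the ReAct chunk; the standalone rewrite gives the retriever words that actually identify it.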

Conclusion

In conclusion, when .invoke({'input': '...', 'chat_history': [...]}) is called with a non-empty chat history, the history-aware retriever works as follows:

It replaces the input and chat_history placeholders in the prompt with specified values, creating a new ready-to-use prompt that essentially says "Take this chat history and this last input, and rephrase the last input in a way that anyone can understand it without seeing the chat history".
It sends the new prompt to the LLM and receives a rephrased input.
It then sends the rephrased input to the vector store retriever and receives a list of documents relevant to this rephrased input.
Finally, it returns this list of relevant documents.
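These steps can be modeled in plain Python by passing the LLM and the vector store retriever in as ordinary functions. This is a sketch under that assumption, not LangChain's implementation (which composes prompt | llm | StrOutputParser() | retriever as runnables); `rephrase_then_retrieve`, `fake_llm` and `fake_retriever` are hypothetical names, and the fakes exist only to show the order of the calls.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, content), e.g. ("human", "What is ReAct?")


def rephrase_then_retrieve(
    inputs: Dict,
    system_instruction: str,
    llm: Callable[[List[Message]], str],
    retriever: Callable[[str], List[str]],
) -> List[str]:
    """Hypothetical model of the steps above (non-empty chat history only)."""
    # Fill the prompt: system instruction, then the chat history, then the latest input.
    messages = [("system", system_instruction), *inputs["chat_history"], ("human", inputs["input"])]
    # Ask the LLM for a standalone version of the latest input.
    standalone_question = llm(messages)
    # Send the rephrased input to the vector store retriever.
    documents = retriever(standalone_question)
    # Return the relevant documents.
    return documents


# Stand-in components that record the order in which they are called.
trace: List[Tuple[str, object]] = []

def fake_llm(messages: List[Message]) -> str:
    trace.append(("llm", len(messages)))
    return "What is ReAct a way of doing?"  # hypothetical rephrasing

def fake_retriever(query: str) -> List[str]:
    trace.append(("retriever", query))
    return [f"chunk matching: {query}"]

docs = rephrase_then_retrieve(
    {"input": "It is a way of doing what?",
     "chat_history": [("human", "What is ReAct?"), ("ai", "ReAct integrates reasoning and acting...")]},
    "Rephrase the latest question so it can be understood without the chat history.",
    fake_llm,
    fake_retriever,
)
print(trace)  # [('llm', 4), ('retriever', 'What is ReAct a way of doing?')]
print(docs)   # ['chunk matching: What is ReAct a way of doing?']
```

The LLM only ever sees the filled prompt, and the retriever only ever sees the LLM's rephrased question; the raw follow-up question never reaches the vector store.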

Obs.: It is important to note that the embedding used to transform text into vectors is the one specified when Chroma.from_documents is called. In this example that is OpenAIEmbeddings(), and the retriever uses it both for the indexed chunks and for the rephrased query. When none is specified, the default Chroma embedding is used.
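This matters because retrieval compares the query's vector with the stored chunk vectors, so both must come from the same embedding model; LangChain's Chroma wrapper embeds the incoming query with the same embedding object it was built with. The sketch below shows what goes wrong otherwise, using a hypothetical `TinyVectorStore` and a toy word-count embedding in place of OpenAIEmbeddings (which would need an API key).

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]
VOCAB = ["planning", "task", "decomposition", "react", "reasoning", "acting", "memory"]

DOCS = [
    "Task decomposition breaks a task into subgoals.",
    "ReAct combines reasoning and acting.",
    "Planning needs memory of past actions.",
]


def count_embed(text: str, vocab: List[str] = VOCAB) -> Vector:
    """Toy embedding: how many times each vocabulary word occurs in the text."""
    words = text.lower().replace(".", " ").replace("?", " ").split()
    return [float(words.count(term)) for term in vocab]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 0.0 if norm == 0 else sum(x * y for x, y in zip(a, b)) / norm


class TinyVectorStore:
    """Hypothetical in-memory vector store built around a single embedding function."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed
        self.items: List[Tuple[str, Vector]] = []

    def add_texts(self, texts: List[str]) -> None:
        # Documents are embedded once, at indexing time.
        self.items.extend((text, self.embed(text)) for text in texts)

    def search(self, query: str, k: int = 1,
               query_embed: Optional[Callable[[str], Vector]] = None) -> List[str]:
        # By default the query is embedded with the same function as the documents;
        # query_embed exists only to demonstrate what a mismatch does.
        query_vector = (query_embed or self.embed)(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vector, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = TinyVectorStore(count_embed)
store.add_texts(DOCS)

# Same embedding for documents and query: the task decomposition chunk wins.
print(store.search("What is task decomposition?"))
# -> ['Task decomposition breaks a task into subgoals.']

# A different embedding for the query (same words, reordered vocabulary): wrong chunk.
print(store.search("What is task decomposition?",
                   query_embed=lambda text: count_embed(text, VOCAB[::-1])))
# -> ['ReAct combines reasoning and acting.']
```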

This article is reproduced from: https://dev.to/guilhermecxe/how-a-history-aware-retriever-works-5e07?1
