
Document Retriever
FlowHunt's Document Retriever enhances AI accuracy by connecting generative models to your own up-to-date documents and URLs, ensuring reliable and relevant ans...
Learn how to set up the ‘From H1 if exists’, ‘Load from pointer’, and ‘Skip Last Header’ parameters.
Document Retriever component allows the chatbot to retrieve knowledge from sources you specified in the Documents and Schedules. The role of this component is to control retrieval, and multiple parameters affect how the component retrieves information from those documents.
The From H1 if exists option tells the retriever to begin extracting content from the H1 header it finds (usually the article’s main title).
What Happens?
Use Case Example:
You want to only retrieve the actual guide, without any site navigation or page header clutter that exists on your website.
Note:
From H1 if exists is enabled in the Document Retriever component by default.
The Load from pointer option gives you more precision by allowing Document Retriever to load only data from a pointer in the possibly longer article.
What Happens?
What is a “pointer”?
A pointer is typically a unique string or heading present in the document (for example, an H2 or a specific phrase or section title).
Use Case Example:
You want to skip introductory sections and retrieve information for a specific relevant section of a possibly long article or document (e.g., from “Step 4: Add a live chat button” in a setup guide).
The Skip Last Header option is useful for ignoring the last header in the document, which is often repeated or used for navigation or footer purposes.
What Happens?
Use Case Example:
You want to avoid Document Retriever to load a footer navigation header (such as “Other Articles” at the end of a help page), ensuring that only the main content is processed.
Note:
Skip Last Header can help with documents that auto-generate footers or repetitive navigation elements. However, if you do not have such sections, using this parameter might cause the part of the article with valid information to not be retrieved. Therefore, it’s recommended to keep this option unchecked until there is a valid reason to enable it.
The Max tokens parameter allows you to control the maximum number of tokens (words and punctuation marks, as counted by the underlying AI model) that the Document Retriever will output from the extracted text.
What Happens?
Default Value:
The default value is typically 3000 tokens, but if needed, you can adjust this.
Use Case Example:
If you are processing lengthy documents, setting a lower Max tokens value helps keep responses concise. However, for best results, consider enabling the “Load from pointer” parameter. This ensures that the extracted text starts at the most relevant section of the document, rather than from the beginning, allowing you to obtain a focused and manageable chunk of information within your specified token limit. This combination is especially useful when you want concise, contextually relevant outputs from large sources.
Note:
If you find that information is being cut off, try increasing the Max tokens value. Conversely, if you want shorter, more focused outputs, reduce the Max tokens parameter.
When the Document Retriever finds several relevant documents, the Strategy parameter determines how they’re merged into a single text output for your chatbot, taking into account the “Max tokens” limit.
Two strategy options:
Include equal size from each document:
The token limit is divided evenly. For example, with three documents and a 3,000-token limit, each gets up to 1,000 tokens. This ensures that all sources contribute equally, which is useful when you want a balanced answer that draws from multiple documents.
Concat documents, fill from first up to the tokens limit:
Documents are added in order of relevance until the token limit is reached. The most relevant document fills the space first; if there’s room left, less relevant documents are added in order. If the first document is long, it might use the whole limit by itself.
How to choose?
Note:
These strategies only affect how the text is constructed from the retrieved documents before it is passed to the next step (such as AI generation). They do not change which documents are retrieved—only how their content is merged and trimmed to fit within the Max tokens setting.
While this article focuses on setting up the ‘From H1 if exists’, ‘Load from pointer’, ‘Skip Last Header’, and ‘Max tokens’ parameters, the Document Retriever also offers additional parameters that help control how documents are selected and retrieved:
This setting limits the number of documents the flow should retrieve, ensuring results remain relevant and responses are generated quickly.
This optional setting allows you to limit retrieval to one or more categories you’ve created in the Documents section of Knowledge Sources.
This allows you to include or hide a separate section, before the actual chatbot answer, with a list of resources that were retrieved by the retriever. For integration with LiveAgent, it must be checked, as this section is not supported and won’t be displayed correctly in the LiveAgent chatbot widget.
Let’s you restrict retrieval to one or more Schedules you have specified for crawling or updating content in Knowledge Sources.
Controls how closely the retrieved documents must match the input query, using a relevance score (from 0 to 1). For example, a threshold of 0.7–0.8 is recommended for highly relevant answers. Higher thresholds give more precise matches, while lower thresholds may include less relevant documents.
Example:
If you set a threshold of 0.6 and have four articles with relevance scores of 0.8, 0.65, 0.5, and 0.9, only those above 0.6 (i.e., 0.8, 0.65, and 0.9) will be used for extraction.
If the answer provided by the chatbot doesn’t contain information that you are certain the chatbot has available in your documents or schedules, try checking the conversation history with the “Verbose” option to see detailed logs of whether the Document Retriever was used and what documents were retrieved. If necessary, adjust your settings and prompt based on these logs.
FlowHunt's Document Retriever enhances AI accuracy by connecting generative models to your own up-to-date documents and URLs, ensuring reliable and relevant ans...
Integrate your workflows with Google Docs using the Google Docs Retriever component—seamlessly fetch document content for use in automations, chatbots, or knowl...
FlowHunt's Document to Text component transforms structured data from retrievers into readable markdown text, giving you precise control over how data is proces...