FAQ
- Updated on 10 Sep 2024
What is the difference between the LLM Alfred and other models on the market? Read this article for more information.
What is the impact of the languages used on questions and documents? Alfred is multilingual, and the embedding model is both multilingual and cross-lingual, so performance should in theory be comparable regardless of the language combination used. In practice, however, observed performance may be better when the query language matches the document language.
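Retrieval works by comparing embedding vectors, so the cross-lingual property means that a French query and an English passage about the same topic land close together in vector space. A minimal sketch with toy, hand-made vectors (the actual embedding model and its dimensionality are not shown here):

```python
import math

def cosine(a, b):
    # Cosine similarity: the standard relevance measure for dense retrieval.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: a cross-lingual model maps translations near each other,
# so a French query scores high against an English passage on the same topic.
query_fr   = [0.9, 0.1, 0.2]    # e.g. "politique de congés"
doc_en     = [0.85, 0.15, 0.25] # e.g. "leave policy"
doc_en_off = [0.1, 0.9, 0.3]    # unrelated passage

assert cosine(query_fr, doc_en) > cosine(query_fr, doc_en_off)
```

The vectors above are invented for illustration; a real embedding model produces vectors with hundreds of dimensions, but the comparison works the same way.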
When should I create or start a new discussion? It is advisable to create a new discussion when you want to change the topic of conversation. The history of the current conversation is taken into account when reformulating your query for retrieval. This means that if the history is not directly related to what you are about to ask, it is probably better to create a new session.
Does the buffer (context) size have an impact on the dialogue? Yes: it limits the size of the conversation history that the model can keep in memory when responding to you. It also limits the maximum length of the question you can ask the model, and the length of the response.
Should similar documents be removed from the corpus for better performance? Not necessarily. However, it is better to avoid duplicates.
Does deleting discussions impact evaluations or statistics? No, the session will still be taken into account in the statistics. It is simply hidden in the history of the left sidebar.
Can I see the general state of my evaluations? In the analytics tab at the bottom left of the interface, you can view your personal usage statistics, including your evaluations.
Can I change the LLM used? Yes, via the dropdown menu at the top of each session. However, this requires that you have dedicated GPUs to host an additional model (except for SaaS customers).
Is the system's response impacted by the general learning of the LLM or exclusively by the selected extracts? If extracts relevant to the question have been found, the LLM is required to base its response on them. However, by the very nature of what an LLM is, its response is necessarily influenced by its pre-training.
Why does the system produce different responses after a REGENERATE? The REGENERATE button restarts the entire RAG process. This process includes the reformulation of the initial query by the LLM, which can lead to variation in the retrieved excerpts, as well as the formulation of the final response. This happens because the LLM operates on a probabilistic basis, so its generations can vary.
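The variability comes from sampling: at each step, an LLM draws the next token from a probability distribution rather than always taking the most likely one. A minimal illustration (the distribution and candidate tokens below are invented for the example and are not Paradigm internals):

```python
import random

def sample_next_token(probs, rng):
    # Draw one token from a categorical distribution, which is what an LLM
    # does at each generation step when sampling is enabled.
    r, cum = rng.random(), 0.0
    for token, p in probs.items():
        cum += p
        if r < cum:
            return token
    return token  # fallback for floating-point edge cases

probs = {"Paris": 0.6, "Lyon": 0.3, "Nice": 0.1}
# Different seeds stand in for different REGENERATE clicks: the same
# distribution can yield different tokens from one run to the next.
runs = {sample_next_token(probs, random.Random(seed)) for seed in range(20)}
assert len(runs) > 1
```

The same randomness applies twice in the RAG pipeline: once when the query is reformulated (which can change the retrieved excerpts) and once when the final answer is generated.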
Can I change the default user language, and what are the impacts? Changing the default language (in the user profile) only changes the interface language (menus, etc.). This does not affect the language in which the model responds, as it is trained to respond in the language of the question asked by the user (unless a different language is explicitly required by the user).
Can I trust the system's response, and how can I verify it? We do everything to ensure the highest possible reliability of the responses. However, the excerpts retrieved for a query are sometimes not the most relevant, which can result in an unsatisfactory or inaccurate response. To verify that a response is reliable, check the extracts and documents the LLM used to respond (displayed after the response).
How does the system handle dates/years present in the documents? In Paradigm, retrieval of the document extracts is based on hybrid search, i.e., dense vector search + keyword search (lexical exact match). As of now, it is not yet possible to filter documents by date.
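Conceptually, hybrid search blends a semantic score and a lexical score for each candidate excerpt. A hypothetical sketch (the `alpha` weight and the scores below are illustrative assumptions, not Paradigm's actual values):

```python
def hybrid_score(dense_sim, keyword_score, alpha=0.5):
    # Weighted combination of dense (semantic) similarity and lexical
    # (exact keyword match) relevance; alpha is an assumed tuning knob.
    return alpha * dense_sim + (1 - alpha) * keyword_score

# A query containing an exact token such as a year ("2023") can rank well
# through the lexical component even when semantic similarity is modest:
score = hybrid_score(dense_sim=0.42, keyword_score=0.90)  # combined ≈ 0.66
```

This is why dates present verbatim in a document can still be matched by keyword search, even though there is no dedicated date filter yet.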
How does the system handle titles of documents? The document title is used to compute the relevance score.
Is it useful to split/modify a file to improve performance? Modifying a file can be useful, for example, when its understanding in its current state is difficult (e.g., notes), when its layout can make excerpt splitting difficult (e.g., text in multiple columns), or when the document contains a lot of "polluting" information (e.g., large tables with numbers and no explanations, notes with many abbreviations, etc.).
Why does the system limit to 5 excerpts per query and not more? This limitation is due to the fact that the model's context size is currently 8192 tokens (about 5000 words). Each time, the context must accommodate the tool's prompt (not visible to the user), the current discussion history, and the excerpts retrieved to answer the query. Since each excerpt is about 300 words, five excerpts per query have proven optimal in our tests. The next version of the LLM, with a larger context, will allow for an increase in the number of excerpts.
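As a rough back-of-the-envelope check that five ~300-word excerpts fit alongside everything else: the prompt size, history size, answer reserve, and tokens-per-word ratio below are assumptions for illustration, not Paradigm internals.

```python
CONTEXT_TOKENS = 8192          # current model context size
TOKENS_PER_WORD = 1.6          # rough assumption for the token/word ratio

prompt_tokens  = 700           # hidden tool prompt (assumed size)
history_tokens = 1500          # conversation history (assumed size)
excerpts       = 5
excerpt_tokens = int(300 * TOKENS_PER_WORD)  # ~480 tokens per 300-word excerpt

used = prompt_tokens + history_tokens + excerpts * excerpt_tokens
answer_budget = CONTEXT_TOKENS - used
# Whatever remains is the budget for the model's reply.
assert answer_budget > 0
```

With these assumed numbers, five excerpts consume about 2400 tokens and still leave a few thousand tokens for the response; adding many more excerpts would quickly squeeze out the history and the answer.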
What are the different sources of system hallucinations? In chat with docs mode, the response can be false if the retrieved excerpts are not relevant or if they are difficult to understand (e.g., specific language). In pure LLM mode (chat without documents), hallucinations are due to the probabilistic foundation of the model.
Is there an initial/system prompt? Yes! (You can find more information here.)
Why doesn't the system allow for a document summary? Document summarization requires different/additional steps compared to querying a document base, as well as an interface adaptation. However, a summary feature is planned in LightOn's roadmap.
What to do in case of no system response? Several scenarios can explain the absence of a response:
- The type of question asked is not suitable for a RAG system (examples: summary request, question requiring cross-analysis);
- The answer is not in the documents;
- The right extracts were not found. In this case, the user can try to reformulate their question by providing more context to the model to facilitate its search.
What do the presented page numbers correspond to (physical or PDF pagination)? The page number comes from the physical pagination: for example, the cover of a book counts as page 1, regardless of the page numbers printed in the document.
Can multiple documents be imported at once? Yes, within the limit of 100 MB.
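If you prepare imports programmatically, you can check a batch against the size limit before uploading. A small sketch (assuming local file paths; `batch_fits` is a hypothetical helper, not part of Paradigm):

```python
import os

MAX_BATCH_BYTES = 100 * 1024 ** 2  # 100 MB import limit

def batch_fits(paths):
    # Sum the on-disk sizes of the files and compare against the limit.
    return sum(os.path.getsize(p) for p in paths) <= MAX_BATCH_BYTES
```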
Is it possible to have documents with the same name but different content? Yes, this poses no problem. It is the content of the documents that is taken into account for the search.
Does the model learn from my documents? No, the model is not trained on your documents. However, it can rely on their content to provide a response.
Can I prioritize the workspaces, telling the LLM to search first in one and then in the other? This isn't possible at the moment.
Is there a maximum number of workspaces I can create for my company? No, company admins can create as many workspaces as they want.