13 Jun 2024

What is the difference between the LLM Alfred and other models on the market? Alfred is based on the open-source Falcon-40B foundation model and has been enhanced by LightOn to extend its context window to 8192 tokens and to improve its performance in retrieval-augmented generation (RAG), i.e., dialogue with documents.

What is the impact of the languages used in questions and documents? Alfred is multilingual, and its embedding model is both multilingual and cross-lingual, so performance should in principle be comparable regardless of the language combination used. In practice, however, results may be better when the query language matches the document language.

Why can't I continue a discussion based on a specific document? RAG relies on extracting the most relevant document excerpts to support the answer to a question. This excerpt-based operation means that you cannot continue the discussion based on an entire document.
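The excerpt-based retrieval described above can be sketched in a few lines. This is a hedged illustration, not Alfred's actual implementation: the `embed` function below is a toy character-frequency stand-in for a real embedding model, and `top_k_excerpts` simply ranks excerpts by cosine similarity to the query.

```python
# Minimal sketch of excerpt retrieval in a RAG system. embed() is a toy
# stand-in for a real embedding model; all names are illustrative.
import math

def embed(text: str) -> list[float]:
    # Character-frequency vector (a-z) as a crude "embedding".
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_excerpts(query: str, excerpts: list[str], k: int = 5) -> list[str]:
    """Return the k excerpts most similar to the query."""
    scored = [(cosine(embed(query), embed(e)), e) for e in excerpts]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [e for _, e in scored[:k]]
```

Only these top-k excerpts (not the full document) are passed to the LLM, which is why the conversation cannot draw on an entire document at once.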

When should I create or start a new discussion? It is advisable to create a new discussion when you want to change the topic of conversation. The history of the current conversation is taken into account when reformulating your query for retrieval. This means that if the history is not directly related to what you are about to ask, it is probably better to create a new session.

What is the impact of the buffer (context) size on the dialogue? The context size limits how much of the conversation history the model can keep in memory when responding to you. It also caps the maximum length of the question you can ask and the length of the response.
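One common way a fixed context budget is enforced, sketched below under assumptions (the product's actual truncation strategy is not documented here), is to keep only the most recent conversation turns that fit. The word-based `count_tokens` is a crude stand-in for a real tokenizer.

```python
# Hedged sketch: how a fixed context budget bounds conversation history.
# count_tokens() is a crude word-count stand-in for a real tokenizer.
def count_tokens(text: str) -> int:
    return len(text.split())

def fit_history(turns: list[str], budget: int) -> list[str]:
    """Keep the most recent turns that fit within the token budget."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):          # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > budget:
            break                          # oldest turns are dropped first
        kept.insert(0, turn)
        used += cost
    return kept
```

With a budget of 3 "tokens", a history of `["one two three", "four five", "six"]` would be trimmed to its two most recent turns.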

Should similar documents be removed from the corpus for better performance? Not necessarily. However, it is better to avoid duplicates.

Does deleting discussions impact evaluations or statistics? No, the session will still be taken into account in the statistics. It is simply hidden in the history of the left sidebar.

Can I see the general state of my evaluations? In the analytics tab at the bottom left of the interface, you can view your personal usage statistics, including your evaluations.

Can I change the LLM used? Yes, via the dropdown menu at the top of each session. However, this requires that you have dedicated GPUs to host an additional model.

Is the system's response impacted by the general learning of the LLM or exclusively by the selected excerpts? If relevant excerpts to answer the question have been found, the LLM is required to respond based on the excerpts. However, by the very nature of what an LLM is, its response is necessarily influenced by its pre-training.

Why does the system produce different responses after a REGENERATE? The REGENERATE button restarts the entire RAG process. This process includes the reformulation of the initial query by the LLM, which can lead to variation in the retrieved excerpts, as well as the formulation of the final response. This is explained by the fact that the LLM operates on a probabilistic basis, so its generations can vary.
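The probabilistic generation mentioned above can be illustrated with a toy next-token sampler. The vocabulary and probabilities below are invented for illustration; real LLM sampling works over tens of thousands of tokens, but the principle is the same: the same distribution can yield different tokens on different runs.

```python
# Toy illustration of probabilistic next-token sampling: the same
# probability distribution can produce different outputs across runs.
# The vocabulary and probabilities are invented example values.
import random

def sample_next_token(probs: dict[str, float], rng: random.Random) -> str:
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

probs = {"Paris": 0.6, "Lyon": 0.3, "Nice": 0.1}
# Two runs seeded differently may pick different tokens:
a = sample_next_token(probs, random.Random(1))
b = sample_next_token(probs, random.Random(7))
```

Because each generation is such a chain of weighted draws, pressing REGENERATE can change both the reformulated query (and hence the retrieved excerpts) and the wording of the final answer.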

Can I change the default user language, and what are the impacts? Changing the default language (in the user profile) only changes the interface language (menus, etc.). This does not affect the language in which the model responds, as it is conditioned to respond in the language of the question within the tool.

Can I trust the system's response, and how can I verify it? We do everything to ensure the highest possible reliability of the responses. However, the excerpts retrieved to answer the user's query are sometimes not the most relevant ones, which can result in an unsatisfactory or inaccurate response. To verify that a response is reliable, the user should check the excerpts and documents the LLM used to respond (displayed after the response).

How does the system handle dates/years present in the documents? To date, this information is not taken into account, nor are the document titles. An upcoming update will make the search hybrid (adding the lexical part to the RAG+LLM part) and will thus allow these data to be taken into account.
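The announced hybrid search can be sketched as a blend of a lexical score (exact word matches, which can capture dates, years, and titles) with a semantic similarity score. Everything below is an illustrative assumption: the weighting, the functions, and the overlap measure are not the product's actual implementation.

```python
# Hedged sketch of hybrid retrieval scoring: blending lexical word
# overlap with a (stand-in) semantic similarity score. The 50/50
# weighting and the overlap measure are illustrative assumptions.
def lexical_score(query: str, excerpt: str) -> float:
    """Fraction of query words that appear verbatim in the excerpt."""
    q = set(query.lower().split())
    e = set(excerpt.lower().split())
    return len(q & e) / len(q) if q else 0.0

def hybrid_score(query: str, excerpt: str,
                 semantic: float, alpha: float = 0.5) -> float:
    """Blend a precomputed semantic score with lexical overlap."""
    return alpha * semantic + (1 - alpha) * lexical_score(query, excerpt)
```

A query containing "2023" would then boost excerpts that literally contain "2023", even if the embedding model alone scores them lower.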

Is it useful to split/modify a file to improve performance? Modifying a file can be useful, for example, when its understanding in its current state is difficult (e.g., notes), when its layout can make excerpt splitting difficult (e.g., text in multiple columns), or when the document contains a lot of "polluting" information (e.g., large tables with numbers and no explanations, notes with many abbreviations, etc.).

Why does the system limit to 5 excerpts per query and not more? This limitation comes from the model's current context size of 8192 tokens (about 5000 words). The context must accommodate the tool's prompt (not visible to the user), the current discussion history, and the excerpts retrieved to answer the query. Since each excerpt is about 300 words, five excerpts per query has proven optimal in our tests. The next version of the LLM, with a larger context, will allow the number of excerpts to be increased.
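The budget arithmetic above can be made explicit. The figures (5000-word context, ~300 words per excerpt) come from the answer itself; the split between prompt, history, and reply reserve below is an invented example.

```python
# Word-budget arithmetic from the answer above. The 5000-word context
# and ~300 words per excerpt come from the text; the example split
# between prompt, history, and reply reserve is illustrative.
WORDS_PER_CONTEXT = 5000  # ~8192 tokens

def max_excerpts(prompt_words: int, history_words: int,
                 reply_reserve: int, words_per_excerpt: int = 300) -> int:
    """How many ~300-word excerpts fit in the remaining budget."""
    remaining = WORDS_PER_CONTEXT - prompt_words - history_words - reply_reserve
    return max(remaining // words_per_excerpt, 0)
```

For example, with a 500-word system prompt, 2000 words of history, and 800 words reserved for the reply, 1700 words remain, i.e., room for five ~300-word excerpts.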

What are the different sources of system hallucinations? In chat with docs mode, the response can be false if the retrieved excerpts are not relevant or if they are difficult to understand (e.g., specific language). In pure LLM mode (chat without documents), hallucinations are due to the probabilistic foundation of the model.

Is there an initial/system prompt? There is a system prompt, invisible to users, that sets the framework for the model's responses and the relevance evaluation of the retrieved excerpts. This prompt specifies, for example, that responses should be in the language of the question, that the information from the documents should be adhered to, etc.

Why doesn't the system allow for a document summary? Document summarization requires different/additional steps compared to querying a document base, as well as an interface adaptation. However, a summary feature is planned in LightOn's roadmap.

What should I do if the system gives no response? Several scenarios can explain the absence of a response:

  1. The type of question asked is not suitable for a RAG system (examples: summary request, question requiring cross-analysis);
  2. The answer is not in the documents;
  3. The right excerpts were not found. In this case, the user can try to reformulate their question by providing more context to the model to facilitate its search.

What do the presented page numbers correspond to (physical or PDF pagination)? The page number comes from the physical pagination (e.g., the cover of a book will be page 1 and not the page numbers in the document).

Can multiple documents be imported at once? Yes, within the limit of 25 MB (this limit will soon be extended to 100 MB).

Is it possible to have documents with the same name but different content? Yes, this poses no problem. It is the content of the documents that is taken into account for the search.

Does the model learn from my documents? No, the model is not trained on your documents. However, it can rely on their content to provide a response.
