Alfred-3 Llama 3 8B
  • 14 Aug 2024

LightOn unveils Alfred-3, based on Llama 3 8B: a more compact model, yet its most efficient to date, optimized for Retrieval-Augmented Generation (RAG). This new model improves the user experience and the reliability of results on Paradigm, thereby meeting the needs of enterprises.

What's new with Alfred 3?

Since Alfred-1 and Alfred-2, LightOn has continued to improve its models so that they better serve end users. Alfred-3 brings improvements and new features such as:

  • Lightweight and Efficient Model: Compared to Alfred-2 and its 40 billion parameters, Alfred-3 has only 8 billion parameters but was trained on 15 times more tokens. Alfred-3 is lighter, requires less memory and fewer GPUs, and offers faster inference.
  • Better Information Retrieval: Compared to its previous versions, Alfred-3 retrieves documents more accurately. This is partly due to Alfred rewriting the user's question, then reordering and filtering the retrieved excerpts, which makes interaction with documents more natural and intuitive (a minimal sketch of this flow is shown after this list).
  • Multilingualism: The data used in LightOn's various fine-tunings is multilingual, with a particular emphasis on French to ensure that our model can optimally serve French businesses. Alfred-3 was trained on a mix of data in over 30 languages, and fluency is guaranteed in 11 of them, including French.
  • Improved Security Features: Our model inherits improved content filtering and toxicity detection, ensuring a safer environment for users.
  • Maximum Token Limit: Supports up to 8,000 tokens in context.
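
Below is a minimal sketch, in Python, of the rewrite-then-rerank retrieval flow described above. The function names, scoring logic, and thresholds are illustrative assumptions for this article, not Paradigm's actual API.

```python
# Minimal sketch of the flow described above: rewrite the question, then rerank
# and filter the retrieved excerpts. Names and logic are illustrative only.
from dataclasses import dataclass

@dataclass
class Excerpt:
    text: str
    score: float  # relevance score assigned by the reranking stage

def rewrite_question(question: str, history: list[str]) -> str:
    # In Alfred-3 this step is handled by a fine-tuned model; here we simply
    # fold the last conversational turn into the query as a stand-in.
    return question if not history else f"{question} (context: {history[-1]})"

def rerank_and_filter(excerpts: list[Excerpt], top_k: int = 10, min_score: float = 0.5) -> list[Excerpt]:
    # Drop excerpts below a relevance threshold and keep the best-scoring ones.
    kept = [e for e in excerpts if e.score >= min_score]
    return sorted(kept, key=lambda e: e.score, reverse=True)[:top_k]

# Toy usage.
candidates = [Excerpt("Alfred-3 has 8 billion parameters.", 0.9), Excerpt("Unrelated text.", 0.2)]
query = rewrite_question("How big is it?", ["We were discussing Alfred-3."])
print(query)
print([e.text for e in rerank_and_filter(candidates)])
```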

Alfred-3 Optimized for "Chat with Docs"

Paradigm’s "Chat with Docs" feature is enhanced by three different fine-tuned versions of Alfred, which improve its overall performance at three different stages:

  • Rephrasing
  • Reranking/Filtering
  • Final Generation

These innovations aim to enable the use of a different Alfred skill for each of these stages, with the goal of configuring one LoRA specifically for rewriting and another for reranking.
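
As a rough illustration of this per-stage setup, the sketch below loads one base Llama 3 8B model and attaches a separate LoRA adapter for each stage using the Hugging Face peft library. The adapter names and paths are hypothetical; this is not LightOn's actual serving code.

```python
# Hypothetical sketch: one shared base model, one LoRA adapter per "Chat with Docs" stage.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B"  # base model assumed from the article
base = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Load one adapter per stage on top of the shared base weights (paths are made up).
model = PeftModel.from_pretrained(base, "adapters/rephrasing", adapter_name="rephrasing")
model.load_adapter("adapters/reranking", adapter_name="reranking")
model.load_adapter("adapters/generation", adapter_name="generation")

def run_stage(stage: str, prompt: str) -> str:
    # Activate the LoRA corresponding to the requested stage, then generate.
    model.set_adapter(stage)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Because all three adapters share the same frozen base weights, switching stages only swaps a few small matrices rather than reloading an 8-billion-parameter model.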

What is a LoRA?

Low-Rank Adapters (LoRA) are a parameter-efficient fine-tuning technique: rather than updating all of a model's weights, training only adjusts small low-rank matrices added to existing layers, which drastically reduces the number of trainable parameters and lets several specializations share the same base model.
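
To make the idea concrete, here is a toy illustration (not actual training code) of the low-rank update behind LoRA: the frozen weight matrix W is left untouched, and only two small matrices B and A are trained, so the layer effectively applies W + BA.

```python
# Toy illustration of the low-rank idea behind LoRA; dimensions are arbitrary examples.
import numpy as np

d, k, r = 4096, 4096, 16            # layer dimensions and a small rank r
W = np.random.randn(d, k)           # frozen pretrained weight
B = np.zeros((d, r))                # trainable, initialized to zero
A = np.random.randn(r, k) * 0.01    # trainable

W_effective = W + B @ A             # what the adapted layer effectively applies

full_params = d * k
lora_params = d * r + r * k
print(f"trainable parameters: {lora_params:,} vs {full_params:,} ({lora_params / full_params:.2%})")
```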

Benchmark Results

Benchmarks conducted during training show that Alfred-3 consistently outperforms the previous version and the base Llama model at retrieving relevant passages:

  • Rephrasing: 5% improvement in precision and recall, and a 10% improvement in MRR (Mean Reciprocal Rank) on a conversational question-answer dataset compared to Alfred-2.
  • Reranking: 10% and 20% improvement in precision and recall, respectively, and a 25% improvement in MAP (Mean Average Precision) on an information retrieval dataset compared to Alfred-2.

Sources

Metrics consider the top 10 retrieved document excerpts. Recall measures whether the relevant excerpts are retrieved; precision measures whether the retrieved excerpts are relevant.
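
For reference, the sketch below shows how such metrics can be computed over the top 10 retrieved excerpts. The example data is invented for illustration; it is not taken from LightOn's benchmark datasets.

```python
# Precision@k, recall@k and reciprocal rank over the top-k retrieved excerpts.
def precision_recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10):
    top = retrieved[:k]
    hits = sum(1 for doc in top if doc in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def reciprocal_rank(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    # MRR is the average of this value over all queries.
    for rank, doc in enumerate(retrieved[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["e3", "e7", "e1", "e9"]
relevant = {"e1", "e2"}
print(precision_recall_at_k(retrieved, relevant))  # (0.25, 0.5)
print(reciprocal_rank(retrieved, relevant))        # 0.333...
```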

The Latest from LightOn

Alfred-3 uses LightOn's expertise to bring the best of generative AI directly to its customers.

Integration into Paradigm

We continue to make the Paradigm platform the primary channel for using Alfred. Designed as a platform for business, Paradigm allows Alfred-3 to be used to its full potential for document search, prompt generation and, in the future, the creation of intelligent autonomous agents.

We are convinced that for AI to be used to its full potential, it must go beyond the base model and blend into a secure, scalable platform tailored to the needs of business teams.

