Thank you for being here. Let's take a deep breath and dive into the best LLM papers of the week!
1. Jamba: A Hybrid Transformer-Mamba Language Model
Author(s): Opher Lieber, et al. from AI21 Labs
Publication Date: Mar 28, 2024
✨ Key Insights:
What's New? They presented Jamba, a large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Jamba interleaves blocks of Transformer and Mamba layers, with MoE added in some of these layers to increase model capacity while keeping active parameter usage manageable.
Behind the New. Compared to vanilla Transformers, Jamba provides high throughput and a small memory footprint while maintaining state-of-the-art performance on standard language model benchmarks. The released model is said to fit in a single 80GB A100 GPU.
So, how can we use this? It's time to look beyond the vanilla Transformer architecture; a sketch of the interleaving idea follows below.
Read Full Paper, Download Model
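To make the interleaving concrete, here is a minimal PyTorch sketch, not AI21's implementation: `MambaStub`, `SimpleMoE`, `HybridLayer`, and `JambaLikeBlock` are illustrative names, the Mamba and MoE layers are toy stand-ins, and the 1-attention-per-8-layers / MoE-every-2-layers pattern is taken from the paper's description of the released configuration (treat it as an assumption here).

```python
import torch
import torch.nn as nn

class MambaStub(nn.Module):
    """Stand-in for a real Mamba (selective SSM) layer: a causal depthwise
    conv plus a gate, just enough for the sketch to run end to end."""
    def __init__(self, d_model):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=4, padding=3,
                              groups=d_model)
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        h = self.conv(x.transpose(1, 2))[..., :x.size(1)].transpose(1, 2)
        return h * torch.sigmoid(self.gate(x))

class SimpleMoE(nn.Module):
    """Toy top-1 mixture-of-experts MLP; computed densely for clarity."""
    def __init__(self, d_model, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):
        weights = self.router(x)                                  # (B, T, E)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)  # (B, T, D, E)
        top = nn.functional.one_hot(weights.argmax(-1), len(self.experts))
        return (outs * top.unsqueeze(2).to(outs.dtype)).sum(-1)   # (B, T, D)

class HybridLayer(nn.Module):
    """One layer: attention or Mamba mixer, then a dense MLP or MoE."""
    def __init__(self, d_model, use_attn, use_moe, n_heads=8):
        super().__init__()
        self.use_attn = use_attn
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.mixer = (nn.MultiheadAttention(d_model, n_heads, batch_first=True)
                      if use_attn else MambaStub(d_model))
        self.ffn = SimpleMoE(d_model) if use_moe else nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.norm1(x)
        h = (self.mixer(h, h, h, need_weights=False)[0] if self.use_attn
             else self.mixer(h))
        x = x + h
        return x + self.ffn(self.norm2(x))

class JambaLikeBlock(nn.Module):
    """Interleaved stack: one attention layer per `attn_every` layers,
    MoE replacing the MLP every `moe_every` layers."""
    def __init__(self, d_model=256, n_layers=8, attn_every=8, moe_every=2):
        super().__init__()
        self.layers = nn.ModuleList(
            HybridLayer(d_model,
                        use_attn=(i % attn_every == attn_every - 1),
                        use_moe=(i % moe_every == moe_every - 1))
            for i in range(n_layers))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

out = JambaLikeBlock()(torch.randn(2, 16, 256))
print(out.shape)  # torch.Size([2, 16, 256])
```

The design point to notice: attention layers are the memory bottleneck (KV cache grows with context), so keeping them rare while Mamba layers carry most of the sequence mixing is what buys the throughput and footprint gains.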
2. MuLan: A Study of Fact Mutability in Language Models
Author(s): Emanuele Bugliarello, et al. from Google DeepMind
Publication Date: Apr 03, 2024
✨ Key Insights:
What's New? They explored the problem of mutable facts, whose truth can change over time. They created MuLan, a benchmark for evaluating the ability of English language models to anticipate the time-contingency of facts.
Behind the New. Their experiments show that mutable facts differ from immutable ones in confidence, representation, and update behavior, shedding light on the factuality learning dynamics of LLMs.
So, how can we use this? LLMs' limitations on time-dependent facts are a persistent problem. This insight into LLMs' time awareness may pave a new way of updating them; a toy probe in that spirit follows below.
Read Full Paper
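This is not MuLan's protocol, just a toy probe in its spirit, assuming GPT-2 and hand-picked prompts: score the gold completion of an immutable relation (birthplace) against a mutable one (current club) and compare the model's confidence.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of log-probs the model assigns to `completion` after `prompt`
    (assumes the prompt tokenizes to the same prefix; fine for this toy)."""
    ids = tok(prompt + completion, return_tensors="pt").input_ids
    n_prompt = tok(prompt, return_tensors="pt").input_ids.size(1)
    with torch.no_grad():
        logprobs = model(ids).logits.log_softmax(dim=-1)
    # Each completion token is scored by the logits one position earlier.
    return sum(logprobs[0, i - 1, ids[0, i]].item()
               for i in range(n_prompt, ids.size(1)))

# Immutable fact (birthplace) vs. mutable fact (current club).
print(completion_logprob("Lionel Messi was born in", " Rosario"))
print(completion_logprob("Lionel Messi plays for", " Inter Miami"))
```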
3. Multi-Conditional Ranking with Large Language Models
Author(s): Pouya Pezeshkpour, et al. from Megagon Labs
Publication Date: Mar 30, 2024
✨ Key Insights:
What's New? They defined and explored the task of multi-conditional ranking by introducing MCRank, a benchmark tailored for assessing multi-conditional ranking across various item types and conditions.
Behind the New. They also proposed a novel decomposed reasoning method consisting of EXtracting and Sorting the conditions, then Iteratively Ranking the items (EXSIR).
So, how can we use this? Decomposing the conditions is crucial in multi-conditional ranking; the sketch after this entry shows the shape of the loop.
Read Full Paper
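Here is a hedged sketch of the EXSIR shape under stated assumptions: `call_llm` is a hypothetical client for whatever LLM you use, and the prompts are illustrative, not the paper's templates.

```python
from typing import Callable, List

def exsir_rank(query: str, items: List[str],
               call_llm: Callable[[str], str]) -> List[str]:
    """EXtract and Sort the conditions, then Iteratively Rank the items."""
    # 1. EXtract: pull the individual conditions out of the compound query.
    conditions = call_llm(
        "List each ranking condition in this request, one per line:\n" + query
    ).splitlines()

    # 2. Sort: order the conditions by priority.
    ordered = call_llm(
        "Order these conditions from most to least important, one per line:\n"
        + "\n".join(conditions)
    ).splitlines()

    # 3. Iteratively Rank: apply one condition at a time to the item list.
    #    (Which end of the priority order to apply first is a design choice
    #    in this sketch, not something taken from the paper.)
    ranking = items
    for cond in ordered:
        ranking = call_llm(
            f"Rank these items by the condition '{cond}', one per line:\n"
            + "\n".join(ranking)
        ).splitlines()
    return ranking
```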
4. Gecko: Versatile Text Embeddings Distilled from Large Language Models
Author(s): Jinhyuk Lee, et al. from Google DeepMind
Publication Date: Mar 29, 2024
✨ Key Insights:
What's New? They presented Gecko, a compact and versatile text embedding model. Gecko achieved strong retrieval performance by leveraging a key idea: distilling knowledge from large language models (LLMs) into a retriever. On MTEB, Gecko with 256 embedding dimensions outperformed all existing entries with 768 dimensions.
Behind the New. Their two-step distillation process begins with generating diverse, synthetic paired data using an LLM. Next, they further refine the data quality by retrieving a set of candidate passages for each query, and relabeling the positive and hard negative passages using the same LLM.
So, how can we use this? LLM-generated synthetic data is now used everywhere, including for training text embedding models like Gecko; a sketch of the distillation loop follows below.
Read Full Paper
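A sketch of the two-step loop under assumptions: `call_llm` and `retrieve` are hypothetical stand-ins, and the prompts and the relabeling rule are simplified from the paper's description.

```python
from typing import Callable, List, Tuple

def distill_triples(passages: List[str], corpus: List[str],
                    call_llm: Callable[[str], str],
                    retrieve: Callable[[str, List[str], int], List[str]],
                    ) -> List[Tuple[str, str, str]]:
    """Build (query, positive, hard_negative) triples for retriever training."""
    triples = []
    for passage in passages:
        # Step 1: generate a diverse synthetic query for this passage.
        query = call_llm("Write a search query answered by:\n" + passage)

        # Step 2: retrieve candidates and let the same LLM relabel them; the
        # LLM-chosen best passage becomes the positive (it may differ from
        # the seed passage), and another top candidate the hard negative.
        candidates = retrieve(query, corpus, 20)
        idx = int(call_llm(  # assumes the LLM replies with a bare number
            f"Query: {query}\nReply with the number of the passage that "
            "best answers it:\n"
            + "\n".join(f"{i}: {c}" for i, c in enumerate(candidates))))
        positive = candidates[idx]
        hard_negative = candidates[1] if idx == 0 else candidates[0]
        triples.append((query, positive, hard_negative))
    return triples
```

The relabeling step is the interesting design choice: because the LLM may pick a better-matching passage than the one the query was generated from, the training signal is cleaner than naive (passage, generated-query) pairs.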
5. Many-shot Jailbreaking
Author(s): Cem Anil, et al. from Anthropic
Publication Date: Apr 03, 2024
✨ Key Insights:
What's New? They investigated a family of simple long-context attacks on large language models: prompting with hundreds of demonstrations of undesirable behavior.
Behind the New. They found that MSJ (many-shot jailbreaking) can indeed be composed with other jailbreaks to increase its effectiveness, reducing the overall context length required for a given attack to succeed.
So, how can we use this? Better asked: how can we mitigate it? One option is to cap the context length (or the number of in-context demonstrations) available to a user; another is to fine-tune the model on long-context prompts, though the paper suggests fine-tuning mainly raises the number of shots needed rather than eliminating the attack. A crude sketch of the first option follows below.
Read Full Paper
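As a crude illustration of the capping mitigation, here is a sketch that limits how many few-shot demonstrations reach the model. The `User:`/`Assistant:` turn format and the budget of 8 are assumptions for the example, not values from the paper.

```python
import re

def truncate_demonstrations(prompt: str, max_demos: int = 8) -> str:
    """Keep only the last `max_demos` User/Assistant exchanges plus the
    final query, dropping the long run-up an MSJ prompt relies on."""
    # Split into turns that start with "User:" or "Assistant:".
    turns = [t for t in re.split(r"(?m)(?=^(?:User|Assistant):)", prompt)
             if t.strip()]
    # One demonstration == one User turn + one Assistant turn.
    return "".join(turns[-(2 * max_demos + 1):])

attack = "User: bad request\nAssistant: bad reply\n" * 300 + "User: final ask\n"
print(truncate_demonstrations(attack).count("User:"))  # 9, down from 301
```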
6. ReALM: Reference Resolution As Language Modeling
Author(s): Joel Ruben Antony Moniz, et al. from Apple
Publication Date: Mar 29, 2024
✨ Key Insights:
What's New? Reference resolution is an important problem, essential for understanding and successfully handling context of different kinds. They demonstrated large improvements over an existing system with similar functionality across different types of references, even with their smallest ReALM model.
Behind the New. They fine-tuned an LLM (FLAN-T5) and found that their model performed in the same ballpark as the latest GPT-4 despite being much lighter and faster.
So, how can we use this? Apple keeps working on its own LLM to deploy on its phones. We can keep track of on-device LLMs by following Apple's papers; a sketch of the reference-resolution framing follows below.
Read Full Paper
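A sketch of the framing under assumptions: serialize the candidate entities as a numbered list and ask a seq2seq LM to pick the referent. The prompt template is illustrative, not Apple's, and the off-the-shelf, untuned `google/flan-t5-small` here is only a placeholder for ReALM's fine-tuned models.

```python
from transformers import pipeline

# Untuned placeholder; ReALM fine-tunes FLAN-T5 for this task.
resolver = pipeline("text2text-generation", model="google/flan-t5-small")

def resolve_reference(utterance: str, entities: list) -> str:
    """Frame reference resolution as generation over serialized entities."""
    prompt = (f"User said: '{utterance}'\n"
              "Which entity does the user mean?\n"
              + "\n".join(f"{i}. {e}" for i, e in enumerate(entities, 1)))
    return resolver(prompt, max_new_tokens=16)[0]["generated_text"]

entities = ["555-0123 (pharmacy)", "555-0188 (mom)", "CVS Pharmacy, 1 Main St"]
print(resolve_reference("Call the pharmacy", entities))
```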
Stay curious, and until next week!