Thank you for being here. Let's take a deep breath and dive into the best LLM papers of this week!
1. Counting-Stars (★): A Simple, Efficient, and Reasonable Strategy for Evaluating Long-Context Large Language Models
Author(s): Mingyang Song, et al. from Tencent MLPD
Publication Date: Mar 18, 2024
Key Insights:
What's New? They proposed Counting-Stars, a new benchmark built on a simple, efficient, and reasonable strategy for evaluating long-context LLMs. On it, GPT-4 Turbo performed strongly across context lengths from 4K to 128K tokens.
Behind the New. Counting-Stars is designed so that an LLM must fully understand and capture long-range dependencies in the context, and must collect inter-dependent pieces of evidence spanning the entire context to finish the task.
So, How can we use this? Counting-Stars is an interesting way to check context-length generalizability: sprinkle stars throughout the context and check whether the LLM counts them correctly, as in the sketch below.
Read Full Paper, Explore Github Repo
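Here is a minimal sketch of the idea (not the authors' exact protocol): scatter star sentences through filler text, then ask the model to count them. `call_llm` is a hypothetical stand-in for your own LLM client.

```python
import random

def build_counting_stars_prompt(filler: str, n_stars: int, n_sentences: int):
    """Scatter star sentences through filler text; return (prompt, expected count)."""
    sentences = [filler] * n_sentences
    for pos in random.sample(range(n_sentences), n_stars):
        sentences[pos] = "The little penguin counted a star: *"
    prompt = (
        " ".join(sentences)
        + "\n\nHow many stars did the little penguin count in the text above? "
        "Answer with a single number."
    )
    return prompt, n_stars

prompt, expected = build_counting_stars_prompt(
    filler="The sky was calm over the quiet sea.", n_stars=7, n_sentences=2000
)
# answer = call_llm(prompt)               # call_llm: hypothetical LLM client
# print(int(answer.strip()) == expected)  # did the model count correctly?
```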
2. Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
Author(s): Soyeong Jeong, et al. from KAIST
Publication Date: Mar 21, 2024
Key Insights:
What's New? They proposed a novel adaptive QA framework that dynamically selects the most suitable strategy for (retrieval-augmented) LLMs, from the simplest to the most sophisticated, based on query complexity.
Behind the New. They collected the classifier's training data automatically, without human labeling, by leveraging the models' predicted outcomes.
So, How can we use this? Using RAG adaptively can reduce both cost and latency! A routing sketch follows the link below.
Read Full Paper
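A hedged sketch of the routing idea, assuming you already have `llm` and `retrieve` callables; the trivial length-based classifier below is only a placeholder for the paper's trained complexity classifier.

```python
def classify_complexity(query: str) -> str:
    """Placeholder for Adaptive-RAG's trained complexity classifier.
    Returns 'A' (no retrieval), 'B' (single-step), or 'C' (multi-step).
    A length heuristic stands in for the real classifier here."""
    n = len(query.split())
    return "A" if n < 8 else ("B" if n < 20 else "C")

def adaptive_answer(query: str, llm, retrieve, max_hops: int = 3) -> str:
    """Route a query to the cheapest strategy its complexity allows.
    `llm(query, docs=...)` and `retrieve(query)` are assumed callables."""
    label = classify_complexity(query)
    if label == "A":                 # simple: the LLM answers directly
        return llm(query)
    if label == "B":                 # moderate: one round of retrieval
        return llm(query, docs=retrieve(query))
    answer = ""                      # complex: interleave retrieval and
    for _ in range(max_hops):        # reasoning over several hops
        docs = retrieve(query + " " + answer)
        answer = llm(query, docs=docs)
    return answer
```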
3. SMART: Automatically Scaling Down Language Models with Accuracy Guarantees for Reduced Processing Fees
Author(s): Saehan Jo, et al. from Cornell University
Publication Date: Mar 11, 2024
Key Insights:
What's New? They introduced SMART (Scaling Models Adaptively for Reduced Token Fees), a novel LLM framework designed to minimize the inference costs of NLP tasks while ensuring sufficient result quality.
Behind the New. Their experiments on three real-world datasets with OpenAI models show that SMART achieves significant cost savings, up to 25.6x compared to GPT-4.
So, How can we use this? I think SMART might make a service a bit slower, but it achieves cost efficiency without degrading output quality; a simplified sketch follows the link below.
Read Full Paper
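The core mechanism can be sketched as calibrate-then-route (a strong simplification of the paper's accuracy-guarantee machinery). The model callables and the agreement-as-accuracy proxy below are illustrative assumptions, not the authors' exact method.

```python
def pick_cheapest_model(models, calibration_queries, target_accuracy):
    """Pick the cheapest model that meets the accuracy target.
    `models` is assumed sorted cheapest-first with the strongest
    (reference) model last; agreement with the reference on a
    calibration sample serves as an accuracy proxy."""
    reference_answers = [models[-1](q) for q in calibration_queries]
    for model in models[:-1]:
        answers = [model(q) for q in calibration_queries]
        agreement = sum(a == r for a, r in zip(answers, reference_answers))
        if agreement / len(calibration_queries) >= target_accuracy:
            return model
    return models[-1]  # no cheaper model qualifies; keep the strongest

# best = pick_cheapest_model([cheap_llm, mid_llm, strong_llm], sample_queries, 0.9)
```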
4. Instructing Large Language Models to Identify and Ignore Irrelevant Conditions
Author(s): Zhenyu Wu, et al. from Xi'an Jiaotong University
Publication Date: Mar 19, 2024
Key Insights:
What's New? They proposed a novel approach named I3C that instructs LLMs to identify and ignore irrelevant conditions. It first identifies a set of irrelevant-condition candidates that have weak semantic relevance to the question, then prompts the LLM to verify those conditions, and finally instructs the LLM with the verification results on relevant and irrelevant conditions to avoid confusion and improve its reasoning paths.
Behind the New. They also developed I3C-Select, which selects the most confusing problems based on the semantic relevance measurement. Notably, with GPT-3.5-Turbo and I3C-Select, they achieved an accuracy of 96.0 on GSM-IC2-1K, outperforming Complex-CoT by +11.7.
So, How can we use this? Their plug-and-play module, I3C, can be added to any CoT prompting method to enhance LLMs' ability to explicitly identify and ignore irrelevant conditions! A sketch of the filtering step follows the links below.
Read Full Paper, Explore Github Repo
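A minimal sketch of the condition-filtering step, assuming any sentence encoder `embed` that maps text to a vector; the 0.3 threshold and the exact prompt wording are illustrative assumptions, not taken from the paper.

```python
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return dot / norm if norm else 0.0

def i3c_prompt(conditions, question, embed, threshold=0.3):
    """Flag conditions weakly relevant to the question, then instruct
    the LLM to ignore them before standard CoT prompting."""
    q_vec = embed(question)
    irrelevant = [c for c in conditions if cosine(embed(c), q_vec) < threshold]
    note = ""
    if irrelevant:
        note = ("Note: the following conditions are likely irrelevant to the "
                "question and should be ignored: " + "; ".join(irrelevant) + "\n")
    return (note + " ".join(conditions) + "\nQ: " + question
            + "\nLet's think step by step.")
```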
5. USE: Dynamic User Modeling with Stateful Sequence Models
Author(s): Zhihan Zhou, et al. from Northwestern University, USA
Publication Date: Mar 20, 2024
Key Insights:
What's New? They introduced User Stateful Embedding (USE), which generates user embeddings that reflect users' evolving behavior without exhaustive reprocessing, by storing previous model states and revisiting them in the future.
Behind the New. Existing methods rely heavily on stateless sequence models that lack memory of historical behavior: they must either discard historical data and use only the most recent data, or reprocess old and new data jointly.
So, How can we use this? User embeddings play a crucial role in user-engagement forecasting and personalized services, and USE can capture a user's evolving behavior well; a sketch of the stateful idea follows the link below.
Read Full Paper
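To make the "stateful" part concrete, here is a hedged sketch in which each user's recurrent hidden state is persisted between sessions, so only new events are processed. USE itself builds on a Transformer backbone; the GRU below is just the simplest way to make statefulness explicit.

```python
import torch
import torch.nn as nn

class StatefulUserEncoder(nn.Module):
    """Persist each user's recurrent state so new behavior is encoded
    incrementally, without reprocessing the full history."""

    def __init__(self, n_event_types: int = 1000, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(n_event_types, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.states: dict[str, torch.Tensor] = {}  # user_id -> last hidden state

    def update(self, user_id: str, event_ids: torch.Tensor) -> torch.Tensor:
        h0 = self.states.get(user_id)           # resume from the stored state
        x = self.embed(event_ids.unsqueeze(0))  # (1, seq_len, dim)
        _, h_n = self.rnn(x, h0)
        self.states[user_id] = h_n.detach()     # save for the next session
        return h_n.squeeze(0).squeeze(0)        # current user embedding

enc = StatefulUserEncoder()
emb = enc.update("user_42", torch.tensor([3, 17, 256]))  # first session
emb = enc.update("user_42", torch.tensor([9, 4]))        # later: only new events
```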
Stay curious, and until next week!