Thank you for being here. Let's take a deep breath and dive into the best LLM papers of this week!
1. Yi: Open Foundation Models by 01.AI
Author(s): 01.AI
Publication Date: Mar 07, 2024
Key Insights:
What's New? They introduced the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. They attributed the performance of the Yi models primarily to their data quality, the result of extensive data-engineering efforts.
Behind the New. They polished a small-scale instruction dataset over multiple iterations, such that every single instance was verified directly by their machine learning engineers.
So, How can we use this? They put a lot of effort into preprocessing the data, so if you want to improve your data's quality, this paper will be helpful. They also released their model on Hugging Face.
Read Full Paper, Explore GitHub Repo
2. Birbal: An efficient 7B instruct-model fine-tuned with curated datasets
Author(s): Ashvini Kumar Jindal, et al. from LinkedIn AI, USA
Publication Date: Mar 04, 2024
Key Insights:
What's New? They introduced Birbal, a Mistral-7B-based model that won the LLM Efficiency Challenge at the NeurIPS Workshop, fine-tuned on a single RTX 4090 for 16 hours.
Behind the New. The challenge restricted dataset generation to open-source base models, which raised concerns about dataset quality. They therefore curated a high-quality dataset from existing sources or generated it with base LLMs.
So, How can we use this? They won the LLM Efficiency Challenge! If you are interested in LLM efficiency, this paper is a good source of ideas.
Read Full Paper, Explore GitHub Repo
3. Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
Author(s): Lingjiao Chen, et al. from Stanford University
Publication Date: Mar 04, 2024
Key Insights:
What's New? They found empirically that, across multiple language tasks, performance first increases and then decreases as a function of the number of LLM calls.
Behind the New. More LLM calls lead to higher performance on "easy" queries but lower performance on "hard" queries, and non-monotone behavior emerges when a task contains both types of queries.
So, How can we use this? They showed that more LLM calls are not necessarily better, underscoring the importance of compound system design.
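The non-monotone behavior they describe can be reproduced with a toy majority-vote calculation (a minimal sketch, not the paper's actual setup; the per-call accuracies and the easy/hard mix below are made-up numbers):

```python
from math import comb

def majority_vote_acc(p: float, k: int) -> float:
    """Probability that the majority of k independent calls, each correct
    with probability p, is correct. Ties (even k) split at random."""
    acc = 0.0
    for c in range(k + 1):
        prob = comb(k, c) * p**c * (1 - p) ** (k - c)
        if 2 * c > k:
            acc += prob
        elif 2 * c == k:
            acc += 0.5 * prob
    return acc

# A task mixing "easy" queries (per-call accuracy 0.8) and
# "hard" queries (per-call accuracy 0.3), half and half:
def task_acc(k: int) -> float:
    return 0.5 * majority_vote_acc(0.8, k) + 0.5 * majority_vote_acc(0.3, k)

for k in (1, 3, 5, 11, 21):
    # Accuracy rises for small k, then falls back toward the mix rate:
    # easy queries converge to always-right, hard ones to always-wrong.
    print(k, round(task_acc(k), 3))
```

The hard queries are exactly the ones where a single call is wrong more often than right, so voting amplifies the error, which is the mechanism behind the paper's non-monotone curves.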
Read Full Paper
4. In Search of Truth: An Interrogation Approach to Hallucination Detection
Author(s): Yakir Yehuda, et al. from Microsoft
Publication Date: Mar 05, 2024
Key Insights:
What's New? They presented the InterrogateLLM method, which borrows from the interrogation technique of repeated interviews, where the interviewee's consistency is assessed, to detect LLM hallucinations.
Behind the New. Based off of the human trait of response inconsistency being a strong indication of lie, the proposed method achieved Balanced Accuracy (BACC) of 87%, all without relying on external knowledge
So, How can we use this? Lies only lead to trouble in the long run!
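The core idea can be sketched roughly like this (my sketch, not the authors' implementation): reconstruct the question from the answer several times and score agreement with the original question. Here `reconstruct` is a placeholder for a backward LLM call, and plain string similarity stands in for the paper's more sophisticated scoring.

```python
from difflib import SequenceMatcher

def interrogate(query: str, answer: str, reconstruct, n_rounds: int = 3,
                threshold: float = 0.5) -> bool:
    """Flag `answer` as a suspected hallucination when questions
    reconstructed from it are inconsistent with the original query.
    `reconstruct(answer)` is a stand-in for a backward LLM call
    ("what question is this the answer to?")."""
    sims = [
        SequenceMatcher(None, query.lower(), reconstruct(answer).lower()).ratio()
        for _ in range(n_rounds)
    ]
    return sum(sims) / len(sims) < threshold  # True -> likely hallucination

# Toy stand-ins for the backward model:
faithful = lambda ans: "who wrote hamlet"
drifting = lambda ans: "what is the capital of peru"
print(interrogate("Who wrote Hamlet?", "Shakespeare", faithful))  # consistent
print(interrogate("Who wrote Hamlet?", "Lima", drifting))         # inconsistent
```

A real deployment would sample the backward model at nonzero temperature so that the rounds can actually disagree, which is what makes the repeated-interview analogy work.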
Read Full Paper
5. How Far Are We from Intelligent Visual Deductive Reasoning?
Author(s): Yizhe Zhang, et al. from Apple
Publication Date: Mar 07, 2024
Key Insights:
What's New? They evaluated various VLMs' deductive reasoning on Raven's Progressive Matrices across diverse datasets and diagnosed the performance bottlenecks of VLMs in perception, deductive reasoning, and hypothesis verification.
Behind the New. They also identified several issues with current VLMs, such as overconfidence, sensitivity to prompt design, and an inability to effectively leverage in-context examples.
So, How can we use this? Want to get a grasp of current VLM research and looking for where to start? This is for you!
Read Full Paper
6. ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Author(s): Xin Men, et al. from Baichuan Inc.
Publication Date: Mar 07, 2024
Key Insights:
What's New? They defined a metric called Block Influence (BI) to gauge the significance of each layer in LLMs, then proposed a layer-removal pruning approach in which they directly delete redundant layers based on BI scores.
Behind the New. Their experiments showed that many layers of LLMs exhibit high similarity, and that some layers play a negligible role in network functionality. Through this simple layer removal, they demonstrated that ShortGPT significantly outperforms previous SOTA model-pruning methods.
So, How can we use this? Maybe all those layers are a little too much.
Read Full Paper
Stay curious, and until next week!