Thank you for being here! This week marked 1-year anniversary of ChatGPT, making LLM the topic of the era. To keep the fire burning, letβs take a deep breath and dive in to the best LLM papers of this week!
1. The Falcon Series of Open Language Models
π Author(s): Ebtesam Almazrouei, et al. from The Falcon LLM Team
π Publication Date: Nov 29, 2023
β¨ Key Insights:
Whatβs New? They introduced the Falcon series: 7B, 40B, and 180B parameters causal decoder-only models trained on a diverse high-quality corpora predominantly assembled from web data.
Behind the New. Falcon-180B outperforms models such as PaLM, Chinchilla, LLaMA 2, or Inflection-1. One of the best open-source LLMs out there!
So, How can we use this? You can use this model in HuggingFace, or train your model with novel ideas in this paper!
π Read Full Paper, HuggingFace
2. ChatGPTβs One-year Anniversary: Are Open-Source Large Language Models Catching up?
π Author(s): Hailin Chen, et al. from Salesforce Research
π Publication Date: Nov 29, 2023
β¨ Key Insights:
Whatβs New? Celebrating ChatGPTβs 1-year anniversary, this paper is a review of the following open source AI models since its release in comparison to ChatGPT.
Behind the New. This paper compares open-source model performance as well as novel practices and issues of open-source LLM. As of now, open source models still looks to be on their way to catching up to GPT-3.5-turbo.
So, How can we use this? If youβre searching for a strong open-source model, this paper may be a good start to getting a grasp of different LLM models and their traits.
π Read Full Paper
3. CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
π Author(s): Zineng Tang, et al. from Microsoft
π Publication Date: Nov 30, 2023
β¨ Key Insights:
Whatβs New? They present CoDi-2, a versatile and interactive Multi-modal Large Language Model (MLLM) that can follow complex multimodal interleaved instructions, conduct in-context learning (ICL), reason, chat, edit, etc., in an any-to-any input-output modality paradigm.
Behind the New. CoDi-2 is made up of multimodal LLM that aligns audio and video modalities for both encoding and decoding. It is said to show strong abilities in zero-shot prompting, few-shot prompting, exemplar learning, concept learning, and subject-driven learning.
So, How can we use this? CoDi-2 may be the answer to multi-round generation for image, audio, and video!
π Read Full Paper, Explore GitHub Repo
4. TaskBench: Benchmarking Large Language Models for Task Automation
π Author(s): Yongliang Shen, et al. from Microsoft Research Asia
π Publication Date: Nov 30, 2023
β¨ Key Insights:
Whatβs New? They introduce TaskBench to evaluate the capability of LLMs in task automation.
Behind the New. TaskBench aims to evaluate three critical stages in task automation: task decomposition, tool invocation, and parameter prediction to fulfill user intent. Experimental results is said that TaskBench effectively reflect the capability of LLMs with high consistency compared to human evaluation.
So, How can we use this? Automating tasks with LLM is a steady need. TaskBench may be a good place to evaluate how well your AI assistant is doing.
π Read Full Paper, Explore GitHub Repo
5. Learning to Skip for Language Modeling
π Author(s): Dewen Zeng, et al. from Google Research
π Publication Date: Nov 26, 2023
β¨ Key Insights:
Whatβs New? For language model pre-training, they propose a method that dynamically skips the execution of a layer (or module) for any input token with a binary router.
Behind the New. With the proposed SkipLayer framework, they use a routing mechanism to execute layers based on the input context. The proposed method is said to significantly improve 1-shot performance across NLP tasks.
So, How can we use this? Is allocating same amount of parameters to each token necessary?
π Read Full Paper
6. AutoKG: Efficient Automated Knowledge Graph Generation for Language Models
π Author(s): Bohan Chen, et al. from UCLA
π Publication Date: Nov 22, 2023
β¨ Key Insights:
Whatβs New? They introduced AutoKG, a lightweight and efficient approach for automated knowledge graph (KG) construction.
Behind the New. AutoKG first extracts keywords using a LLM and then evaluates the relationship weight between each pair of keywords using graph Laplace learning.
So, How can we use this? I think that building the knowledge graph is quite difficult problem. However, with AutoKG, you can easily make the KG! Try this method if you are trying to apply knowledge graph in your system.
π Read Full Paper
7. Language Model Inversion
π Author(s): John X. Morris, et al. from Cornell University
π Publication Date: Nov 22, 2023
β¨ Key Insights:
Whatβs New? They recovered the text in cases where it is hidden from the user, motivating a method for recovering unknown prompts given only the modelβs current distribution output.
Behind the New. Language models produce a distribution over the next token. Can we use this to recover the prompt tokens? The answer is maybe yes. Their inversion method reconstructs prompts with a token-level F1 of 78 and recovers 27% of prompts exactly!
So, How can we use this? You cannot hide your novel prompts forever. Someday, attackers will find your assets with crazy method like this.
π Read Full Paper, Explore Github Repo
8. Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs
π Author(s): Yunxin Li, et al. from Harbin Institute of Technology, Shenzhen
π Publication Date: Nov 27, 2023
β¨ Key Insights:
Whatβs New? They proposed an approach called MKS2, aimed at enhancing LLMs through empowering Multimodal Knowledge Storage and Sharing in LLMs.
Behind the New. They used the Modular Visual Memory, a component integrated into the internal blocks of LLMs, designed to store open-world visual information efficiently. MKS2-LLama-2-13b achieves SOTA zero-shot performance on 7 seven natural language reasoning tasks!
So, How can we use this? Use MSK2 for improving LLMs with visual knowledge!
π Read Full Paper
Stay curious, and until next week!