Thank you for being here! This is the eleventh issue of this newsletter. Let's take a deep breath and dive into the best LLM papers of this week!
1. BitNet: Scaling 1-bit Transformers for Large Language Models
Author(s): Hongyu Wang, et al. from Microsoft Research
Publication Date: Oct 17, 2023
Key Insights:
What's New? They introduce BitNet, a scalable and stable 1-bit Transformer architecture for large language models.
Behind the New. BitNet uses low-precision binary weights and quantized activations, while keeping optimizer states and gradients in high precision during training. It is said to show competitive performance with 8-bit quantization methods and FP16 Transformer baselines while dramatically reducing memory and energy consumption.
So, How can we use this? The implementation of the BitNet architecture is quite simple, requiring only the replacement of linear projections (i.e., nn.Linear in PyTorch) in the Transformer.
Read Full Paper
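The drop-in replacement idea can be pictured with a toy forward pass. Below is a minimal NumPy sketch, not the authors' implementation: the centering/sign binarization and absmax activation quantization follow the paper's description, but `bitlinear_forward` and its helpers are hypothetical names.

```python
import numpy as np

def binarize_weights(W):
    # BitNet-style binarization: center around the mean, take the sign,
    # and keep a scalar beta = mean|W| to rescale the output.
    alpha = W.mean()
    Wb = np.sign(W - alpha)
    beta = np.abs(W).mean()
    return Wb, beta

def absmax_quantize(x, bits=8):
    # Quantize activations to b-bit integers with absmax scaling.
    Q = 2 ** (bits - 1) - 1
    scale = Q / max(np.abs(x).max(), 1e-8)
    xq = np.clip(np.round(x * scale), -Q, Q)
    return xq, scale

def bitlinear_forward(x, W):
    # A sketch of a BitLinear layer standing in for nn.Linear:
    # 1-bit weights, 8-bit activations, dequantized output.
    Wb, beta = binarize_weights(W)
    xq, scale = absmax_quantize(x)
    return (xq @ Wb.T) * beta / scale

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
x = rng.normal(size=(8,))
y = bitlinear_forward(x, W)   # approximate the full-precision x @ W.T
```

In the real architecture this swap happens inside every Transformer block, with training tricks (e.g. straight-through gradients) that this sketch omits.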
2. Functional Invariants to Watermark Large Transformers
Author(s): Pierre Fernandez, et al. from FAIR, Meta
Publication Date: Oct 17, 2023
Key Insights:
What's New? They introduce a watermarking method for LLMs with no extra computational cost, applicable in the non-blind white-box setting.
Behind the New. They leverage the inherent invariance of transformers under operations such as dimension permutation, matrix multiplication, and scaling/un-scaling to watermark models without affecting the model's output.
So, How can we use this? Try claiming a model as your own by watermarking its transformer weights!
Read Full Paper
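The invariance being exploited is easy to verify on a toy two-layer network: permuting the hidden dimension of one layer and applying the matching permutation to the next leaves every output unchanged, yet the secret permutation survives in the weights. A minimal sketch (toy model, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(42)
d_in, d_hidden, d_out = 5, 7, 3
W1 = rng.normal(size=(d_hidden, d_in))   # first linear layer
W2 = rng.normal(size=(d_out, d_hidden))  # second linear layer

def mlp(x, W1, W2):
    # Toy two-layer network: Linear -> ReLU -> Linear
    return W2 @ np.maximum(W1 @ x, 0.0)

# A secret permutation of the hidden dimension acts as the watermark:
perm = rng.permutation(d_hidden)
W1_marked = W1[perm, :]   # permute the hidden rows of the first layer
W2_marked = W2[:, perm]   # matching column permutation on the second layer

x = rng.normal(size=(d_in,))
original = mlp(x, W1, W2)
marked = mlp(x, W1_marked, W2_marked)   # identical function, marked weights
```

Because ReLU acts element-wise, permuting the hidden units commutes with the nonlinearity, so `original` and `marked` agree exactly; the same reasoning extends to transformer blocks.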
3. OpenAgents: An Open Platform for Language Agents in the Wild
Author(s): Tianbao Xie, et al. from The University of Hong Kong
Publication Date: Oct 16, 2023
Key Insights:
What's New? They built OpenAgents, an open platform for using and hosting language agents in the wild of everyday life.
Behind the New. They implement three agents: (1) a Data Agent for data analysis with Python/SQL and data tools, (2) a Plugins Agent with 200+ daily API tools, and (3) a Web Agent for autonomous web browsing.
So, How can we use this? If you are in need of language agents without the time to create your own, this framework is for you!
Read Full Paper, Explore GitHub Repo, Try Demo
4. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Author(s): Akari Asai, et al. from University of Washington
Publication Date: Oct 17, 2023
Key Insights:
What's New? They present the Self-Reflective Retrieval-Augmented Generation (SELF-RAG) framework, which enhances an LM's quality and factuality through retrieval and self-reflection.
Behind the New. They train a single LM that adaptively retrieves passages on demand, then generates and reflects on the retrieved passages and its own generations using special reflection tokens. These reflection tokens make the LM controllable during inference, enabling it to tailor its behavior to diverse task requirements.
So, How can we use this? Try this method to improve the factuality and citation accuracy of large models.
Read Full Paper, Explore GitHub Repo, Download Model
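The adaptive retrieve-then-critique control flow can be sketched in a few lines. This is a toy illustration with stand-in callables, not the trained model or its actual reflection-token vocabulary:

```python
def self_rag_answer(query, lm_needs_retrieval, retrieve, generate, critique):
    # Sketch of SELF-RAG's loop: the LM first decides (via a retrieval
    # reflection token) whether to retrieve; if so, each candidate
    # generation is scored with critique tokens and the best one wins.
    if lm_needs_retrieval(query):
        passages = retrieve(query)
        scored = [(critique(query, p), generate(query, p)) for p in passages]
        return max(scored)[1]          # keep the best-supported answer
    return generate(query, None)       # no retrieval needed

# Toy stand-ins illustrating the control flow only:
answer = self_rag_answer(
    "capital of France?",
    lm_needs_retrieval=lambda q: "?" in q,
    retrieve=lambda q: ["Paris is the capital of France.",
                        "France is in Europe."],
    generate=lambda q, p: (p or "I don't know"),
    critique=lambda q, p: sum(w in p for w in q.rstrip("?").split()),
)
```

In the real system every callable here is the same single LM, and the critique scores come from dedicated reflection tokens rather than word overlap.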
5. Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
Author(s): Yixiao Zhang, et al. from Queen Mary University of London
Publication Date: Oct 19, 2023
Key Insights:
Whatβs New? They introduce Loop Copilot, a novel system that integrates LLMs with specialized AI music models. This enables a conversational interface for collaborative human-AI creation of music loops.
Behind the New. They hold several task-specific AI models in the backend and use an LLM to interpret user intent before selecting the appropriate specialized model for task execution. To ensure musical coherence, essential attributes are maintained in a centralized table.
So, How can we use this? They are said to be working on the code - it may not be long before we can create music through conversation with AI.
Read Full Paper, Explore GitHub Repo
6. VeRA: Vector-based Random Matrix Adaptation
Author(s): Dawid J. Kopiczko, et al. from QUVA Lab
Publication Date: Oct 17, 2023
Key Insights:
What's New? They present Vector-based Random Matrix Adaptation (VeRA), which reduces the number of trainable parameters by 10x compared to LoRA, yet maintains the same performance.
Behind the New. VeRA uses only a single pair of low-rank matrices shared across all layers and learns small scaling vectors instead.
So, How can we use this? You can try PEFT with far fewer trainable parameters! Consider adopting VeRA if you don't have much memory for training.
Read Full Paper
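The parameter savings are easy to see in a sketch: the two random low-rank matrices are frozen and shared by every adapted layer, so each layer trains only two small vectors. Variable names (and the non-zero init of one vector) are illustrative, not the official implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4

# Frozen random low-rank matrices, shared by ALL adapted layers:
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))

def vera_update(x, lam_d, lam_b):
    # Per-layer trainable parameters are only the two scaling vectors,
    # applied element-wise around the shared frozen projections.
    return lam_b * (B @ (lam_d * (A @ x)))

lam_d = np.zeros(r)               # zero init => the update starts at zero
lam_b = rng.normal(size=(d_out,))
x = rng.normal(size=(d_in,))

per_layer_vera = r + d_out            # 68 trainable params per layer
per_layer_lora = r * (d_in + d_out)   # 512 for LoRA at the same rank
```

At this (toy) size the per-layer trainable count drops from 512 to 68, and the gap grows with the hidden dimension since only the vectors scale with it.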
7. Attack Prompt Generation for Red Teaming and Defending Large Language Models
Author(s): Boyi Deng, et al. from University of Science and Technology of China
Publication Date: Oct 19, 2023
Key Insights:
What's New? They propose an attack framework to instruct LLMs to mimic human-generated prompts through in-context learning. Furthermore, they propose a defense framework that fine-tunes victim LLMs through iterative interactions with the attack framework to enhance their safety.
Behind the New. They constructed five SAP (Semi-automatic Attack Prompts) datasets of attack prompts with varying sizes for safety evaluation and enhancement.
So, How can we use this? Now we can make an LLM attack and defend itself automatically. We can also obtain novel prompts for prompt-injection evaluation!
Read Full Paper, Explore GitHub Repo
8. Table-GPT: Table-tuned GPT for Diverse Table Tasks
Author(s): Peng Li, et al. from Microsoft Corporation
Publication Date: Oct 13, 2023
Key Insights:
What's New? They propose a new "table-tuning" paradigm, in which language models like GPT-3.5 and ChatGPT are further trained/fine-tuned on diverse table tasks synthesized from real tables. Table-GPT demonstrates better table-understanding capabilities and strong generalizability.
Behind the New. Most LLMs are still sub-optimal in many table-related tasks, likely because they are pre-trained predominantly on one-dimensional natural-language texts, whereas relational tables are two-dimensional objects.
So, How can we use this? We can add a table modality to LLMs! Since data analysis relies heavily on tabular data, this modality should help LLMs with many tasks related to code generation and data analysis.
Read Full Paper
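The one-dimensional-text mismatch becomes concrete as soon as you serialize a table for a prompt. Here is a toy sketch of posing one table task from the paper (missing-value identification) to an LLM; the helper below is hypothetical and not the paper's synthesis pipeline:

```python
def table_to_markdown(headers, rows):
    # Flatten a two-dimensional table into the one-dimensional text an
    # LLM consumes -- the mismatch that table-tuning aims to bridge.
    lines = ["| " + " | ".join(headers) + " |",
             "|" + "---|" * len(headers)]
    lines += ["| " + " | ".join(str(c) for c in row) + " |" for row in rows]
    return "\n".join(lines)

# A missing-value-identification prompt built from a real (toy) table:
prompt = (
    "Identify the row with a missing value in this table:\n"
    + table_to_markdown(["city", "country"],
                        [["Paris", "France"], ["Tokyo", ""]])
)
```

The paper synthesizes many such (task, table, answer) triples automatically, so the model sees tables in row order, column order, and with transposed or permuted layouts.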
9. Large Language Model Unlearning
Author(s): Yuanshun Yao, et al. from ByteDance Research
Publication Date: Oct 14, 2023
Key Insights:
What's New? Their work is among the first to explore LLM unlearning! They also formulate the settings, goals, and evaluations of LLM unlearning.
Behind the New. We can benefit from unlearning by (1) removing harmful responses, (2) erasing copyright-protected content, and (3) eliminating hallucinations.
So, How can we use this? Along with the rise of hard-to-control LLMs, there is a growing interest in machine unlearning. Anyone interested in machine unlearning should find this paper an accessible introduction.
Read Full Paper
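Gradient ascent on the data to forget is one of the basic recipes discussed in this line of work. Here is a toy sketch on a logistic model; the two-term update and all names are illustrative simplifications, not the paper's LLM setup:

```python
import numpy as np

def unlearning_step(w, x_forget, y_forget, x_retain, y_retain, lr=0.1):
    # One combined step: gradient ASCENT on the forget example (push the
    # model away from it) plus ordinary descent on a retained example
    # (preserve utility on everything else).
    def grad(w, x, y):
        p = 1.0 / (1.0 + np.exp(-x @ w))   # predicted probability
        return (p - y) * x                 # cross-entropy gradient
    w = w + lr * grad(w, x_forget, y_forget)   # ascend: unlearn
    w = w - lr * grad(w, x_retain, y_retain)   # descend: retain
    return w

w = unlearning_step(np.zeros(3),
                    np.array([1.0, 0.0, 0.0]), 1,   # example to forget
                    np.array([0.0, 1.0, 0.0]), 1)   # example to keep
```

After the step, the model's confidence on the forgotten example drops below its starting point while confidence on the retained example rises; the LLM version applies the same idea to token-level losses on forget/retain corpora.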
Stay curious, and until next week!