Thank you for being here. Let's take a deep breath and dive into the best GenAI papers of this week!
1. Better & Faster Large Language Models via Multi-token Prediction
Author(s): Fabian Gloeckle, et al. from FAIR at Meta
Publication Date: Apr 30, 2024
Key Insights:
What's New? They proposed a simple multi-token prediction architecture with no training-time or memory overhead and higher sample efficiency.
Behind the New. They asked the model to predict the following n tokens using n independent output heads operating on top of a shared model trunk. The proposed method is reported to become increasingly useful at larger model sizes (a minimal sketch of the idea follows below).
So, How can we use this? Is the era of next-token prediction coming to an end?
Read Full Paper
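To make the head structure concrete, here is a minimal PyTorch sketch of the idea, not the paper's implementation: all sizes (vocabulary, hidden width, number of prediction heads) are illustrative, and causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn

class MultiTokenPredictor(nn.Module):
    """Toy model: a shared Transformer trunk feeding n independent output heads."""
    def __init__(self, vocab_size=32000, d_model=512, n_layers=4, n_pred=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=n_layers)  # shared trunk (no causal mask in this toy)
        # One head per future offset: head i predicts the token at position t + i + 1.
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(n_pred))

    def forward(self, input_ids):
        hidden = self.trunk(self.embed(input_ids))       # (batch, seq, d_model)
        return [head(hidden) for head in self.heads]     # n_pred logit tensors

model = MultiTokenPredictor()
logits = model(torch.randint(0, 32000, (2, 16)))
print([l.shape for l in logits])   # 4 x (2, 16, 32000); train head i on targets shifted by i+1
```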
2. Extending Llama-3's Context Ten-Fold Overnight
Author(s): Peitian Zhang, et al. from Beijing Academy of Artificial Intelligence
Publication Date: Apr 30, 2024
Key Insights:
What's New? They extended the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The resulting model exhibited superior performance across evaluation tasks such as NIHS, topic retrieval, and long-context language understanding.
Behind the New. The proposed training cycle is highly efficient, taking only 8 hours on a single 8xA100 (80G) GPU machine.
So, How can we use this? Try this to get more context length out of a small LLM (a hedged QLoRA sketch follows below)!
Read Full Paper
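A rough sketch of what a QLoRA long-context fine-tune can look like with Hugging Face transformers, peft, and bitsandbytes; the RoPE base, context length, and LoRA hyperparameters below are illustrative placeholders, not the paper's exact recipe.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantized base model in the QLoRA style (requires bitsandbytes).
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb,
    rope_theta=2_000_000,            # illustrative larger RoPE base for longer context
    max_position_embeddings=81920,   # ~80K context window
)

# Low-rank adapters on the attention projections; rank/alpha/targets are placeholders.
lora = LoraConfig(r=32, lora_alpha=16,
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# ...then run a standard supervised fine-tuning loop on long-context data.
```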
3. PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Author(s): Lin Xu, et al. from National University of Singapore
Publication Date: Apr 29, 2024
Key Insights:
What's New? They proposed a new model termed Pooling LLaVA (PLLaVA), achieving new SOTA performance on modern video question-answering and captioning benchmark datasets.
Behind the New. The proposed method implements a simple but effective pooling strategy that smooths the feature distribution along the temporal dimension, reducing the dominant impact of extreme features (a minimal pooling sketch follows below).
So, How can we use this? Still manually feeding in frames for video understanding? Try this pooling approach for more efficient computation and better performance!
Read Full Paper, Explore Project Page
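To illustrate the core operation, here is a minimal sketch of pooling per-frame visual features along the temporal dimension before handing them to the language model; the tensor shapes and pooling output size are made up for the example and are not PLLaVA's exact settings.

```python
import torch
import torch.nn as nn

def pool_video_features(frame_feats, out_t=4, out_hw=12):
    """frame_feats: (batch, T, H, W, C) per-frame patch features from a vision encoder.
    Adaptive average pooling smooths the temporal dimension and shrinks the token count."""
    b, t, h, w, c = frame_feats.shape
    x = frame_feats.permute(0, 4, 1, 2, 3)                        # (B, C, T, H, W)
    x = nn.functional.adaptive_avg_pool3d(x, (out_t, out_hw, out_hw))
    x = x.permute(0, 2, 3, 4, 1).reshape(b, out_t * out_hw * out_hw, c)
    return x                                                       # (B, tokens, C) for the LLM

feats = torch.randn(1, 16, 24, 24, 1024)   # e.g. 16 frames of 24x24 patch features
print(pool_video_features(feats).shape)    # torch.Size([1, 576, 1024])
```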
4. Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Author(s): Yunhao Ge, et al. from NVIDIA
Publication Date: Apr 30, 2024
Key Insights:
What's New? They proposed VisualFactChecker (VFC), a flexible training-free pipeline that generates high-fidelity and detailed captions for both 2D images and 3D objects. Their evaluation showed that VFC outperformed SOTA open-source captioning methods for 2D images on the COCO dataset and for 3D assets on the Objaverse dataset.
Behind the New. VFC consists of three steps: 1) proposal, where image-to-text captioning models propose multiple initial captions; 2) verification, where a large language model (LLM) uses tools such as object detection and VQA models to fact-check the proposed captions; 3) captioning, where an LLM generates the final caption by summarizing the caption proposals and the fact-checking results (a simplified sketch of the flow follows below).
So, How can we use this? An additional fact-checking step could improve your captioning performance!
Read Full Paper
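A hedged, pseudocode-style sketch of the propose → verify → summarize flow; the captioner, detector, VQA, and LLM callables are hypothetical placeholders rather than the paper's actual components.

```python
def visual_fact_check_caption(image, captioners, llm, detector, vqa):
    """Simplified training-free propose -> verify -> summarize pipeline (sketch only)."""
    # 1) Proposal: several image-to-text models each draft an initial caption.
    proposals = [captioner(image) for captioner in captioners]

    # 2) Verification: the LLM writes checkable questions about the proposals,
    #    then answers them with object-detection and VQA tools.
    questions = llm(f"Write yes/no questions to verify these captions: {proposals}")
    evidence = {q: vqa(image, q) for q in questions.splitlines() if q.strip()}
    evidence["detected_objects"] = detector(image)

    # 3) Captioning: the LLM summarizes proposals plus verification into a final caption.
    return llm(
        "Write one detailed, faithful caption.\n"
        f"Proposals: {proposals}\nVerification results: {evidence}"
    )
```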
5. AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Author(s): Anselm Paulus, et al. from FAIR
Publication Date: Apr 21, 2024
Key Insights:
What's New? They presented a novel method that uses another LLM, called the AdvPrompter, to generate human-readable adversarial prompts in seconds.
Behind the New. They trained the AdvPrompter using a novel algorithm that does not require access to the gradients of the TargetLLM.
So, How can we use this? Fine-tune your LLM on a synthetic dataset generated by AdvPrompter; this can make it more robust against jailbreaking attacks while maintaining performance (a hedged sketch of this defensive setup follows below)!
Read Full Paper, Explore GitHub Repo
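A minimal sketch of that defensive recipe: use an AdvPrompter-style model to rewrite seed instructions into adversarial prompts, pair them with safe refusals, and fine-tune the target model on the resulting synthetic set. The adv_prompter, target_llm, and fine_tune callables are hypothetical placeholders, not the paper's released code.

```python
def build_robustness_dataset(seed_instructions, adv_prompter,
                             refusal="I can't help with that."):
    """Pair adversarially rewritten prompts with safe refusals (simplified sketch)."""
    dataset = []
    for instruction in seed_instructions:
        # An AdvPrompter-style model appends a human-readable adversarial suffix.
        adversarial_prompt = adv_prompter(instruction)
        dataset.append({"prompt": adversarial_prompt, "response": refusal})
    return dataset

# synthetic_data = build_robustness_dataset(seed_instructions, adv_prompter)
# fine_tune(target_llm, synthetic_data)   # standard SFT to harden against jailbreaks
```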
6. Octopus v4: Graph of language models
Author(s): Wei Chen, et al. from Nexa AI
Publication Date: Apr 30, 2024
Key Insights:
What's New? They introduced a novel approach that employs functional tokens to integrate multiple open-source models, each optimized for particular tasks. Their newly developed Octopus v4 model leveraged functional tokens to intelligently direct user queries to the most appropriate vertical model.
Behind the New. By activating models of around 10B parameters, they achieved a SOTA MMLU score of 74.8 among models of the same scale.
So, How can we use this? Try adopting functional tokens; better function calling makes your LLM more powerful (a toy routing sketch follows below)!
Read Full Paper, Explore GitHub Repo, Explore Hugging Face
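A toy sketch of functional-token routing, under the assumption that the router model emits a special token naming a vertical expert; the token names, router_llm, and experts mapping are all hypothetical placeholders.

```python
def route_query(query, router_llm, experts, default="<general>"):
    """Dispatch a query to the vertical model named by a functional token (sketch only)."""
    functional_token = router_llm(query)                       # e.g. "<math>" or "<law>"
    expert = experts.get(functional_token, experts[default])   # fall back to a generalist
    return expert(query)

# experts = {"<math>": math_model, "<law>": law_model, "<general>": general_model}
# answer = route_query("Integrate x^2 from 0 to 1.", router_llm, experts)
```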
Stay curious, and until next week!