Thank you for being here. Let's take a deep breath and dive into the best GenAI papers of this week!
1. Chameleon: Mixed-Modal Early-Fusion Foundation Models
Author(s): Chameleon Team from FAIR at Meta
Publication Date: May 16, 2024
Key Insights:
What's New? They presented Chameleon, a family of early-fusion, token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence.
Behind the New. Chameleon outperformed Llama-2 in text-only tasks. It also matched or exceeded the performance of much larger models, including Gemini Pro and GPT-4V, according to human judgments on a new long-form mixed-modal generation evaluation.
So, How can we use this? Why use modality-specific encoders or decoders? Just use a single model for all modalities!
Read Full Paper
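The core idea behind early fusion is that image and text tokens live in one shared vocabulary, so a single transformer can consume any interleaving of the two. Below is a minimal sketch of that tokenization scheme; the tokenizers and vocabulary sizes are hypothetical stand-ins, not Meta's actual ones.

```python
# Sketch of early-fusion mixed-modal tokenization (Chameleon-style):
# images are quantized into discrete codes drawn from the same flat
# vocabulary as text, so one model sees a single token sequence.

TEXT_VOCAB_SIZE = 50_000      # assumed text vocabulary size
IMAGE_CODEBOOK_SIZE = 8_192   # assumed VQ-style image codebook size

def tokenize_text(text: str) -> list[int]:
    # Stand-in tokenizer: map each character to an id in the text range.
    return [ord(c) % TEXT_VOCAB_SIZE for c in text]

def tokenize_image(pixels: list[int]) -> list[int]:
    # Stand-in quantizer: map "pixels" to codebook ids, offset past the
    # text ids so both modalities share one vocabulary.
    return [TEXT_VOCAB_SIZE + (p % IMAGE_CODEBOOK_SIZE) for p in pixels]

def build_sequence(segments: list[tuple[str, object]]) -> list[int]:
    # Interleave modalities in any arbitrary order into one token stream.
    tokens: list[int] = []
    for modality, payload in segments:
        if modality == "text":
            tokens += tokenize_text(payload)
        elif modality == "image":
            tokens += tokenize_image(payload)
    return tokens

seq = build_sequence([("text", "a cat"), ("image", [17, 300, 9001]), ("text", "!")])
print(len(seq), seq)
```

Because the output is a single flat id sequence, the same next-token objective covers text generation, image generation, and any mix of the two.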
2. A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models
Author(s): Yujuan Ding, et al. from The Hong Kong Polytechnic University
Publication Date: May 10, 2024
Key Insights:
What's New? They reviewed existing research studies on retrieval-augmented large language models (RA-LLMs), covering three primary technical perspectives: architectures, training strategies, and applications.
Behind the New. Their survey differs from other RAG surveys in concentrating on technical perspectives and systematically reviewing models according to architecture and training paradigm in RA-LLMs, as well as application tasks.
So, How can we use this? Alleviate hallucinations and out-of-date internal knowledge by leveraging retrieval!
Read Full Paper
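The basic retrieve-then-read pattern the survey covers is easy to sketch: fetch relevant passages, prepend them to the prompt, and let the LLM ground its answer in the retrieved text. The retriever below is a toy keyword scorer standing in for a real dense-embedding index, and the corpus and prompt template are made up for illustration.

```python
import re

def toks(s: str) -> set[str]:
    # Lowercased word tokens, punctuation stripped.
    return set(re.findall(r"\w+", s.lower()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Toy lexical retriever: rank passages by query-term overlap.
    q = toks(query)
    return sorted(corpus, key=lambda p: len(q & toks(p)), reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    # Ground the LLM in retrieved context to curb hallucination.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\nQuestion: {query}"
    )

corpus = [
    "Chameleon is an early-fusion mixed-modal model from Meta.",
    "PARDEN defends against jailbreaks by asking the model to repeat itself.",
    "MS MARCO Web Search contains millions of real click labels.",
]
query = "How does PARDEN detect jailbreaks?"
hits = retrieve(query, corpus)
prompt = build_prompt(query, hits)
print(prompt)
```

In a real RA-LLM, the sorted-overlap line is where architecture and training choices from the survey come in: dense retrievers, rerankers, and jointly trained retriever-generator stacks all slot into that step.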
3. PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition
Author(s): Ziyang Zhang, et al. from the University of Oxford
Publication Date: May 14, 2024
Key Insights:
What's New? They proposed PARDEN, which avoids the domain shift by simply asking the model to repeat its own outputs. They empirically verified that PARDEN outperformed existing jailbreak detection baselines for Llama-2 and Claude-2.
Behind the New. They generate y' with the PARDEN prefix and suffix and use the BLEU score between y and y' to determine whether the model is attempting to repeat the output or refusing to do so.
So, How can we use this? PARDEN is an affordable way to detect malicious outputs. Use this simple guardrail for your LLMs!
Read Full Paper, Explore GitHub Repo
4. MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
Author(s): Qi Chen, et al. from Microsoft
Publication Date: May 13, 2024
Key Insights:
What's New? They introduced MS MARCO Web Search, an information-rich web dataset with 10B documents and 10M queries, featuring millions of real clicked query-document labels that mimic real-world web document and query distributions.
Behind the New. With this dataset, the authors hope to encourage research in areas such as generic end-to-end neural indexer models, generic embedding models, and next-generation information access systems with large language models.
So, How can we use this? This is a gift for anyone looking for data to integrate LLMs with web search!
Read Full Paper
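Click labels like these lend themselves directly to retrieval evaluation: treat the clicked document as the positive for its query and score a retriever with Mean Reciprocal Rank (MRR). The sketch below shows the metric on toy data; the data layout is hypothetical, not the dataset's actual schema.

```python
# Mean Reciprocal Rank over click labels: for each query, score the
# retriever by 1 / rank of the clicked document (0 if not retrieved),
# then average over queries.

def mrr(rankings: list[list[str]], clicked: list[str]) -> float:
    # rankings[i]: ranked doc-id list a retriever returned for query i.
    # clicked[i]: the doc id the user actually clicked for query i.
    total = 0.0
    for ranked, positive in zip(rankings, clicked):
        if positive in ranked:
            total += 1.0 / (ranked.index(positive) + 1)
    return total / len(rankings)

rankings = [["d3", "d1", "d7"], ["d2", "d9", "d4"], ["d5", "d6", "d8"]]
clicked = ["d1", "d2", "d0"]  # third clicked doc was never retrieved
score = mrr(rankings, clicked)
print(score)  # (1/2 + 1/1 + 0) / 3 = 0.5
```

At the dataset's scale (10M queries), the same loop would stream labels rather than hold them in memory, but the metric is unchanged.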
5. LogoMotion: Visually Grounded Code Generation for Content-Aware Animation
Author(s): Vivian Liu, et al. from Adobe Research
Publication Date: May 11, 2024
Key Insights:
What's New? They introduced LogoMotion, an LLM-based system that takes in a layered document and generates animated logos through visually-grounded program synthesis.
Behind the New. The introduced techniques include creating an HTML representation of a canvas, identifying primary and secondary elements, synthesizing animation code, and visually debugging animation errors. Compared to industry-standard tools, LogoMotion is said to produce animations that are more context-aware and of higher quality.
So, How can we use this? One more feature that can act as a copilot for designers!
Read Full Paper
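The pipeline shape is worth sketching: render the layered document as HTML the LLM can reason over, classify elements as primary or secondary, then synthesize animation code per element. Everything below is an illustrative stand-in; the real system uses an LLM for both classification and code synthesis, plus a visual-debugging loop this sketch omits.

```python
# Rough sketch of a LogoMotion-style pipeline: layered document -> HTML
# representation -> element roles -> per-element animation code.

def to_html(layers: list[dict]) -> str:
    # Each layer becomes a positioned <div> an LLM could reason over.
    divs = "".join(
        f'<div id="{layer["id"]}" style="left:{layer["x"]}px;top:{layer["y"]}px">'
        f'{layer["label"]}</div>'
        for layer in layers
    )
    return f"<body>{divs}</body>"

def classify(layers: list[dict]) -> dict:
    # Toy heuristic in place of the LLM: largest layer is "primary".
    primary = max(layers, key=lambda layer: layer["area"])["id"]
    return {l["id"]: ("primary" if l["id"] == primary else "secondary") for l in layers}

def synthesize_animation(layer_id: str, role: str) -> str:
    # Stand-in for LLM code generation: primary elements get an entrance
    # animation, secondary ones a subtle fade.
    effect = "slide-in 1s ease-out" if role == "primary" else "fade-in 0.5s"
    return f"#{layer_id} {{ animation: {effect}; }}"

layers = [
    {"id": "logo", "label": "ACME", "x": 40, "y": 20, "area": 900},
    {"id": "tagline", "label": "est. 2024", "x": 40, "y": 80, "area": 200},
]
html = to_html(layers)
roles = classify(layers)
css = "\n".join(synthesize_animation(i, r) for i, r in roles.items())
print(html)
print(css)
```

The visual-debugging step the paper describes would close the loop here: render the animation, show the result back to the LLM, and regenerate the code for any element that moved incorrectly.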
Stay curious, and until next week!