Thank you for being here. Let's take a deep breath and dive into the best GenAI papers of this week!
1. Chameleon: Mixed-Modal Early-Fusion Foundation Models
Author(s): Chameleon Team from FAIR at Meta
Publication Date: May 16, 2024
Key Insights:
What's New? They presented Chameleon, a family of early-fusion, token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence.
Behind the New. Chameleon outperformed Llama-2 in text-only tasks. It also matched or exceeded the performance of much larger models, including Gemini Pro and GPT-4V, according to human judgments on a new long-form mixed-modal generation evaluation.
So, How can we use this? Why use modality-specific encoders or decoders? Just use a single model for all modalities!
Read Full Paper
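The core idea behind early fusion is that image and text tokens live in one shared vocabulary, so a single transformer can consume any interleaving of the two. Below is a minimal sketch of that tokenization scheme; the tokenizers and vocabulary sizes are hypothetical stand-ins, not Meta's actual ones.

```python
# Sketch of early-fusion mixed-modal tokenization (Chameleon-style):
# images are quantized into discrete codes drawn from the same flat
# vocabulary as text, so one model sees a single token sequence.

TEXT_VOCAB_SIZE = 50_000      # assumed text vocabulary size
IMAGE_CODEBOOK_SIZE = 8_192   # assumed VQ-style image codebook size

def tokenize_text(text: str) -> list[int]:
    # Stand-in tokenizer: map each character to an id in the text range.
    return [ord(c) % TEXT_VOCAB_SIZE for c in text]

def tokenize_image(pixels: list[int]) -> list[int]:
    # Stand-in quantizer: map "pixels" to codebook ids, offset past the
    # text ids so both modalities share one vocabulary.
    return [TEXT_VOCAB_SIZE + (p % IMAGE_CODEBOOK_SIZE) for p in pixels]

def build_sequence(segments: list[tuple[str, object]]) -> list[int]:
    # Interleave modalities in any arbitrary order into one token stream.
    tokens: list[int] = []
    for modality, payload in segments:
        if modality == "text":
            tokens += tokenize_text(payload)
        elif modality == "image":
            tokens += tokenize_image(payload)
    return tokens

seq = build_sequence([("text", "a cat"), ("image", [17, 300, 9001]), ("text", "!")])
print(len(seq), seq)
```

Because the output is a single flat id sequence, the same next-token objective covers text generation, image generation, and any mix of the two.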
2. A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models
Author(s): Yujuan Ding, et al. from The Hong Kong Polytechnic University
Publication Date: May 10, 2024
Key Insights:
What's New? They reviewed existing research studies on retrieval-augmented large language models (RA-LLMs), covering three primary technical perspectives: architectures, training strategies, and applications.
Behind the New. Their survey differs from other RAG surveys in concentrating on technical perspectives and systematically reviewing models according to architecture and training paradigm in RA-LLMs, as well as application tasks.
So, How can we use this? Alleviate hallucinations and out-of-date internal knowledge by leveraging retrieval!
Read Full Paper
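The basic retrieve-then-read pattern the survey covers is easy to sketch: fetch relevant passages, prepend them to the prompt, and let the LLM ground its answer in the retrieved text. The retriever below is a toy keyword scorer standing in for a real dense-embedding index, and the corpus and prompt template are made up for illustration.

```python
import re

def toks(s: str) -> set[str]:
    # Lowercased word tokens, punctuation stripped.
    return set(re.findall(r"\w+", s.lower()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Toy lexical retriever: rank passages by query-term overlap.
    q = toks(query)
    return sorted(corpus, key=lambda p: len(q & toks(p)), reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    # Ground the LLM in retrieved context to curb hallucination.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\nQuestion: {query}"
    )

corpus = [
    "Chameleon is an early-fusion mixed-modal model from Meta.",
    "PARDEN defends against jailbreaks by asking the model to repeat itself.",
    "MS MARCO Web Search contains millions of real click labels.",
]
query = "How does PARDEN detect jailbreaks?"
hits = retrieve(query, corpus)
prompt = build_prompt(query, hits)
print(prompt)
```

In a real RA-LLM, the sorted-overlap line is where architecture and training choices from the survey come in: dense retrievers, rerankers, and jointly trained retriever-generator stacks all slot into that step.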
3. PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition
Author(s): Ziyang Zhang, et al. from the University of Oxford
Publication Date: May 14, 2024
Key Insights:
What's New? They proposed PARDEN, which avoids the domain shift by simply asking the model to repeat its own outputs. They empirically verified that PARDEN outperformed existing jailbreak detection baselines for Llama-2 and Claude-2.
Behind the New. They generate y' with the PARDEN prefix and suffix and use the BLEU score between y and y' to determine whether the model is attempting to repeat the output or refusing to do so.
So, How can we use this? PARDEN is an affordable way to detect malicious outputs. Use this simple guardrail for your LLMs!
Read Full Paper, Explore GitHub Repo
4. MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
Author(s): Qi Chen, et al. from Microsoft
Publication Date: May 13, 2024
Key Insights:
What's New? They introduced MS MARCO Web Search, an information-rich web dataset with 10B documents and 10M queries, featuring millions of real clicked query-document labels that mimic real-world web document and query distributions.
Behind the New. With this dataset, the authors hope to encourage research in areas such as generic end-to-end neural indexer models, generic embedding models, and next-generation information access systems with large language models.
So, How can we use this? This is a gift for anyone looking for data to integrate LLMs with web search!
Read Full Paper
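Click labels like these lend themselves directly to retrieval evaluation: treat the clicked document as the positive for its query and score a retriever with Mean Reciprocal Rank (MRR). The sketch below shows the metric on toy data; the data layout is hypothetical, not the dataset's actual schema.

```python
# Mean Reciprocal Rank over click labels: for each query, score the
# retriever by 1 / rank of the clicked document (0 if not retrieved),
# then average over queries.

def mrr(rankings: list[list[str]], clicked: list[str]) -> float:
    # rankings[i]: ranked doc-id list a retriever returned for query i.
    # clicked[i]: the doc id the user actually clicked for query i.
    total = 0.0
    for ranked, positive in zip(rankings, clicked):
        if positive in ranked:
            total += 1.0 / (ranked.index(positive) + 1)
    return total / len(rankings)

rankings = [["d3", "d1", "d7"], ["d2", "d9", "d4"], ["d5", "d6", "d8"]]
clicked = ["d1", "d2", "d0"]  # third clicked doc was never retrieved
score = mrr(rankings, clicked)
print(score)  # (1/2 + 1/1 + 0) / 3 = 0.5
```

At the dataset's scale (10M queries), the same loop would stream labels rather than hold them in memory, but the metric is unchanged.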
5. LogoMotion: Visually Grounded Code Generation for Content-Aware Animation
Author(s): Vivian Liu, et al. from Adobe Research
Publication Date: May 11, 2024
Key Insights:
What's New? They introduced LogoMotion, an LLM-based system that takes in a layered document and generates animated logos through visually-grounded program synthesis.
Behind the New. The introduced techniques include creating an HTML representation of a canvas, identifying primary and secondary elements, synthesizing animation code, and visually debugging animation errors. Compared to industry-standard tools, LogoMotion is said to produce animations that are more context-aware and of higher quality.
So, How can we use this? One more feature that can act as a copilot for designers!
Read Full Paper
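The pipeline shape is worth sketching: render the layered document as HTML the LLM can reason over, classify elements as primary or secondary, then synthesize animation code per element. Everything below is an illustrative stand-in; the real system uses an LLM for both classification and code synthesis, plus a visual-debugging loop this sketch omits.

```python
# Rough sketch of a LogoMotion-style pipeline: layered document -> HTML
# representation -> element roles -> per-element animation code.

def to_html(layers: list[dict]) -> str:
    # Each layer becomes a positioned <div> an LLM could reason over.
    divs = "".join(
        f'<div id="{layer["id"]}" style="left:{layer["x"]}px;top:{layer["y"]}px">'
        f'{layer["label"]}</div>'
        for layer in layers
    )
    return f"<body>{divs}</body>"

def classify(layers: list[dict]) -> dict:
    # Toy heuristic in place of the LLM: largest layer is "primary".
    primary = max(layers, key=lambda layer: layer["area"])["id"]
    return {l["id"]: ("primary" if l["id"] == primary else "secondary") for l in layers}

def synthesize_animation(layer_id: str, role: str) -> str:
    # Stand-in for LLM code generation: primary elements get an entrance
    # animation, secondary ones a subtle fade.
    effect = "slide-in 1s ease-out" if role == "primary" else "fade-in 0.5s"
    return f"#{layer_id} {{ animation: {effect}; }}"

layers = [
    {"id": "logo", "label": "ACME", "x": 40, "y": 20, "area": 900},
    {"id": "tagline", "label": "est. 2024", "x": 40, "y": 80, "area": 200},
]
html = to_html(layers)
roles = classify(layers)
css = "\n".join(synthesize_animation(i, r) for i, r in roles.items())
print(html)
print(css)
```

The visual-debugging step the paper describes would close the loop here: render the animation, show the result back to the LLM, and regenerate the code for any element that moved incorrectly.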
Stay curious, and until next week!