Thank you for being here at the beginning of this journey. This is the second issue of the newsletter. Let's dive into the best LLM papers of this week!
1. Self-Alignment with Instruction Backtranslation
Author(s): Xian Li, et al. from Meta AI
Publication Date: Aug 14, 2023
Key Insights:
Annotating high-quality instruction-following datasets is hard to scale.
They proposed instruction backtranslation: generating instruction prompts from unlabeled web documents (self-augmentation), then selecting high-quality examples (self-curation); a sketch of the loop follows this list.
They provide the detailed prompts they used for self-augmentation and self-curation.
On the Alpaca leaderboard, models finetuned with the authors' approach outperform all other non-distilled models, while using fewer human-annotated examples.
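Here is a minimal sketch of that two-step loop: self-augmentation treats each web document as a response and generates an instruction for it, and self-curation keeps only the pairs the model itself rates highly. `llm` is a placeholder for any chat-completion function, and the prompts are simplified stand-ins for the detailed ones given in the paper.

```python
def self_augment(llm, documents):
    """Self-augmentation: ask the model to guess the instruction each document answers."""
    pairs = []
    for doc in documents:
        instruction = llm(f"Write the instruction that this text answers:\n\n{doc}")
        pairs.append({"instruction": instruction, "response": doc})
    return pairs

def self_curate(llm, pairs, threshold=4):
    """Self-curation: have the model rate each candidate pair 1-5 and keep the best."""
    kept = []
    for pair in pairs:
        rating = llm(
            "Rate from 1 to 5 how well the response answers the instruction.\n"
            f"Instruction: {pair['instruction']}\nResponse: {pair['response']}"
        )
        if int(rating.strip()) >= threshold:
            kept.append(pair)
    return kept

# curated = self_curate(llm, self_augment(llm, web_documents))
```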
Read Full Paper
2. A Large Language Model Enhanced Conversational Recommender System
Author(s): Yue Feng, et al. from Kuaishou Technology
Publication Date: Aug 11, 2023
Key Insights:
They framed Conversational Recommender Systems (CRS) around four sub-tasks: user preference elicitation, recommendation, explanation, and item information search.
They proposed LLMCRS, a conversational recommender system in which an LLM manages the conversation and collaborates with expert models.
The workflow of LLMCRS has four stages: sub-task detection, model matching, sub-task execution, and response generation.
They used Reinforcement Learning from Performance Feedback (RLPF) to improve the CRS; the reward signal is a mixture of hit rate and BLEU (a toy version is sketched below).
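As a rough illustration of that reward, the sketch below mixes recommendation hit rate with BLEU on the generated response. The equal weighting, the smoothing choice, and the function name are my assumptions, not the paper's exact setup.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def rlpf_reward(recommended_items, ground_truth_item, response, reference_response,
                w_hit=0.5, w_bleu=0.5):
    """Combine hit rate (was the true item recommended?) with response quality (BLEU)."""
    hit = 1.0 if ground_truth_item in recommended_items else 0.0
    bleu = sentence_bleu(
        [reference_response.split()],
        response.split(),
        smoothing_function=SmoothingFunction().method1,
    )
    return w_hit * hit + w_bleu * bleu

reward = rlpf_reward(["item_42", "item_7"], "item_7",
                     "I recommend item 7, a light comedy.",
                     "You might enjoy item 7, a light comedy.")
```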
Read Full Paper
3. Platypus: Quick, Cheap, and Powerful Refinement of LLMs
Author(s): Ariel N. Lee, et al. from Boston University
Publication Date: Aug 14, 2023
Key Insights:
New winner on Hugging Face's Open LLM Leaderboard.
They released Open-Platypus, a public text dataset curated from other open datasets and focused on improving LLMs' STEM and logic knowledge.
They fine-tuned a 13B model with LoRA modules on a single A100 GPU in only 5 hours (a minimal LoRA setup is sketched below).
They performed an in-depth analysis to check for test-set leakage and contamination in the training data.
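For context on why LoRA keeps the training bill this small, here is a minimal setup using the Hugging Face `peft` library. The base-model name and the hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-13b-hf"  # assumed base checkpoint for a 13B model
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor for the updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # which projections receive adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```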
Read Full Paper, Explore GitHub Repo
4. OctoPack: Instruction Tuning Code Large Language Models
Author(s): Niklas Muennighoff, et al. from BigCode Project
Publication Date: Aug 14, 2023
Key Insights:
They compiled CommitPack, a 4 TB dataset of Git commits spanning 350 programming languages (see the sketch after this list for how a commit maps to an instruction example).
They introduced HumanEvalPack, a benchmark of 3 coding tasks (Code Repair, Code Explanation, Code Synthesis) across 6 languages (Python, JavaScript, Java, Go, C++, Rust).
They proposed OctoCoder, which achieves state-of-the-art performance among permissively licensed models on the HumanEvalPack benchmark.
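The core idea behind CommitPack is that a commit already looks like an instruction-tuning example: the commit message is the instruction, the pre-commit code is the input, and the post-commit code is the target. The field names below are my assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class CommitExample:
    instruction: str  # the commit message, e.g. "Handle empty input list"
    input_code: str   # file contents before the commit
    target_code: str  # file contents after the commit

def commit_to_example(message: str, old_code: str, new_code: str) -> CommitExample:
    """Map one (message, before, after) commit triple to an instruction-tuning example."""
    return CommitExample(instruction=message, input_code=old_code, target_code=new_code)

example = commit_to_example(
    "Handle empty input list",
    "def mean(xs):\n    return sum(xs) / len(xs)\n",
    "def mean(xs):\n    return sum(xs) / len(xs) if xs else 0.0\n",
)
print(example.instruction)
```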
Read Full Paper, Explore GitHub Repo
5. GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
Author(s): Youliang Yuan, et al. from Tencent AI Lab
Publication Date: Aug 12, 2023
Key Insights:
They proposed CipherChat, a framework for jailbreaking LLMs through cipher-encoded prompts (a toy cipher encoder is sketched below).
GPT-4 is capable enough to understand the cipher and to reply in the same encrypted form, which lets unsafe content slip past safety alignment tuned on natural language.
Moreover, they introduced SelfCipher, which differs from common ciphers (Morse, Caesar, etc.): the LLM is prompted to encrypt and decrypt with its own implicit cipher rather than an explicit encoding rule. SelfCipher induced harmful outputs more effectively than the common ciphers.
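To make the setup concrete, here is the kind of encoding a cipher prompt relies on, using a simple Caesar shift. This shows only the harmless encoding step; the actual CipherChat system prompts and demonstrations are in the paper and repo.

```python
def caesar_encode(text: str, shift: int = 3) -> str:
    """Shift each letter by `shift` positions, leaving other characters unchanged."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

def caesar_decode(text: str, shift: int = 3) -> str:
    return caesar_encode(text, -shift)

encoded = caesar_encode("How are you today?")
print(encoded)                 # "Krz duh brx wrgdb?"
print(caesar_decode(encoded))  # round-trips back to the original text
```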
Read Full Paper, Explore GitHub Repo
6. Backward Reasoning in Large Language Models for Verification
Author(s): Weisen Jiang, et al. from HKUST, Huawei Noah's Ark Lab
Publication Date: Aug 15, 2023
Key Insights:
They mask a number in the question with x and ask the LLM to predict the masked value when a candidate answer is provided, via the template: "If we know the answer of the above question is {a candidate answer}, what is the value of unknown variable x?" (a minimal version of this prompt construction is sketched after this list).
Their approach stands out because most other methods (CoT, Self-Consistency) reason only in a forward manner.
They combined forward and backward reasoning into FOBAR, which achieved state-of-the-art performance on various reasoning benchmarks.
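Below is a minimal sketch of the backward-verification prompt construction. The "replace the first number with x" masking and the example question are my simplifications; the paper's actual masking and the way forward and backward scores are combined in FOBAR are more involved.

```python
import re

BACKWARD_TEMPLATE = (
    "{masked_question}\n"
    "If we know the answer of the above question is {candidate_answer}, "
    "what is the value of unknown variable x?"
)

def mask_first_number(question: str) -> tuple[str, str]:
    """Replace the first number in the question with 'x'; return (masked question, true value)."""
    match = re.search(r"\d+", question)
    if match is None:
        raise ValueError("No number to mask in this question.")
    masked = question[: match.start()] + "x" + question[match.end():]
    return masked, match.group()

question = "Alice has 12 apples and gives 5 to Bob. How many apples does she have left?"
masked_question, true_value = mask_first_number(question)
prompt = BACKWARD_TEMPLATE.format(masked_question=masked_question, candidate_answer=7)
print(prompt)
# A candidate answer is kept only if the LLM's prediction for x matches true_value.
```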
Read Full Paper
Stay curious, and until next week!