The Allen Institute for AI (AI2) recently released the TÜLU 3 models, which rank among the best open LLMs. True to AI2's commitment to open science, they have shared everything: the models, datasets, training recipes, evaluation frameworks, and a paper:
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
AI2 has taken Llama 3.1 (8B and 70B) to the next level with post-training that combines supervised fine-tuning, Direct Preference Optimization (DPO), and a new reinforcement learning technique. Together, these stages form the TÜLU 3 recipe.
To build great models, you need great datasets. AI2 put a ton of effort into finding, creating, and cleaning the datasets used for Llama 3.1’s post-training. Instead of focusing on quantity, they prioritized diversity and quality, which makes a significant difference in how well the models perform.
In this article, we’ll dig into the datasets AI2 put together for each step of TÜLU 3’s post-training. I’ll walk you through how they made these datasets and explain how you can use them to fine-tune your own models.
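If you want a quick look at the data before we get into the details, here is a minimal sketch that loads AI2's SFT mixture from the Hugging Face Hub with the `datasets` library. I'm assuming the Hub id `allenai/tulu-3-sft-mixture` and the usual chat-style `messages` schema; check AI2's TÜLU 3 collection on the Hub if either has changed.

```python
# Minimal sketch: peek at AI2's TÜLU 3 SFT mixture.
# Assumptions: the Hub id "allenai/tulu-3-sft-mixture" and a
# chat-style "messages" column of {"role", "content"} dicts.
from datasets import load_dataset

sft_mix = load_dataset("allenai/tulu-3-sft-mixture", split="train")
print(f"{len(sft_mix):,} examples")

# Print the first conversation, truncating long turns for readability.
for message in sft_mix[0]["messages"]:
    print(f'{message["role"]}: {message["content"][:80]}')
```

The same pattern should work for the preference datasets as well; only the Hub id and the column names change.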
This article is all about the datasets AI2 has shared. If you’re curious about their training methods, including the new reinforcement learning technique they introduced with TÜLU 3, we’ll cover those in a follow-up article. Stay tuned!