The Salt - Curated AI
deep dive
Magistral: Advancing Reasoning with Efficient GRPO Training
No More KL Penalty, No Need for a Reference Model
Jun 12 • Benjamin Marie
Qwen3 Technical Report: Reasoning in Pre-Training and Post-Training
Plus a Brief Look at the Limitations of the Multilingual Evaluation
May 16 • Benjamin Marie
Qwen2.5-VL: High-Resolution Vision Encoding with Efficient Windowed Attention
Also impressive in language generation tasks!
Mar 6 • Benjamin Marie
TÜLU 3: The Post-Training Recipe
SFT + DPO + RLVR
Dec 19, 2024 • Benjamin Marie
TÜLU 3's High-Quality Synthetic Datasets for Post-Training LLMs
Made by GPT-4o
Dec 5, 2024 • Benjamin Marie
Go Zero-Shot for Cheaper LLM Evaluations
Unless you use a generative benchmark
Nov 6, 2024 • Benjamin Marie
Evaluating AdEMAMix: A New Optimizer for Faster, More Efficient LLM Training
But with hyperparameter values that are hard to find!
Oct 9, 2024 • Benjamin Marie
Qwen2-VL: How Does It Work?
One of the best VLMs for image captioning, visual question answering, optical character recognition (OCR), and multimodal chat.
Sep 25, 2024 • Benjamin Marie
Q-GaLore: Pre-Train 7B Parameter LLMs from Scratch on a 16 GB GPU
Start now, get your model in 50 years!
Sep 4, 2024 • Benjamin Marie
Add Code to Your Training Data for Better LLMs
But not too much!
Aug 28, 2024 • Benjamin Marie
How Generative LLMs Achieve Top MMLU Scores without Generating Anything
What you think MMLU evaluates ≠ what MMLU really evaluates
Aug 7, 2024 • Benjamin Marie
CriticGPT: How OpenAI Is Improving GPT-4 with GPT-4
And why GPT-4 is getting better at coding tasks
Jul 10, 2024 • Benjamin Marie