The Salt - Curated AI
deep dive
Jet-Nemotron: Searching for the Best Attention Architecture
DeltaNet + Hardware-aware Search
Sep 23 • Benjamin Marie

Magistral: Advancing Reasoning with Efficient GRPO Training
No More KL Penalty, No Need for a Reference Model
Jun 12 • Benjamin Marie

Qwen3 Technical Report: Reasoning in Pre-Training and Post-Training
Plus a Brief Look at the Limitations of the Multilingual Evaluation
May 16 • Benjamin Marie

Qwen2.5-VL: High-Resolution Vision Encoding with Efficient Windowed Attention
Also impressive in language generation tasks!
Mar 6 • Benjamin Marie

TÜLU 3: The Post-Training Recipe
SFT + DPO + RLVR
Dec 19, 2024 • Benjamin Marie

TÜLU 3's High-Quality Synthetic Datasets for Post-Training LLMs
Made by GPT-4o
Dec 5, 2024 • Benjamin Marie

Go Zero-Shot for Cheaper LLM Evaluations
Unless you use a generative benchmark
Nov 6, 2024 • Benjamin Marie

Evaluating AdEMAMix: A New Optimizer for Faster, More Efficient LLM Training
But with hyperparameter values not easy to find!
Oct 9, 2024 • Benjamin Marie

Qwen2-VL: How Does It Work?
One of the best VLMs for image captioning, visual question answering, optical character recognition (OCR), and multimodal chat.
Sep 25, 2024 • Benjamin Marie

Q-GaLore: Pre-Train 7B Parameter LLMs from Scratch on a 16 GB GPU
Start now, get your model in 50 years!
Sep 4, 2024 • Benjamin Marie

Add Code to Your Training Data for Better LLMs
But not too much!
Aug 28, 2024 • Benjamin Marie

How Generative LLMs Achieve Top MMLU Scores without Generating Anything
what you think MMLU evaluates ≠ what MMLU really evaluates
Aug 7, 2024 • Benjamin Marie