The Salt - Curated AI
Qwen3 Technical Report: Reasoning in Pre-Training and Post-Training

Plus a Brief Look at the Limitations of the Multilingual Evaluation

Benjamin Marie
May 16, 2025
Image generated with ChatGPT

Qwen3 was released last month, and I can confirm that it is just as easy to use as Qwen2.5, while offering better performance on many tasks.

How Well Does Qwen3 Handle 4-bit and 2-bit Quantization? (Benjamin Marie, May 1)
Fine-Tuning Qwen3: Base vs. Reasoning Models (Benjamin Marie, May 8)

One recurring complaint I have about the model is its verbosity. It often produces unnecessarily long responses, even when reasoning is turned off. The Qwen3 technical report, released this week, helps explain why: the model is pre-trained to reason.
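
For readers who want to reproduce this, here is a minimal sketch of how reasoning is toggled off at inference time via the chat template, assuming the Hugging Face Transformers API and the Qwen/Qwen3-8B checkpoint (the model name, prompt, and generation settings are illustrative; the enable_thinking flag follows the Qwen3 model cards):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint; any Qwen3 instruct model should behave the same way.
model_name = "Qwen/Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Summarize the Qwen3 technical report in two sentences."}
]

# enable_thinking=False tells the chat template to skip the <think> reasoning block.
# Even with reasoning off, responses can remain quite verbose, as noted above.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```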


In this article, I review the technical report and highlight the main design choices behind Qwen3. Architecturally, the models are quite similar to Qwen2.5. The key differences lie in the multi-stage pre-training and post-training pipelines. I’ll also dedicate the final section to some critical thoughts on the multilingual evaluation.

This post is for paid subscribers
