The Salt - Curated AI

Qwen2.5-VL: High-Resolution Vision Encoding with Efficient Windowed Attention


Also impressive in language generation tasks!

Benjamin Marie
Mar 06, 2025

The Qwen2-VL models are among the most advanced vision-language models (VLMs) available, outperforming other open-source models on most benchmarks. The largest version, Qwen2-VL-72B, even competes with commercial models such as GPT-4o. We reviewed these models in a previous article:

Qwen2-VL: How Does It Work? (Benjamin Marie, September 25, 2024)

Building on this, the Qwen team has released Qwen2.5-VL, which uses the latest Qwen2.5 LLMs and has been trained on more complex tasks. The Qwen2.5-VL models are currently among the best open VLMs that you can run on your own computer. The Qwen team seems to have found a nearly optimal recipe, and the datasets to go with it, for training excellent VLMs.


In this article, we review this new version, focusing on the main improvements over the previous one, especially in its architecture and training pipeline.
