The Salt - Curated AI

Nemotron-H: The Mamba/Transformer Models by NVIDIA

How can this work?

Benjamin Marie
May 02, 2025

I think the last time I reviewed a hybrid architecture was with Jamba, and that was over a year ago!

Jamba: The New Hybrid Transformer/Mamba (Benjamin Marie, April 25, 2024)

NVIDIA’s new Nemotron-H models show strong accuracy, inference speed, and memory efficiency, so their release is a good opportunity to revisit the evolving landscape of hybrid LLMs.

To be clear, I don’t believe hybrid models will surpass standard Transformer architectures in quality or popularity. Transformers continue to become more efficient and widely adopted, while hybrid models have been around for some time without achieving mainstream traction.

However, hybrid models remain a valuable area of research. Hybrid approaches often reveal unique behaviors and insights that are both intriguing and useful.


In this article, we’ll take a closer look at the Nemotron-H models, exploring what’s new, how NVIDIA trained them, and what you need to fine-tune them. The good news? They’re easy to try out, which is rarely the case with non-standard LLMs.

The Nemotron-H models are only base models, i.e., they have only been pre-trained on a large dataset, so they must be fine-tuned to be useful. I provide fine-tuning code (full fine-tuning and LoRA) based on TRL and Transformers in this notebook:

Get the notebook (#16)
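
To give a rough sense of why the notebook covers both full fine-tuning and LoRA, here is a minimal sketch comparing the number of trainable parameters for a single weight matrix. It is not the notebook's code. The layer shapes and the rank are hypothetical values chosen for illustration, not Nemotron-H's actual dimensions, and the helper names `full_params` and `lora_params` are my own. The only fact it relies on is that a LoRA adapter of rank r adds two small matrices, A (r × d_in) and B (d_out × r), next to the frozen weight.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


# Hypothetical shapes for illustration only (not Nemotron-H's actual dimensions).
d_in, d_out, rank = 4096, 4096, 16

full = full_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (r={rank}): {lora:,} trainable parameters")
print(f"LoRA / full: {lora / full:.2%}")
```

With these assumed shapes, the adapter trains under 1% of the parameters of the full matrix, which is why LoRA is usually the cheaper option when GPU memory is limited.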

Why Can Hybrid Models Be Useful?
