I think the last time I reviewed a hybrid architecture was with Jamba, and that was over a year ago!
Now, with the release of NVIDIA’s new Nemotron-H models, which show strong performance in terms of accuracy, inference speed, and memory efficiency, it's a great opportunity to revisit the evolving landscape of hybrid LLMs.
To be clear, I don’t believe hybrid models will surpass standard Transformer architectures in quality or popularity. Transformers continue to become more efficient and widely adopted, while hybrid models have been around for some time without achieving mainstream traction.
However, hybrid models remain a valuable area of research, as they often reveal behaviors and insights that are both intriguing and useful.
In this article, we’ll take a closer look at the Nemotron-H models, exploring what’s new, how NVIDIA trained them, and what you need to fine-tune them. The good news? They’re easy to try out, which is rarely the case with non-standard LLMs.
Since the Nemotron-H models are base models, i.e., they have only been pre-trained on a large dataset, they must be fine-tuned to be useful. In this notebook, I provide the fine-tuning code (full fine-tuning and LoRA) based on TRL and Transformers (a minimal sketch of the LoRA setup follows below):
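To give a concrete idea of the LoRA part, here is a minimal sketch using TRL's SFTTrainer together with a PEFT LoraConfig. It is not the notebook's exact code: the model repository name, the example dataset, the `target_modules` value, and the hyperparameters are assumptions for illustration, so check the Nemotron-H model card and your TRL version before running it.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Assumed model repository; verify the exact name on the NVIDIA model card.
model_id = "nvidia/Nemotron-H-8B-Base-8K"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # hybrid (Mamba/attention) layers may need custom code
)

# Small instruction dataset with a "text" column, used here only as an example.
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",  # assumption: adapt to Nemotron-H's module names
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="nemotron-h-lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    num_train_epochs=1,
    logging_steps=10,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions use tokenizer= instead
    peft_config=peft_config,
)
trainer.train()
```

Dropping `peft_config` from the trainer call turns this into full fine-tuning, at a much higher memory cost; the notebook covers both variants in detail.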