Archive - The Salt - Curated AI

Improving Generalization of MoEs with Routing Manifold Alignment

The Weekly Salt #95

Nov 12

INT vs FP Data Types for Quantization

The Weekly Salt #94

Nov 5

October 2025

Reasoning with Random Resampling to Match RL's Accuracy

The Weekly Salt #93

Oct 29

A Recipe to Convert Existing Models Into Fine-Tuned BitNet

The Weekly Salt #92

Oct 22

When Quantization Improves Reinforcement Learning

The Weekly Salt #91

Oct 15

Could GRPO Be an "Off-Policy" Algorithm?

The Weekly Salt #90

Oct 9

Pre-Training Updates: NVFP4 and Thinking Augmented

The Weekly Salt #89

Oct 1

September 2025

How Poor SFT Data Overwrites Learned Knowledge

The Weekly Salt #87

Sep 24

Jet-Nemotron: Searching for the Best Attention Architecture

DeltaNet + Hardware-aware Search

Sep 23

MMBERT as a Drop-in Successor to XLM-R

The Weekly Salt #86

Sep 17

LLMs Hallucinate and That's a Benchmarking Problem

The Weekly Salt #85

Sep 10

What Breaks When You Quantize for Translation? A Deep Dive Across 55 Languages

Evaluating LLM translation under quantization with COMET, BLEU, GGUF models, and more

Sep 8
