The Salt - Curated AI
Mixture-of-Experts: Mixture-of-Head Attention and Embedding Model
The Weekly Salt #40
13 hrs ago • Benjamin Marie
Cancelling Attention Noise with Differential Transformer
The Weekly Salt #39
Oct 15 • Benjamin Marie
Evaluating AdEMAMix: A New Optimizer for Faster, More Efficient LLM Training
But with hyperparameter values not easy to find!
Oct 9 • Benjamin Marie
Cross Capabilities of LLMs and Contextual Document Embeddings
The Weekly Salt #38
Oct 8 • Benjamin Marie
LLMs Can Follow Instructions Without Instruction Tuning
The Weekly Salt #37
Oct 1 • Benjamin Marie
September 2024
Qwen2-VL: How Does It Work?
One of the best VLMs for image captioning, visual question answering, optical character recognition (OCR), and multimodal chat.
Sep 25 • Benjamin Marie
SCoRe: Teach LLMs to Self-Correct
The Weekly Salt #36
Sep 24 • Benjamin Marie
New Advances in Linear-time Sequence Modeling
The Weekly Salt #35
Sep 17 • Benjamin Marie
Efficient Long Context Generalization with LongRecipe
The Weekly Salt #34
Sep 10 • Benjamin Marie
Q-GaLore: Pre-Train 7B Parameter LLMs from Scratch on a 16 GB GPU
Start now, get your model in 50 years!
Sep 4 • Benjamin Marie
Enhanced SSM Training Through Initialization with a Pre-trained Transformer
The Weekly Salt #33
Sep 3 • Benjamin Marie
August 2024
Add Code to Your Training Data for Better LLMs
But not too much!
Aug 28 • Benjamin Marie