I haven't taken a closer look at ReMoE yet, but if ReLU is used as the expert selection mechanism, it seems possible that every expert could receive a negative "score" early in training, so that "no experts are selected" at all.
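For intuition, here's a minimal, hypothetical sketch of ReLU-based routing (not the actual ReMoE implementation; the `router` layer and the dimensions are made up). Unlike softmax top-k, a ReLU gate has nothing that structurally guarantees at least one active expert per token:

```python
# Hypothetical sketch of ReLU routing, NOT the real ReMoE code.
import torch
import torch.nn as nn

torch.manual_seed(0)
num_experts, d_model = 8, 16
router = nn.Linear(d_model, num_experts)  # illustrative router layer

x = torch.randn(4, d_model)      # a batch of 4 token embeddings
gates = torch.relu(router(x))    # ReLU gate instead of softmax top-k

# Count active experts per token: if all router logits for a token happen
# to be negative (plausible with near-zero init early in training), ReLU
# zeros them all and that token activates no experts.
active = (gates > 0).sum(dim=-1)
print(active)  # nothing prevents an entry here from being 0
```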
And happy New Year, Benjamin! As always, I look forward to more of your posts in the New Year!
As for merging the fine-tuned and base model: I've seen some geeks mention it on Reddit, and it seems the point is to prevent forgetting.
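For reference, the recipe usually amounts to a parameter-wise linear interpolation between the two checkpoints (the "model soup" / WiSE-FT idea). A hedged sketch, assuming plain PyTorch state dicts of the same architecture; `merge_state_dicts` and `alpha` are illustrative names, not any library's API:

```python
# Hedged sketch: blend a fine-tuned checkpoint back toward its base model.
import torch

def merge_state_dicts(base_sd, finetuned_sd, alpha=0.5):
    """Return alpha * finetuned + (1 - alpha) * base, parameter-wise."""
    merged = {}
    for name, base_param in base_sd.items():
        merged[name] = (1 - alpha) * base_param + alpha * finetuned_sd[name]
    return merged

# Usage (assumes two checkpoints of the *same* architecture):
# base = MyModel();  base.load_state_dict(torch.load("base.pt"))
# tuned = MyModel(); tuned.load_state_dict(torch.load("finetuned.pt"))
# base.load_state_dict(
#     merge_state_dicts(base.state_dict(), tuned.state_dict(), alpha=0.5)
# )
```

Intuitively, keeping part of the base weights pulls the merged model back toward the base model's behavior, which is the "prevent forgetting" effect people describe.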
Yes, I have read the same, several times. It was a Reddit recipe: we don't know why it works, but it works. It's nice to have a "scientific" study confirming it.
Happy New Year to you too!