Discussion about this post

Remixa

I haven't looked closely at ReMoE yet, but if ReLU is used as the expert-selection rule, it seems possible that every expert gets a negative "score" for some tokens early in training, in which case no experts would be selected at all.
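To make the concern concrete, here is a minimal sketch. It assumes the router simply applies ReLU to per-expert logits and treats any expert with a positive gate as selected. The function name `relu_route` and the logit values are made up for illustration and are not taken from ReMoE; the sketch says nothing about whether the actual method guards against this case.

```python
def relu(x):
    """Plain ReLU on a single float."""
    return x if x > 0.0 else 0.0


def relu_route(logits):
    """Return (gates, active_expert_indices) for one token's router logits.

    Assumption: an expert counts as selected only if its ReLU gate is > 0.
    """
    gates = [relu(z) for z in logits]
    active = [i for i, g in enumerate(gates) if g > 0.0]
    return gates, active


# Hypothetical router logits for one token over four experts.
mixed_logits = [0.7, -1.2, 0.1, -0.3]
all_negative_logits = [-0.5, -1.2, -0.1, -0.3]

mixed_gates, mixed_active = relu_route(mixed_logits)
neg_gates, neg_active = relu_route(all_negative_logits)

print(mixed_active)  # [0, 2]
print(neg_active)    # []
```

In the second case every gate is zero, so under this assumed rule the token would be routed to no expert.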

And happy New Year, Benjamin! As always, I look forward to more of your posts in the New Year!

