Note: It’s relatively quiet on arXiv right now. I only read two papers this week that I think are worth sharing.
Reviewed this week
⭐Self-Taught Evaluators
Transformer Explainer: Interactive Learning of Text-Generative Models
⭐: Papers that I particularly recommend reading.
New code repositories:
No new code repositories this week.
I maintain a curated list of AI code repositories here:
Self-Taught Evaluators
Developing strong evaluator models typically requires large amounts of high-quality preference data from human annotations, which is both costly and time-consuming, especially for complex tasks like coding and mathematics.
This reliance on human annotations makes it difficult to scale to new tasks or evaluation criteria. Additionally, as newer models surpass older ones, existing annotations often become outdated, since they are based on judgments of responses from less advanced models.
The authors of this paper explore an iterative self-training approach that eliminates the need for human-annotated preferences in the training loop, instead using only synthetically generated data. Starting with a seed model, they generate synthetic preference pairs for a given input, where one response is intentionally designed to be inferior. The model then acts as a judge, generating reasoning traces and judgments for these pairs. Training on this labeled data results in a superior model, which can then self-improve through further iterations.
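In code, the loop looks roughly like the sketch below. This is a hypothetical skeleton I wrote to make the procedure concrete: `generate`, `fine_tune`, and the agreement check are placeholders standing in for LLM sampling, supervised fine-tuning, and the paper's filtering step, not the authors' actual implementation or prompts.

```python
# Hypothetical sketch of the iterative self-training loop described above.
# `generate` and `fine_tune` are placeholders, not the authors' code.

def generate(model, prompt: str) -> str:
    # Placeholder: sample a completion from `model` for `prompt`.
    return f"[{model} completion for: {prompt}]"

def fine_tune(model, examples):
    # Placeholder: supervised fine-tuning on the collected judgment examples.
    return model

def self_taught_evaluator(seed_model, instructions, n_iterations=3):
    model = seed_model
    for _ in range(n_iterations):
        examples = []
        for x in instructions:
            # 1. A good response: answer the original instruction.
            good = generate(model, x)
            # 2. A worse response: answer a deliberately modified instruction,
            #    so the result is plausible but inferior for the original x.
            modified_x = generate(model, f"Write an instruction similar to, but different from: {x}")
            bad = generate(model, modified_x)
            # 3. The same model acts as judge, producing a reasoning trace
            #    and a verdict for the pair.
            judgment = generate(
                model,
                f"Instruction: {x}\nResponse A: {good}\nResponse B: {bad}\nWhich response is better, and why?",
            )
            # 4. Keep only judgments whose verdict matches the known ordering
            #    (good > bad); this check is a stand-in for the paper's filtering.
            if "A" in judgment:
                examples.append((x, good, bad, judgment))
        # 5. Fine-tune on the self-labeled judgments, then iterate.
        model = fine_tune(model, examples)
    return model
```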
Using synthetic data to train LLMs for evaluation is not new. For instance, many datasets on the Hugging Face Hub, annotated by models such as GPT-4, are used to train reward models for RLHF. However, these datasets often compare outputs that are very close in quality, which makes them difficult to rank.
Experiments by the authors demonstrate that their method significantly improves accuracy on the RewardBench benchmark.
Transformer Explainer: Interactive Learning of Text-Generative Models
This is more of a demo paper than a scientific one. It is mainly interesting for the tool it presents.
Transformer Explainer is an open-source, web-based interactive tool designed to help non-experts understand the Transformer model, both in terms of its overall structure and the detailed mathematical operations involved.
I found it entertaining to play with.
The tool focuses on text generation, a common application of Transformers, and uses a Sankey diagram to visually represent how data flows through the model's components. This visualization helps users grasp how information is processed and transformed within the model.
It’s based on GPT-2’s weights. I hope that someday we will be able to plug any LLM into it.
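If you want to poke at the same model outside the browser, here is a minimal sketch using Hugging Face Transformers to generate text with the GPT-2 checkpoint the tool visualizes. This is independent of the tool itself, and the prompt is just an arbitrary example.

```python
# Minimal sketch: generate text with the GPT-2 checkpoint that
# Transformer Explainer visualizes. Requires `pip install transformers torch`.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
out = generator("Data visualization empowers users to", max_new_tokens=20)
print(out[0]["generated_text"])
```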
If you have any questions about one of these papers, write them in the comments. I will answer them.