CriticGPT: How OpenAI Is Improving GPT-4 with GPT-4
And why GPT-4 is getting better at coding tasks
The GPT-4 models, including the models powering ChatGPT, are designed to be helpful and interactive, using a technique called “Reinforcement Learning from Human Feedback” (RLHF). In RLHF, human annotators rate and compare different ChatGPT responses, and this preference feedback is then used to improve the GPT models.
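To make the idea concrete, here is a minimal sketch of the kind of pairwise-preference (reward model) objective that typically underlies RLHF. This is not OpenAI’s actual implementation; the `RewardModel` class, embedding dimension, and random tensors below are purely illustrative placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy reward model: maps a (prompt, response) embedding to a scalar reward."""
    def __init__(self, embed_dim: int = 768):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley–Terry style loss: push the reward of the human-preferred
    response above the reward of the rejected one."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: random embeddings stand in for encoded (prompt, response) pairs.
model = RewardModel()
chosen = torch.randn(4, 768)    # responses the annotators preferred
rejected = torch.randn(4, 768)  # responses the annotators rejected
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
```

A reward model trained on such comparisons then provides the training signal that steers the policy (ChatGPT) toward responses humans prefer.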
As RLHF improves the models’ reasoning and behavior, ChatGPT’s mistakes become subtler and harder for human annotators to spot. This poses a significant challenge for RLHF: aligning models may become harder as they approach or surpass human knowledge.
In this article, I review how OpenAI trained LLM critics, such as CriticGPT, to generate critiques that point out inaccuracies in ChatGPT’s responses, assisting humans in the RLHF pipeline.