Discussion about this post

Tomaž Savodnik

Wondering if an LLM-as-a-judge could be fine-tuned to better score the LLM translation errors that COMET and MetricX (even hybrid) miss... Even zero-shot, without fine-tuning, some prompts show promising results. We'll see; maybe even Gemma 270M could be fine-tuned for this task with a well-curated dataset.
