In this paper, we present VerifyBench, a benchmark specifically designed to evaluate the accuracy of reference-based reward systems. To create VerifyBench, we curated a diverse collection of ...