Benchmark Results
Accuracy comparison across models and benchmarks from the paper
ARC Challenge
OpenBookQA
GSM8K
MMLU-Pro
MATH
NameIndex
MiddleMatch
Example Comparisons
See how prompt repetition changes actual model responses