Prompt Repetition Improves Non-Reasoning LLMs
Yaniv Leviathan · Matan Kalman · Yossi Matias · Google Research
Simply repeating the input prompt verbatim improves LLM accuracy across models and benchmarks, with zero latency cost.
Headline result: 47 of 70 model–benchmark combinations improved, with 0 losses.

How It Works
In causal language models, tokens can only attend to past tokens. Repeating the prompt ensures every token can attend to every other token at least once, improving comprehension with no increase in output length or latency.
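The three prompt variants compared in the experiments can be sketched as simple string transforms. This is a minimal illustration, not the authors' released code; the function names are chosen here for clarity, and the resulting string would be sent to any chat or completion API in place of the original query.

```python
def standard(query: str) -> str:
    """Baseline: the query is sent once, unchanged."""
    return query

def repeated(query: str) -> str:
    """Repeated: the query is duplicated verbatim, back to back.

    Tokens in the second copy can attend to every token of the
    first copy, so the model effectively sees the full prompt with
    bidirectional context despite the causal attention mask.
    """
    return query + query

def verbose(query: str) -> str:
    """Verbose: the repetition is introduced by a connective phrase."""
    return f"{query}\nLet me repeat that:\n{query}"
```

Because only the input is lengthened, the number of generated output tokens, and hence decoding latency, is unchanged.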
Standard:
<QUERY>

Repeated:
<QUERY><QUERY>

Verbose:
<QUERY>
Let me repeat that:
<QUERY>

Models Tested
7 models across 4 providers, on 7 diverse benchmarks
OpenAI
GPT-4o, GPT-4o Mini
Google
Gemini 2.0 Flash, Flash Lite
Anthropic
Claude 3 Haiku, 3.7 Sonnet
DeepSeek
DeepSeek V3