Prompt Repetition Improves Non-Reasoning LLMs

Yaniv Leviathan · Matan Kalman · Yossi Matias · Google Research

Simply repeating the input prompt verbatim improves LLM accuracy across models and benchmarks, with zero latency cost.

47/70 model–benchmark combinations improved · 0 losses

How It Works

In causal language models, each token can attend only to earlier tokens, so early prompt tokens never see the later ones. Repeating the prompt ensures every prompt token can attend to every other token at least once, improving comprehension with no increase in output length or decoding latency.

Standard
<QUERY>

Repeated
<QUERY><QUERY>

Verbose
<QUERY>
Let me repeat that:
<QUERY>
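The three variants above amount to simple string construction before the prompt is sent to a model. A minimal sketch, where the function name `build_prompt` and the variant labels are illustrative (the source does not specify an implementation):

```python
def build_prompt(query: str, variant: str = "repeated") -> str:
    """Build the model input for one of the three prompt variants.

    standard: the query as-is.
    repeated: the query concatenated with itself verbatim.
    verbose:  the query, a short transition phrase, then the query again.
    """
    if variant == "standard":
        return query
    if variant == "repeated":
        return query + query
    if variant == "verbose":
        return f"{query}\nLet me repeat that:\n{query}"
    raise ValueError(f"unknown variant: {variant!r}")


# Example: the "repeated" variant doubles the prompt but leaves the
# expected output (and hence decoding latency) unchanged.
prompt = build_prompt("What is the capital of France?", "repeated")
```

Since only the input grows, the extra cost is confined to prefill; the generated answer is no longer than before.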

Models Tested

7 models across 4 providers, on 7 diverse benchmarks

OpenAI
GPT-4o, GPT-4o Mini
Google
Gemini 2.0 Flash, Flash Lite
Anthropic
Claude 3 Haiku, 3.7 Sonnet
DeepSeek
DeepSeek V3