Prompt Repetition Improves Non-Reasoning LLMs

Yaniv Leviathan · Matan Kalman · Yossi Matias · Google Research

Simply repeating the input prompt verbatim improves LLM accuracy across models and benchmarks, with zero latency cost.

47/70 model–benchmark combinations improved · 0 losses

How It Works

In causal language models, each token can attend only to earlier tokens, so early prompt tokens never see the later ones. Repeating the prompt ensures every prompt token can attend to every other token at least once, improving comprehension with no increase in output length or decoding latency.

Standard
<QUERY>

Repeated
<QUERY><QUERY>

Verbose
<QUERY>
Let me repeat that:
<QUERY>
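The three variants above amount to simple string construction before the prompt is sent to a model. A minimal sketch, where the function name `build_prompt` and the variant labels are illustrative (the source does not specify an implementation):

```python
def build_prompt(query: str, variant: str = "repeated") -> str:
    """Build the model input for one of the three prompt variants.

    standard: the query as-is.
    repeated: the query concatenated with itself verbatim.
    verbose:  the query, a short transition phrase, then the query again.
    """
    if variant == "standard":
        return query
    if variant == "repeated":
        return query + query
    if variant == "verbose":
        return f"{query}\nLet me repeat that:\n{query}"
    raise ValueError(f"unknown variant: {variant!r}")


# Example: the "repeated" variant doubles the prompt but leaves the
# expected output (and hence decoding latency) unchanged.
prompt = build_prompt("What is the capital of France?", "repeated")
```

Since only the input grows, the extra cost is confined to prefill; the generated answer is no longer than before.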

Models Tested

7 models across 4 providers, on 7 diverse benchmarks

OpenAI
GPT-4o, GPT-4o Mini
Google
Gemini 2.0 Flash, Flash Lite
Anthropic
Claude 3 Haiku, 3.7 Sonnet
DeepSeek
DeepSeek V3