The Salt - Curated AI
LongRoPE: Towards Unlimited Context Length for the Transformer

Experiments with up to 2 million tokens

Benjamin Marie
Mar 06, 2024
Transformer models have a limited context size that can be too small for a wide range of applications, such as summarization, information retrieval, or in-context learning with numerous examples.

A transformer model can’t accurately model contexts longer than the sequences it has seen during training. To get better accuracy on longer sequences, we would have to increase the sequence length at training time. However, this is often impractical: training on long sequences is expensive, and long training examples are scarce.

Several methods have been proposed to generalize beyond the sequence lengths seen during training. ALiBi (Attention with Linear Biases) and RoPE (Rotary Position Embedding) are among the most popular, but they still have severe limitations that prevent them from handling contexts of millions of tokens.
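To make the rest of the discussion more concrete, here is a minimal NumPy sketch of the core RoPE operation: each pair of dimensions in a query or key vector is rotated by an angle proportional to the token’s position. The function names and shapes are my own illustration, not code from the LongRoPE paper.

```python
import numpy as np

def rope_frequencies(head_dim, base=10000.0):
    # One rotation frequency per pair of dimensions: theta_i = base^(-2i / head_dim)
    return base ** (-np.arange(0, head_dim, 2) / head_dim)

def apply_rope(x, positions, base=10000.0):
    # x: (seq_len, head_dim) query or key vectors for one attention head
    theta = rope_frequencies(x.shape[-1], base)   # (head_dim / 2,)
    angles = np.outer(positions, theta)           # (seq_len, head_dim / 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    # Rotate each (even, odd) pair of coordinates by its position-dependent angle
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# Example: apply RoPE to 8 tokens with a head dimension of 64
queries = np.random.randn(8, 64)
rotated = apply_rope(queries, positions=np.arange(8))
```

Because the rotation angle depends only on the token’s position (and, through attention scores, on relative distances between tokens), positions far beyond the training length produce angle combinations the model has never seen, which is where extrapolation breaks down.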

In this article, I review LongRoPE.

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

LongRoPE is a recent work by Microsoft that extends RoPE to far larger contexts. It shows promising performance, maintaining a low perplexity with contexts from 4,000 to 2 million tokens. LongRoPE can be applied to any LLM trained with RoPE (e.g., Llama 2, Mistral 7B, Mixtral-8x7B).

We will see what the main limitations of current methods are and how LongRoPE improves RoPE for extending the LLM context window beyond 2 million tokens.

Since RoPE itself can be quite complex, the first section of this article is a short explainer of RoPE.
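Before that, it helps to see the general family of techniques LongRoPE belongs to: rescaling RoPE’s rotation angles so that positions beyond the training length fall back into the range of angles the model saw during training. The snippet below (reusing `apply_rope` and NumPy from the earlier sketch) shows plain, uniform position interpolation as a simplified illustration of this idea; the scaling factor and function name are my own assumptions, not the paper’s code, and LongRoPE refines this basic mechanism rather than using it as-is.

```python
def apply_rope_interpolated(x, positions, train_len, target_len, base=10000.0):
    # Uniform position interpolation: shrink all positions by train_len / target_len
    # so that a target_len-token context reuses the angle range seen at training time.
    # (LongRoPE searches for better, non-uniform rescalings; this is only the basic idea.)
    scale = train_len / target_len
    return apply_rope(x, positions * scale, base=base)

# Example: pretend the model was trained on 4,096 tokens and we target 131,072
long_queries = np.random.randn(16, 64)
rotated_long = apply_rope_interpolated(
    long_queries, positions=np.arange(16), train_len=4096, target_len=131072
)
```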
