LeVanLoi'log, ⌚ 2025-02-17
***
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Collected by Lê Văn Lợi
Authors: Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein

Link: Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Abstract:

We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.
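
To make the mechanism concrete: paraphrasing the paper's description (symbol names are approximate, and details such as the initialization scale and per-step conditioning are in the paper itself), a prelude P embeds the prompt once, a random latent state is drawn, a core block R is iterated r times, and a coda C decodes the final state:

```latex
e = P(x), \qquad
s_0 \sim \mathcal{N}(0, \sigma^2 I), \qquad
s_i = R(e, s_{i-1}) \quad \text{for } i = 1, \dots, r, \qquad
p = C(s_r)
```

Test-time compute is scaled simply by choosing a larger r at inference; no extra tokens are produced.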

--- 

[Top Papers of the week]

This work introduces a latent recurrent-depth transformer, a model that scales test-time reasoning without generating additional tokens. Instead of increasing the context window or fine-tuning for Chain-of-Thought (CoT), the approach iterates in latent space at inference time, improving up to a computation load equivalent to a 50B-parameter model despite having only 3.5B parameters. Key insights include:

  • Recurrent test-time computation – The model unrolls a recurrent block at inference, running it for an arbitrary number of steps, which adds computational depth without modifying the input sequence. Unlike standard CoT methods, which externalize reasoning as tokens, this technique keeps reasoning in latent space, making it more efficient (a minimal sketch follows this list).

  • No need for CoT-specific training – Unlike CoT prompting or fine-tuning, this method doesn’t require specialized datasets. It works with standard pretraining corpora and generalizes across reasoning tasks.

  • Improved memory & compute efficiency – Latent reasoning lets the model scale test-time compute without increasing its parameter count, and it requires less memory than long-context transformers. The recurrence also naturally supports per-token adaptive compute, speculative decoding, and KV-cache sharing (a toy early-exit sketch also follows this list).

  • Scales like a 50B parameter model – Benchmarks show that with sufficient test-time recurrence, performance on reasoning tasks such as ARC, GSM8K, and OpenBookQA improves, sometimes dramatically, up to a computation load equivalent to a 50B-parameter model.

  • Emergent behaviors in latent space – Analysis reveals self-organizing computation patterns, such as latent-space orbits for numerical tasks and context-dependent “deliberation” on difficult queries, suggesting the model learns non-verbal cognitive strategies.
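
To make the recurrent test-time computation above concrete, here is a minimal, self-contained PyTorch sketch of the idea. All names (RecurrentDepthLM, prelude, core, coda) and sizes are illustrative assumptions, not the authors' released code, and details of the actual 3.5B model (causal masking, normalization, positional encodings, trained weights) are omitted:

```python
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    def __init__(self, vocab_size=32000, d_model=256, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.prelude = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # The core block sees [current latent state ; prompt embedding] and refines the state.
        self.adapter = nn.Linear(2 * d_model, d_model)
        self.core = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.coda = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, num_steps=8):
        e = self.prelude(self.embed(tokens))      # embed the prompt once
        s = torch.randn_like(e)                   # random initial latent state
        for _ in range(num_steps):                # unroll the *same* block num_steps times
            s = self.core(self.adapter(torch.cat([s, e], dim=-1)))
        return self.coda(s)                       # logits from the final latent state


if __name__ == "__main__":
    tokens = torch.randint(0, 32000, (1, 16))
    model = RecurrentDepthLM()
    # More steps means more test-time compute with the exact same parameters.
    print(model(tokens, num_steps=4).shape)       # torch.Size([1, 16, 32000])
    print(model(tokens, num_steps=32).shape)      # torch.Size([1, 16, 32000])
```

Because the same core weights are reused at every iteration, changing num_steps changes only the amount of test-time compute, not the number of parameters.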

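As a companion sketch for the adaptive-compute point, one simple way to exit the recurrence early is to stop once the predicted distribution stops changing between steps. The paper describes convergence-based per-token early exits; the specific KL criterion and threshold below are illustrative assumptions, written against the toy model sketched above:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def adaptive_forward(model, tokens, max_steps=64, kl_threshold=5e-4):
    """Iterate the core block until successive output distributions converge."""
    e = model.prelude(model.embed(tokens))
    s = torch.randn_like(e)
    prev_logp, logp, step = None, None, 0
    for step in range(1, max_steps + 1):
        s = model.core(model.adapter(torch.cat([s, e], dim=-1)))
        logp = F.log_softmax(model.coda(s), dim=-1)
        if prev_logp is not None:
            # KL divergence between the distributions at consecutive steps.
            kl = F.kl_div(logp, prev_logp, log_target=True, reduction="batchmean")
            if kl < kl_threshold:
                break                             # "easy" inputs exit early
        prev_logp = logp
    return logp, step                             # harder inputs run more steps
```

The paper's per-token adaptive compute, speculative decoding, and KV-cache sharing build on this same property: the latent state can be read out, reused, or stopped at any step.
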
This approach adds a third axis to LLM scaling—beyond model size and context length—by focusing on test-time compute. It suggests that future models may reason in continuous latent space rather than rely solely on token-based reasoning, potentially unlocking new AI reasoning and efficiency frontiers.