LeVanLoi'log, ⌚ 2025-02-17
***
Competitive Programming with Large Reasoning Models
Collected by Lê Văn Lợi
OpenAI: Ahmed El-Kishky, Alexander Wei, Andre Saraiva, Borys Minaev, Daniel Selsam, David Dohan, Francis Song, Hunter Lightman, Ignasi Clavera, Jakub Pachocki, Jerry Tworek, Lorenz Kuhn, Lukasz Kaiser, Mark Chen, Max Schwarzer, Mostafa Rohaninejad, Nat McAleese, o3 contributors, Oleg Mürk, Rhythm Garg, Rui Shu, Szymon Sidor, Vineet Kosaraju, Wenda Zhou

Link: Competitive Programming with Large Reasoning Models

Abstract:

We show that reinforcement learning applied to large language models (LLMs) significantly boosts performance on complex coding and reasoning tasks. Additionally, we compare two general-purpose reasoning models - OpenAI o1 and an early checkpoint of o3 - with a domain-specific system, o1-ioi, which uses hand-engineered inference strategies designed for competing in the 2024 International Olympiad in Informatics (IOI). We competed live at IOI 2024 with o1-ioi and, using hand-crafted test-time strategies, placed in the 49th percentile. Under relaxed competition constraints, o1-ioi achieved a gold medal. However, when evaluating later models such as o3, we find that o3 achieves gold without hand-crafted domain-specific strategies or relaxed constraints. Our findings show that although specialized pipelines such as o1-ioi yield solid improvements, the scaled-up, general-purpose o3 model surpasses those results without relying on hand-crafted inference heuristics. Notably, o3 achieves a gold medal at the 2024 IOI and obtains a Codeforces rating on par with elite human competitors. Overall, these results indicate that scaling general-purpose reinforcement learning, rather than relying on domain-specific techniques, offers a robust path toward state-of-the-art AI in reasoning domains, such as competitive programming.
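The "hand-crafted test-time strategies" the abstract credits to o1-ioi are not spelled out here. As a rough illustration of what such a wrapper can look like, below is a minimal best-of-n selection sketch in Python; the function names (sample_solution, passes_public_tests) are hypothetical stubs, not the paper's actual pipeline or interfaces.

```python
import random

# Hypothetical stubs -- stand-ins for a model sampler and a sandboxed
# judge; neither is from the paper.
def sample_solution(problem: str) -> str:
    """Draw one candidate program from the model (stub)."""
    return f"# candidate for {problem!r}, seed={random.random():.3f}"

def passes_public_tests(code: str, problem: str) -> bool:
    """Compile and run the candidate on the public tests (stub)."""
    return random.random() < 0.3

def select_submission(problem: str, n: int = 50) -> str | None:
    """Best-of-n test-time strategy: sample n candidates, discard any
    that fail the public tests, and submit one survivor (or give up)."""
    survivors = [code for code in (sample_solution(problem) for _ in range(n))
                 if passes_public_tests(code, problem)]
    return survivors[0] if survivors else None

if __name__ == "__main__":
    print(select_submission("IOI-style task"))
```

The paper's central contrast is that o3 reaches gold-medal performance without this kind of wrapper.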

---

[Top papers of the week]

OpenAI’s latest study pits a specialized coding system against a scaled-up general-purpose model on competitive programming challenges, probing specialization versus scale. Key findings:

  • Generalist vs. specialist: A tailored model (o1-ioi) with hand-crafted test-time strategies placed in the 49th percentile when competing live at IOI 2024, and reached gold only under relaxed competition constraints. A larger, general-purpose model (o3) attained gold medal-level performance without any domain-specific tricks or relaxed constraints.

  • Reinforcement learning payoff: Both systems were improved via RL fine-tuning, but the scaled general model outperformed the expert pipeline, solving programming tasks at a level comparable to elite human competitors, with a Codeforces rating on par with elite humans (a toy sketch of RL from a verifiable reward follows this list).

  • Efficiency through scale: The results suggest that investing compute in a bigger, broadly trained transformer yields better performance than building task-specific optimizations. In other words, scaling up a model’s reasoning ability can supersede hand-crafted inference heuristics for complex tasks.

  • Implication: For difficult reasoning tasks like coding, a single large model with sufficient training can simplify deployment (no custom inference routines needed) and still beat highly optimized specialist systems, pointing toward a trend of “scale over special-case” in transformer design.
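
As a toy illustration of the "RL payoff" bullet: the appeal of competitive programming as an RL domain is that the reward is verifiable (code either passes its tests or it does not). The sketch below is a minimal REINFORCE loop over an invented two-action policy; the action names, pass probabilities, and constants are all illustrative and are not the paper's training algorithm, which this summary does not specify.

```python
import math
import random

# Hidden "environment": probability that code produced under each
# strategy passes all tests. Invented numbers for the example.
PASS_PROB = {"greedy": 0.2, "careful": 0.6}

logits = {"greedy": 0.0, "careful": 0.0}  # softmax policy parameters
LR = 0.1

def softmax(ls: dict[str, float]) -> dict[str, float]:
    z = sum(math.exp(v) for v in ls.values())
    return {k: math.exp(v) / z for k, v in ls.items()}

baseline = 0.0
for step in range(2000):
    probs = softmax(logits)
    action = random.choices(list(probs), weights=list(probs.values()))[0]
    # Verifiable reward: 1.0 iff the sampled program passes the tests.
    reward = 1.0 if random.random() < PASS_PROB[action] else 0.0
    advantage = reward - baseline
    # REINFORCE for a softmax policy: d/d_logit[a] log pi(action)
    # equals (1 if a == action else 0) - pi(a).
    for a in logits:
        grad = (1.0 if a == action else 0.0) - probs[a]
        logits[a] += LR * advantage * grad
    baseline += 0.05 * (reward - baseline)  # moving-average baseline

print(softmax(logits))  # probability mass should shift toward "careful"
```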