-
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
An overview of our paper, SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning. In this paper, we introduce SPIRAL, a framework in which self-play on zero-sum games incentivizes models to develop reasoning capabilities by automatically selecting generalizable Chain-of-Thought patterns from pretrained LLMs. We show that competitive game dynamics drive the discovery of reasoning strategies that transfer to mathematical and general reasoning benchmarks. This work serves as an initial exploration toward integrating self-play into the LLM self-improvement pipeline.
-
TorchOpt: An Efficient Library for Differentiable Optimization
An overview of our NeurIPS 2022 OPT Workshop paper, TorchOpt: An Efficient Library for Differentiable Optimization. In this paper, we introduce TorchOpt, an efficient PyTorch-based library for differentiable optimization. The library provides a unified and expressive programming abstraction for differentiable optimization, together with a high-performance distributed execution runtime.