Bo Liu (Benjamin Liu)

Email: benjaminliu [dot] eecs [at] gmail [dot] com


I am currently a Visiting Researcher at the University of Washington, advised by Prof. Natasha Jaques. My research interests lie at the intersection of Reinforcement Learning, Reasoning, and Machine Learning Systems, with applications in complex, real-world environments.

I recently worked at Meta FAIR as a Research Scientist Intern with Jason Weston, focusing on scalable self-improvement with self-play for LLMs. Before that, I was a Student Researcher at DeepSeek working on foundation models. I hope to build on scaling laws to create an autonomous decision-making system that can act intelligently in any unknown environment.

Earlier, I worked as a Research Assistant with Prof. Jun Wang and had the privilege of working closely with Prof. Yaodong Yang. I received my B.S. in Machine Intelligence and B.A. in Economics from Peking University in 2020, where I was advised by Prof. Zongqing Lu.

I love playing soccer in my free time. I am also open to collaborations exploring the potential of reinforcement learning across various fields. If you're interested in discussing new ideas or working together, feel free to drop me an email or schedule a meeting with me here!

News

May 1, 2026 One paper Agent Learning via Early Experience accepted at ICML 2026.
Jan 26, 2026 Four papers SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning, Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play, GEM: A Gym for Generalist LLMs and Scaling Agent Learning via Experience Synthesis accepted at ICLR 2026.
Dec 2, 2025 Co-organizing the Workshop on Multi-Agent Learning and Its Opportunities in the Era of Generative AI, accepted at ICLR 2026 and scheduled for April 27, 2026.
Jul 5, 2025 Co-organizing the Workshop on Multi-Turn Interactions in Large Language Models, accepted at NeurIPS 2025 and scheduled for December 6, 2025.
Mar 8, 2025 One paper Natural Language Reinforcement Learning accepted at ICLR 2025 Workshop SSI-FM.
Jan 23, 2025 One paper DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search accepted at ICLR 2025.
Aug 15, 2024 One model DeepSeek-Prover-V1.5 has been released. This model enhances theorem proving in Lean 4 with state-of-the-art performance, including a 63.5% success rate on miniF2F. Available for research and commercial use.
May 23, 2024 One model DeepSeek-Prover achieves new state-of-the-art results in theorem proving, leveraging large-scale synthetic data to outperform GPT-4 on benchmarks like miniF2F and FIMO. Available for research and commercial use.
May 7, 2024 One model DeepSeek-V2, a 236B Mixture-of-Experts language model, has been released. It offers stronger performance with 42.5% lower training costs and 93.3% reduced KV cache. Available for research and commercial use.
Mar 8, 2024 One model DeepSeek-VL, a state-of-the-art Vision-Language model designed for real-world applications, is now available. Supports multimodal tasks like logical diagrams, scientific literature, and more. Released for research and commercial use.

Selected Publications

  1. Preprint
    SPICE: Self-Play In Corpus Environments Improves Reasoning
    ArXiv preprint, 2025.
  2. ICML
    Agent Learning via Early Experience
    The 43rd International Conference on Machine Learning, 2026.
  3. ICLR
    Scaling Agent Learning via Experience Synthesis
    The 14th International Conference on Learning Representations, 2026.
  4. Preprint
    The Era of Real-World Human Interaction: RL from User Conversations
    ArXiv preprint, 2025.
  5. ICLR
    Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
    The 14th International Conference on Learning Representations, 2026.
  6. ICLR
    SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
    The 14th International Conference on Learning Representations, 2026.
  7. Preprint
    TextArena
    ArXiv preprint, 2025.
  8. ICLR Workshop
    Natural Language Reinforcement Learning
    The 13th International Conference on Learning Representations SSI-FM Workshop, 2025.
  9. Preprint
    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
    ArXiv preprint, 2024.
  10. Preprint
    DeepSeek-VL: Towards Real-World Vision-Language Understanding
    ArXiv preprint, 2024.
  11. RA-L & IROS
    Grasp Multiple Objects With One Hand
    IEEE Robotics and Automation Letters, 2024. Abridged in the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024.
    Oral Presentation
  12. ICLR
    DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
    The 13th International Conference on Learning Representations, 2025.
  13. JMLR
    TorchOpt: An Efficient Library for Differentiable Optimization
    The Journal of Machine Learning Research, 2023. Abridged in the 36th Conference on Neural Information Processing Systems OPT Workshop, 2022.
    PyTorch Ecosystem Project
  14. NeurIPS
    A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning
    The 36th Conference on Neural Information Processing Systems, 2022.
  15. NeurIPS
    EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine
    The 36th Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022.
  16. AAMAS
    Learning Correlated Communication Topology in Multi-Agent Reinforcement Learning
    The 20th International Conference on Autonomous Agents and Multiagent Systems, 2021.
    Oral Presentation
Last updated: May 11, 2026.