Publications

(*) denotes equal contribution.

2026

  1. Preprint
    Reasoning over mathematical objects: on-policy reward modeling and test time aggregation
    Pranjal Aggarwal, Marjan Ghazvininejad, Seungone Kim, Ilia Kulikov, Jack Lanchantin, Xian Li, Tianjian Li, Bo Liu, Graham Neubig, Anaelia Ovalle, Swarnadeep Saha, Sainbayar Sukhbaatar, Sean Welleck, Jason Weston, Chenxi Whitehouse, Adina Williams, Jing Xu, Ping Yu, Weizhe Yuan, Jingyu Zhang, and Wenting Zhao
    ArXiv preprint, 2026.
  2. Preprint
    From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space
    Yuqiao Tan, Minzheng Wang, Bo Liu, Zichen Liu, Tian Liang, Shizhu He, Jun Zhao, and Kang Liu
    ArXiv preprint, 2026.

2025

  1. Preprint
    SPICE: Self-Play In Corpus Environments Improves Reasoning
    ArXiv preprint, 2025.
  2. ICML
    Agent Learning via Early Experience
    The 43rd International Conference on Machine Learning, 2026.
  3. ICLR
    Scaling Agent Learning via Experience Synthesis
    The 14th International Conference on Learning Representations, 2026.
  4. ICLR
    GEM: A Gym for Agentic LLMs
    Zichen Liu, Anya Sims, Keyu Duan, Changyu Chen, Simon Yu, Xiangxin Zhou, Haotian Xu, Shaopan Xiong, Bo Liu, Chenmien Tan, Chuen Yang Beh, Weixun Wang, Hao Zhu, Weiyan Shi, Diyi Yang, Michael Shieh, Yee Whye Teh, Wee Sun Lee, and Min Lin
    The 14th International Conference on Learning Representations, 2026.
  5. Preprint
    BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
    Terry Yue Zhuo, Xiaolong Jin, Hange Liu, Juyong Jiang, Tianyang Liu, Chen Gong, Bhupesh Bishnoi, Vaisakhi Mishra, Marek Suppa, Noah Ziems, Saiteja Utpala, Ming Xu, Guangyu Song, Kaixin Li, Yuhan Cao, Bo Liu, Zheng Liu, Sabina Abdurakhmanova, Wenhao Yu, Mengzhao Jia, Jihan Yao, Kenneth Hamilton, Kumar Shridhar, Minh Chien Vu, Dingmin Wang, Jiawei Liu, Zijian Wang, Qian Liu, Binyuan Hui, Meg Risdal, Ahsen Khaliq, Atin Sood, Zhenchang Xing, Wasi Uddin Ahmad, John Grundy, David Lo, Banghua Zhu, Xiaoning Du, Torsten Scholak, and Leandro Werra
    ArXiv preprint, 2025.
  6. Preprint
    The Era of Real-World Human Interaction: RL from User Conversations
    ArXiv preprint, 2025.
  7. ICLR
    Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
    The 14th International Conference on Learning Representations, 2026.
  8. Preprint
    LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
    Xiyao Wang, Chunyuan Li, Jianwei Yang, Kai Zhang, Bo Liu, Tianyi Xiong, and Furong Huang
    ArXiv preprint, 2025.
  9. ICLR
    SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
    The 14th International Conference on Learning Representations, 2026.
  10. Preprint
    TextArena
    ArXiv preprint, 2025.
  11. ICLR Workshop
    Natural Language Reinforcement Learning
    The 13th International Conference on Learning Representations SSI-FM Workshop, 2025.

2024

  1. AAAI (Oral)
    Differentiable Information Enhanced Model-Based Reinforcement Learning
    Xiaoyuan Zhang, Xinyan Cai, Bo Liu, Weidong Huang, Song-Chun Zhu, Siyuan Qi, and Yaodong Yang
    The 39th Annual AAAI Conference on Artificial Intelligence, 2025.
    Oral Presentation
  2. Preprint
    DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
    Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, and Xiaodan Liang
    ArXiv preprint, 2024.
  3. Preprint
    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
    ArXiv preprint, 2024.
  4. Preprint
    DeepSeek-VL: Towards Real-World Vision-Language Understanding
    ArXiv preprint, 2024.
  5. RA-L & IROS (Oral)
    Grasp Multiple Objects With One Hand
    The IEEE Robotics and Automation Letters, 2024. Abridged in the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024.
    Oral Presentation
  6. ICLR
    DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
    The 13th International Conference on Learning Representations, 2025.
  7. Preprint
    DeepSeek-LLM: Scaling Open-Source Language Models with Longtermism
    Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, Guowei Li, Jiashi Li, Yao Li, Y. K. Li, Wenfeng Liang, Fangyun Lin, Alex X. Liu, Bo Liu, Wen Liu, Xiaodong Liu, Xin Liu, Yiyuan Liu, Haoyu Lu, Shanghao Lu, Fuli Luo, Shirong Ma, Xiaotao Nie, Tian Pei, Yishi Piao, Junjie Qiu, Hui Qu, Tongzheng Ren, Zehui Ren, Chong Ruan, Zhangli Sha, Zhihong Shao, Junxiao Song, Xuecheng Su, Jingxiang Sun, Yaofeng Sun, Minghui Tang, Bingxuan Wang, Peiyi Wang, Shiyu Wang, Yaohui Wang, Yongji Wang, Tong Wu, Y. Wu, Xin Xie, Zhenda Xie, Ziwei Xie, Yiliang Xiong, Hanwei Xu, R. X. Xu, Yanhong Xu, Dejian Yang, Yuxiang You, Shuiping Yu, Xingkai Yu, B. Zhang, Haowei Zhang, Lecong Zhang, Liyue Zhang, Mingchuan Zhang, Minghua Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Qihao Zhu, and Yuheng Zou
    ArXiv preprint, 2024.

2023

  1. JMLR
    TorchOpt: An Efficient Library for Differentiable Optimization
    The Journal of Machine Learning Research, 2023. Abridged in the 36th Conference on Neural Information Processing Systems OPT Workshop, 2022.
    PyTorch Ecosystem Project

2022

  1. NeurIPS
    A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning
    The 36th Conference on Neural Information Processing Systems, 2022.
  2. NeurIPS
    EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine
    The 36th Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022.

2021

  1. NeurIPS
    Neural Auto-Curricula in Two-Player Zero-Sum Games
    The 35th Conference on Neural Information Processing Systems, 2021.

2020

  1. AAMAS (Oral)
    Learning Correlated Communication Topology in Multi-Agent Reinforcement Learning
    The 20th International Conference on Autonomous Agents and Multiagent Systems, 2021.
    Oral Presentation
Last updated: May 23, 2026.