Lijun Wu is a Researcher at Shanghai AI Laboratory and an adjunct Ph.D. supervisor at Shanghai Jiao Tong University, Fudan University, and Zhongguancun Academy. Previously, he was a Research Scientist at ByteDance Seed and a Senior Researcher at Microsoft Research. He was a member of the joint Ph.D. program between Sun Yat-sen University and MSRA, advised by Dr. Tie-Yan Liu and Prof. Jianhuang Lai.

His research interests cover LLMs/MLLMs and AI4Science. His work has been published in Nature Communications, Nature Machine Intelligence, TPAMI, NeurIPS, ICML, ICLR, ACL, KDD, and other venues, with nearly 10,000 citations. He served as Evaluations and Datasets Track Chair for NeurIPS 2026. He has also served as AC for ICML, ICLR, NeurIPS, KDD, ACL, EMNLP, NAACL, and other conferences. He has received numerous prestigious awards, including the 2018 MSRA Ph.D. Fellowship. He won 8 championships in the WMT2019 Competition. He led the team that created the BioT5 series of multimodal biomolecular models, which have more than 300k downloads and won 1st and 2nd place in the ACL 2024 Language+Molecule Shared Task. He directed the development of OpenDataArena, the world's first platform for benchmarking the value of LLM post-training data. His team secured 2nd place in the 2025 NeurIPS CURE-Bench Internal Reasoning Competition. His research innovations have been translated into practical products. Notably, R-Drop was deployed in Microsoft Translator across more than 20 translation tasks and has been widely used in business scenarios at companies such as Tencent, Baidu, and Meituan. Furthermore, he participated in the development of the world's first Chinese-English translation system to achieve human parity in 2018.

📄 Download CV (PDF)

We are hiring AI researchers working on LLM/MLLM and AI4Science. Contact me if you are interested!

🔥 News

  • 2026.3 We release the second version of OpenDataArena-Scored-Data to support data-centric research.
  • 2026.2 🎉 Invited to serve as Evaluations and Datasets Track Chair for NeurIPS-2026!
  • 2026.2 🔥 We release MMFineReason! Our 4B VLM achieves the performance of 30B models! The superior reasoning dataset MMFineReason-1.8M is also released and has reached Top 2 on HuggingFace Datasets Trending! See the tech report.
  • 2026.2 Invited to serve as Area Chair for KDD-2026.
  • 2026.1 We introduce SciGenBench and ImgCoder for scientific image synthesis, aiming to accelerate the understanding and reasoning of VLMs on scientific visual tasks.
  • 2026.1 Three papers are accepted by ICLR-2026, covering long-context modeling, the composition of SFT data mixtures, and LLMs for scientific reasoning.
  • 2026.1 🔥 We release the ChartVerse-1.8M dataset for strong chart reasoning, which has reached Top 1 on HuggingFace Datasets Trending! Also see the tech report.
  • 2026.1 🔥 The datasets ODA-Math-460k and ODA-Mixture-100k/500k have reached Top 2 on HuggingFace Datasets Trending!
  • 2026.1 🔥 We release the superior datasets ODA-Math-460k and ODA-Mixture-100k/500k, created with guidance from OpenDataArena. See the report for details.
  • 2026.1 We have updated OpenDataArena-Tool with multimodal data training and evaluation support.
  • 2025.12 🔥 We release OpenDataArena, the first benchmarking platform for post-training data! See our tech report.
  • 2025.11 🎉 Congratulations to my students Qizhi Pei, Yi Duan, Honglin Lin, Yu Li, and Xin Gao on achieving 2nd place in the Internal Reasoning Track of CURE-Bench@NeurIPS2025!

💻 Projects & Models & Datasets

  • BioT5/BioT5+, Hugging Face, multimodal biomolecular foundation models, HF models >300k downloads!
  • OpenDataArena, Hugging Face, the first Arena for post-training data value benchmarking in the world. OpenDataArena-Scored-Data has >20k downloads!
  • MMFineReason-2.3M/1.8M, Hugging Face, high-quality reasoning dataset for VLM, HF datasets Trending Top 2, >20k downloads!
  • ChartVerse-SFT-1.8M/600k, Hugging Face, synthetic chart QA dataset for chart reasoning, HF datasets Trending Top 1, 10k downloads!
  • ODA-Math-460k, Hugging Face, strong math reasoning dataset for LLM, HF datasets Trending Top 2, >10k downloads!
  • ODA-Mixture-100k/500k, Hugging Face, strong general reasoning dataset for LLM, HF datasets Trending Top 2, >15k downloads!
  • InternVL, a series of leading VLMs developed by Shanghai AI Laboratory.

📃 Repos

πŸ“ Selected Publications

⭐️ LLM/VLM

🔬 AI4Science

⌨️ AI

🎖 Selected Honors and Awards

📖 Experience

  • 2024.08-Now, Young Scientist, Shanghai Artificial Intelligence Laboratory
  • 2024.05-2024.08, Research Scientist, ByteDance Seed
  • 2022.07-2024.05, Senior Researcher, MSR AI4Science
  • 2020.06-2022.07, Senior Researcher, MSRA
  • 2014.07-2020.06, Research Intern, MSRA

💬 Academic Services

  • PC: Evaluations and Datasets Track Chair for NeurIPS-26
  • AC: KDD-26, ICML-26, ICLR-26, NeurIPS-25, ACL-21/22/23/24/25, EMNLP-23/24/25, NAACL-22/23/24/25, EACL-24, COLING-23, ARR-21/22/23/24/25
  • SPC: AAAI-22/23/24/25/26, IJCAI-21
  • Conference reviewers: ICLR, ICML, NeurIPS, AAAI, IJCAI, ACL, CVPR, EMNLP, KDD, NAACL, COLING, EACL, AACL
  • Journal reviewers: TPAMI, TASLP, KBS, Neurocomputing, CSL