Lijun Wu is a Researcher at Shanghai AI Laboratory. Previously, he was a Research Scientist at ByteDance and a Senior Researcher at Microsoft Research. He received his Ph.D. from Sun Yat-sen University (SYSU), where he was a member of the joint Ph.D. program between SYSU and MSRA, advised by Dr. Tie-Yan Liu and Prof. Jianhuang Lai.

His research interests cover LLMs/MLLMs (e.g., data-centric AI, SFT/RL) and AI4Science (e.g., LLM4Science, scientific reasoning). His work has been published in top conferences and journals, such as Nature Communications, Nature Machine Intelligence, TPAMI, NeurIPS, ICML, ICLR, ACL, and KDD, with more than 8,500 citations. He has served as AC/SPC for top conferences, e.g., ICML, ICLR, NeurIPS, ACL, EMNLP, NAACL, AAAI, and IJCAI. He has received numerous prestigious awards, including the 2018 MSRA Ph.D. Fellowship. He secured 8 championships in the WMT2019 competition. He led the team that developed the BioT5 series of multimodal biomolecular models, which have more than 240k downloads and won 1st and 2nd place in the ACL 2024 Language+Molecule Shared Task. He guided students to 2nd place in the 2025 NeurIPS CURE-Bench Internal Reasoning competition. His research innovations have been translated into practical products. Notably, R-Drop was deployed in Microsoft Translator across more than 20 translation tasks and was widely used in business scenarios at companies such as Meituan. Furthermore, he participated in the development of the world's first Chinese-English translation system to achieve human parity, in 2018.

📄 Download CV (PDF)

We are hiring AI researchers working on LLM/MLLM and AI4Science. Contact me if you are interested!

🔥 News

  • 2026.1 🎉 We introduce SciGenBench and ImgCoder for scientific image synthesis, aiming to accelerate the understanding and reasoning of VLMs on scientific visual tasks.
  • 2026.1 🔥🔥🔥 We released the ChartVerse-1.8M dataset for strong chart reasoning, which reached Top 1 on HuggingFace Datasets Trending. Also see the tech report.
  • 2026.1 🔥🔥🔥 The datasets ODA-Math-460k and ODA-Mixture-100k/500k reached the Top 2 on HuggingFace Datasets Trending.
  • 2026.1 🎉 We released the datasets ODA-Math-460k and ODA-Mixture-100k/500k, created under the guidance of OpenDataArena. See the report for details.
  • 2026.1 🎉 We have updated OpenDataArena-Tool with multimodal data training and evaluation support; you can now easily benchmark your multimodal datasets with VLMs.
  • 2025.12 🎉 We released OpenDataArena, the first open, fair, and transparent benchmarking platform for post-training data! Also see our tech report.
  • 2025.11 🎉 Congratulations to my students on achieving 2nd place in the Internal Reasoning Track of CURE-Bench@NeurIPS2025!
  • 2025.11 Invited to serve as Area Chair for ICML-2026.
  • 2025.11 Our Mol-StruTok, a novel tokenization framework for 3D molecule structures, is accepted by KDD-2026.
  • 2025.9 Our Caco is accepted by NeurIPS-2025; it aims to scale reasoning data through code-assisted verification.
  • 2025.8 3 papers are accepted by EMNLP-2025, with topics covering math reasoning and advanced data synthesis. Check CFT, MetaLadder, Middo.
  • 2025.8 Invited to serve as Area Chair for ICLR-2026.

💻 Open-source Projects

  • OpenDataArena, Hugging Face: a fair, open, and transparent arena for data value benchmarking.
  • InternVL, a series of leading VLMs developed by Shanghai AI Laboratory.

📃 Surveys/Repos

📝 Selected Publications

⭐️ LLM/MLLMs

🔬 AI4Science

⌨️ AI

🎖 Honors and Awards

📖 Experience

  • 2025.08-Now, Young Scientist, Shanghai Artificial Intelligence Laboratory
  • 2024.05-2024.08, Research Scientist, ByteDance
  • 2022.07-2024.05, Senior Researcher, MSR AI4Science
  • 2020.06-2022.07, Senior Researcher, MSRA
  • 2014.07-2020.06, Research Intern, MSRA

💬 Academic Services

  • AC: ICML-26, ICLR-26, NeurIPS-25, ACL-21/22/23/24/25, EMNLP-23/24/25, NAACL-22/23/24/25, EACL-24, COLING-23, ARR-21/22/23/24/25
  • SPC: AAAI-22/23/24/25/26, IJCAI-21
  • Conference reviewers: ICLR, ICML, NeurIPS, AAAI, IJCAI, ACL, CVPR, EMNLP, KDD, NAACL, COLING, EACL, AACL
  • Journal reviewers: TPAMI, TASLP, KBS, Neurocomputing, CSL