Lijun Wu

shlab

About me

Keep Learning & Be Positive!

Lijun Wu is a Young Scientist/Researcher at Shanghai AI Laboratory. Previously, he was a Research Scientist at ByteDance and a Senior Researcher at Microsoft Research. He received his Ph.D. from the School of Data and Computer Science at Sun Yat-sen University (SYSU), where he was a member of the joint Ph.D. program between SYSU and MSRA, advised by Dr. Tie-Yan Liu and Prof. Jianhuang Lai. He was awarded the MSRA Ph.D. Fellowship. His team won first place in 8 translation directions in the WMT19 machine translation competition.

His research interests include AI/Large Language Models (e.g., AI4data, SFT, post-training) and AI4Science (e.g., LLM4Science, Drug Discovery). He has extensive experience in neural machine translation. He has published many papers in top conferences and journals, such as ICLR, NeurIPS, ACL, and TPAMI. He has served as AC/SPC for top-tier conferences, e.g., ACL, EMNLP, NAACL, AAAI, and IJCAI.

We are hiring AI researchers working on LLM and AI4Science. Drop me an email if you are interested!

Highlights

MSRA Ph.D. Fellowship in 2018 (one of 12 recipients from Asia-Pacific universities)
First Chinese-English machine translation system to achieve human parity, in 2018
WMT 2019 champion in 8 translation directions
OGB-LSC@KDD Cup 2021 runner-up
1st/2nd in Language+Molecule@ACL2024 shared tasks!
(Tech Transfer) R-Drop successfully applied to 20+ language translations in Microsoft Translator and Meituan products
(Tech Transfer) CTRec successfully applied in Tencent news recommendation

News

🔥2024.11 Our TamGen is accepted by Nature Communications!
🔥2024.7 Super excited that our BioT5+ achieves 1st/2nd in Language+Molecule@ACL2024 shared tasks!
🔥2024.3 We have released an AI4Science Research Project page covering multiple research projects. Check it out if you are interested!
2025.1 Three papers are accepted by ICLR-2025, including FABFlex, the extension of FABind/FABind+ to the flexible docking scenario, and 3D-MolT5, the extension of BioT5/BioT5+ to 3D molecular space.
2025.1 One paper about LLM Hallucination is accepted by NAACL-2025 Findings.
2024.12 Our FABind+, a much stronger extension of FABind, is accepted by KDD-2025.
2024.10 Congratulations! Hot Pluggable Federated Learning received the Outstanding Student Paper Award at the FL@FM-NeurIPS’24 workshop!
2024.10 I am honored to serve as Area Chair for the AAAI-2025 AI4Science workshop.
2024.9 One paper about protein sequence representation learning is accepted by EMNLP-2024.
2024.9 One paper about quantum Hamiltonian prediction is accepted by NeurIPS-2024.
2024.9 One paper about federated learning is accepted as an oral presentation by the FL@FM-NeurIPS’24 workshop!
2024.7 Our winning solution for the Language+Molecule@ACL2024 shared task is accepted as an oral presentation!
2024.7 Our kNN-DTA about drug-target affinity prediction is accepted by CIKM-2024.

Surveys/Reports

🔥2024.3 We have released a comprehensive survey, Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey. Check it out!
🔥2023.11 We have released a report on Large Language Models (GPT-4) on Scientific Discovery. Check it out!
🔥2022.4 We have released a comprehensive survey about Non-Autoregressive Generation for Neural Machine Translation and Beyond. Check it out!

Awesome Repos

Selected Research

AI/LLM
- R-Drop, UniDrop (unified dropout) (dropout and sub-model consistency)
- RL4NMT (the first RL for NMT survey, early version of RLHF), BERT-NMT (BERT for NMT), Mono-NMT (the first large-scale monolingual data for NMT)
AI/LLM4Science
- BioT5, BioT5+, 3D-MolT5 (pre-trained LLM for bio-chemistry)
- FABind, FABind+, FABFlex (Fast and Accurate for Protein-Ligand Binding)