Leon Liangyu Chen   
Hi! 👋
I'm Leon, a first-year Computer Science Ph.D. student at Stanford University. I am fortunate to work with Prof. Nick Haber, Prof. Ludwig Schmidt, and Prof. Serena Yeung during my rotations.
Before my Ph.D., I completed my undergraduate studies and worked at Nanyang Technological University, advised by Prof. Ziwei Liu. I've also worked with Prof. Alan Yuille and Dr. Zongwei Zhou at Johns Hopkins University.
My research focuses on developing scalable frameworks for training and deploying AI agents. I work on multimodal reasoning agents, data-efficient training methodologies, data collection pipelines, and practical architectures that enable agents to perform complex reasoning tasks. My interests span from foundational model development to building robust pipelines for agent training and evaluation.
I'm passionate about bridging the gap between research and real-world applications of reasoning systems. Whether you're working on agent architectures, exploring collaborative research opportunities, or building practical reasoning applications, I'd love to connect and discuss potential synergies. Feel free to schedule a meeting.
Email / Google Scholar / Semantic Scholar / Github / LinkedIn
Selected Publications
Open Thoughts
Open Thoughts Team
Summary: The first open-source model trained on public reasoning data to match DeepSeek-R1-Distill's performance, developed through 1000+ systematic data curation and synthesis experiments.
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
Alejandro Lozano*, Min Woo Sun*, James Burgess*, Liangyu Chen, Jeffrey J Nirschl, Jeffrey Gu, Ivan Lopez, Josiah Aklilu, Austin Wolfgang Katzer, Collin Chiu, Anita Rau, Xiaohan Wang, Yuhui Zhang, Alfred Seunghoon Song, Robert Tibshirani, Serena Yeung-Levy
Computer Vision and Pattern Recognition (CVPR), 2025
Summary: A large, categorized dataset of 24+ million biomedical image-text pairs spanning multiple domains, with expert-guided annotations.
Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement
Simon Yu*, Liangyu Chen*, Sara Ahmadian, Marzieh Fadaee
ICML Workshop on DataWorld: Unifying Data Curation Frameworks Across Domains, 2025
Summary: Prioritizing global data diversity over local instance quality for instruction data selection, using iterative k-means clustering.
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
Yubo Ma*, Yuhang Zang*, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, Pan Zhang, Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun
Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2024 (Spotlight)
Summary: A challenging long-context multi-modal document understanding benchmark with 1,062 expert-annotated questions requiring cross-page reasoning.
MMInA: Benchmarking Multihop Multimodal Internet Agents
Shulin Tian*, Ziniu Zhang*, Liangyu Chen*, Ziwei Liu
Association for Computational Linguistics (ACL) Findings, 2025
Summary: The first benchmark evaluating AI agents on evolving real-world websites with 1,050 human-written multihop tasks requiring long-range reasoning.
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
Aurora-M Team
International Conference on Computational Linguistics (COLING) Industry Track, 2025
Summary: An open-source multilingual model fine-tuned on human-reviewed safety instructions aligned with the Biden-Harris AI safety executive order.
Benchmarking and Analyzing Generative Data for Visual Recognition
Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, Ziwei Liu
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Summary: An extensive benchmark systematically analyzing generative data across visual recognition tasks with a novel CLER Score metric.
Otter: A Multi-Modal Model with In-Context Instruction Tuning
Bo Li*, Yuanhan Zhang*, Liangyu Chen*, Jinghao Wang*, Fanyi Pu*, Joshua Adrian Cahyono, Jingkang Yang, Ziwei Liu
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Summary: The first large-scale multi-modal instruction tuning dataset, with 2.8 million instruction-response pairs derived from images and videos; a model trained on this dataset achieves state-of-the-art performance on vision-language tasks.
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen*, Bo Li*, Sheng Shen, Jingkang Yang, Chunyuan Li, Kurt Keutzer, Trevor Darrell, Ziwei Liu
Neural Information Processing Systems (NeurIPS), 2023
ICLR Workshop on Mathematical and Empirical Understanding of Foundation Models, 2023
Summary: The first multimodal agent study to use natural language as the communication medium for LLMs to coordinate multiple vision-language models for complex reasoning.
Panoptic Video Scene Graph Generation
Jingkang Yang, Wenxuan Peng, Xiangtai Li, Zujin Guo, Liangyu Chen, Bo Li, Zheng Ma, Wayne Zhang, Kaiyang Zhou, Chen Change Loy, Ziwei Liu
Computer Vision and Pattern Recognition (CVPR), 2023
Summary: Extends scene graph generation from static images to dynamic videos with unified panoptic understanding of objects and stuff.
Making Your First Choice: To Address Cold Start Problem in Vision Active Learning
Liangyu Chen, Yutong Bai, Siyu Huang, Yongyi Lu, Bihan Wen, Alan Yuille, Zongwei Zhou
Medical Imaging with Deep Learning (MIDL), 2023
Radiological Society of North America (RSNA), Abstracts, 2024
NeurIPS Workshop on Human in the Loop Learning, 2022
Summary: A systematic approach to solving the cold start problem in vision active learning using self-supervised contrastive features without labels.
Automatic Calcification Morphology and Distribution Classification for Breast Mammograms with Multi-task Graph Convolutional Neural Network
Hao Du, Melissa Min-Szu Yao, Siqi Liu, Liangyu Chen, Wing P. Chan, Mengling Feng
Journal of Biomedical and Health Informatics (JBHI), 2023
Summary: Modeling spatial and visual relationships among calcifications using graph convolutional networks for breast cancer diagnosis.
Baconian: A Unified Open-source Framework for Model-Based Reinforcement Learning
Linsen Dong, Guanyu Gao, Xinyi Zhang, Liangyu Chen, Yonggang Wen
arXiv preprint, 2020
Summary: A unified open-source framework specifically designed for model-based reinforcement learning research with modular components.
I love teaching AI and was honored to coach Singapore's teams for the first International Olympiad in Artificial Intelligence in 2024. Our two teams excelled on the global stage, securing two of the four gold medals awarded in the Scientific Round.
Reviewer for The Visual Computer, IET Computer Vision, IJCV, TMLR, NeurIPS 2025/23, ICLR 2025, COLM 2025, AAAI 2025, ECCV 2024, CVPR 2024, MLHC 2025, ICCV CVAMD 2023, CVPR CVinW 2023, ICML IMLH 2023/22, NeurIPS GenAI4Health 2024.
I host virtual office hours for anyone who wants to share thoughts on AI research, reading (scientific or otherwise), grad school applications and support, or any other topics of interest. Please schedule via my calendar.