About me
I serve as an algorithm expert at Alibaba’s Tongyi Laboratory. I received my Ph.D. from Fudan University in 2023, under the supervision of Professor Jianqing Fan and Professor Zhongyu Wei. Before that, I received my Bachelor’s degree in Mathematics and Applied Mathematics from Fudan University. During my Ph.D., I also spent time with MReaL (Machine Reasoning and Learning) at Nanyang Technological University, advised by Hanwang Zhang.
My primary research interests lie in vision-language research, including large vision-language models and multimodal agents. Currently, I focus on Qwen-VL.
We are hiring self-motivated research interns to work on Video LLMs, multimodal agents, and embodied AI. Feel free to contact me.
Professional Service
- Area Chair: ACL 2023/2024, EMNLP 2024
- Reviewer: NeurIPS, ICML, ICLR, ACL, EMNLP, IJCAI, AAAI
Honors and Awards
- Alibaba Star (Top 1%), Alibaba Group, 2023
- ByteDance Scholarship (10 in China), 2021
- National Scholarship for Doctoral Students (Top 1%) in Fudan University, Ministry of Education, 2019
Preprint
Qwen2-VL Technical Report
Core Contributor.
Technical Report. [code]
Qwen2 Technical Report
Technical Report. [code]
AI Hospital: Interactive Evaluation and Collaboration of LLMs as Intern Doctors for Clinical Diagnosis
Zhihao Fan, Jialong Tang, Wei Chen, Siyuan Wang, Zhongyu Wei, Jun Xie, Fei Huang, Jingren Zhou.
arXiv. [code]
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
Siyuan Wang, Zhuohan Long, Zhihao Fan, Zhongyu Wei.
arXiv. [code]
Selected Publications
(For the full list, see Google Scholar.)
Reform-Eval: Evaluating Large Vision Language Models via Unified Re-formulation of Task-Oriented Benchmarks
Zejun Li, Ye Wang, Mengfei Du, Qingwen Liu, Binhao Wu, Jiwen Zhang, Chengxing Zhou, Zhihao Fan, Jie Fu, Jingjing Chen, Xuanjing Huang, Zhongyu Wei.
ACM MM 2024. [code]
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
Tong Wu*, Zhihao Fan* (Equal Contribution), Xiao Liu, Hai-Tao Zheng, Yeyun Gong, Jian Jiao, Juntao Li, Zhongyu Wei, Jian Guo, Nan Duan, Weizhu Chen.
NeurIPS 2023. [code]
Unifying Cross-Lingual and Cross-Modal Modeling Towards Weakly Supervised Multilingual Vision-Language Pre-training
Zejun Li, Zhihao Fan, Jingjing Chen, Qi Zhang, Xuanjing Huang, Zhongyu Wei.
ACM MM 2022. [code]
Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval
Zhihao Fan, Zhongyu Wei, Zejun Li, Siyuan Wang, Jianqing Fan.
NAACL 2022 Findings. [code]
MVP: Multi-Stage Vision-Language Pre-training via Multi-Level Semantic Alignment
Zejun Li, Zhihao Fan, Huaixiao Tou, Zhongyu Wei.
ACM MM 2022. [code]
TCIC: Theme Concepts Learning Cross Language and Vision for Image Captioning
Zhihao Fan, Zhongyu Wei, Siyuan Wang, Ruize Wang, Zejun Li, Haijun Shan, Xuanjing Huang.
IJCAI 2021. [code]
Mask Attention Networks: Rethinking and Strengthen Transformer
Zhihao Fan, Yeyun Gong, Dayiheng Liu, Zhongyu Wei, Siyuan Wang, Jian Jiao, Nan Duan, Xuanjing Huang.
NAACL 2021. [code]
Bridging by Word: Image Grounded Vocabulary Construction for Visual Captioning
Zhihao Fan, Zhongyu Wei, Siyuan Wang, Xuanjing Huang.
ACL 2019. [code]
A Reinforcement Learning Framework for Natural Question Generation Using Bi-Discriminators
Zhihao Fan, Yeyun Gong, Dayiheng Liu, Zhongyu Wei, Siyuan Wang, Jian Jiao, Nan Duan, Xuanjing Huang.
COLING 2018.
A Question Type Driven Framework to Diversify Visual Question Generation
Zhihao Fan, Zhongyu Wei, Piji Li, Yanyan Lan, Xuanjing Huang.
IJCAI 2018.
Experience
- Alibaba. Tongyi Lab. Sep 2023 - Present
  - Algorithm Expert.
  - Focus:
    - Large Vision-Language Model
    - Multimodal Agents
- Fudan University. DISC Lab. Oct 2016 - Sep 2023
  - Graduate Research Assistant. Advisor: Zhongyu Wei
  - Focus:
    - Vision-Language Generation and Retrieval
    - Vision-Language Pre-training
- Nanyang Technological University. MReaL Lab. Oct 2022 - Apr 2023
  - Visiting Scholar. Advisor: Hanwang Zhang
  - Focus: Diffusion Model
- Microsoft Research Asia. NLC Group. Sep 2019 - Mar 2020
  - Research Intern. Advisor: Yeyun Gong
  - Focus: Transformer Architecture