Xiaojun Wan

Professor

Wangxuan Institute of Computer Technology
Peking University
Office: Room 210, No.128 Zhongguancun North Street, Haidian District, Beijing 100080, China
Phone: (86) 10-82529548
Email: wanxiaojun AT pku DOT edu DOT cn


I am a Professor at the Wangxuan Institute of Computer Technology (WICT, formerly the Institute of Computer Science and Technology), Peking University (PKU), China. I received my B.S. in Information Sciences from the Department of Information Management of PKU in 2000, and my M.S. and Ph.D. in Computer Science from the Department of Computer Science and Technology of PKU in 2003 and 2006, respectively.

Research

My research interests include Natural Language Processing and Deep Learning. Previously, I worked broadly on topics including document summarization, text generation, sentiment analysis, semantic parsing, and multilingual and multimodal NLP. Currently, I am interested in NLG evaluation, the faithfulness and safety of LLMs, and cross-modal generation.

Publications

Honors and Awards

  • WangXuan Outstanding Young Scholar Award, 2022
  • Tsang Hin-chi Faculty Fellowship Award, 2019
  • PKU Teaching Excellence Award, 2019
  • PKU Industry-Academy-Research (ChanXueYan) Cooperation Award, 2019
  • Distinguished Paper Award, IJCAI 2018
  • PKU Yang&Wang Faculty Fellowship Award, 2018
  • Outstanding Paper Award, ACL 2017
  • CCF NLPCC Distinguished Young Scientist, 2017
  • CAAI WUWENJUN Technical Innovation Award, 2017
  • IBM Faculty Award, 2015
  • PKU Baosteel Faculty Fellowship Award, 2013
  • PKU WangXuan Young Investigator Award, 2010
  • Best Reviewer, EMNLP 2010
  • Excellent Doctoral Dissertation of PKU, 2009
  • PKU P&G Faculty Fellowship Award, 2008

Selected Academic Services

  • Action Editor: TACL (2019-), ACL Rolling Review (2021-)
  • Editorial Board Member: Journal of Computer Science and Technology (2020-2023), Natural Language Engineering (2019-), Computational Linguistics (2016-2018)
  • PC Chair: EMNLP-IJCNLP 2019
  • Area Chair: IJCAI 2024, ICLR 2024, NeurIPS 2023, ACL 2023, EMNLP 2022 (SAC), EMNLP 2021 (SAC), ACL 2021, NAACL 2021, IJCAI 2021, EACL 2021, EMNLP 2020 (SAC), ACL 2020, AACL/IJCNLP 2020, ACL 2019, EMNLP 2018, NAACL-HLT 2018, IJCNLP 2017, NAACL-HLT 2016, ACL-IJCNLP 2015, ACL 2011, IJCNLP 2011
  • Senior PC Member (SPC): IJCAI 2016/2018~2020/2022~2023, AAAI 2019~2023
  • Best Paper Committee: IJCNLP-AACL 2023, ACL 2023, EMNLP 2015
  • Workshop Organizer: ACL Workshop on Multilingual Modeling (MM-2012)
  • Organization Chair: NLPCC 2020~2023
  • Evaluation Chair: NLPCC 2012~2018

Shared Tasks, Evaluations and Datasets

  • CSS: A New Dataset for Sentence Simplification in Chinese (paper, data)
  • CC-Riddle: A Question Answering Dataset of Chinese Character Riddles (paper, data)
  • BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles (paper, data)
  • CodeQA: A Free-Form Question Answering Dataset for Source Code Comprehension (paper, data)
  • ParaSCI: A Large-Scale Paraphrase Dataset in the Scientific Field (paper, data)
  • Single Document Summarization (Chinese) @ NLPCC 2017 (link, overview paper, data)
  • Sports News Generation from Live Webcast Scripts @ NLPCC-ICCPOL 2016 (link, overview paper, data)
  • Weibo-Oriented Chinese News Summarization @ NLPCC 2015 (link, overview paper, data)
  • Cross-Lingual Sentiment Classification @ NLPCC 2013 (link, overview slides, data)

Selected Systems

  • XiaoKe: A cross-language news generation system that writes scientific news stories about the latest discoveries published in the world's leading science journals. The system has been deployed in China Science Daily; you can see examples here. See the news report to learn more about XiaoKe.
  • XiaoNan: A data-to-text generation system for writing people's livelihood news. The system has been deployed in Southern Metropolis Daily and can write news articles in various domains, including weather reports and train tickets. See examples here.
  • PKUWriter: A system for generating sports news from live text commentary (paper). The technology has been transferred to Toutiao's AI reporter, Xiaomingbot, and further used by Guangmingwang in its GuangMing AI reporter.
  • PKUSUMSUM: A Java platform for unsupervised multilingual document summarization (link)

Teaching

  • Large Language Models and Natural Language Generation (Undergraduate course, Spring 2024)
  • Web Data Mining (Undergraduate course, Fall 2011~2023)
  • Semantic Computing and Knowledge Retrieval (Postgraduate course, Fall 2011~2014, Spring 2016~2022)
  • Social Media Measurement (Postgraduate course, co-taught with Prof. Xiuli Wang, Spring 2016, 2018~2020)

Students

  • PhD students: Hui Liu, Xinyu Hu, Junzhe Zhang, Xu Zhang, Baizhou Huang, Fan Xu
  • Master's students: Xiang Chen, Zhaohong Wan, Mingqi Gao, Xunjian Yin
  • Visiting students/Interns: Jie Ruan (PKU), Liming Yang (THU), Xiao Pu (PKU), Huixuan Zhang (PKU), Li Lin (PKU), Yun Lin (PKU), Zhenliang Zhang (BUAA), Yixiang Liu (PKU), Jing Xiong (PKU), Jiatao Li (PKU)

Alumni

  • PhD: Xiaojiang Huang (Microsoft), Jin-ge Yao (MSRA), Xinjie Zhou (Microsoft), Jiwei Tan (Alibaba), Ke Wang (Huawei), Zi Chai (ByteDance)
  • MS: Tengfei Ma (PhD@Tokyo U), Liqiang Guo (Microsoft), Houping Jia (IBM), Shanshan Huang (China Life Insurance), Xuewei Tang (WeCash), Yue Hu (Alibaba), Su Yan (Alibaba), Yantao Du (Google), Shiyang Wen (Alibaba), Yang Yu (A Certain Unit), Xun Zhang (Alibaba), Jianmin Zhang (Pony.ai), Kui Xu (JD), Junjie Cao (Alibaba), Tianming Wang (Alibaba), Hongyu Zang (Microsoft), Mengyu Zhang (Huawei), Lixin Liu (Alibaba), Zhiwei Yu (MSRA), Minghao Chen (Meituan), Zefeng Lin (IA, CAS), Xinyu Xing (ByteDance), Hanqi Jin (Alibaba), Yue Cao (Meituan), Shaowei Yao (Alibaba), Yajie Ye (ByteDance), Yuanyuan Zhao (Shandong Provincial Department of Education), Yitao Cai (ByteDance), Zhe Lin (Alibaba), Sheng Xu (Meituan), Renliang Sun (PhD@Waterloo U), Zhixian Yang (Baidu)
  • I have also worked with many diligent, honest and talented undergraduate students and visiting graduate students.

Links


Last Updated: Mar. 2024.