Hi! I am Tianyi :)
I conduct machine learning research on AI safety and alignment, with a special focus on how AI interacts with societal systems and human moral progress.
I was a research intern at the Center for Human-Compatible AI, UC Berkeley. I am currently a senior at Peking University, where I am a member of the Turing Class in Computer Science and of the PKU Alignment Team.
Feel free to check out my CV.
My answer to the Hamming question (“What are the most important problems [that you should probably work on]?”)
How do we prevent the worst-case negative impacts of AI on human social epistemology? How do we avoid the premature lock-in of our current values into AI systems and, on a more positive note, potentially facilitate societal-scale moral progress?
How do we combine theory and experimental validation to help resolve fundamental disagreements in the field of AI safety and alignment (e.g., those on misgeneralization and deceptive alignment), much as physicists resolve theirs?
How do we discover currently neglected challenges facing AI safety and alignment?
I strive to become a “full-stack researcher”, hoping to combine experimental, mathematical, and interdisciplinary methods to approach these problems.
You can head over to my Google Scholar profile to view my other publications and citation stats!
Project: Progress alignment (to prevent premature value lock-in)
Risk (premature value lock-in): Frontier AI systems hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale.
Solution (progress alignment): We introduce progress alignment as a technical solution to mitigate this imminent risk. Progress alignment algorithms learn to emulate the mechanics of human moral progress, thereby addressing the susceptibility of existing alignment methods to contemporary moral blindspots.
Infrastructure (ProgressGym): To empower research in progress alignment, we introduce ProgressGym, an experimental framework for learning the mechanics of moral progress from history, in order to facilitate future progress in real-world moral decisions. Leveraging 9 centuries of historical text and 18 historical LLMs, ProgressGym codifies real-world progress alignment challenges into concrete benchmarks. (Hugging Face, GitHub, Leaderboard)
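For a flavor of how the historical LLMs can be queried, here is a minimal sketch using the standard Hugging Face transformers API. The model identifier below is purely illustrative rather than an exact repository name; please refer to the Hugging Face and GitHub links above for the actual interfaces and model names.

```python
# Minimal, illustrative sketch: querying a ProgressGym historical LLM about a moral question.
# The model ID is hypothetical -- check the ProgressGym Hugging Face page for the real names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PKU-Alignment/ProgressGym-HistLlama-13th-century"  # hypothetical identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Is it morally acceptable to lend money at interest?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Comparing answers across century-specific models is one way such benchmarks can probe how moral views shift over time.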
ProgressGym: Alignment with a Millennium of Moral Progress (NeurIPS'24 Spotlight, Dataset & Benchmark Track)
Tianyi Qiu†*, Yang Zhang*, Xuchuan Huang, Jasmine Xinze Li, Jiaming Ji, Yaodong Yang (2024) (†Project lead, *Equal technical contribution)
Project: Theoretical deconfusion
Classical social choice theory assumes complete information about all preferences of all stakeholders. This assumption fails for AI alignment, as it does for legislation, indirect elections, and similar settings. Dropping it, this work develops the representative social choice formalism, which models social choice decisions made on the basis of only a finite sample of preferences. Its analytical tractability is established via statistical learning theory, while Arrow-like impossibility theorems are also proved.
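Informally, and glossing over the paper's actual notation, the setup can be sketched as follows (a simplified illustration, not the formal statement):

```latex
% Simplified sketch of the representative social choice setup (illustrative notation).
% A finite sample of stakeholder preferences is drawn from the population distribution:
\[
  S = \{ (i_1, q_1, \succ_1), \dots, (i_n, q_n, \succ_n) \} \sim \mathcal{D}^n
\]
% A representative social choice rule maps the sample to a collective decision:
\[
  f(S) = d \in \mathcal{X}
\]
% Statistical learning theory then bounds the gap between how well the decision
% serves the sampled preferences and how well it serves the full population,
% e.g. a uniform-convergence-style guarantee of the form
\[
  \Pr_{S \sim \mathcal{D}^n}\Big[ \sup_{d \in \mathcal{X}}
    \big| \mathrm{sat}_S(d) - \mathrm{sat}_{\mathcal{D}}(d) \big| > \epsilon \Big]
  \le \delta(n, \epsilon),
\]
% while Arrow-like impossibility theorems show that natural combinations of axioms
% cannot be jointly satisfied in this finite-sample regime.
```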
Representative Social Choice: From Learning Theory to AI Alignment (top 5 papers, Oral/Contributed Talk, NeurIPS 2024 Pluralistic Alignment Workshop)
Tianyi Qiu (2024)
It is well known that classical generalization analysis does not work for deep neural networks without prohibitively strong assumptions. This work develops an alternative: an empirically grounded model of reward generalization in RLHF that yields formal generalization bounds while taking fine-grained information topologies into account.
Reward Generalization in RLHF: A Topological Perspective (preprint)
Tianyi Qiu†*, Fanzhi Zeng*, Jiaming Ji*, Dong Yan*, Kaile Wang, Jiayi Zhou, Han Yang, Juntao Dai, Xuehai Pan, Yaodong Yang (2024) (†Project lead, *Equal technical contribution)
Alignment training is easily undone by finetuning. Why? Operating under a compression-theoretic model of multi-stage training, this work proves that further finetuning degrades alignment performance far faster than it degrades pretraining performance, owing to the much smaller amount of alignment training data.
Language Models Resist Alignment (NeurIPS 2024 SoLaR Workshop)
Jiaming Ji*, Kaile Wang*, Tianyi Qiu*, Boyuan Chen*, Jiayi Zhou, Changye Li, Hantao Lou, Yaodong Yang (2024) (*Equal contribution)
Project: Surveying the AI safety & alignment field
Although the alignment field started to undergo rapid growth in early 2023, there had not yet been a comprehensive review article surveying it. We therefore conducted a review that aims to be as comprehensive as possible, while constructing a unified framework (the alignment cycle). We emphasize the alignment of both contemporary AI systems and more advanced systems that pose more serious risks. Since its publication, the survey has been cited by important AI safety works from Dalrymple, Skalse, Bengio, Russell et al. and NIST, and has been featured in various high-profile venues in China and Singapore.
I co-led this project.
AI Alignment: A Comprehensive Survey (preprint)
Jiaming Ji*, Tianyi Qiu*, Boyuan Chen*, Borong Zhang*, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao (2023) (*Equal contribution)
Aug 2020: Won a gold medal in the Chinese National Olympiad in Informatics 2020
Mar 2021: Started as a visiting student at Peking University
Nov 2021: Started reading and thinking a lot about AI safety/alignment
Sept 2022: Officially started as an undergraduate student at Peking University, as a member of the Turing Class
June 2023: Started working with the PKU Alignment Group, advised by Prof. Yaodong Yang
June 2024: Started as a research intern at Center for Human-Compatible AI, UC Berkeley, co-advised by Micah and Cam
Sept 2024: Started as an exchange student at the University of California, via the UCEAP reciprocity program with PKU
June 2026 (est.): Graduation, and hopefully starting my PhD journey :)
[Talk] Value Alignment: History, Frontiers, and Open Problems (May 2024)
[Talk] ProgressGym: Alignment with a Millennium of Moral Progress (May 2024)
[Talk] Towards Moral Progress Algorithms Implementable in the Next GPT (Apr 2024)
Please feel free to reach out! If you are on the fence about getting in touch, consider yourself encouraged to do so :)
I can be reached at the email address qiutianyi.qty@gmail.com, or on Twitter via the handle @Tianyi_Alex_Qiu. You could also book a quick call with me via this Calendly link - it's as simple as a click!