Publications
I specialize in developing principled, controllable and efficient generative models for various modalities.
Continuous Diffusion Scales Competitively with Discrete Diffusion for Language
Zhihan Yang, Wei Guo, Shuibai Zhang, Subham Sekhar Sahoo, Yongxin Chen, Arash Vahdat, Morteza Mardani†, John Thickstun† (Joint Senior Authors)
arXiv, 2026
arxiv / website / NV research page
We establish the first scaling law for continuous diffusion language models that rivals discrete DLMs.
Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models
Shuibai Zhang*, Caspian Zhuang*, Chihan Cui*, Zhihan Yang, Zhangzhi Peng, Yanxin Zhang, Haoyue Bai, Zack Jia, Yang Zhou, Guanhua Chen, Ming Liu
Under review at COLM 2026
arxiv / code
We establish EC routing as a superior paradigm for DLM MoE models.
Scaling Beyond Masked Diffusion Language Models
Subham Sekhar Sahoo, Jean-Marie Lemercier*, Zhihan Yang*, Justin Deschenaux* (Joint Second Authors), Jingyu Liu, John Thickstun, Ante Jukić
ICML 2026
arxiv / code / website / twitter / my talk
We demonstrate that uniform-state diffusion can outperform masked diffusion on likelihood evaluation benchmarks and GSM8K. I led the full SFT pipeline for AR, MDLM, and Eso-LMs.
Esoteric Language Models: A Family of Any-Order Diffusion LLMs
Zhihan Yang*, Subham Sekhar Sahoo* (Joint First Authors), Yash Akhauri†, Johnna Liu†, Deepansha Singh†, Zhoujun Cheng† (Joint Second Authors), Zhengzhong Liu, Eric Xing, John Thickstun, Arash Vahdat
ICML 2026, Oral at ICLR 2026 - Workshop on Multimodal Intelligence
arxiv / code / website / twitter / my talk
We are the first to propose exact KV-caching and exact likelihood for masked diffusion language models.
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Marianne Arriola, Aaron Gokaslan, Justin Chiu, Zhihan Yang, Zhixuan Qi, Jiaqi Han, Subham Sahoo, Volodymyr Kuleshov
ICLR (Oral), 2025
arxiv / code / website
We introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models.
Adversarial Bandits for Drawing Generalizable Conclusions in Non-Adversarial Experiments: An Empirical Study
Zhihan Yang, Shiyue Zhang, Anna Rafferty
EDM (Short Paper), 2022
arxiv / code / website
We empirically analyse how adversarial bandit algorithms can enhance the reliability of conclusions drawn from large-scale educational experiments.
Hierarchical Reinforcement Learning under Mixed Observability
Hai Nguyen*, Zhihan Yang* (Joint First Authors), Andrea Baisero, Xiao Ma, Robert Platt†, Christopher Amato† (Joint Senior Authors)
WAFR, 2022
arxiv / website
We present a hierarchical RL framework that handles mixed observability settings, enabling modular policies that scale to complex robotic tasks.
Recurrent Off-policy Baselines for Memory-based Continuous Control
Zhihan Yang*, Hai Nguyen* (Joint First Authors)
Deep RL Workshop @ NeurIPS, 2021
arxiv / code
We establish strong recurrent off-policy baselines for tasks requiring long-term memory.
Game Level Clustering and Generation using Gaussian Mixture VAEs
Zhihan Yang, Anurag Sarkar, Seth Cooper
AIIDE (Oral), 2020
arxiv
We leverage the Gaussian-Mixture VAE framework to cluster game levels in an unsupervised manner and synthesize novel game levels from the learned clusters.