Hi, my name is

Yiqi "Viscent" Zhang (张懿麒)

MLSys / AI infrastructure researcher building scalable LLM systems.

I design AI infrastructure for large language models: training systems, RL post-training pipelines, scheduling, offloading, and heterogeneous serving.

My systems work includes MuxRL for cluster-level RLVR service multiplexing, SpeedLoader for I/O-efficient distributed LLM operation, and SortedRL for efficient RL rollout scheduling.

About Me

I am a Data Science PhD candidate at the NUS HPC-AI Lab, advised by Dr. Yang You. My current research is in MLSys and AI infrastructure: making LLM training, RLVR post-training, inference, and serving efficient across clusters and heterogeneous memory and compute.

I usually work close to the systems boundary: algorithm-system co-design, runtime and scheduler design, data movement, model-state management, and performance debugging. Recent systems projects reflect this path: MuxRL on cluster-level RLVR multiplexing, SpeedLoader (NeurIPS 2024) on heterogeneous/offloaded LLM operation, and SortedRL on rollout scheduling for LLM RL.

Before moving into MLSys, I worked in neuroimaging and computational neuroscience. That earlier path still shapes how I think about noisy data, measurement, and complex systems.

Research focus:
  • AI Infrastructure
  • LLM Training Systems
  • RL Post-training Systems
  • Distributed and Heterogeneous Computing
  • Scheduling and Runtime Systems
  • Model Serving and Offloading
  • Performance Engineering

Experience

Research Intern - Qiji Zhifeng
Jun 2025 - Jan 2026

I worked on infrastructure for large-scale agentic post-training. My projects centered on cluster-level multiplexing for RLVR, service-oriented RL training, and unified model-state management, including MuxRL, Weaver, and the NexRL/Nex-N1 ecosystem.

The core systems theme was to turn post-training into an AI infrastructure problem: scheduling heterogeneous jobs, recycling idle GPUs across workloads, decoupling rollout/training/tool services, and hiding low-level parallelism details from algorithm designers.

Research Intern - Microsoft Research Asia
Jan 2025 - Jun 2025

I developed SortedRL, a system for online length-aware rollout scheduling in LLM reinforcement learning. It exploits output-length distribution during training to reduce rollout bubbles, construct efficient update batches, and maintain policy freshness.

The resulting scheduler reduced rollout scheduling bubbles by about 70% and cut the number of training steps for reasoning-model workloads by about 50%.
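The length-aware idea can be illustrated with a toy sketch (my own simplification for this page, not the actual SortedRL scheduler, which works online and preserves policy freshness): sort finished rollouts by generated length and batch neighbors, so each batch pads only to a nearby maximum length.

```python
# Toy sketch of length-aware rollout batching (illustrative only).

def length_aware_batches(rollouts, batch_size):
    """Group rollouts of similar generated length to reduce padding waste.

    rollouts: list of (rollout_id, generated_length) pairs.
    Returns a list of batches, each a list of rollout_ids.
    """
    ordered = sorted(rollouts, key=lambda r: r[1])  # sort by length
    return [
        [rid for rid, _ in ordered[i:i + batch_size]]
        for i in range(0, len(ordered), batch_size)
    ]

def padding_waste(rollouts, batches):
    """Total padded-but-unused tokens when each batch pads to its max length."""
    length = dict(rollouts)
    waste = 0
    for batch in batches:
        longest = max(length[rid] for rid in batch)
        waste += sum(longest - length[rid] for rid in batch)
    return waste
```

Batching sorted neighbors never pads more than arrival-order batching, since each batch's maximum length stays close to its members' lengths.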

PhD Researcher - NUS HPC-AI Lab
Nov 2021 - Present

My PhD research focuses on systems for LLM training and inference under constrained accelerator memory. In the NeurIPS 2024 paper SpeedLoader, I redesigned data movement for offloaded and sharded LLM operation across heterogeneous hardware.

The broader goal is to make large-model infrastructure less bottlenecked by memory hierarchy, communication, and runtime scheduling overhead.
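The overlap idea can be sketched in miniature (a simplified illustration, not SpeedLoader's actual runtime, which manages sharded model state across heterogeneous memory): double-buffered prefetching loads the next data chunk while the current one is being computed on, hiding transfer latency behind compute.

```python
# Toy sketch of double-buffered prefetching (illustrative only).
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(chunks, load, compute):
    """Overlap loading chunk i+1 with computing on chunk i.

    chunks must be non-empty; load simulates a host-to-device transfer,
    compute simulates the on-device work for one chunk.
    """
    results = []
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(load, chunks[0])
        for nxt in chunks[1:]:
            current = pending.result()       # wait for the buffered chunk
            pending = io.submit(load, nxt)   # prefetch next while computing
            results.append(compute(current))
        results.append(compute(pending.result()))
    return results
```

In a real system the "load" would be an asynchronous host-device copy on a separate stream; the structure of the overlap is the same.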

Graduate Researcher - KCL IoPPN
Apr 2022 - Jan 2023

Earlier in my research path, I worked on neuroimaging systems and medical image reconstruction. At King’s College London’s Institute of Psychiatry, Psychology & Neuroscience, I designed an explainable super-resolution framework for ultra-low-field MRI and applied Bayesian model selection to choose reliable models at inference time.

Student Researcher - SUSTech MEDICAL Lab
Jan 2021 - Jan 2022

I contributed to medical image augmentation for domain generalization, using statistical evidence to maximize data diversity and improve model generalizability. This work sits earlier in my ML trajectory, before my current focus on LLM systems and AI infrastructure.

Student Researcher - SUSTech CCSE
Jan 2019 - Sept 2023

This was my foundation in high-performance computing and performance engineering. I benchmarked new devices and software stacks, optimized scientific applications such as Relion, Presto, and DLRM, and ported the discrete unified gas kinetic scheme to heterogeneous platforms with up to 16x speedup.

Education

Jan 2023 - Present
PhD in Data Science
National University of Singapore
Advisor: Dr. Yang You.

Sept 2021 - Sept 2022
MSc in Neuroscience
King's College London
Grade: Distinction
Advisor: Prof. Rosalyn Moran.

Sept 2018 - Jun 2022
BSc in Bioscience
Southern University of Science and Technology
Grade: Summa Cum Laude
Advisor: Prof. Shengtao Hou. Joint program with King’s College London.

Publications

MuxRL: Cluster-Level Multiplexing for Unified LLM Services in RLVR
AI Infra RLVR

Manuscript under submission, 2026

AI infrastructure work for RLVR workloads. MuxRL is a cluster-level runtime that multiplexes unified LLM services across RLVR jobs by centrally managing model placement, state transitions, and function-level scheduling under strict affinity constraints, improving effective cluster capacity and reducing user GPU-hour cost by up to 37.58%.
SortedRL: Accelerating RL Training for LLMs through Online Length-aware Scheduling
LLM RL Scheduling

ICML 2025 ES-FoMo III Workshop / arXiv 2026

Online length-aware scheduling work for LLM RL. SortedRL treats rollout as the systems bottleneck, dynamically groups trajectories by generated length, supports large rollout batches with flexible update batches, and reduces RL training bubble ratios by over 50% while preserving or improving reasoning performance.
SpeedLoader: An I/O-efficient Scheme for Heterogeneous and Distributed LLM Operation
AI Infra Offloading

NeurIPS 2024

AI infrastructure work for memory-constrained LLM operation. SpeedLoader redesigns data movement across heterogeneous memory and sharded workers, improving effective compute utilization and reducing training and inference overhead in distributed LLM workloads.
AADG: Automatic Augmentation for Domain Generalization on Retinal Image Segmentation

IEEE Transactions on Medical Imaging

Earlier ML and medical-imaging work on automatic augmentation for domain generalization. AADG samples augmentation policies that create novel domains and diversify retinal image segmentation training data across fundus and OCTA datasets.
40 Hz Light Flicker Alters Human Brain EEG Microstates and Complexity

Frontiers in Neuroscience

Neuroscience work from my earlier research path. This study used 64-channel EEG to investigate how 40 Hz flicker stimulation changes oscillation power, microstate dynamics, and complexity in healthy young adults.

Achievements

President's Graduate Scholarship
National University of Singapore, 2023
Summa Cum Laude
Southern University of Science and Technology, 2022
3rd Place Winner of ISC22
ISC22 Student Cluster Competition, 2022
Overall Champion of 4th APAC HPC-AI Competition
HPC-AI Advisory Council, 2021
AI Special Prize of 4th APAC HPC-AI Competition
HPC-AI Advisory Council, 2021
Highest Linpack Benchmark Winner
SC21 Virtual Student Cluster Competition, ACM SIGHPC/IEEE, 2021
1st Prize of ASC20-21
ASC20-21 Student Supercomputer Challenge, 2021
3rd Prize of 3rd APAC HPC-AI Competition
HPC-AI Advisory Council, 2020
Accelerated Computing C++/Python
Certified by NVIDIA DLI, 2020

Get In Touch

I am always happy to talk about MLSys, AI infrastructure, LLM training, RL post-training systems, model serving, or research collaborations.