Viscent's Homepage

Hi, my name is

Yiqi "Viscent" Zhang (张懿麒)

MLSys / AI infrastructure researcher for scalable LLM systems.

I design AI infrastructure for large language models: training systems, RL post-training pipelines, scheduling, offloading, and heterogeneous serving.

My systems work includes MuxRL for cluster-level RLVR service multiplexing, SpeedLoader for I/O-efficient distributed LLM operation, and SortedRL for efficient RL rollout scheduling.

Resume

About Me

I am a Data Science PhD candidate at the NUS HPC-AI Lab, advised by Dr. Yang You. My current research is in MLSys and AI infrastructure: making LLM training, RLVR/post-training, inference, and serving efficient across clusters and heterogeneous memory and compute.

I usually work close to the systems boundary: algorithm-system co-design, runtime and scheduler design, data movement, model-state management, and performance debugging. Recent systems projects reflect this path: MuxRL on cluster-level RLVR multiplexing, SpeedLoader (NeurIPS 2024) on heterogeneous/offloaded LLM operation, and SortedRL on rollout scheduling for LLM RL.

Before moving into MLSys, I worked in neuroimaging and computational neuroscience. That earlier path still shapes how I think about noisy data, measurement, and complex systems.

Research focus:

AI Infrastructure
LLM Training Systems
RL Post-training Systems
Distributed and Heterogeneous Computing
Scheduling and Runtime Systems
Model Serving and Offloading
Performance Engineering

Experience

Research Intern - Qiji Zhifeng

Jun 2025 - Jan 2026

I worked on infrastructure for large-scale agentic post-training. My projects centered on cluster-level multiplexing for RLVR, service-oriented RL training, and unified model-state management, including MuxRL, Weaver, and the NexRL/Nex-N1 ecosystem.

The core systems theme was to turn post-training into an AI infrastructure problem: scheduling heterogeneous jobs, recycling idle GPUs across workloads, decoupling rollout/training/tool services, and hiding low-level parallelism details from algorithm designers.

Research Intern - Microsoft Research Asia

Jan 2025 - Jun 2025

SortedRL

I developed SortedRL, a system for online length-aware rollout scheduling in LLM reinforcement learning. It exploits the distribution of output lengths during training to reduce rollout bubbles, construct efficient update batches, and maintain policy freshness.

The resulting scheduler reduced rollout scheduling bubbles by about 70% and cut the number of training steps for reasoning-model workloads by about 50%.

PhD Researcher - NUS HPC-AI Lab

Nov 2021 - Present

SpeedLoader

My PhD research focuses on systems for LLM training and inference under constrained accelerator memory. In the NeurIPS 2024 paper SpeedLoader, I redesigned data movement for offloaded and sharded LLM operation across heterogeneous hardware.

The broader goal is to make large-model infrastructure less bottlenecked by memory hierarchy, communication, and runtime scheduling overhead.

Graduate Researcher - KCL IoPPN

Apr 2022 - Jan 2023

View SR-UNet

Earlier in my research path, I worked on neuroimaging systems and medical image reconstruction. At King’s College London’s Institute of Psychiatry, Psychology & Neuroscience, I designed an explainable super-resolution framework for ultra-low-field MRI and applied Bayesian model selection to choose reliable models at inference time.

Student Researcher - SUSTech MEDICAL Lab

Jan 2021 - Jan 2022

View AADG

I contributed to medical image augmentation for domain generalization, using statistical evidence to maximize data diversity and improve model generalizability. This work sits earlier in my ML trajectory, before my current focus on LLM systems and AI infrastructure.

Student Researcher - SUSTech CCSE

Jan 2019 - Sept 2023

SC Asia talk

This was my foundation in high-performance computing and performance engineering. I benchmarked new devices and software stacks, optimized scientific applications such as Relion, Presto, and DLRM, and ported the discrete unified gas kinetic scheme to heterogeneous platforms, achieving up to a 16x speedup.

Education

Jan 2023 - Present

PhD in Data Science

National University of Singapore

Advisor: Dr. Yang You.

Sept 2021 - Sept 2022

MSc in Neuroscience

King's College London

Grade: Distinction

Advisor: Prof. Rosalyn Moran.

Sept 2018 - Jun 2022

BSc in Bioscience

Southern University of Science and Technology

Grade: Summa Cum Laude

Advisor: Prof. Shengtao Hou. Joint program with King’s College London.

Publications

AI Infra / RLVR

MuxRL: Cluster-Level Multiplexing for Unified LLM Services in RLVR

Manuscript under submission, 2026

AI infrastructure work for RLVR workloads. MuxRL is a cluster-level runtime that multiplexes unified LLM services across RLVR jobs by centrally managing model placement, state transitions, and function-level scheduling under strict affinity constraints, improving effective cluster capacity and reducing user GPU-hour cost by up to 37.58%.

LLM RL / Scheduling

SortedRL: Accelerating RL Training for LLMs through Online Length-aware Scheduling

ICML 2025 ES-FoMo III Workshop / arXiv 2026

Online length-aware scheduling work for LLM RL. SortedRL treats rollout as the systems bottleneck, dynamically groups trajectories by generated length, supports large rollout batches with flexible update batches, and reduces RL training bubble ratios by over 50% while preserving or improving reasoning performance.

AI Infra / Offloading

SpeedLoader: An I/O-efficient Scheme for Heterogeneous and Distributed LLM Operation

NeurIPS 2024

NeurIPS 2024 work on AI infrastructure for constrained LLM operation. SpeedLoader redesigns data movement across heterogeneous memory and sharded workers, improving effective compute utilization and reducing training/inference overhead in distributed LLM workloads.

AADG: Automatic Augmentation for Domain Generalization on Retinal Image Segmentation

IEEE Transactions on Medical Imaging

Earlier ML and medical-imaging work on automatic augmentation for domain generalization. AADG samples augmentation policies that create novel domains and diversify retinal image segmentation training data across fundus and OCTA datasets.

40 Hz Light Flicker Alters Human Brain EEG Microstates and Complexity

Frontiers in Neuroscience

Neuroscience work from my earlier research path. This study used 64-channel EEG to investigate how 40 Hz flicker stimulation changes oscillation power, microstate dynamics, and complexity in healthy young adults.

Achievements

President's Graduate Scholarship

National University of Singapore, 2023

Summa Cum Laude

Southern University of Science and Technology, 2022

3rd Place Winner of ISC22

ISC22 Student Cluster Competition, 2022

Overall Champion of 4th APAC HPC-AI Competition

HPC-AI Advisory Council, 2021

AI Special Prize of 4th APAC HPC-AI Competition

HPC-AI Advisory Council, 2021

Highest Linpack Benchmark Winner

SC21 Virtual Student Cluster Competition, ACM SIGHPC/IEEE, 2021

1st Prize of ASC20-21

ASC20-21 Student Supercomputer Challenge, 2021

3rd Prize of 3rd APAC HPC-AI Competition

HPC-AI Advisory Council, 2020

Accelerated Computing C++/Python

Certified by NVIDIA DLI, 2020

Get In Touch

I am always happy to talk about MLSys, AI infrastructure, LLM training, RL post-training systems, model serving, or research collaborations.

Mail me