Research Summary
I have broad interests in computer architecture and systems. My research focuses on designing more efficient and resilient system architectures. My current research topics include:
- Large-scale Datacenter Optimization: How to enhance the shared datacenters in …
- Efficient Management Facilities: How to eliminate the additional costs of …
- Resilient Architecture Design: How to design low-cost hardware fault-tolerance architectures for complex LLM training and serving workloads in future AI infrastructure? [Ongoing]
Research Projects
- Workload-aware Reliability Enhancement of AI Infrastructure. (Alibaba Innovative Research Program, 2025, Project Leader)
- Software-Hardware Co-optimization for Performance Bottleneck in AI Infrastructure. (Alibaba Innovative Research Program, 2024, Project Leader)
- Performance Bottleneck Diagnosis Framework of Cloud-Native Infrastructure. (Alibaba Innovative Research Program, 2023, Project Leader)