Ahmad Humayun
4th Floor Gilbert Place
220 Gilbert St
Blacksburg, VA 24060
I’m a fifth-year PhD candidate in Computer Science at Virginia Tech, advised by Prof. Muhammad Ali Gulzar, working on automated software testing and security of distributed data-intensive scalable computing (DISC) and Machine Learning (ML) applications. I also collaborate closely with Prof. Miryung Kim at UCLA.
My research focuses on developing novel methods to improve testing and debugging across two domains: (1) big data analytics applications, targeting DISC frameworks like Apache Spark and Apache Flink, and (2) large language models (LLMs), where I investigate code reasoning capabilities, fault localization, and provenance tracking in federated learning settings. I’ve published my work at top-tier venues including ESEC/FSE and IEEE/ACM ASE. My tools have discovered multiple previously unknown faults in popular distributed frameworks such as Apache Spark, Apache Flink, Polars, and Dask. My recent work introduces token-level attribution techniques for federated LLMs to enable debugging, malicious client detection, and trust verification in collaborative learning environments.
I recently completed an internship as an Applied Scientist at Amazon Web Services (Summer 2025), where I developed an LLM-powered application to automate the modeling of complex distributed algorithms in low-resource programming languages, deployed both as a standalone application and an MCP server. Previously, as an Applied Scientist intern at AWS (Summer 2024), I enhanced the automated testing infrastructure of critical AWS Services.
news
| Feb 10, 2026 | 🎉 Our paper “ProToken: Token-Level Attribution for Federated Large Language Models” has been accepted to MLSys 2026! |
|---|---|
| Feb 10, 2026 | 🎉 Our paper “Generating and Understanding Tests via Path-Aware Symbolic Execution with LLMs” has been accepted to ICPC 2026! |
| Jan 26, 2026 | 🎉 We just submitted exciting work on DAG-based fuzzing for Dataflow Frameworks to ISSTA ’26! Our work has exposed several faults across four different frameworks: Apache Spark, Apache Flink, Dask, and Polars! |
| Aug 15, 2025 | 🚀 I received an offer to return for another Applied Science internship at AWS! |
| Nov 07, 2024 | 🎖️ Honored to serve on the Program Committee for TaPP 2024 - Workshop on the Theory and Practice of Provenance! |
| Aug 15, 2024 | 🚀 Excited to start my internship at AWS as an Applied Scientist working with Ankush Desai and Aman Goel! |
| Apr 15, 2024 | 🎉 Our paper on natural symbolic execution-based testing for big data analytics has been accepted at ESEC/FSE 2024! |
| Sep 13, 2023 | 🎤 I presented our work on Natural Input Generation for Big Data Analytics at ASE 2023 in Luxembourg! |
| Sep 11, 2023 | 🎤 I presented Co-dependence Aware Fuzzing for Dataflow-based Big Data Analytics at ESEC/FSE 2023 in San Francisco. You can find my talk here! |
| Aug 21, 2023 | 🏆 I was awarded a SIGSOFT grant to present my work at the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023) in Luxembourg. |
| Jul 17, 2023 | 🎉 Our paper on natural input generation for data-intensive applications has been accepted at ASE ’23! |
| May 04, 2023 | 🎉 Our paper on co-dependence aware fuzzing for dataflow-based big data analytics has been accepted at ESEC/FSE 2023! |
selected publications
- MLSys 2026. ProToken: Token-Level Attribution for Federated Large Language Models. Ninth Annual Conference on Machine Learning and Systems, 2026
- ICPC 2026. Generating and Understanding Tests via Path-Aware Symbolic Execution with LLMs. The 34th IEEE/ACM International Conference on Program Comprehension, 2026
- [under review] How Accurately Do Large Language Models Understand Code? Jul 2025