Ahmad Humayun
4th Floor Gilbert Place
220 Gilbert St
Blacksburg, VA 24060
I’m a fifth-year PhD candidate in Computer Science at Virginia Tech, advised by Prof. Muhammad Ali Gulzar, working on automated software testing and security of distributed data-intensive scalable computing (DISC) applications. I also collaborate closely with Prof. Miryung Kim at UCLA.
My research focuses on developing novel methods to improve testing and debugging across two domains: (1) big data analytics applications, targeting DISC frameworks like Apache Spark and Apache Flink, and (2) large language models (LLMs), where I investigate code reasoning capabilities, fault localization, and provenance tracking in federated learning settings. I’ve published my work at top-tier venues including ESEC/FSE and IEEE/ACM ASE. My tools have discovered multiple previously unknown faults in popular distributed frameworks such as Apache Spark, Apache Flink, Polars, and Dask, and my recent work establishes new benchmarking methodologies for evaluating LLM robustness in software engineering tasks.
I recently completed an internship as an Applied Scientist at Amazon Web Services (Summer 2025), where I developed an LLM-powered application to automate the modeling of complex distributed algorithms in low-resource programming languages, deployed both as a standalone application and as an MCP server. Previously, as an Applied Scientist intern at AWS (Summer 2024), I enhanced the automated testing infrastructure of critical AWS services.
news
| Date | News |
|---|---|
| Feb 10, 2026 | 🎉 Our paper “ProToken: Token-Level Attribution for Federated Large Language Models” has been accepted to MLSys 2026! |
| Feb 10, 2026 | 🎉 Our paper “Generating and Understanding Tests via Path-Aware Symbolic Execution with LLMs” has been accepted to ICPC 2026! |
| Jan 26, 2026 | 🎉 We just submitted exciting work on DAG-based fuzzing for dataflow frameworks to ISSTA ’26! Our work has exposed several faults across four different frameworks: Apache Spark, Apache Flink, Dask, and Polars! |
| Aug 15, 2025 | 🚀 I was invited to return for another Applied Scientist internship at AWS! |
| Nov 07, 2024 | 🎖️ Honored to serve on the Program Committee for TaPP 2024 - Workshop on the Theory and Practice of Provenance! |
| Aug 15, 2024 | 🚀 Excited to start my internship at AWS as an Applied Scientist working with Ankush Desai and Aman Goel! |
| Apr 15, 2024 | 🎉 Our paper on natural symbolic execution-based testing for big data analytics has been accepted at ESEC/FSE 2024! |
| Sep 13, 2023 | 🎤 I presented our work on Natural Input Generation for Big Data Analytics at ASE 2023 in Luxembourg! |
| Sep 11, 2023 | 🎤 I presented Co-dependence Aware Fuzzing for Dataflow-based Big Data Analytics at ESEC/FSE 2023 in San Francisco. You can find my talk here! |
| Aug 21, 2023 | 🏆 I was awarded a SIGSOFT grant to present my work at the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023) in Luxembourg. |
| Jul 17, 2023 | 🎉 Our paper on natural input generation for data-intensive applications has been accepted at ASE ’23! |
| May 04, 2023 | 🎉 Our paper on co-dependence aware fuzzing for dataflow-based big data analytics has been accepted at ESEC/FSE 2023! |
selected publications
- MLSys 2026. ProToken: Token-Level Attribution for Federated Large Language Models. Ninth Annual Conference on Machine Learning and Systems, 2026
- ICPC 2026. Generating and Understanding Tests via Path-Aware Symbolic Execution with LLMs. The 34th IEEE/ACM International Conference on Program Comprehension, 2026
- [under review] How Accurately Do Large Language Models Understand Code? Jul 2025