Ahmad Humayun

4th Floor Gilbert Place

220 Gilbert St

Blacksburg, VA 24060

ahmad35@vt.edu
LinkedIn
GitHub
Google Scholar
CV

I’m a fifth-year PhD candidate in Computer Science at Virginia Tech, advised by Prof. Muhammad Ali Gulzar, working on automated software testing and security of distributed data-intensive scalable computing (DISC) applications. I also collaborate closely with Prof. Miryung Kim at UCLA.

My research focuses on developing novel methods to improve testing and debugging across two domains: (1) big data analytics applications, targeting DISC frameworks like Apache Spark and Apache Flink, and (2) large language models (LLMs), where I investigate code reasoning capabilities, fault localization, and provenance tracking in federated learning settings. I’ve published my work at top-tier venues including ESEC/FSE and IEEE/ACM ASE. My tools have discovered multiple previously unknown faults in popular distributed frameworks such as Apache Spark, Apache Flink, Polars, and Dask, and my recent work establishes new benchmarking methodologies for evaluating LLM robustness in software engineering tasks.

I recently completed an internship as an Applied Scientist at Amazon Web Services (Summer 2025), where I developed an LLM-powered application to automate the modeling of complex distributed algorithms in low-resource programming languages, deployed both as a standalone application and as an MCP server. Previously, as an Applied Scientist intern at AWS (Summer 2024), I enhanced the automated testing infrastructure of critical AWS services.

news

Feb 10, 2026 🎉 Our paper “ProToken: Token-Level Attribution for Federated Large Language Models” has been accepted to MLSys 2026!
Feb 10, 2026 🎉 Our paper “Generating and Understanding Tests via Path-Aware Symbolic Execution with LLMs” has been accepted to ICPC 2026!
Jan 26, 2026 🎉 We just submitted exciting work on DAG-based fuzzing for dataflow frameworks to ISSTA ’26! Our work has exposed several faults across four different frameworks: Apache Spark, Apache Flink, Dask, and Polars!
Aug 15, 2025 🚀 I received an offer to return for another Applied Science internship at AWS!
Nov 07, 2024 🎖️ Honored to serve on the Program Committee for TaPP 2024 - Workshop on the Theory and Practice of Provenance!
Aug 15, 2024 🚀 Excited to start my internship at AWS as an Applied Scientist working with Ankush Desai and Aman Goel!
Apr 15, 2024 🎉 Our paper on natural symbolic execution-based testing for big data analytics has been accepted at ESEC/FSE 2024!
Sep 13, 2023 🎤 I presented our work on Natural Input Generation for Big Data Analytics at ASE 2023 in Luxembourg!
Sep 11, 2023 🎤 I presented Co-dependence Aware Fuzzing for Dataflow-based Big Data Analytics at ESEC/FSE 2023 in San Francisco. You can find my talk here!
Aug 21, 2023 🏆 I was awarded a SIGSOFT grant to present my work at the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023) in Luxembourg.
Jul 17, 2023 🎉 Our paper on natural input generation for data-intensive applications has been accepted at ASE ’23!
May 04, 2023 🎉 Our paper on co-dependence aware fuzzing for dataflow-based big data analytics has been accepted at ESEC/FSE 2023!

selected publications

  1. MLSys 2026
    ProToken: Token-Level Attribution for Federated Large Language Models
    Waris Gill, Ahmad Humayun, Ali Anwar, and 1 more author
    Ninth Annual Conference on Machine Learning and Systems, 2026
  2. ICPC 2026
    Generating and Understanding Tests via Path-Aware Symbolic Execution with LLMs
    Yaoxuan Wu, Xiaojie Zhou, Ahmad Humayun, and 2 more authors
    The 34th IEEE/ACM International Conference on Program Comprehension, 2026
  3. ESEC/FSE 2024
    Natural Symbolic Execution-Based Testing for Big Data Analytics
    Yaoxuan Wu, Ahmad Humayun, Muhammad Ali Gulzar, and 1 more author
    Proceedings of the 32nd ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Jul 2024
  4. ESEC/FSE 2023
    Co-dependence Aware Fuzzing for Dataflow-Based Big Data Analytics
    Ahmad Humayun, Miryung Kim, and Muhammad Ali Gulzar
    In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, CA, USA, Jul 2023
  5. ASE 2023
    NaturalFuzz: Natural Input Generation for Big Data Analytics
    Ahmad Humayun, Yaoxuan Wu, Miryung Kim, and 1 more author
    In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), Jul 2023
  6. [under review]
    How Accurately Do Large Language Models Understand Code?
    Sabaat Haroon, Ahmad Faraz Khan, Ahmad Humayun, and 5 more authors
    Jul 2025