Automated Bug Discovery in Cloud Infrastructure-as-Code Updates with LLM Agents. In AIOps 2025 (ICSE Workshop)

Cloud environments are increasingly managed by Infrastructure-as-Code (IaC) platforms like Terraform, which let developers define infrastructure as configuration code. While IaC automates deployment, its update logic is error-prone and often introduces subtle yet impactful bugs. Updates are frequent because cloud infrastructures are long-lived while user requirements fluctuate over time, and testing updates is challenging due to the vast, evolving search space of infrastructure setups and resources. We introduce TerraFault, an efficient, LLM-guided system for discovering update bugs. Our prototype optimizes search and testing to systematically detect bugs, even in seemingly simple updates, improving Cloud reliability.
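To make the hazard concrete, here is a minimal sketch (hypothetical; not TerraFault's implementation) of one class of update bug the abstract alludes to. In Terraform, changing an immutable ("force-new") attribute, such as an S3 bucket's name, causes the platform to destroy and recreate the resource, which silently loses state. The `FORCE_NEW` table and `plan_update` helper below are illustrative assumptions:

```python
# Illustrative sketch of a destructive IaC update: some attributes are
# immutable, so "updating" them actually destroys and recreates the
# resource. The FORCE_NEW table below is a toy stand-in for the
# provider schemas a real tool would consult.
FORCE_NEW = {"aws_s3_bucket": {"bucket"}}  # immutable attributes per resource type

def plan_update(resource_type, old, new):
    """Classify an update as a no-op, in-place update, or destructive replacement."""
    changed = {k for k in new if old.get(k) != new[k]}
    if changed & FORCE_NEW.get(resource_type, set()):
        return "replace"  # old resource is destroyed; its data may be lost
    return "update" if changed else "no-op"

# Renaming a bucket looks like a small edit but forces a replacement.
print(plan_update("aws_s3_bucket",
                  {"bucket": "logs-v1", "versioning": False},
                  {"bucket": "logs-v2", "versioning": False}))  # replace
```

The point of the sketch is that the buggy behavior is invisible in the config diff itself; a search-based tester has to reason about platform semantics, not just textual changes.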

January 2025 · Yiming Xiang, Zhenning Yang, Jingjia Peng, Hermann Bauer, Patrick Tser Jern Kon, Yiming Qiu, and Ang Chen

Automated Lifting for Cloud Infrastructure-as-Code Programs. In AIOps 2025 (ICSE Workshop)

While effective for greenfield (new) cloud deployments, existing IaC platforms struggle with brownfield migration—translating existing non-IaC infrastructure into IaC programs. This limits IaC adoption, as current tools rely on error-prone, rule-based reverse engineering. We introduce Lilac, a novel approach that automates IaC lifting by combining LLMs for rule extraction with symbolic methods for correctness assurance. Lilac aims to enable an automated, provider-agnostic lifting tool with broad coverage and high accuracy, streamlining IaC adoption.

January 2025 · Jingjia Peng, Yiming Qiu, Patrick Tser Jern Kon, Pinhan Zhao, Yibo Huang, Zheng Guo, Xinyu Wang, and Ang Chen

IaC-Eval: A code generation benchmark for Cloud Infrastructure-as-Code programs. In NeurIPS 2024

While LLMs show potential in general code generation, their efficacy in IaC development remains unknown. To address this, we developed the first dataset and benchmark for evaluating LLM-based IaC code generation. Our dataset comprises 458 human-curated scenarios spanning various AWS services, representing over 1,720 hours of human effort. Our results reveal significant performance gaps: even state-of-the-art LLMs struggle to generate correct IaC programs for these scenarios.

September 2024 · Patrick Tser Jern Kon, Jiachen Liu, Yiming Qiu, Weijun Fan, Ting He, Lei Lin, Haoran Zhang, Owen M. Park, George Sajan Elengikal, Yuxin Kang, Ang Chen, Mosharaf Chowdhury, Myungjin Lee, and Xinyu Wang

Unearthing Semantic Checks for Cloud Infrastructure-as-Code Programs. In SOSP 2024

Zodiac automatically unearths complex semantic checks/rules for cloud IaC programs that state-of-the-art IaC tools cannot easily capture, turning runtime violations that can take a long time to debug into simple compile-time checks.
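As an illustration of the idea (a hypothetical sketch, not Zodiac's actual rules or implementation), consider a semantic rule documented by AWS: an SQS queue that triggers a Lambda function should have a visibility timeout no smaller than the function's timeout, or messages can be redelivered while still being processed. Such a rule is invisible to syntactic IaC validation but easy to lint over a parsed configuration:

```python
# Illustrative compile-time semantic check over a parsed IaC config.
# Rule: an SQS queue triggering a Lambda must have a visibility timeout
# at least as large as the function's timeout (per AWS guidance);
# otherwise the violation only surfaces as flaky runtime behavior.
def check_sqs_lambda_timeouts(config):
    """Return a list of human-readable violation messages."""
    violations = []
    functions = {f["name"]: f for f in config.get("lambda_functions", [])}
    for queue in config.get("sqs_queues", []):
        target = functions.get(queue.get("triggers"))
        if target and queue["visibility_timeout"] < target["timeout"]:
            violations.append(
                f"queue '{queue['name']}': visibility_timeout "
                f"({queue['visibility_timeout']}s) is below the timeout of "
                f"lambda '{target['name']}' ({target['timeout']}s)"
            )
    return violations

config = {
    "lambda_functions": [{"name": "worker", "timeout": 60}],
    "sqs_queues": [
        {"name": "jobs", "visibility_timeout": 30, "triggers": "worker"},
    ],
}
print(check_sqs_lambda_timeouts(config))  # one violation for queue 'jobs'
```

Moving this kind of cross-resource rule into a static check is what replaces a slow runtime debugging loop with an immediate error at authoring time.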

August 2024 · Yiming Qiu, Patrick Tser Jern Kon, Ryan Beckett, and Ang Chen