IaC-Eval: A code generation benchmark for Infrastructure-as-Code programs. In NeurIPS 2024

While LLMs show promise for general code generation, their efficacy in IaC development remains largely unexplored. To address this, we developed the first dataset and benchmark for evaluating LLM-generated IaC programs. Our dataset comprises 458 human-curated scenarios spanning various AWS services and represents over 1,720 hours of human effort. Our results reveal a significant performance gap: state-of-the-art LLMs struggle to generate correct IaC programs. A toy validation sketch follows the entry below.

September 2024 · Patrick Tser Jern Kon, Jiachen Liu, Yiming Qiu, Weijun Fan, Ting He, Lei Lin, Haoran Zhang, Owen M. Park, George Sajan Elengikal, Yuxin Kang, Ang Chen, Mosharaf Chowdhury, Myungjin Lee, and Xinyu Wang
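IaC-Eval's scenarios pair natural-language cloud requirements with Terraform programs on AWS. As a rough, hypothetical sketch of how one might pre-filter a model's output for such a scenario (this is not the benchmark's actual harness, and `terraform validate` only catches syntax and provider-schema errors, not whether the program matches the scenario's intent):

```python
import subprocess
import tempfile
from pathlib import Path

def validate_terraform(candidate_hcl: str) -> bool:
    """Hypothetical first-pass filter for an LLM-generated Terraform
    program: catches syntax and provider-schema errors only. A full
    benchmark must additionally check the program against the
    scenario's intended infrastructure."""
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, "main.tf").write_text(candidate_hcl)
        # -backend=false: skip remote state setup; we only need
        # provider schemas for validation.
        init = subprocess.run(
            ["terraform", "init", "-backend=false"],
            cwd=workdir, capture_output=True, text=True,
        )
        if init.returncode != 0:
            return False
        check = subprocess.run(
            ["terraform", "validate"],
            cwd=workdir, capture_output=True, text=True,
        )
        return check.returncode == 0
```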

Unearthing Semantic Checks for Cloud Infrastructure-as-Code Programs. In SOSP 2024

Zodiac automatically unearths complex semantic checks/rules for cloud IaC programs that state-of-the-art IaC tools cannot easily capture, turning runtime violations, which can be very time-consuming to debug, into simple compile-time checks. An illustrative check follows the entry below.

August 2024 · Yiming Qiu, Patrick Tser Jern Kon, Ryan Beckett, and Ang Chen
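To make the runtime-to-compile-time shift concrete, here is a small, hypothetical semantic check of the kind such tools aim to surface. The rule used (an AWS Lambda function's `timeout` must be between 1 and 900 seconds) is a documented AWS service limit that a plain `terraform validate` does not enforce, so violations normally surface only at deploy time. This is an illustrative sketch, not Zodiac's implementation:

```python
# A hypothetical compile-time semantic check over a parsed IaC resource.
# Rule: AWS Lambda allows a timeout of 1..900 seconds; exceeding it
# normally fails only when the provider calls the AWS API at deploy time.

MAX_LAMBDA_TIMEOUT_SECONDS = 900  # documented AWS service limit

def check_lambda_timeout(resource: dict) -> list[str]:
    """Return violations for an `aws_lambda_function` resource,
    represented here as a plain dict (e.g., parsed from Terraform JSON)."""
    errors = []
    timeout = resource.get("timeout", 3)  # Terraform's default is 3s
    if not (1 <= timeout <= MAX_LAMBDA_TIMEOUT_SECONDS):
        errors.append(
            f"timeout={timeout}s is outside the allowed range "
            f"1..{MAX_LAMBDA_TIMEOUT_SECONDS}s for aws_lambda_function"
        )
    return errors

# Usage: flag the violation before `terraform apply` ever runs.
print(check_lambda_timeout({"function_name": "etl", "timeout": 1200}))
# -> ['timeout=1200s is outside the allowed range 1..900s for aws_lambda_function']
```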