Compiler correctness Source: en.wikipedia.org/wiki/Compiler_correctness

In computing, compiler correctness is a subfield of computer science focused on proving that a compiler's output program preserves the semantics defined by the source language's specification.^[1] Techniques include developing the compiler using formal methods and using rigorous testing (often called compiler validation) on an existing compiler.

Formal verification

Formal verification of compiler correctness involves providing a machine-checked proof that a compiler preserves the semantics of the source program through all compilation phases. Unlike testing, which only samples behavior on specific inputs, formal verification ensures semantic preservation for all valid programs.

A prominent example is CompCert, a formally verified optimizing compiler for a large subset of the C programming language. CompCert uses the Coq proof assistant to ensure that for any source program S and compiled code C, if CompCert compiles S successfully, then every observable behavior of C corresponds to an allowed behavior of S.^[2]

The compiler is structured into multiple verified passes, including transformations from Clight (a simplified C subset) to intermediate languages and then to assembly code. Each transformation stage is proven to preserve semantics using simulation or bisimulation arguments.^[3]

CompCert’s correctness relies on:

A formal operational semantics for Clight, which models undefined behaviors and the execution model of C programs.^[4]
Formal semantics for each internal language.
Machine-checked proofs in Coq verifying that the semantics is preserved across each pass.^[5]

Empirical evaluation has shown that CompCert is significantly more reliable than mainstream C compilers: in testing with the Csmith program generator, CompCert was the only compiler for which no wrong-code bugs were found.^[6]

Other formal compiler projects include:

The CakeML project, which verifies a compiler for a functional programming language all the way to machine code.^[7]
Œuf, a verified compiler that translates a subset of Gallina (the language of Coq) to x86 assembly.^[8]

Advantages

Eliminates miscompilation: verified compilers provably do not introduce bugs absent in the source code.
Verified compilation allows properties proved about source programs (e.g., safety or termination) to remain valid after compilation.
The formalization process often leads to improved understanding of programming language semantics and compiler design.

Challenges

Formal verification of compilers is labor-intensive and requires deep expertise in formal methods.
Verified compilers often support only subsets of full programming languages, limiting adoption for production-level software.
Changes in source or target language semantics may require re-verification of all compiler stages.

Relation to other formal methods

Compiler verification is part of the broader field of formal methods, which also includes model checking, abstract interpretation, and deductive verification using tools like Coq, Isabelle, and ACL2. Verified compilers help secure the toolchains used in critical systems, where correctness is essential.

Testing

Testing represents a significant portion of the effort in shipping a compiler, but receives comparatively little coverage in the standard literature. The 1986 edition of Aho, Sethi, & Ullman has a single-page section on compiler testing, with no named examples.^[9] The 2006 edition omits the section on testing, but does emphasize its importance: “Optimizing compilers are so difficult to get right that we dare say that no optimizing compiler is completely error-free! Thus, the most important objective in writing a compiler is that it is correct.”^[10] Fraser & Hanson 1995 has a brief section on regression testing; source code is available.^[11] Bailey & Davidson 2003 cover testing of procedure calls^[12] A number of articles confirm that many released compilers have significant code-correctness bugs.^[13] Sheridan 2007 is probably the most recent journal article on general compiler testing.^[14] For most purposes, the largest body of information available on compiler testing are the Fortran^[15] and Cobol^[16] validation suites.

Further common techniques when testing compilers are fuzzing^[17] (which generates random programs to try to find bugs in a compiler) and test case reduction (which tries to minimize a found test case to make it easier to understand).^[18]

References

^ Leroy, Xavier (2009). "Formal verification of a realistic compiler". Communications of the ACM. 52 (7): 107–115. doi:10.1145/1538788.1538814.
^ Leroy, Xavier (2009). "Formal verification of a realistic compiler". Communications of the ACM. 52 (7): 107–115. doi:10.1145/1538788.1538814.
^ "CompCert C Verified Compiler". CompCert. Retrieved 25 June 2025.
^ "The CompCert C Verified Compiler – Reference Manual". CompCert. Retrieved 25 June 2025.
^ Leroy, Xavier. "A formally verified compiler back-end" (PDF). Inria. Retrieved 25 June 2025.
^ "CompCert Performance and Safety". AbsInt. Retrieved 25 June 2025.
^ Kumar, Ramana (2014). "Compiling Higher-Order Programs to Machine Code". ACM SIGPLAN Notices. 49 (9): 1–12. doi:10.1145/2660193.2628144.
^ "Oeuf: A small proof-producing compiler". GitHub. Retrieved 25 June 2025.
^ Compilers: Principles, Techniques and Tools, infra 1986, p. 731.
^ ibid, 2006, p. 16.
^ Christopher Fraser; David Hanson (1995). A Retargetable C compiler: Design and Implementation. Benjamin/Cummings Publishing. ISBN 978-0-8053-1670-4., pp. 531–3.
^ Mark W. Bailey; Jack W. Davidson (2003). "Automatic Detection and Diagnosis of Faults in Generated Code for Procedure Calls" (PDF). IEEE Transactions on Software Engineering. 29 (11): 1031–1042. CiteSeerX 10.1.1.15.4829. doi:10.1109/tse.2003.1245304. Archived from the original (PDF) on 2003-04-28. Retrieved 2009-03-24., p. 1040.
^ E.g., Christian Lindig (2005). "Random testing of C calling conventions" (PDF). Proceedings of the Sixth International Workshop on Automated Debugging. ACM. ISBN 1-59593-050-7. Archived from the original (PDF) on 2011-07-11. Retrieved 2009-03-24., and Eric Eide; John Regehr (2008). "Volatiles are miscompiled, and what to do about it" (PDF). Proceedings of the 7th ACM international conference on Embedded software. ACM. ISBN 978-1-60558-468-3. Retrieved 2009-03-24.
^ Flash Sheridan (2007). "Practical Testing of a C99 Compiler Using Output Comparison" (PDF). Software: Practice and Experience. 37 (14): 1475–1488. arXiv:2202.07390. doi:10.1002/spe.812. S2CID 9752084. Retrieved 2009-03-24. Bibliography at "Compiler Testing Bibliography". Retrieved 2009-03-13..
^ "Source of Fortran validation suite". Retrieved 2011-09-01.
^ "Source of Cobol validation suite". Retrieved 2011-09-01.
^ Chen, Yang; Groce, Alex; Zhang, Chaoqiang; Wong, Weng-Keen; Fern, Xiaoli; Eide, Eric; Regehr, John (2013). "Taming compiler fuzzers". Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI '13. New York, NY, USA: ACM. pp. 197–208. CiteSeerX 10.1.1.308.5541. doi:10.1145/2491956.2462173. ISBN 9781450320146. S2CID 207205614.
^ Regehr, John; Chen, Yang; Cuoq, Pascal; Eide, Eric; Ellison, Chucky; Yang, Xuejun (2012). "Test-case reduction for C compiler bugs". Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI '12. New York, NY, USA: ACM. pp. 335–346. doi:10.1145/2254064.2254104. ISBN 9781450312059. S2CID 1025409.

[1] Leroy, Xavier (2009). "Formal verification of a realistic compiler". Communications of the ACM. 52 (7): 107–115. doi:10.1145/1538788.1538814.

[2] Leroy, Xavier (2009). "Formal verification of a realistic compiler". Communications of the ACM. 52 (7): 107–115. doi:10.1145/1538788.1538814.

[3] "CompCert C Verified Compiler". CompCert. Retrieved 25 June 2025.

[4] "The CompCert C Verified Compiler – Reference Manual". CompCert. Retrieved 25 June 2025.

[5] Leroy, Xavier. "A formally verified compiler back-end" (PDF). Inria. Retrieved 25 June 2025.

[6] "CompCert Performance and Safety". AbsInt. Retrieved 25 June 2025.

[7] Kumar, Ramana (2014). "Compiling Higher-Order Programs to Machine Code". ACM SIGPLAN Notices. 49 (9): 1–12. doi:10.1145/2660193.2628144.

[8] "Oeuf: A small proof-producing compiler". GitHub. Retrieved 25 June 2025.

[9] Compilers: Principles, Techniques and Tools, infra 1986, p. 731.

[10] ibid, 2006, p. 16.

[11] Christopher Fraser; David Hanson (1995). A Retargetable C compiler: Design and Implementation. Benjamin/Cummings Publishing. ISBN 978-0-8053-1670-4., pp. 531–3.

[12] Mark W. Bailey; Jack W. Davidson (2003). "Automatic Detection and Diagnosis of Faults in Generated Code for Procedure Calls" (PDF). IEEE Transactions on Software Engineering. 29 (11): 1031–1042. CiteSeerX 10.1.1.15.4829. doi:10.1109/tse.2003.1245304. Archived from the original (PDF) on 2003-04-28. Retrieved 2009-03-24., p. 1040.

[13] E.g., Christian Lindig (2005). "Random testing of C calling conventions" (PDF). Proceedings of the Sixth International Workshop on Automated Debugging. ACM. ISBN 1-59593-050-7. Archived from the original (PDF) on 2011-07-11. Retrieved 2009-03-24., and Eric Eide; John Regehr (2008). "Volatiles are miscompiled, and what to do about it" (PDF). Proceedings of the 7th ACM international conference on Embedded software. ACM. ISBN 978-1-60558-468-3. Retrieved 2009-03-24.

[14] Flash Sheridan (2007). "Practical Testing of a C99 Compiler Using Output Comparison" (PDF). Software: Practice and Experience. 37 (14): 1475–1488. arXiv:2202.07390. doi:10.1002/spe.812. S2CID 9752084. Retrieved 2009-03-24. Bibliography at "Compiler Testing Bibliography". Retrieved 2009-03-13..

[15] "Source of Fortran validation suite". Retrieved 2011-09-01.

[16] "Source of Cobol validation suite". Retrieved 2011-09-01.

[17] Chen, Yang; Groce, Alex; Zhang, Chaoqiang; Wong, Weng-Keen; Fern, Xiaoli; Eide, Eric; Regehr, John (2013). "Taming compiler fuzzers". Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI '13. New York, NY, USA: ACM. pp. 197–208. CiteSeerX 10.1.1.308.5541. doi:10.1145/2491956.2462173. ISBN 9781450320146. S2CID 207205614.

[18] Regehr, John; Chen, Yang; Cuoq, Pascal; Eide, Eric; Ellison, Chucky; Yang, Xuejun (2012). "Test-case reduction for C compiler bugs". Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI '12. New York, NY, USA: ACM. pp. 335–346. doi:10.1145/2254064.2254104. ISBN 9781450312059. S2CID 1025409.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]