Poster here. I re-read this paper about once a year. I continue to think that it may be one of the most important papers I've read. As someone who works on high-reliability safety-critical real-time systems (automotive, avionics), being able to work in an environment where I could prove semantic properties of assembly code is pretty close to my dream -- the cost of demonstrating code correct is already so much higher than the cost of writing the code that the incremental cost of going to assembly seems minor, if there's even a small payback in time to demonstrate correctness. In practice, I think my dream language would be a relatively rich macro assembler plus a (guided, greedy) register allocator, designed entirely to simplify and allow semantic proofs. I don't really have a sense as to whether Coq (Rocq) is the right answer today, vs other options; or if there's a newer literature advancing this approach; but I'd truly love to see a deeper focus of making low level languages more useful in this space, rather than moving towards more constrained, higher level languages.
During the days I was studying/working with Coq, one visiting professor gave a presentation on defense software design. An example presented was control logic for the F-16, which the professor presumably worked on. A student asked: how do you prove "correctness", i.e. operability, of a jet fighter and its control logic? I don't think the professor had a satisfying answer.
My question is the same, albeit more technically refined. How do you prove the correctness of a numerical algorithm (operating on a quantized continuum) using type-theoretic/category-theoretic tools such as theorem provers like Coq? There are documented tragedies where numerical rounding error in the control logic of a missile cost lives. I have proved mathematical theorems before (Curry-Howard!) but they were driven by discrete mathematical objects (e.g. sets, groups), not continuous numbers.
You use floating point numbers instead of real numbers in your theorems and function definitions.
This sounds flippant, but I'm being entirely earnest. It's a significantly larger pain because floating point numbers have some messy behavior, but the essential steps remain the same. I've proved theorems about floating point numbers, not reals. Although, again, it's a huge pain, and when I can get away with it I'd prefer to prove things with real numbers and assume magically they transfer to floating point. But if the situation demands it and you have the time and energy, it's perfectly fine to use Coq/Rocq or any other theorem prover to prove things directly about floating point arithmetic.
The article itself is talking about an approach sufficiently low level that you would be proving things about floating point numbers, because you'd have no choice: it's all assembly!
But even at a higher level you can have theorems about floating point numbers. E.g. https://flocq.gitlabpages.inria.fr/
There's nothing category theoretic or even type theoretic about the entities you are trying to prove things about with the theorem prover. Type theory is merely the "implementation language" of the prover. (And even if there were, there's nothing tying type theory or category theory to the real numbers rather than to floats.)
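To make "directly about floats" concrete, here is about the smallest statement of that kind I can think of, using the primitive binary64 floats that recent Coq versions ship with (the lemma name is mine, and module paths may differ in the newest Rocq releases; for anything serious you'd reach for a library like the Flocq one linked above):

    From Coq Require Import Floats.

    (* The decimal literals are not exactly representable and get rounded to
       the nearest binary64 when parsed; even so, the non-associativity of
       float addition is a closed fact you can state and decide by
       computation. *)
    Lemma fadd_not_assoc :
      PrimFloat.eqb ((0.1 + 0.2) + 0.3)%float (0.1 + (0.2 + 0.3))%float = false.
    Proof. vm_compute. reflexivity. Qed.

Real verification work states rounding properties rather than single closed identities, but the workflow (state it, compute or derive it, Qed) is the same.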
I have been curious about this. Where can you find definitions for the basic operations to build up from?
IEEE 754 does a good job explaining the representation, but it doesn't define all the operations and possible error codes as far as I can tell.
Is it just assumed "closest representable number to the real value" always?
What about all the various error codes?
> when I can get away with it I'd prefer to prove things with real numbers and assume magically they transfer to floating point.
True for some approaches, but numerical analysis does account for machine epsilon and truncation errors.
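And that style of reasoning mechanizes fine: model each rounded result as the exact one plus a bounded perturbation and push the bounds through. A minimal Coq sketch over the reals (the lemma and names are mine, not from any particular library):

    Require Import Reals Lra.
    Open Scope R_scope.

    (* If x' approximates x to within e1 and y' approximates y to within e2,
       then the sum of the approximations is within e1 + e2 of x + y. *)
    Lemma add_error_propagates :
      forall x x' y y' e1 e2,
        Rabs (x' - x) <= e1 ->
        Rabs (y' - y) <= e2 ->
        Rabs ((x' + y') - (x + y)) <= e1 + e2.
    Proof.
      intros x x' y y' e1 e2 Hx Hy.
      replace ((x' + y') - (x + y)) with ((x' - x) + (y' - y)) by lra.
      eapply Rle_trans; [ apply Rabs_triang | lra ].
    Qed.

Accounting for the rounding of the final sum itself is one more epsilon of the same shape.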
I am aware that Inria works with Coq as your link shows. However, the link itself does not answer my question. As a concrete example, how would you prove an implementation of a Kalman filter is correct?
There is nothing inherently difficult about practical implementations of continuous numbers for automated reasoning compared to more discrete mathematical structures. They are handleable by standard FOL itself.
See ACL2's support for floating point arithmetic.
https://www.cs.utexas.edu/~moore/publications/double-float.p...
SMT solvers also support real number theories:
https://shemesh.larc.nasa.gov/fm/papers/nfm2019-draft.pdf
Z3 also supports real theories:
https://smt-lib.org/theories-Reals.shtml
I'm curious about how you'd do that, too. I haven't tried doing anything like that, but I'd think you'd start by trying to formalize the usual optimality proof for the Kalman filter, transfer it to actual program logic on the assumption that the program manipulates real numbers, try to get the proof to work on floating-point numbers, and finally extract the program from the proof to run it.
https://youtu.be/_LjN3UclYzU has a different attempt to formalize Kalman filters which I think we can all agree was not a successful formalization.
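On the "reals first" step, as a toy illustration: a scalar (1-D) sketch of the update, with one algebraic fact about it. The definitions and the lemma are my own simplification, and the real work (matrices, covariance propagation, and then floats) is everything this leaves out:

    Require Import Reals Lra.
    Open Scope R_scope.

    (* One scalar Kalman update: prior variance p, measurement noise r. *)
    Definition gain (p r : R) : R := p / (p + r).
    Definition update (x z p r : R) : R := x + gain p r * (z - x).

    (* The two textbook forms of the posterior variance agree. *)
    Lemma posterior_variance_alt :
      forall p r, 0 < p -> 0 < r ->
        (1 - gain p r) * p = p * r / (p + r).
    Proof.
      intros p r Hp Hr. unfold gain. field. lra.
    Qed.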
You use reduced rationals everywhere you can, not floats.
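In Coq terms that's roughly the QArith route: exact rational arithmetic, reduced on demand. A trivial sketch:

    Require Import QArith.

    (* 1/3 + 1/6, computed exactly; Qred reduces to lowest terms. *)
    Compute Qred ((1 # 3) + (1 # 6)).   (* = 1 # 2 : Q *)

    (* == is equality of rationals up to reduction, decided by computation. *)
    Lemma third_plus_sixth : ((1 # 3) + (1 # 6) == 1 # 2)%Q.
    Proof. compute. reflexivity. Qed.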
> There are documented tragedies where numerical rounding error of the control logic of a missile costed lives.
curious about this
This is likely a reference to the Patriot missile rounding issue that arguably led to 28 deaths.
https://www-users.cse.umn.edu/~arnold/disasters/Patriot-dhar...
https://www.gao.gov/assets/imtec-92-26.pdf
Just to clarify for others because it’s a tiny bit clickbaity:
The Patriot missile didn’t kill 28 people accidentally, it simply failed to intercept an enemy missile.
And it wasn’t launched on an incorrect trajectory either, the radar was looking at a slightly wrong distance window and lost track. Furthermore, the error only starts having an effect after 100 hours of operation, and it seems to have only been problematic with the faster missiles in Iraq that the system wasn’t designed for. They rushed the update and they did actually write a function to deal with this exact numerical issue, but during the refactor they missed one place where it should have been used.
28 lives are obviously significant, but just to note that there are many mitigating factors.
Have you found Coq or other formal-methods tooling useful?
No, I haven't, but I want to. I haven't had the opportunity to go deep into formal methods for a work project yet, and on personal projects I've mostly played with Ada SPARK, not Coq. I've played with Coq a little (to the extent of re-implementing the linked paper for a subset of a different ISA), but there's definitely a learning curve. One of my personal benchmark projects is to implement a (fixed-maximum-size) MMM heap in a language, and convince myself it's both correct and high-performance; and I've yet to achieve that in any language, even SPARK, without leaning heavily on testing and hand-waving along with some amount of formal proof. I've written the code and comprehensive testing in various assembly languages (again, this is sort of my benchmark bring-up-a-new-platform-in-the-brain example), and I /think/ that, given the way I think, proving properties as invariants maintained across individual instructions maps pretty well onto how I convince myself it's correct, but I haven't had the time, depth, or toolchain to really bottom any of this out.
What's a MMM heap? Is it a typo on min-max heap?
A min-max-median heap, I would assume.
I have found huge value in CBMC and KLEE for statically verifying C code for real time safety critical embedded applications.
ACL2 is also VERY powerful and capable.
The second your macro-assembler has a significantly complex syntax, you are done: feature creep piles on 2042 features, then 2043, then 2044... and it mechanically reduces the number of alternative implementations and makes the effort required to develop a new one more and more unreasonable.
In the end, it is all about complexity and stability over time.
This paper reminds me of a class assignment in grad school where the prof asked the students to write a compiler in Coq for some toy Turing-complete language in a week. Having no background in compiler design or functional programming, I found it daunting at first, but eventually managed it. The Coq language's rigor really helps with something like this.
I wonder if AI compute graphs would benefit from language-level rigor like Coq's.
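(For anyone curious what that compiler exercise looks like, here's a toy flavour of it, not the actual assignment: compile arithmetic expressions to a little stack machine and prove that compilation preserves evaluation.)

    Require Import List.
    Import ListNotations.

    Inductive exp : Type :=
    | Const : nat -> exp
    | Plus  : exp -> exp -> exp.

    Fixpoint eval (e : exp) : nat :=
      match e with
      | Const n  => n
      | Plus a b => eval a + eval b
      end.

    Inductive instr : Type := IPush (n : nat) | IAdd.

    (* Run a program against a stack of naturals. *)
    Fixpoint run (p : list instr) (s : list nat) : list nat :=
      match p with
      | [] => s
      | IPush n :: p' => run p' (n :: s)
      | IAdd :: p' =>
          match s with
          | b :: a :: s' => run p' (a + b :: s')
          | _ => s   (* stuck; never reached for compiled code *)
          end
      end.

    Fixpoint compile (e : exp) : list instr :=
      match e with
      | Const n  => [IPush n]
      | Plus a b => compile a ++ compile b ++ [IAdd]
      end.

    (* Generalized over the rest of the program and the current stack. *)
    Lemma compile_correct_gen :
      forall e p s, run (compile e ++ p) s = run p (eval e :: s).
    Proof.
      induction e as [n | a IHa b IHb]; intros p s; simpl.
      - reflexivity.
      - repeat rewrite <- app_assoc. rewrite IHa, IHb. simpl. reflexivity.
    Qed.

    Theorem compile_correct :
      forall e, run (compile e) [] = [eval e].
    Proof.
      intro e. rewrite <- (app_nil_r (compile e)).
      exact (compile_correct_gen e [] []).
    Qed.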
The canonical URL, dating back to at least 02015, may be http://research.microsoft.com/en-us/um/people/nick/coqasm.pd..., which now redirects to https://www.microsoft.com/en-us/research/publication/coq-wor.... This site presumably belongs to the Nick Benton who is one of its authors.
It's a bit of an awkward syntax to get a reliable assembler. Does it at least allow you to prove the behaviour of a larger block of assembly? For example, could I use it to prove that a block of assembly is equivalent to a given set of operations?
The x86 architecture and instruction set are complex - so they absolutely need a powerful assembler to help prevent mistakes.
That doesn't seem right to me. If the problem is that it has too many instructions and addressing modes, you can decide to only use a small subset of those instructions and addressing modes, which really isn't much of a handicap for implementing functionality. (It doesn't help with analyzing existing code, but neither does a powerful assembler.)
I'm no expert on assembly-language programming, but probably 90% of the assembly I write on i386, amd64, RISC-V, and ARM is about 40 instructions: ldr, mov, bl, cmp, movs, push, pop, add, b/jmp, bl/blx/call, ret, str, beq/jz, bne/jnz, bhi/ja, bge/jge, cbz, stmia, ldmia, ldmdb, add/adds, addi, sub/subs, bx, xor/eor, and, or/orr, lsls/shl, lsrs/sar/shr, test/tst, inc, dec, lea, and slt, I think. Every once in a while you need a mul or a div or something. But the other 99% of the instruction set is either for optimized vectorized inner loops or for writing operating system kernels.
I think that the reason that i386 assembly (or amd64 assembly) is error-prone is something else, something it has in common with very simple architectures and instruction sets like that of the PDP-8.
> I think that the reason that i386 assembly (or amd64 assembly) is error-prone is something else, something it has in common with very simple architectures and instruction sets like that of the PDP-8.
What reason is that? (And, if it's not obvious, what are ARM/RISC-V doing that make them less bad?)
I don't think they're particularly less bad. All five architectures just treat memory as a single untyped array of integers, and registers as integer global variables. Their only control structure is goto (and conditional goto.) If you forget to pass an argument to a subroutine, or pass a pointer to an integer instead of the integer, or forget to save a callee-saved register you're using, or do a signed comparison on an unsigned integer, or deallocate memory and then keep using it, or omit or jump past the initialization of a local variable, conventional assemblers provide you no help at all in finding the bug. You'll have to figure it out by watching the program give you wrong answers, maybe single-stepping through it in a debugger.
There are various minor details of one architecture or the other that make them more or less bug-prone, but those are minor compared to what they have in common.
None of this is because the instruction sets are complex. It would be closer to the mark to say that it's because they are simple.
I have a postit note idea that says simply "typesafe macro assembler".
I've not fleshed this out yet, but I think a relatively simple system would help deal with all the issues you mention in the first paragraph while allowing escape hatches.
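For concreteness, the kind of thing on the post-it is a deep embedding where code is indexed by what's known about the machine state, so "read of an unwritten register" is a type error rather than a runtime surprise. A toy Coq sketch (my own invented encoding, nothing to do with the paper's, and eliding everything hard like memory and calling conventions):

    From Coq Require Import List.
    Import ListNotations.

    Inductive reg : Type := R0 | R1 | R2.

    (* [code live] is straight-line code that may only read registers
       listed in [live], i.e. registers that have already been written. *)
    Inductive code : list reg -> Type :=
    | Ret  : forall (live : list reg), code live
    | Movi : forall (live : list reg) (dst : reg),
        nat -> code (dst :: live) -> code live
    | Add  : forall (live : list reg) (dst src1 src2 : reg),
        In src1 live -> In src2 live ->
        code (dst :: live) -> code live.

    (* Accepted: both sources are written before the Add reads them. *)
    Example ok : code [].
    Proof.
      refine (Movi [] R0 1
               (Movi [R0] R1 2
                 (Add [R1; R0] R2 R0 R1 _ _ (Ret _)))).
      - simpl. right; left; reflexivity.
      - simpl. left; reflexivity.
    Defined.

    (* Rejected: [Add [] R2 R0 R1 _ _ _] would demand a proof of
       [In R0 []], i.e. False. An "unsafe" constructor that takes no
       proof would be the escape hatch. *)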
They are going by Rocq today
I understand and respect the renaming, Coq was a much cooler name though.
It was.
Renaming languages to suit American taste really grates.
Nimrod to Nim was even worse. Apparently Americans cannot get Biblical references.
The linked PDF was created on 8/5/13.
Then the title needs (2013), no?
Yep, sorry. Added.
A Lisp can mimic Prolog and the rigour of Coq with ease.