Isabelle's code generator -- Isabelle Club Meeting on 28 Sep. 2023

Personnes présentes

Burkhart Wolff
Frédéric Voisin
Frédéric Boulanger
Claude Stolze
Adrien Durier
Nicolas Meric
Benoit Ballenghuin
Frederic Blanqui
...

Presentators

Burkhart Wolff

Introductory Example:

We start with the definition of a data type in Isabelle:

theory Scratch
  imports Main
begin

text‹For browsing purposes: @{file "$ISABELLE_HOME/src/Pure/ROOT.ML"}}›

section‹Introductory Example›

type_synonym atom = bool
type_synonym var = string

datatype binop = And | Or | Impl

(* inductive definition of abstract syntax trees for formula *)

datatype form = 
     Var  "var" 
   | Atom "atom"
   | Neg  "form"
   | Bin  "binop" "form" "form"

By opening an ML cartouche, we switch to the Isabelle/ML interface and can access the SML code generated from the type definition through antiquotations (the {code xyz} stuff):

ML‹ 
   val and_op = @{code And}
   val or_op  = @{code Or}
   val var =  @{code Var}
   val atom = @{code Atom}
   val bin =  @{code Bin};

   (* ... and some computing : *)
   bin (and_op, atom true, atom false) 
›

The round trip Isabelle-SML-Isabelle is achieved through the value command that exploits the code generator to compute SML values and interpret them back as terms in Isabelle/HOL:

value‹Bin And (Atom True) (Atom False)›

Well, so far, these have only been constructors: But this works for function definitions, too:

fun revert :: "form ⇒ form" 
  where "revert (Bin X A B) = Bin X B A"
  |     "revert X = X"

value "revert (Bin And (Atom True) (Atom False))" ― ‹ground term evaluation›

This even works if the term is non-ground, by using the refined technique of "normalization-by-evaluation" (nbe) described in

   Klaus Aehlig, Florian Haftmann, Tobias Nipkow:
   A compiled implementation of normalisation by evaluation. 
   J. Funct. Program. 22(1): 9-30 (2012)

value "revert (Bin And (Atom A) (Atom B))"        ― ‹term evaluation via nbe›

Generating Modules in different Target Languages

The generated code can also be exported to a file in the virual file system of Isabelle. This file can be accessed by clicking on the See theory exports link that appears in the Output windows of Isabelle/JEdit when the cursor is next to the export_code command:

Ok, this is a bit clumsy for every day use. A more high-level construct to generate code is this:

export_code revert in SML module_name TTT

... which gives in the Isabelle's virtual file-system the export1.ML file. Navigation to it shows:

structure TTT : sig
  type char
  type binop
  type form
  val revert : form -> form
end = struct

datatype char = Chara of bool * bool * bool * bool * bool * bool * bool * bool;

datatype binop = And | Or | Impl;

datatype form = Var of char list | Atom of bool | Neg of form |
  Bin of binop * form * form;

fun revert (Bin (x, a, b)) = Bin (x, b, a)
  | revert (Var v) = Var v
  | revert (Atom v) = Atom v
  | revert (Neg v) = Neg v;

end; (*struct TTT*)

The target language can be changed, the supported languages are for now : SML, Ocaml, Haskell, and Scala. There is also experimental support for F#.

Changing the target language works like this:

export_code revert in OCaml module_name TTT

... which produces this:›

module TTT : sig
  type char
  type binop
  type form
  val revert : form -> form
end = struct

type char = Chara of bool * bool * bool * bool * bool * bool * bool * bool;;

type binop = And | Or | Impl;;

type form = Var of char list | Atom of bool | Neg of form |
  Bin of binop * form * form;;

let rec revert = function Bin (x, a, b) -> Bin (x, b, a)
                 | Var v -> Var v
                 | Atom v -> Atom v
                 | Neg v -> Neg v;;

end;; (*struct TTT*)

Configurations for Compiling to Target Language

It is possible to configure the code generator such that it maps HOL-types and functions there over to (built-in) types and operations in the target language.

In the Isabelle documentation panel, there is an extensive description of the code generator:

   Code generation from Isabelle/HOL theories
   by Florian Haftmann with contributions from Lukas Bulwahn

An architectural overview drawn from it looks like this:

It describes the entire process as a three phase process consisting of:

"pre-processing"
"translation"
and "serialization"

and even gives a methodology of data refinement. Translations are usually derived theorems and the translation is conceived as a rewriting process over these.

As for the serialization of terms related to lists, see the "code-printing" configuration in theory ""HOL.List"", line 7925 ff., "code-printing".

code_printing
  type_constructor list ⇀
    (SML) "_ list"
    and (OCaml) "_ list"
    and (Haskell) "![(_)]"
    and (Scala) "List[(_)]"
| constant Nil ⇀
    (SML) "[]"
    and (OCaml) "[]"
    and (Haskell) "[]"
    and (Scala) "!Nil"
| class_instance list :: equal ⇀
    (Haskell) -
| constant "HOL.equal :: 'a list ⇒ 'a list ⇒ bool" ⇀
    (Haskell) infix 4 "=="

setup ‹fold (List_Code.add_literal_list) ["SML", "OCaml", "Haskell", "Scala"]›

For more couples - and brutal - code-generator setup, see entire theory ‹HOL.Code_Abstract_Char›, see theory ‹Library.Code_Abstract_Float› which maps reals to floats, thus clearly incorrect code.

Code generation and evaluation can be 5 orders of magnitude faster than simplification, and nbe is somewhere in between. On the other hand, the eample of mapping reals to floats shows that these kind of computations have to be considered harmful, i.e. as untrusted results in principle. However, standard configurations tend to be a good pragmatic compromise between logical safeness and efficiency.

Proof Methods using the Code Generator:

There are proof-methods based on the code-generator and resulting evaluation principes.›

lemma L1 : "revert (Bin And (Atom True) (Atom False)) =  (Bin And (Atom False) (Atom True))"
  by(eval)

lemma L2 : "revert (Bin And (Atom X) (Atom Y)) =  (Bin And (Atom Y) (Atom X))"
  by(normalization)

lemma L3 : "revert (Bin And (Atom X) (Atom Y)) =  (Bin And (Atom Y) (Atom X))"
  by(simp)

Seen from the Isabelle kernel, eval and nbe are registered as extra-logical inferences, as ‹oracles› that contributed to the proof.

Because trust-issues, thm's are "tainted" with their oracles, which can be revealed as follow by the following inspection commands:

thm_oracles L1

thm_oracles L2

thm_oracles L3

... which each displays which oracles have been used to construct the "thm".

Reflection for Isabelle Extensions ...

The described technology has been used in our DOF implementation to generate parts of its implementation: an AFP entry providing an executable theory of Regular expressions, their languages, their non-deterministic and deterministic automata, and the resulting acceptance function has been generated to code, exported from the virtual file-system to the running SML environment, and integrated for some functionality in DOF. More details to be seen in the theory RegExpInterface used in DOF.

You can download the whole theory file used during this session.

Previous meeting -- Next meeting