OCaml code generation tutorial

From Gallium

Jump to: navigation, search

This page deals with OCaml code generation in Camlp4. It is recommended that you read the Syntax extension tutorial first (if this is not already done).



What we want to do here is generate OCaml code from a DSL (Domain Specific Language) that allows easy manipulation of vectors using the following syntax:

  • 42 is the scalar 42.
  • [1, 2, 3] is the list [ 1. ; 2. ; 3. ], and we allow mixing types: [1, 2., 3], [1., 2., 3.] or [1,2,3] all represent the same list.
  • a + b is the sum of the two vectors a and b.
  • a * b is the scalar multiplication of the vector b by the scalar a.

First step: a syntax extension

Just like in the Syntax extension tutorial, we will start from an empty grammar, and use original syntax. Another tutorial will deal with extending OCaml syntax.

So we start with the following syntax extension:


open Camlp4.PreCast

type vec =
    | Scalar of string
    | Vector of string list
    | Sum of vec * vec
    | ScalarProduct of vec * vec

let expression = Gram.Entry.mk "expression"

We define our vector type as being either a scalar, a vector, a sum of two vectors, or a scalar product. As before, we have our global expression entry "expression".

EXTEND statement

  GLOBAL: expression;

    [ "sum" LEFTA
      [ x = SELF; "+"; y = SELF -> Sum (x, y) ]
    | "scalar" LEFTA
      [ x = SELF; "*"; y = SELF -> ScalarProduct (x, y) ]
    | "simple" NONA
      [ "("; e = SELF; ")" -> e
      | s = scalar -> Scalar s
      | v = vector -> v ] ];

    [ [ `INT (i, _) -> string_of_float (float i)
    | `FLOAT (_, f) -> f ] ];

    [ [ "["; v = LIST1 [ s = scalar -> s ] SEP ","; "]" -> Vector v ] ];


In addition to what we've seen in the Syntax extension tutorial, we have `FLOAT (whose meaning should not be too hard to guess), and LIST1. LIST1 will try to build a (non-empty) list of "scalar"s separated by ",". LIST0 does the same, but allows an empty list as well.

Please note that `INT and `FLOAT have two arguments: the first one is the interpreted content (an "int" for `INT and a "float" for `FLOAT), and the second one is the un-interpreted content, as a string. The string is useful when you want to keep the number the same way you entered it: "3.0000" will be printed "3.0000", not "3.". But here, since we convert integers to float, we have a mixed solution.

Defining the location

let _loc = Loc.mk "<string>" 

A location is necessary in order to build syntactic trees. In our small example we don't need to have several locations so we create one named "_loc" as a function. When calling a tree builder function without giving a location in parameter, "_loc" will be automatically given.


let parse_and_generate_code str =
  let e = Gram.parse_string expression _loc str in
   (* generate_code *) e

We will define the "generate_code" function in the second part of this tutorial. Then you will need to remove the comment. Note that we have to explicitly give the location in parameter of the "Gram.parse_string" function.


let _ =
  print_string "# ";
    let str = read_line () in
	let e = parse_and_generate_code str in
            ignore (e)

This main will be modified in order to display the syntactic tree that we are going to build.

Ok, now let's move on!

Second step: generating code from our DSL AST

Here comes the interesting things. Once we have defined our grammar and our parser we can start generating code. Our parser returns a structure of imbricated vec so we just have to filter this structure and create the AST while doing this.

Let's complete our main function and then implement the "generate_code" function.

The main branch

We want the code that we are going to generate to be executable in byte code so we will create a "let _" function. In the quotation syntax (see Abstract Syntax Tree) the top level items such as functions, modules, types, classes, etc... are represented by a <:str_item < .. >> quotation. We can put whatever we want in place of the dots : other AST branchs (between $) or text that does not depend on the parsed expression.

Note that if you want to create several top level items in your generated code you will have to make a list of <:str_item< .. >> quotations.

So here is our completed main :

let _ =
    print_string "# ";
    let str = read_line () in
	let e = parse_and_generate_code str in		
	    let ast_e = <:expr< $e$ >> in
		Camlp4.PreCast.Printers.OCaml.print_implem <:str_item< let _ = let res = $ast_e$ in print_float res >> 

Let's detail this a little :

  • We first create a "ast_e" variable which will contain the AST generated by the "generate_code" function (the AST of our parsed expression)
  • The "Camlp4.PreCast.Printers.OCaml.print_implem" function allows to print AST contained in a <:str_item< .. >> quotation.
  • In our <:str_item< .. >> quotation we create the "let _" main function. We also create a "res" variable whose expression will be the content of our "ast_e" variable, i.e. the AST of our parsed expression.

The generate_code function

This function must obviously be defined before the "parse_and_generate_code" function which calls it in order to avoid the "unbound variable" error.

Since we only parse expressions, we will only need the <:expr< .. >> quotation.

let rec generate_code = function 
	| Scalar s -> <:expr< $flo:s$ >>

$flo:s$ means the value of "s" as float

| Vector vlist -> List.fold_right 
			(fun x l -> <:expr< $flo:x$ :: $l$ >>) 
			<:expr< [] >>

This creates the expression of the float list representing the vector. The way of creating an AST representing a list is a little complicated. I used the warning found in the Ocaml documentation http://caml.inria.fr/pub/old_caml_site/camlp4/manual/manual010.html#toc36

| Sum (v1, v2) -> let ast_v1 = generate_code  v1
		  and ast_v2 = generate_code  v2
			(match (v1, v2) with
			| Scalar _, Scalar _ 	-> <:expr< ( +. ) $ast_v1$ $ast_v2$ >>
			|  _, _			-> <:expr< List.map2 ( +. ) $ast_v1$ $ast_v2$ >>)

We recursively generate the AST for the Sum expression parameters and we check if we want to sum 2 scalars or 2 things asserted to be vectors. In the first case we generate the code for a simple addition whereas in the other case we use the "List.map2" function in order to add the 2 vectors. Remark : if we want to sum a scalar and a vector the code will be generated with no error but it will crash when compiling ("this function is float but is here used with float list" message)

| ScalarProduct (v1, v2) ->let ast_v1 = generate_code  v1
			   and ast_v2 = generate_code  v2
				(match (v1, v2) with
				| Scalar _, Scalar _ -> <:expr< ( *. ) $ast_v1$ $ast_v2$ >>
				| Scalar _, _ 	     -> <:expr< List.map ( fun a -> a *. $ast_v1$ ) $ast_v2$  >>
				| _, Scalar _	     -> <:expr< List.map ( fun a -> a *. $ast_v2$ ) $ast_v1$ >>
				| _, _		     -> <:expr< List.fold_right 
                                                                   ( +. ) 
                                                                   (List.map2 ( *. ) $ast_v1$ $ast_v2$ ) 
                                                                    0. >>)

This will do nearly the same thing as above for ScalarProduct expressions.

Compiling and testing

Since we use original syntax, it is camlp4of:

ocamlc -I +camlp4 -pp camlp4of.opt camlp4lib.cma pa_vector.ml
camlp4 -parser pa_vector.cmo

Just type in the expression you want to parse after the "#".

To go further

See Abstract Syntax Tree for a complete list of quotations, their signification and equivalent AST node.

Personal tools
Espace privé