Reflective OCaml

From Gallium

Jump to: navigation, search

Contents

Requirements

In this article we suppose that concepts like quotations, antiquotations, ASTs are understood.

The article strings with expansion also give a good start about feeling natural with quotations and antiquotations.

Motivations

As you certainly know, the role of Camlp4 is to give syntactic tools to work with languages like OCaml.

When writing programs (parsers, compilers, printers, transformations, meta-programs and so on...) for such languages one needs to construct and match terms of the language.

The internal representation of programs is often an algebraic data type called an Abstract Syntax Tree.

Using concrete syntax instead of the abstract one is useful for several reasons:

  • Being abstract and independent of the internal representation.
  • Have more concise programs.

Camlp4 features a general mechanism for embedding sub-languages, the so-called quotations. It's trough this system that reflection is provided.

Reflection by examples

Placing a quotation in position of expression

In a position of expression one can put a quotation. There is as many predefined expanders at this position as syntactic categories.

Example of expression positions (denoted by e):

fun x -> e
e e (* the application *)
match e with p -> e | ... | p -> e
let x = e in e
if e then e else e
while e do e done
for x = e to e do e done
...

Examples of quotations in position of expression:

(* An expression *)
<:expr< 3 + 4 >>

(* A type *)
<:ctyp< int -> string >>

(* A structure item, more precisely a type declaration *)
<:str_item< type t = { foo : int ; bar : string } >>

(* A signature item, more precisely a module declaration *)
<:sig_item< module M : S >>

Placing a quotation in position of pattern

In a position of pattern one can put a quotation. There is as many predefined expanders at this position as syntactic categories.

Example of pattern positions (denoted by p):

(* patterns inside expressions *)
fun p -> e
function p -> e | ... | p -> e
match e with p -> e | ... | p -> e
let p = e and ... and p = e in e
...

(* patterns inside patterns *)
p | p     (* Or-patterns *)
p .. p    (* A range *)
(p as p)  (* A named pattern *)
...

Examples of quotations in position of patterns:

(* An expression *)
function <:expr< 3 + 4 >> -> 7

(* A type *)
function <:ctyp< int * string >> -> ...

match ... with
| <:str_item< type t = { foo : int ; bar : string } >> -> ...
| <:str_item< open $ident$ >> -> ...
| ...

Building any piece of AST easily

Building a match expression

Imagine that you have a list of patterns and a list of expressions and that you want to pair these lists as a match construct (or a function).

General picture:

 gen [p1; ...; pN] [e1; ...; eN] = << function p1 -> e1 | ... | pN -> eN >>

Basically to do that one just need to know the syntax of what we want to produce. Here it's a function:

 e ::= ...
     | function p -> e | ... | p -> e

Now Camlp4 let you build terms using smaller pieces:

 e ::= ...
     | function mc
 mc ::= ...
      |          (* The empty match-case (like [] for lists or None for option) *)
      | p -> e   (* The arrow. In fact that's << p when e -> e >> and this is a sugar using an empty guard. *)
      | mc | mc  (* Chaining match-cases *)

Now the real code:

 $ cat get_match_case.ml
 open Camlp4.PreCast;;
 
 let gen patts exprs =
   let cases =
     List.fold_right2 begin fun patt expr acc ->
       let _loc = Loc.merge (Ast.loc_of_patt patt) (Ast.loc_of_expr expr) in
       <:match_case< $patt$ -> $expr$ | $acc$ >>
     end patts exprs <:match_case@here<>>
   in
   let _loc = Ast.loc_of_match_case cases in
   <:expr< function $cases$ >>
 ;;
 $ ocamlc.opt -c -I +camlp4 -pp camlp4of -o gen_match_case.cmo gen_match_case.ml

In this code the important construction is:

 <:match_case< $patt$ -> $expr$ | $acc$ >>

where one chain one case to the already built cases, combining "p -> e" and "mc | mc".

Note 1: In this function one even takes the "luxe" (since most of the time people use the empty location everywhere) to synthesis locations (this perhaps don't make sense, it depends on the input).

Note 2: In this function the syntax of quotations is the original syntax.

Building an sum-type

Let's take an example, you want to generate an algebraic data type declaration given an integer N.

Example for N = 7:

 (* In the original syntax *)
 type t7 =
   | C1 of t7
   | C2 of t7 * t7
   | C3 of t7 * t7 * t7
   | C4 of t7 * t7 * t7 * t7
   | C5 of t7 * t7 * t7 * t7 * t7
   | C6 of t7 * t7 * t7 * t7 * t7 * t7
   | C7 of t7 * t7 * t7 * t7 * t7 * t7 * t7
 
 (* In the revised syntax *)
 type t7 =
   [ C1 of t7
   | C2 of t7 and t7
   | C3 of t7 and t7 and t7
   | C4 of t7 and t7 and t7 and t7
   | C5 of t7 and t7 and t7 and t7 and t7
   | C6 of t7 and t7 and t7 and t7 and t7 and t7
   | C7 of t7 and t7 and t7 and t7 and t7 and t7 and t7 ];

Let's keep the revised syntax since since it's less ambiguous:

  • "and" is used instead of "*" in order to avoid conflicts with tuples.
  • Brackets wrap the whole declaration roughly helps to keep a clean LL(1) parsing.

To produce such a declaration one just need to know the syntax:

 [ C1 of t11 and ... and t1K_1 | ... | CM of tN1 and ... and tNK_N ]

Camlp4 therefor allows you to build terms using smaller pieces:

 t ::= ...
    |         (* The empty type (like [] for lists or None for option) *)
    | uid     (* An upper case identifier *)
    | lid     (* A lower case identifier *)
    | t of t  (* A data-constructor declaration *)
    | t and t (* Chaining data-constructor arguments *)
    | t | t   (* Chaining data-constructor declarations *)
    | [ t ]   (* The sum-type envelop *)

One can then write a program to generate the type declaration. So let's start with "t and t and ... and t":

 let data_constructor_arguments _loc n t =
   let rec self n =
     if n <= 0 then <:ctyp<>> else <:ctyp< $t$ and $self (n-1)$ >>
   in self n
 ;;

When N is too low one just returns an empty type (like "[]" or "None"), otherwise one use the "t and t" construct, that is also written using the "ctyp" quotation, and two antiquotations (dollars) to make holes and fill them.

One can then write a function to produce "CN of t and ... and t" combining an "uid" antiquotation, the "t of t" construct, and the previous function:

 let data_constructor _loc n t =
   <:ctyp< $uid:"C"^string_of_int n$ of $data_constructor_arguments _loc n t$ >>
 ;;

Finally one can iterate over N to make all branches and tie them with the "t | t" construct, then one wrap it using "[ t ]".

 let type_gen _loc n t =
   let rec self n =
     if n <= 0 then <:ctyp<>>
     else <:ctyp< $self (n-1)$ | $data_constructor _loc n t$ >>
   in <:ctyp< [ $self n$ ] >>
 ;;

To give a runnable example, here is a Camlp4 filter that replaces "gen_type tN" by the generated declaration of size N.

 $ cat gen_type_N.ml
 open Camlp4.PreCast;;
 
 let data_constructor_arguments _loc n t =
   let rec self n =
     if n <= 0 then <:ctyp<>> else <:ctyp< $t$ and $self (n-1)$ >>
   in self n
 ;;
 
 let data_constructor _loc n t =
   <:ctyp< $uid:"C"^string_of_int n$ of $data_constructor_arguments _loc n t$ >>
 ;;
 
 let gen_type _loc n t =
   let rec self n =
     if n <= 0 then <:ctyp<>>
     else <:ctyp< $self (n-1)$ | $data_constructor _loc n t$ >>
   in <:ctyp< [ $self n$ ] >>
 ;;
 
 let filter =
   function
   | <:ctyp@_loc< gen_type $lid:x$ >> | <:ctyp@_loc< $lid:x$ gen_type >> ->
       Scanf.sscanf x "%[^0-9]%d" begin fun _ n ->
         gen_type _loc n <:ctyp< $lid:x$ >>
       end
   | t -> t
 ;;
 
 AstFilters.register_str_item_filter (Ast.map_ctyp filter)#str_item;;
 $ ocamlc.opt -c -I +camlp4 -pp camlp4orf -o gen_type_N.cmo gen_type_N.ml
 $ camlp4o ./gen_type_N.cmo -str 'type t7 = gen_type t7' -printer r
 type t7 =
   [ C1 of t7
   | C2 of t7 and t7
   | C3 of t7 and t7 and t7
   | C4 of t7 and t7 and t7 and t7
   | C5 of t7 and t7 and t7 and t7 and t7
   | C6 of t7 and t7 and t7 and t7 and t7 and t7
   | C7 of t7 and t7 and t7 and t7 and t7 and t7 and t7 ];

Building anything

By following the same path you can build and (de-construct) anything, including:

  • Type declarations:
    • tuples (t1 * ... * tN)
    • sum-types [ C1 of t1 | ... | CN of tN ]
    • records { f1 : t1; ...; fN : tN }
    • objects < f1 : t1; ...; fN : tN (..)? >
    • polymorphic variants [< `c1 of t1 | ... | `cN of tN ]
    • a lot more...
  • Expressions:
    • bindings let p1 = e2 and ... and pN = eN in eM
    • abstractions fun p1 ... pN -> e
    • match-cases match e with p1 -> e1 | ... | pN -> eN
    • a lot more...
  • And also patterns, modules, classes...

Pitfalls

The gap between the revised syntax and the original can be somewhat annoying.

Things that are not yet feasible in the original syntax:

  • You have to write <:ctyp< t1 and t2 >> even in original since "*" overlaps with tuples.
  • <:ctyp< [ t ] >> have no equivalent in the original syntax.
  • <:expr< e1, e2 >> build a complete pair instead of an incomplete tuple.
  • perhaps more...

Things that are different:

  • Type application is reversed (int list ~~> list int)
  • Module application is like expressions (Set.Make(F(X)) ~~> Set.Make (F X))
  • In records "mutable" is after the colon (mutable f : t ~~> f : mutable t)
  • many more...

What camlp4 to use

When dealing with different host languages and embedded languages the confusion about what command to use quickly comes. using Camlp4 contains a section explaining differences between camlp4o,camlp4of,camlp4r,camlp4rf...

Personal tools
Espace privé