The tutorial below is also a valid CDuce program. You can save it as "tutorial.cd" in a fresh directory and learn...
(* -*- tuareg -*-
   CDuce tutorial for the OCaml programmer
   =======================================

   CDuce is a programming language dedicated to the manipulation of
   XML documents. The official documentation is at
     http://www.cduce.org/documentation.html

   ------------------

   This whole file constitutes a valid CDuce program.
   -*- tuareg -*- on the first line tells emacs to load the tuareg-mode
   which is normally used for editing OCaml code, but works pretty well
   with CDuce too.

   Run this program from a fresh directory, by executing:

     cduce tutorial.cd

   It should not display any error message.

   It is recommended to practice both with the interactive mode of cduce,
   and by modifying and compiling the code from emacs with tuareg-mode or
   caml-mode (other text editors are probably fine too).

   To start cduce in interactive mode, just type "cduce" on the command line.
   Most tips for using the ocaml toplevel apply here too
   (see http://wiki.cocan.org/tips_for_using_the_ocaml_toplevel).

   Prerequisites for this tutorial:
   - you should be reasonably familiar with XML,
   - you should be reasonably familiar with OCaml,
   - you should realize that CDuce is pretty different from OCaml,
     although it shares some syntactic similarities,
   - you should have a basic idea of what regular expressions are,
     and their usual notations (star, plus, question mark, vertical bar)
*)

(* Note about comments:
   C-style comments using /* and */ should be used for text that
   contains unmatched quotes, while OCaml-style comments using (* *)
   are preferred for commenting out pieces of code.
*)

/*
 * Let's create a simple but realistic example
 * that we will use throughout this tutorial.
 */

type a = <a>[ b* ]
type b = <b ..>[ (<c>String)* | Char* ]

let doc : a =
  <a> [
    <b> []
    <b name="b1"> [
      <c> "c text 1"
      <c> "c text 2"
    ]
    <b name="b2"> "Pure Text"
  ]

/*
   doc represents the following XML code:

   <a>
     <b/>
     <b name="b1">
       <c>c text 1</c>
       <c>c text 2</c>
     </b>
     <b name="b2">Pure Text</b>
   </a>
*/

/*
   You can input and output XML data using some predefined functions.
   Here is a small list that should be enough for us now:

   Output:
     print_xml: converts any data to a string (type String)
     print: prints a string to stdout
     dump_xml (CDuce versions >= 0.4.1): takes any data and prints it
       directly to stdout
     dump_to_file: takes a file name (first argument),
       a string (second argument), and writes the string to the file.

   Input:
     load_xml: takes a file name or a URI, and loads it as XML.

   For a full list of primitives, see:
     http://www.cduce.org/memento.html

   Let's get started: the following code defines a function "test_io"
   which writes some XML data to a file, and reads it back from the file.
   Not very useful, but instructive.
*/

let test_io (file : Latin1) (data : Any) : Bool =
  let _ = dump_to_file file (print_xml data) in
  let data2 = load_xml file in
  if data = data2 then `true
  else `false

let _ =
  match test_io "doc.xml" doc with
      `true -> [] (* "nil" *)
    | `false -> raise "test_io didn't work as expected"

/*
   A few notes about the constructs that we saw above:
   - match-with is similar to OCaml, and if-then-else is just
     a specialization of a match-with to booleans;
   - _ has the same meaning as in OCaml;
   - there is no exception type: "raise" accepts data of any type
     as argument;
   - [] is here used like () in OCaml (unit type). It is actually pretty
     much like the empty list or nil. The CDuce equivalent of lists is
     sequences; unlike an OCaml list type, a sequence type can specify
     what kind of elements the sequence contains, in which order and
     how many times they can occur.
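
   As a minimal, untested sketch of such a sequence type (the names
   "int_then_flag" and "s" are hypothetical and are not used anywhere
   else in this tutorial), the following type requires one or more
   Ints followed by an optional Bool, and s is one value of that type:

     type int_then_flag = [ Int+ Bool? ]
     let s : int_then_flag = [ 1 2 3 `true ]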
*/

/*
   We already did some pretty advanced stuff:
   - we defined the structure of an XML document (type a);
   - we defined an XML document (doc) of type a;
   - we exported and imported it back from a file;
   - we saw how to apply functions and one syntax for defining a function.

   Now we will see how to manipulate XML data effectively,
   i.e. transform an XML tree into some other data, which typically
   would be ready to be exported to OCaml.
*/

/*
   Let's explore several syntactic constructs that will allow us to do
   some common tasks
*/

/*
   Task 1: extract all the b nodes from doc.

   The slash operator (/) expects:
   - on the left: a sequence of XML nodes (an expression);
   - on the right: a pattern for matching all subnodes.

   It is important to note that the lefthand expression is a sequence,
   not just a node. This is why we have to put square brackets around doc.
   So [doc] is a sequence of one element.
*/

let bnodes = [doc] / <b ..> _

/*
   The previous example was not extremely useful because it returns
   all the children of the single node <a>.
   That could have been achieved directly using simple pattern matching:
*/

let achildren =
  match doc with
      <a> children -> children

/*
   Before continuing, let's have a closer look at the pattern matching
   above. "children", on the left side of the arrow, binds the variable
   "children" to the sequence which constitutes the contents of the
   <a> node.
   The pattern matching is exhaustive because the type of doc is a.
   It is however possible to cast "doc" to a more general type
   (a supertype of type a). For example, the predefined type "Any"
   represents any possible CDuce value, XML or not: a value of any type
   can be cast to type Any.
   Let's do it: we create doc2, which is the same document as doc,
   just with the general type Any:
*/

let doc2 = doc : Any

/*
   But now, if you try to define the "achildren" example using doc2
   instead of doc, cduce will complain.
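
   As a sketch of one way to satisfy the type checker (untested here;
   the name "achildren2" is hypothetical), a catch-all branch can be
   added so that the pattern matching covers every value of type Any,
   on the assumption that the complaint is about the single <a> pattern
   no longer being exhaustive:

     let achildren2 =
       match doc2 with
           <a> children -> children
         | _ -> []
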
   The peculiarity of this type system, as opposed to the type system
   of OCaml, is that there is no polymorphism that uses
   type parameters (e.g. 'a) as in OCaml.
   For example, you cannot define a polymorphic identity function in CDuce:
   it would always return something of type Any.
   In OCaml, the identity function can be defined as follows:

     let identity x = x

   Its signature is:

     val identity : 'a -> 'a

   So in OCaml, (identity 123) has type int like 123.
   In CDuce, a generic identity function would always return an object
   of type Any. Let's define it:
*/

let identity (x : Any) : Any = x

/*
   If you try it in the cduce toplevel, you get this:

   # let identity (x : Any) : Any = x;;
   > val identity : Any -> Any = <fun>
   # identity 123;;
   - : Any = 123

   And if you try to use it as an Int, you get one of those
   common type errors:

   That works all right (by the way, note the funny type "124" which is
   a subtype of Int):
   # 123 + 1;;
   - : 124 = 124

   That's the problem we are talking about:
   # identity 123 + 1;;
   Characters 0-12:
   This expression should have type:
   { .. } | Int
   but its inferred type is:
   Any
   which is not a subtype, as shown by the sample:
   Atom

   These error messages can be confusing, but they often mean that
   a more specific type was expected. It may mean that you forgot a
   downcast (see below) or that your data doesn't fit one of your
   type definitions.
*/

/*
   It is possible to view the same object with another type:
   - a more general type (supertype) is always allowed;
   - a more specific type (subtype) is allowed, if it matches the
     structure of the object.

   The former is something which is possible in OCaml.
   The latter is a downcast and it is not possible in OCaml,
   since it requires storing some type information at runtime.
   In CDuce, some typing happens at runtime (dynamically),
   so downcasts are possible, and naturally may cause runtime errors.

   1. You can change the type of an object to a supertype (upcast) using ":".
      This is done statically, so you will get a message from the compiler
      if the given type does not include the current type of the object.

   2. You can change the type of an object to any compatible type
      (downcast or upcast) using ":?".
      This is done at runtime and raises an exception if the requested type
      is not compatible with the structure of the object.

   The usefulness of static type conversions is limited, just like in OCaml,
   since there is little need to purposefully set the type of an object
   to a more general type: it is done automatically when the object is passed
   as an argument to a function that expects a more general type.

   Downcasts are not possible directly in OCaml, and are generally
   considered bad practice anyway. Here, we will use them to check and
   assign a type to an XML document, which usually comes from some data
   loaded at runtime.
   Typically, we would load our "doc.xml" file as follows:
*/

let doc_reloaded = load_xml "doc.xml" :? a

/*
   The command above may fail if the file "doc.xml" does not contain
   an XML document that conforms to type a.
   It is now clear that we use the dynamic cast operator ":?" as a way of
   matching the structure of a document against some predefined pattern,
   i.e. a type.
   Once an XML document has been validated, it can be passed as an argument
   to functions that work exclusively on that type.
*/

/*
   Let's go back to our sheep, as we say in French.
   We wanted to extract some nodes from our data.
   We saw that we can take a sequence of nodes, select and regroup
   all the children that match some pattern, using the slash operator:

     let bnodes = [doc] / <b ..> _

   We were saying that this thing above was a bit complicated for just
   extracting the children of <a>. Let's jump to task 2.
*/

/*
   Task 2: Extract only the <b> nodes that have a "name" attribute.
   Very easy, we just have to make the pattern (righthand side of the slash)
   a little more specific:
*/

let named_bnodes = [doc] / <b name=_ ..> _

/*
   Using the same technique twice, we can extract the grandchildren of <a>:
*/

let cnodes = [doc] / <b ..> _ / <c> _

/*
   Note that the code above only selects the <c> nodes without attributes,
   because we omitted the ".." wildcard. It's okay because this is what
   we want, but using .. may be a good habit in general.

   It is nice to be able to go down the hierarchy using a sequence of
   node patterns separated by slashes, like for a filesystem.
   This explains why the expression (on the left) must be
   a sequence of nodes rather than just a node.
*/

/*
   Task 3: Extract the strings that are enclosed within <c> tags,
   as a sequence of strings (rather than a sequence of <c> nodes)
*/

/*
   From the previous example, we know how to extract the <c> nodes,
   and they are already stored in the cnodes variable.
   We are going to convert the sequence of <c> nodes into a sequence of
   the same length containing what we want. For this, we use the
   map-with construct. It is analogous to List.map in OCaml,
   but unlike List.map it is not a function.
*/

let ccontents = map cnodes with <c> x -> x

/*
   Note that what follows the mandatory "with" keyword is a pattern matching,
   not a function. But we can create our own fmap function which
   would take a function as its first argument, and map the sequence
   passed as second argument:
*/

let fmap (f : Any -> Any) (seq : [Any*]) : [Any*] =
  map seq with x -> f x

/*
   As opposed to OCaml's List.map and other polymorphic functions,
   the result of fmap would always be of type [Any*] which is the most
   general type of sequence.
   So if you want to use such a function, the result would have to be
   downcast using ":?", which involves a runtime check of your data.
   So you should probably not use that technique.
   However a workaround is presented here:
     http://www.cduce.org/tips.html
*/

/*
   Task 4: Write a function that selects <b> nodes that have a "name"
   attribute of a certain value. This value should be passed
   as a parameter to the function.

   Here is the solution:
*/

let select_bnode (name : String) (seq : [b*]) : [b*] =
  transform seq with
      x & <b name=y ..> _ -> if y = name then [x] else []

let b1_nodes = select_bnode "b1" bnodes
let b2_nodes = select_bnode "b2" bnodes

/*
   This solution introduces two main novelties:
   - the transform-with syntactic construct,
   - the "&" operator in patterns.

   First, let's see what transform-with does. Like map-with, it is a
   language construct, not a function. Like map-with, it scans the elements
   of a sequence and returns another sequence. Its role is to allow
   mapping and filtering of data at the same time.
   Each item of the sequence is pattern-matched and must be converted into
   a sequence of zero, one or maybe more elements. With map-with
   it would result in a sequence of sequences, but here the result is
   flattened, i.e. all the sequences are joined together.
   OCaml's standard library had no such builtin function before
   List.concat_map was added in OCaml 4.10, but an equivalent polymorphic
   function could be written as follows:

   # let rec transform f l = List.flatten (List.map f l) ;;
   val transform : ('a -> 'b list) -> 'a list -> 'b list = <fun>

   In the transform-with construct, pattern-matching always succeeds,
   since an invisible catch-all case is added and is equivalent
   to returning the empty sequence []. In other words, all elements
   that don't match are discarded.

   Now let's look at the pattern. It uses "&", placed between two patterns.
   The first pattern "x" matches everything and is just used to bind
   a variable (x) to the whole element. The second pattern
   "<b name=y ..> _" selects <b> elements that have a "name" attribute.
   So the "&" here is used like the "as" keyword in OCaml's
   pattern matching. It is however more general since it allows forcing
   a single object to match two different patterns.
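
   As a small, untested illustration of that generality (it is not part
   of the original examples), both sides of "&" can constrain the same
   value. Below, "x" captures the whole sequence, while the second
   pattern requires it to contain exactly two elements and captures
   the first one as "y":

   # match [1 2] with x & [ y _ ] -> (x, y);;
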
   Please note that CDuce also has a "::" operator, whose role is to name
   subsequences; it only appears within the square brackets
   of sequence patterns, e.g.:

   # match [1 2 3 4] with [ _ x :: (_ _) _ ] -> x;;
   - : [ 2 3 ] = [ 2 3 ]
*/

/*
   Task 5: Understanding types
*/

/*
   CDuce provides a broad set of types, which are reminiscent of OCaml types.
   In addition to those, XML types exist and can be used to represent
   some XML data. However there are several interesting considerations
   to take into account
*/

/*
   1) so-called XML types can represent more than just XML documents.
   In XML, data are always string-based. Here, other types can be used,
   such as Ints or records. When converting an object of an XML type
   to an actual XML document, an error occurs if it cannot be converted
   to real XML: for instance Ints are translated to their string
   representation, but other types like records cause an error.

   The following object has an XML type: it is a tag <a> that contains
   a record, and it can be manipulated within CDuce:
*/

let xml_with_record = <a> { x = 1; y = 2 }

/*
   but it cannot be converted to a traditional XML file because
   records don't exist in real XML. So if you try,

     print_xml (<a> { x = 1; y = 2 })

   would fail.
*/

/*
   2) type and variable names can be capitalized or not, but they
   are case-sensitive, just like XML attribute labels.
   In addition, type names can be used in pattern-matching, just like
   capture variables. For example, the meaning of

     match 123 with t -> 456

   depends on the context:
   - If a type t was defined, it means that the structure of the matched
     value (here 123) should be checked against the pattern defined
     by type t.
   - If there is no such type as t, then t is considered as a variable,
     which here would be bound to the matched value.

   Test 1: t as a variable (the warning is expected):

   # match 123 with t -> 456;;
   Characters 15-23:
   Warning: The capture variable t is declared in the pattern but not used in
   the body of this branch. It might be a misspelled or undeclared type or name
   (if it isn't, use _ instead).
   - : 456 = 456

   Test 2: t as a type

   # type t = Int;;
   # match 123 with t -> 456;;
   - : 456 = 456

   Test 2 works because 123 actually belongs to type t or Int.
   Using an incompatible type such as String results in an error:

   # match 123 with String -> 456;;
   Characters 6-9:
   This expression should have type:
   String
   but its inferred type is:
   123
   which is not a subtype, as shown by the sample:
   123
*/

/*
   Task 6: Making CDuce functions and data available to an OCaml program.

   Here is what you need:
   - a CDuce program (a.cd)
   - a compatible OCaml interface file for the CDuce program (a.mli)

   A CDuce file will constitute an OCaml module. Essentially, cduce will
   compile it into an OCaml implementation file, which uses the CDuce
   runtime library.

   Sequence of commands to produce the OCaml implementation:

     ocamlfind ocamlc -c a.mli -package cduce
     cduce --compile a.cd
     cduce --mlstub a.cdo > a.ml

   Then a.ml is compiled normally with either ocamlc or ocamlopt,
   using the CDuce library:

     ocamlfind ocamlopt -c a.ml -package cduce

   In a Makefile, in addition to the rules you already use to compile
   all your .mli and .ml files, you can add those two:

   a.cdo: a.cd a.cmi
           cduce --compile a.cd
   a.ml: a.cdo
           cduce --mlstub a.cdo > a.ml

   The correspondence between OCaml and CDuce types is described
   in the official documentation at
     http://www.cduce.org/manual_interfacewithocaml.html#transl

   We will just give a few remarks and a simple example.

   About translating OCaml types to CDuce types:
   - Not all CDuce types can be converted to OCaml types.
   - Some CDuce types can be converted into different kinds of OCaml types,
     depending on how you define the OCaml interface.
   - Some types remain abstract in OCaml. The most common example is the
     Char type which forms the String type (String is an alias for [Char*]).
     If you want to use OCaml's string type, you have two options:
     1. If your string only uses Unicode codes 0 to 255,
        then you can convert it from String to Latin1, e.g.
          yourstring :? Latin1
     2.
        If your string may contain Unicode characters above 255, then you
        may want to export them as-is. The OCaml type you get is
        Cduce_lib.Encodings.Utf8.t, and it can be converted to a regular
        OCaml string (UTF8 encoded) with the
        Cduce_lib.Encodings.Utf8.to_string function.

   It is recommended that you browse through the available functions
   of the CDuce library using a tool like ocamlbrowser.

   Example: some OCaml types, followed by their CDuce counterparts

     type opt = string option
     type opt8 = Cduce_lib.Encodings.Utf8.t option
     type stringlist = string list
     type variant = A | B of variant
     type variantpoly = [ `A | `B of variant ]
     type f = bool -> unit
     type flab = lab:bool -> unit
     type point = { x : float; y : float }
*/

type opt = [ Latin1? ]
type opt8 = [ String? ]
type stringlist = [ Latin1* ]
type variant = `A | (`B, variant)
type variantpoly = `A | (`B, variant)
type f = Bool -> []
type flab = Bool -> []
type point = { x = Float; y = Float }

/*
   Complete information about interfacing CDuce and OCaml is given at
     http://www.cduce.org/manual_interfacewithocaml.html
*/
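
/*
   As a minimal, untested sketch of the a.mli and a.cd pair described
   in Task 6 (the function name "greet" is hypothetical and is only
   used for illustration), an OCaml interface a.mli containing

     val greet : string -> string

   could be matched by the following definition in a.cd, assuming that
   Latin1 is used as the CDuce counterpart of the OCaml string type
   and that @ concatenates two sequences:

     let greet (name : Latin1) : Latin1 = "Hello, " @ name

   The build commands listed in Task 6 would then apply unchanged.
*/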