OCaml Best Practices for Developers
This page describes a set of best practices for developing XAPI in OCaml.
The aim of this page is to describe coding standards which promote readability, understandability and maintainability.
Use of Interface Files
Interface files (.mli files) should be used wherever possible.
This forces the developer to think about the interface to a module, rather than allowing all functions to be externally visible by default.
Furthermore, interfaces provide a quick reference for the types of modules and functions, as well as documenting their intended behaviour.
Bare-bones interface files can be generated automatically from module implementations (.ml files) using the following command:
ocamlc -i <impl-file> > <if-file>
where <impl-file> is the path to an .ml file and <if-file> is the path to the .mli file to be created.
However, do not stop there: edit the .mli file to remove internal functions and add documentation.
In some cases, generating the .mli file requires complex build dependencies. In our case, the easiest way to handle this is to reuse the existing omake infrastructure. For example, suppose you want to build foo.mli. Then:
- remove foo.cmx from the current directory;
- build foo.cmx, i.e. run "omake foo.cmx";
- omake should have displayed a (usually long) line such as "ocamlfind ... -c foo.ml". Copy this line and add "-i > foo.mli" at the end of it to obtain your .mli file;
- edit foo.mli to remove internal functions and add documentation.
Every function in an interface (.mli files or sig...end blocks) must be documented.
This must include a description of the purpose of the function and descriptions of its arguments and return value.
The nature of any exceptions which may be raised, and the circumstances in which they are raised, should also be described.
Comments in interface files should use the ocamldoc syntax.
This consists of comments of the form:
(** ... *)
For details about the syntax, see the ocamldoc documentation.
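As an illustration of these rules, the following sketch documents a small sig...end block in ocamldoc syntax. The Bounded_counter module and its functions are hypothetical, invented for this example; each value's comment gives its purpose, its arguments, its return value and the exceptions it may raise.

```ocaml
(* Hypothetical module, used only to illustrate documented signatures. *)
module Bounded_counter : sig
	(** A counter whose value can never exceed a fixed upper bound. *)
	type t

	(** Create a new counter.
	    @param max the upper bound for the counter's value
	    @return a fresh counter whose value is 0
	    @raise Invalid_argument if [max] is negative *)
	val create : int -> t

	(** Increment a counter.
	    @param c the counter to increment
	    @return the new value of [c]
	    @raise Failure if [c] is already at its upper bound *)
	val incr : t -> int
end = struct
	type t = { max : int; mutable value : int }

	let create max =
		if max < 0 then invalid_arg "Bounded_counter.create";
		{ max; value = 0 }

	let incr c =
		if c.value >= c.max then failwith "Bounded_counter.incr: at upper bound";
		c.value <- c.value + 1;
		c.value
end
```

Because the record type is abstract in the signature, callers can only reach the counter through the documented functions.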
Functions which are only executed on the pool master or only executed on a slave should be annotated with a comment
(* MASTER ONLY *)
(* SLAVE ONLY *)
above the implementation of the function, where this is not obvious from the context.
This may be incorporated into a larger comment describing the function's implementation.
Where an important invariant must be borne in mind by a developer editing a module, such as the need to use a particular lock when performing certain operations, a comment of the following form should be used to draw the reader's eye:
(* XXX The lock must be held while performing operations. *)
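A minimal sketch of where these annotations sit, assuming a hypothetical register_host function and pool_hosts list that stand in for real XAPI code. The MASTER ONLY marker is folded into the comment above the implementation, and the XXX comment flags an invariant for anyone editing the body.

```ocaml
(* Hypothetical pool state standing in for real XAPI data structures. *)
let pool_hosts : string list ref = ref []

(* MASTER ONLY: adds [host] to the pool membership list unless it is
 * already present. *)
let register_host host =
	(* XXX pool_hosts must only be modified through register_host. *)
	if not (List.mem host !pool_hosts) then
		pool_hosts := host :: !pool_hosts
```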
Names of Identifiers
Value identifiers should use lower case with underscores to separate words:
let birthday_present = "A nice, new wide-screen monitor."
Type identifiers should also use lower case with underscores to separate words, as with birthday_present in the example below.
Type constructors should have an initial capital letter, and use lower case with underscores to separate words:
type birthday_present = Widescreen_monitor | Fast_car | Box_of_chocolates
Module identifiers require an initial capital letter and may use further capital letters to represent abbreviations/acronyms and underscores to separate words:
module Parser = struct...end
module Locking_strategy = struct...end
module XML_UTF8 = struct...end
Module type identifiers may be written in upper case, with underscores to separate words, but this is not mandatory. This style should only be used where it does not hinder readability:
module type CAR_FACTORY = sig...end
module Fast_car_factory : CAR_FACTORY = struct...end
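The naming rules above can be seen together in a small compilable sketch. The contents of CAR_FACTORY and Fast_car_factory are hypothetical, filled in only so that the example type-checks.

```ocaml
(* Module type identifier in upper case (an optional style). *)
module type CAR_FACTORY = sig
	(* Type identifier in lower case; constructors start with a capital. *)
	type car = Fast_car | Family_car
	val build : unit -> car
	val describe : car -> string
end

(* Module identifier: initial capital, underscores between words. *)
module Fast_car_factory : CAR_FACTORY = struct
	type car = Fast_car | Family_car

	(* Value identifiers in lower case with underscores. *)
	let build () = Fast_car

	let describe = function
		| Fast_car -> "fast car"
		| Family_car -> "family car"
end
```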
Use of parentheses
In function application
Only use brackets when expressions do not associate to the left.
For example, none of the brackets are necessary in ((f x) y) but the brackets are necessary in f (x y).
In type annotations
Only use brackets when expressions do not associate to the right.
For example, none of the brackets are necessary in int -> (int -> int) but the brackets are necessary in (int -> int) -> int.
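A short sketch of both associativity rules. The functions add and apply_twice are hypothetical examples, not part of any XAPI module.

```ocaml
let add x y = x + y

(* Application associates to the left: add 1 2 means (add 1) 2. *)
let three = add 1 2
let also_three = (add 1) 2

(* Parentheses are needed to pass the result of one application as an argument. *)
let six = add 3 (add 1 2)

(* Arrows associate to the right, so int -> int -> int needs no parentheses,
   but an argument of function type (int -> int) does. *)
let apply_twice (f : int -> int) (x : int) : int = f (f x)
let twelve = apply_twice (add 3) 6
```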
In mathematical expressions
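In mathematical and boolean expressions, parentheses may be added even where they are not required, to aid clarity. For example, a complex boolean expression of the form A || B && C may be better expressed as A || (B && C); since && binds more tightly than ||, the two forms are equivalent.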
In sequences of expressions
Only use brackets around a sequence of expressions if it is written on one line; otherwise use begin...end. In general, it's easier to spot the difference between
begin match A with B -> C end; D
begin match A with B -> C; D end
compared to the difference between
(match A with B -> C); D
(match A with B -> C; D)
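The following sketch shows why the closing end matters. The record function and trace buffer are hypothetical. Because begin...end closes the match, the final Buffer.add_string runs for every case; without it, that expression would be parsed as part of the last case and would not run when x is 0.

```ocaml
(* Hypothetical buffer used to observe which expressions run. *)
let trace = Buffer.create 32

let record x =
	begin match x with
	| 0 -> Buffer.add_string trace "zero;"
	| _ -> Buffer.add_string trace "other;"
	end;
	(* Outside the match: runs whichever case was taken. *)
	Buffer.add_string trace "done;"
```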
In if...then...else structures
The important thing here is to make the distinction between the contents of the then clause and the else clause, and the distinction between what is in these clauses and what follows the if structure. There are a variety of ways of achieving this clearly, depending on taste. Here are some suggestions.
Brackets may be used to enclose the contents of the then and else blocks if they consist of a sequence of expressions:
if A then (
	B;
	C
) else (
	D;
	E
);
F
A begin...end block used in a then or else component consisting of a sequence of expressions:
if A then begin
	B;
	C
end else begin
	D;
	E
end;
F
Other uses of whitespace are also permitted provided that clarity is maintained, for example:
if A
then begin
	B;
	C
end
else begin
	D;
	E
end;
F
As in the above examples, any expression following the if structure should begin on the line after the close of the else block.
This helps the reader to spot the difference between, in general, a structure of the form
(if A then B else C); D
and a structure of the form
if A then B else (C; D)
Use of open
Avoid using open except where another module is heavily used within the current module or heavily used across the application (such as the String module).
This makes it clearer which module a function is defined in and helps avoid shadowing caused by name clashes.
It is permitted to give a module a different name in the local context, using a module alias, to add a layer of indirection where helpful.
For example, the following is permitted:
module CMN = Complicated_module_name
Opening a module should always be done at the top of a module.
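A minimal sketch of these guidelines, assuming Printf is heavily used in the file and that Complicated_module_name is a hypothetical module invented for the example. The open appears once at the top, while the long module name is shortened with an alias rather than opened.

```ocaml
(* A module used heavily throughout the file is opened once, at the top. *)
open Printf

(* Hypothetical module with a long name, standing in for a real one. *)
module Complicated_module_name = struct
	let double x = 2 * x
end

(* A module alias gives a shorter local name without opening the module. *)
module CMN = Complicated_module_name

(* Qualified access keeps the origin of each function visible. *)
let quadruple x = CMN.double (CMN.double x)

let describe x = sprintf "%d quadrupled is %d" x (quadruple x)
```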
Comments
Comments should be no more than 80 characters wide.
Comments should be written in English prose.
In multi-line comments, each line should be preceded by an asterisk, vertically aligned with the opening asterisk, for example:
(* This is a multi-line comment. Having an asterisk on each
 * line helps the reader to identify the comment as a single
 * entity and is particularly important in the absence of
 * syntax highlighting.
 *)
The first line of a multi-line comment may be optionally left blank.
The close of the comment may appear on the last line of the comment or on a line of its own, as in the example above.
Single-line comments should usually be on a line of their own, but may be permitted on the same line as code provided they are very short.
Line length
There is no maximum width for source code, but long lines should be used judiciously. Consider using let...in to give sub-expressions a name and reduce the length of the line where they are used.
Note also that diff compares files on a line-by-line basis, so particularly long lines become difficult to analyse.
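A sketch of the let...in technique, using a hypothetical mean_of_positives function. Each named sub-expression gets its own short line, instead of one long nested expression that diff would report as a single changed line.

```ocaml
(* Hypothetical example: the mean of the positive numbers in a list.
   Returns 0. when there are none (an arbitrary choice for this sketch). *)
let mean_of_positives xs =
	let positives = List.filter (fun x -> x > 0.) xs in
	let total = List.fold_left ( +. ) 0. positives in
	let count = float_of_int (List.length positives) in
	if count = 0. then 0. else total /. count
```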
Use of whitespace
The top level of a source file is not indented. Subsequent indentation should occur on each line of a multi-line expression until the next ;, in or the end of the expression.
A single level of indentation should consist of a single tab character. Spaces should not be used. (Refer to the Editing page for details of how to configure popular editors to conform to this requirement.)
In match...with, try...with and function blocks, unless all the cases fit onto a single line, position the first case on the line underneath the start of the block and prefix all cases (including the first) with a vertical bar. For example:
match A with
| B -> C
| D -> E
Note that it is not necessary to vertically align the -> arrows where B and D are of different lengths.
If the body of a case is too long to fit on a single line, start it on the line after the arrow, indented one level, as follows:
match A with
| B ->
	C1;
	C2
| D ->
	E1;
	E2
The in keyword should be treated like a semi-colon. Thus the expression following in need not be indented any further than the level of the let. For example:
let A = B in
let C = D in
E
If the assignment does not fit onto a single line, the in may either be placed at the end of the last line or on a line of its own, at the same indentation level as the let. Both styles are employed in the following example:
let A =
	B1;
	B2 in
let C =
	D1;
	D2
in
E
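The layout rules above, applied to a small compilable function. The shape type and describe function are hypothetical; the example shows cases prefixed by vertical bars beneath the match, case bodies dropped to the next line and indented with a tab, and let...in bindings not indented any further than the let.

```ocaml
(* Hypothetical type used only to illustrate the layout rules. *)
type shape = Circle of float | Rectangle of float * float

let describe shape =
	match shape with
	| Circle r ->
		let area = Float.pi *. r *. r in
		Printf.sprintf "circle of area %.2f" area
	| Rectangle (w, h) ->
		let area = w *. h in
		Printf.sprintf "rectangle of area %.2f" area
```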
Pattern matching
An underscore should only be used as a wildcard "catch-all" pattern when it is intended to match everything not covered by the previous cases.
It should not be used when there is only one remaining pattern.
This helps the reader know what values might match the pattern, without needing to refer to the type definition, and helps to prevent mistakes when extending the type with a new constructor.
For example, when matching values from the type
type state = Dying | Shutdown of int | Paused | Blocked | Running
the following code is discouraged:
match x with
| Dying -> 1
| Shutdown _ -> 2
| Paused -> 3
| Blocked -> 4
| _ -> 5
Instead, the final pattern should use the Running constructor explicitly.
Wildcards may be used in patterns to represent sub-expressions which are not referred to in the body of the case, as demonstrated in the Shutdown case above.
Do not allow the compiler to give a "pattern-matching is not exhaustive" warning (warning 8).
Always specify sufficient cases to exhaust all possible patterns, even if some of them have no-op cases.
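The recommended form of the state example can be sketched as follows; state_code is a hypothetical function name. Every constructor is named explicitly, so adding a new constructor to state makes the compiler report this match as non-exhaustive instead of silently sending the new case to a wildcard.

```ocaml
type state = Dying | Shutdown of int | Paused | Blocked | Running

(* Running is matched explicitly rather than by a final wildcard.
   The _ in Shutdown _ ignores an argument the body does not use. *)
let state_code = function
	| Dying -> 1
	| Shutdown _ -> 2
	| Paused -> 3
	| Blocked -> 4
	| Running -> 5
```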
Discarding returned values
When an expression in a sequence has a type other than unit, use one of the functions named with prefix ignore_ from Pervasiveext to denote intentional discard of its value.
In other words, treat ; as having type unit -> 'a -> 'a.
ignore_int (Unix.lseek fd pos Unix.SEEK_SET)
For return types not covered by these functions, write the type annotation yourself:
let (_: <type>) = <expr> in
where <type> is the expected return type of the expression <expr>.
(Note that the built-in ignore function is not safe with respect to partial function application. If a function whose return value is being discarded has an extra parameter added at a later date, you will not get a compiler error because ignore will be discarding a thunk rather than a value.)
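A minimal sketch of both discarding styles. Pervasiveext is XAPI-specific and is not reproduced here, so ignore_int below is a hypothetical stand-in with the same idea, and record_event is a hypothetical function with a side effect and an int result.

```ocaml
(* Hypothetical stand-in for one of Pervasiveext's ignore_ functions. *)
let ignore_int (_ : int) = ()

(* Hypothetical function with a side effect and an int result. *)
let events : string list ref = ref []

let record_event name =
	events := name :: !events;
	List.length !events

let () =
	(* The argument type of ignore_int checks that an int value,
	   not a partially applied function, is being discarded. *)
	ignore_int (record_event "start");
	(* For a type with no ignore_ helper, annotate the discarded value. *)
	let (_ : int) = record_event "stop" in
	()
```

If record_event later gained an extra parameter, both discards above would fail to type-check, pointing the developer at every call site that needs updating.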
Tail recursion
Tail-recursive functions should be used where possible to avoid consuming O(n) stack space, where n is the depth of the recursion.
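A sketch contrasting the two forms on a hypothetical list-summing function. The accumulator version makes the recursive call the last action of each step, so the compiler can reuse the stack frame.

```ocaml
(* Not tail-recursive: each call must wait for the recursive result,
   so stack usage grows with the length of the list and may exhaust
   the stack on very long lists. *)
let rec sum = function
	| [] -> 0
	| x :: xs -> x + sum xs

(* Tail-recursive: acc carries the running total and the recursive
   call is in tail position. *)
let sum_tr xs =
	let rec loop acc = function
		| [] -> acc
		| x :: xs -> loop (acc + x) xs
	in
	loop 0 xs
```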
Use of and
The and keyword may be used to define several functions in the same environment. Avoid defining a recursive function together with anything else unless the functions are mutually recursive.
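A sketch of the acceptable uses, with hypothetical definitions. let rec...and is kept for functions that genuinely call each other, while unrelated non-recursive values may share a plain let...and.

```ocaml
(* Mutually recursive functions: a legitimate use of let rec ... and.
   Both assume n >= 0. *)
let rec is_even n = if n = 0 then true else is_odd (n - 1)
and is_odd n = if n = 0 then false else is_even (n - 1)

(* Independent, non-recursive values bound with a plain let ... and. *)
let width = 80
and tab_width = 8
```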
Concatenation of literals
Literal strings should be expressed as a single string rather than as a concatenation of several literals. This is because the compiler keeps the strings in separate memory locations and performs the concatenation at run time.
For example, write "hello world" rather than "hello " ^ "world".
With strings longer than 80 characters, use the "\" line continuation character to break the string into shorter, more readable chunks:
let long_string = "Donec id leo sed est porttitor porta. Nulla eget tellus. Ut quam. \
	Morbi egestas, mauris semper mattis vehicula, neque sem accumsan \
	felis, et aliquet justo justo eu est. Curabitur ut sem id orci \
	id, venenatis et, leo."
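A small check of how the continuation behaves, using a hypothetical sentence. The lexer discards the backslash, the newline and any spaces or tabs at the start of the next line, so the result is a single literal with no embedded newline; the space before each backslash is what separates the words.

```ocaml
(* Preferred: one literal rather than "hello " ^ "world". *)
let greeting = "hello world"

(* The backslash-newline and the leading tab on the next line are
   dropped by the lexer; the space before the backslash is kept. *)
let sentence = "The quick brown fox \
	jumps over the lazy dog."
```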
Use of objects
Object-oriented programming is a useful technique in certain situations. However, OCaml's object system should be used with caution.
Before using objects, first consider whether your problem can easily be solved by using alternative solutions:
- Module types
- Polymorphic variants
However, if you find yourself building your own object-like type system in OCaml, you might be better off using OCaml objects. In particular, it is preferable to use OCaml objects rather than building v-tables by hand.
If you do use objects, there are a number of techniques for increasing readability of compiler errors:
- Hide object definitions and types behind an .mli file.
- Use named class types, and make your classes implement these types.
Using these techniques will allow the compiler to produce shorter error messages, with names instead of long method lists.
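A small sketch of the second technique. The shape class type and square class are hypothetical, invented for illustration. Because square is constrained to the named class type, type errors involving it can refer to shape rather than spelling out its full method list.

```ocaml
(* A named class type describing the public interface. *)
class type shape = object
	method area : float
	method name : string
end

(* The class is explicitly constrained to implement shape. *)
class square (side : float) : shape = object
	method area = side *. side
	method name = "square"
end

let describe (s : shape) = Printf.sprintf "%s: %.1f" s#name s#area
```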
Unit tests
Unit tests should be created for every logical unit of code which fulfils a particular single purpose and whose implementation is not obviously functionally correct. (And remember that what might be obvious to you now might not be obvious to someone else, or even to you in a few months!)
Unit tests should reside in a test sub-directory relative to the directory containing the code tested.
Since the circumstances in which a unit test is appropriate are subjective, discretion should be used. For some functions, no tests are required. For others, an extensive set may be beneficial. For some functions, it may suffice to check a few corner cases.
Remember that unit tests are not merely confirmations that your code works as intended. They may outlive your implementation of the tested functionality, and so can also act as regression tests.
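A minimal sketch of a corner-case test, using plain assert so that it needs no test framework. The clamp function is hypothetical; a real suite would more likely use a test framework and live in the test sub-directory mentioned above.

```ocaml
(* Hypothetical function under test: restrict x to the range [lo, hi]. *)
let clamp ~lo ~hi x = if x < lo then lo else if x > hi then hi else x

(* A few typical values plus the boundary cases. *)
let test_clamp () =
	assert (clamp ~lo:0 ~hi:10 5 = 5);     (* inside the range *)
	assert (clamp ~lo:0 ~hi:10 (-3) = 0);  (* below the range *)
	assert (clamp ~lo:0 ~hi:10 42 = 10);   (* above the range *)
	assert (clamp ~lo:0 ~hi:10 0 = 0);     (* lower boundary *)
	assert (clamp ~lo:0 ~hi:10 10 = 10)    (* upper boundary *)

let () = test_clamp ()
```

Kept alongside the code, tests like these continue to guard the boundary behaviour if clamp is later reimplemented.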