Google

"http://www.w3.org/TR/REC-html40/loose.dtd"> Pattern Matching Previous Contents Next

5   Pattern Matching

Pattern matching provides a concise, convenient way to bind parts of large objects to new local variables. Two Cyclone constructs use pattern matching, let declarations and switch statements. Although the latter are more common, we first explain patterns with let declarations because they have fewer complications. Then we describe all the pattern forms. Then we describe switch statements.

You must use patterns to access values carried by tagged unions, including exceptions. In other situations, patterns make code more readable and less verbose.

5.1   Let Declarations



In Cyclone, you can write

  let x = e;

as a local declaration. The meaning is the same as t x = e; where t is the type of e. In other words, x is bound to the new variable. Patterns are much more powerful because they can bind several variables to different parts of an aggregate object. Here is an example:

  struct Pair {  int x; int y; };

  void f(struct Pair pr) {

    let Pair(fst,snd) = pr;

    ...

  }

The pattern has the same structure as a struct Pair with parts being variables. Hence the pattern is a match for pr and the variables are initialized with the appropriate parts of pr. Hence ``let Pair(fst,snd) = pr'' is equivalent to ``int fst =pr.x; int snd = pr.y''. A let-declaration's initializer is evaluated only once.

Patterns may be as structured as the expressions against which they match. For example, given type

  struct Quad { struct Pair p1; struct Pair p2; };

patterns for matching against an expression of type struct Quad could be any of the following (and many more because of constants and wildcards---see below):

  • Quad(Pair(a,b),Pair(c,d))
  • Quad(p1, Pair(c,d))
  • Quad(Pair(a,b), p2)
  • Quad(p1,p2)
  • q
In general, a let-declaration has the form ``let p = e;'' where p is a pattern and e is an expression. In our example, the match always succeeds, but in general patterns can have compile-time errors or run-time errors.

At compile-time, the type-checker ensures that the pattern makes sense for the expression. For example, it rejects ``let Pair(fst,snd) = 0'' because 0 has type int but the pattern only makes sense for type struct Pair.

Certain patterns are type-correct, but they may not match run-time values. For example, constants can appear in patterns, so ``let Pair(17,snd) = pr;'' would match only when pr.x is 17. Otherwise the exception Match_Exception is thrown. Patterns that may fail are rarely useful and poor style in let-declarations; the compiler emits a warning when you use them. In switch statements, possibly-failing patterns are the norm---as we explain below, the whole point is that one of the cases' patterns should match.

5.2   Pattern Forms



So far, we have seen three pattern forms: variables patterns, struct patterns, and constant patterns. We now describe all the pattern forms. For each form, you need to know:

  • The syntax
  • The types of expressions it can match against (to avoid a compile-time error)
  • The expressions the pattern matches against (other expressions cause a match failure)
  • The bindings the pattern introduces, if any.
There is one compile-time rule that is the same for all forms: All variables (and type variables) in a pattern must be distinct. For example, ``let Pair(fst,fst) = pr;'' is not allowed.

You may want to read the descriptions for variable and struct patterns first because we have already explained their use informally.

  • Variable patterns
    • Syntax: an identifer
    • Types for match: all types
    • Expressions matched: all expressions
    • Bindings introduced: the identifier is bound to the expression being matched
  • Wildcard patterns
    • Syntax: _ (underscore, note this use is completely independent of _ for type inference)
    • Type for match: all types
    • Expressions matched: all expressions
    • Bindings introduced: none. Hence it is like a variable pattern that uses a fresh identifier. Using _ is better style because it indicates the value matched is not used. Notice that ``let _ = e;'' is equivalent to e.

  • Reference patterns
    • Syntax: *x (i.e., the * character followed by an identifier)
    • Types for match: all types
    • Expressions matched: all expressions. (Very subtle notes: Currently, reference patterns may only appear inside of other patterns so that the compiler can determine the region for the pointer type assigned to x. They also may not occur under a tunion pattern that has existential types unless there is a pointer pattern in-between.)
    • Bindings introduced: x is bound to the address of the expression being matched. Hence if matched against a value of type t in region `r, the type of x is t@`r.

  • Numeric constant patterns
    • Syntax: An int, char, or float constant
    • Types for match: numeric types
    • Expressions matched: numeric values such that == applied to the value and the pattern yields true. (Standard C numeric promotions apply. Note that comparing floating point values for equality is usually a bad idea.)
    • Bindings introduced: none

  • NULL constant patterns
    • Syntax: NULL
    • Types for match: nullable pointer types, including ? types
    • Expressions matched: NULL
    • Bindings introduced: none

  • enum patterns
    • Syntax: an enum constant
    • Types for match: the enum type containing the constant
    • Expressions matched: the constant
    • Bindings introduced: none

  • Tuple patterns
    • Syntax: $(p1,...,pn) where p1,...,pn are patterns
    • Types for match: tuple types where pi matches the type of the tuple's ith field i between 1 and n.
    • Expressions matched: tuples where the ith field matches pi for i between 1 and n.
    • Bindings introduced: bindings introduced by p1, ..., pn.

  • Struct patterns
    • Syntax: There are two forms:
      • X(p1,...,pn) where X is the name of a struct with n fields and p1,...,pn are patterns. This syntax is shorthand for X{.f1 = p1, ..., .fn = pn} where fi is the ith field in X.
      • X{.f1 = p1, ..., .fn = pn} where the fields of X are f1, ..., fn but not necessarily in that order

    • Types for match: struct X (or instantiations when struct X is polymorphic) such that pi matches the type of fi for i between 1 and n.
    • Expressions matched: structs where the value in fi matches pi for i between 1 and n.
    • Bindings introduced: bindings introduced by p1,...,pn

  • Pointer patterns
    • Syntax: &p where p is a pattern
    • Types for match: pointer types, including ? types. Also tunion Foo (or instantiations of it) when the pattern is &Bar(p1,...,pn) and Bar is a value-carrying variant of tunion Foo and pi matches the type of the ith value carried by Bar.
    • Expressions matched: non-null pointers where the value pointed to matches p. Note this explanation includes the case where the expression has type tunion Foo and the pattern is &Bar(p1,...,pn) and the current variant of the expression is ``pointer to Bar''.
    • Bindings introduced: bindings introduced by p

  • tunion and xtunion patterns
    • Syntax: X if X is a variant that carries no values. Else X(p1,...,pn) where X is the name of a variant (that has no existential type parameters) and p1, ..., pn are patterns. If X has existential type parameters, the syntax is X<`t1,...,`tm>(p1,...,pn) for distinct `t1, ..., `tm.
    • Types for match: If X is non-value-carrying variant of tunion Foo, then types tunion Foo and tunion Foo.x (or instantiations of them). If X carries values, then tunion Foo.X (or instantiations of it) where the pi matches the type of ith field. The number of existential type variables in the pattern must be the number of existential type variables for tunion Foo.X.
    • Expressions matched: If X is non-value-carrying, then X. If X is value-carrying, then values created from the constructor X such that pi matches the ith field.
    • Bindings introduced: bindings introduced by p1,...,pn

5.3   Switch Statements



In Cyclone, you can switch on a value of any type and the case ``labels'' (the part between case and the colon) are patterns. The switch expression is evaluated and then matched against each pattern in turn. The first matching case statement is executed. Except for some restrictions, Cyclone's switch statement is therefore a powerful extension of C's switch statement.

Restrictions
  • You cannot implicitly ``fall-through'' to the next case. Instead, you must use the fallthru; statement, which has the effect of transferring control to the beginning of the next case. There are two exceptions to this restriction: First, adjacent cases with no intervening statement do not require a fall-through. Second, the last case of a switch does not require a fall-through or break.
  • The cases in a switch must be exhaustive; it is a compile-time error if the compiler determines that it could be that no case matches. The rules for what the compiler determines are described below.
  • A case cannot be unreachable. It is a compile-time error if the compiler determines that a later case may be subsumed by an earlier one. The rules for what the compiler determines are described below. (C almost has this restriction because case labels cannot be repeated, but Cyclone is more restrictive. For example, C allows cases after a default case.)
  • The body of a switch statement must be a sequence of case statements and case statements can appear only in such a sequence. So idioms like Duff's device (such as ``switch(i%4) while(i-- >=0) { case 3: ... }'') are not supported.
  • A constant case label must be a constant, not a constant expression. That is, case 3+4: is allowed in C, but not in Cyclone. Cyclone supports this feature with a separate construct: switch "C" (e) { case 3+4: ... }. This construct is much more like C's switch: The labels must be constant numeric expressions and fallthru is never required.
An Extension of C
Except for the above restrictions, we can see Cyclone's switch is an extension of C's switch. For example, consider this code (which has the same meaning in C and Cyclone):


  int f(int i) {

    switch(i) {

    case 0:  f(34); return 17;

    case 1:  return 17;

    default: return i;

    }

  }

In Cyclone terms, the code tries to match against the constant 0. If it does not match (i is not 0), it tries to match against the pattern 1. Everything matches against default; in fact, default is just alternate notation for ``case _'', i.e., a case with a wildcard pattern. For performance reasons, switch statements that are legal C switch statements are translated to C switch statements. Other switch statements are translated to, ``a mess of tests and gotos''.

We now discuss some of the restrictions in terms of the above example. Because there is no ``implicit fallthrough'' in non-empty cases, the return statement in case 0 cannot be omitted. However, we can replace the ``return 17;'' with ``fallthru;'' a special Cyclone statement that immediately transfers control to the next case. fallthru does not have to appear at the end of a case body, so it acts more like a goto than a fallthrough. As in our example, any case that matches all values of the type switched upon (e.g., default:, case _:, case x:) must appear last, otherwise later cases would be unreachable. (Note that other types may have even more such patterns. For example Pair(x,y) matches all values of type struct Pair int x; int y;).

Much More Powerful
Because Cyclone case labels are patterns, a switch statement can match against any expression and bind parts of the expression to variables. Also, fallthru can (in fact, must) bind values to the next case's pattern variables. This silly example demonstrates all of these features:


  extern int f(int);}

  int g(int x, int y) {

    // return f(x)*f(y), but try to avoid using multiplication

    switch($(f(x),f(y))) {

    case $(0,_): fallthru;

    case $(_,0): return 0;

    case $(1,b): fallthru(b+1-1);

    case $(a,1): return a;

    case $(a,b): return a*b;

    }

  }

The only part of this example using a still-unexplained feature is ``fallthru(b)'', but we explain the full example anyway. The switch expression has type $(int,int), so all of the cases must have patterns that match this type. Legal case forms for this type not used in the example include ``case $(_,id):'', ``case $(id,_):'', ``case id:'', ``case _:'', and ``default:''.

The code does the following:

  • It evaluates the pair $(f(x), f(y)) and stores the result on the stack.
  • If f(x) returned 0, the first case matches, control jumps to the second case, and 0 is returned.
  • Else if f(y) returned 0, the second case matches and 0 is returned.
  • Else if f(x) returned 1, the third case matches, b is assigned the value f(y) returned, control jumps to the fourth case after assigning b+1-1 to a, and a (i.e., b + 1 - 1, i.e., b, i.e., f(y)) is returned.
  • Else if f(y) returned 1, the fourth case matches, a is assigned the value f(x) returned, and a is returned.
  • Else the last case matches, a is assigned the value f(x) returned, b is assigned the value f(y) returned, and a*b is returned.
Note that the switch expression is evaluated only once. Implementation-wise, the result is stored in a compiler-generated local variable and the value of this variable is used for the ensuring pattern matches.

The general form of fallthrus is as follows: If the next case has no bindings (i.e., identifiers in its pattern), then you must write fallthru;. If the next case has n bindings, then you must write fallthru(e1,...,en) where each ei is an expression with the appropriate type for the ith binding in the next case's pattern, reading from left to right. (By appropriate type, we mean the type of the expression that would be bound to the ith binding were the next case to match.) The effect is to evaluate e1 through en, bind them to the identifiers, and then goto the body of the next case. fallthru is not allowed in the last case of a switch, not even if there is an enclosing switch.

We repeat that fallthru may appear anywhere in a case body, but it is usually used at the end, where its name makes the most sense. ML programmers may notice that fallthru with bindings is strictly more expressive than or-patterns, but more verbose.

Case Guards
We have withheld the full form of Cyclone case labels. In addition to case p: where p is a pattern, you may write case p && e: where p is a pattern and e is an expression of type int. (And since e1 && e2 is an expression, you can write case p && e1 && e2: and so on.) Let's call e the case's guard.

The case matches if p matches the expression in the switch and e evaluates to a non-zero value. e is evaluated only if p matches and only after the bindings caused by the match have been properly initialized. Here is a silly example:


extern int f(int);

int g(int a, int b) {

  switch ($(a,b-1)) {

  case $(0,y) && y > 1: return 1;

  case $(3,y) && f(x+y) == 7 : return 2;

  case $(4,72): return 3;

  default: return 3;

  }

}

The function g returns 1 if a is 0 and b is greater than 2. Else if x is 3, it calls the function f (which of course may do arbitrary things) with the sum of a and b. If the result is 7, then 2 is returned. In all other cases (x is not 3 or the call to f does not return 7), 3 is returned.

Case guards make constant patterns unnecessary (we can replace case 3: with case x && x==3:, for example), but constant patterns are better style and easier to use.

Case guards are not interpreted by the compiler when doing exhaustiveness and overlap checks, as explained below.

Exhaustiveness and Useless-Case Checking
As mentioned before, it is a compile-time error for the type of the switch expression to have values that none of the case patterns match or for a pattern not to match any values that earlier patterns do not already match. Rather than explain the precise rules, we currently rely on your intuition. But there are two rules to guide your intuition:

  • In terms of exhaustiveness checking, the compiler acts as if any case guard might evaluate to false.
  • In terms of exhaustiveness checking, numeric constants cannot make patterns exhaustive. Even if you list out all 256 characters, the compiler will act as though there is another possibility you have not checked.
We emphasize that checking does not just involve the ``top-level'' of patterns. For example, the compiler rejects the switch below because the third case is redundant:


  enum Color { Red, Green };

  void f(enum Color c1, enum Color c2) {

    switch ($(c1,c2)) {

    case $(Red,x): return;

    case $(x,Green): return;

    case $(Red,Green): return;

    default: return;

    }

  }

Rules for No Implicit Fall-Through
As mentioned several times now, Cyclone differs from C in that a case body may not implicitly fall-through to the next case. It is a compile-time error if such a fall-through might occur. Because the compiler cannot determine exactly if an implicit fall-through could occur, it uses a precise set of rules, which we only sketch here. The exact same rules are used to ensure that a function (with return type other than void) does not ``fall off the bottom.'' The rules are very similar to the rules for ensuring that Java methods do not ``fall off the bottom.''

The general intuition is that there must be a break, continue, goto, return, or throw along all control-flow paths. The value of expressions is not considered except for numeric constants and logical combinations (using &&, ||, and ? :) of such constants. The statement try s catch ... is checked as though an exception might be thrown at any point while s executes.


Previous Contents Next