
3. Foreign language interfaces are harder than they look

Even with the scope restricted to a foreign-language interface from Haskell to C, the task remains surprisingly tricky. At first glance, it seems one could simply take the C header file describing a C procedure and generate interface code that makes the procedure callable from Haskell.

Alas, there are numerous tiresome details that are simply not expressed by the C procedure prototype in the header file. For example, consider calling a C procedure that opens a file, passing a character string as argument. The C prototype might look like this:

int open( char *filename );

Our goal is to generate code that implements a Haskell procedure with type

open :: String -> IO FileDescriptor

  • First there is the question of data representation. One has to decide either to alter the Haskell language implementation, so that its string representation is identical to that of C, or to translate the string from one representation to another at run time. This translation is conventionally called marshalling.

    Since Haskell is lazy, the second approach is required: a Haskell string may be partly unevaluated, so it cannot share C's flat, NUL-terminated layout. (In general, it is tremendously constraining to try to keep common representations between two languages. For example, precisely how are structures laid out in C?)

  • Next come questions of allocation and lifetime. Where should we put the translated string? In a static piece of storage? (But how large a block should we allocate? Is it safe to re-use the same block on the next call?) Or in Haskell's heap? (But what if the called procedure does something that triggers garbage collection, and the transformed string is moved? Can the called procedure hold on to the string after it returns?) Or in C's malloc'ed heap? (But how will it get deallocated? And malloc is expensive too.)

  • C procedures often accept pointer parameters (such as strings) that can be NULL. How is that to be reflected on the host-language side of the interface? For example, if the documentation for open told us that it would do something sensible when called with a NULL string, we might like the Haskell type for open to be

    open :: Maybe String -> IO FileDescriptor

    so that we can model NULL by Nothing.

  • The desired return type, FileDescriptor, will presumably have a Haskell definition such as this:

    newtype FileDescriptor = FD Int

    The file descriptor returned by open is just an integer, but Haskell programmers often use newtype declarations to create new, distinct types isomorphic to existing ones. Now the type system will prevent, say, an attempt to add one to a FileDescriptor.

    Needless to say, the Haskell result type is not going to be described in the C header file.

  • The file-open procedure might fail; details of the failure are sometimes stored in a global variable such as errno. Somehow both the failure and the details of what went wrong must be reflected into Haskell's IO monad.

  • The open procedure causes a side effect, so it is appropriate for its type to be in Haskell's IO monad. Some C functions really are functions (that is, they have no side effects), and in this case it makes sense to give them a "pure" Haskell type. For example, the C function sin should appear to the Haskell programmer as a function with type

    sin :: Float -> Float

  • C function prototypes are not explicit about the mode of their parameters. Which parameters are in parameters, which are out, and which are in out? That is, in which direction do data pass via each parameter?
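The last two points can be illustrated with a minimal sketch, assuming GHC's standard foreign function interface (a mechanism this section does not itself describe). The names sinC, frexpC, c_sin and c_frexp are hypothetical. The C library's sin works on double, so the sketch uses Double rather than Float, and frexp is chosen only because its second argument is a typical out parameter.

```haskell
import Foreign.C.Types (CDouble (..), CInt (..))
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek)
import System.IO.Unsafe (unsafePerformIO)

-- No IO in the type: this is our promise, not the header's, that the
-- C function sin has no side effects.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- Pure wrapper with ordinary Haskell types (hypothetical name).
sinC :: Double -> Double
sinC = realToFrac . c_sin . realToFrac

-- double frexp(double x, int *exp): x is an in parameter, exp is an
-- out parameter. The prototype alone does not say so; the manual does.
foreign import ccall unsafe "math.h frexp"
  c_frexp :: CDouble -> Ptr CInt -> IO CDouble

-- The out parameter becomes part of the result. unsafePerformIO is
-- justified only because frexp touches nothing but the temporary cell.
frexpC :: Double -> (Double, Int)
frexpC x = unsafePerformIO $
  alloca $ \expPtr -> do
    m <- c_frexp (realToFrac x) expPtr   -- C writes the exponent here
    e <- peek expPtr                     -- read the out parameter back
    pure (realToFrac m, fromIntegral e)
```

In both cases the purity and the parameter modes are decisions made by the person writing the binding; nothing in math.h records them.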

None of these details are mentioned in the C header file. Instead, many of them are in the manual page for the procedure, while others (such as parameter lifetimes) may not even be written down at all.
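To make the gap concrete, here is a rough sketch of what a hand-written binding for open might look like, assuming GHC with the CApiFFI extension on a POSIX system. None of this is taken from the section above: the real POSIX open also takes a flags argument, so the sketch passes O_RDONLY, and the names c_open, oRdOnly, openWith, openMaybe and tryOpen are hypothetical. Each commented decision corresponds to one of the details the header leaves out.

```haskell
{-# LANGUAGE CApiFFI #-}

import Foreign.C.Error (throwErrnoIfMinus1)
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt (..))
import Foreign.Marshal.Utils (maybeWith)
import System.IO.Error (tryIOError)

-- Result type chosen on the Haskell side; the header only says "int".
newtype FileDescriptor = FD Int
  deriving (Eq, Show)

-- Raw binding. POSIX open is variadic, so it is imported through a
-- C wrapper ("capi") rather than called directly.
foreign import capi "fcntl.h open"
  c_open :: CString -> CInt -> IO CInt

-- The O_RDONLY constant, read from the header instead of hard-coded.
foreign import capi "fcntl.h value O_RDONLY"
  oRdOnly :: CInt

-- Failure: a result of -1 is turned into an IOError built from errno.
openWith :: CString -> IO FileDescriptor
openWith cpath = do
  fd <- throwErrnoIfMinus1 "open" (c_open cpath oRdOnly)
  pure (FD (fromIntegral fd))

-- Marshalling and lifetime: withCString copies the String into
-- temporary C storage that is released when the call returns, so this
-- is safe only because open does not retain the pointer.
open :: String -> IO FileDescriptor
open path = withCString path openWith

-- NULL modelled by Nothing. POSIX open does not accept NULL, so this
-- variant only illustrates the mapping; Nothing leads to an error.
openMaybe :: Maybe String -> IO FileDescriptor
openMaybe mpath = maybeWith withCString mpath openWith

-- The same failure surfaced as a value instead of an exception.
tryOpen :: String -> IO (Either IOError FileDescriptor)
tryOpen = tryIOError . open
```

Loaded into GHCi, open "/dev/null" should yield a descriptor, while tryOpen on a missing path should yield a Left carrying the errno-derived error. Every choice here (temporary storage, exception on -1, newtype result) came from reading the manual page, not the prototype.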