Friday, February 6, 2009

pure, side effect, types, and Computation....

A friend (a purist one!) was arguing that a purely functional programming paradigm is not useful in real life. His point was that a pure functional language means no side effect. Without side effect there is not much a software can do. Simon Peyton Jones, one of the gurus of the Haskell, had an interesting comment in that a computer program that has no side effect is essentially a heater! As it generates heat and no real useful work.

So what is the disconnect? It has to do with a type systems and the scope of your types (essentially what are you typing in your system?)

Lets start from Assembly programming. In an assembly language program any instruction can read/write from/to any address in its space. To understand the code (as in debugging it) you need to go line by line and make sure you understand the side effects in each instruction.

High-level languages improved on this. For instance in a C program (or Java), you know the types of the input and output to/from your function (or method). The key point to realize is that the type of input/output doesn't say anything as to what the method will (or will not) cause when it is invoked. For instance, if a method takes two integer and returns another integer (as in add), you don't know if the invocation of the method would create any side effects on the file system. The best you can do is to look at the program's (or Class) include (or imports). For instance a class that doesn't import the file system classes (directly or indirectly) then its methods would not have any side effects or dependency in the file system.

Then comes Dependency Injection to solve the problem in Object oriented model. Dependency Injection allowed you to abstract out your object's external dependencies into an interface that would get injected at run time into the class.. It is an improvement on the base Object Oriented model. But on a given method invocation, you still didn't know what would happen. At least there is nothing in the compiler that you can count on to ensure the dependencies. So for example, a call to a method1 on a class can influence a subsequent call to method2 of the class. You would not have any idea about such backdoor connections. With good programing practices you can get by. But when you are thinking about testing, reuse or sharing of code this kind of issues always creep in.

Clearly some languages do away with types all together. But here we are talking about cases where (for whatever reason) you want to have a type system. What is interesting is that in languages that do have type system, the types of the input and output is not enough to be able to make sense of the code (as in maintain it, share it, or write unit tests for it, etc).


If instead of typing the input/out you actually type the computation then you can get lot out of your typing system. Take for instance the saying: life is not a destination, it is journey. In C/Java/etc life is a destination. All you care about is the type of the returned data. In typing the computation it is all about the journey. But how is that different?

Let say your kids are going on a summer trip with their school. What is important to you? Do you just care that they reach their destination or are you also concern about the type of activities that they would experience along the way? For instance, you want to know if they serve alcohol in the trip or not. If kids are going on the trip, then may be you want to find alternatives. If adults are going for the spring break, well that may be the thing you are looking for.

The key is that you don't want to forbid or encourage alcohol consumption (side effects) just that you want to make sure the type of the journey (the computation) matches your requirements as in destination AND journey. You want to know all the effects and not just the return value. After all, if you want a type system, why should you just stop at the return values?

In Haskell, the type of the function is not just the return value, it is the type of computation (journey). That is the key concept. Of course the type of the return value would also be included in the type of computation. Furthermore, there are cleaver ways to take a computation and define as abstract type (similar to template programming C++ or Generics in Java) where the actual type of the return value depends on the invocation of the computation (based on other parameters). If there are any side effects it would also show up on your type system. You have full disclosure, or specification. Modeling IO is an interesting example of using type system to fully define computations with side effect. The IO Inside article might be a good start to get more details.

To sum it up, in a pure functional language you don't want to avoid side effects. You just want to know the computational models that do generate side effect and use them accordingly. The beauty of it is that the compiler ensures that a computation would not violate its computational contract.

Knowing the computation's type (as oppose to just its return value) is quite beneficial. It like the dependency injected but at function level. Hence, the type signature can be used to perform automatic testing of the functions. Furthermore, the caller can control the various aspect of the computation. For instance, lets say the computation signature indicates that the computation generates log messages. The caller can call the function with its own logging implementation in the computation, which sounds awfully like Aspects programming.

No comments: