Thursday, May 23, 2013

Even more reasons not to use Scala

In response to my post on Quora, where I recommended against using Scala for any serious project, a reader posted this:

Let's say you have three functions (f, g, and h) that each receive an integer and perform an asynchronous computation, returning a Future of Int, and you need to chain the three functions together (use the result of one computation as the input to the next one). In Scala you'd do:

 f(x).flatMap(g).flatMap(h)


 or with the "for" notation:  


 for {
   i   <- f(x)
   ii  <- g(i)
   iii <- h(ii)
 } yield iii
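
For reference, the "for" notation is just syntactic sugar: the compiler desugars it into nested flatMap/map calls, which is why the two forms behave identically.

  // What the compiler generates (modulo fresh names) for the for-comprehension above:
  f(x).flatMap(i => g(i).flatMap(ii => h(ii).map(iii => iii)))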

What appears to be a clever (or "cute") use of monadic composition actually seems to be completely misleading. A closer look at the "flatMap" implementation in Future shows the following:


 def flatMap[S](f: T => Future[S])(implicit executor: ExecutionContext): Future[S] = {
   val p = Promise[S]()
   onComplete {
     case f: Failure[_] => p complete f.asInstanceOf[Failure[S]]
     case Success(v) =>
       try {
         f(v).onComplete({
           case f: Failure[_] => p complete f.asInstanceOf[Failure[S]]
           case Success(v) => p success v
         })(internalExecutor)
       } catch {
         case NonFatal(t) => p failure t
       }
   }(executor)
   p.future
 }

In other words, your set of futures is no longer running independently and asynchronously (as you would expect of Futures); they are composed sequentially, each one starting only after the previous one completes. If sequential execution was your original goal, then you should have composed the functions themselves (in this case f, g, and h) and run the composition in a single Future. On the other hand, when you write:
f(x).flatMap(g).flatMap(h)
you may well be thinking that the functions are running in parallel, working on each other's output. But it seems to me (reading the code, not having actually run a test, as I don't have a Scala dev environment handy) that "g" would not run at all until "f" is finished. Again, not what you would expect when you are using Futures.
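
Here is a minimal sketch one could run to check this; the bodies of f, g, and h below are hypothetical stand-ins, each sleeping to simulate work:

  import scala.concurrent.{Await, Future}
  import scala.concurrent.duration._
  import scala.concurrent.ExecutionContext.Implicits.global

  object FlatMapSequencing extends App {
    def log(msg: String): Unit =
      println(s"[${System.currentTimeMillis()}] $msg")

    // Hypothetical stand-ins for f, g, and h.
    def f(x: Int): Future[Int] = Future { log("f started"); Thread.sleep(500); x + 1 }
    def g(x: Int): Future[Int] = Future { log("g started"); Thread.sleep(500); x * 2 }
    def h(x: Int): Future[Int] = Future { log("h started"); Thread.sleep(500); x - 3 }

    // The chained version: g cannot start before f's result exists.
    val chained = f(1).flatMap(g).flatMap(h)
    log("chained result: " + Await.result(chained, 5.seconds))

    // The alternative suggested above, when sequencing is the goal anyway:
    // compose plain functions and run the whole pipeline in one Future.
    def f0(x: Int): Int = x + 1
    def g0(x: Int): Int = x * 2
    def h0(x: Int): Int = x - 3
    val composed: Future[Int] = Future { h0(g0(f0(1))) }
    log("composed result: " + Await.result(composed, 5.seconds))
  }

If the reading above is right, the log lines for "g started" and "h started" should appear roughly half a second apart rather than together.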


Bayesian Learning Lectures

A set of very informative, not-to-be-missed lectures on Bayesian learning by Dr. Draper.

Bayesian Modeling, Inference, Prediction and Decision-Making


Friday, May 10, 2013

What is "high level" or "low level" or "functional"?


...what does it mean for one library to be more "high level" or "low level" or "functional" than another? On what basis should we make such comparisons? A pithy answer is given by Perlis:

A programming language is low level when its programs require attention to the irrelevant.
- Alan Perlis

But how should we decide what is relevant?

Within the functional programming community, there is a strong historical
connection between functional programming and formal modeling.
Many authors have expressed the view that functional programming languages
are "high level" because they allow programs to be written in terms of an
abstract conceptual model of the problem domain, without undue concern for
implementation details.

Of course, although functional languages can be used in this "high level"
way, this is neither a requirement nor a guarantee. It is very easy to write
programs and libraries in pure Haskell that are littered with implementation
details and bear little or no resemblance to any abstract conceptual model of
the problem they were written to solve.

Source: Genuinely Functional User Interfaces
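
To make Perlis's point concrete, here is a small illustration in Scala (my own example, not from the paper): the same computation written with, and without, attention to the irrelevant.

  // "Low level": the loop index, the mutable accumulator, and the
  // termination condition are all irrelevant to the problem being solved.
  def sumOfSquaresLow(xs: Array[Int]): Int = {
    var acc = 0
    var i = 0
    while (i < xs.length) {
      acc += xs(i) * xs(i)
      i += 1
    }
    acc
  }

  // "High level": only the conceptual model remains:
  // square each element, then sum.
  def sumOfSquaresHigh(xs: Seq[Int]): Int =
    xs.map(x => x * x).sum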

Tuesday, May 7, 2013

Meaning of NoSQL and Big Data for Software Engineers


The term NoSQL, read as "No SQL", doesn't convey the real meaning of the concept behind it. After all, SQL stands for Structured Query Language. It is a language for querying relations, grounded in relational algebra and relational calculus. Being a query language, it has no bearing on the kind of processing that NoSQL implies. In fact, you can use SQL to query your "NoSQL/Big Data", as in Hive.

The term NoSQL, read as "Not Just SQL", is closer to the implied meaning, but it still doesn't really convey what NoSQL is all about. It is not about what language you use to query your data.

"Big Data" is also not really meaningful. Sure, the size of the data might be large, but you can have a NoSQL problem with small data (at least by today's relative standards).

IMHO, what NoSQL intends to say is that your data is not ACID. ACID, as in Atomic, Consistent, Isolated, and Durable, has been the cornerstone of transactional databases. In "NoSQL" you are dealing with persisted data without strict guarantees on its Atomicity, Consistency, Isolation, and/or Durability. In other words, you have noisy data with duplication, inconsistency, and loss. The goal of NoSQL is to develop software that can work with such data. Even if your data is small but noisy, the standard algorithms that work on ACID data will not yield useful results. To them, non-ACID data is garbage, and you end up with the garbage-in-garbage-out dilemma.

A better way to think of NoSQL is as the problem of inferring the underlying model in the data, predicting future data, and/or making decisions, all from noisy data (of any size). That is the problem that has been addressed in the statistics community as Bayesian analysis. The challenge for software developers tackling NoSQL/Big Data problems is to understand and incorporate statistical analysis in their applications. A good place to start is Professor David Draper's excellent encyclopedic write-up, Bayesian Statistics, or his priceless in-depth lectures on the topic, Bayesian Modeling, Inference, Prediction and Decision-Making.
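
As a toy illustration of that shift in mindset (my own sketch in Scala, not from Draper's material): instead of trusting a noisy event log to give you an exact rate, treat the observed counts as evidence and compute a Bayesian posterior over the underlying rate.

  // Beta-Binomial conjugate update: a minimal Bayesian estimate of a rate
  // from noisy count data, where duplication and loss make the raw ratio suspect.
  object NoisyRate extends App {
    // Observed (noisy) data: successes and trials extracted from the log.
    val successes = 45
    val trials    = 100

    // Prior Beta(a, b); a = b = 1 is the uniform (no-information) prior.
    val (priorA, priorB) = (1.0, 1.0)

    // The posterior is Beta(a + successes, b + failures).
    val postA = priorA + successes
    val postB = priorB + (trials - successes)

    // Posterior mean: a principled point estimate that also admits
    // credible intervals, unlike the raw ratio successes / trials.
    val posteriorMean = postA / (postA + postB)
    println(f"posterior mean rate: $posteriorMean%.3f")
  }

The point is not this particular model but the posture it represents: inference under uncertainty, rather than exact queries over data assumed to be clean.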