Thursday, July 5, 2018

Hamming on Learning, Information Theory....

An interesting set of lectures by Richard Hamming, of Hamming code fame.

Sunday, May 13, 2018

Bottleneck, Deep learning and ML theory.

The post Anatomize Deep Learning with Information Theory is a nice summary of Professor Tishby's talk on the bottleneck theory and deep learning. The blog also has a nice reference to a two-part write-up on traditional learning theory, which doesn't actually explain the success of deep learning: Tutorial on ML theory, part 1 and part 2.

Monday, March 19, 2018

Polymorphism and Design Pattern Haskell Style

The following short demonstration does a nice job capturing parametric polymorphism and design patterns in Haskell.

Domain Modelling with Haskell: Generalizing with Foldable and Traversable

It is easy to see polymorphism in a list.  You can have a list of people, and you map a function over them to get their names.  Now you have a list of strings, or Maybe Strings for the names of each person on the list.  So the list went from List of People to List of Maybe Strings.
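
As a minimal sketch of the list case (the Person type and its field names are made up for illustration, they are not from the video):

```haskell
-- Hypothetical Person type, for illustration only.
data Person = Person
  { personName :: Maybe String  -- a person may have no recorded name
  , personAge  :: Int
  }

people :: [Person]
people = [Person (Just "Ada") 36, Person Nothing 41]

-- map changes the element type but keeps the list structure:
-- [Person] becomes [Maybe String], same length, same order.
names :: [Maybe String]
names = map personName people
```

The list itself (its length and order) is untouched; only the type of what sits inside it changes.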

This is a bit more challenging to see in your own data types. In the video above, the data type for Project is modified to use a polymorphic parameter "a" instead of a fixed project id. This allows the project to maintain its structure (the constructors) as functions are applied to the subelements of the structure. This is just like what happens in the list, but it is a bit harder to see if you are accustomed to generics in the OO sense.

The notion of a design pattern is also interesting in this lecture. There are two main issues with the traditional GOF design patterns. First, if there is a pattern, why isn't it implemented in code once and reused everywhere? Second, the patterns are not in the design but in the forms of computation. In this video, you can see the computational patterns Functor, Foldable and Traversable used in the computation. More importantly, because they are patterns, their instances can be derived automatically.
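
Here is a rough sketch of the idea, in the spirit of the video rather than a copy of its code. The constructor names, ProjectId, and lookupBudget are my own assumptions for illustration:

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}

-- A project tree whose leaf payload is the type parameter `a`
-- instead of a fixed project id. The three instances are derived.
data Project a
  = Project String a
  | ProjectGroup String [Project a]
  deriving (Show, Eq, Functor, Foldable, Traversable)

newtype ProjectId = ProjectId Int deriving (Show, Eq)

example :: Project ProjectId
example = ProjectGroup "Sweden"
  [ Project "Stockholm" (ProjectId 1)
  , Project "Gothenburg" (ProjectId 2)
  ]

-- Hypothetical lookup: the budget for a project id, if it is known.
lookupBudget :: ProjectId -> Maybe Int
lookupBudget (ProjectId 1) = Just 100
lookupBudget (ProjectId 2) = Just 250
lookupBudget _             = Nothing

-- Functor: change every leaf, keep the constructors as they are.
labels :: Project String
labels = fmap (\(ProjectId n) -> "id-" ++ show n) example

-- Traversable: run a lookup that may fail at every leaf;
-- the result is Nothing if any single lookup fails.
budgets :: Maybe (Project Int)
budgets = traverse lookupBudget example

-- Foldable: collapse the leaves into one value.
totalBudget :: Maybe Int
totalBudget = fmap sum budgets
```

None of fmap, traverse, or sum had to be written for Project by hand; the deriving clause supplies them, which is the "implemented once and reused" point.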

Wednesday, February 28, 2018

Regression

I like this series of short lectures on Regression.

Statistical Regression

Thursday, January 25, 2018

Why you can't just skip Deep Learning.

An interesting post:  Dear Aspiring Data Scientists, Just Skip Deep Learning (For Now).

Hardly a day goes by that I don't hear the same sentiment expressed in one way or another.   The problem is that they are correct and wrong at the same time.

The problem is terminology.   Let's use the terminology defined in What's the difference between data science, machine learning, and artificial intelligence?

So far as you are looking at Data Science and Machine Learning (and your focus is to be hired), that is, at insight and prediction, the article is valid. If your goal is a prediction, then why not use the simplest method? It is easier to train and to generalize. Furthermore, they are correct that training for image recognition, voice processing, computer vision... needs a massive amount of data and processing power. This is not where most DS/ML jobs are.

What they are missing, again going back to the definition above, is prediction vs. action. So long as the goal of DS/ML is to gather insight and prediction for humans, the arguments are valid. It is when you want to take an action that it all falls apart. Humans can potentially apply their domain knowledge and navigate the predictions to find the optimum action.

But for machines, it is very different. You basically have three choices to determine the best action: apply human-derived rules to the predictions (1980s AI), reduce the problem to an optimization program (linear or convex), or use reinforcement learning to derive a policy that deals with the uncertainty of your prediction. This is, I think, the essence of Software 2.0, or what I like to call Training Driven Development (TrDD). More on this later.

If rules and/or optimization work, then great, you are done. But when that is not an option, then in the model prescribed by the article you need to combine a policy neural network with your ML prediction. The problem now is that you have two islands to deal with: the gradient of your objective's loss function in the neural network can't propagate to your ML prediction model. Simplifications that worked so well for ML prediction for humans are now being amplified as errors in your policy network. I don't know how you can have a loss function that can train the policy and communicate with, say, a linear regression model's loss function.
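
One way to write the "two islands" problem down, as a sketch only. The notation is mine, not the article's: f_phi is the prediction model (say, a linear regression) fit to its own loss L_pred, pi_theta is the policy network trained on an action loss L_act, and I assume everything is differentiable.

```latex
\hat{y} = f_\phi(x), \qquad a = \pi_\theta(\hat{y}), \qquad
\frac{\partial L_{\mathrm{act}}}{\partial \phi}
  = \frac{\partial L_{\mathrm{act}}}{\partial a}\,
    \frac{\partial a}{\partial \hat{y}}\,
    \frac{\partial \hat{y}}{\partial \phi}
```

When the predictor is trained separately, phi is fit only to L_pred and this chain-rule term is never used, so prediction errors that matter for the action are not corrected by the policy's training signal.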

After reading Optimizing things in the USSR, I ordered the book Red Plenty. It has some interesting observations about what happens when you "simplify assumptions".

Wednesday, January 24, 2018

Information Theory of Deep Learning. Bottleneck theory



The Information Bottleneck Method by Tishby, Pereira, and Bialek is an interesting way to look at what is happening in a deep neural network. You can see the concept explained in the following papers:


  1. Deep Learning and the Information Bottleneck Principle, and
  2. On the Information Bottleneck Theory of Deep Learning
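
For reference, the objective at the heart of the method, as I understand it from the original paper: T is a compressed representation of the input X, Y is the variable we care about predicting, and beta sets the trade-off between compression and prediction.

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta\, I(T;Y),
\qquad \text{with the Markov chain } Y \leftrightarrow X \leftrightarrow T
```

A small I(X;T) means T forgets most of X; a large I(T;Y) means it keeps what is relevant to Y.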


Professor Tishby also gives a nice lecture on the topic in

Information Theory of Deep Learning. Naftali Tishby

With a follow-on talk with more detail:

Information Theory of Deep Learning - Naftali Tishby

Deep learning aside, there are other interesting applications of the bottleneck method.  It can be used to categorize music chords:

Information bottleneck web applet tutorial for categorizing music chords

and in this talk,  the method is used to quantify prediction in the brain

Stephanie Palmer: "Information bottleneck approaches to quantifying prediction in the brain"


I also found the following talk interesting. It presents a simplified version of the concept applied to deterministic mappings:

The Deterministic Information Bottleneck




Sunday, January 21, 2018

Visualization

Nice visualizations of Hilbert curves and linear algebra.


I like this comment by "matoma" in Abstract Vector Space:

Furthermore, why does the exponential function appear everywhere in math? One reason is that it (and all scalar multiples) is an eigenfunction of the differential operator. Same deal for sines and cosines with the second-derivative operator (eigenvalue=-1).

Noting the definition of Eigenfunction
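
Written out, the comment's two examples look like this (f is an eigenfunction of a linear operator D with eigenvalue lambda when D f = lambda f; c and lambda are arbitrary constants):

```latex
D f = \lambda f
\qquad
\frac{d}{dx}\bigl(c\,e^{\lambda x}\bigr) = \lambda\,\bigl(c\,e^{\lambda x}\bigr)
\qquad
\frac{d^2}{dx^2}\sin x = -\sin x,
\quad
\frac{d^2}{dx^2}\cos x = -\cos x
```

So every scalar multiple of e^{lambda x} is an eigenfunction of d/dx with eigenvalue lambda, and sin x and cos x are eigenfunctions of the second derivative with eigenvalue -1.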