Thursday, August 30, 2018

Dealing with Unknown

The second paragraph of Shannon's A Mathematical Theory of Communication paper says:
The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.
Bachelier's The Theory of Speculation, the paper that started the theory of finance, opens with:
The determination of these fluctuations is subject to an infinite number of factors: it is therefore impossible to expect a mathematically exact forecast. Contradictory opinions in regard to these fluctuations are so divided that at the same instant buyers believe the market is rising and sellers that it is falling. Undoubtedly, the Theory of Probability will never be applicable to the movements of quoted prices and the dynamics of the Stock Exchange will never be an exact science. However, it is possible to study mathematically the static state of the market at a given instant, that is to say, to establish the probability law for the price fluctuations that the market admits at this instant. Indeed, while the market does not foresee fluctuations, it considers which of them are more or less probable, and this probability can be evaluated mathematically.

Thursday, August 23, 2018

Fractional Derivatives and Robot Swarms

An interesting idea... Note to self: read the thesis.

This is related to the observation that an intelligent system's interactions are more complicated than simple processes that model the interaction as the random, uncorrelated bouncing of particles (Brownian motion).
Which in turn may be related to

And more on fractional thinking: Fractional Calculus, Delay Dynamics and Networked Control Systems
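For my own reference, here is a minimal numerical sketch of a fractional derivative via the Grunwald-Letnikov approximation (a standard definition; the code is my own illustration and not taken from the thesis or papers above):

-- Generalized binomial coefficient C(alpha, k) for real alpha; C(alpha, 0) = 1.
genBinom :: Double -> Int -> Double
genBinom a k = product [(a - fromIntegral j) / fromIntegral (j + 1) | j <- [0 .. k - 1]]

-- Grunwald-Letnikov: D^alpha f(t) ~ h^(-alpha) * sum_{k=0..n} (-1)^k C(alpha,k) f(t - k*h),
-- truncated to n history points with step h.
glDerivative :: Double -> Double -> Int -> (Double -> Double) -> Double -> Double
glDerivative alpha h n f t =
  sum [(-1) ^ k * genBinom alpha k * f (t - fromIntegral k * h) | k <- [0 .. n]] / h ** alpha

-- Example: the half-derivative of f(x) = x at x = 2 is 2 * sqrt(2/pi) ~ 1.596;
-- glDerivative 0.5 0.001 2000 id 2 gets close to that value.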

Wednesday, August 22, 2018

Functional Programming and Denotational Semantics

In The Next 700 Programming Languages paper, Peter Landin defines the key feature of functional programming. He says:
The commonplace expressions of arithmetic and algebra have a certain simplicity that most communications to computers lack. In particular, (a) each expression has a nesting subexpression structure, (b) each subexpression denotes something (usually a number, truth value or numerical function), (c) the thing an expression denotes, i.e., its "value", depends only on the values of its subexpressions, not on other properties of them. It is these properties, and crucially (c), that explains why such expressions are easier to construct and understand. Thus it is (c) that lies behind the evolutionary trend towards "bigger righthand sides" in place of strings of small, explicitly sequenced assignments and jumps. When faced with a new notation that borrows the functional appearance of everyday algebra, it is (c) that gives us a test for whether the notation is genuinely functional or merely masquerading.

It all starts with a functional language. The following lecture shows the denotational semantics of an expression, mapping the lambda calculus into Set.
Once the meaning of an expression is known, we need the meaning of the values that the expression reduces to. This is where, as a designer, you define the meaning of the values in your specific domain.
In the following lecture, Conal Elliott shows how you extend the denotational semantics to values. He contrasts this approach with the standard imperative API model. In an imperative API, the interface is about creating state, initializing it, and defining functions that side-effect the state.
Functional programming, on the other hand, is about APIs that define the things that "ARE" in the domain. There is a clear distinction between values and evaluation. So you have base values, transformations, and composition operators. In all, you need to make sure there is a clear and precise meaning for the values, and that it can lead to an efficient implementation. The distinction is sometimes hard to get. For me, an analogy is SQL: an SQL expression defines the properties of the result you are interested in; the database engine then turns that into an execution plan and evaluates it to give you the results.
The key takeaway from the talk is that it is not about functional vs. imperative, or whether a particular language does or doesn't have lambda expressions. It is about denotational vs. operational semantics. In other words, do you think about your program in terms of its execution model, or in terms of the meaning of its expressions? A tiny sketch of the latter view follows.
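As a minimal illustration of "meaning of the expressions" (my own sketch, not from the lecture): syntax is a data type, and the semantic function maps each expression to its denotation in a semantic domain, here simply Bool. Landin's property (c) above is exactly compositionality: the meaning of an expression depends only on the meanings of its subexpressions.

-- Syntax: a tiny expression language.
data Expr = Lit Bool | Not Expr | And Expr Expr | Or Expr Expr

-- Semantics: the meaning of each expression, built only from the meanings of its parts.
denote :: Expr -> Bool
denote (Lit b)   = b
denote (Not e)   = not (denote e)
denote (And a b) = denote a && denote b
denote (Or  a b) = denote a || denote b

-- For example, denote (And (Lit True) (Not (Lit False))) evaluates to True.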
An interesting concept that shows up in Conal's work is that the semantic function is a natural transformation. Bartosz Milewski does a nice job of explaining what a natural transformation is.

Monday, August 13, 2018

The Opening the Black Box of Deep Neural Networks via Information paper uses the diffusion process and the heat equation as intuition for the gradient flow between the layers of a deep network. I needed a refresher on the underlying math and found some very good sources:

In "What is Laplacian" and subsequent video you get a great insightful introduction of Laplacian, heat and wave equations.

In the applied math section, which continues into the "Fourier and Laplace Transformation" section of the differential equations lecture series, there is also a shorter explanation of diffusion.
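As a quick reminder (standard definitions, not specific to the paper or the lectures), the Laplacian and the heat/diffusion equation are

\[
\Delta u = \nabla^2 u = \sum_{i=1}^{n} \frac{\partial^2 u}{\partial x_i^2},
\qquad
\frac{\partial u}{\partial t} = D\,\Delta u ,
\]

so the value at a point rises when it is below the average of its neighborhood and falls when it is above it.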

There is also a stochastic nature to the whole thing that is nicely covered in the Stochastic Processes, Itô Calculus, and Stochastic Differential Equations sections of the Topics in Mathematics with Applications in Finance lectures, with Notes and Videos.
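The two views connect in the standard way (a textbook fact, not something specific to these lectures): the probability density of a driftless Brownian motion solves the heat equation,

\[
dX_t = \sigma\, dW_t
\quad\Longrightarrow\quad
\frac{\partial p(x,t)}{\partial t} = \frac{\sigma^2}{2}\,\frac{\partial^2 p(x,t)}{\partial x^2}.
\]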

Wednesday, July 18, 2018

From General Solution to Specific

One of the most important skills in mathematical problem solving is the ability to generalize. A given problem, such as the integral we just computed, may appear to be intractable on its own. However, by stepping back and considering the problem not in isolation but as an individual member of an entire class of related problems, we can discern facts that were previously hidden from us. Trying to compute the integral for the particular value a = 1 was too difficult, so instead we calculated the value of the integral for every possible value of a. Paradoxically, there are many situations where it is actually easier to solve a general problem than it is to solve a specific one.
Richard Feynman’s integral trick
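A textbook instance of the trick (not the particular integral the article works through): treat the parameter a as a variable and differentiate under the integral sign,

\[
I(a) = \int_0^{\infty} e^{-a x}\,dx = \frac{1}{a}
\quad\Longrightarrow\quad
I'(a) = -\int_0^{\infty} x\, e^{-a x}\,dx = -\frac{1}{a^2},
\]

so \(\int_0^{\infty} x\, e^{-x}\,dx = 1\) follows by setting a = 1 in the general answer.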

Communication and Learning

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.
A Mathematical Theory of Communication

Thursday, July 5, 2018

Hamming on Learning, Information Theory....

An interesting set of lectures by Hamming, of Hamming code fame.

Sunday, May 13, 2018

Bottleneck, Deep learning and ML theory.

The Anatomize Deep Learning with Information Theory post is a nice summary of the bottleneck theory and deep learning talk by Professor Tishby. The blog also has a nice reference to a two-part write-up on traditional learning theory, which doesn't actually explain the success of deep learning: Tutorial on ML theory, part 1 and part 2.

Update: A really interesting article, To Remember, the Brain Must Actively Forget. This is very similar to the idea behind the bottleneck theory and deep learning models.

Monday, March 19, 2018

Polymorphism and Design Pattern Haskell Style

The following short demonstration does a nice job capturing parametric polymorphism and design patterns in Haskell.

Domain Modelling with Haskell: Generalizing with Foldable and Traversable

It is easy to see polymorphism in a list. You can have a list of people, and you map a function over them to get their names. Now you have a list of Strings, or of Maybe Strings, for the names of the people on the list. So the list went from List of Person to List of Maybe String.

This is a bit more challenging to see in your own data types. In the video above, the data type for Project is modified to use a polymorphic parameter "a" instead of a fixed project id. This allows the project to maintain its structure (the constructors) as functions are applied to the subelements of the structure. This is just like what happens in the list, but it is a bit harder to see if you are accustomed to generics in the OO sense.

The notion of design pattern is also interesting in this lecture. There are two main issues with the traditional GoF design patterns. First, if there is a pattern, why isn't it implemented in code once and reused everywhere? Second, the patterns are not in the design but in the forms of computation. In this video, you can see the computational patterns Functor, Foldable, and Traversable being used in the computation. More importantly, because they are patterns, their implementations can be derived automatically, as in the sketch below.
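Here is a rough Haskell sketch of that idea (the constructor names are mine, not the ones from the video):

{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}

-- A Project tree parameterized over its payload "a" (for instance a project id).
-- Deriving the instances is exactly the "implement the pattern once" point above.
data Project a
  = Single String a
  | Group  String [Project a]
  deriving (Show, Functor, Foldable, Traversable)

-- fmap keeps the constructors (the structure) and only touches the payloads:
--   fmap show              :: Project Int -> Project String
--   sum                    :: Project Int -> Int                  (via Foldable)
--   traverse lookupBudget  :: Project Id  -> IO (Project Budget)  (hypothetical lookupBudget :: Id -> IO Budget)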

Wednesday, February 28, 2018

Regression

I like this series of short lectures on Regression.

Statistical Regression
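For my own reference, the basic setup these lectures build on is ordinary least squares (standard material, not a summary of the videos): choose the coefficients that minimize the squared residuals,

\[
\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert^2 = (X^{\top}X)^{-1}X^{\top}y ,
\]

assuming \(X^{\top}X\) is invertible.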

Thursday, January 25, 2018

Why you can't just skip Deep Learning.

An interesting post:  Dear Aspiring Data Scientists, Just Skip Deep Learning (For Now).

Hardly a day goes by that I don't hear the same sentiment expressed in one way or another.   The problem is that they are correct and wrong at the same time.

The problem is terminology.   Let's use the terminology defined in What's the difference between data science, machine learning, and artificial intelligence?

So long as you are looking at data science and machine learning as insight and prediction (and your focus is to be hired), the article is valid. If your goal is prediction, then why not use the simplest method? It is easier to train and to generalize. Furthermore, they are correct that for training image recognition, voice processing, computer vision, and so on, you need a massive amount of data and processing power. This is not where most DS/ML jobs are.

The problem they are missing, again going back to the definitions above, is prediction vs. action. So long as the goal of DS/ML is to gather insight and predictions for humans, the arguments are valid. It is when you want to take an action that it all falls apart. Humans can potentially apply their domain knowledge and navigate the predictions to find the optimal action.

But for machines, it is very different. You basically have three choices to determine the best action: apply human-derived rules to the predictions (1980s AI), reduce the problem to an optimization program (linear or convex), or use reinforcement learning to derive a policy that deals with the uncertainty of your predictions. This is, I think, the essence of Software 2.0, or what I like to call Training Driven Development (TrDD) -- more on this later.

If rules and/or optimization work, then great, you are done. But when that is not an option, then in the model prescribed by the article, you need to combine a policy neural network with your ML prediction. The problem now is that you have two islands to deal with: the gradient of your objective (loss) function in the neural network can't propagate into your ML prediction model. Simplifications that worked so well for ML predictions aimed at humans are now amplified as errors in your policy network. I don't know how you can have a loss function that trains the policy and also communicates with, say, a linear regression model's loss function.

After reading Optimizing things in the USSR, I ordered my Red Plenty book. It has some interesting observations about what happens when you "simplify assumptions".

Wednesday, January 24, 2018

Information Theory of Deep Learning. Bottleneck theory



The Information Bottleneck Method by Tishby, Pereira, and Bialek is an interesting way to look at what is happening in a deep neural network. You can see the concept explained in the following papers (the objective itself is sketched right after the list):


  1. Deep Learning and the Information Bottleneck Principle, and
  2. On the Information Bottleneck Theory of Deep Learning
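The method looks for a compressed representation T of the input X that stays informative about the target Y, trading the two off with a Lagrange multiplier (this is the objective from the original paper, written in its standard form):

\[
\min_{p(t \mid x)} \; I(X;T) - \beta\, I(T;Y).
\]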


Professor Tishby also does a nice lecture on the topic in

Information Theory of Deep Learning. Naftali Tishby

With a follow-on talk with more detail:

Information Theory of Deep Learning - Naftali Tishby

Deep learning aside, there are other interesting applications of the bottleneck method.  It can be used to categorize music chords:

Information bottleneck web applet tutorial for categorizing music chords

and in this talk,  the method is used to quantify prediction in the brain

Stephanie Palmer: "Information bottleneck approaches to quantifying prediction in the brain"


I also found the following talk interesting: a simplified version of the concept, applied to deterministic mappings:

The Deterministic Information Bottleneck




Sunday, January 21, 2018

Visualization

Nice visualizations of Hilbert Curves and Linear Algebra.


I like this comment by "matoma" in Abstract Vector Space:

Furthermore, why does the exponential function appear everywhere in math? One reason is that it (and all scalar multiples) is an eigenfunction of the differential operator. Same deal for sines and cosines with the second-derivative operator (eigenvalue=-1).

Noting the definition of Eigenfunction
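Spelling out matoma's point (standard facts, not part of the comment): the exponential is an eigenfunction of d/dx, and sine is an eigenfunction of the second-derivative operator with eigenvalue -1,

\[
\frac{d}{dx} e^{\lambda x} = \lambda\, e^{\lambda x},
\qquad
\frac{d^2}{dx^2} \sin x = -\sin x .
\]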


Affine Transformation

Nice visuals on Affine Matrix Transformation
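As a reminder of what those visuals are showing (a standard definition, not taken from the linked material): an affine transformation is a linear map followed by a translation, and in homogeneous coordinates it becomes a single matrix multiplication,

\[
x \mapsto A x + b
\quad\Longleftrightarrow\quad
\begin{pmatrix} A & b \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ 1 \end{pmatrix}
=
\begin{pmatrix} A x + b \\ 1 \end{pmatrix}.
\]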

Data science, machine learning, and artificial intelligence....

I like this explanation:

What's the difference between data science, machine learning, and artificial intelligence?

Monday, January 15, 2018

Nice Lecture series on Causality

In reinforcement learning you need to define functions for rewards and state transitions. Big data is a good source for modeling the actors external to a deep learning system. What is needed is a causal model of the data, to be able to simulate the external world correctly.
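A minimal sketch of the two ingredients mentioned above (the names are mine, not from the lectures): a Markov decision process is essentially parameterized by its reward and transition functions, and a causal model of the data is what would supply a credible transition function.

-- Reward and transition functions over hypothetical state and action types s and a.
data MDP s a = MDP
  { reward     :: s -> a -> Double         -- immediate reward for taking action a in state s
  , transition :: s -> a -> [(s, Double)]  -- possible next states with their probabilities
  }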

This set of lectures is a quick introduction to the field.

part 1

part 2

part 3

part 4