I like this series of short lectures on Regression.

Statistical Regression

## Wednesday, February 28, 2018

## Thursday, January 25, 2018

### Why you can't just skip Deep Learning

An interesting post: Dear Aspiring Data Scientists, Just Skip Deep Learning (For Now).

Hardly a day goes by that I don't hear the same sentiment expressed in one way or another. The trouble is that it is right and wrong at the same time.

The problem is terminology. Let's use the terminology defined in What's the difference between data science, machine learning, and artificial intelligence?

So long as you are looking at Data Science and Machine Learning, that is, insight and prediction (and your focus is getting hired), the article is valid. If your goal is prediction, why not use the simplest method? It is easier to train and to generalize. The article is also correct that training for image recognition, voice processing, computer vision, and so on requires massive amounts of data and processing power. That is not where most DS/ML jobs are.

What they are missing, again going back to the definitions above, is prediction versus action. As long as the goal of DS/ML is to provide insight and predictions to humans, the arguments hold. It is when you want to take an action that it all falls apart. Humans can apply their domain knowledge to navigate the predictions and find the optimal action.

But for machines, it is very different. You basically have three choices for determining the best action: apply human-derived rules to the predictions (1980s AI), reduce the problem to an optimization program (linear or convex), or use reinforcement learning to derive a policy that deals with the uncertainty in your predictions. This is, I think, the essence of Software 2.0, or what I like to call Training Driven Development (TrDD). More on this later.

If rules or optimization work, great: you are done. But when they are not an option, then under the model the article prescribes, you need to combine a policy neural network with your ML prediction. The problem now is that you have two islands to deal with: the gradient of the neural network's objective loss function can't propagate back into your ML prediction. Simplifications that worked so well for ML predictions meant for humans are now amplified as errors in your policy network. I don't know how you can have a loss function that trains the policy and also communicates with, say, a linear regression model's loss function.
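To make the "two islands" point concrete, here is a minimal toy sketch built on assumptions of my own. The "prediction" is an ordinary least-squares line. The "policy" is reduced to a single learned offset `theta` added to that prediction, and it is scored by an asymmetric cost in which acting too low costs four times more than acting too high. The names `policy_cost`, `cost_grad`, `C_OVER`, `C_UNDER` and `theta` are hypothetical. In the two-stage version, the regression is fit on its own MSE loss and then frozen, so the policy's gradient reaches only `theta`. In the end-to-end version, the same gradient also updates the slope and intercept.

```python
import numpy as np

# Toy setup (an assumption, not from the article): a least-squares "prediction"
# feeds a downstream "policy" whose cost is asymmetric, so acting too low
# costs more than acting too high.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 1.0, n)
# Noise grows with x, so the best action is not just the predicted mean.
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1 + 0.9 * x, n)

C_OVER, C_UNDER = 1.0, 4.0  # hypothetical costs of acting too high / too low
LR, STEPS = 0.05, 2000      # step size small enough for stable gradient descent here


def policy_cost(action, outcome):
    """Asymmetric squared cost of taking `action` when `outcome` happens."""
    r = action - outcome
    k = np.where(r > 0, C_OVER, C_UNDER)
    return float(np.mean(k * r**2))


def cost_grad(action, outcome):
    """Gradient of policy_cost with respect to each entry of `action`."""
    r = action - outcome
    k = np.where(r > 0, C_OVER, C_UNDER)
    return 2.0 * k * r / len(outcome)


# --- Two-stage ("two islands") ---
# Stage 1: ordinary least squares, trained on its own MSE loss, then frozen.
X = np.column_stack([x, np.ones(n)])
(w_ls, b_ls), *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = w_ls * x + b_ls

# Stage 2: the "policy" is just an offset added to the frozen prediction.
# Its gradient updates theta only; w_ls and b_ls never see it.
theta = 0.0
for _ in range(STEPS):
    theta -= LR * cost_grad(y_hat + theta, y).sum()
two_stage = policy_cost(y_hat + theta, y)

# --- End-to-end ---
# The same policy cost is propagated into the predictor's weights as well.
w, c = w_ls, b_ls + theta  # start from the two-stage solution
for _ in range(STEPS):
    g = cost_grad(w * x + c, y)
    w -= LR * np.sum(g * x)  # chain rule: d(action)/dw = x
    c -= LR * np.sum(g)      # chain rule: d(action)/dc = 1
end_to_end = policy_cost(w * x + c, y)

print(f"two-stage:  slope={w_ls:.3f}  cost={two_stage:.4f}")
print(f"end-to-end: slope={w:.3f}  cost={end_to_end:.4f}")
```

The end-to-end run starts from the two-stage solution and takes small steps on a cost that is convex in `(w, c)`, so its cost can only go down. Because the noise grows with x, the asymmetric cost favors a steeper line than least squares does. In this toy, only the end-to-end version can move the slope to get there. In the setting the post describes, a policy network would replace the single offset, but the bookkeeping is the same: if the predictor is fit separately, nothing from the policy's loss flows back into it.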

After reading Optimizing things in the USSR, I ordered a copy of Red Plenty. It has some interesting observations on what happens when you make "simplifying assumptions".

## Wednesday, January 24, 2018

### Information Theory of Deep Learning: Bottleneck Theory

The Information Bottleneck Method by Tishby, Pereira, and Bialek is an interesting way to look at what is happening in a deep neural network. You can see the concept explained in the following papers:

- Deep Learning and the Information Bottleneck Principle, and
- On the Information Bottleneck Theory of Deep Learning
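For reference, the IB method compresses X into a representation T while keeping information about a target Y. As formulated by Tishby, Pereira, and Bialek, it does this by minimizing the Lagrangian I(X;T) - β I(T;Y) over encoders p(t|x). The sketch below only *evaluates* that objective for a given discrete encoder; it does not implement the iterative IB algorithm. The function names and the toy distribution are my own.

```python
import numpy as np


def mutual_information(p_ab):
    """I(A;B) in bits from a joint distribution given as a 2-D array."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0  # 0 * log 0 is treated as 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])))


def ib_lagrangian(p_xy, p_t_given_x, beta):
    """IB objective I(X;T) - beta * I(T;Y) for an encoder p(t|x).

    p_xy:        joint p(x, y), shape (|X|, |Y|)
    p_t_given_x: encoder, shape (|X|, |T|), each row sums to 1
    Assumes T depends on Y only through X (Markov chain T <- X -> Y).
    """
    p_x = p_xy.sum(axis=1)
    p_xt = p_x[:, None] * p_t_given_x  # p(x, t) = p(x) p(t|x)
    p_ty = p_t_given_x.T @ p_xy        # p(t, y) = sum_x p(t|x) p(x, y)
    return mutual_information(p_xt) - beta * mutual_information(p_ty)


p_xy = np.array([[0.4, 0.1], [0.1, 0.4]])  # toy joint p(x, y)
identity = np.eye(2)        # T = X: keeps everything
constant = np.ones((2, 1))  # T takes a single value: keeps nothing
print(ib_lagrangian(p_xy, identity, beta=1.0))
print(ib_lagrangian(p_xy, constant, beta=1.0))
```

The identity encoder pays the full compression cost (I(X;T) = 1 bit here) but keeps all of I(X;Y). The constant encoder scores zero on both terms. β sets the exchange rate between the two, and the IB solution lives in between.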

Professor Tishby also gives a nice lecture on the topic:

Information Theory of Deep Learning. Naftali Tishby

There is also a follow-on talk with more detail:

Information Theory of Deep Learning - Naftali Tishby

Deep learning aside, there are other interesting applications of the bottleneck method. It can be used to categorize music chords:

Information bottleneck web applet tutorial for categorizing music chords

In this talk, the method is used to quantify prediction in the brain:

Stephanie Palmer: "Information bottleneck approaches to quantifying prediction in the brain"

I also found the following talk interesting. It presents a simplified version of the concept applied to deterministic mappings:

The Deterministic Information Bottleneck
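As I understand the deterministic variant (Strouse and Schwab), the compression term I(X;T) is replaced by the entropy H(T), and the encoder becomes a hard assignment t = f(x). Below is a minimal sketch that scores a given hard clustering under that objective, H(T) - β I(T;Y). `dib_objective` and the toy distribution are hypothetical, and no optimization is performed.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy in bits of a probability array (zeros are ignored)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def dib_objective(p_xy, f, beta):
    """Deterministic IB objective H(T) - beta * I(T;Y) for a hard map t = f[x].

    p_xy: joint p(x, y), shape (|X|, |Y|)
    f:    integer array of length |X|; f[x] is the cluster assigned to x
    """
    n_t = int(f.max()) + 1
    p_ty = np.zeros((n_t, p_xy.shape[1]))
    np.add.at(p_ty, f, p_xy)  # p(t, y) = sum of p(x, y) over x with f[x] = t
    p_t = p_ty.sum(axis=1)
    p_y = p_ty.sum(axis=0)
    i_ty = entropy_bits(p_t) + entropy_bits(p_y) - entropy_bits(p_ty.ravel())
    return entropy_bits(p_t) - beta * i_ty


# Toy joint: x0, x1 always go with y0; x2, x3 always go with y1.
p_xy = np.array([[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]])
for name, f in [("identity", [0, 1, 2, 3]), ("merge pairs", [0, 0, 1, 1]), ("one cluster", [0, 0, 0, 0])]:
    print(name, dib_objective(p_xy, np.array(f), beta=2.0))
```

At β = 2, merging {x0, x1} and {x2, x3} scores -1. That beats keeping all four values (0) and collapsing everything into one cluster (0), because the merge keeps every bit about Y at half the entropy cost.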

## Sunday, January 21, 2018

### Visualization

Nice visualization of Hilbert Curves and Linear Algebra.

I like this comment by "matoma" in Abstract Vector Space:

Furthermore, why does the exponential function appear everywhere in math? One reason is that it (and all scalar multiples) is an eigenfunction of the differential operator. Same deal for sines and cosines with the second-derivative operator (eigenvalue=-1).

Note the definition of an eigenfunction.
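The comment can be spelled out with the standard definition. A function f is an eigenfunction of a linear operator D when applying D only rescales it by a constant λ. Here is a short worked version for the derivative operators the comment mentions:

```latex
\begin{aligned}
&\text{Eigenfunction of an operator } D:\quad D f = \lambda f \\
&\frac{d}{dx}\bigl(c\,e^{x}\bigr) = c\,e^{x} \quad (\lambda = 1), \qquad
 \frac{d}{dx}\, e^{kx} = k\,e^{kx} \quad (\lambda = k) \\
&\frac{d^{2}}{dx^{2}} \sin x = -\sin x, \qquad
 \frac{d^{2}}{dx^{2}} \cos x = -\cos x \quad (\lambda = -1)
\end{aligned}
```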

## Monday, January 15, 2018

### Nice Lecture Series on Causality

In reinforcement learning, you need to define functions for rewards and state transitions. Big data is a good source for modeling the external actors that a deep learning system interacts with. What is needed is a causal model of the data, so that the external world can be simulated correctly.

This set of lectures is a quick introduction to the field:

part 1

part 2

part 3

part 4
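To illustrate why a causal model matters for simulation, here is a minimal toy structural causal model of my own; it is not taken from the lectures. A hidden variable Z drives both X and Y, while X has no effect on Y at all. A regression on observational data suggests that X matters. Setting X by intervention, the way an agent's action would, shows that it does not. The variable names, coefficients and sample size are arbitrary assumptions.

```python
import numpy as np

# Hypothetical toy structural causal model:
#   Z -> X and Z -> Y, but X has NO causal effect on Y.
rng = np.random.default_rng(42)
n = 100_000


def simulate(do_x=None):
    """Sample (X, Y) from the SCM; if do_x is given, X is set by intervention."""
    z = rng.normal(size=n)                           # hidden common cause
    x = z + rng.normal(size=n) if do_x is None else do_x
    y = 2.0 * z + rng.normal(size=n)                 # Y depends only on Z
    return x, y


def slope(x, y):
    """Least-squares slope of y on x (sample covariance / sample variance)."""
    return float(np.cov(x, y)[0, 1] / np.var(x, ddof=1))


# Observational data: X and Y are correlated through Z.
x_obs, y_obs = simulate()
# Interventional data: X is chosen independently of Z, like an agent's action.
x_do, y_do = simulate(do_x=rng.normal(size=n))

print("observational slope:", slope(x_obs, y_obs))   # close to 1.0
print("interventional slope:", slope(x_do, y_do))    # close to 0.0
```

A simulator built from the observational fit would tell an RL agent that pushing X moves Y. Avoiding that kind of error is the job of a causal model of the data.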
