Dayan & Abbott Meeting 5 (Chapter 4)

Quick Overview on Encoding and Decoding

  1. Non-optimal Decoding - requires preferred $s$
  2. Optimal Decoding - requires encoding model

General Definitions

  1. Question being answered is: How much does a neural response tell us about a stimulus?
  2. Entropy is a measure of the theoretical capacity of a code to convey information.
  3. Mutual information measures how much of that capacity is actually used when the code is employed to describe a particular set of data.
  4. Noise makes conveying information tricky.


Entropy

  1. Easy mode uses spike-count firing rates. This can only give a lower bound on the information, as it discards the actual spike times.
  2. Entropy roughly measures how "interesting" or "surprising" a set of responses is.
  3. Is a set of nearly identical neural responses really uninteresting? (Out of context yes, but not in the context of the brain and expectation that each neuron conveys information. e.g. regulatory pulses?)
  4. Entropy defined in terms of the probability of observing a particular response.
  5. First define 'surprise/unpredictability' of a single response: $h(P[r]) = -\text{log}_2 P[r]$
  6. (Plot of $-\text{log}_2 P[r]$ against $P[r]$.)

  7. Average over all possible responses to get Shannon entropy: $H = - \sum_r P[r]\text{log}_2 P[r]$

Entropy is how much information could be encoded. For example, a fair coin flip has an entropy of $1$ bit. If the code represents heads and tails separately with no noise, then the information encoded is $1$ bit, equal to the entropy. However, with noise or an unequal encoding, the full capacity will not be used, and the information actually conveyed (the mutual information) is less than the entropy.
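As a quick check on the two definitions above, here is a minimal Python sketch of the surprise and the Shannon entropy. The function names and the example coin distributions are illustrative assumptions, not anything from the book.

```python
import math

def surprise(p):
    """Surprise h(P[r]) = -log2 P[r], in bits, of a response with probability p."""
    return -math.log2(p)

def entropy(probs):
    """Shannon entropy H = -sum_r P[r] log2 P[r], in bits.

    Responses with zero probability are skipped (0 * log 0 is taken as 0).
    """
    return sum(p * surprise(p) for p in probs if p > 0)

# Hypothetical example distributions over two responses (heads, tails).
fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]
certain = [1.0, 0.0]

for name, probs in [("fair", fair_coin), ("biased", biased_coin), ("certain", certain)]:
    print(name, entropy(probs))
```

The fair coin gives exactly 1 bit, the biased coin gives less than 1 bit, and a response that never varies gives 0 bits.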

Mutual Information

The basic idea.
  1. So far entropy can tell us about the variability in responses, but does not link this to the source of the variability (different stimuli). Actual information encoded must (obviously) require responses that depend on the stimulus.
  2. Mutual information is the difference between the total response entropy (over different stimuli) and the average entropy for given stimulus. In other words, remove the part of the entropy that is not dependent on the stimulus.
  3. One way to think of this: it is how much the response varies because the stimulus changes, after discounting how much it varies regardless of the stimulus.

Calculating Mutual Information.
  1. Entropy of a response to a single stimulus $s$ is the same as before but conditional on $s$. $H_s = -\sum_r P[r|s] \text{log}_2 P[r|s]$
  2. To get the noise entropy we average this over all stimuli (but the stimulus is fixed in each individual calculation). $H_\text{noise} = \sum_s P[s]H_s$.
  3. $I_m = H - H_\text{noise}$. Full response entropy minus noise entropy.
  4. The mutual information that a set of responses conveys about a set of stimuli is identical to the mutual information that the set of stimuli conveys about the responses. (See Section 4.1 for proof.)
  5. If the neuron's response does not depend on the stimulus then $I_m = 0$. If the neuron gives a distinct, noise-free response to each stimulus then $H_\text{noise} = 0$ and $I_m = H$. (See Fig. 4.1(b).)
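
The calculation steps above can be sketched in Python as follows. This is only an illustration for discrete stimuli and responses: the function names, the two-stimulus setup, and the conditional probability tables are all made-up assumptions chosen to show the two limiting cases and the symmetry property.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def response_distribution(p_s, p_r_given_s):
    """P[r] = sum_s P[s] P[r|s]."""
    n_r = len(p_r_given_s[0])
    return [sum(ps * row[j] for ps, row in zip(p_s, p_r_given_s)) for j in range(n_r)]

def mutual_information(p_s, p_r_given_s):
    """I_m = H - H_noise, with p_s[i] = P[s_i] and p_r_given_s[i][j] = P[r_j | s_i]."""
    h_total = entropy(response_distribution(p_s, p_r_given_s))
    # Noise entropy: the per-stimulus entropy H_s averaged over stimuli.
    h_noise = sum(ps * entropy(row) for ps, row in zip(p_s, p_r_given_s))
    return h_total - h_noise

def swap_roles(p_s, p_r_given_s):
    """Bayes' rule: return P[r] and P[s|r]. Assumes every P[r] > 0."""
    p_r = response_distribution(p_s, p_r_given_s)
    p_s_given_r = [[ps * row[j] / p_r[j] for ps, row in zip(p_s, p_r_given_s)]
                   for j in range(len(p_r))]
    return p_r, p_s_given_r

# Hypothetical example: two equally likely stimuli, two possible responses.
p_s = [0.5, 0.5]
unresponsive = [[0.7, 0.3], [0.7, 0.3]]  # same response statistics for both stimuli
reliable = [[1.0, 0.0], [0.0, 1.0]]      # distinct, noise-free response per stimulus
noisy = [[0.9, 0.1], [0.2, 0.8]]         # somewhere in between

print(mutual_information(p_s, unresponsive))  # no dependence on the stimulus
print(mutual_information(p_s, reliable))      # equals the response entropy
forward = mutual_information(p_s, noisy)
backward = mutual_information(*swap_roles(p_s, noisy))
print(forward, backward)                      # stimulus -> response vs response -> stimulus
```

In this sketch the unresponsive table gives zero information, the reliable table gives the full 1 bit of response entropy, and swapping the roles of stimulus and response leaves the noisy value unchanged up to rounding.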

Extending to continuous variables

Continuous variables can in principle convey infinite information, so to work with them we assume some limit on measurement accuracy. The same problem already arises in the spike-rate-only case if the spike rates can take arbitrarily high values.
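
To see why a resolution limit is needed, here is a small numerical sketch. It assumes a Gaussian-distributed response (an arbitrary choice for illustration) measured in bins of width `dr`, and approximates each bin probability by the density at the bin centre times `dr`. The function name and parameters are hypothetical.

```python
import math

def binned_gaussian_entropy(dr, sigma=1.0, span=10.0):
    """Entropy in bits of a Gaussian response discretised into bins of width dr.

    Bins cover [-span * sigma, +span * sigma]; each bin probability is
    approximated by density(bin centre) * dr and then renormalised.
    """
    n_bins = int(round(2 * span * sigma / dr))
    centres = [-span * sigma + (k + 0.5) * dr for k in range(n_bins)]
    norm = sigma * math.sqrt(2 * math.pi)
    probs = [math.exp(-x * x / (2 * sigma ** 2)) / norm * dr for x in centres]
    total = sum(probs)
    probs = [p / total for p in probs]
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Halve the measurement resolution repeatedly and watch the entropy grow.
for dr in (0.4, 0.2, 0.1, 0.05):
    print(dr, round(binned_gaussian_entropy(dr), 3))
```

Each halving of `dr` adds roughly one bit, so the entropy grows without bound as the resolution becomes arbitrarily fine. Fixing the resolution removes this divergence.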

Information and Entropy Maximisation

  1. Can often characterise neurons as maximising information (Figure 4.2).
  2. "Because the mutual information is the full response entropy minus the noise entropy, maximizing the information involves a compromise. We must make the response entropy as large as possible without allowing the noise entropy to get too big."
  3. With low noise entropy this becomes maximising the response entropy.
  4. We need to define the constraints on the system, otherwise maximising response entropy just means increasing the number of possible responses.
  5. Entropy maximisation for a population of neurons requires that the neurons respond to independent pieces of information.
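
The last two points can be illustrated with a toy example. This is a sketch under made-up assumptions: a single neuron limited to four response levels, and a pair of binary neurons whose joint response probabilities are listed in the order (0,0), (0,1), (1,0), (1,1). None of the numbers come from the book.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Constraint: a fixed number of response levels. Using them equally often
# gives the largest entropy, log2(n_levels); any skew reduces it.
n_levels = 4
uniform = [1 / n_levels] * n_levels
skewed = [0.7, 0.1, 0.1, 0.1]
print(entropy(uniform), entropy(skewed), math.log2(n_levels))

# Population: two binary neurons, each firing half the time (1 bit each).
# Joint probabilities are ordered (0,0), (0,1), (1,0), (1,1).
independent = [0.25, 0.25, 0.25, 0.25]  # neurons carry separate information
redundant = [0.5, 0.0, 0.0, 0.5]        # second neuron just copies the first

def marginals(joint):
    """Single-neuron response distributions from the 2x2 joint distribution."""
    first = [joint[0] + joint[1], joint[2] + joint[3]]
    second = [joint[0] + joint[2], joint[1] + joint[3]]
    return first, second

for name, joint in [("independent", independent), ("redundant", redundant)]:
    first, second = marginals(joint)
    print(name, entropy(joint), entropy(first) + entropy(second))
```

With independent neurons the joint entropy equals the sum of the single-neuron entropies (2 bits here). With the redundant pair the joint entropy is only 1 bit even though each neuron on its own still has 1 bit of entropy.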