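- Non-optimal decoding: requires the preferred stimulus $s$ of each neuron.
    - WTA (winner-takes-all)
    - Population vector
- Optimal decoding: requires an encoding model, $P[\mathbf{r}|s]$.
    - Maximum likelihood (ML): $\text{argmax}_s P[\mathbf{r}|s]$
    - MAP: $\text{argmax}_s P[s|\mathbf{r}] = \text{argmax}_s \frac{P[\mathbf{r}|s]P[s]}{P[\mathbf{r}]} = \text{argmax}_s P[\mathbf{r}|s]P[s]$, since $P[\mathbf{r}]$ does not depend on $s$.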
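The four decoders can be compared in a minimal sketch. Everything about the population is an assumption made for illustration, not taken from these notes: 16 neurons with bell-shaped (von Mises-like) tuning curves over a circular stimulus, independent Poisson spike counts in a window `T`, a simple count-weighted population vector, and a grid search over candidate stimuli for the two optimal decoders. The names `tuning`, `wta`, `population_vector`, `ml_decode` and `map_decode` are hypothetical.

```python
import numpy as np

# Hypothetical population: N neurons with bell-shaped tuning over a circular stimulus.
N = 16
preferred = np.linspace(0, 2 * np.pi, N, endpoint=False)  # preferred stimulus of each neuron
r_max, width, T = 20.0, 2.0, 1.0  # assumed peak rate (Hz), tuning sharpness, window (s)


def tuning(s):
    """Assumed tuning curves f_a(s) of all neurons at stimulus angle s."""
    return r_max * np.exp(width * (np.cos(s - preferred) - 1.0))


def wta(counts):
    # Winner-takes-all: report the preferred stimulus of the most active neuron.
    return preferred[np.argmax(counts)]


def population_vector(counts):
    # Sum of unit vectors at the preferred angles, each weighted by its spike count.
    x = np.sum(counts * np.cos(preferred))
    y = np.sum(counts * np.sin(preferred))
    return np.arctan2(y, x) % (2 * np.pi)


# The optimal decoders search a grid of candidate stimuli.
s_grid = np.linspace(0, 2 * np.pi, 720, endpoint=False)
expected = np.array([tuning(s) for s in s_grid]) * T  # expected counts, shape (720, N)


def log_likelihood(counts):
    # Independent Poisson counts: log P[r|s] up to terms that do not depend on s.
    return np.log(expected) @ counts - expected.sum(axis=1)


def ml_decode(counts):
    return s_grid[np.argmax(log_likelihood(counts))]


def map_decode(counts, log_prior):
    # P[r] does not depend on s, so maximise log P[r|s] + log P[s].
    return s_grid[np.argmax(log_likelihood(counts) + log_prior)]


rng = np.random.default_rng(0)
s_true = s_grid[120]
counts = rng.poisson(tuning(s_true) * T)
flat = np.zeros_like(s_grid)  # uniform prior (log P[s] = const)
print(wta(counts), population_vector(counts), ml_decode(counts), map_decode(counts, flat))
```

With a flat prior, MAP reduces to ML; a non-flat `log_prior` pulls the estimate toward a priori likely stimuli. WTA and the population vector only use `preferred`, whereas ML and MAP need the full encoding model `tuning`.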

- Question being answered is: **How much** does a neural response tell us about a stimulus?
- Entropy is a measure of the theoretical capacity of a code to convey information.
- Mutual information measures how much of that capacity is actually used when the code is employed to describe a particular set of data.
- Noise makes conveying information tricky.

- The simplest approach uses spike-count firing rates. This can only give a lower bound on the information, because it discards the actual spike times.
- Entropy roughly measures how "interesting" or "surprising" a **set** of responses is.
- Is a set of nearly identical neural responses really uninteresting? (Out of context, yes; but not in the context of the brain, where each neuron is expected to convey information, e.g. regulatory pulses?)
- Entropy defined in terms of the probability of observing a particular response.
- First define 'surprise/unpredictability' of a single response: $h(P[r]) = -\text{log}_2 P[r]$
- The minus sign makes $h$ a decreasing function of probability: less likely = more surprise.
- The $\log$ makes the surprise of two independent events additive: $h(P[r_1]P[r_2]) = h(P[r_1]) + h(P[r_2])$.
- Base 2 is a convention; entropy is then measured in bits.
- Average over all possible responses to get Shannon entropy: $H = - \sum_r P[r]\text{log}_2 P[r]$
- Rare occurrences don't affect the total much (low probability).
- Common occurrences don't affect the total much (low surprise).
- Entropy of a code with only a single possible response = 0.
- Entropy of a coin that is either heads or tails with equal chance = 1 bit.
- Entropy of flipping a biased coin is shown in Fig. 4.1(a).
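The definitions above translate directly into code. This is a minimal sketch; `surprise`, `entropy` and `biased_coin_entropy` are illustrative names, and zero-probability terms are dropped, following the usual convention $0 \log_2 0 = 0$.

```python
import math


def surprise(p):
    """h(P[r]) = -log2 P[r], in bits."""
    return -math.log2(p)


def entropy(probs):
    """Shannon entropy H = -sum_r P[r] log2 P[r]; zero-probability terms contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def biased_coin_entropy(p_heads):
    return entropy([p_heads, 1.0 - p_heads])


# The surprise of two independent events adds up: 3 bits = 1 bit + 2 bits.
assert math.isclose(surprise(0.5 * 0.25), surprise(0.5) + surprise(0.25))

# A single possible response carries 0 bits; a fair coin carries 1 bit.
print(entropy([1.0]), entropy([0.5, 0.5]))

# Biased coin (cf. Fig. 4.1(a)): entropy peaks at P[heads] = 0.5 and falls to 0 at 0 or 1.
for p in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(f"P[heads] = {p:.2f}  H = {biased_coin_entropy(p):.3f} bits")
```

The table also shows why rare and very common outcomes contribute little: each term $-P\log_2 P$ is small both when $P$ is near 0 and when $P$ is near 1.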

Entropy is how much information could be encoded. If heads and tails were encoded by distinct responses with no noise, the information encoded would be 1 bit, equal to the entropy. However, with noise or a non-equal encoding, the full capacity is not used; the part that is used is the mutual information.

- So far entropy tells us about the variability in responses, but does not link this variability to its source (different stimuli). For responses to encode information about the stimulus, they must depend on it.
- Mutual information is the difference between the total response entropy (over different stimuli) and the average entropy for given stimulus. In other words, remove the part of the entropy that is not dependent on the stimulus.
- I think this can be thought of as how much the response changes with the stimulus, after discounting how much it varies regardless of the stimulus.

- Entropy of a response to a single stimulus $s$ is the same as before but conditional on $s$. $H_s = -\sum_r P[r|s] \text{log}_2 P[r|s]$
- To get the noise entropy we average this over all stimuli (but the stimulus is fixed in each individual calculation). $H_\text{noise} = \sum_s P[s]H_s$.
- $I_m = H - H_\text{noise}$. Full response entropy minus noise entropy.
- The mutual information that a set of responses conveys about a set of stimuli is identical to the mutual information that the set of stimuli conveys about the responses. (See Section 4.1 for proof.)
- If the neuron's response does not depend on the stimulus then $I_m = 0$. If the neuron responds deterministically and differently to each stimulus then $I_m = H$. (See Fig. 4.1(b).)
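A minimal sketch of $I_m = H - H_\text{noise}$ for discrete stimuli and responses, assuming the encoding is given as a table of $P[r|s]$ (rows: stimuli, columns: responses). `mutual_information_reversed` computes the same quantity with the roles of stimulus and response swapped, as a numerical check of the symmetry. The toy code is hypothetical: each of two equally likely stimuli evokes exactly one spike, in the first or second time bin. Reading out the spike pattern gives the deterministic extreme ($I_m = H = 1$ bit); reading out only the spike count (always 1) gives $I_m = 0$, which illustrates why spike-count estimates are only a lower bound. An assumed 10% chance of reporting the wrong bin gives the noisy coin, which conveys $1 - H_\text{noise} \approx 0.53$ bits rather than the full 1 bit of entropy.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in bits, H = sum p log2(1/p); zero entries contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))


def mutual_information(p_s, p_r_given_s):
    """I_m = H - H_noise, given P[s] (length S) and P[r|s] (S x R, rows sum to 1)."""
    p_s = np.asarray(p_s, dtype=float)
    p_r_given_s = np.asarray(p_r_given_s, dtype=float)
    p_r = p_s @ p_r_given_s                 # P[r] = sum_s P[s] P[r|s]
    H = entropy(p_r)                        # full response entropy
    H_noise = sum(ps * entropy(row) for ps, row in zip(p_s, p_r_given_s))
    return H - H_noise


def mutual_information_reversed(p_s, p_r_given_s):
    """Same quantity the other way round: H[s] - sum_r P[r] H[s|r]."""
    p_s = np.asarray(p_s, dtype=float)
    joint = p_s[:, None] * np.asarray(p_r_given_s, dtype=float)  # P[s, r]
    p_r = joint.sum(axis=0)
    H_s_given_r = sum(pr * entropy(joint[:, j] / pr) for j, pr in enumerate(p_r) if pr > 0)
    return entropy(p_s) - H_s_given_r


fair = [0.5, 0.5]
spike_pattern = [[1, 0], [0, 1]]  # response = which bin held the spike: deterministic
spike_count = [[1], [1]]          # response = spike count: always 1, no stimulus dependence
noisy = [[0.9, 0.1], [0.1, 0.9]]  # assumed 10% chance of reporting the wrong bin

for name, table in [("pattern", spike_pattern), ("count", spike_count), ("noisy", noisy)]:
    print(f"{name}: I_m = {mutual_information(fair, table):.3f} bits")
```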

- Neurons can often be characterised as maximising information (Figure 4.2).
- "Because the mutual information is the full response entropy minus the noise entropy, maximizing the information involves a compromise. We must make the response entropy as large as possible without allowing the noise entropy to get too big."
- With low noise entropy this becomes maximising the response entropy.
- The constraints of the system must be defined; otherwise the response entropy can be increased simply by increasing the number of possible responses.
- Entropy maximization for a population of neurons requires that the neurons respond to **independent** pieces of information.
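One standard way to see the role of constraints: if noise is negligible and the only constraint is a fixed number of distinguishable response levels, response entropy is maximised when every level is used equally often. This is achieved by making the response follow the cumulative distribution of the stimulus (histogram equalisation). The sketch below assumes 8 response levels and a Gaussian stimulus ensemble, both hypothetical, and compares equalisation with a linear mapping.

```python
import numpy as np

rng = np.random.default_rng(1)
n_levels = 8  # assumed constraint: only 8 distinguishable response levels


def entropy_bits(counts):
    """Entropy (bits) of the empirical distribution given by a histogram."""
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


# Hypothetical stimulus ensemble (e.g. contrasts), assumed Gaussian for illustration.
stimuli = rng.normal(0.0, 1.0, size=100_000)

# Linear encoding: equal-width stimulus bins over +/-3 standard deviations.
linear = np.clip(np.floor((stimuli + 3.0) / 6.0 * n_levels), 0, n_levels - 1).astype(int)

# Histogram equalisation: pass each stimulus through its empirical cumulative
# distribution, then quantise, so every response level is used equally often.
cdf = np.argsort(np.argsort(stimuli)) / stimuli.size  # rank / n, in [0, 1)
equalised = np.floor(cdf * n_levels).astype(int)

H_linear = entropy_bits(np.bincount(linear, minlength=n_levels))
H_equalised = entropy_bits(np.bincount(equalised, minlength=n_levels))
print(f"linear: {H_linear:.3f} bits, equalised: {H_equalised:.3f} bits, "
      f"max: {np.log2(n_levels):.0f} bits")
```

The equalised code reaches the $\log_2 8 = 3$ bit maximum, while the linear code crowds most stimuli into the middle levels and rarely uses the extreme ones, so it has lower entropy.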