Cold Porridge

Harys Dalvi

December 2022


“We are not interested in the fact that the brain has the consistency of cold porridge.”
  — Alan Turing

Simulation

[Interactive control: Load MNIST dataset]

Warning: this is extremely computationally intensive. You shouldn't run it unless you are on a good computer, and even then, use Chrome.

Artificial Neural Network

[Interactive controls: select a digit 0–9 or Random; Train]


Biological Neural Network

Warning: very computationally intensive

[Interactive controls: select a digit 0–9 or Random]



Help

This is a simulation comparing artificial neural networks to biological neural networks using the MNIST dataset [13], a collection of handwritten digits.

Artificial Neural Network

Click on one of the digits to feed that digit into the neural network, or click Random to choose a random digit. By default, the network has not been trained, so it will guess randomly. Click Train to train the neural network for one epoch so it can make better predictions.

Biological Neural Network

Click on one of the digits to feed that digit into the neural network, or click Random to choose a random digit. Check Train to provide a stimulus letting the neural network know which digit it is so it can strengthen connections associated with guessing that digit. Uncheck Train to let the neural network guess the digit on its own, without outside stimulus. Adjust the Neuroplasticity slider to change how much the connections between neurons can be strengthened or weakened. It might take a few tries of training with a digit before the neural network can recognize it, and it can be difficult for the network to maintain many digits in its memory.

What Is This?

The MNIST dataset of handwritten digits is a classic dataset used as an introduction to artificial neural networks (ANNs). This project is a result of me wondering whether a biophysical simulation of a biological neural network (BNN) could be trained to recognize digits from the MNIST dataset. In the end, because the BNN is so complicated and computationally heavy, it has a limited ability to learn in this implementation. I also had to restrict the dataset to one example per digit for the biological network for this reason. However, if you train it right, it might be able to distinguish a couple digits from each other.

The ANN isn't perfect either; it uses only a simple type of layer called a Dense layer, without more sophisticated architectures like convolutional layers, so it sometimes gets digits wrong. However, I think the two networks together are a useful environment for thinking about the similarities, as well as the differences, between the two kinds of neural networks.

In all cases, orange represents a high value, blue represents a neutral value, and black represents a low value. ANN neurons are color-coded by their output, and connections are color-coded by their weights. BNN neurons are color-coded by their membrane potential, and connections are color-coded by the amount of glutamate versus GABA in the presynaptic neuron. The thickness of BNN connections represents the degree of myelination. (More on all that in the technical details section.)

What I Learned

ANNs are often hailed as the next big thing in artificial intelligence (AI), and their ability to learn a wide variety of tasks is truly impressive. Most recently, OpenAI's ChatGPT is making headlines for its ability to do everything from explaining scientific concepts to writing poetry in various languages.

At the same time, we are learning more about the neural network in our heads that lets us think about all this. As impressive as AI has become, it takes between five and eight layers of ANN neurons to simulate just a single biological neuron [2]. That leads to the question of how, or if, we can incorporate insights from neuroscience to create better AI.

Working on both ANNs and BNNs led me to see a lot of similarities between the two approaches. I think there is something fundamental about the idea of a neural network that makes it extremely effective at encoding abstract information, which is why both AI researchers and evolution converged on it as a solution for complicated tasks.

Both types of neural networks also suffer from some of the same problems. Most notable is the fact that it can be very difficult to explain why the neurons and their connections are set up the way they are. The propagation of information through neurons, whether by linear algebra or biophysics, is highly abstract compared to the tasks it attempts to accomplish, like recognizing a handwritten digit. This makes explainability a core issue in AI as well as in neuroscience, as much of neuroscience is about explaining why BNNs work the way they do.

On the other hand, I realized that the two types of neural networks are more different than I initially thought. While forward propagation through an ANN can be as simple as matrix multiplication and addition, forward propagation through a BNN involves:

And that doesn't even include aspects of a BNN that I didn't include in this simplified model, such as:

If biological neurons are so much more complex than artificial ones, why does the ANN model here perform better than the BNN one? There are a few possible reasons.

I would also like to note that unlike an ANN, a real human doesn't need hundreds of examples to accurately recognize handwritten digits. This is probably due to a combination of these factors.

Technical Details

ANN

These are the layers of the ANN:

  1. 28×28 (784-neuron) input layer
  2. 20-neuron Dense layer (ReLU)
  3. 16-neuron Dense layer (ReLU)
  4. 10-neuron Dense layer (ReLU)
  5. 10-neuron Dense output layer (softmax)

A Dense layer is a layer in which each neuron is connected to each neuron of the previous layer, with a weight representing the strength of each connection. Each neuron also has a bias that is added to the weighted sum of its inputs from the previous layer. Finally, each neuron has an activation function that determines its output.

In this case, the activation function I used is called ReLU. Mathematically, this function can be written as \(f(x) = \max(0, x)\). In other words, positive numbers are kept as-is, while negative numbers become zero.
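To make this concrete, here is a minimal sketch in JavaScript of how a Dense layer with a ReLU activation might compute its outputs. This is not the actual code behind this page; the names relu and denseForward are just for illustration.

```javascript
// ReLU activation: keep positive values as-is, clamp negatives to zero
function relu(x) {
  return Math.max(0, x);
}

// Forward pass through one Dense layer.
// inputs: activations from the previous layer
// weights: weights[i][j] is the weight from input j to output neuron i
// biases: one bias per output neuron
function denseForward(inputs, weights, biases, activation = relu) {
  return weights.map((row, i) => {
    let sum = biases[i];
    for (let j = 0; j < inputs.length; j++) {
      sum += row[j] * inputs[j];
    }
    return activation(sum);
  });
}
```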

For example, consider a neuron with three inputs from the previous layer as in this diagram.

[Diagram: a neuron with three inputs of -1, 3, and 2; connection weights -2, 4, and -3; and bias b = -6]

In this diagram, the values of the three input neurons are -1, 3, and 2. The weights of the respective connections are -2, 4, and -3. The bias of the output neuron is -6. To calculate the output, we first multiply the value of each input neuron by the weight of its connection, and add the results together. This is \((-1)(-2) + (3)(4) + (2)(-3) = 8\). Then we add the bias of the output neuron. This is \(8 + (-6) = 2\). Finally, we apply the activation function. Using the ReLU activation function, since this is a positive number, the output is just 2. If we had a negative number, the output would be 0.
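Plugging the numbers from this diagram into the denseForward sketch above gives the same answer:

```javascript
// Input values -1, 3, 2; connection weights -2, 4, -3; bias -6
const output = denseForward([-1, 3, 2], [[-2, 4, -3]], [-6]);
console.log(output); // [2], matching the calculation by hand
```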

In order to train an ANN, we must adjust the weights and biases so the results can be as accurate as possible. This is done with some clever calculus. To implement it, at first I tried copying various implementations online, but failed because they used useful features of Python and NumPy that I didn't have on this page with JavaScript. In the end, I directly implemented the calculus using an article by Grant Sanderson (3Blue1Brown) [1]. One of the most important ideas was that $$\frac{\partial C_0}{\partial w^{(L)}} \propto a^{(L-1)}$$ where \(w^{(L)}\) is a weight in a particular layer, \(a^{(L-1)}\) is the activation of a neuron in the previous layer, and \(C_0\) is the cost function, which tells us how far off the output is from what it should be. This equation tells us that the more active an input neuron is, the more the cost function changes with respect to the weight of that input neuron's connection. If we want to reduce the cost function, and the neuron is very active, it will be important to adjust this connection. This is related to Hebbian theory for BNNs, which says that “neurons that fire together wire together.”
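To illustrate that proportionality, here is a rough sketch of the weight gradients for a single ReLU layer with a squared-error cost. This is a simplification, not the page's actual training code (the real network ends in a softmax layer), but it shows how each gradient scales with the previous layer's activation.

```javascript
// Derivative of ReLU: 1 for positive pre-activations, 0 otherwise
function reluPrime(z) {
  return z > 0 ? 1 : 0;
}

// Gradients of a squared-error cost C = sum((a - y)^2) with respect to
// the weights of one ReLU layer.
// aPrev: activations of the previous layer
// z, a: pre-activations and activations of this layer
// target: desired outputs y
function weightGradients(aPrev, z, a, target) {
  // delta_i = dC/da_i * da_i/dz_i
  const delta = a.map((ai, i) => 2 * (ai - target[i]) * reluPrime(z[i]));
  // dC/dw_ij = delta_i * aPrev_j, so more active inputs get larger gradients
  return delta.map(d => aPrev.map(aj => d * aj));
}
```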

BNN

This is where things get very complicated.

First, let's go over the four ions that I simulated in this model: sodium (Na+), potassium (K+), chloride (Cl-), and calcium (Ca2+) [].

Now let's talk about an action potential. By default, the membrane potential of a neuron is around -70 mV; in other words, the voltage inside the neuron is about 70 mV lower than outside. A neuron also has sodium and potassium channels that are closed by default. Let's say that for some reason, the membrane potential increases to -55 mV. This is called depolarization because the difference in voltage (polarity) across the cell membrane is reduced. Depolarization causes sodium channels to open, which causes sodium to rush into the cell, depolarizing it even further. When the voltage has increased enough, the sodium channels close and potassium channels open. This repolarizes the cell and it returns to its resting state. This spike in voltage we just described is called an action potential, and it is how a neuron sends a signal to other neurons [3]. Unlike the ReLU function, which increases linearly with input, an action potential either occurs or does not. There is no in between.
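As a caricature of that all-or-nothing behavior (not the biophysical model used in this simulation, which tracks ion concentrations and channels), a neuron either crosses the threshold and fires or it does not:

```javascript
// All-or-nothing caricature of an action potential. The real simulation
// models ion concentrations and channels; this only illustrates the
// contrast with ReLU's graded, linear response.
const RESTING_POTENTIAL = -70; // mV
const THRESHOLD = -55;         // mV

function firesActionPotential(membranePotential) {
  // Either a full spike happens or nothing happens; there is no in between
  return membranePotential >= THRESHOLD;
}

console.log(firesActionPotential(RESTING_POTENTIAL)); // false
console.log(firesActionPotential(-50));               // true
```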

How can we determine the membrane potential of a cell? A starting point might be the Nernst equation: $$E = \frac{61 \ \text{mV}}{Z} \log_{10} \bigg( \frac{C_{\text{out}}}{C_{\text{in}}} \bigg)$$ where \(E\) is the equilibrium potential for a particular ion, \(Z\) is the valence of the ion (+1 for sodium, +2 for calcium, -1 for chloride, etc.), and 61 mV is a constant based on the Boltzmann constant, the Faraday constant, and human body temperature. However, this is only for a single ion. How do we find the membrane potential given many ions?
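Before moving on to multiple ions, here is a small sketch of the single-ion Nernst equation in code. The example concentrations are typical textbook values, not the ones used in this simulation.

```javascript
// Nernst equilibrium potential (mV) for a single ion at human body temperature.
// valence: +1 for Na+ or K+, +2 for Ca2+, -1 for Cl-
function nernst(valence, concOutside, concInside) {
  return (61 / valence) * Math.log10(concOutside / concInside);
}

// Typical textbook values for K+: about 5 mM outside and 150 mM inside,
// giving an equilibrium potential around -90 mV.
console.log(nernst(+1, 5, 150)); // ≈ -90
```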

We can use the Goldman-Hodgkin-Katz equation, which combines the contributions of all the ions, weighting each by its membrane permeability []. Since chloride is a negative ion, looking at the Nernst equation and log rules, we see that the positions of the outside and inside concentrations are swapped. I wasn't able to find a version of this equation with calcium, but I used log rules and logic similar to the chloride case to make an educated guess. Here is the modified equation that I used:

$$E = (61 \ \text{mV}) \log_{10} \Bigg( \frac{P_\text{K}[\text{K}_\text{out}] + P_\text{Na}[\text{Na}_\text{out}] + P_\text{Cl}[\text{Cl}_\text{in}] + P_\text{Ca}[\text{Ca}_\text{out}]^{\frac{1}{2}}}{P_\text{K}[\text{K}_\text{in}] + P_\text{Na}[\text{Na}_\text{in}] + P_\text{Cl}[\text{Cl}_\text{out}] + P_\text{Ca}[\text{Ca}_\text{in}]^{\frac{1}{2}}} \Bigg)$$
where \(P_\text{K}\), \(P_\text{Na}\), \(P_\text{Cl}\), and \(P_\text{Ca}\) are the permeabilities of the respective ions.
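In code, the modified equation might look like the following sketch; the permeability and concentration values would come from the state of the simulated neuron, and the field names here are just placeholders.

```javascript
// Modified Goldman-Hodgkin-Katz potential (mV), following the equation above.
// p: permeabilities; out/inside: ion concentrations (placeholder field names).
// Chloride's concentrations are swapped, and calcium enters under a square
// root because of its +2 valence.
function goldmanPotential(p, out, inside) {
  const numerator =
    p.K * out.K + p.Na * out.Na + p.Cl * inside.Cl + p.Ca * Math.sqrt(out.Ca);
  const denominator =
    p.K * inside.K + p.Na * inside.Na + p.Cl * out.Cl + p.Ca * Math.sqrt(inside.Ca);
  return 61 * Math.log10(numerator / denominator);
}
```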

Signaling

We have just gone over the basics of how an action potential occurs. How does this lead to signaling from one neuron to another? The action potential travels away from the cell body and down the axon. The axon may have myelination, which allows the action potential to travel faster. At the end, the action potential depolarizes synapses at the connections between neurons. The most common kind of synapse is a chemical synapse, in which, once depolarization occurs, the neuron sends neurotransmitters across a gap to the next neuron. The neurotransmitters trigger receptors on the next neuron []. In this model, I used two neurotransmitters (glutamate and GABA) and three receptors (AMPA, NMDA, and GABA receptors), although the actual brain has many more [][][].

These ion channels tend to have a sigmoid opening function [8]. This is a function of the form \(f(x) = \frac{1}{1 + \exp (a(x-b))}\).
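In code, such a gating function is just one line; the parameters a (steepness and direction) and b (midpoint) would be chosen per channel type.

```javascript
// Sigmoid opening function f(x) = 1 / (1 + exp(a(x - b))).
// a controls the steepness and direction of the curve; b is its midpoint.
function channelOpenFraction(x, a, b) {
  return 1 / (1 + Math.exp(a * (x - b)));
}
```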

NMDA receptors are of particular importance for this project. This is because of their connection with long-term potentiation (LTP), where the strength of a synapse increases over time, and its counterpart, long-term depression (LTD). High levels of calcium in a synapse with NMDA-R trigger an increase in AMPA-R, making that synapse more likely to cause an action potential in the future [6]. On the other hand, low levels of calcium have the opposite effect [9]. This means that if two neurons are connected and frequently fire together, the postsynaptic neuron will have a large influx of calcium due to the depolarization and glutamate from the presynaptic neuron, which will increase AMPA-R and make the connection between the two even stronger. Hence, “neurons that fire together wire together.”
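As a sketch of this idea only (not the exact rule in the simulation), a calcium-dependent update of a synapse's AMPA receptor count might look like the following; the thresholds and step size are invented for illustration.

```javascript
// Toy calcium-dependent plasticity rule at an NMDA-R synapse:
// high calcium adds AMPA receptors (LTP), moderately low calcium removes
// them (LTD), and very little calcium changes nothing.
// The thresholds and step size are illustrative, not the simulation's values.
function updateAmpaCount(ampaCount, calcium, plasticity = 1) {
  const HIGH_CALCIUM = 1.0;
  const LOW_CALCIUM = 0.2;
  if (calcium > HIGH_CALCIUM) {
    return ampaCount + plasticity;              // strengthen (LTP)
  } else if (calcium > LOW_CALCIUM) {
    return Math.max(0, ampaCount - plasticity); // weaken (LTD)
  }
  return ampaCount;                             // no change
}
```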

Parts of the Brain

I also modeled my BNN off the way vision works. In particular, I used the concept of retinotopy, where visual input from the retina can be mapped to particular neurons based on location in the field of vision. I also used the mammalian pinwheel structure, in which certain neurons in the visual cortex are sensitive to certain orientations [10]. For example, one neuron might be sensitive to horizontal lines, while another might be sensitive to diagonal lines. This concept is similar to that of a convolutional neural network in ANNs, which is a much more common and more effective solution for visual data than the dense network I used here for simplicity.

After passing through the visual cortex, one possibility for where the information can go is the ventral stream, which passes through the temporal lobe. This stream allows the brain to recognize what object the eyes are seeing [11].

In addition to the visual cortex and temporal lobe, I simulated an extremely basic version of Broca's area and Wernicke's area, with just ten neurons each. These areas are specialized for producing and understanding speech, respectively [12]. You can imagine my model as someone seeing a number and then being asked to say the name of that number. However, unlike the visual cortex, my versions of Broca's and Wernicke's areas are not at all modeled off their real-life equivalents; this is just a useful analogy between the output of this BNN and speech.

Additionally, if Train is checked in my model, a signal is sent to the neuron in Wernicke's area corresponding to the digit shown. This releases glutamate across the synapse with the corresponding neuron in Broca's area, with relatively few AMPA receptors but also unusually slow reuptake. This makes the correct neuron in Broca's area more excitable, so it is more likely than the others to have an action potential, which will then strengthen the pathways that led to its firing by Hebbian theory. You can imagine the training as someone showing you a digit as well as telling you its name, allowing you to easily form a connection between the two.

Finally, I added a negative feedback loop connected to Broca's area. Once one neuron in Broca's area fires (once the model “says the name of a digit”), it triggers a system that releases GABA with slow reuptake, preventing the model from saying a different digit until it receives new input. I don't think this is actually how Broca's area works; in fact, movement of the tongue to produce speech is a very complicated action. But since you can't say multiple things at the same time, I needed something like this to restrict the output of the BNN. In case multiple neurons in Broca's area fire, I circle the first neuron that fired, or the neuron that should have fired if none did.

At first, I connect my version of Broca's area to the temporal lobe with no AMPA receptors and a lot of NMDA receptors. Because NMDA receptors require depolarization to remove the Mg2+ block, they will not fire without further stimulation. This further stimulation comes from Wernicke's area during training mode, allowing the correct connections to strengthen by LTP.

Conclusion

Biological neural networks are far more complex than artificial ones. Although AI is advancing at an incredible pace, I think that will remain the case for a long time. The amount of computation that a human brain does is simply unfathomable.

Should we adopt aspects of the complexity of biological neural networks into AI? Some connections, like the similarity between the visual cortex and convolutional neural networks, seem promising. The similarities between ANNs and BNNs are significant enough that development in one field is likely to affect the other.

However, I think it would be a mistake to blindly adopt biological features into AI. The amazing complexity of the brain is not well-adapted to computer hardware, as the limitations of this BNN simulation show. Many things such as receptors and neurotransmitters make sense as physical objects in the brain, but are extremely expensive to accurately simulate in the language of computers and Boolean algebra.

One idea from BNNs that might be promising in AI is the fact that BNNs are truly a network: while ANNs are generally connected sequentially, layer by layer, BNNs are connected in a much more intricate fashion. Something like this is already having some success in the AI world: many ANN models use skip connections, in which data from one layer can skip over intermediate layers and directly reach a later layer. Expanding on this idea might allow us to achieve some of the useful complexity of BNNs while still keeping in mind that we are building AI with microchips and transistors, not neurons and axons.
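As a rough sketch of what a skip connection does (with hypothetical helper names, and assuming the skipped layer preserves the input's length):

```javascript
// Skip (residual) connection: the block's input bypasses the layer and is
// added back to the layer's output element-wise.
// `layer` is any function mapping an array of activations to an array of
// the same length.
function skipConnection(inputs, layer) {
  const transformed = layer(inputs);
  return transformed.map((value, i) => value + inputs[i]);
}
```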

Alan Turing said “we are not interested in the fact that the brain has the consistency of cold porridge.” In other words, computers might be able to think much like the brain does: despite the difference in physical medium, many computing principles are the same. The brain is certainly an example of an incredible amount of computing power packed into a reasonably small area without breaking the laws of physics. One day we might be able to replicate that with AI, or maybe have even more computing power. But questions remain. Can we feasibly create such computing power when it took evolution billions of years? And if we can, should we?

References

The GitHub repository for this project is at crackalamoo/neuron-models.

  1. Backpropagation calculus (Grant Sanderson, 3Blue1Brown)
  2. How Computationally Complex Is a Single Neuron? (Allison Whitten, Quanta Magazine)
  3. Action Potentials (Bill Yates, Pitt Medical Neuroscience, 2022)
  4. Vander's Principles of Physiology (Widmaier, Raff, & Strang, 15th edition)
  5. Huss, M., Wang, D., Trané, C. et al. An experimentally constrained computational model of NMDA oscillations in lamprey CPG neurons. J Comput Neurosci 25, 108–121 (2008). https://doi.org/10.1007/s10827-007-0067-1
  6. Yasunori Hayashi, Molecular mechanism of hippocampal long-term potentiation – Towards multiscale understanding of learning and memory, Neuroscience Research, Volume 175, 2022, Pages 3-15, ISSN 0168-0102, https://doi.org/10.1016/j.neures.2021.08.001.
  7. Goetz T, Arslan A, Wisden W, Wulff P. GABAA receptors: structure and function in the basal ganglia. Prog Brain Res. 2007;160:21-41. doi: 10.1016/S0079-6123(06)60003-4. PMID: 17499107; PMCID: PMC2648504.
  8. Zhang XC, Yang H, Liu Z, Sun F. Thermodynamics of voltage-gated ion channels. Biophys Rep. 2018;4(6):300-319. doi: 10.1007/s41048-018-0074-y. Epub 2018 Nov 16. PMID: 30596139; PMCID: PMC6276078.
  9. Purves D, Augustine GJ, Fitzpatrick D, et al., editors. Neuroscience. 2nd edition. Sunderland (MA): Sinauer Associates; 2001. Long-Term Synaptic Depression. Available from: https://www.ncbi.nlm.nih.gov/books/NBK10899/
  10. Young JJ, Almasi A, Sun SH, et al. Orientation pinwheels in primary visual cortex of a highly visual marsupial. Science Advances, 2022.
  11. Visual Processing: Cortical Pathways (Valentin Dragoi, Ph.D., Department of Neurobiology and Anatomy, McGovern Medical School)
  12. Broca's Area, Wernicke's Area, and Other Language-Processing Areas in the Brain (Bruno Dubuc, McGill University)
  13. MNIST digits with node.js (github/cazala)