There is this beautiful thing: GPT-3 (Generative Pre-trained Transformer 3), one of the largest neural network models ever built. It uses deep learning to write original text that is indistinguishable from what a human would write. GPT-3 can write programming code or guitar tabs. It can even ponder the meaning of life.
This model is capable of amazing things, but it all comes at the cost of computing. Training GPT-3 takes about 1.3 GWh of electricity. That is a huge amount of energy: enough to power ~100 houses for over a year. The carbon footprint is over 550 tons of CO2 emissions, which is like flying from San Francisco to New York and back three times.
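The back-of-the-envelope arithmetic behind the "100 houses" comparison is easy to check. A quick sketch, assuming an average household consumes about 11,000 kWh of electricity per year (the household figure is my assumption, not from the post):

```python
# Rough sanity check of the training-energy comparison.
# Assumption: an average household uses ~11,000 kWh of electricity per year.
TRAINING_ENERGY_KWH = 1.3e6      # 1.3 GWh expressed in kWh
HOUSEHOLD_KWH_PER_YEAR = 11_000  # assumed annual household consumption
houses = 100

years = TRAINING_ENERGY_KWH / (houses * HOUSEHOLD_KWH_PER_YEAR)
print(f"1.3 GWh powers {houses} houses for about {years:.1f} years")
```

With these assumptions the result comes out to a bit over a year, consistent with the claim above.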
→ Where is the power burned?
In modern computers, data is stored in memory, and the CPU accesses it over a bus. The problem is that fetching data from memory and writing it back is slow and power-hungry. You can build the fastest CPU ever, but it will spend most of its time idle, waiting for data. This is known as the Von Neumann bottleneck, and it becomes particularly critical when we deal with large sets of data, for instance in artificial intelligence applications.
The problem is that most of the energy goes not into computing, but into fetching data from memory. A memory access takes at least 200 times more energy than a multiplication operation, or 700 times more than an addition. If we could eliminate the need to read from and write to memory, computing would become much faster and much more energy efficient.
To achieve this, we need a new type of processor, one that computes directly in memory. The idea is that the storage becomes the processor. There are two ways of implementing it. The most obvious way is to move the computation as close as possible to the memory. Samsung implemented this with their HBM-PIM technology. PIM (Processing-in-Memory) is a computation block placed inside High-Bandwidth Memory (HBM). In this case, you don't need to read the data from memory, store it somewhere, and then compute on it. You can fetch the data from a memory bank, perform the operation, and save the result back into the same bank without ever moving the data to the CPU. And all of this can be done in parallel by all the PIM blocks.
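As a mental model, the difference between the two flows can be sketched in a few lines of Python. This is only a toy illustration of the idea, not Samsung's actual design: each "bank" holds its own slice of the data and applies the operation locally, so nothing is shuttled to a central processor.

```python
# Toy model of processing-in-memory: each memory bank computes on its
# own data locally, instead of shipping everything to the CPU.

def cpu_style(banks, op):
    # Conventional flow: gather all data in one place, then compute.
    gathered = [x for bank in banks for x in bank]   # data movement
    return [op(x) for x in gathered]                 # centralized compute

def pim_style(banks, op):
    # PIM flow: each bank applies the operation in place. The per-bank
    # work is independent, so a real chip runs all banks in parallel.
    return [[op(x) for x in bank] for bank in banks]

banks = [[1, 2], [3, 4], [5, 6]]
double = lambda x: 2 * x
print(cpu_style(banks, double))   # [2, 4, 6, 8, 10, 12]
print(pim_style(banks, double))   # [[2, 4], [6, 8], [10, 12]]
```

Both produce the same results; the difference is where the work happens and how much data has to move.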
According to Samsung, this gives at least a two-times performance improvement for applications like image classification and speech recognition, and they claim a 70% power reduction per task. Going back to our CO2 footprint example, that's the emissions of one flight instead of three. So the benefit Samsung gets is a factor of two. That's great! But the big question is: how do we get to a factor of 100? For that, we need to go analog.
When we talk about computing, we typically mean digital. The information is encoded in a binary fashion, with 0s and 1s. Analog, on the other hand, means continuous: not just ones and zeros but, like in the real world, all the values in between.
IBM is working on an analog AI chip that computes in memory. We used to think of memory as a place to store zeros and ones. Meanwhile, IBM is using another type of memory called Phase-Change Memory (PCM), which can encode analog information: the NN weights are encoded into the resistance of the PCM cells by applying electrical pulses.
Now we apply input voltages to the PCM cells and read the current at the output. By Ohm's law, current is voltage divided by resistance, or in other words, voltage times conductance. Since the weight is stored as a conductance and the activation is applied as a voltage, this is exactly a multiplication of activation by weight. Then all these currents sum up on the shared wire according to Kirchhoff's current law.
This is a matrix multiply-accumulate operation performed in the analog domain using voltages and currents. The advantage of this approach is that all the computation happens massively in parallel, and because no data moves, it is done in a fraction of the time and power.
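Numerically, the crossbar computes a plain matrix-vector product. Here is a minimal NumPy sketch of the physics described above, idealized (real PCM cells are noisy and their conductance drifts over time); the specific values are made up for illustration:

```python
import numpy as np

# Weights are stored as conductances G (siemens); inputs arrive as voltages V.
# Ohm's law per cell:       I_ij = G_ij * V_i         (multiplication)
# Kirchhoff per column:     I_j  = sum_i G_ij * V_i   (accumulation on the bit line)
G = np.array([[0.5, 1.0],
              [2.0, 0.1],
              [0.3, 0.7]])      # 3 inputs x 2 outputs, conductance values
V = np.array([1.0, 2.0, 3.0])   # input voltages (activations)

I = V @ G                       # output currents: one matrix-vector product
print(I)                        # same result a digital multiply-accumulate would give
```

The whole product comes out in a single "read" of the array: every cell multiplies and every column sums at the same time, which is where the parallelism comes from.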
This IBM analog chip is still a research project. Meanwhile, a US-based startup, Mythic, has built the first commercial analog processor for artificial intelligence workloads. This tiny processor can fit into a small camera or a Google Nest smart-home device. The chip is capable of running multimillion-parameter NNs: it can store up to 80M weights and execute massive matrix multiplications on a device at the edge without any external memory.
I think analog in-memory computing could enable the next big leap in computing, especially for AI inference applications. It could give us a 100x performance improvement while leaving a much smaller carbon footprint. What do you think?
If you enjoy my work, consider supporting me on Patreon. The full version of this post is available there.
IBM Analog Chip for AI Explained
The Book: Artificial Intelligence Hardware Design
Thanks for reading Deep in Tech Newsletter! Subscribe for free to receive new posts and support my work.
The advantage of multiple possible states in the same space has to be set against the reliability and accuracy with which each state can be measured. A huge advantage of single-location, one-of-two-values digital systems is simplicity, but the downsides are as you mention. Analogue computing was how things were done for centuries, and now we are returning to the idea with modern measurement methods. This is super exciting as a way to vastly increase low-power compute. Thanks for sharing!