Advances in computing often come from odd angles; routing and switching came out of the need to connect multiple proprietary systems, locally and over long distances (hence the long-standing emphasis on open standards and interoperability in the network engineering world). Another interesting development of this type is the boom in Graphics Processing Units (GPUs). These were originally designed for one very specific purpose: creating the graphics displayed on a computer monitor. Graphics processing, however, normally involves tasks like ray tracing, which means computing the path of a beam of light and how it interacts with various surfaces. While this sort of processing can be done serially (if there are 10,000 rays to trace, they can be traced one at a time), this kind of problem is very easy to parallelize: each ray can be traced by a different processor at the same time.
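To make the parallelism concrete, here is a minimal sketch in Python; `trace_ray` is a hypothetical stand-in for the real per-ray computation, and the only point is that no ray depends on any other:

```python
# A minimal sketch of why ray tracing parallelizes so well: every ray is
# independent, so a pool of workers can trace them all at once.
from multiprocessing import Pool

def trace_ray(ray_id: int) -> float:
    # Hypothetical stand-in for the real work: follow one ray through the
    # scene and return the light intensity it contributes to its pixel.
    return (ray_id * 0.618) % 1.0

if __name__ == "__main__":
    rays = range(10_000)
    # Serial version: trace the rays one at a time.
    serial = [trace_ray(r) for r in rays]
    # Parallel version: the same 10,000 rays spread across workers.
    # Because no ray depends on any other, there is nothing to coordinate.
    with Pool() as pool:
        parallel = pool.map(trace_ray, rays)
    assert serial == parallel
```

A GPU takes this idea to its extreme, running thousands of such independent pieces of work at once in hardware.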
And this is the origin of the GPU. While general purpose processors (CPUs) are designed to run long-lived processes serially (a word processor, for instance, needs to work as the user types; it is idle between keystrokes, and thus can be serialized), GPUs are designed to process a “lot of little jobs” very quickly in parallel. There are a number of other applications for this kind of processing, such as data analytics and Machine Learning (ML), even reaching into Artificial Intelligence (AI). A typical ML process is performed on a neural network, in which a lot of nodes are connected over a logical network, as shown below.
Each of these nodes may perform different functions, or the same function in a slightly different way; the key is that as the network is “taught,” the connections between the nodes are either strengthened or weakened for various kinds of inputs. It is the strengthening or weakening of these connections that represents the “intelligence” in these neural networks. It should be pretty obvious why something like a GPU, which is designed for computing across massively parallel problems, is ideal for this application.
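As a deliberately tiny, hypothetical illustration of what “strengthening or weakening connections” means, here is a single node with three input connections, trained with a simple perceptron-style update (real networks use many layers and more sophisticated training, but the principle is the same):

```python
# A toy illustration of "strengthening or weakening connections": one node
# with three input connections. The weights ARE the learned "intelligence."
inputs  = [1.0, 0.0, 1.0]   # one training example
target  = 1.0               # what the node should output for this input
weights = [0.2, 0.5, -0.1]  # connection strengths, initially arbitrary
rate    = 0.1               # how aggressively to adjust the connections

for step in range(20):
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    # Each connection is strengthened or weakened in proportion to how
    # much it contributed to the error on this example.
    weights = [w + rate * error * x for w, x in zip(weights, inputs)]

print(weights)  # the connections carrying signal have been strengthened
```

Scale this to millions of connections updated over millions of examples, and the appeal of a massively parallel processor becomes clear.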
What do composable systems bring to the table in these massively parallel problems? One of the problems a designer working in this space must answer is: how many processors is this job going to require? In a serialized application, the answer is actually pretty simple: how many threads can the job be split into, and how long will each take at a specific clock speed? In the massively parallel world, however, the answer is much more complicated. The question breaks into three subquestions.
The first of these is: how is the problem structured? Much of the code written in this space is designed to break the problem into the smallest possible pieces. A single processor, or core, can take on a single piece, complete it, and then be given another piece. Many times, however, the processing proceeds in stages; the answers from the first stage must be available before work on the second stage can begin. If you throw more processors at the problem than a single stage can use, some processing power is going to go idle. This can be called the blocking problem: how often will some piece of processing block while waiting for some other piece to complete?
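A small sketch of the blocking problem, using hypothetical stage sizes; the point is that a stage with fewer pieces of work than processors leaves capacity idle:

```python
# A sketch of the blocking problem: work proceeds in stages, and every
# piece of stage N must finish before stage N+1 can start. If a stage has
# fewer pieces than you have processors, the extra processors sit idle.
stages = [64, 16, 4]   # hypothetical pieces of work available per stage
processors = 32

for i, pieces in enumerate(stages, start=1):
    busy = min(pieces, processors)
    idle = processors - busy
    # Rounds of work: 64 pieces on 32 processors takes two rounds with no
    # idle capacity; 4 pieces on 32 processors leaves 28 processors idle.
    rounds = -(-pieces // processors)  # ceiling division
    print(f"stage {i}: {busy} busy, {idle} idle, {rounds} round(s)")
```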
The second of these is: how fast do you want the answer? Within the limits imposed by the blocking problem above, the more processors you throw at the problem, the faster it will be solved, so the speed at which you need an answer helps determine how much hardware to dedicate to the job.
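One classic way to reason about this tradeoff is Amdahl's law: if a fraction p of the job parallelizes and the rest must run serially (blocks), the speedup on n processors is capped by the serial fraction. A quick sketch, assuming a hypothetical job where 95% of the work parallelizes cleanly:

```python
# Amdahl's law: if a fraction p of the job parallelizes and the rest runs
# serially, the speedup on n processors is 1 / ((1 - p) + p / n). More
# processors always help, but with rapidly diminishing returns once the
# serial (blocking) portion dominates.
def speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

p = 0.95  # hypothetical: 95% of the job parallelizes cleanly
for n in (1, 8, 64, 512):
    print(f"{n:4d} processors -> {speedup(p, n):5.1f}x faster")
# 512 processors buy roughly 19x, not 512x: the serial 5% is the ceiling.
```

These diminishing returns are why the blocking question and the speed question have to be answered together.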
The third is: what is the structure of the data? This question returns to the blocking problem, above. Learning which images contain cats can be far different from learning which images contain buildings, even when using the same software. The structure of the data is completely different in these two cases; buildings are generally much larger and show less variation in their features, for instance. The building problem might be much easier to solve than the cat problem, which might mean fewer processors are required, even though the amount of input data might be the same.
This is where composable systems can make a difference in the GPU world. The ability to tie a varying number of CPUs and GPUs into a single logical compute system, along with storage and networking, allows the system on which a particular job is placed to be more finely tuned to the problem, the code, and the data set. This allows compute resources to be used more efficiently, and hence drives up the effective processing density of a particular fabric.
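As a purely hypothetical illustration (none of this is a real composability API), the sizing decision might look something like this, with resources drawn from shared pools and bound into a logical system scaled to the job at hand:

```python
# A purely hypothetical illustration of composability: resources are drawn
# from shared pools and bound into a logical system sized to the job.
# None of this is a real API; it only shows the sizing decision.
from dataclasses import dataclass

@dataclass
class ComposedSystem:
    cpus: int
    gpus: int
    storage_tb: int
    nics: int

def compose_for(parallel_pieces: int, dataset_tb: int) -> ComposedSystem:
    # Size the GPU count to the parallelism the code and data actually
    # expose (the blocking problem), rather than to whatever a fixed
    # chassis happens to hold. The sizing rule here is made up.
    gpus = max(min(parallel_pieces // 1000, 16), 1)
    return ComposedSystem(cpus=gpus * 2, gpus=gpus,
                          storage_tb=dataset_tb, nics=2)

print(compose_for(parallel_pieces=8000, dataset_tb=12))
```

When the job completes, those same components return to their pools, ready to be composed into a differently shaped system for the next job.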
Are GPUs coming to a data center near you? Probably so. This is a good reason to keep an eye on composable systems.
Make sure to check out what Liqid presented at the OCP Summit this week, as well as their upcoming appearance at Nvidia’s GPU Technology Conference. Senior Solutions Architect Mark Noland put together a post with more information on the announcements.