The Spring 2016 meeting of the Open Networking User Group wrapped up with a town hall discussion between some of the smartest people in networking today. Great folks like David Meyer, Andy Warfield, and Tom Edsall jumped into the fray to debate whether or not networking was becoming the bottleneck in the data center when it comes to fast data processing. You can see the live blog coverage of the discussion here.
Throw More Bandwidth At It
One of the key points brought up by moderator Lane Patterson of Yahoo is that moving large amounts of stored data across data centers is a huge issue. When you are looking at doing work on the scale of Yahoo or Google or Facebook you start thinking in the hundreds of terabytes or even petabytes. That much data is tough to move. Patterson mentioned that it took a full day to move that data from one side of the country to the other. He used this point to ask a question about when bandwidth is going to catch up to the needs of data relocation.
The first comment here is that bandwidth by itself isn’t the only solution to this issue. What’s really at issue here is a latency problem. It’s not that there wasn’t enough bandwidth to complete the request to move 1,000,000 gigabytes of data from New York to Los Angeles. The data was able to be moved with no issues. The real issue is that it didn’t happen instantly.
Bandwidth issues are usually latency issues in disguise. We have enough bandwidth to move things around provided you have a realistic timescale. As mentioned during this panel, we are eventually going to reach scales where the speed limit isn’t just processing power but the speed of light itself. Imagine how long serializing 1,000,000,000,000,000 bytes of data would take even with a 0.01 millisecond serialization delay. That amount of time has to elapse before the data is even encoded for transport.
The latency between kicking off a huge file transfer and its completion can be solved with more and more bandwidth. But there are practical limits to this. Even with 100Gig links between everything containing data in your data center, you’re still limited to the speed of the slowest link. When you hit the wider Internet, those links can often be slowed down to something approaching 1Gig or even less depending on traffic conditions. And that’s something you have no control over.
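To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes decimal units (1 petabyte = 1,000,000 gigabytes), ignores protocol overhead and propagation delay, and treats the end-to-end rate as the rate of the slowest link on the path. The function names and the sample link speeds are my own illustration, not figures from the panel.

```python
def path_rate_bps(link_rates_bps):
    """End-to-end rate is capped by the slowest link on the path."""
    return min(link_rates_bps)


def transfer_time_seconds(size_bytes, rate_bps):
    """Ideal transfer time: total bits divided by bits per second."""
    return size_bytes * 8 / rate_bps


PETABYTE = 10**15  # 1,000,000 gigabytes in decimal units

# Hypothetical paths: an all-100Gig data center path, and the same
# path with a single 1Gig hop across the wider Internet.
paths = {
    "100G end to end": [100e9, 100e9, 100e9],
    "100G with one 1G hop": [100e9, 1e9, 100e9],
}

for label, links in paths.items():
    seconds = transfer_time_seconds(PETABYTE, path_rate_bps(links))
    print(f"{label}: {seconds / 3600:,.1f} hours ({seconds / 86400:,.1f} days)")
```

Under these idealized assumptions a petabyte takes roughly 22 hours at a sustained 100Gig and around three months once a single 1Gig hop is in the path.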
One of the solutions offered on the panel was the ability to split the data being transferred into chunks to be sent across parallel circuits and reassembled on the other side. This would increase the amount of processing capability needed to reassemble those fragments, but it would also get around a major bandwidth limitation. Given that the majority of these file transfers are headed for a cloud storage system, the increased need for processing power for reassembly is a bit of a non-issue. But why do we even need to move that data in the first place?
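The chunk-and-reassemble idea can be sketched in a few lines. This is only an illustration under loud assumptions: threads stand in for the parallel circuits, `send_chunk` is a hypothetical placeholder for a real network send, and everything fits in memory. The point is simply that each chunk carries an index so the receiver can put the pieces back in order no matter which circuit finishes first.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def split(data, chunk_size):
    """Cut data into (index, payload) pairs so order can be restored later."""
    return [
        (index, data[offset:offset + chunk_size])
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]


def send_chunk(chunk):
    """Hypothetical stand-in for sending one chunk over one circuit."""
    index, payload = chunk
    return index, payload  # a real sender would return what the far side received


def parallel_transfer(data, chunk_size, circuits):
    """Send chunks over several 'circuits' at once, then reassemble by index."""
    received = []
    with ThreadPoolExecutor(max_workers=circuits) as pool:
        futures = [pool.submit(send_chunk, chunk) for chunk in split(data, chunk_size)]
        for future in as_completed(futures):  # chunks may arrive out of order
            received.append(future.result())
    received.sort(key=lambda item: item[0])  # the reassembly cost mentioned above
    return b"".join(payload for _, payload in received)


original = bytes(range(256)) * 40
assert parallel_transfer(original, chunk_size=1024, circuits=4) == original
```

The sort on the receiving side is the extra processing the panel was talking about. It is cheap next to the transfer itself, which is why it ends up being a minor concern.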
Bring The Cloud To The Mountain
One of the points that David Meyer brought up in the panel was that data has gravity. A large amount of data has a very specific gravity. That means moving that much data is difficult under the best of circumstances. Backup and recovery of that data can be difficult at first, but those operations are optimized for incremental data changes over time. Why then are we looking to move terabytes or even petabytes of data quickly in large volumes?
That’s because of the cloud. Companies like Yahoo sometimes need to do a lot of processing on huge data repositories. That means moving data to a cloud processing facility, doing complex operations, and then moving it back to your own repository to avoid paying huge storage charges on that data. If you only need to do this kind of processing three or four times a year on specific dates it’s not difficult to schedule the process to make sure that data is where it needs to be on the day it needs to be there.
But some companies are looking for near instantaneous access to infinite computing resources in the cloud to do complex analytics at a moment’s notice. That means that companies have moved the latency slider to zero and hope that they can burst that data to the cloud quickly and cheaply. But why does that data need to be moved there?
David Meyer gave us a hint of a better solution. He suggested spinning up compute resources closer to the data rather than moving large amounts of data frequently. Allwyn Sequeira of VMware said it best with, “Data is a huge issue. You need to take the cloud to the data.” That’s where the promise of containers comes into play.
Earlier, we discussed the idea of breaking data up into multiple distinct packet streams and sending it over several routes to reduce latency. This same idea can apply to data analysis as well. Rather than building a system designed to ingest millions or billions of records and sit idle in the meantime, why not create thousands of short-lived operations designed to read hundreds of records, return results, and then destroy themselves to free up resources? Now, the infinite power of the cloud still exists in our private data centers. It just might have a slower processing time as these containers are built, processed, and destroyed.
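Here is a minimal sketch of that pattern, with the caveat that it is a toy: plain function calls on a thread pool stand in for containers, and the "analysis" is just an average. The names `short_lived_job` and `analyze`, the batch size of 100, and the parallelism limit are all assumptions of mine. What it shows is the shape of the work: each job reads a few hundred records sitting next to it, hands back a tiny result, and goes away.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(records, size):
    """Yield successive lists of at most `size` records."""
    iterator = iter(records)
    while batch := list(islice(iterator, size)):
        yield batch


def short_lived_job(batch):
    """Stand-in for one container: read a small batch, return a small result, exit."""
    return sum(batch), len(batch)


def analyze(records, batch_size=100, max_parallel=8):
    """Fan out many small jobs near the data and combine their partial results."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        partials = list(pool.map(short_lived_job, batches(records, batch_size)))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count if count else 0.0


# 1,000 records become ten jobs of 100 records each.
print(analyze(range(1, 1001)))
```

Only the partial results cross the network, a couple of numbers per job instead of the records themselves.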
Now, the need to move data back and forth to the public cloud becomes less of an issue. Rather than relying on WAN connections or complex processing of parallel packet streams, we can build simple compute devices by the tens of thousands to accomplish simple goals and then vanish to release resources. That means less reliance on hardware refresh cycles and less need to overprovision huge Top-of-Rack and End-of-Row uplinks. Instead, we can build software networking constructs to help containers do their work faster and transmit their results with more efficiency.
Networking is probably the last remaining non-integrated bottleneck in the data center. But rather than looking at all the ways that you should buy more bandwidth or change the way you provision systems in your data center, perhaps it’s time to take a look at the way you’re processing your data. Maybe the problem isn’t the network after all.