A real-world problem that defines “big data”: How do you move massive volumes of data across long distances?
Just dump 300TB of data onto a USB drive and ship it. Sounds like a great idea, but a few problems arose.
1) They don’t really make 300TB USB drives. One provider has some 20TB USB arrays. We could have done Thunderbolt or USB 3.0 with add-on cards, but we’d still be limited by the fact that…
2) It takes a while to load data off a server’s disks onto some other sort of disk. I usually base my rough math on 20-40MByte/sec to pull data off a disk array (see the sketch after this list). Add to that the time it takes to unload it on the other end, and you realize that “we can send it by FedEx overnight” isn’t exactly the speedy proposition it sounds like.
3) There were some policies that would have complicated the ship-overnight part, but those were the simplest to deal with, and since they weren’t technological, they weren’t really in my scope.
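To put numbers on that rough math, here’s a minimal back-of-the-envelope sketch in Python. The 300TB volume and the 20-40MByte/sec read rate come from the post above; the function name and the decimal TB-to-MB conversion are my own assumptions:

```python
def transfer_days(total_tb: float, rate_mb_per_s: float) -> float:
    """Days needed to read total_tb terabytes off a disk array at rate_mb_per_s MB/s."""
    total_mb = total_tb * 1_000_000  # assume decimal units: 1 TB = 10^6 MB
    seconds = total_mb / rate_mb_per_s
    return seconds / 86_400  # 86,400 seconds in a day

# 300TB at the post's rough 20-40MByte/sec range:
for rate in (20, 40):
    print(f"{rate} MB/s: {transfer_days(300, rate):.0f} days")
# -> 20 MB/s: 174 days; 40 MB/s: 87 days -- and that's just loading the
#    drives, before the overnight shipment or the unload on the far end.
```

Even at the optimistic end of that range, the “overnight” part of the shipment is a rounding error next to the copy time.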
Read the whole story by Robert Novak: In pursuit of the other kind of Big Data(tm)