Is NFS v3 Really That Bad?

Do we really need parallel NFS?

Did some pNFS proponent slip a love potion into the coffee at EMC? Suddenly it’s pNFS time at the company known for its reluctance to embrace file sharing and filesystems in general. The purple prose is flying, with Chad Sakac declaring himself “a big fan of the application of NFS” and Chuck Hollis extolling the “inherent simplicity and ease-of-management of NFS.” The NetApp guys must be amused by the bear hug from Hopkinton, but many are seeing déjà vu all over again.

Chad’s Icky Bits

(Apologies for that heading, but those are Chad’s words, not mine.)

Chad Sakac’s red rose for pNFS included a few thorns aimed at good old NFSv3. He calls these the “icky bits” and spills some ink over them:

  1. “NFS Server failure behavior,” says Chad, leads to issues as serious as “a guest OS crash” and administrators “resorting to unnatural acts” to compensate. He describes how EMC’s DART OS is optimized to fail over in under a minute to avoid application issues, and how difficult that feat is to actually accomplish.
  2. Chad also points out that “NFS client limitations” can lead to “unexpected bottlenecks.” Load balancing large workloads across multiple gigabit Ethernet NICs means hand-tuning, since NFS pins traffic to a single MAC address.
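Why would traffic stick to one NIC? A minimal sketch, assuming a simplified layer-2 link-aggregation hash loosely modeled on Linux bonding’s “layer2” policy (XOR of the last octet of the source and destination MACs, modulo the number of links; the real policy also mixes in the packet type). The function name and MAC addresses are hypothetical. Because one client talking to one NFS server always presents the same MAC pair, every frame hashes to the same link no matter how busy it gets.

```python
def pick_link(src_mac: str, dst_mac: str, num_links: int) -> int:
    """Pick an outgoing link index (simplified, hypothetical layer-2 hash)."""
    # Use only the last octet of each MAC address, as a simplification.
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % num_links


# Hypothetical MACs: one virtualization host, one or two NFS server interfaces.
host = "00:50:56:aa:bb:01"
server = "00:a0:98:11:22:33"
second_server_if = "00:a0:98:11:22:34"

# One host/server pair: all traffic lands on the same link out of four.
print(pick_link(host, server, 4))  # 2

# Adding a second server interface (and mounting through it) spreads load.
print(sorted({pick_link(host, m, 4) for m in (server, second_server_if)}))  # [1, 2]
```

This is the “hand-tuning” in practice: administrators spread mounts across multiple server addresses so that different flows hash to different links.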

Certainly these limitations were known to many in the storage industry, but haven’t they also been addressed repeatedly? NetApp, EMC, and BlueArc do indeed suggest adjusting NFS heartbeat values to allow time for the cluster to recover, but this seems more a limitation of their clustered server architecture than of NFS itself. Scale-out NFS servers from Isilon and HP don’t seem to require these “unnatural acts.”

As for client limitations, manually balancing client loads is a reality in many large storage architectures, not just NFS. Perhaps the fact that NFS can handle so many I/O requests in a given timeslice makes imbalance more noticeable, but such hot spots tend to be transient.

Chad has repeatedly expressed his love for NFS, especially as a datastore for VMware. Clearly, he intended to point out these “icky bits” to highlight the possibilities for pNFS. But the method used (calling them “icky,” for one) resembles mudslinging.

Chuck Wants pNFS

(Chuck’s titles also lend themselves to misreading.)

Chuck Hollis is more careful in his wording, extolling the virtues of pNFS without calling anything “icky”. Indeed, there’s just one NetApp dig: He says their “emulated containers of LUNs” are “hardly optimized”, which is a welcome change of tone from previous debates.

But the underlying message is the same: pNFS is new and wonderful, encouraging proliferation of hand-holding, flower distribution, and rainbows. Again I ask, is this really true? Is pNFS ready for this kind of adulation when, as Chuck points out, “it’s going to take a while before the rest of the portfolio, industry and ecosystem catches up. Maybe a year or so”?

Seriously? A year until pNFS is ready for mass enterprise adoption? Admittedly, EMC has been working on pNFS (as MPFS) for a long time, but predictions of “just another year” for a major protocol transition set off warning bells. This is doubly true when most clients (including VMware) don’t yet offer even basic support.

Stephen’s Stance

One wonders if airing this dirty laundry is an attempt to highlight EMC’s pNFS work or to discredit plain old NFS as a datacenter protocol. As I wrote in Our New Thing Is Awesome (‘Cause Our Old Thing Sucked), the “parade of progress” sometimes degenerates into “out with the old,” and this is perilous for purveyors of durable goods like storage systems.

I am also very concerned about the proliferation of “layout types” within pNFS. It seems that every vendor has a hand in the protocol, and each is adding its own technology to the mix. We started with files and now have both objects and blocks. Will these be widely supported? Do we really need them? Or will pNFS start looking like Bluetooth: bloated, incompletely implemented, and ignored except for special use cases?

But my motivation behind this post is simpler than that. I would like to pose a question: Is NFS (v3) really that “icky”? Do we really need pNFS? Or have these problems already been solved?