DCB: How to Engineer your way out of a poor architecture decision!
I recently gave a presentation to the New Zealand Network Operators Group (NZNOG) 2011 conference on “Data Centre 3.0”. During my research over the last eight months, and the fact-checking I did while putting the slides together, I kept asking myself:
“Would we need all these protocols if we, as an industry, had made better technology implementation decisions?”
I understand the background and requirements behind some of the different technology proposals, particularly Layer 2 multipathing and the various Data Centre Bridging (DCB) QoS standards, but I can't help feeling that we are trying to drag features of the higher-layer protocols down into Layer 2.
Back when I started studying networking (probably around mid 2001 when I first obtained my CCNA), the CCNA curriculum was quite clear on the OSI model and how each layer had a very particular purpose, with clear functional definitions:
As the popularity of Ethernet switching (Ivan: You know I mean bridging!) continued to grow, and with the majority of Layer 2 networks standardising on Ethernet as the de facto Layer 2 standard, we started to see individual Layer 2 domains span larger and larger areas. No longer were these simply a series of hosts on a shared bus segment (e.g. 10Base2) or a simple hub-and-spoke segment on a single hub/bridge (e.g. 10BaseT), but rather a large interconnected mesh of bridges spreading across floors, buildings and campuses.
Now we needed a way of classifying traffic based on priorities that would be consistent across these large layer 2 domains. This was addressed in the 802.1p standard, which allowed priority classification on 802.1Q trunk links – but did nothing for access ports.
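For concreteness, here is a minimal sketch of where that 802.1p priority actually lives: it is the 3-bit PCP field at the top of the 16-bit TCI inside the 802.1Q tag, which is exactly why it only exists on tagged (trunk) links and has nowhere to go on an untagged access port. The values below are purely illustrative.

```python
import struct

def dot1q_tag(pcp: int, dei: int, vlan_id: int) -> bytes:
    """Build the 4-byte 802.1Q tag: TPID (0x8100) followed by the 16-bit TCI.

    TCI layout: PCP (3 bits, the 802.1p priority) | DEI (1 bit) | VLAN ID (12 bits).
    """
    assert 0 <= pcp <= 7 and dei in (0, 1) and 0 <= vlan_id <= 4095
    tci = (pcp << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", 0x8100, tci)

# Example: priority 5 on VLAN 100 -- illustrative values only.
tag = dot1q_tag(pcp=5, dei=0, vlan_id=100)
print(tag.hex())  # 8100a064
```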
Various proposals have been put forward in an effort to address the need for end-to-end QoS control of Ethernet traffic. One of the driving forces behind this is the requirement for “lossless Ethernet” in converged storage networks.
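One of the DCB mechanisms built to deliver that lossless behaviour is Priority-based Flow Control (802.1Qbb), which pauses individual priorities with a MAC Control frame instead of dropping traffic. A rough sketch of the frame layout follows; the source MAC and pause values are illustrative, not taken from any real capture.

```python
import struct

def pfc_frame(pause_quanta: dict) -> bytes:
    """Sketch of an 802.1Qbb Priority Flow Control frame.

    pause_quanta maps priority (0-7) -> pause time in 512-bit-time quanta.
    """
    dst = bytes.fromhex("0180c2000001")    # reserved MAC Control multicast address
    src = bytes.fromhex("020000000001")    # illustrative source MAC
    ethertype = struct.pack("!H", 0x8808)  # MAC Control
    opcode = struct.pack("!H", 0x0101)     # PFC opcode
    enable_vector = 0
    times = []
    for prio in range(8):
        quanta = pause_quanta.get(prio, 0)
        if quanta:
            enable_vector |= (1 << prio)   # set the bit for each paused priority
        times.append(quanta)
    payload = struct.pack("!H8H", enable_vector, *times)
    return dst + src + ethertype + opcode + payload

# Pause priority 3 (commonly assigned to FCoE traffic) for the maximum time.
frame = pfc_frame({3: 0xFFFF})
```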
The history of SCSI, FibreChannel and FCoE is documented elsewhere, but needless to say some bright spark decided the best solution would be to embed SCSI commands directly into Layer 2 (plus some L2 headers, of course), without building in any error recovery or loss detection. Had they chosen instead to use an IP-based protocol, they could easily have used the functions already existing in TCP/IP to detect these problems; instead, boffins and propeller heads are busily creating an array of standards to try and combat the fear of dropped packets in storage networks. All of this adds up to new hardware, new chips, and more places for things to break!
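To make that contrast concrete, here is a rough illustration (illustrative constants and a hypothetical target, not a working storage stack): FCoE rides directly on Ethernet with its own EtherType, so a dropped frame is simply gone at that layer, whereas iSCSI carries the same SCSI payloads over TCP, which already detects and retransmits lost segments on ordinary hardware.

```python
import socket
import struct

# FCoE: FC/SCSI frames encapsulated straight into Ethernet (EtherType 0x8906).
# There is no sequence numbering or retransmission at this layer, so a drop is
# invisible here -- hence the demand for a "lossless" fabric underneath it.
FCOE_ETHERTYPE = 0x8906

def fcoe_ethernet_header(dst_mac: bytes, src_mac: bytes) -> bytes:
    """Just the 14-byte Ethernet header an FCoE frame sits behind."""
    return dst_mac + src_mac + struct.pack("!H", FCOE_ETHERTYPE)

# iSCSI: the same SCSI commands over TCP (IANA port 3260). Loss detection and
# retransmission come for free from the transport.
def open_iscsi_transport(target: str) -> socket.socket:
    return socket.create_connection((target, 3260), timeout=5)
```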
On top of this, we have the wonderful phenomenon of “Virtualisation”. With the poor architecture choice of a single vendor (and those that copied them), we now have an army of SysAdmins shouting the mantra of Layer 2 Data Centre Interconnect. Not only do we need to have multiple locations for redundancy, but they must be in the same Layer 2 segment for this design to work correctly!
Traditional (and sensible) network design would put each of these locations into separate IP subnets, and utilise IP routing for clear separation of the distinct networks. Now vendors of network equipment – including load balancers and security devices – are scrambling to re-architect their products to support this new design paradigm.
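For contrast, the traditional routed approach is trivially simple to express: each site gets its own prefix, and reachability between sites is the router's problem. A minimal sketch with made-up addressing and hypothetical site names:

```python
import ipaddress

# Illustrative addressing plan: one /16 for the organisation, one /24 per data centre.
org_block = ipaddress.ip_network("10.20.0.0/16")
sites = ["dc-akl", "dc-wlg", "dc-chc"]  # hypothetical site names

site_subnets = dict(zip(sites, org_block.subnets(new_prefix=24)))
for site, subnet in site_subnets.items():
    print(f"{site}: {subnet}")  # each site is a distinct routed subnet

# A host moving between sites changes subnet (and address); the L2-DCI approach
# instead stretches one subnet across sites so the address can move unchanged.
```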
Greg Ferro and I were chatting a while back about all the things people are trying to tack onto “Ethernet” – QoS, OAM, end-to-end communication and so on – and this question came up:
How far do you go before it stops being Ethernet?
Why is it that we continually make a rod for our own backs? When do we stop trying to extend protocols with functions they were never designed for, especially when solutions already exist elsewhere in the stack?
I’m not sure where this is all headed, but with Layer 2 networks spanning geographic locations, fuelled by the growth of virtualisation and converged storage networking, are we treading a well-worn path to failure? What will it cost organisations when they need to re-evaluate the designs currently considered “Best Practice” by certain vendors?
As always, your thoughts (and flames) are welcome 🙂