Why Design Process Matters – Developing a Concern for “Why”


Earlier in the year I had the opportunity to sit down virtually with Ethan Banks and Chris Wahl on the Datanauts podcast to discuss two of my favorite topics: design process and documentation. Having been a Datanauts listener since the show's launch in 2015, and a Packet Pushers listener since 2013, I was honored to contribute content to a platform that has done so much to encourage my career development.

As this series is meant to be a companion to the podcast, I’d recommend giving the episode a listen using the link or embedded audio below. We had a great discussion and I believe it’s well worth the time invested.

Datanauts 168: Why Design Process Matters For Data Centers And The Cloud

With that out of the way, I imagine a few questions might come to mind:

  • What makes design process and documentation so important?
  • Aren’t there new, cool technologies that should be talked about, instead?

In short, I believe there’s more than enough news-of-the-day type commentary on specific technologies, and instead I thought I’d share my thoughts on the topics that have drastically altered the trajectory of my career.

Aside from maintaining a general curiosity and investing off-hours time in developing relevant skills, I consider attention to design process and documentation responsible for much of my professional progress.

If you are in the IT infrastructure space, creating well-structured designs and effectively communicating your decisions will go a long way toward improving your work and others’ perception of it.

Before we dive into the details of these topics, though, I wanted to provide a bit of background on my career and how I came to understand and appreciate them. Hopefully the context proves useful as we move forward.

Developing a concern for “why”

Stage 1 – User Focus

My career in IT began in Managed Services, where I started off as a systems technician deploying, migrating to, and supporting Windows-based environments. While this role was far from glamorous, it exposed me to a wide variety of end users, the applications they used, and the back-end infrastructure that supported their operations.

All the while I was under intense pressure to simultaneously think on my feet, learn quickly and provide a high level of customer service. At this level, being friendly, resourceful and responsive were probably the most useful techniques available to me, and I relied on them to get me through this user-centric phase.

Stage 2 – Technology Focus

As a result of this initial exposure and the growth that accompanied it, I was able to progress through the ranks to an infrastructure engineer role and focus more on the underlying infrastructure I was most interested in. Along with this transition came a separation from users, their needs and day-to-day complaints. It was a very welcome change.

In its place developed a concern for the customer-wide impact of technology and of the decisions I made. I found that as my technical skills broadened, so did the scope of my responsibility and perspective. A detailed understanding of technology, awareness of the impact of changes, high overall work output, and a can-do attitude were my go-to techniques when navigating this technology-centric phase.

Stage 3 – Business Focus

At some point, being immersed full time in the implementation and support of infrastructure technologies became less appealing, and I pursued a transition to a much more customer-facing pre-sales architecture role. That experience exposed me to organizations of all sizes with varying levels of internal IT expertise, process maturity and infrastructure complexity, which was a (mostly) welcome change.

As I soon discovered, the techniques I relied on in my previous roles were no longer enough. Being friendly, responsive and resourceful are table-stakes attributes for senior level positions.

A high level of work output is also assumed, as efficiency and multi-tasking are required to perform these new duties. And instead of being beneficial, a detailed understanding of technology, and exposing it during conversation with the wrong audience, can actually prove detrimental.

Glazing over a customer executive's eyes with an ill-timed technical tangent is a quick (and painful) way to learn this lesson. What, then, was required to succeed at these new responsibilities of solution design?

A working understanding of the customer's business, their goals, and their project-specific requirements was needed, at a minimum. Beyond this, there was still a need for a structured way to communicate decisions and rationale; after all, the customer needs to understand how you intend to provide value and reduce risk.

The answer, as I eventually discovered, was formalized design process and structured documentation, informed by a curiosity for the business side of things and driven by a concern for “why”.


Getting to a functional understanding of design across multiple technology silos wasn’t completely straightforward, though.

For each of the technologies I worked with, including virtualization and cloud, there were different sets of design guidance, significant variations in quality and sometimes conflicting advice to be reconciled.

As I suspect I am not the only one who has had this experience, I am hoping the lessons I learned will be useful to those traveling along the same path.

Throughout the remainder of this series, we’ll take a look at design process in general, specific guidance offered by both VMware and AWS, see if we can come to a working synthesis and provide a few helpful documentation tips along the way.

Stay tuned!


TFDx @ DTW ’19 – Get To Know: Big Switch

In the final post of this series ahead of TFDx @ Dell Technologies World 2019, we will be focusing on Big Switch Networks, their evolving relationship with Dell EMC and their presence here at the show.

I'd like to start out by acknowledging that partnerships are a dime a dozen, and many vendors tentatively put their "support" behind things just to check a box and say they have a capability. In addition, I have noticed a not-uncommon discrepancy between the messaging contained in vendor marketing materials and the messaging (or general enthusiasm) of their SEs. For a partner peddling vendor wares, this type of scenario is less than inspiring.

Fortunately, that does not appear to be the case with Dell EMC and their embrace of Open Networking. In discussions with multiple levels and types of Dell EMC partner SEs, it is consistently mentioned as something that gives them an edge over other vendors, and it appears to be a point of pride. They are all about it.

Within this context, the recent news of the agreement between Dell EMC and Big Switch to OEM Big Switch products under the Dell EMC name makes a lot of sense. Dell EMC will provide the merchant-silicon based switching, Big Switch will provide the software, and the customer will get an open, mutually-validated and supported solution.

The primary components within this solution are Dell EMC S-Series Open Networking switches and Big Switch Big Cloud Fabric (BCF) software, so let’s talk a bit about those next.

Dell EMC S-Series Open Networking Switches

For the sake of brevity, I am going to focus on the switch type most relevant to the datacenter: the newly released line of 25Gbit+ switches. According to Dell EMC contacts, the per-port price is very competitive compared to the 10Gbit variants, and adoption of 25Gbit (and above) looks to be accelerating.

Within this lineup, there are a number of port densities and uplink configurations available, including the following:

  • S5048F-ON & S5148F-ON: 48x25GbE and 6x100GbE or 72x25GbE
  • S5212F-ON: 12x25GbE and 3x100GbE
  • S5224F-ON: 24x25GbE and 4x100GbE
  • S5248F-ON: 48x25GbE and 6x100GbE
  • S5296F-ON: 96x25GbE and 8x100GbE
  • S5232F-ON: 32x100GbE
  • S6010-ON: 32x40GbE or 96x10GbE and 8x40GbE
  • S6100: 32x100GbE, 32x50GbE, 32x40GbE, 128x25GbE or 128x10GbE (breakout)

Obviously, it's always impressive to see the specifications associated with the top model in a product line, and with up to 128 breakout ports available, the S6100 is no exception.

What stands out to me, though, is the inclusion of a very interesting half-width 12-port model. With this, a customer can power a new all-flash HCI (or other scale-out) environment of up to 12 nodes while occupying only 1U of rack space for networking, all while retaining network switch redundancy.

With compute and storage densities where they are in 2019, you can house a reasonably-sized environment with 12 HCI nodes. It can also be useful to keep HCI-specific east/west traffic off of the existing switching infrastructure, depending on the customer environment.
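To put "reasonably-sized" in rough perspective, here is some back-of-the-napkin math for a 12-node all-flash cluster. The per-node specs below are illustrative assumptions of my own, not vendor figures:

```python
# Back-of-the-napkin sizing for a hypothetical 12-node all-flash HCI cluster.
# Per-node specs are illustrative assumptions, not vendor figures.
nodes = 12
cores_per_node = 2 * 20           # dual 20-core CPUs (assumed)
ram_gb_per_node = 512             # assumed
raw_tb_per_node = 8 * 3.84        # 8x 3.84TB NVMe drives (assumed)

total_cores = nodes * cores_per_node
total_ram_tb = nodes * ram_gb_per_node / 1024
total_raw_tb = nodes * raw_tb_per_node

print(f"{total_cores} cores, {total_ram_tb:.1f}TB RAM, {total_raw_tb:.1f}TB raw flash")
```

Even with these modest assumptions, that's hundreds of cores and hundreds of terabytes of raw flash (before replication or erasure coding) behind a single 1U pair of switches.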

Not all customers in need of new compute and storage are ready to bite the bullet on a network refresh or re-architecture, either. This gives solution providers a good tool in the toolbelt for these occasions, and other networking vendors should take note.

The star of the show is…a 12-port switch? In a way, yes.

Common across the Dell EMC S-Series of Open Networking switches is the inclusion of the Open Network Install Environment (ONIE), which enables streamlined deployment of alternative OSes, including Big Switch Networks' BCF. Dell's own OS10 network OS is also available for deployment, should the customer prefer to go that direction.

Underpinning all of this is merchant silicon, so customers need not worry as much about gaps in hardware capability, vendor expertise or R&D here. This approach allows specialist vendors like Broadcom and Barefoot to focus on what they do best (chip engineering), while Dell EMC and software vendors like Big Switch focus on getting the most from the provided capabilities. Hardware parity also brings costs down and encourages innovation through software, which benefits everyone.

Although a full analysis of Dell's use of merchant ASICs in their networking gear is outside the scope of this post (and my wheelhouse), I'd recommend checking out this analysis on NextPlatform for more info. I think it's safe to say the arguments against "whitebox" and for proprietary solutions are beginning to lose their potency.

An Open Networking switch equipped with ONIE doesn't move frames by itself, however. For that, you'll need an OS like Big Switch BCF, which we'll touch on next.

Big Switch Networks Big Cloud Fabric

Big Switch Networks Big Cloud Fabric is available in two variants: Public Cloud (BCF-PC) and Enterprise Cloud (BCF-EC). Since we are focusing on the deployment of Big Switch as part of a Dell EMC Open Networking solution, we’ll keep things limited to BCF-EC, for now.

At its foundation, BCF is a controller-based design that moves the control plane off of the switches themselves and onto an intelligent central component (controller). This controller is typically implemented as a highly-available pair of appliances to ensure control services are resistant to failure.

As network changes are needed throughout the environment, these are made in automated fashion through API calls between the controller and subordinate switches. These switches are powered by a combination of merchant silicon and the Switch Light OS and are available from a number of vendors, including Dell EMC.
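To make the controller-to-switch interaction a little more concrete, here is a purely hypothetical sketch of the kind of request body a controller might assemble when pushing a new tenant segment down to the fabric. The field names and structure are invented for illustration and do not reflect Big Switch's actual API:

```python
import json

# Hypothetical illustration of controller-driven configuration in a
# leaf-spine fabric. The payload shape below is invented for this sketch
# and does not reflect Big Switch's actual API.
def build_segment_request(tenant, segment, vlan_id, member_ports):
    """Assemble the JSON body a controller might push to its switches."""
    return json.dumps({
        "tenant": tenant,
        "segment": segment,
        "vlan": vlan_id,
        "member-ports": member_ports,
    })

body = build_segment_request(
    "engineering", "web-tier", 110, ["leaf1:eth5", "leaf2:eth5"]
)
```

The point is less the payload itself and more the operational model: one declarative change at the controller fans out to every affected switch, instead of an operator touching each device by hand.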

Big Switch diagram showing an example leaf-spine architecture powered by Big Cloud Fabric

There are a number of benefits associated with the resulting configuration, including simplified central management, increased visibility into traffic flows and behavior, and improved efficiency through automation. One great use-case for this type of deployment is within a VMware-based SDDC. A solid whitepaper expanding on the benefits of the combined Big Switch and Dell EMC networking solution within a VMware-based virtualization environment can be found here.


All in all, I think this OEM agreement is good news in support of competition and customer choice. It’s also encouraging that Dell EMC appears to be bought-in to Open Networking, both in word and in practice.

Despite this, I still think Dell EMC could do a better job of promoting and selling their network line. It’s not a one-way street, though. It’s also the responsibility of partners (all architects and decision-makers, really) to re-evaluate solutions as they evolve and adjust previous conclusions, as appropriate. Increasingly often, you can come up with a good answer without using the C-word (Cisco).

I look forward to talking more with the Big Switch team about BCF on Dell EMC Open Networking switching during their session at TFDx this Wednesday at 16:30. Be sure to check out the livestream and submit any questions/comments on Twitter to the hashtag #TFDx.

TFDx @ DTW ’19 – Get To Know: Kemp

Next up in our Get To Know series, we have a well-known vendor whose primary solutions many of us are already familiar with: load balancers and application delivery controllers. When I run into these components in the real world, they are typically implemented in front of, or between, the tiers of a multi-tier application. However, the use case Kemp is bringing to the table for Dell Technologies World 2019 and Tech Field Day may not be the one you would expect.

The big news coming out of Kemp ahead of the conference is that they are the only load balancing solution to be certified under the Dell EMC Select program for use with Dell EMC’s Elastic Cloud Storage solution. Although it’s easy enough to understand why a load balancer would be useful within the context of a scale-out storage solution, I am not intimately familiar with ECS itself, so let’s take a quick look at how that solution works.

Dell EMC Elastic Cloud Storage

At a high level, the ECS solution consists of object-based storage software available for deployment on-premises, in public cloud or consumption as a hosted service. Nodes are organized and presented under a single, global namespace and the solution is intended to scale horizontally through addition of nodes.

ECS has been designed to accommodate deployment across multiple locations and/or regions simultaneously, which is a key part of a global data management strategy. As you might expect, a number of data protection schemes are possible, and the available storage can be consumed using a number of protocols, supporting the “unifying” part of Dell EMC’s messaging.
More information on the architecture of ECS can be found here.

While ECS is functional as a standalone product, Dell EMC highly recommends that this solution be deployed in conjunction with a load balancer, which brings us to our next subject.

Dell EMC diagram showing high-level services provided by a distributed deployment of ECS.

Kemp Load Master

At some level, the challenges with this type of architecture are not dissimilar to the ones seen when scaling a multi-tier app or creating a multi-region design for said application.

As we begin to scale horizontally, it becomes critical to have a central point of communication brokerage so load can be distributed and failures can be handled gracefully. Managing traffic across geographic regions according to environment load, failure events and user location can also be important.
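The core problem a balancer solves here can be sketched in a few lines: spread requests across healthy nodes and skip failed ones. This is a conceptual illustration of the pattern (with made-up node names), not Kemp LoadMaster's actual logic:

```python
from itertools import cycle

# Conceptual sketch of load distribution with health awareness: requests
# rotate across nodes, and nodes marked down are skipped. Node names are
# made up; this is not Kemp LoadMaster's implementation.
class RoundRobinBalancer:
    def __init__(self, nodes):
        self.nodes = nodes
        self.health = {n: True for n in nodes}
        self._ring = cycle(nodes)

    def mark_down(self, node):
        self.health[node] = False

    def next_node(self):
        # Walk the ring until a healthy node is found.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if self.health[node]:
                return node
        raise RuntimeError("no healthy nodes available")

lb = RoundRobinBalancer(["ecs-node1", "ecs-node2", "ecs-node3"])
lb.mark_down("ecs-node2")
targets = [lb.next_node() for _ in range(4)]
```

Real products layer far more on top (active health checks, persistence, geo-steering), but the principle of brokering traffic away from failed nodes is the same.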

This, as you might guess, is where Kemp comes into play. An example of how this joint solution might be deployed is shown below:

Dell EMC diagram showing multi-site deployment of ECS with Kemp load balancing.


The desire to be the single, unifying Object storage platform employed in the cloud and on-premises for broad consumption by customer applications is not unique. Many other vendors are targeting the same goal.

With so many options for scalable object storage available, I will be very interested to hear more about the value proposition of this joint solution, as well as learn how the solution handles issues of scale, availability and performance. I expect Kemp has some differentiators to emphasize here, otherwise they wouldn’t be the only load balancer within the EMC Select program.

If you will be attending Dell Technologies World this year, pay Kemp a visit at booth #1546 to hear more about how the LoadMaster product works within an ECS deployment.

I'd also recommend checking out their TFDx session on Wednesday 5/1/19 at 15:00. The live stream can be accessed here. If you have any questions or comments during their session, feel free to submit them to the hashtags #TFDx and #KempAX4Dell.

TFDx @ DTW ’19 – Get To Know: Liqid


It’s been said that innovation begets innovation, and Liqid has developed a very interesting composable platform that builds upon recent developments in the areas of interconnect and fabric technology. But before we get into the technical specifics, let’s quickly touch on a few of the drawbacks of traditional infrastructure that composable solutions look to improve upon:

  • Procuring, deploying, and managing datacenter infrastructure is labor-intensive and can be complex.
  • Bespoke configurations and a common lack of centralized management and automation capabilities can impact consistency and reproducibility.
  • Statically-configured resources can be over- or under-utilized, either causing performance issues or preventing maximum return on investment.
  • Operations teams responsible for said infrastructure can struggle to be as responsive as their application owners and developers would like.

Composable solutions, on the other hand, take a building-block based approach, where resources are implemented as disaggregated pools and managed dynamically through software.

Depending on which vendor you ask, the definitions of “composable” and “disaggregated”, as well as the types of resources available for composition, will vary. The common theme here is that we are moving away from static configurations toward a systems architecture that is dynamically configurable through software.
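The common theme can be reduced to a small sketch: resources sit in shared pools and are bound to a logical "system" through software rather than static cabling. The pool names, resource types and sizes below are illustrative only:

```python
# Conceptual sketch of composable/disaggregated infrastructure: resources
# live in shared pools and are allocated to a logical system in software.
# Names and counts are illustrative, not any vendor's inventory model.
pools = {
    "cpu_node": ["x86-01", "x86-02", "x86-03"],
    "gpu": ["gpu-01", "gpu-02", "gpu-03", "gpu-04"],
    "nvme": ["nvme-01", "nvme-02", "nvme-03"],
}

def compose(spec):
    """Allocate resources from the free pools to form a logical system."""
    system = {}
    for rtype, count in spec.items():
        if len(pools[rtype]) < count:
            raise RuntimeError(f"insufficient {rtype} resources")
        system[rtype] = [pools[rtype].pop(0) for _ in range(count)]
    return system

# Compose a GPU-heavy system; the freed resources return to the pools
# when the system is decomposed (not shown).
ml_box = compose({"cpu_node": 1, "gpu": 2, "nvme": 1})
```

Decomposing a system returns its resources to the pools, which is what allows utilization to track actual demand instead of the static configuration chosen at purchase time.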

Liqid, as you will see, has a very different take on composability than HPE and Dell, but that doesn’t mean HPE and Dell hardware can’t be part of the Liqid solution. Thus their presence at Dell Tech World 2019, I suppose. 🙂

At its core, their solution consists of three primary components: the Fabric, the Resources, and the Manager. We’ll take a closer look at these next.

The Fabric

What is the self-described “holy grail of the datacenter fabric” that makes the Liqid approach to composability possible? Infiniband? No. It’s not Ethernet, either. It’s PCIe.

Liqid argues that because PCIe has long been leveraged heavily in modern CPU architectures, it is uniquely positioned to connect compute to peripheral resources across a switched fabric. This architecture decision allows Liqid to avoid additional levels of abstraction or protocol translation, which at a minimum keeps things more elegant.

At its core, the fabric is powered by a 24-port PCI Express switch, with each port capable of Gen3 x4 speeds. This equates to a per-port bandwidth of 8GB/s full-duplex and a total switch capacity of 192GB/s full-duplex. Devices can be physically connected via copper (MiniSAS) or photonics, providing some flexibility in connecting the required resources.
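Those quoted figures line up with basic PCIe Gen3 arithmetic: 8 GT/s per lane with 128b/130b encoding works out to roughly 985 MB/s per lane, per direction. A quick sanity check:

```python
# Sanity-check the quoted fabric numbers with PCIe Gen3 arithmetic.
# Gen3 signals at 8 GT/s per lane using 128b/130b encoding.
GT_PER_S = 8e9
ENCODING = 128 / 130

lane_bw_gbytes = GT_PER_S * ENCODING / 8 / 1e9   # ~0.985 GB/s per lane, per direction
port_full_duplex = lane_bw_gbytes * 4 * 2        # x4 port, both directions summed
switch_total = port_full_duplex * 24             # 24 ports

print(f"per-port: {port_full_duplex:.2f} GB/s, switch: {switch_total:.1f} GB/s")
# ~7.88 GB/s per port and ~189 GB/s total, matching the (rounded) quoted
# figures of 8GB/s and 192GB/s
```

The small gap between ~7.88 and 8 GB/s is just the 128b/130b encoding overhead being rounded away in the marketing numbers.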

Overall, the approach of using a native PCIe fabric allows Liqid to be one step closer to true composability than the bigger players, because a larger number of resource types can be pooled and dynamically allocated. More on this in a moment.

Overview of disaggregated resources and their relationship to the Liqid PCIe fabric.

The Resources

In reading over the benefits of available composable systems, it’s easy to get the impression that compute, network and storage resources are the only relevant resource types. HPE Synergy, as an example, introduces hardware resources in the form of an improved blade chassis (frame) with abstracted, virtual networking and internal storage presented over an internal SAS fabric.

Resources can be dynamically and programmatically managed, but the scope of the sharing domain is limited to the frame. Although this limits flexibility, there are still a number of benefits to HPE Synergy vs. a traditional architecture. This is just one interpretation of what composable should look like.

Liqid takes a different approach and deploys pools of resources using commodity hardware attached to their PCIe switch fabric. Because PCIe is used, a number of additional resource types are available for composition, including GPUs, NVMe storage, FPGAs and Optane-based memory. Compute resources are provided by commodity x86 servers containing both CPU and RAM. This additional flexibility is a primary differentiator for Liqid vs. the other available composable solutions.

Commodity resources attached to x86 compute over the Liqid PCIe fabric

The Manager

Bringing the solution together is the management component, the Liqid Command Center. This provides administrators with a way to graphically and programmatically create systems of the desired configuration using compute, storage, network and other resources present on the fabric. In short, the features you’d expect to be present are here, and it looks like some attention has been paid to the style of the interface. A brief demonstration is available on YouTube and gives a good preview of the look/feel and capabilities:


Although there’s a significant amount of marketing fluff to sift through at times when looking into composable solutions, I don’t believe composability is just another meaningless throw-around term.

There are benefits to be had, both on the technical and operational side of things. Based on my initial research, the Liqid approach appears to be a step in the right direction. However, achieving true composability looks to be a work in progress for all solution vendors.

I look forward to talking with the Liqid team about that point and more this Wednesday 5/1/19 at TFDx. Check out the live stream at 13:30 using the link below, and feel free to send your questions via Twitter using the hashtags #TFDx and #DellTechWorld.