If You’re Not Living on the Edge, You’re Taking Up Too Much Room

We talked about the Edge last time.  But we really didn’t get into the differences that we might expect.  I just said “we’ll get to that later”.  I suppose it’s later by now.

The Edge in NSX-T is little more than a container.  No, not “container” in a Docker kind of sense, but in a “pool of resources for network services” kind of sense.

See, the Edge is still a virtual machine, except when it isn’t.  In NSX-T, we have our choice of form factors – 3 sizes of virtual machine, and then bare metal.  Need a really, really big Edge?  Get out that server, we’re making a big network device!

So what are the form factors for an Edge these days?

  • Small – 2 vCPU, 4 GB RAM, 120 GB disk
  • Medium – 4 vCPU, 8 GB RAM, 120 GB disk
  • Large – 8 vCPU, 16 GB RAM, 120 GB disk
  • Bare Metal – 8 CPU, 32 GB RAM, 200 GB disk (minimums, naturally)
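
If you script your deployments, those numbers fit neatly into a small lookup table.  Here's a minimal Python sketch that does that, plus a pre-flight check for a bare metal candidate.  The names (`EdgeFormFactor`, `FORM_FACTORS`, `meets_bare_metal_minimums`) are made up for illustration and are not part of any NSX-T API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeFormFactor:
    cpus: int
    ram_gb: int
    disk_gb: int


# Sizes as listed above; the bare metal row is a minimum, not a fixed size.
FORM_FACTORS = {
    "small": EdgeFormFactor(2, 4, 120),
    "medium": EdgeFormFactor(4, 8, 120),
    "large": EdgeFormFactor(8, 16, 120),
    "bare_metal": EdgeFormFactor(8, 32, 200),
}


def meets_bare_metal_minimums(cpus: int, ram_gb: int, disk_gb: int) -> bool:
    """Return True if a server meets the bare metal Edge minimums."""
    minimum = FORM_FACTORS["bare_metal"]
    return (
        cpus >= minimum.cpus
        and ram_gb >= minimum.ram_gb
        and disk_gb >= minimum.disk_gb
    )
```

This only checks CPU, RAM, and disk.  It says nothing about whether your NICs are supported.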

The virtual machine Edges come with 4 vmxnet3 vNICs installed.  Bare metal, well, we’ve got to watch out.  Chances are, your Intel X520, X540, X550, or X710s will work.  Check the docs for specifics – it’s all spelled out there in the System Requirements section of the Installation Guide.

If you give a mouse an Edge, he’ll probably realize that it’s a single point of failure, and he’ll ask for another.

We can’t use Edges right after they’ve been deployed.  They’re really just Fabric Nodes at that point, joined to the Management Plane and just taking up resources.  They need to be promoted to Transport Nodes before they’ll be useful.

Because, while an Edge will still act as the North/South perimeter of our Software-Defined Network, it’s more than that.  An Edge is actually an active participant in the network – NSX host switches and TEPs will be installed on the Edge as part of this process.  And it runs distributed routing processes so that traffic coming through the Edge destined to a workload on a logical switch has an efficient routing path to get there. But TEPs and distributed routers are not all that are deployed to the Edge.

If you recall, from the logical routing post, the services router (SR) was mentioned.  The SR always lives on an Edge, whether the SR belongs to a Tier 0 or a Tier 1 router.  You certainly can influence _which_ Edge a SR runs on, but it’ll always be on one.

Let’s recap the kinds of things that might be deployed as a SR:

  • NAT
  • BGP (Tier 0 only)
  • Firewall
  • Load Balancer

That’s an awful lot of stuff in just 4 bullet points.  We’ll get into the specifics of those later.  These services are generally stateful, and generally need to be centralized.  So we did.

Because some things are centralized, there should probably be some measures taken for high availability (thus my nod to Laura Numeroff and Felicia Bond a couple of paragraphs ago, in case you missed the reference and thought I was simply going mad).  For HA of SRs, we need an Edge Cluster.

An Edge Cluster is simply a grouping of 1 or more identical form factor Edge transport nodes (maximum of 8) put together as a larger logical construct.  A container for your network services containers, if you will.  It’s pretty straightforward to create an Edge Cluster: add one in the Edge Clusters tab of Fabric > Nodes, add the Edge Transport Nodes you want participating, define the cluster profile, and off you go.
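
Those membership rules (1 to 8 members, all the same form factor) are easy to check before you ever click through the UI.  Below is a rough Python sketch of such a check.  The function name and the `(name, form_factor)` tuple shape are my own assumptions for illustration, not anything NSX-T hands you.

```python
MAX_CLUSTER_MEMBERS = 8  # cluster size limit stated above


def validate_edge_cluster(members):
    """Check a proposed Edge Cluster against the membership rules.

    members: list of (node_name, form_factor) tuples.
    Returns a list of problem descriptions; an empty list means it looks fine.
    """
    problems = []
    if not 1 <= len(members) <= MAX_CLUSTER_MEMBERS:
        problems.append(
            f"cluster needs 1 to {MAX_CLUSTER_MEMBERS} members, got {len(members)}"
        )
    form_factors = sorted({form_factor for _, form_factor in members})
    if len(form_factors) > 1:
        problems.append(f"mixed form factors: {', '.join(form_factors)}")
    return problems
```

For example, `validate_edge_cluster([("edge-01", "medium"), ("edge-02", "large")])` reports the mixed form factors, while two medium Edges pass cleanly.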

There’s not much to an Edge Cluster, really.  And not much to the cluster profile, either.  The profile simply defines the BFD probe interval, how many hops are allowed, and how many probes have to be lost before we declare an Edge officially dead.
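
To get a feel for what those three knobs add up to, here's a small Python sketch that models a cluster profile and estimates how long an Edge can stay silent before it's declared dead.  The class and field names are hypothetical, and the values in the usage line are examples I picked, not documented defaults.  The estimate is the usual BFD arithmetic: probe interval times the number of missed probes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeClusterProfile:
    # Hypothetical field names, chosen for this illustration only.
    bfd_probe_interval_ms: int      # time between BFD probes
    bfd_allowed_hops: int           # how many hops a probe may traverse
    bfd_declare_dead_multiple: int  # missed probes before declaring death

    def detection_time_ms(self) -> int:
        """Approximate time until a silent Edge is declared dead."""
        return self.bfd_probe_interval_ms * self.bfd_declare_dead_multiple


profile = EdgeClusterProfile(
    bfd_probe_interval_ms=1000,
    bfd_allowed_hops=255,
    bfd_declare_dead_multiple=3,
)
print(profile.detection_time_ms())  # 1000 ms * 3 missed probes = 3000
```

Shorter intervals or a smaller multiple mean faster failover, at the cost of being more trigger-happy on a flaky link.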

Most services operate statefully, and support only Active/Standby failover.  

We can configure routers as Active/Active or Active/Standby, and that will define what you can do with that router.  It’s worth noting that, if you configure a router as Active/Active, it is essentially a stateless device.  NAT is only available as Reflexive (or Stateless) NAT. The Edge Firewall is also stateless.
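
One way to keep that trade-off straight is to write it down as a check.  The Python sketch below flags services that won't fit a router's HA mode, based only on what this post describes (stateful services need Active/Standby).  The mode strings and service names are labels I invented for the example, not NSX-T identifiers.

```python
# Services that, per this post, need a router in Active/Standby mode.
STATEFUL_SERVICES = {"stateful_nat", "stateful_firewall", "load_balancer"}


def unsupported_services(ha_mode, requested):
    """Return the requested services that the given HA mode cannot host."""
    if ha_mode == "ACTIVE_STANDBY":
        return set()  # stateful and stateless services are both fine
    if ha_mode == "ACTIVE_ACTIVE":
        # Active/Active routers are effectively stateless devices.
        return set(requested) & STATEFUL_SERVICES
    raise ValueError(f"unknown HA mode: {ha_mode}")
```

So `unsupported_services("ACTIVE_ACTIVE", ["reflexive_nat", "stateful_firewall"])` comes back with just the stateful firewall, and the same request against `"ACTIVE_STANDBY"` comes back empty.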

In Active/Standby, however, you can do stateful NAT and firewalling.  You can attach a Load Balancer.  But you have to choose your failover mode. If you ask me, you’re really choosing your fallback mode, but that’s not how it’s called out in the UI.  You have two failover mode choices:  Preemptive and Non-Preemptive.  What does that even mean?

It’s actually pretty simple.  In preemptive mode, you define a preferred member of your Edge cluster. This is the Edge we want to use.  If it fails, we’ve got others, so we’re not out for long.  What preemptive means, however, is that when our preferred Edge returns to service, NSX will preempt the service to move it back to the preferred node.  An automatic fallback, if you will.

In a non-preemptive configuration, we do not define a preferred node, so there’s no drive by NSX to move a service back to a preferred location.  No automatic fallback of the service.
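
The difference between the two modes is easiest to see as a tiny decision function.  This Python sketch is a toy model, not how NSX actually elects an Edge: the function name is made up, and picking the first healthy member as the replacement is purely my assumption to keep the example short.

```python
def active_edge(members, healthy, current, preferred=None):
    """Decide which Edge should host the service right now.

    members:   ordered list of Edge names in the cluster
    healthy:   set of Edge names that are currently up
    current:   the Edge hosting the service before this decision
    preferred: preferred member (preemptive), or None (non-preemptive)
    """
    if preferred is not None and preferred in healthy:
        return preferred  # preemptive: always fall back to the preferred Edge
    if current in healthy:
        return current    # nothing failed, or no preference: stay put
    for edge in members:  # assumption: first healthy member takes over
        if edge in healthy:
            return edge
    return None           # every Edge is down


members = ["edge-01", "edge-02"]
# edge-01 fails, so the service moves to edge-02 in either mode.
after_failure = active_edge(members, {"edge-02"}, "edge-01", preferred="edge-01")
# edge-01 comes back: preemptive moves the service home, non-preemptive does not.
preemptive = active_edge(members, {"edge-01", "edge-02"}, after_failure, preferred="edge-01")
non_preemptive = active_edge(members, {"edge-01", "edge-02"}, after_failure)
print(after_failure, preemptive, non_preemptive)  # edge-02 edge-01 edge-02
```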

Is there more to talk about? Of course there is.  We’ve still got services to talk about, and tooling, and all kinds of good stuff.  Stay tuned!

 

~$ history
Introduction: From NSX-V to NSX-T. An Adventure
NSX-T: The Manager of All Things NSX
The Hall of the Mountain King. or “What Loot do We Find in nsxcli?”
Three Controllers to Rule Them All (that just doesn’t have the same ring to it, does it?)
Beyond Centralization: The Local Control Plane
Transport Zones, Logical Switchies, and Overlays! Oh, My!
Which Way Do We Go? Let’s ask the Logical Router!
