NSX-T: The Manager of All Things NSX

You’ve seen it before. The monolithic NSX Manager from which all VMware SDN is spawned. The API endpoint. Provider of the UI. NSX Manager is the centerpiece of the world of NSX.

Welcome back to my adventure in moving from NSX-V to NSX-T!

NSX-T, just like NSX-V, is split into three functional planes: Management, Control, and Data.

The Management Plane is mostly the NSX Manager, but it also includes Management Plane Agents on the hosts. The Management Plane is a lot of things: my source of truth for network configuration, the persistent repository for the network state I want, the API and UI provider, and more.

Just like in NSX-V, you deploy the NSX Manager as a virtual appliance. VMware now ships the appliance in two formats: OVF and qcow2. You see, NSX-T is not nearly as beholden to vSphere as its cousin NSX-V. NSX-T is perfectly happy without VMware’s hypervisor and management stack; you can run it with only RHEL or Ubuntu as your KVM platform, should you desire. This makes NSX-T a great option for those driving OpenStack for their private SDDC platform.

There are many more options in the OVF deployment now, starting with four size options (Small, Medium, Medium Large, and Large):

Small – 2 vCPU, 8 GB RAM, 140 GB disk
Medium – 4 vCPU, 16 GB RAM, 140 GB disk
Medium Large – 6 vCPU, 24 GB RAM, 140 GB disk
Large – 8 vCPU, 32 GB RAM, 140 GB disk
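If you want to sanity-check sizing before you deploy, the four sizes above are easy to capture as data. Here's a minimal Python sketch; smallest_fit and fits_on_host are hypothetical helpers of my own, not anything VMware ships. Because the appliance reserves all of its memory at power-on (more on that below), fits_on_host simply compares a host's free memory against the size's RAM.

```python
from typing import NamedTuple


class ApplianceSize(NamedTuple):
    name: str
    vcpu: int
    mem_gb: int
    disk_gb: int


# The four NSX Manager OVF sizes listed above, ordered smallest to largest.
SIZES = [
    ApplianceSize("Small", 2, 8, 140),
    ApplianceSize("Medium", 4, 16, 140),
    ApplianceSize("Medium Large", 6, 24, 140),
    ApplianceSize("Large", 8, 32, 140),
]


def smallest_fit(min_vcpu: int, min_mem_gb: int) -> ApplianceSize:
    """Return the smallest size that meets both minimums (hypothetical helper)."""
    for size in SIZES:
        if size.vcpu >= min_vcpu and size.mem_gb >= min_mem_gb:
            return size
    raise ValueError("no NSX Manager size meets these requirements")


def fits_on_host(size: ApplianceSize, host_free_mem_gb: float) -> bool:
    """All appliance memory is reserved, so the host needs at least that much free."""
    return host_free_mem_gb >= size.mem_gb


if __name__ == "__main__":
    choice = smallest_fit(min_vcpu=3, min_mem_gb=10)
    print(choice.name, fits_on_host(choice, host_free_mem_gb=12))
```

Running it prints Medium False: a Medium appliance covers 3 vCPU and 10 GB, but its 16 GB reservation won't fit in 12 GB of free host memory.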

You get to choose your management network (as usual) and decide whether management will run over IPv4 or IPv6.

You set three passwords, one each for the admin, root, and audit users (yep, you have easily accessible root access here!). You can also specify different usernames for the admin and audit roles if you don’t like the defaults.

Then you’ve got the host identity and role. Standard IP address and hostname stuff here, with the addition of the NSX role. Here, again, you have choices:

nsx-manager: This is the NSX Manager we know and love. The focal point for UI and API interaction.

nsx-policy-manager: Want to start automating security policies and the like? You need one of these, too (yep, a second appliance).

nsx-cloud-service-manager: Got NSX Cloud? Then get one of these.

nsx-manager+nsx-policy-manager: This multi-role option is only supported on VMware Cloud on AWS (VMC) deployments, so don’t try this on-prem.

Finally, you set up your DNS configuration, NTP, and whether you want to allow SSH logins. And then you wait a minute for everything to deploy.
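If you deploy from the command line instead of the vSphere Client, VMware's ovftool takes the same answers as --prop:key=value pairs. The sketch below only builds the command list; it doesn't run anything. Treat the nsx_* property keys, the size ID passed to --deploymentOption, the OVA file name, and the vi:// target as assumptions: run ovftool against your OVA to list the real property names and deployment options before using anything like this. IPv4 is assumed, and passwords are left out of this sketch on purpose, so you'll need to supply them separately.

```python
import shlex

# Roles offered during OVF deployment, per the list above.
VALID_ROLES = {
    "nsx-manager",
    "nsx-policy-manager",
    "nsx-cloud-service-manager",
    "nsx-manager+nsx-policy-manager",
}


def build_ovftool_cmd(ova_path: str, target: str, vm_name: str, role: str,
                      hostname: str, ip: str, netmask: str, gateway: str,
                      dns: str, ntp: str, enable_ssh: bool,
                      size: str = "small", on_vmc: bool = False) -> list:
    """Build (but do not run) an ovftool command line for an NSX Manager OVA.

    ASSUMPTION: the nsx_* property keys and the size ID are illustrative;
    list the real ones by running ovftool against the OVA. No passwords.
    """
    if role not in VALID_ROLES:
        raise ValueError(f"unknown NSX role: {role}")
    if role == "nsx-manager+nsx-policy-manager" and not on_vmc:
        raise ValueError("the multi-role option is only supported on VMC on AWS")
    props = {
        "nsx_role": role,
        "nsx_hostname": hostname,
        "nsx_ip_0": ip,
        "nsx_netmask_0": netmask,
        "nsx_gateway_0": gateway,
        "nsx_dns1_0": dns,
        "nsx_ntp_0": ntp,
        "nsx_isSSHEnabled": str(enable_ssh),
    }
    cmd = ["ovftool", "--acceptAllEulas", f"--name={vm_name}",
           f"--deploymentOption={size}"]
    cmd += [f"--prop:{key}={value}" for key, value in props.items()]
    cmd += [ova_path, target]  # source first, then the vi:// target locator
    return cmd


if __name__ == "__main__":
    cmd = build_ovftool_cmd(
        ova_path="nsx-unified-appliance.ova",  # hypothetical file name
        target="vi://administrator@vcenter.lab.local/DC/host/Mgmt",
        vm_name="nsxmgr01", role="nsx-manager",
        hostname="nsxmgr01.lab.local", ip="10.1.10.20",
        netmask="255.255.255.0", gateway="10.1.10.1",
        dns="10.1.10.5", ntp="10.1.10.5", enable_ssh=True,
    )
    print(shlex.join(cmd))
```

The role check mirrors the warning above: asking for nsx-manager+nsx-policy-manager without on_vmc=True raises an error instead of producing a command.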

Once you’re done deploying, you can power on that bad boy of a VM. BTW, all of the appliance’s memory is reserved, so watch out.

Next step: logging into the web interface. Just point a browser at your NSX Manager IP or (preferably) hostname, and log in with the admin credentials you just set during deployment. You’ll be presented with a beautiful Clarity-driven UI, with a dozen tiles for various functions on the landing page.

Here, we can get into all kinds of trouble, from configuring load balancers to logical switches. But first, we’ve got more setup to do: deploying the Central Control Plane. We’ll get to that in another segment.

Before we get into all that, however, stay tuned for the next part in this series – I’ll take you on a tour of the NSX Manager admin CLI and show off some useful tools.

Introduction: From NSX-V to NSX-T. An Adventure

I posted a while ago that NSX-T is the future, and the future is now.

And I entirely stand by that statement. While NSX-V is currently the software-defined networking standard at VMware, its time will come.

NSX-T is the architecture of the future. It’s the platform for both NSX Data Center and NSX Cloud, and the tooling you will use to define your networking and security capabilities and policies consistently, both on-prem and off.

As it stands today, NSX-T is really more for developer clouds. It has different capabilities than NSX-V, though the gap is shrinking dramatically with each release – NSX-T 2.2.0 can do an awful lot of cool stuff, including much of what you may be accustomed to doing now with NSX-V. The proverbial “tomorrow” is close, indeed, and tomorrow, NSX-T will take the crown as king of the VMware SDN kingdom. Fortunately, this is not going to be a coup, but rather a peaceful transition (well, maybe not if you want to migrate in-place, but that’s a whole different discussion).

What I want to do in this series is to lay out the similarities and differences of the two platforms, as they stand today (NSX-V 6.4.1 and NSX-T 2.2.0). I will not cover positively everything – just what I would consider the basics. That’s still a significant list – my outline is crazy right now. Maybe it’ll become more manageable, maybe I’m just going to spend an awful lot of time writing. Hopefully, I will provide the information you need to essentially translate your NSX-V vocabulary to NSX-T.

That’s the goal. Remember, this isn’t going to be deep technical content – just a whirlwind tour through the new platform with comparisons to what you’re already familiar with.

I won’t be getting into the API, as that’s just not my wheelhouse right now – I can spell JSON, but not much more than that. So I won’t be covering cool stuff like Dashboard customizations, but there’s plenty for me to work on without that.

Join me on my journey through the wilds of NSX, here’s to hoping that we’ll both learn something!

Well, that escalated quickly…

Ok, so not really. I’ve been thinking about this for a month or two now. I’ve replaced my 2016 MacBook Pro with Touch Bar with, well, a 2013 Mac Pro…and I’m not looking back.

“But John, you travel! How will you work while you’re on the road?” you may be asking. I can’t tell you how much I like my iPad Pro. That’ll be upgraded to a 10.5” shortly – I think that’ll be just right.

Here’s the thing. My job role is changing. Much of that is driven by me. I want to get off the road. I have a new house (and acreage to maintain – that takes time!). My wife and I are raising our grandson. I need to be home more.

Some of it is definitely being driven by the business, too. Education is working on some cool video-based products. And I have most of a studio set up already. It lines up nicely with what I want to be doing.

So if I’m not traveling so much, what am I doing? I’m creating. I’m actually hoping to make this a much more frequent destination for my time. I spent the two months after VMworld kicking out an NSX Micro-Segmentation course. Nothing fancy, but you should go check it out if you haven’t had any NSX training – I think it’s great! Right after we got the first delivery of that out of the way, I made a temporary move to our Curriculum Development team. We’re cranking out new NSX classes, and we’re trying to make ‘em awesome. So there’s a lot of work going into that.

But is that all? Of course not. VMware Learning Zone is a big thing for us right now. You should check that out, too. I’m recording content for that (when time allows). Nothing major right now, but definitely more in the pipeline.

And then, with all of this content creation work I’m doing, I got Scrivener back out, and actually started learning how to use it in earnest. This is one of the greatest things I think I’ve ever found. I can create content until I don’t want to create anymore, and I can do whatever I need to do with it. I think more importantly, it’s helped me start actually organizing thoughts into consumable snippets, and gives me a platform on which to build.

So this has driven me to a change in my daily driver. Earlier this year (once the hype chilled out a bit), I got my hands on a sweet MacBook Pro with Touch Bar. And I _love_ it. I’ve read lots of complaints about the Touch Bar, and whether it’s useful – I hope Apple will be launching a Magic Keyboard with Touch Bar soon. Seriously.

The MBP doesn’t quite fit what I need right now. I bought an OWC Thunderbolt 3 dock to go with it. Which is spectacular. I’ve got my old Thunderbolt Cinema Display rocking out a big screen, and I just picked up a Dell U3417W as a primary display. I do kinda miss the HiDPI joy of Retina displays, but the amount of real estate I have now is unreal, and I’m ok with the tradeoff.

The tradeoff I’m not cool with anymore is the lack of resources for portability. My MBP has 4 hyperthreaded CPU cores, and they’re fast. But they’re not enough for me. I also maxed this thing out at a whopping 16 GB of RAM. Still not enough. Storage, I bumped to a full terabyte, and that’s groovy, but I’ve also got my Synology hanging around in the background for more space if I need it.

With all of this content I’m working through (and the tooling and processes we use), I have a full-time Windows VM I have to run, and I want that to be responsive, so that’s chewing up more than half of my resources right now. And then there’s Camtasia, Logic Pro X, and any other editing tools I need when I’m doing audio or video. And Mail, and Scrivener, and more than one web browser, and whatever else I’m running.

So I took advantage of Other World Computing’s online store (and Black Friday/Cyber Monday), and found a heck of a deal on a trash can Mac Pro, adding another 2 Xeon cores and twice the RAM of the MacBook Pro. Sure, I’m getting what should be considered an old machine, but for what I want to be able to do, it makes a ton more sense. The current Pro certainly doesn’t fit everyone’s use case, but it works great for what I want and need it to do. And I can add memory. Holy crap I miss having that flexibility!

Will I be frustrated late in 2018 when a new Mac Pro is launched? Sure. Am I upset that I’m not waiting another few weeks to get my hands on an iMac Pro? Nah, but that’s an envy-inducing rig right there. I wonder if I can make a business case for my next machine refresh at work………..

Do you know what I’m gonna miss, at least a little bit? Of all things, USB-C. And the Touch Bar, but since I’ve been using the MBP essentially as a desktop, that’s been hidden away from me for a couple of months. But I’m really digging USB-C, for all its little gotchas. I like it. And I won’t have any more of it until I refresh this new (old) machine in a while. By then I’m sure we’ll be on a whole new USB spec. And Thunderbolt 4.

Anyway, I’m back to the desktop for a while. I can do everything I need to on the road with my trusty iPad Pro, Pencil, and Smart Keyboard (oh, don’t forget the Spotlight).

What’s out there looking forward? Content. Content on all things NSX. And whatever else I come up with. And I’m going to try to put some here. I’ll see you on the flip side!

NSX Controller Logs

Have you ever wondered which log files matter for day-to-day troubleshooting on the NSX Controller nodes?  There are certainly plenty to choose from if you just type show log and press Enter.

If you haven’t looked at the new VMware Documentation site yet, I encourage you to check it out.  There’s a whole new layout.  Once you get accustomed to it, I think it’s actually easier to use than the old web-based documentation.

Anyway, I specifically wanted to call out the NSX CLI Cheat Sheet that’s in the documentation, which walks through common things an NSX Administrator may need to know.

In the Troubleshooting and Operations course, we mention NSX Controller logs a couple of times, and I’d like to expand on that content just a bit.

syslog is, well, the core OS system log, not unlike the one on any other Linux system. In addition to the standard logging content, however, it also includes some HTTP access logs.

Then, there’s the Zookeeper log (cloudnet/cloudnet_java-zookeeper<timestamp>.log). This log contains data related to the Zookeeper process that enables NSX Controller clustering. Among the things you may see in this log are disk latency warnings, which could indicate issues with Controller syncing:

Finally, we have the core NSX Controller log file, cloudnet/cloudnet.nsx-controller.root.log.INFO.<timestamp>. This file contains a wealth of information about the operation of the NSX Controller. Let’s look at some of these messages individually.

What we’re seeing in the above screenshot is an issue with the Controller cluster. Fortunately, it’s very short-lived and does not trigger a control plane issue. The Controller cluster can’t find any functional nodes, so it announces that the cluster will shut down in 30 seconds. That shutdown would drop all connections to this surviving node, causing a control plane outage. However, a cluster member joined before the 30-second timer expired, so the cluster shutdown is aborted and the Sharding Manager is invoked to distribute slices to the new cluster member.
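To make the timing concrete, here is a toy Python model of the behavior just described. It is my own simplification, not NSX Controller code: the only real number in it is the 30-second grace period from the log message, and it ignores everything about how slices actually get redistributed.

```python
from typing import Iterable

SHUTDOWN_GRACE_S = 30  # grace period announced in the log message described above


def shutdown_outcome(announced_at: float, join_times: Iterable[float],
                     grace_s: float = SHUTDOWN_GRACE_S) -> str:
    """Toy model of the behavior above, not NSX Controller code.

    If any member joins before the grace timer expires, the shutdown is
    aborted (and slices get redistributed); otherwise the cluster shuts down
    and connections to the surviving node drop.
    """
    deadline = announced_at + grace_s
    if any(announced_at <= t < deadline for t in join_times):
        return "aborted"
    return "shutdown"


if __name__ == "__main__":
    print(shutdown_outcome(0.0, [12.5]))  # member joined in time
    print(shutdown_outcome(0.0, [45.0]))  # member joined too late
```

The first call prints aborted and the second prints shutdown.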

The next image simply shows us a VTEP Leave Report being acted upon by the Controller:

Here’s an interesting one:

What we see here is that a host sent a VTEP Join Report to the Controller, but the VTEP was already joined to the VNI. If we look carefully, we see that the existing VTEP Join Report came across Connection ID 7 (connId=7), while the new, conflicting report came across Connection ID 8. Also worth noting here is that the control plane sync state for the original VTEP Report was good (isOutOfSync=False), whereas the new connection has not yet resynchronized its control plane (isOutOfSync=True).
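Spotting this by eye gets old quickly, so here's a minimal Python sketch that flags a second Join Report for the same VTEP and VNI arriving over a different connection. Only connId and isOutOfSync come from the log entry discussed here; the event, vni, and vtep field names and the sample lines are hypothetical, so adjust the parsing to match what your Controller actually writes.

```python
import re

# key=value tokens such as connId=7 or isOutOfSync=False
FIELD = re.compile(r"(\w+)=([\w.:]+)")


def parse_fields(line: str) -> dict:
    """Turn every key=value token on a log line into a dict entry."""
    return dict(FIELD.findall(line))


def find_join_conflicts(lines):
    """Return (vtep, vni, first, second) for repeat joins over a different connId."""
    first_join = {}
    conflicts = []
    for line in lines:
        fields = parse_fields(line)
        if fields.get("event") != "join":  # hypothetical field name
            continue
        key = (fields["vtep"], fields["vni"])
        previous = first_join.get(key)
        if previous is None:
            first_join[key] = fields
        elif previous["connId"] != fields["connId"]:
            conflicts.append((key[0], key[1], previous, fields))
    return conflicts


if __name__ == "__main__":
    # Hypothetical lines; only connId and isOutOfSync come from the post.
    sample = [
        "event=join vni=5001 vtep=192.168.50.11 connId=7 isOutOfSync=False",
        "event=join vni=5001 vtep=192.168.50.11 connId=8 isOutOfSync=True",
    ]
    for vtep, vni, old, new in find_join_conflicts(sample):
        print(vtep, vni, old["connId"], old["isOutOfSync"], "->",
              new["connId"], new["isOutOfSync"])
```

For the sample lines, it prints 192.168.50.11 5001 7 False -> 8 True, the same connId=7 versus connId=8 situation described above.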

And have you ever wondered about hosts sending ARP information for VMs after the VM has been identified? Take a look at this:

There’s a lot to look at when you get into log analysis, but once you can narrow down the important files, interpreting them is actually pretty straightforward.
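Narrowing down is easy to script. If you've saved the output of show log as a list of file names, this Python sketch picks out the three files discussed above using shell-style patterns built from their names. The sample names and timestamps are made up, and getting the listing off the Controller is left to you.

```python
from fnmatch import fnmatchcase

# The three logs discussed above, as shell-style patterns.
LOG_PATTERNS = {
    "syslog": "syslog",
    "zookeeper": "cloudnet/cloudnet_java-zookeeper*.log",
    "controller": "cloudnet/cloudnet.nsx-controller.root.log.INFO.*",
}


def classify_logs(log_names):
    """Group log file names by which of the three patterns they match."""
    found = {key: [] for key in LOG_PATTERNS}
    for name in log_names:
        for key, pattern in LOG_PATTERNS.items():
            if fnmatchcase(name, pattern):  # case-sensitive on every OS
                found[key].append(name)
    return found


if __name__ == "__main__":
    # Hypothetical names; real timestamps will differ.
    names = [
        "syslog",
        "cloudnet/cloudnet_java-zookeeper20180712-091500.log",
        "cloudnet/cloudnet.nsx-controller.root.log.INFO.20180712-091500",
        "cloudnet/cloudnet.nsx-controller.root.log.WARNING.20180712-091500",
    ]
    for key, matches in classify_logs(names).items():
        print(key, matches)
```

Note that the WARNING file is ignored; only the INFO controller log matches the pattern from this post.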

Lab Network

So I’ve had a few questions about the network in my lab, since I’m teaching almost nothing but NSX these days.  So let’s talk about it for a bit.

My network is purposefully simple.  And I’ve just rebuilt pretty much everything, so it seems like a good time to document it.

At the edge of my network is a Ubiquiti Networks EdgeRouter Lite (ERL).  It handles all of my routing inside the network, as well as routing to the outside world.  It’s a three-interface device: one interface to the outside world (cable modem), one to my default VLAN and home network, and a third carved into a bunch of sub-interfaces for my lab VLANs.

The two internal-facing interfaces are attached to a Cisco SG300-20 that I could also use for routing, but I chose to let the router deal with that.  This is where I have several VLANs set up for my different environments, and that’s all I’ve done with the Cisco switch – no IGMP Snooping, no routing, just VLANs:

  • Local Management – this is where all of my common stuff lives: the vCenter for my physical hosts, vROps, Log Insight, etc.
  • Production Management – this is where my GA-versioned vESXi hosts live, along with their relevant supporting pieces – vCenter, NSX Manager, etc
  • Production NSX Control – I set this up simply to have a dedicated network for my NSX 6.1 Controllers.  These could just as easily have gone into my Production Management VLAN.
  • Production NSX Transport – this is here to simulate a dedicated VXLAN transport network.  Currently, this is superfluous, as NSX 6.0/6.1 VTEPs don’t deal well with VLAN tagging in a nested environment.  Not sure what that’s all about, <sarcasm>I must be running in an unsupported config </sarcasm>
  • Production Management Branch – this network provides a simulation of a remote site
  • Production NSX Transport Branch – again, simulation of a remote site, but much like the Production NSX Transport, this one’s completely superfluous at the moment.
  • I’ve got a matching set of VLANs for my non-GA environment, so that I can have stable and unstable environments and maintain some level of isolation.
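The router side of this is one tagged sub-interface (vif) per VLAN on the ERL's third port. As a sketch, this Python snippet generates EdgeOS set commands from a VLAN plan. The VLAN IDs and subnets are hypothetical (they follow the one-/24-per-segment approach described below), only the production networks are shown, and eth2 is assumed to be the lab-facing port.

```python
import ipaddress

# Hypothetical VLAN IDs and subnets; the names mirror the list above.
LAB_VLANS = [
    (100, "Local Management", "10.1.100.0/24"),
    (110, "Production Management", "10.1.10.0/24"),
    (111, "Production NSX Control", "10.1.11.0/24"),
    (112, "Production NSX Transport", "10.1.12.0/24"),
    (120, "Production Management Branch", "10.1.20.0/24"),
    (121, "Production NSX Transport Branch", "10.1.21.0/24"),
]


def edgeos_vif_commands(trunk: str, vlans) -> list:
    """Emit EdgeOS 'set' commands for one tagged sub-interface (vif) per VLAN."""
    cmds = []
    for vid, name, cidr in vlans:
        net = ipaddress.ip_network(cidr)
        gateway = next(net.hosts())  # first usable address acts as the VLAN gateway
        cmds.append(f"set interfaces ethernet {trunk} vif {vid} address {gateway}/{net.prefixlen}")
        cmds.append(f'set interfaces ethernet {trunk} vif {vid} description "{name}"')
    return cmds


if __name__ == "__main__":
    # eth2 is assumed to be the ERL port carved into lab sub-interfaces.
    for line in edgeos_vif_commands("eth2", LAB_VLANS):
        print(line)
```

Generating the commands from one table keeps the router and the switch VLAN list from drifting apart; paste the output into configure mode and commit once it looks right.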

Since my lab is completely nested, I also have VSAN and vMotion VLANs configured on my distributed switches, but they don’t map to anything in the physical network.

On the NSX side of things, well, I’m rebuilding that right now.  My thought process, since this is a lab, is to attach my outside-facing Edge VMs to the relevant Management network, depending on where I need the Edge.  This sort of flies in the face of having a dedicated Edge cluster, but hey, this is a lab 🙂  

Inside the Edge, my DLR(s?) will attach to a common Transit network, as will the inside interfaces of the Edges.  I’ll set up some OSPF areas so that the EdgeRouter Lite can advertise some networks into the Edge.  The DLR will also advertise its routes up to the Edge, which will in turn advertise back to the ERL.  This should be a pretty simple OSPF config.  I could eliminate the need for OSPF between the Edge and the ERL simply by configuring a default route, but what fun is that?
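On the ERL side, that OSPF config comes down to a router ID plus the networks to advertise into each area. Here is a Python sketch that emits EdgeOS set protocols ospf commands; the router ID, area, and prefixes are hypothetical, so substitute your own and double-check the result against the EdgeOS documentation before committing it.

```python
import ipaddress


def edgeos_ospf_commands(router_id: str, area: str, networks) -> list:
    """Emit EdgeOS 'set protocols ospf' commands for the networks the ERL should advertise."""
    ipaddress.ip_address(router_id)  # router-id is written as a dotted quad; raises if not
    cmds = [f"set protocols ospf parameters router-id {router_id}"]
    for cidr in networks:
        net = ipaddress.ip_network(cidr)  # strict parsing rejects host bits, e.g. 10.1.10.1/24
        cmds.append(f"set protocols ospf area {area} network {net}")
    return cmds


if __name__ == "__main__":
    for line in edgeos_ospf_commands(
        router_id="10.255.255.1",                        # hypothetical
        area="0.0.0.0",                                  # hypothetical backbone area
        networks=["10.1.10.0/24", "192.168.1.0/24"],     # hypothetical prefixes
    ):
        print(line)
```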

Then my workloads will attach to whatever Logical Switches I want them attached to.  The sky’s the limit inside the SDN.

For simplicity’s sake at this point, each network segment (VLAN or VXLAN) will have its own /24, though many of them could make do with a /28 or /29 pretty easily.  But I’m not strapped for IP addresses, thanks to our friend RFC 1918, so I’m not going to make things any more complicated than I need to.
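The arithmetic behind that choice is simple: an IPv4 subnet has 2^(32 - prefix) addresses, minus the network and broadcast addresses. This Python sketch shows the usable-host counts and carves sequential /24s out of an RFC 1918 block using the standard ipaddress module; the 10.1.0.0/16 block is just an example.

```python
import ipaddress
from itertools import islice

RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def usable_hosts(prefixlen: int) -> int:
    """Usable IPv4 hosts: all addresses minus network and broadcast (valid for /1 to /30)."""
    return 2 ** (32 - prefixlen) - 2


def carve(block: str, new_prefix: int, count: int) -> list:
    """Take the first `count` subnets of size /new_prefix from an RFC 1918 block."""
    net = ipaddress.ip_network(block)
    if not any(net.subnet_of(private) for private in RFC1918):
        raise ValueError(f"{block} is not inside an RFC 1918 range")
    return [str(s) for s in islice(net.subnets(new_prefix=new_prefix), count)]


if __name__ == "__main__":
    for p in (24, 28, 29):
        print(f"/{p}: {usable_hosts(p)} usable hosts")
    print(carve("10.1.0.0/16", 24, 3))
```

It prints 254, 14, and 6 usable hosts for /24, /28, and /29, followed by ['10.1.0.0/24', '10.1.1.0/24', '10.1.2.0/24'].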

Everything works pretty well.  Sure, I run into some goofy behavior once in a while (see the VTEP VLAN tagging thing above), but this environment is entirely unsupported.  Honestly, it’s a miracle that any of this works at all, and is a galvanizing testament to what VMware software is actually capable of doing.  

Someday, maybe I’ll draw this all up.  But today is not that day.