Nested ESXi on an NSX-T Logical Switch

TIL (Today I Learned) (ok, full disclosure – it wasn’t actually today – this post has been sitting in my drafts for a couple of weeks now) that MAC Learning on a vSwitch (distributed or NSX) doesn’t like the way ESXi automagically assigns vmk0 the physical MAC address of vmnic0. If you’re curious about this, prepare for a wall of text.

============

Here’s the scenario (there are two options here, but both behave the same):

1a. You have a lab running ESXi 6.7 and are using the native MAC Learning function introduced in the DVS (see here for some great info on getting that set up).

1b. You have a lab running NSX-T and are using a non-default MAC Management switching profile that has MAC Learning enabled. Generally, nested ESXi is the use case for this switching profile (there’s a rough API example right after this list).

2. You have deployed ESXi VMs and attached them to a MAC Learning-enabled port group or logical switch.
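
If you’re curious what that NSX-T switching profile looks like under the covers, one way to create it is through the Manager API. This is only a rough sketch – the manager address, credentials, and profile name are placeholders, and the exact payload can vary by NSX-T version – but it shows the shape of the thing:

curl -k -u admin:'VMware1!' -H 'Content-Type: application/json' \
  -X POST https://nsx-manager.lab.local/api/v1/switching-profiles \
  -d '{
        "resource_type": "MacManagementSwitchingProfile",
        "display_name": "nested-esxi-mac-learning",
        "mac_change_allowed": true,
        "mac_learning": { "enabled": true, "limit": 4096, "limit_policy": "ALLOW" }
      }'

You then apply the resulting profile to the logical switch your nested hosts attach to.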

Symptom: single-MAC VMs (think a nested vCenter Server, an NSX Manager, or just a Linux or Windows VM) function just fine – they can communicate with all kinds of things. Your nested ESXi management interfaces, however, cannot.

Extra fun bonus configuration: You have a separate VMkernel port dedicated to vMotion, on a separate subnet, though still using the same uplink. Example: the virtual switch has two port groups, Management and vMotion; vmk0 is attached to the Management PG, vmk1 is attached to the vMotion PG, and uplink vmnic0 is attached to the virtual switch. The vMotion interfaces can communicate with each other just fine (vmkping ++netstack=vmotion -I vmk1 <other host vmk1 IP>).

So, this exact scenario has been driving me bonkers. When I upgraded my lab to vSphere 6.7, I tried the native MAC Learning on my port groups. That was a mistake, as all 4 of my nested environments just evaporated. So I scrambled to undo the MAC Learning config, and went back to enabling Promiscuous Mode, Forged Transmits, and MAC Address changes on the port groups.  So much for the new features.

In this configuration, with NSX-V logical switches, my nested environments continued to work just fine, so long as I remembered to set the security options on any newly-created logical switch port groups to Accept.  So I just ran with it.

Fast forward to a couple of weeks ago. I ripped NSX-V off of the physical environment and rebuilt it with NSX-T. That’s another adventure. Anyway, with NSX-T, MAC Learning switching profiles are the only option – there’s no editable dvPortGroup for NSX-T logical switches. So I had to figure this one out.

As you may know, when you install ESXi, it takes the MAC address of vmnic0 and assigns it to vmk0. Normally, that’s not a big deal. But when you enable MAC Learning, something goofy happens and Ethernet frames don’t get forwarded from vmnic0 all the way up to vmk0. So, here’s what I did to work around that little challenge.

I set the /Net/FollowHardwareMac flag in esxcfg-advcfg to 1, from the default 0 (https://kb.vmware.com/s/article/1031111). After a reboot, this didn’t change the MAC address of vmk0, as the purpose of the FollowHardwareMac setting is to define whether the VMkernel MAC address should change when the underlying vmnic is replaced. I didn’t replace the vmnic, so nothing should have changed. This is not a necessary change, but I wanted to try it for completeness’ sake. However, it is useful if you install ESXi to a USB stick and then move that USB stick to a different physical host, or if you clone your ESXi VM.
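
For reference, the whole exercise is two commands from the ESXi shell (plus a reboot) – -g shows the current value, -s sets it:

# esxcfg-advcfg -g /Net/FollowHardwareMac
# esxcfg-advcfg -s 1 /Net/FollowHardwareMac
# reboot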

So I had to wipe out and recreate vmk0 on each of my hosts. I was a bit concerned about this, as my VMkernel ports were on a distributed switch. The only gotcha there is that you have to specify the dvPort ID for both deleting AND creating the new VMkernel port with esxcfg-vmknic.  That’s easily identified with esxcfg-vswitch -l.
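
In case it saves someone some digging, here’s roughly what the per-host process looks like. The DVS name, dvPort ID, and addresses below are placeholders for your own values, and you’ll want to do this from the host console (DCUI/ESXi Shell), since management connectivity disappears the moment vmk0 does.

List the switch config and note the DVPort ID that vmk0 is bound to:

# esxcfg-vswitch -l

Delete vmk0 from that dvPort:

# esxcfg-vmknic -d -s Lab-DVS -v 10

Recreate it on the same dvPort with the management IP (it comes back with a freshly generated VMware MAC instead of vmnic0’s):

# esxcfg-vmknic -a -s Lab-DVS -v 10 -i 192.168.1.21 -n 255.255.255.0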

And now, that nested environment works again. On an NSX-T logical switch.

Hopefully, those of you with home labs won’t have to spin your wheels so much if you run into this.

Lab Network

So I’ve had a few questions about the network in my lab, since I’m teaching almost nothing but NSX these days.  So let’s talk about it for a bit.

My network is purposefully simple.  And I’ve just rebuilt pretty much everything, so it seems like a good time to document it.

At the edge of my network is a Ubiquiti Networks EdgeRouter Lite (ERL).  It deals with all of my routing inside the network, as well as routing to the outside world.  It’s a 3 interface device – one to the outside world (cable modem), one to my default VLAN and home network, and the third interface is carved into a bunch of sub-interfaces for my VLANs in the lab.

The two internal-facing interfaces are attached to a Cisco SG300-20 that I could also use for routing, but I chose to let the router deal with that.  This is where I have several VLANs set up for my different environments, and that’s all I’ve done with the Cisco switch – no IGMP Snooping, no routing, just VLANs:

  • Local Management – this is where all of my common stuff lives – the vCenter for my physical hosts, vROps, Log Insight, etc
  • Production Management – this is where my GA-versioned vESXi hosts live, along with their relevant supporting pieces – vCenter, NSX Manager, etc
  • Production NSX Control – I set this up simply to have a dedicated network for my NSX 6.1 Controllers.  These could just as easily have gone into my Production Management VLAN
  • Production NSX Transport – this is here to simulate a dedicated VXLAN transport network.  Currently, this is superfluous, as NSX 6.0/6.1 VTEPs don’t deal well with VLAN tagging in a nested environment.  Not sure what that’s all about, <sarcasm>I must be running in an unsupported config </sarcasm>
  • Production Management Branch – this network provides a simulation of a remote site
  • Production NSX Transport Branch – again, simulation of a remote site, but much like the Production NSX Transport, this one’s completely superfluous at the moment.
  • I’ve got a matching set of VLANs for my non-GA environment, so that I can have stable and unstable environments and maintain some level of isolation.

Since my lab is completely nested, I also have VSAN and vMotion VLANs configured on my distributed switches, but they don’t map to anything in the physical network.

On the NSX side of things, well, I’m rebuilding that right now.  My thought process, since this is a lab, is to attach my outside-facing Edge VMs to the relevant Management network, depending on where I need the Edge.  This sort of flies in the face of having a dedicated Edge cluster, but hey, this is a lab 🙂  

Inside the Edge, my DLR(s?) will attach to a common Transit network, as will the inside interfaces of the Edges.  I’ll set up some OSPF areas so that the EdgeRouter Lite can advertise some networks into the Edge.  The DLR will also advertise its routes up to the Edge, which will in turn advertise back to the ERL.  This should be a pretty simple OSPF config.  I could eliminate the need for OSPF between the Edge and the ERL simply by configuring a default route, but what fun is that?
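
The ERL side of that OSPF setup is only a handful of lines. Something like the following sketch – the interface, VLAN, network, and router-id are made-up values for illustration, and the Edge and DLR get their matching OSPF config through the NSX UI:

configure
set interfaces ethernet eth2 vif 20 address 10.20.0.1/24
set protocols ospf parameters router-id 10.20.0.1
set protocols ospf area 0.0.0.0 network 10.20.0.0/24
set protocols ospf redistribute connected metric-type 2
commit ; save ; exit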

Then my workloads will attach to whatever Logical Switches I want them attached.  The sky’s the limit inside the SDN.  

For simplicity’s sake at this point, each network segment (VLAN or VXLAN) will have its own /24, though many of them could make do with a /28 or /29 pretty easily.  But I’m not strapped for IP addresses, thanks to our friend RFC 1918, so I’m not going to make things any more complicated than I need to.

Everything works pretty well.  Sure, I run into some goofy behavior once in a while (see the VTEP VLAN tagging thing above), but this environment is entirely unsupported.  Honestly, it’s a miracle that any of this works at all, and is a galvanizing testament to what VMware software is actually capable of doing.  

Someday, maybe I’ll draw this all up.  But today is not that day.  

New Lab Server

Here I am, procrastinating on other stuff, to talk about the new lab setup.  I promised in the last video that I’d do a writeup, and I figured “no time like the present”!

So what do I have going on?  I’ve ripped out the ML370 and ASUS RS500A boxes, and replaced them with a veritable steal from the Dell Outlet.  I decided to go all-in for a nested lab, since I can’t come up with a good reason to put together a physical lab.

So I found a Precision T5610 with a pair of Xeon E5-2620v2s.  Twelve hyper-threaded cores of processor get-up-and-go.  The Scratch and Dent unit I bought had 32 GB of RAM installed, along with a 1 TB spindle.  Not a bad start.  But I had gear to work with, and needed more RAM.

So I ordered a nice 64 GB upgrade kit from Crucial (well, four 16 GB kits, technically), hoping it’d give me a 96 GB box to work with.  The pre-installed RAM and the Crucial RAM didn’t play too nicely together (Windows wouldn’t load from the spindle, nor would the ESXi installer start).  Boo.  So I pulled out the factory RAM, ran memtest86 for about a day just to be safe, and am proceeding, for the moment, with 64 GB.

Still just a start, though, as I needed storage.  I had an Icy Dock 4-bay SATA chassis and an IBM M1015 SAS RAID controller in my little HP MicroServer.  That box didn’t need those things anymore, so a transplant was necessary.  Four screws later, I had lots of room for 2.5” SATA drives.  I already had a pair of 120 GB Intel 520 SSDs in Icy Dock trays, and I pulled the two 1 TB Crucial M550 SSDs from the ASUS box.  Now I have plenty of lab storage.  If I need more, I can always take the performance hit and attach something from the DS412+ (like I’ve already done with my “ISO_images” NFS share).

**EDIT** I ran into another snag.  The LSI controller and Icy Dock combination don’t seem to be playing nicely with the host.  The SSDs randomly drop offline periodically, which makes me sad.  It could be a power thing (these big SSDs are kind of notorious for power problems, especially in small drive chassis or NAS units), but I’m not going to push it any further.  I pulled the controller and drive cage out of the system, and I should have a SATA power splitter after UPS shows up today (I was too lazy to leave the house LOL), so I’m just going to run the SSDs off the on-board SATA controller channels and be happy about it.

At this point, I thought I was ready for ESXi, but upon installation, I hit a snag.  The T5610 has an onboard Intel Gigabit NIC.  As such, I expected no issues, but the 82579 isn’t recognized by ESXi 5.5u2 for whatever reason.  No biggie – this thing has a nice tool-less case, and less than a minute later, I had a dual-port Intel NIC installed and ready to go (an 82571EB, if you’re curious).  One more reboot and ESXi was installing.

Everything’s pretty happy at the moment.  Here’s what it looks like now that I’ve built up all my virtual ESXi hosts:

Screenshot 2014 09 24 08 47 22

 

Screenshot 2014 09 24 08 48 53

 

Oh, and the “Remote_External” cluster is a set of ESXi virtual machines I have running in VMware Workstation on a Precision M4800 laptop.

I’ll follow up with some network details shortly, since that’s also gotten _way_ complex recently.  All in the name of scenario-based play with NSX. More fun later!

-jk

New Lab is here!

I’m a happy camper!  My new lab gear is in.

I believe we all need some kind of lab environment to play with, otherwise we just don’t learn the hands-on stuff nearly as well or as quickly.  Some employers have lab environments in which to test.  My employer is no different, but I prefer to have control over what I deploy, and when I deploy it.  That way I have no one to blame for anything but myself 🙂

That said, I was running my lab in an old Dell Precision 390 with nothing but 4 cores, 8GB of RAM, and local storage.  That was great a couple of years ago when I put it together, but now, not so much.

The new gear is actually server-grade stuff.  And reasonably inexpensive, if you ask me.

For my storage, I stumbled on a great deal on an N40L ProLiant MicroServer from HP.  After repurposing some disk I had laying around the house, I had a small, reasonable storage server.  I installed a bunch of SATA disk: three 7200 RPM 500GB spindles and a 1TB 7200 RPM spindle in the built-in drive cage.  But that wasn’t quite enough for what I had in mind.  So I bought an IcyDock 4-bay 2.5″ drive chassis for the 5.25″ bay in the MicroServer, and added an IBM M1015 SAS/SATA PCI-e RAID controller to drive the 2.5″ devices.  I had an Intel 520 Series 120GB SSD (bought for the ESXi host, but it didn’t work out) and a WD Scorpio Black 750GB drive just hanging around.  So I added another SSD and another Scorpio Black so I could mirror the devices and have some redundancy.

So there’s my SAN and NAS box.  I installed FreeNAS to a 16GB USB stick, and carved up 4 ZFS pools – platinum, gold, silver, and bronze.  Creative, I know LOL.

  • Platinum is a ZFS mirror of the 2 SSDs
  • Gold is a RAID-Z set of the 3 500GB spindles
  • Silver is a ZFS mirror of the 2 Scorpio Blacks
  • Bronze is a ZFS volume on the single 1TB spindle

ZFS Volumes

I debated swapping Gold and Silver at length, but in the end, left the layout as described.
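
For the curious, if you were building that layout from the FreeNAS shell instead of the GUI, it translates to something like this – the adaX device names are placeholders, and yours will certainly differ:

Platinum – mirror of the two SSDs:
# zpool create platinum mirror ada1 ada2
Gold – RAID-Z across the three 500GB spindles:
# zpool create gold raidz ada3 ada4 ada5
Silver – mirror of the two Scorpio Blacks:
# zpool create silver mirror ada6 ada7
Bronze – the lone 1TB spindle:
# zpool create bronze ada8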

There are two things I don’t like about this setup, and they both revolve around the networking baked into the MicroServer.

 

  1. Jumbo Frames aren’t supported by the FreeBSD driver for the onboard Broadcom NIC.  This could be fixed in the future with a driver update or the official release of FreeNAS 8.2 (I’m running beta 2 at the moment).
  2. There’s only one onboard NIC.  I’d have liked two NICs, but for the price, maybe I’ll add a PCI-e dual-port Intel Gig card.  That would solve both dislikes.

Platinum, Gold, and Silver are presented via the iSCSI Target on the FreeNAS box as zVol extents.  Bronze is shared via NFS/CIFS, primarily for ISO storage.
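
Under the hood, each iSCSI extent is just a zvol carved out of one of those pools, and Bronze is a plain dataset that gets shared out. Something along these lines, with arbitrary example names and sizes:

# zfs create -V 200G platinum/esxi-lun0
# zfs create bronze/iso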

As for the ESXi host itself, well here we go:

  • ASUS RS-500A-E6/PS4 chassis
  • 2 x AMD Opteron 6128 8-core CPUs
  • 64GB of Kingston ECC RAM
  • 250GB 7200RPM spindle from the MicroServer
  • 1TB 7200RPM spindle that was recycled from the old lab gear

I chose this seemingly overpowered setup for a few reasons (yep, another bullet point list):

  • Price (the server and its constituent parts only ran me ~$2100USD)
  • Nearly pre-assembled.  I’m not one for building machines anymore
  • Capacity.  Instead of running multiple physical ESXi hosts, I chose to run my lab nested.
  • Compatibility.  This server’s Intel counterpart is on the VMware HCL.  That didn’t mean this one would work, but I felt the odds were high.  The onboard NICs are also both Intel Pro 1000s, which helps.
  • LOM was included.  This is important to me, as I don’t want/need/have tons of extra monitors/keyboards hanging around

So all the parts came in, and I installed the disks, CPUs, and RAM, dropped an ESXi CD in the drive, booted it up, and wondered – where’s the remote console?  I hadn’t thought about that, so I jacked in a monitor and keyboard, only to find that the Delete key is necessary to get into the BIOS to configure the iKVM.  Well, in my case, that posed a little bit of a problem.  See, the only wired keyboards (or wireless, for that matter) in the house are Apple keyboards, since I recently let the last physical Windows box leave my house.  So I had to see if the iKVM pulled DHCP.  I got out iNet, my trusty Mac network scanning utility, scanned my network, and there it was – a MAC address identifying as “ASUSTek Computer, Inc.”  That had to be it, so I fired up a web browser and plugged in the IP.  Now I just had to figure out the username and password.  Documentation to the rescue!  So I got everything configured, booted to the ESXi installer, and there you have it – one nice 16-core, 64GB ESXi host.

Host Summary

It’s doing rather well so far, I’ve got the storage attached, networking set up, and all kinds of VMs running right now, including vCenter Operations, View, SQL, vCloud Director, VMware Data Recovery, vShield Manager, a couple of Win7 desktops, and a few virtualized ESXi hosts, and this is what the box is doing:
Resource Usage
Just to reinforce the importance of Transparent Page Sharing, at the moment, this host is sharing ~17GB of RAM.
Shared Memory
Not to repeat myself, but I’m a happy camper.  I’ve got View set up, so I can work with the environment while I’m on the road, and my next step is to get vCD rolling and happy with a couple of virtualized ESXi hosts so I can start plugging away at building class-specific vApps so I can keep up with the different courses we run.
I hope this helps and perhaps even gives you some inspiration for your own lab environment.  I’m happy to answer any questions you may have about the setup, just drop me a line!

VCP5

Well, they finally let the cat out of the bag!  A couple of months ago now, I took a day trip down to Martin, Tennessee to sit the VCP5 Beta exam.

After a few trials and tribulations getting there (remember all the flooding in the Midwest earlier this year?), and then more challenges once I got to the testing facility (one of the other testing software packages didn’t much get along with the Pearson packages), I finally got to sit the exam.  I made it to the facility on time, but with the system problems, it was another 30-45 minutes of waiting before I could actually start the exam.

And _wow_ it was long!  And challenging!  This isn’t your father’s VCP exam, assuming most of the questions make it to the final product.  This exam was all about understanding – the product, its use cases, everything.  This new generation of VCP will be the sharpest yet (and I think that’s saying something – we’ve had some great exams over the years), and VMware is doing a great job of keeping the value of the VCP at a premium level.

The VCP5 is not a data-regurgitation exam, nor can it be explicitly taught.  VMware still has the Install, Configure, Manage requirement if you hold no VCP.  But the class is not going to teach you the exam – that’s how it’s been since I got into this VMware thing in 2006.  It will, however, provide you with a good foundation from which to start.

If you’re a current VCP4, you can sit the exam with no class requirement until February 29, 2012.

Oh, and I’m writing all this because I finally got my VCP5 exam results.  I passed!  That makes the whole trip to Tennessee and all its trials worth it!

 

Auto Deploy with the vCenter Server Appliance

Auto Deploy is probably one of my favorite new features of vSphere 5.  The ability to build an ESXi image (with Image Builder), and automate the deployment of stateless hosts quickly and seamlessly just gives me a warm fuzzy.

So how do we set this up?

There are two options:

  1. Install Auto Deploy from the vCenter DVD, set up an external DHCP and TFTP server, set up your images, and go
  2. Deploy the vCenter Server Appliance (vCSA), configure the existing DHCP server, start the DHCP and TFTP servers, set up your images, and go.

I went with option number 2, since there was that much less to install.  Just configure and run!

I started by adding a NIC to the vCSA, since I didn’t want my management network also serving up DHCP.  Since everything I have at the moment in the lab is virtual, I chose to set up a deployment vSwitch just for this purpose.  In your lab or production environment, you may attach that deployment network to an existing network.

I copied the ifcfg-eth0 file in /etc/sysconfig/networking/devices/ to ifcfg-eth1 (the 2nd NIC will be eth1) and edited the new one

# cp /etc/sysconfig/networking/devices/ifcfg-eth0 /etc/sysconfig/networking/devices/ifcfg-eth1

# vi /etc/sysconfig/networking/devices/ifcfg-eth1

It should look something like this (I’m using 10.1.1.0/24 as my deployment network):

ifcfg-eth1
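
In case the screenshot is hard to read, the file on the SLES-based appliance ends up looking something like this (exact keys can vary a bit by vCSA version; 10.1.1.1 is just the address I gave eth1 on the deployment network):

DEVICE=eth1
TYPE=Ethernet
BOOTPROTO=static
STARTMODE=auto
IPADDR=10.1.1.1
NETMASK=255.255.255.0
BROADCAST=10.1.1.255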

Then I created a new symlink in /etc/sysconfig/network to the new file

# ln -s /etc/sysconfig/networking/devices/ifcfg-eth1 /etc/sysconfig/network/ifcfg-eth1

This provides a persistent configuration for the network device should you reboot your vCenter Server Appliance.

That finishes up the Deployment Network configuration.  Now we need to configure all of the other services.

I started with DHCP.  In poking around the /etc/ directory on the appliance, I found that VMware kindly provided a mostly pre-configured configuration template for the DHCP server: /etc/dhcpd.conf.template

/etc/dhcpd.conf.template

So, being kind of lazy, I simply backed up the existing dhcpd.conf file:

# cp /etc/dhcpd.conf /etc/dhcpd.conf.orig

And then copied the template into place as the config:

# cp /etc/dhcpd.conf.template /etc/dhcpd.conf

And got to editing.  My final config file looks like this:

edited /etc/dhcpd.conf
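
If the screenshot isn’t legible, the important part of the finished file looks roughly like this – the subnet, range, and next-server reflect my 10.1.1.0/24 deployment network, and the gPXE filename comes straight from the template:

subnet 10.1.1.0 netmask 255.255.255.0 {
    range 10.1.1.100 10.1.1.200;
    option subnet-mask 255.255.255.0;
    next-server 10.1.1.1;
    filename "undionly.kpxe.vmw-hardwired";
}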

Once that’s done, you can start the DHCP server:

# /etc/init.d/dhcpd start

Then you need to start the TFTP server:

# /etc/init.d/atftpd start

At this point, I have an ESXi VM PXE booting and doing all the right things – SUCCESS!

I don’t have Auto Deploy configured from PowerCLI quite yet.  I’ve got a default image loaded up, but without Auto Deploy rules waiting, it’s a wash.  I’ll update when I have things set up more completely.  You probably know more about PowerShell and PowerCLI than I do, but this is what I’m getting (even right after I Connect-VIServer). Something’s wacky with PowerCLI communications:

PowerCLI error

I’ll get it figured out, but until then, take this as a start to your Auto Deploy adventures with the vCenter Server Appliance!

***EDIT***

Well, silly me figured out the “cannot connect” problem with PowerCLI. Turns out the Auto Deploy services weren’t started on my vCenter Server Appliance. A quick jaunt to https://<vCSA address>:5480, then to the Services tab, then clicking the magic “Start ESXi Services” button resolved that one. I think the “Stopped” status for ESXi Autodeploy was what gave it away 🙂 I’m off and running again!

Up and coming

So, I really do work stuff, along with all the tinkering lately.  That’s the problem with new gadgets!

I’ve been gearing up on the vSphere 5 courses from VMware, and I gotta say, you should take these.  Even if it’s just the 2-day What’s New course for you VMware gurus.  What’s New is the condensed “look at all the cool new stuff” class that gets you some hands-on time with the new knobs and dials, as well as some good discussion time.  The new Install, Configure, Manage class is no slouch, but we’re gently massaging it to work better for most of the folks who will likely be taking it.

Add to that the fact that I’m working on a post (more back-burner) about my take on why customers should think about the cloud.  And I’m tossing around a post about automation, and why.  Not so much how, but why.

On the front burner, however, I’m in the process of working through the new Auto Deploy feature of vSphere 5, specifically the integration of Auto Deploy and its related components into the vCenter Server Appliance (vCSA).  Everything’s baked in, so I’m doing a “what to edit and how to make it work” post.  I’m having just a touch of difficulty I think due to the wacky nature of my lab (should be taken care of soon enough, I hope), but the framework is there.

Oh, and add to that my DSL modem gave up the ghost.  I’d say it let out all its magic smoke (you know, the magic smoke that all electronics run on – when the smoke escapes, the electronics don’t work anymore!), but there was no puff of smoke.  It just stopped.  I looked up and there were no lights.  No biggie, I’ve got a U-Verse installation scheduled already to replace the DSL with a fatter pipe, and my cable modem is still the primary pipe.  It just means that my next class won’t have any network redundancy if something goes wrong.

So that’s what’s going on.  Blog breaking, DSL dying, vCSA tinkering fun.  Stay tuned for more goodness!

 

vRAM Licensing Reprise

So I promised to follow up, and here I am. I briefly touched on the new licensing structure the other day, and left with the words “DON’T PANIC”.

I stand by my words today, and here’s why:

The vRAM-based entitlement is only one factor in the overall licensing picture. We still need to purchase licenses per socket, the vRAM entitlement is pooled at the vCenter level, and we only need to count it for VMs that are powered on. We no longer need to be concerned about the number of cores per socket or physical memory limitations in the host.

Let’s think about this for a moment.

Say I’m a small shop – I figure I need 3 hosts to virtualize my physical environment (say, all of 30 physical servers). So I shell out for 3 dual-socket, 4-core hosts with 32GB of RAM each, and vSphere Standard licenses all around (maybe I saved some cash and bought Essentials or Essentials Plus – those both use the Standard license). 6 total licenses are necessary, each giving me an entitlement to 24GB of vRAM – 144GB of vRAM total. If I virtualize each of my physical machines and give them each 4GB of RAM, I’m looking at 120GB of vRAM allocated. I’m still 24GB under my entitlement. Sure, I’m overcommitting a bit, with only 96GB of physical RAM available to the cluster, but at the same time, I’m going to guess that not all of those VMs actually need 4GB of RAM.

In this scenario, I still have room to kick each host up to 48GB of RAM before I really have to worry about memory overcommitment in earnest.

But let’s take this scenario out just a little farther. I’ve upgraded my hosts to 48GB of RAM each, and as my environment’s grown, I’m finding that I’m getting ready to overcommit memory. I know my environment, and realize that a little bit of overcommitment isn’t a bad thing. I just need to buy 1 more vSphere CPU license, and my vRAM entitlement grows.

Or let’s throw another curveball – instead of upgrading RAM, I replace my hosts. I keep the memory specs the same – 32GB each – but I get new boxes with 8 cores per socket. In vSphere 4.1, that meant either upgrading my licenses to Enterprise Plus, or purchasing an additional, say, Standard license for each socket. Now, all I have to do is turn up the new boxes, remove the license from the 4-core node, and reapply it to the 8-core node. Problem solved, and no more money spent on software.

What about a fairly large shop? What if I’ve got a pair of 20-node clusters running a ton of VMs?

My large shop has been virtualizing a long time, and has a virtualize-first policy. It has also matured its provisioning processes to go along with virtualization. Virtual machines in this environment are generally provisioned with 1GB of vRAM. The hosts are 4-socket, 6-core systems, and we’re running Enterprise Plus, which gives a 48GB vRAM entitlement per license. That means I can deliver 3,840GB of vRAM to each cluster – 3,840 VMs per cluster (assuming the VMs are provisioned with 1GB of vRAM each). Now, that’s 192 VMs per host, which is a higher consolidation ratio than I’ve commonly seen.
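
Spelling the math out:

20 hosts x 4 sockets = 80 Enterprise Plus licenses
80 licenses x 48GB of vRAM each = 3,840GB of pooled vRAM per cluster
3,840GB / 1GB per VM = 3,840 VMs, or 192 VMs per host across 20 hosts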

The highest consolidation I’ve seen (with my own eyes) is 60:1. But more typically, I tend to see closer to 20:1. Even at 4GB of vRAM per VM, at 20:1 consolidation you’re still only allocating 80GB of vRAM per host, which is well under the 192GB vRAM entitlement based on the 4 sockets licensed on the host. That gives us a lot of breathing room.

Sure, your VMs will vary in size – 1GB here, 8GB there – but the point is still the same: in most cases, your licensing will not cause you any trouble. I really think that most customers will find better flexibility in this new licensing model.

What’s even better is that the vRAM entitlement is pooled at the vCenter level, so the entitlement isn’t tied to any one host.

Change is tough, but it’s not as bad as it may seem at first glance.