You may have already heard about the known issue with Intel Broadwell family CPUs (Intel Xeon E5-26xx v4), where an ESXi host can hit an Exception 14 PSOD if it is running older microcode.
If not, you can read about it in VMware KB 2146388.
From there you can clearly see that you need to be running CPU microcode 0x0B00001B or newer to protect yourself from it.
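If you have more than a couple of hosts to check, the comparison against 0x0B00001B is easy to script. The sketch below is a minimal Python example, assuming you have already captured the CPU information from the host as text and that it contains a line of the form `Current Revision:0x...`. That field name and the sample values are assumptions for illustration only, not output copied from a real host, so adjust the pattern to whatever your ESXi build actually prints.

```python
import re
from typing import Optional

# Minimum microcode revision named in VMware KB 2146388 for Broadwell (E5-26xx v4).
MIN_SAFE_REVISION = 0x0B00001B

# Hypothetical sample output; field names and values are illustrative assumptions.
SAMPLE_OUTPUT = """
   Family:0x06
   Model:0x4f
   Number of microcode updates:0
   Current Revision:0x0b00001a
"""


def parse_revision(output: str, field: str = "Current Revision") -> Optional[int]:
    """Return the hex value of a 'Field:0x...' line as an int, or None if absent.

    The default field name is an assumption about the host's output format.
    """
    pattern = rf"^\s*{re.escape(field)}\s*:\s*(0x[0-9a-fA-F]+)"
    match = re.search(pattern, output, re.MULTILINE)
    return int(match.group(1), 16) if match else None


def is_safe(revision: int) -> bool:
    """True if the revision is at or above the minimum from the KB."""
    return revision >= MIN_SAFE_REVISION


if __name__ == "__main__":
    rev = parse_revision(SAMPLE_OUTPUT)
    if rev is None:
        print("No microcode revision line found")
    else:
        print(f"{rev:#010x} safe={is_safe(rev)}")  # prints "0x0b00001a safe=False"
```

Comparing the revisions as integers rather than as strings avoids surprises from mixed upper- and lowercase hex digits.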
After reading it, I knew we could potentially be impacted, as we have two servers with Broadwell CPUs, so I checked which microcode revision we had.
If you don’t know how to check the CPU microcode version, just SSH into your ESXi host and run the following commands:
You will get the following information, which also includes the microcode details:
Number of microcode updates:0
As you can see, mine was older, so I proactively contacted Lenovo support to get a fix before we hit a PSOD and a potential outage (I believe the correct approach would be the opposite: the vendor should be the one contacting you and providing a fix for such issues).
Their response was:
If the client is only querying the KB? please bear in mind that our “change history” does not include every fix listed. Support for these CPU’s was added to UEFI version 2.00: Version 2.00 – BuildID: C4E122S —————————————————————————— Problem(s) Fixed: Enhancements: – Support Intel Broadwell Processors If the nodes are running 2.11 then I see no issue with installing the OS Thanks
So you would assume you are safe. Nevertheless, I reminded them that we were running older microcode and asked them to double-check.
Guess what happened two days later, while they were (maybe) checking?
PSOD with Exception 14:
If you are running Lenovo System x servers with Broadwell CPUs and an older microcode revision, even with the latest uEFI, I suggest you open a case and bombard them; the same goes for other vendors too.
So I uploaded all the logs, including the core dump. They are still asking me to open a VMware case as well, claiming that “Support for Broadwells was added in the previous uEFI”. Well, I don’t care about support, since I can boot the server; I care about the CPU microcode revision.
So let’s see how long it takes them to provide a fix.
Just a little note: you can also upgrade the Intel CPU microcode yourself, as VMware provides a way to do it and Intel’s microcode is publicly available; however, it is always better to get it from your vendor to be sure you are running a supported configuration.
I didn’t want to mention Lenovo here, as the issue is with Intel CPUs; however, their ignorance caused us an outage, and people should be aware of it before considering them. This is not the first time I have had problems with them when approaching them with something new, and I may write about it later.
I’m interested to know whether other vendors have already updated their firmware or are still waiting for customers to reach out first. If you know, just leave a comment. Thanks!
Update August 30, 2016: VMware support just confirmed my suspicion and suggested upgrading the microcode. What a surprise 🙂
Update August 31, 2016: My case got escalated and Lenovo support acknowledged the issue; it is supposed to be fixed in September’s uEFI release. It took them “only” 8 days!
Update November 16, 2016: A new uEFI was released on September 28, 2016.
Author: Dusan Tekeljak