Broadwell ESXi 6.0 Exception 14 PSOD and Lenovo support fail

You may already have heard about the known issue with Intel Broadwell family CPUs (Intel Xeon E5-26xx v4), where an ESXi host can hit an Exception 14 PSOD if it is running older microcode.

If not, you can read about it in VMware KB 2146388.

The KB makes it clear that you need to be running CPU microcode revision 0x0B00001B or newer to protect yourself from it.

After reading it, I knew we could potentially be impacted, as we have two servers with Broadwell CPUs, so I checked which microcode revision they were running.

If you don’t know how to check the CPU microcode version, just SSH into your ESXi host and run the following commands:

vsish

cat /hardware/cpu/cpuList/0

You will get output similar to the following, which includes the microcode revision:

APIC ID:0x00000000
Core:0
Package:0
Node:0
Number of microcode updates:0
Original Revision:0x0b000014
Current Revision:0x0b000014
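If you have several hosts to check, the comparison against the minimum revision from the KB is easy to script. Below is a minimal Python sketch, assuming you have saved the vsish output above as text; the function names are hypothetical, and it assumes that microcode revisions for the same CPU model can be compared as plain integers.

```python
import re

# Minimum safe revision as quoted from VMware KB 2146388 in this post.
MIN_SAFE_REVISION = 0x0B00001B


def current_microcode_revision(vsish_output: str) -> int:
    """Return the 'Current Revision' value parsed from saved vsish output."""
    match = re.search(r"Current Revision:\s*(0x[0-9a-fA-F]+)", vsish_output)
    if match is None:
        raise ValueError("no 'Current Revision' line found")
    return int(match.group(1), 16)


def is_patched(vsish_output: str) -> bool:
    """True if the running microcode is at or above the KB minimum.

    Assumption: revisions for the same CPU model compare numerically.
    """
    return current_microcode_revision(vsish_output) >= MIN_SAFE_REVISION


# Output captured from one of our hosts (same as shown above).
sample = """APIC ID:0x00000000
Core:0
Package:0
Node:0
Number of microcode updates:0
Original Revision:0x0b000014
Current Revision:0x0b000014
"""

print(hex(current_microcode_revision(sample)))  # 0xb000014
print(is_patched(sample))  # False
```

The check deliberately reads "Current Revision" rather than "Original Revision", because the current value reflects any microcode update ESXi has already loaded at boot.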

As you can see, ours was older, so I proactively contacted Lenovo support to get a fix before we hit a PSOD and a potential outage. (I believe the correct approach would be the opposite: the vendor should be the one contacting customers and providing a fix for such issues.)

Their response was:

If the client is only querying the KB? please bear in mind that our “change history” does not include every fix listed. Support for these CPU’s was added to UEFI version 2.00:
Version 2.00 – BuildID: C4E122S
——————————————————————————
Problem(s) Fixed:
Enhancements:
– Support Intel Broadwell Processors
If the nodes are running 2.11 then I see no issue with installing the OS
Thanks

So you would assume you are safe. Still, I reminded them that we were running older microcode and asked them to double-check.

Guess what happened two days later, while they were (maybe) still checking?

PSOD with Exception 14:

[Screenshot: PSOD with Exception 14]

If you are running Lenovo System x servers with Broadwell CPUs and an older microcode revision, even with the latest uEFI, I suggest you open a case and keep pushing them. The same goes for other vendors too.

So I uploaded all the logs to them, including the core dump. They are still asking me to open a VMware case as well, claiming that “Support for Broadwells was added in the previous uEFI”. Well, I don’t care about that kind of support, since the server boots just fine; I care about the CPU microcode revision.

So let’s see how long it takes them to provide a fix.

Just a little note: you can also update the Intel CPU microcode yourself, as VMware provides a way to do it and Intel’s microcode updates are publicly available. However, it is always better to get it from your server vendor, to be sure you are running a supported configuration.

I didn’t want to mention Lenovo here, as the issue is with Intel CPUs; however, their ignorance caused us an outage, and people should be aware of it before considering them. This is not the first time I have had problems with them when approaching them with something new, and I may write about that later.

I’m interested to know whether other vendors have already updated their firmware or are still waiting for customers to reach out first. If you know, just leave a comment. Thanks!


Update August 30, 2016: VMware support confirmed my suspicion and suggested upgrading the microcode. What a surprise 🙂

Update August 31, 2016: My case got escalated and Lenovo support acknowledged the issue. It is supposed to be fixed in the September uEFI release. It took them “only” 8 days!

Update November 16, 2016: The new uEFI was released on September 28, 2016.


About Dusan Tekeljak

Dusan has over 6 years of experience in the virtualization field. He currently works as a Senior VMware Platform Architect at one of the biggest retail banks in Slovakia. He has a background in closely related technologies, including server operating systems, networking and storage. He used to be a member of the VMware Center of Excellence at IBM and is a co-author of several Redpapers. His main scope of work consists of designing and optimizing the performance of business-critical virtualized solutions on vSphere, including, but not limited to, Oracle WebLogic, MSSQL and others. He holds several leading IT industry certifications, such as VCAP-DCD, VCAP-DCA and MCITP. He was honored with the #vExpert2015 and 2016 awards by VMware for his contribution to the community. Opinions are my own!