How to fix driver issues when upgrading VMware ESXi.

Kinamo maintains multiple dedicated VMware vSphere clusters for our customers.
One of the critical tasks in maintaining a healthy and performant vSphere cluster is keeping up with updates.
This blog post shares one way of dealing with driver issues when upgrading a host from VMware ESXi 7.0 Update 2 to VMware ESXi 7.0 Update 2a.

We recently found ourselves in a situation where one of our customers’ vSphere clusters ended up running ESXi 7.0 Update 2 Build 17630552,
a build that VMware took offline on 12 March 2021 due to upgrade-impacting issues.
When trying to upgrade these servers to ESXi 7.0 Update 2a (Build 17867351), we encountered some errors.

Please note that apart from the DellEMC vendor add-ons for PowerEdge Servers provided by Lifecycle Manager itself, these images were not customized at all.

The issue here was a downgrade of an add-on component, namely “Mellanox Native OFED ConnectX-3 Drivers” (version 3.19.70.1). The downgrade is unsupported and presented a blocking issue in the vSphere cluster: the hosts could not move forward in the upgrade path.

Since this driver is not used by our host systems, the issue could be fixed by removing the driver from the ESXi host operating system.
The rest of this article gives you a step-by-step approach to removing a driver from an ESXi host.

How do we remove a driver from an ESXi host?

First, enable SSH access to the host, then log in over SSH using the “root” credentials.
The next step is to remove the Mellanox drivers using the following command:

esxcli software vib remove -n nmlx5-core -n nmlx5-rdma -n nmlx4-core -n nmlx4-en -n nmlx4-rdma

This will generate the following output:

Removal Result
Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.

Reboot Required: true

VIBs Installed:
VIBs Removed: MEL_bootbank_nmlx4-core_3.19.70.1-1OEM.670.0.0.8169922, MEL_bootbank_nmlx4-en_3.19.70.1-1OEM.670.0.0.8169922, MEL_bootbank_nmlx4-rdma_3.19.70.1-1OEM.670.0.0.8169922, MEL_bootbank_nmlx5-core_4.19.70.1-1OEM.700.1.0.15525992, MEL_bootbank_nmlx5-rdma_4.19.70.1-1OEM.700.1.0.15525992

VIBs Skipped:
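
Before removing anything for real, it can help to see which Mellanox VIBs are actually installed and to let esxcli report what it would do. The following is a minimal sketch, not the exact procedure we ran: `build_remove_args` is a hypothetical helper introduced here for illustration, and the `--dry-run` option is assumed to be available in your ESXi build (check `esxcli software vib remove --help` to confirm).

```shell
#!/bin/sh
# Sketch only: build_remove_args is a hypothetical helper, not part of esxcli.

# Turn a list of VIB names into "-n name" arguments for
# `esxcli software vib remove`.
build_remove_args() {
    args=""
    for vib in "$@"; do
        args="$args -n $vib"
    done
    # Strip the single leading space before printing.
    printf '%s\n' "${args# }"
}

# The VIB names used in the removal command in this article.
VIBS="nmlx5-core nmlx5-rdma nmlx4-core nmlx4-en nmlx4-rdma"

# Only call esxcli when it exists, so the script is harmless elsewhere.
if command -v esxcli >/dev/null 2>&1; then
    # Show the Mellanox VIBs that are currently installed.
    esxcli software vib list | grep -i nmlx

    # Assumption: --dry-run only reports the planned operations and
    # changes nothing. $VIBS is left unquoted on purpose so that it
    # splits into separate names.
    esxcli software vib remove --dry-run $(build_remove_args $VIBS)
fi
```

If the dry run lists the same VIBs as the output above, the same arguments without `--dry-run` perform the actual removal.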

Your next and final step is to reboot the host, either from the SSH command line using the command “reboot” or from the GUI.
After the reboot, the offending driver no longer blocks the upgrade through Lifecycle Manager.
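
If you script these steps, the reboot can be made conditional on what the removal reported. This is a rough sketch under a few assumptions: `reboot_required` is a hypothetical helper, `/tmp/vib-remove.txt` is a hypothetical file to which you saved the output of the removal command, and `esxcli system shutdown reboot` is used here as an alternative to the plain “reboot” command. As far as we know, that esxcli variant expects the host to be in maintenance mode.

```shell
#!/bin/sh
# Sketch only: reboot_required and RESULT_FILE are illustrative names.

# Succeeds when the esxcli output on stdin reports that a reboot is needed.
reboot_required() {
    grep -q 'Reboot Required: true'
}

# Hypothetical file holding the saved output of the removal command.
RESULT_FILE="/tmp/vib-remove.txt"

# Only act on a host that has esxcli and a saved result file.
if command -v esxcli >/dev/null 2>&1 && [ -f "$RESULT_FILE" ]; then
    if reboot_required < "$RESULT_FILE"; then
        # A reason string is passed along so the reboot is traceable in the logs.
        esxcli system shutdown reboot --reason "Removed unused Mellanox nmlx VIBs"
    fi
fi
```

Once the host is back up, `esxcli software vib list | grep -i nmlx` should return nothing if the removal took effect.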

We generally discourage making changes to your hosts using SSH and esxcli, but in this case it does provide a solution.
The manual change to the host is undone by the upgrade process, which we run directly after this change, bringing the host back into compliance.

Please be aware that running esxcli commands over SSH may put your system in a state that is unsupported by VMware Support!
As with all commands copy-pasted from the Internet, use them carefully and wisely.
Remember to perform these actions while your host is in maintenance mode and not running any virtual machines!
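
The maintenance mode check can also be done from the same SSH session before touching any VIBs. A minimal sketch, assuming `esxcli system maintenanceMode get` prints `Enabled` or `Disabled` on your build; `in_maintenance_mode` is a hypothetical helper introduced for illustration.

```shell
#!/bin/sh
# Sketch only: in_maintenance_mode is an illustrative helper.

# Succeeds when the given `esxcli system maintenanceMode get` output
# is exactly "Enabled".
in_maintenance_mode() {
    [ "$1" = "Enabled" ]
}

# Only query the host when esxcli is available.
if command -v esxcli >/dev/null 2>&1; then
    state="$(esxcli system maintenanceMode get)"
    if in_maintenance_mode "$state"; then
        echo "Host is in maintenance mode."
    else
        echo "Host is NOT in maintenance mode; running VMs:"
        # List any virtual machines still running on this host.
        esxcli vm process list
    fi
fi
```

If the host is not yet in maintenance mode, evacuate or shut down its virtual machines first, then enable it from the GUI or with `esxcli system maintenanceMode set --enable true`.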

Kinamo & VMware

Kinamo has more than 18 years of experience in cloud hosting, managed services, DevOps and development.
As with all Kinamo services, we love projects that require an “out of the box” approach. We’re proud of the fact that we can offer accessible and expert support to our clients, all thanks to our specialist team here in Antwerp, Belgium.

Do you have a question about this article? Or are you wondering if this accessible and expert support in Antwerp is real? Let’s talk.