Networking was lost after applying HP’s Service Pack for ProLiant to two cluster nodes. Both servers were BL460c G6 blades with two NICs. The NICs had first been teamed, then split back into two network connections using VLAN tagging. (Multiple Networks had been assigned in Virtual Connect, and HP’s Network Configuration Utility (NCU) was then used to team the NICs and configure the VLANs.) Both servers were running Windows Server 2008 R2 SP1. One was a file server cluster node and the other a member of an Exchange 2010 DAG. The fact that the nodes were clustered is probably not important, but the failures were most obvious in Failover Cluster Manager. There, our client-facing Cluster Network showed as “Failed,” while the connection used for backups remained functional.
On the first server, disabling and re-enabling the NICs in Windows’ Network Connections provided only temporary relief. The client-facing (Public) NIC would work for several seconds and then fail again, as seen in Failover Cluster Manager. Disabling *both* NICs, then re-enabling the Public NIC first and the Backup NIC second, allowed both NICs to stay up. This was obviously not a permanent solution. All networking as seen from Windows was therefore torn down and recreated in Windows’ Network Connections and HP’s NCU, and the configuration was tested with a reboot to be sure.
A call to HP revealed that this behavior was not unexpected. Their advice when doing these updates is:
- Break the team (and reconfigure one or more of the now-unteamed NICs with production IP addresses if network connectivity is required during the upgrade).
- Perform the Service Pack for ProLiant upgrade.
- Recreate the team.
The HP technician also noted that the previously installed version of NCU was two years old and that there had been one or more significant updates in the interim. When asked whether keeping the HP software current would make it possible to skip breaking and recreating the team during these updates, he said HP would still recommend the steps above.
The second server proved more nettlesome. HP’s Service Pack for ProLiant caused the production NIC to go offline soon after the installation started, dropping the Remote Desktop session with the server. The RDP session could not be re-established, so the server had to be accessed through iLO. The iLO console session did not respond to mouse or keyboard input, and the server had to be reset ungracefully (resetting iLO through its administrative web page did not help). After logging in with cached credentials, the Service Pack for ProLiant installation was repeated “successfully” and the server was rebooted.
After the reboot, both NICs in Windows showed as “unplugged,” and HP’s NCU did not show any NICs at all. Both problems followed an NCU error stating that “The version of the miniport driver(s) for the following adapters are not compatible with the HP Network Configuration Utility software installed.” Running the Service Pack for ProLiant a third time did not fix the situation. The CPxxxxxx.EXE file for the NIC driver had to be tracked down on the Service Pack for ProLiant DVD and installed individually. (The Service Pack for ProLiant identified the NIC driver as already upgraded, so it did not offer to upgrade it by default. It may have been possible to force the upgrade from within the Service Pack for ProLiant without tracking down the individual driver installation file on the DVD.)
Even after the reboot that followed the NIC driver upgrade, the NICs on this server still did not function normally. Unlike on the first server, tearing down and recreating the team did not help. This blade had eight NICs (FlexNICs) available, so the solution was to create two separate teams: one for client traffic and the other for backup traffic.
Other blades in the chassis do not have the luxury of eight NICs, so the solution for those servers may be to remove teaming altogether. Hopefully that won’t be necessary, and simply following HP’s advice to dissolve the team before the Service Pack for ProLiant upgrade and recreate it afterward will avoid the problem entirely.