Categories
Uncategorized

HP BL460c G6 NC532i drivers blue-screen crash

We saw repeated blue screens in our 2012 R2 Hyper-V environment during periods of high network activity (Live Migrations) or when viewing the properties of, or disabling, our Network Connections.

Reference: https://community.hpe.com/t5/HPE-BladeSystem-Server-Blades/Proliant-bl460c-G6-BSOD-when-configure-nc532i-NIC/td-p/6927531

[Update] http://h20566.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=3958220&docLocale=en_US&docId=emr_na-c05389904

  • Our scenario involved Windows 2012 R2, but this also applies to Windows 2008, 2008 R2, 2012, and 2016
  • Hardware: HP C7000 enclosure, BL460c G6 blade.
  • If you mess up at any point during these instructions, you may have to start over from the beginning, especially when it comes to disabling the NIC in the BIOS and removing drivers.
  • Per the HP advisory/alert, this problem is caused by the October 2016 HP SPP. If you haven’t installed that SPP yet and would like to, you should probably install cp31808.exe beforehand so that you are already running a newer version of the NIC driver and that part of the SPP upgrade is skipped.
  • Our NICs were previously teamed, and were un-teamed as part of the work below because teaming may have been part of the problem. Un-teaming also simplified matters.
  • Get cp31808.exe from HP onto your server (e.g. copy to C: drive)
    • Run cp31808 and extract contents into a folder
    • Retain cp31808.exe
  • Boot into BIOS (optional, use “One Time Boot from: RBSU” in OA)
  • Disable NIC(s)
    • PCI Device Enable/Disable
    • Disable any/all HP NC532i adapters
  • Boot server
  • Launch Device Manager
    • Select Show Hidden Devices
    • Uninstall offending NC532i/Broadcom devices under Network Adapters (select delete driver option if present)
    • Uninstall offending NC532i/Broadcom devices under System Devices (select delete driver option if present)
    • Optional: There’s an option in recent versions of HP’s Virtual Connect to Hide Unused FlexNICs which will reveal in Windows/your OS only the NICs you need; this may make your job easier and may benefit you in other ways
  • Prevent drivers from auto-installing using GPO
    • gpedit.msc
    • Expand Computer Configuration, expand Administrative Templates, expand System, expand Device Installation, and then click Device Installation Restrictions
    • In the right window, double-click Prevent installation of devices not described by other policy settings
    • Click to select Enabled, and then click OK
    • https://support.microsoft.com/en-us/help/2500967/how-to-stop-windows-7-automatically-installing-drivers
  • Reboot into BIOS (optional, use “One Time Boot from: RBSU” in OA)
  • Enable NIC(s)
    • PCI Device Enable/Disable
    • Enable previously-disabled HP NC532i adapters
  • Reboot
  • Use gpedit to set driver auto-install GPO setting above back to “Not Configured”
    • If re-enabling this now causes problems, then delay this step until the end
  • Launch Device Manager; devices will have repopulated
  • Under System Devices ONLY, “manually” update Broadcom drivers (do NOT update the drivers under Network Adapters at this point)
    • Right-click on Broadcom NIC and Update Driver Software
    • Browse my computer (do not let Windows search for the drivers automatically)
    • Go to folder where you extracted cp31808
    • Broadcom drivers will change to “NC532i”
    • You may only have to do this once and it will take effect for all similar devices, or you may have to do it for each System Device, one at a time
    • Now update the drivers under Network Adapters just as you did with the System Devices, one at a time if necessary
  • Reboot
  • Optional: Launch cp31808.exe and see if it reports as up-to-date and optionally install if not
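
If you would rather script the GPO step above than click through gpedit.msc, here is a minimal Python sketch that only generates .reg file text. It assumes that the “Prevent installation of devices not described by other policy settings” policy is backed by the DenyUnspecified value under the DeviceInstall\Restrictions policy key; that mapping is my assumption and was not part of the original procedure, so verify it on a test machine first. The function name is made up for illustration.

```python
# Sketch only: builds the text of a .reg file, it does not touch the registry.
# ASSUMPTION: the policy "Prevent installation of devices not described by
# other policy settings" is stored as the DWORD value below. Verify this on a
# test machine; a later Group Policy refresh may also overwrite manual changes.
POLICY_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows"
    r"\DeviceInstall\Restrictions"
)
POLICY_VALUE = "DenyUnspecified"


def build_reg_text(block_installs: bool) -> str:
    """Return .reg text that enables the block or removes the value again."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{POLICY_KEY}]"]
    if block_installs:
        # dword:00000001 corresponds to the policy being "Enabled"
        lines.append(f'"{POLICY_VALUE}"=dword:00000001')
    else:
        # "name"=- deletes the value, i.e. back to "Not Configured"
        lines.append(f'"{POLICY_VALUE}"=-')
    return "\r\n".join(lines) + "\r\n"


print(build_reg_text(True))
```

Saving the output of build_reg_text(True) before the NICs are re-enabled and build_reg_text(False) afterwards mirrors the Enabled / Not Configured steps in the list.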

Intel Smart Response can’t be enabled with Windows 8.1 / 2012 R2

Scenario: Single 500GB HDD and single 240GB SSD.  I wanted to use the SSD cache feature to accelerate hard disk access.  Both HDD and SSD are initialized as GPT, with the “system” disk containing a Recovery Partition, an EFI System Partition, and the C: drive (Boot Partition), with the entire disk allocated.  The SSD is shown as Unallocated.  However, Intel’s Rapid Storage Technology would not let me enable Smart Response Technology (SSD caching).  The only performance accelerator offered in the Rapid Storage Technology UI was Dynamic Storage Accelerator.

On my HP EliteDesk 800 G1 SFF, I had to change the disk access mode in the UEFI/BIOS from AHCI to RAID.  Windows then failed to boot (no surprise there).  After switching back to AHCI, I followed the advice from the following post to reboot in Safe Mode to enable RAID access, and that worked.  If I recall correctly, in Windows’ Device Manager, there was no obvious Intel-provided driver under Storage Controllers until after I enabled RAID mode – only a “Microsoft Storage Spaces Controller.”  I obtained the most recent version of the driver from Intel, but was still unable to enable SSD caching.
http://www.eightforums.com/installation-setup/24141-convert-ahci-mode-raid-mode-without-re-installing.html

Following advice from Tom_GPT on the thread below, I shrank my C: drive by 1GB (that size was arbitrary).  This resulted in 1GB of unallocated space at the end of the 500GB hard disk.  I was then able to launch Intel’s RST and enable Smart Response Technology.  Using Intel’s latest versions of both their RST and storage drivers is probably advisable.
https://communities.intel.com/thread/45540?start=15&tstart=0

NetScaler Login exceeds maximum allowed users after 10.1 upgrade

Shortly after our recent NetScaler upgrade from 9.3 -> 10.1, users reported getting the error “Login exceeds maximum allowed users” in their browsers when attempting to log in to the Access Gateway (NetScaler Gateway).  A remote session with a Citrix technician revealed that we had indeed hit our license limit as seen under NetScaler Gateway / Active User Sessions. We did see that some users were logged in two or more times, and it’s possible that the way licenses are consumed under 10.1 is different from 9.3, which might be why we never hit the licensing limit before.  The options presented by the Citrix tech were:

  1. Ask users to deliberately log out of the Access Gateway when they are done (vs. just allowing their sessions to time out) in order to free up their license.  This would, of course, require user education.
  2. Switch our Access Gateway Virtual Server from SmartAccess Mode (includes VPN access) to Basic Mode (ICA proxy-only).  Without taking additional steps such as allowing VPN for just a subset of our users, this option would remove VPN ability for all users from the gateway but allow unlimited connections through the gateway to our apps.
  3. Lower the timeout value for our Access Gateway, forcing users to re-authenticate to the gateway during the workday.

If memory serves, the technician also mentioned that the 10.5 version of NetScaler would allow a user who logged into the Access Gateway more than once to “assume” the license from his/her previous session.  An immediate upgrade to 10.5 was not an option in our case.
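
To get a feel for how much duplicate logins contribute to the problem, the session list can be tallied per user. The Python sketch below assumes each active session consumes one license (which is how it looked in our case, but I have not confirmed the exact licensing rules) and that you have exported the user names from Active User Sessions into a plain list; the function name and sample data are made up.

```python
from collections import Counter


def summarize_sessions(session_users, license_limit):
    """Tally sessions per user, assuming one license per active session."""
    per_user = Counter(session_users)
    # Users who are logged in more than once
    duplicates = {user: n for user, n in per_user.items() if n > 1}
    total = sum(per_user.values())
    return {
        "sessions": total,
        "unique_users": len(per_user),
        "duplicates": duplicates,
        # Licenses that would be freed if every user held only one session
        "reclaimable": sum(n - 1 for n in duplicates.values()),
        "at_limit": total >= license_limit,
    }


# Hypothetical sample: five sessions, three distinct users, limit of five
sample = ["alice", "bob", "alice", "carol", "alice"]
print(summarize_sessions(sample, license_limit=5))
```

If the reclaimable count is a large share of the total, options 1 and 3 above are more likely to help.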

After a quick review of our environment, the technician suggested we switch to Basic Mode on our Virtual Server under NetScaler Gateway / Virtual Servers as no VPN was required in our environment.

NetScaler Integrated Caching behavior after 9.3 -> 10.1 upgrade

After a recent NetScaler upgrade from 9.3 to 10.1, we noticed a change in the behavior of the Integrated Caching feature.  Integrated Caching had been enabled for the previous two years, but with the Memory Usage Limit set to zero, caching had been effectively disabled.  After the upgrade, our PeopleSoft application began displaying incorrect content after users logged in.

We were able to tell that Integrated Caching was delivering cached content by visiting Optimization / Integrated Caching / Content Groups and seeing both “non-304 Hits” and “304 Hits” for the DEFAULT Content Group, along with a non-zero value under Memory Usage.

[Image: Integrated-Caching-10-1]

Since we run in HA mode, we could consult our not-yet-upgraded, 9.3 NetScaler node.  Visiting Integrated Caching / Content Groups / DEFAULT revealed the expected values of zero for Memory Usage, Non-304 Hits, and 304 Hits.

[Image: Integrated-Caching-9-3]

Our solution was to disable Integrated Caching in System / Settings / Configure Basic Features as it wasn’t needed.  As soon as we did this, the undesired content stopped displaying within our PeopleSoft application.

[Image: Integrated-Caching-Disable-10-1]

Redirect URL for SSL_BRIDGE Virtual Server on NetScaler

When you create an SSL_BRIDGE Virtual Server (VIP) in NetScaler, there is no way to specify a Redirect URL (the field is grayed out).  So if your back-end servers are down, there’s no way to specify an outage page.  If you try to create a Responder policy as a workaround, you will be unable to bind it to the SSL_BRIDGE Virtual Server if that policy contains anything other than a DROP or RESET action (http://discussions.citrix.com/topic/336769-ssl-bridge-virtual-server-response-when-down/).

You can, however, use a Listen Policy to deliver your outage page.  Create a new SSL Virtual Server alongside the existing SSL_BRIDGE Virtual Server using the same IP address and port.  (You can’t normally do this, but you can when you specify a Listen Policy on the Advanced tab.)

Example steps for setting up the new SSL Virtual Server in version 9.3 of the GUI (no changes are made to the existing SSL_BRIDGE Virtual Server):

  • Add your new SSL Virtual Server using the same IP and same port as the existing SSL_BRIDGE Virtual Server
  • Do not bind any Services to the new SSL Virtual Server (it will always be DOWN)
  • Set http://outage_page.your_domain.com (or whatever you please) as the Redirect URL on the new SSL Virtual Server
  • Bind a Listen Policy to the new SSL Virtual Server by setting a Listen Priority of 1 and a Listen Policy Rule of SYS.VSERVER("ssl_bridge_virtual_server_name").STATE.NE(up)
  • Add the same SSL certificate that is bound to the SSL_BRIDGE Virtual Server to the new SSL Virtual Server
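
The decision the Listen Policy makes can be sketched in a few lines of Python. This is only a model of the logic as I understand it, not NetScaler code: the virtual server names are placeholders, and it assumes the SSL_BRIDGE Virtual Server keeps receiving connections whenever the Listen Policy Rule does not match.

```python
# Placeholder names for the two virtual servers sharing one IP address and port
BRIDGE_VSERVER = "ssl_bridge_virtual_server_name"
OUTAGE_VSERVER = "ssl_outage_virtual_server_name"


def pick_vserver(bridge_state: str) -> str:
    """Model which virtual server takes a new connection.

    Mirrors the rule SYS.VSERVER("...").STATE.NE(up): the new SSL Virtual
    Server only listens while the SSL_BRIDGE Virtual Server is not UP.
    """
    listen_rule_matches = bridge_state.upper() != "UP"
    return OUTAGE_VSERVER if listen_rule_matches else BRIDGE_VSERVER


def handle_connection(bridge_state: str, redirect_url: str) -> str:
    """Describe what the client ends up seeing."""
    if pick_vserver(bridge_state) == OUTAGE_VSERVER:
        # No Services are bound, so this vserver is always DOWN and
        # answers with its Redirect URL.
        return f"redirect to {redirect_url}"
    return "bridged to back-end servers"


print(handle_connection("DOWN", "http://outage_page.your_domain.com"))
```

The point of leaving the new SSL Virtual Server without Services is visible here: it never serves content itself, it only ever redirects.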

References: http://support.citrix.com/article/CTX139276 and http://support.citrix.com/proddocs/topic/ns-system-10-map/cb-br-aws-acc-encryp-mapi-smb-ssl-br-win-dom-tsk.html

Exchange 2010 Mailbox Move fails with MapiExceptionCallFailed

Moving a mailbox within a single Exchange 2010 SP2 RU8 server failed repeatedly with error text like the following.  Dismounting/remounting the database and running various New-MailboxRepairRequest commands did not fix the issue or provide guidance.

Error: MapiExceptionCallFailed: IExchangeFastTransferEx.TransferBuffer failed (hr=0x80004005, ec=1162)
Diagnostic context:
Lid: 55847 EMSMDBPOOL.EcPoolSessionDoRpc called [length=3004]
Lid: 43559 EMSMDBPOOL.EcPoolSessionDoRpc returned [ec=0x0][length=685][latency=15]
Lid: 23226 --- ROP Parse Start ---
Lid: 27962 ROP: ropFXDstCopyConfig [83]
Lid: 27962 ROP: ropTellVersion [134]
Lid: 27962 ROP: ropFXDstPutBufferEx [157]
Lid: 17082 ROP Error: 0x48A
Lid: 31329
Lid: 21921 StoreEc: 0x48A
Lid: 27962 ROP: ropExtendedError [250]
Lid: 1494 ---- Remote Context Beg ----
Lid: 1238 Remote Context Overflow
Lid: 1947 StoreEc: 0x48A
[…edited…]
Lid: 1750 ---- Remote Context End ----
Lid: 26849
Lid: 21817 ROP Failure: 0x48A
Lid: 22630
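
When reading a dump like this, it helps to confirm that the hex codes on the ROP/StoreEc lines are the same error as the decimal ec value in the first line. Below is a small Python sketch that does this for an excerpt of the log above; the function names are made up for illustration, and it makes no claim about what the code means.

```python
import re

# Excerpt of the move request error shown above
LOG = """Error: MapiExceptionCallFailed: IExchangeFastTransferEx.TransferBuffer failed (hr=0x80004005, ec=1162)
Lid: 17082 ROP Error: 0x48A
Lid: 21921 StoreEc: 0x48A
Lid: 21817 ROP Failure: 0x48A"""


def store_error_codes(text):
    """Distinct codes from ROP Error / StoreEc / ROP Failure lines, as integers."""
    pattern = r"(?:ROP Error|StoreEc|ROP Failure): (0x[0-9A-Fa-f]+)"
    return sorted({int(code, 16) for code in re.findall(pattern, text)})


def header_ec(text):
    """The decimal ec value from the '(hr=..., ec=...)' part of the first line."""
    match = re.search(r"\(hr=0x[0-9A-Fa-f]+, ec=(\d+)\)", text)
    return int(match.group(1)) if match else None


print(store_error_codes(LOG), header_ec(LOG))
```

Here 0x48A and ec=1162 are the same number, so every line in the diagnostic context is reporting one underlying failure rather than several.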

The mailbox’s failed Move Request Log in the EMC showed that the mailbox had 2819 folders total, which seemed high.  There was also a Litigation Hold on the mailbox, which could be part of the problem.  (In the past, a Litigation Hold in combination with an Exchange bug had resulted in a 1TB mailbox on our servers: a 2GB mailbox with 1TB of redundant calendar data.)

Opening the mailbox using OWA simply to check out those 2819 folders revealed the problem: a massive number of nested Junk E-mail folders totaling about 250MB.

[Image: Junk E-mail]

After the folders were deleted using Outlook and OWA, the mailbox moved successfully.  Had deleting the folders using Outlook/OWA proved impossible due to the sheer number of folders, the next things to try would have been Outlook Cached Mode, then perhaps a server-side tool from Microsoft, and then possibly an IMAP client.

HP Service Pack for ProLiant 2013.02.0 update broke NIC teams, VLANs using tagging

Networking was lost after updating HP’s Service Pack for ProLiant on two cluster nodes.  The servers were both BL460c G6 blades with two NICs that had first been teamed, then split back into two network connections using VLAN tagging.  (Multiple Networks had been assigned in Virtual Connect and then HP’s Network Configuration Utility (NCU) was used to team the NICs and configure the VLANs.)  Both servers were running Windows 2008 R2 SP1, with one being a file server cluster node and the other a member of an Exchange 2010 DAG.  The fact that the nodes were clustered is probably not important, but it was in the Failover Cluster Manager where the failures were most obvious, with our client-facing Cluster Network showing as “Failed,” while the connection used for backups remained functional.

On the first server, disabling/enabling the NICs in Windows’ Network Connections provided temporary relief, where the client-facing (Public) NIC would work for several seconds before failing again as seen in Failover Cluster Manager.  Disabling *both* NICs, then re-enabling the Public NIC first, then the Backup NIC second allowed both NICs to remain up.  This was not a permanent solution, obviously, so all networking as seen from Windows was torn down and recreated in Windows’ Network Connections and HP’s NCU, and the configuration was tested with a reboot just to be sure.

A call to HP revealed that this behavior was not unexpected.  Their advice when doing these updates is:

  • Break the team (and reconfigure one or more now-un-teamed NICs with production IP addresses if network connectivity during upgrade is required).
  • Perform the Service Pack for ProLiant upgrade.
  • Recreate the team.

The HP technician also noted that the previously-installed version of NCU was two years old and that there had been one or more significant updates in the interim.  When asked if frequent updates of the HP software would allow skipping breaking/recreating the team when doing these updates, he said HP’s advice would still be to do the above steps.

The second server proved more nettlesome, with HP’s Service Pack for ProLiant causing the production NIC to go offline soon after the start of the installation, losing the Remote Desktop session with the server.  The RDP session could not be re-established, and the server had to be accessed via iLO.  In iLO, the console session was not responsive to mouse or keyboard input, and the server had to be ungracefully reset (resetting iLO through its administrative web page did not help).  After logging in with cached credentials, the Service Pack for ProLiant installation was repeated “successfully” and the server rebooted.

After the reboot, the NCU displayed the error “The version of the miniport driver(s) for the following adapters are not compatible with the HP Network Configuration Utility software installed.”  After that, both NICs in Windows showed as “unplugged” and the NCU did not show any NICs at all.  Running the Service Pack for ProLiant a third time did not remedy the situation, and the CPxxxxxx.EXE file on the Service Pack for ProLiant DVD had to be tracked down and installed individually.  (The Service Pack for ProLiant was identifying the NIC driver as already upgraded, so it was not offering to upgrade it by default.  It may have been possible to force the upgrade from within the Service Pack for ProLiant, without tracking down the individual driver installation file on the DVD.)

After the reboot following the NIC driver upgrade, the NICs did not function normally on this server.  A tear-down and recreate of the team did not help as it had on the previous server either.  With eight NICs (FlexNICs) available to this blade, the solution was to create two separate teams (one for client traffic and the other for backup traffic).

With other blades in the chassis not having the luxury of eight NICs, the solution for those servers may be to remove teaming altogether.  Hopefully, this won’t be required and simply following HP’s advice to dissolve the team prior to the Service Pack for ProLiant upgrade and recreate it after will avoid the problem entirely.

2003 R2 Windows Update fails with Error Number 0x80072EE2

After installing Windows Server 2003 R2 in a VM for some quick testing, I installed SP2, rebooted, then went to Windows Update to download an expected multitude of Windows patches.  However, Windows Update failed repeatedly with “Error Number 0x80072EE2” displayed in the browser window.  Updating Windows Update to “Microsoft Update” and IE6 to IE 8 did not help, nor did deleting the SoftwareDistribution folder.

Oddly, turning off the Windows Firewall fixed the problem.  Below is a link to an article that indicated turning the firewall *on* solved the same problem.  With the server fully patched and after re-enabling the firewall, I am still seeing the same behavior (117 patches were installed post-SP2).  I don’t yet know if Automatic Updates (non-browser) will experience the same problem with the firewall enabled.
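
The error number itself gives a hint when split into its HRESULT fields. The Python sketch below uses the same shifts and masks as the HRESULT_FACILITY and HRESULT_CODE macros in winerror.h; the function name is mine. Facility 7 is FACILITY_WIN32, and code 12002 is the WinINet/WinHTTP timeout error, which is at least consistent with something on the network path (such as a firewall) getting in the way.

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into severity, facility, and code fields."""
    return {
        "failure": bool(value & 0x80000000),   # top bit set means failure
        "facility": (value >> 16) & 0x1FFF,    # 7 is FACILITY_WIN32
        "code": value & 0xFFFF,                # underlying Win32 error code
    }


fields = decode_hresult(0x80072EE2)
print(fields)  # code 12002 is ERROR_INTERNET_TIMEOUT
```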

Reference: http://forums.techarena.in/windows-update/707782.htm

The previous installation path could not be found in the registry

I recently upgraded (or tried to upgrade) an Exchange 2010 SP1 + Rollup 6 server to SP2 + Rollup 1 in our test lab.  (I had copied the Rollup 1 installation to the Updates folder within the SP2 installation point so that it would be installed along with SP2.)  With the upgrade to SP2+RU1 halfway completed, the upgrade failed with the following error:

“The previous installation path could not be found in the registry.  Only disaster recovery mode is available.”
“The previously installed version could not be determined from the registry. Only disaster recovery mode is available.”

The server was pretty much unrecoverable.  Exchange 2010 services still existed in Server Manager, but were all Disabled.  Re-running the installation, even from the command line (setup.com /mode:RecoverServer), did not work.  Removing the afflicted server from Exchange groups using ADUC and deleting the Exchange server itself from AD using ADSIEDIT.msc did not help.  NB: If I had not tinkered in ADUC and ADSIEDIT, I could have wiped the server, reinstalled the OS, and run setup.com /mode:RecoverServer, and that should have restored the server to its original state, as the server’s details are stored in AD.

Here is Microsoft’s explanation from http://technet.microsoft.com/en-us/library/ff637981.aspx:
“The Updates folder isn’t supported for use during a service pack installation. Therefore, you can’t include (that is, slipstream) an update rollup along with the installation of a service pack. The slipstream installation of an update rollup during a service pack installing hasn’t been tested. Therefore, you may experience unintended results.”

So, this works for new installations, but should not be used for service pack upgrades.  What Microsoft doesn’t mention is whether or not you can copy the Rollup into the Updates folder in a SP2 installation point when you’re installing a new SP2 server from scratch.

Reference: http://support.risualblogs.com/blog/2011/01/24/exchange-2010-sp1-upgrade-failed/

LDAP requests to Active Directory, Mac Lion, and DFS fail when domain name points to web servers

We set up ldap.domain.com (ports 389 and 636) on our hardware load balancer, forwarding the LDAP requests on to our Active Directory domain controllers.  It works just fine for most of our applications.  We did find that it failed with some applications, however, because our domain controllers were performing an “LDAP referral,” referring the LDAP client initially connecting to “ldap.domain.com” to instead use “domain.com.”  Normally, LDAP requests to “domain.com” would work correctly in an Active Directory environment thanks to how Active Directory populates DDNS: each Domain Controller registers a “same as parent folder” A Record, so requests sent to the domain name are answered by the Domain Controllers in a Round Robin fashion.  However, in our environment, “domain.com” points to our web servers via our load balancer so that users can simply type “domain.com” into their web browsers.  The LDAP requests were simply being referred to something that couldn’t answer them.

We received some advice to use the Global Catalog ports 3268 and 3269 for clear-text and encrypted LDAP traffic, respectively, instead of the standard LDAP ports 389 and 636, because AD would not do an LDAP referral when accessed via these ports.  This worked.  What might also work is configuring the hardware load-balancer (in our case) to divert port 389 and 636 traffic to the domain controllers while still sending port 80 traffic to our web servers.
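
As a quick reference for the port choice, here is a minimal Python sketch that builds the LDAP URL an application would be pointed at. The function name is made up, and the host is the ldap.domain.com placeholder from above. One thing to check before switching an application over: the Global Catalog ports only return the subset of attributes that are replicated to the Global Catalog, so an application that needs other attributes may still require 389/636.

```python
# (use_tls, global_catalog) -> port
LDAP_PORTS = {
    (False, False): 389,    # standard LDAP, clear-text
    (True, False): 636,     # standard LDAP over SSL/TLS
    (False, True): 3268,    # Global Catalog, clear-text
    (True, True): 3269,     # Global Catalog over SSL/TLS
}


def ldap_url(host: str, use_tls: bool, global_catalog: bool) -> str:
    """Build an LDAP URL for the given host and port choice."""
    scheme = "ldaps" if use_tls else "ldap"
    return f"{scheme}://{host}:{LDAP_PORTS[(use_tls, global_catalog)]}"


print(ldap_url("ldap.domain.com", use_tls=True, global_catalog=True))
```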

Allowing additional ports 445, 139, 137, etc. through to the DCs also allowed us to set up our domain-based DFS namespace successfully.  Attempts at doing this had previously failed with the error “The namespace cannot be queried.  The RPC server is unavailable.”  Allowing these same ports through to the DCs also allows Mac OS X Lion (10.7) clients to connect to these domain-based DFS Namespaces/shares.  Before the change, their attempts to connect to the DFS domain-based namespace had failed with “There was a problem connecting to the server “domain.com”.  The server may not exist or it is unavailable at this time.”
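
A quick way to confirm from a client that the TCP ports actually make it through to the DCs is a plain connection test. The Python sketch below uses only the standard library; the function names are made up, and it only covers TCP ports such as 139 and 445 (137 is UDP, which a connect test like this cannot verify).

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed
        return False


def check_ports(host: str, ports=(139, 445)) -> dict:
    """Map each TCP port to whether it accepted a connection."""
    return {port: tcp_port_open(host, port) for port in ports}
```

Running check_ports("domain.com") from an affected client before and after the load balancer change should show the ports flipping from False to True.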

Giving your AD domain name to your web servers in DNS can cause other problems beyond the above.  We’ve so far been able to overcome any issues created by this design, but it would be better not to have this setup in the first place.