UCS Upgrade failed

Over the last few days I tried to upgrade my UCS domains from 2.2(7c) to 2.2(8g).

I have two UCS domains. One of them went through the upgrade fine, the other one did not :/

See what happened and how we fixed it …

I went through the steps I described in

STUMBLING BLOCKS IN UPGRADING CISCO UCS (PART 1OF2)

STUMBLING BLOCKS IN UPGRADING CISCO UCS (PART 2OF2)

In short form:

  1. Upgrade UCS Manager
  2. Upgrade IO Modules of the Chassis
  3. Upgrade Fabric Interconnect

After the reboot the upgraded Fabric Interconnect (FI) came up in setup / config mode. This means there was no active config on it after the reboot.

WARNING: Please do the following with a Cisco TAC Engineer or at your own risk!

 

First we tried to reconnect to the running FI and pull its config:

Type the hot key to suspend the connection: <CTRL>Q

  Enter the configuration method. (console/gui) ?     console

  Installer has detected the presence of a peer Fabric interconnect. This Fabric interconnect will be added to the cluster. Continue (y/n) ? y

  Enter the admin password of the peer Fabric interconnect: 
    Connecting to peer Fabric interconnect... unable to connect! Password could be wrong.
    Please ensure that the authentication mode on peer Fabric interconnect is set to 'Local'
    Hit enter to try again or type 'restart' to start setup from beginning... 
    ? 

   Connecting to peer Fabric interconnect... done
    Retrieving config from peer Fabric interconnect... done
/isan/bin/getversion: error while loading shared libraries: libosiris.so: cannot open shared object file: No such file or directory
    Installer has determined that the peer Fabric Interconnect is running a different firmware version than the local Fabric. Cannot join cluster.
 
    Local Fabric Interconnect
      UCSM version     : 
      Kernel version   : 
      System version   : 
      local_model_no   : 6248

    Peer Fabric Interconnect
      UCSM version     : 2.2(8g)
      Kernel version   : 5.2(3)N2(2.27c)
      System version   : 5.2(3)N2(2.27c)
      peer_model_no    : 6248


  Do you wish to update firmware on this Fabric Interconnect to the Peer's version? (y/n): y
Updating firmware of Fabric Interconnect....... [ Please don't press Ctrl+c while updating firmware ]
 Updating images 
 Please wait for firmware update to complete.... 
 Checking the Compatibility of new Firmware..... [ Please don't Press ctrl+c ]. 
Verifying image bootflash:/installables/switch/ucs-6100-k9-kickstart.5.2.3.N2.2.27c.bin for boot variable "kickstart".
[#                   ]   0%[####################] 100% -- SUCCESS

Verifying image bootflash:/installables/switch/ucs-6100-k9-system.5.2.3.N2.2.27c.bin for boot variable "system".
[#                   ]   0%[####################] 100% -- SUCCESS

Verifying image type.
[#                   ]   0%[#####               ]  20%[#######             ]  30%[#########           ]  40%[###########         ]  50%[###########         ]  50%[###########         ]  50%[################### ]  90%[####################] 100%[####################] 100% -- SUCCESS

Extracting "system" version from image bootflash:/installables/switch/ucs-6100-k9-system.5.2.3.N2.2.27c.bin.
[#                   ]   0%[####################] 100% -- SUCCESS

Extracting "kickstart" version from image bootflash:/installables/switch/ucs-6100-k9-kickstart.5.2.3.N2.2.27c.bin.
[#                   ]   0%[####################] 100% -- SUCCESS

Extracting "bios" version from image bootflash:/installables/switch/ucs-6100-k9-system.5.2.3.N2.2.27c.bin.
[#                   ]   0%[####################] 100% -- SUCCESS

Performing module support checks.
[####################] 100% -- SUCCESS

Notifying services about system upgrade.
[####################] 100% -- SUCCESS



Compatibility check is done:
Module  bootable          Impact  Install-type  Reason
------  --------  --------------  ------------  ------
     1       yes      disruptive         reset  Incompatible image



Images will be upgraded according to following table:
Module       Image         Running-Version             New-Version  Upg-Required
------  ----------  ----------------------  ----------------------  ------------
     1      system         5.2(3)N2(2.28g)         5.2(3)N2(2.27c)           yes
     1   kickstart         5.2(3)N2(2.28g)         5.2(3)N2(2.27c)           yes
     1        bios      v3.6.0(05/09/2012)      v3.6.0(05/09/2012)            no
     1      SFP-uC                v1.1.0.0                v1.0.0.0            no
     1   power-seq                    v3.0                    v3.0            no
     3   power-seq                    v2.0                    v2.0            no
     1          uC                v1.2.0.1                v1.2.0.1            no


Switch will be reloaded for disruptive upgrade.

Install is in progress, please wait.

Performing runtime checks.
[####################] 100% -- SUCCESS

Setting boot variables.
[#                   ]   0%[####################] 100% -- SUCCESS

Performing configuration copy.
[#                   ]   0%[###                 ]  10%[####                ]  15%[#####               ]  20%[######              ]  25%[#######             ]  30%[########            ]  35%[#########           ]  40%[##########          ]  45%[###########         ]  50%[#############       ]  60%[##############      ]  65%[###############     ]  70%[################    ]  75%[#################   ]  80%[##################  ]  85%[################### ]  90%[####################]  95%[####################] 100%[####################] 100% -- SUCCESS

Converting startup config.
[#                   ]   0%[####################] 100% -- SUCCESS

Install has been successful.
 Firmware Updation Successfully Completed. Please wait to enter the IP address 





  Type 'reboot' to abort configuration and reboot system
  or hit enter to continue. (reboot/<CR>) ? 
    Peer Fabric interconnect Mgmt0 IPv4 Address: 10.0.0.1
    Peer Fabric interconnect Mgmt0 IPv4 Netmask: 255.255.255.0
    Cluster IPv4 address          : 10.0.0.3
 
    Peer FI is IPv4 Cluster enabled. Please Provide Local Fabric Interconnect Mgmt0 IPv4 Address  

  Physical Switch Mgmt0 IP address : 

   Mgmt0 IP must be specified

  Physical Switch Mgmt0 IP address : 10.0.0.2


  Apply and save the configuration (select 'no' if you want to re-enter)? (yes/no): 
  Type 'reboot' to abort configuration and reboot system
  or hit enter to continue. (reboot/<CR>) ? 


  Apply and save the configuration (select 'no' if you want to re-enter)? (yes/no): yes
  Applying configuration. Please wait.

Tue Oct 31 11:59:53 UTC 2017

  Type 'reboot' to abort configuration and reboot system
  or hit enter to continue. (reboot/<CR>) ?   Configuration file - Ok
2017 Oct 31 12:00:10 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: dhcpd crashed with crash type:256

2017 Oct 31 12:00:10 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:00:11 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: dhcpd crashed with crash type:256

2017 Oct 31 12:00:11 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:00:11 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: dhcpd crashed with crash type:256

2017 Oct 31 12:00:11 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:00:11 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: dhcpd crashed with crash type:256

2017 Oct 31 12:00:11 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:00:12 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: dhcpd crashed with crash type:256

2017 Oct 31 12:00:12 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:00:12 UCS1-B %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: dhcpd - pmon



User Access Verification
UCS1-B login: admin
Password: 
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2017, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php

At this point the FI did not accept any commands, so we did a “hard” reboot. Again, there was no config on the FI.

N5000 BIOS v.3.6.0, Wed 05/09/2012, 03:15 PM 

989CB4B4B4B4B4B4B4B4B4B49999999299A0A2A3A0A2A3B2
B2Version 2.00.1201. Copyright (C) 2009 American Megatrends, Inc.
Booting kickstart image: bootflash:/installables/switch/ucs-6100-k9-kickstart.5.2.3.N2.2.27c.bin....
...............................................................................
...........................................Image verification OK

Usage: init 0123456SsQqAaBbCcUu

INIT: [   10.657597] I2C - Mezz absent 
Starting system POST.....
  Executing Mod 1 1 SEEPROM Test:...done (0 seconds)
  Executing Mod 1 1 GigE Port Test:....done (32 seconds)
  Executing Mod 1 1 PCIE Test:.................done (0 seconds)
  Mod 1 1 Post Completed Successfully
POST is completed
can't create lock file /var/lock/mtab~207: No such file or directory (use -n flag to override)
S10mount-ramfs.supnuovaca Mounting /isan 3000m
Mounted /isan
Creating /callhome..
Mounting /callhome..
Creating /callhome done.
Callhome spool file system init done.
nohup: redirecting stderr to stdout
autoneg unmodified, ignoring
autoneg unmodified, ignoring
Checking all filesystems..r.r.r. done.
Checking NVRAM block device ... done
The startup-config won't be used until the next reboot.
. 
Loading system software
Uncompressing system image: bootflash:/installables/switch/ucs-6100-k9-system.5.2.3.N2.2.27c.bin

Loading plugin 0: core_plugin...
Loading plugin 1: eth_plugin...
Loading plugin 2: fc_plugin...


13+1 records in
13+1 records out
10240 bytes (10 kB) copied, 5.7017e-05 s, 180 MB/s
ethernet end-host mode on CA
FC end-host mode on CA
n_port virtualizer mode.
---------------------------------------------------------------

INIT: Entering runlevel: 3

touch: cannot touch `/var/lock/subsys/netfs': No such file or directory
/isan/bin/muxif_config: fex vlan id: -f,4042
Set name-type for VLAN subsystem. Should be visible in /proc/net/vlan/config
Added VLAN with VID == 4042 to IF -:muxif:-
cp: cannot stat `/isan/plugin_img/fex.bin': No such file or directory

---------------------
enabled fc feature
---------------------
2017 Oct 31 12:09:31  %$ VDC-1 %$ %USER-2-SYSTEM_MSG: CLIS: loading cmd files begin  - clis

2017 Oct 31 12:09:34  %$ VDC-1 %$ Oct 31 12:09:34 %KERN-0-SYSTEM_MSG: [   10.657597] I2C - Mezz absent  - kernel

2017 Oct 31 12:09:41  %$ VDC-1 %$ %USER-2-SYSTEM_MSG: CLIS: loading cmd files end  - clis

2017 Oct 31 12:09:41  %$ VDC-1 %$ %USER-2-SYSTEM_MSG: CLIS: init begin  - clis

2017 Oct 31 12:09:49  %$ VDC-1 %$ %SNMPD-2-CRITICAL: SNMP log critical : load_mib_module :Error, while loading the mib module /isan/lib/libsvc_sam_extSnmpPlugin.so (/isan/lib/libsvc_sam_extSnmpPlugin.so: cannot open shared object file: No such file or directory)  

2017 Oct 31 12:09:54  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_controller crashed with crash type:32512

2017 Oct 31 12:09:54  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:09:54  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dme crashed with crash type:32512

2017 Oct 31 12:09:54  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:09:54  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dcosAG crashed with crash type:32512

2017 Oct 31 12:09:54  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:09:55  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_bladeAG crashed with crash type:32512

2017 Oct 31 12:09:55  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:09:55  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_portAG crashed with crash type:32512

2017 Oct 31 12:09:55  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:09:55  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_hostagentAG crashed with crash type:32512

2017 Oct 31 12:09:55  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:09:56  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_nicAG crashed with crash type:32512

2017 Oct 31 12:09:56  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:09:56  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_extvmmAG crashed with crash type:32512

2017 Oct 31 12:09:56  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:09:56  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_cliD crashed with crash type:32256

2017 Oct 31 12:09:56  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:09:56  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_pamProxy crashed with crash type:32512

2017 Oct 31 12:09:56  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:09:57  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_controller crashed with crash type:32512

2017 Oct 31 12:09:57  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:09:57  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dme crashed with crash type:32512

2017 Oct 31 12:09:57  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:09:57  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dcosAG crashed with crash type:32512

2017 Oct 31 12:09:57  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:09:58  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_bladeAG crashed with crash type:32512

2017 Oct 31 12:09:58  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:09:58  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_portAG crashed with crash type:32512

2017 Oct 31 12:09:58  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:09:59  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_hostagentAG crashed with crash type:32512

2017 Oct 31 12:09:59  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:00  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_nicAG crashed with crash type:32512

2017 Oct 31 12:10:00  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:00  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_extvmmAG crashed with crash type:32512

2017 Oct 31 12:10:00  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:01  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_cliD crashed with crash type:32256

2017 Oct 31 12:10:01  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:01  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_pamProxy crashed with crash type:32512

2017 Oct 31 12:10:01  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:02  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_controller crashed with crash type:32512

2017 Oct 31 12:10:02  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:03  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dme crashed with crash type:32512

2017 Oct 31 12:10:03  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:03  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dcosAG crashed with crash type:32512

2017 Oct 31 12:10:03  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:04  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_bladeAG crashed with crash type:32512

2017 Oct 31 12:10:04  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:04  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_portAG crashed with crash type:32512

2017 Oct 31 12:10:04  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:04  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_hostagentAG crashed with crash type:32512

2017 Oct 31 12:10:04  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:05  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_nicAG crashed with crash type:32512

2017 Oct 31 12:10:05  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:05  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_extvmmAG crashed with crash type:32512

2017 Oct 31 12:10:05  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:05  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_cliD crashed with crash type:32256

2017 Oct 31 12:10:05  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:06  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_pamProxy crashed with crash type:32512

2017 Oct 31 12:10:06  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:06  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_controller crashed with crash type:32512

2017 Oct 31 12:10:06  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:06  %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_controller - pmon

2017 Oct 31 12:10:06  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dme crashed with crash type:32512

2017 Oct 31 12:10:06  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:06  %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_dme - pmon

2017 Oct 31 12:10:07  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dcosAG crashed with crash type:32512

2017 Oct 31 12:10:07  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:07  %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_dcosAG - pmon

2017 Oct 31 12:10:07  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_bladeAG crashed with crash type:32512

2017 Oct 31 12:10:07  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:07  %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_bladeAG - pmon

2017 Oct 31 12:10:07  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_portAG crashed with crash type:32512

2017 Oct 31 12:10:07  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:07  %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_portAG - pmon

2017 Oct 31 12:10:08  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_hostagentAG crashed with crash type:32512

2017 Oct 31 12:10:08  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:08  %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_hostagentAG - pmon

2017 Oct 31 12:10:08  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_nicAG crashed with crash type:32512

2017 Oct 31 12:10:08  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:08  %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_nicAG - pmon

2017 Oct 31 12:10:08  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_extvmmAG crashed with crash type:32512

2017 Oct 31 12:10:08  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:08  %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_extvmmAG - pmon

2017 Oct 31 12:10:09  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_cliD crashed with crash type:32256

2017 Oct 31 12:10:09  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:09  %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_cliD - pmon

2017 Oct 31 12:10:09  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_pamProxy crashed with crash type:32512

2017 Oct 31 12:10:09  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

2017 Oct 31 12:10:09  %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_pamProxy - pmon

2017 Oct 31 12:10:27  %$ VDC-1 %$ %CALLHOME-2-EVENT: httpd.sh crashed with crash type:0

2017 Oct 31 12:10:27  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH

System is coming up ... Please wait ...
System is coming up ... Please wait ...
System is coming up ... Please wait ...
2017 Oct 31 12:10:46  %$ VDC-1 %$ %VDC_MGR-2-VDC_ONLINE: vdc 1 has come online 

System is coming up ... Please wait ...
nohup: appending output to `nohup.out'
2017 Oct 31 12:11:01 switch %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Running in PIO stats mode  - carmelusd

2017 Oct 31 12:11:01 switch %$ VDC-1 %$ %CALLHOME-2-EVENT: httpd.sh crashed with crash type:0

2017 Oct 31 12:11:01 switch %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH


           ---- Basic System Configuration Dialog ----

  This setup utility will guide you through the basic configuration of
  the system. Only minimal configuration including IP connectivity to
  the Fabric interconnect and its clustering mode is performed through these steps.

  Type Ctrl-C at any time to abort configuration and reboot system.
  To back track or make modifications to already entered values,
  complete input till end of section and answer no when prompted
  to apply configuration.

  
  Enter the configuration method. (console/gui) ? 
  Type 'reboot' to abort configuration and reboot system
  or hit enter to continue. (reboot/<CR>) ? 2017 Oct 31 12:11:34 switch %$ VDC-1 %$ %CALLHOME-2-EVENT: httpd.sh crashed with crash type:0

2017 Oct 31 12:11:34 switch %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH
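With that many CALLHOME lines scrolling by, it is hard to see which processes actually gave up. Below is a minimal Python sketch for summarizing a saved console log. It assumes the log was captured to text, and the function name `summarize_crashes` and the `SAMPLE` excerpt are made up for illustration; the two patterns are taken from the lines shown above.

```python
import re
from collections import Counter

# Patterns copied from the console lines above; everything else is illustrative.
CRASH_RE = re.compile(r"%CALLHOME-2-EVENT: (\S+) crashed with crash type:(\d+)")
EXHAUSTED_RE = re.compile(r"Restart count exhausted for process: (\S+)")


def summarize_crashes(log_text):
    """Return (crash count per process, set of processes pmon stopped restarting)."""
    crashes = Counter()
    exhausted = set()
    for line in log_text.splitlines():
        match = CRASH_RE.search(line)
        if match:
            crashes[match.group(1)] += 1
            continue
        match = EXHAUSTED_RE.search(line)
        if match:
            exhausted.add(match.group(1))
    return crashes, exhausted


# Short excerpt of the log above, used only as sample input.
SAMPLE = """\
2017 Oct 31 12:10:06  %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dme crashed with crash type:32512
2017 Oct 31 12:10:06  %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH
2017 Oct 31 12:10:06  %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_dme - pmon
2017 Oct 31 12:10:27  %$ VDC-1 %$ %CALLHOME-2-EVENT: httpd.sh crashed with crash type:0
"""

if __name__ == "__main__":
    crashes, exhausted = summarize_crashes(SAMPLE)
    for name, count in crashes.most_common():
        flag = " (restart count exhausted)" if name in exhausted else ""
        print(f"{name}: {count} crash(es){flag}")
```

Run against the full log, this should make it obvious that all the `svc_sam_*` services died, which fits the missing shared libraries reported earlier in the boot.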

We had to fix the boot partitions. To get into the bootloader, you have to keep pressing Ctrl+L during the boot process:

N5000 BIOS v.3.6.0, Wed 05/09/2012, 03:15 PM 

989CB4B4B4B4B4B4B4B4B4B49999999299A0A2A3A0A2A3B2
B2Version 2.00.1201. Copyright (C) 2009 American Megatrends, Inc.
User break into bootloader


loader> dir
bootflash:
  span.log
  ucs-6100-k9-kickstart.5.0.3.N2.2.02q.bin
  ucs-6100-k9-system.5.0.3.N2.2.02q.bin
  chassis.img
  pnuos
  nuova-sim-mgmt-nsg.0.1.0.001.bin
  chassis2.img
  fexth.bin
  installables
  sysdebug
  distributables_hdr
  cores
  techsupport
  mts.log
  vdc_2
  vdc_3
  vdc_4
  distributables
  initial_setup.log
  license
  received

Then we booted the kickstart image from the bootflash manually:

loader> boot bootflash:/installables/switch/ucs-6100-k9-kickstart.5.2.3.N2.2.27c.bin
Booting kickstart image: bootflash:/installables/switch/ucs-6100-k9-kickstart.5.2.3.N2.2.27c.bin....
...............................................................................
...........................................Image verification OK
Usage: init 0123456SsQqAaBbCcUu
 INIT: [ 10.668160] I2C - Mezz absent 
Starting system POST.....
 Executing Mod 1 1 SEEPROM Test:...done (0 seconds)
 Executing Mod 1 1 GigE Port Test:....done (32 seconds)
 Executing Mod 1 1 PCIE Test:.................done (0 seconds)
 Mod 1 1 Post Completed Successfully
POST is completed
can't create lock file /var/lock/mtab~207: No such file or directory (use -n flag to override)
S10mount-ramfs.supnuovaca Mounting /isan 3000m
Mounted /isan
Creating /callhome..
Mounting /callhome..
Creating /callhome done.
Callhome spool file system init done.
nohup: redirecting stderr to stdout
autoneg unmodified, ignoring
autoneg unmodified, ignoring
Checking all filesystems..... done.
Checking NVRAM block device ... done
The startup-config won't be used until the next reboot.
. 
Loading system software
No system image Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2016, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php
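The image name you pass to `boot` can be worked out from the NX-OS version string the installer printed for the peer FI. Here is a small Python sketch of that mapping. It is inferred only from the file names and versions visible in this post (`5.2(3)N2(2.27c)` and `ucs-6100-k9-kickstart.5.2.3.N2.2.27c.bin`), so treat it as an assumption rather than an official naming rule; the function name `image_names` is hypothetical.

```python
def image_names(nxos_version, platform="ucs-6100-k9"):
    """Guess kickstart/system image file names for an NX-OS version string.

    Assumption: brackets in the version become dots in the file name,
    as seen for 5.2(3)N2(2.27c) in this post.
    """
    dotted = nxos_version.replace("(", ".").replace(")", ".").strip(".")
    return {
        kind: f"{platform}-{kind}.{dotted}.bin"
        for kind in ("kickstart", "system")
    }


if __name__ == "__main__":
    for kind, name in image_names("5.2(3)N2(2.27c)").items():
        # The directory is the one used in the loader command above.
        print(f"{kind}: bootflash:/installables/switch/{name}")
```

Always compare the result with what `dir` actually shows on the bootflash before booting anything.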

At this point we had to configure the management IP of the FI:

switch(boot)# conf terminal 

Enter configuration commands, one per line.  End with CNTL/Z.

switch(boot)(config)# interface mgmt 0


switch(boot)(config-if)# ip address 10.0.0.2 255.255.255.0 


switch(boot)(config-if)# no shutdown 


switch(boot)(config-if)# exit


switch(boot)(config)# ip default-gateway 10.0.0.254


switch(boot)(config)# exit
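Before typing addresses into a switch that sits in boot mode, it does not hurt to double-check that the gateway really is inside the management subnet. A minimal sketch using Python's standard `ipaddress` module follows; the helper name `gateway_in_subnet` is made up, and the addresses are the example values from this post.

```python
import ipaddress


def gateway_in_subnet(ip, netmask, gateway):
    """Return True if the gateway lies in the subnet defined by ip/netmask."""
    interface = ipaddress.ip_interface(f"{ip}/{netmask}")
    return ipaddress.ip_address(gateway) in interface.network


if __name__ == "__main__":
    # Values as entered on mgmt 0 above.
    print(gateway_in_subnet("10.0.0.2", "255.255.255.0", "10.0.0.254"))
```

If this prints `False`, the TFTP copy in the next step will most likely fail unless the TFTP server is in the same subnet as the FI.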

To get the debug plugin onto the switch, you have to copy it from a TFTP server:

switch(boot)# copy tftp://[my-tftp-ip]/ucs-dplug.5.2.3.N2.2.27c.gbin workspace:debug_plugin/ucs-dplug.5.2.3.N2.2.27c.gbin

Trying to connect to tftp server......
Connection to server Established. Copying Started.....
|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\
TFTP get operation was successful
Copy complete, now saving to disk (please wait)...

switch(boot)#

In the next step we loaded the debug plugin, unmounted the filesystems and repaired them:

switch(boot)# copy workspace:debug_plugin/ucs-dplug.5.2.3.N2.2.27c.gbin xy

Copy complete, now saving to disk (please wait)...

switch(boot)# load xy

Loading plugin version 5.2(3)N2(2.27c)
###############################################################
  Warning: debug-plugin is for engineering internal use only!
  For security reason, plugin image has been deleted.
###############################################################
Successfully loaded debug-plugin!!!
Linux(debug)# umount /dev/mtdblock2
Linux(debug)# umount /dev/mtdblock3
Linux(debug)# umount /dev/sda3
Linux(debug)# umount /dev/sda4
Linux(debug)# umount /dev/sda5
Linux(debug)# umount /dev/sda6
Linux(debug)# umount /dev/sda7
umount: /dev/sda7: not mounted
Linux(debug)# umount /dev/sda8
Linux(debug)# umount /dev/sda9
umount: /dev/sda9: not found
Linux(debug)# e2fsck -y /dev/sda3
e2fsck 1.35 (28-Feb-2004)
/dev/sda3: clean, 1746/2125760 files, 1832918/4247184 blocks
Linux(debug)# e2fsck -y /dev/sda7
e2fsck 1.35 (28-Feb-2004)
Couldn't find ext2 superblock, trying backup blocks...
/dev/sda7 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (24237, counted=21663).
Fix? yes

Free blocks count wrong for group #1 (32461, counted=32442).
Fix? yes

Free blocks count wrong for group #3 (32461, counted=32459).
Fix? yes

Free blocks count wrong for group #8 (30266, counted=24180).
Fix? yes

Free blocks count wrong for group #13 (32463, counted=32462).
Fix? yes

Free blocks count wrong for group #14 (32462, counted=32463).
Fix? yes

Free blocks count wrong for group #23 (32463, counted=32460).
Fix? yes

Free blocks count wrong for group #26 (32460, counted=32456).
Fix? yes

Free blocks count wrong for group #27 (32461, counted=32460).
Fix? yes

Free blocks count wrong for group #29 (32463, counted=32462).
Fix? yes

Free blocks count wrong (982140, counted=973450).
Fix? yes

Free inodes count wrong for group #0 (9684, counted=9682).
Fix? yes

Free inodes count wrong for group #1 (9696, counted=9693).
Fix? yes

Directories count wrong for group #1 (0, counted=1).
Fix? yes

Free inodes count wrong for group #3 (9696, counted=9694).
Fix? yes

Directories count wrong for group #3 (0, counted=2).
Fix? yes

Free inodes count wrong for group #8 (9689, counted=9687).
Fix? yes

Free inodes count wrong for group #13 (9696, counted=9695).
Fix? yes

Directories count wrong for group #13 (0, counted=1).
Fix? yes

Free inodes count wrong for group #14 (9695, counted=9696).
Fix? yes

Directories count wrong for group #14 (1, counted=0).
Fix? yes

Free inodes count wrong for group #23 (9696, counted=9692).
Fix? yes

Directories count wrong for group #23 (0, counted=1).
Fix? yes

Free inodes count wrong for group #26 (9693, counted=9689).
Fix? yes

Directories count wrong for group #26 (1, counted=2).
Fix? yes

Free inodes count wrong for group #27 (9696, counted=9695).
Fix? yes

Directories count wrong for group #27 (0, counted=1).
Fix? yes

Free inodes count wrong for group #29 (9696, counted=9694).
Fix? yes

Directories count wrong for group #29 (0, counted=1).
Fix? yes

Free inodes count wrong (300543, counted=300523).
Fix? yes


/dev/sda7: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sda7: 53/300576 files (1.9% non-contiguous), 28596/1002046 blocks
Linux(debug)# e2fsck -n -f /dev/sda8
e2fsck 1.35 (28-Feb-2004)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sda8: 19/501952 files (0.0% non-contiguous), 25374/1002046 blocks
Linux(debug)# 
Linux(debug)# e2fsck -n -f /dev/sda9
e2fsck 1.35 (28-Feb-2004)
e2fsck: No such file or directory while trying to open /dev/sda9


The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

Linux(debug)# 
Linux(debug)# 
Linux(debug)# tune2fs -j /dev/sda3
tune2fs 1.35 (28-Feb-2004)
The filesystem already has a journal.
Linux(debug)# reboot

INIT: 
INIT: Sending processes the TERM signal

Linux(debug)# 
INIT: /isanboot/sbin/loadplugin: line 172:  1490 Hangup                  $AUTORUN

switch(boot)# Sending all processes the TERM signal... 


Sending all processes the KILL signal... 
Saving random seed:  
Syncing hardware clock to system time 
Unmounting file systems:  
mount: you must specify the filesystem type
mount: /var not mounted already, or bad option
Please stand by while rebooting the system...
[ 1131.078240] Restarting system.
[ 1131.114620] machine restart
[ 1131.147864] Resetting board (uc)

After two reboots the FI had its config back:

UCS1-B# show cluster state 
Cluster Id: 0x357b36b2b45611e1-0xbaac547fee935324

B: UP, SUBORDINATE
A: UP, PRIMARY

HA NOT READY
Waiting for response from device.
Device count, expected: 3, active: 2
Detailed state of the device selected for HA storage:
Chassis 1, serial: MYSERIALNO, state: inactive
Chassis 2, serial: MYSERIALNO, state: active
Chassis 3, serial: MYSERIALNO, state: active

Fabric B, Unable to connect to local chassis-shared-storage management interface:
MYSERIALNO

Warning: there are pending management I/O errors on one or more devices, failover may not complete
UCS1-B# 2017 Oct 31 15:36:27 UCS1-B %$ VDC-1 %$ %SATCTRL-FEX1  -2-SATCTRL: IOM-1   Module 1: Cold boot

2017 Oct 31 15:36:34 UCS1-B %$ VDC-1 %$ %PFMA-2-FEX_STATUS: Fex 1 is online (Serial number )

2017 Oct 31 15:36:34 UCS1-B %$ VDC-1 %$ %NOHMS-2-NOHMS_ENV_FEX_ONLINE: FEX-1 On-line

2017 Oct 31 15:36:34 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: FEX_ONLINE

2017 Oct 31 15:36:34 UCS1-B %$ VDC-1 %$ %PFMA-2-FEX_STATUS: Fex 1 is online (Serial number )

2017 Oct 31 15:40:22 UCS1-B %$ VDC-1 %$ %PORT-2-IF_DOWN_ERROR_DISABLED: %$VSAN 3370%$ Interface vfc1217 is down (Error disabled)  server 1/5, VHBA vHBA_0B 

2017 Oct 31 15:40:22 UCS1-B %$ VDC-1 %$ %PORT-2-IF_DOWN_ERROR_DISABLED: %$VSAN 3370%$ Interface vfc1295 is down (Error disabled)  server 1/7, VHBA vHBA_0B 

2017 Oct 31 15:40:22 UCS1-B %$ VDC-1 %$ %PORT-2-IF_DOWN_ERROR_DISABLED: %$VSAN 3370%$ Interface vfc1177 is down (Error disabled)  server 1/6, VHBA vHBA_0B 

2017 Oct 31 15:40:23 UCS1-B %$ VDC-1 %$ %PORT-2-IF_DOWN_ERROR_DISABLED: %$VSAN 3370%$ Interface vfc1153 is down (Error disabled)  server 1/8, VHBA vHBA_0B 

2017 Oct 31 15:40:24 UCS1-B %$ VDC-1 %$ %PORT-2-IF_DOWN_ERROR_DISABLED: %$VSAN 3370%$ Interface vfc1569 is down (Error disabled)  server 1/1, VHBA vHBA_0B 


UCS1-B#

At this point the TAC Engineer told me he fixed this case.

Whooot???? That does not look like everything is fine! OK, the FI has its config, but there are still errors.

Nevertheless, I had to open a new TAC case to get these errors fixed:

[FSM:STAGE:RETRY:]: external VM manager extension-key configuration on local fabric(FSM-STAGE:sam:dme:ExtvmmMasterExtKeyConfig:SetPeer) F16898

[FSM:STAGE:REMOTE-ERROR]: Result: service-unavailable Code: unspecified Message: Error syncing extension key(sam:dme:ExtvmmMasterExtKeyConfig:SetPeer) F78338

[FSM:STAGE:REMOTE-ERROR]: Result: service-unavailable Code: unspecified Message: Error syncing extension key(sam:dme:ExtvmmProviderConfig:SetPeer) F78319

[FSM:STAGE:RETRY:]: external VM manager configuration on peer fabric(FSM-STAGE:sam:dme:ExtvmmProviderConfig:SetPeer) F16879

[FSM:FAILED]: external VM manager extension-key configuration(FSM:sam:dme:ExtvmmMasterExtKeyConfig). Remote-Invocation-Error: Error syncing extension key F999938

[FSM:FAILED]: external VM manager configuration(FSM:sam:dme:ExtvmmProviderConfig). Remote-Invocation-Error: Error syncing extension key F999919
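The six faults above look like a lot, but they belong to only two state machines. A minimal Python sketch to group them, assuming fault lines shaped like the ones above (an FSM name after `sam:dme:` and a fault code like `F16898` at the end of the line); the function name `group_faults_by_fsm` is hypothetical:

```python
import re
from collections import defaultdict

# Both patterns are assumptions based on the fault lines quoted in this post.
FSM_NAME = re.compile(r"sam:dme:(\w+)")
FAULT_CODE = re.compile(r"\b(F\d+)\s*$")

def group_faults_by_fsm(lines):
    """Map each FSM name to the list of fault codes raised for it."""
    groups = defaultdict(list)
    for line in lines:
        name = FSM_NAME.search(line)
        code = FAULT_CODE.search(line)
        if name and code:
            groups[name.group(1)].append(code.group(1))
    return dict(groups)
```

Fed with the lines above, this yields two groups, `ExtvmmMasterExtKeyConfig` and `ExtvmmProviderConfig`, each with three fault codes.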

At this time I was very disappointed with the TAC Support.

 

To get further, I searched on my own and found the bug CSCvf27661.

So I opened a new TAC case to get the mismatched internal SSH keys fixed.

First, to see the error:

UCS1-A# scope system
UCS1-A /system # show managed-entity detail

Managed Entity:
    Fabric ID: A
    Leadership: Primary
    State: Up
    Umbilical State: Full
    HA Ready: Yes
    SSH Internal Root Pub Key Checksum: aLongChecksum1
    SSH Internal Root Pub Key Size: 225
    SSH Internal Auth Keys Checksum: aLongChecksum2
    SSH Internal Auth Keys Size: 225
    SSH Internal Keys Status: Matched

    Fabric ID: B
    Leadership: Subordinate
    State: Up
    Umbilical State: Full
    HA Ready: Yes
    SSH Internal Root Pub Key Checksum: aLongChecksum3
    SSH Internal Root Pub Key Size: 219
    SSH Internal Auth Keys Checksum: aLongChecksum1
    SSH Internal Auth Keys Size: 225
    SSH Internal Keys Status: Mismatched
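
To check this on more than one domain, the output of `show managed-entity detail` can be scanned for the key status per fabric. A minimal Python sketch, assuming the "Name: value" layout shown above; the function name `find_key_mismatches` is made up for this example:

```python
def find_key_mismatches(text):
    """Return the fabric IDs whose 'SSH Internal Keys Status' is not 'Matched'."""
    mismatched = []
    current_fabric = None
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # blank line or a line without "Name: value"
        value = value.strip()
        if key == "Fabric ID":
            current_fabric = value
        elif key == "SSH Internal Keys Status" and value != "Matched":
            mismatched.append(current_fabric)
    return mismatched
```

For the output above it returns only fabric B.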

Get the keys matching on both systems by editing them with vi 😉

Linux(debug)# cat /root/.ssh/id_rsa.pub
ssh-rsa AveryLongSSHKey1== root@(none)
Linux(debug)# cat /var/home/samdme/.ssh/authorized_keys
ssh-rsa AveryLongSSHKey2== root@(none)

I can't say how weird the way the TAC engineer did this was; it took more than an hour to edit these two keys. And he also made a mistake! Be sure your key ends with “root@(none)”.

In my case the “(none)” was missing in one key. I fixed it after the TAC session and all errors were gone.
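
A quick sanity check for a key line before you paste it with vi could look like the following. This is a minimal Python sketch under assumptions: the function name `check_key_comment` is hypothetical, it only checks the shape of a one-line `ssh-rsa` public key (type, key data, comment), and it does not validate the key data itself.

```python
def check_key_comment(pubkey_line, expected_comment="root@(none)"):
    """Return True if the line is 'ssh-rsa <data> <comment>' with the expected comment."""
    fields = pubkey_line.strip().split()
    if len(fields) != 3:
        return False  # a truncated or missing comment shows up here
    key_type, _data, comment = fields
    return key_type == "ssh-rsa" and comment == expected_comment
```

A line ending in just `root@` fails this check, which is exactly the mistake described above.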

 

Hope this helps you …

Leave a comment and share!

 

UPDATE:

Please also check CSCva31113. This explained my issue exactly.

One thought on “UCS Upgrade failed”

  1. Hello Chris,

    I read your post. It is very helpful. I have a similar problem on a client's UCS FI 2248UP. Could you provide me with the debug plugin ucs-dplug.5.2.3.N2.2.27c.gbin if you can? I would appreciate it.

    Best regards
    Bosko Kecman
