UCS Upgrade failed
Over the last few days I tried to upgrade my UCS domains from 2.2(7c) to 2.2(8g).
I have two UCS domains. One of them went through the upgrade fine, the other one did not :/
See what happened and how we fixed it …
I went through the steps I described in
STUMBLING BLOCKS IN UPGRADING CISCO UCS (PART 1OF2)
STUMBLING BLOCKS IN UPGRADING CISCO UCS (PART 2OF2)
In short form:
- Upgrade UCS Manager
- Upgrade IO Modules of the Chassis
- Upgrade Fabric Interconnect
After the reboot, the upgraded Fabric Interconnect came up in setup/config mode. This means there was no active config after the reboot.
WARNING: Please do the following with a Cisco TAC Engineer or at your own risk!
First we tried to reconnect to the running FI and retrieve its config:
Type the hot key to suspend the connection: <CTRL>Q Enter the configuration method. (console/gui) ? console Installer has detected the presence of a peer Fabric interconnect. This Fabric interconnect will be added to the cluster. Continue (y/n) ? y Enter the admin password of the peer Fabric interconnect: Connecting to peer Fabric interconnect... unable to connect! Password could be wrong. Please ensure that the authentication mode on peer Fabric interconnect is set to 'Local' Hit enter to try again or type 'restart' to start setup from beginning... ? Connecting to peer Fabric interconnect... done Retrieving config from peer Fabric interconnect... done /isan/bin/getversion: error while loading shared libraries: libosiris.so: cannot open shared object file: No such file or directory Installer has determined that the peer Fabric Interconnect is running a different firmware version than the local Fabric. Cannot join cluster. Local Fabric Interconnect UCSM version : Kernel version : System version : local_model_no : 6248 Peer Fabric Interconnect UCSM version : 2.2(8g) Kernel version : 5.2(3)N2(2.27c) System version : 5.2(3)N2(2.27c) peer_model_no : 6248 Do you wish to update firmware on this Fabric Interconnect to the Peer's version? (y/n): y Updating firmware of Fabric Interconnect....... [ Please don't press Ctrl+c while updating firmware ] Updating images Please wait for firmware update to complete.... Checking the Compatibility of new Firmware..... [ Please don't Press ctrl+c ]. Verifying image bootflash:/installables/switch/ucs-6100-k9-kickstart.5.2.3.N2.2.27c.bin for boot variable "kickstart". [# ] 0%[####################] 100% -- SUCCESS Verifying image bootflash:/installables/switch/ucs-6100-k9-system.5.2.3.N2.2.27c.bin for boot variable "system". [# ] 0%[####################] 100% -- SUCCESS Verifying image type. 
[# ] 0%[##### ] 20%[####### ] 30%[######### ] 40%[########### ] 50%[########### ] 50%[########### ] 50%[################### ] 90%[####################] 100%[####################] 100% -- SUCCESS Extracting "system" version from image bootflash:/installables/switch/ucs-6100-k9-system.5.2.3.N2.2.27c.bin. [# ] 0%[####################] 100% -- SUCCESS Extracting "kickstart" version from image bootflash:/installables/switch/ucs-6100-k9-kickstart.5.2.3.N2.2.27c.bin. [# ] 0%[####################] 100% -- SUCCESS Extracting "bios" version from image bootflash:/installables/switch/ucs-6100-k9-system.5.2.3.N2.2.27c.bin. [# ] 0%[####################] 100% -- SUCCESS Performing module support checks. [####################] 100% -- SUCCESS Notifying services about system upgrade. [####################] 100% -- SUCCESS Compatibility check is done: Module bootable Impact Install-type Reason ------ -------- -------------- ------------ ------ 1 yes disruptive reset Incompatible image Images will be upgraded according to following table: Module Image Running-Version New-Version Upg-Required ------ ---------- ---------------------- ---------------------- ------------ 1 system 5.2(3)N2(2.28g) 5.2(3)N2(2.27c) yes 1 kickstart 5.2(3)N2(2.28g) 5.2(3)N2(2.27c) yes 1 bios v3.6.0(05/09/2012) v3.6.0(05/09/2012) no 1 SFP-uC v1.1.0.0 v1.0.0.0 no 1 power-seq v3.0 v3.0 no 3 power-seq v2.0 v2.0 no 1 uC v1.2.0.1 v1.2.0.1 no Switch will be reloaded for disruptive upgrade. Install is in progress, please wait. Performing runtime checks. [####################] 100% -- SUCCESS Setting boot variables. [# ] 0%[####################] 100% -- SUCCESS Performing configuration copy. 
[# ] 0%[### ] 10%[#### ] 15%[##### ] 20%[###### ] 25%[####### ] 30%[######## ] 35%[######### ] 40%[########## ] 45%[########### ] 50%[############# ] 60%[############## ] 65%[############### ] 70%[################ ] 75%[################# ] 80%[################## ] 85%[################### ] 90%[####################] 95%[####################] 100%[####################] 100% -- SUCCESS Converting startup config. [# ] 0%[####################] 100% -- SUCCESS Install has been successful. Firmware Updation Successfully Completed. Please wait to enter the IP address Type 'reboot' to abort configuration and reboot system or hit enter to continue. (reboot/<CR>) ? Peer Fabric interconnect Mgmt0 IPv4 Address: 10.0.0.1 Peer Fabric interconnect Mgmt0 IPv4 Netmask: 255.255.255.0 Cluster IPv4 address : 10.0.0.3 Peer FI is IPv4 Cluster enabled. Please Provide Local Fabric Interconnect Mgmt0 IPv4 Address Physical Switch Mgmt0 IP address : Mgmt0 IP must be specified Physical Switch Mgmt0 IP address : 10.0.0.2 Apply and save the configuration (select 'no' if you want to re-enter)? (yes/no): Type 'reboot' to abort configuration and reboot system or hit enter to continue. (reboot/<CR>) ? Apply and save the configuration (select 'no' if you want to re-enter)? (yes/no): yes Applying configuration. Please wait. Tue Oct 31 11:59:53 UTC 2017 Type 'reboot' to abort configuration and reboot system or hit enter to continue. (reboot/<CR>) ? 
Configuration file - Ok 2017 Oct 31 12:00:10 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: dhcpd crashed with crash type:256 2017 Oct 31 12:00:10 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:00:11 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: dhcpd crashed with crash type:256 2017 Oct 31 12:00:11 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:00:11 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: dhcpd crashed with crash type:256 2017 Oct 31 12:00:11 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:00:11 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: dhcpd crashed with crash type:256 2017 Oct 31 12:00:11 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:00:12 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: dhcpd crashed with crash type:256 2017 Oct 31 12:00:12 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:00:12 UCS1-B %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: dhcpd - pmon User Access Verification UCS1-B login: admin Password: Cisco Nexus Operating System (NX-OS) Software TAC support: http://www.cisco.com/tac Copyright (c) 2002-2017, Cisco Systems, Inc. All rights reserved. The copyrights to certain works contained in this software are owned by other third parties and used and distributed under license. Certain components of this software are licensed under the GNU General Public License (GPL) version 2.0 or the GNU Lesser General Public License (LGPL) Version 2.1. A copy of each such license is available at http://www.opensource.org/licenses/gpl-2.0.php and http://www.opensource.org/licenses/lgpl-2.1.php
At this point the FI did not accept any commands, so we did a “hard” reboot. Again, there was no config on the FI.
N5000 BIOS v.3.6.0, Wed 05/09/2012, 03:15 PM 989CB4B4B4B4B4B4B4B4B4B49999999299A0A2A3A0A2A3B2 B2Version 2.00.1201. Copyright (C) 2009 American Megatrends, Inc. Booting kickstart image: bootflash:/installables/switch/ucs-6100-k9-kickstart.5 .2.3.N2.2.27c.bin.... ............................................................................... ...........................................Image verification OK Usage: init 0123456SsQqAaBbCcUu INIT: [ 10.657597] I2C - Mezz absent Starting system POST..... Executing Mod 1 1 SEEPROM Test:...done (0 seconds) Executing Mod 1 1 GigE Port Test:....done (32 seconds) Executing Mod 1 1 PCIE Test:.................done (0 seconds) Mod 1 1 Post Completed Successfully POST is completed can't create lock file /var/lock/mtab~207: No such file or directory (use -n flag to override) S10mount-ramfs.supnuovaca Mounting /isan 3000m Mounted /isan Creating /callhome.. Mounting /callhome.. Creating /callhome done. Callhome spool file system init done. nohup: redirecting stderr to stdout autoneg unmodified, ignoring autoneg unmodified, ignoring Checking all filesystems..r.r.r. done. Checking NVRAM block device ... done The startup-config won't be used until the next reboot. . Loading system software Uncompressing system image: bootflash:/installables/switch/ucs-6100-k9-system.5.2.3.N2.2.27c.bin Loading plugin 0: core_plugin... Loading plugin 1: eth_plugin... Loading plugin 2: fc_plugin... 13+1 records in 13+1 records out 10240 bytes (10 kB) copied, 5.7017e-05 s, 180 MB/s ethernet end-host mode on CA FC end-host mode on CA n_port virtualizer mode. --------------------------------------------------------------- INIT: Entering runlevel: 3 touch: cannot touch `/var/lock/subsys/netfs': No such file or directory /isan/bin/muxif_config: fex vlan id: -f,4042 Set name-type for VLAN subsystem. 
Should be visible in /proc/net/vlan/config Added VLAN with VID == 4042 to IF -:muxif:- cp: cannot stat `/isan/plugin_img/fex.bin': No such file or directory --------------------- enabled fc feature --------------------- 2017 Oct 31 12:09:31 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: CLIS: loading cmd files begin - clis 2017 Oct 31 12:09:34 %$ VDC-1 %$ Oct 31 12:09:34 %KERN-0-SYSTEM_MSG: [ 10.657597] I2C - Mezz absent - kernel 2017 Oct 31 12:09:41 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: CLIS: loading cmd files end - clis 2017 Oct 31 12:09:41 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: CLIS: init begin - clis 2017 Oct 31 12:09:49 %$ VDC-1 %$ %SNMPD-2-CRITICAL: SNMP log critical : load_mib_module :Error, while loading the mib module /isan/lib/libsvc_sam_extSnmpPlugin.so (/isan/lib/libsvc_sam_extSnmpPlugin.so: cannot open shared object file: No such file or directory) 2017 Oct 31 12:09:54 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_controller crashed with crash type:32512 2017 Oct 31 12:09:54 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:09:54 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dme crashed with crash type:32512 2017 Oct 31 12:09:54 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:09:54 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dcosAG crashed with crash type:32512 2017 Oct 31 12:09:54 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:09:55 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_bladeAG crashed with crash type:32512 2017 Oct 31 12:09:55 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:09:55 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_portAG crashed with crash type:32512 2017 Oct 31 12:09:55 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:09:55 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_hostagentAG crashed with crash type:32512 2017 Oct 31 12:09:55 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:09:56 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_nicAG crashed with crash type:32512 2017 Oct 31 12:09:56 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:09:56 %$ VDC-1 %$ 
%CALLHOME-2-EVENT: svc_sam_extvmmAG crashed with crash type:32512 2017 Oct 31 12:09:56 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:09:56 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_cliD crashed with crash type:32256 2017 Oct 31 12:09:56 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:09:56 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_pamProxy crashed with crash type:32512 2017 Oct 31 12:09:56 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:09:57 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_controller crashed with crash type:32512 2017 Oct 31 12:09:57 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:09:57 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dme crashed with crash type:32512 2017 Oct 31 12:09:57 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:09:57 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dcosAG crashed with crash type:32512 2017 Oct 31 12:09:57 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:09:58 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_bladeAG crashed with crash type:32512 2017 Oct 31 12:09:58 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:09:58 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_portAG crashed with crash type:32512 2017 Oct 31 12:09:58 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:09:59 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_hostagentAG crashed with crash type:32512 2017 Oct 31 12:09:59 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:00 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_nicAG crashed with crash type:32512 2017 Oct 31 12:10:00 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:00 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_extvmmAG crashed with crash type:32512 2017 Oct 31 12:10:00 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:01 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_cliD crashed with crash type:32256 2017 Oct 31 12:10:01 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:01 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_pamProxy crashed with crash type:32512 2017 Oct 31 12:10:01 %$ VDC-1 
%$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:02 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_controller crashed with crash type:32512 2017 Oct 31 12:10:02 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:03 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dme crashed with crash type:32512 2017 Oct 31 12:10:03 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:03 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dcosAG crashed with crash type:32512 2017 Oct 31 12:10:03 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:04 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_bladeAG crashed with crash type:32512 2017 Oct 31 12:10:04 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:04 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_portAG crashed with crash type:32512 2017 Oct 31 12:10:04 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:04 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_hostagentAG crashed with crash type:32512 2017 Oct 31 12:10:04 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:05 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_nicAG crashed with crash type:32512 2017 Oct 31 12:10:05 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:05 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_extvmmAG crashed with crash type:32512 2017 Oct 31 12:10:05 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:05 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_cliD crashed with crash type:32256 2017 Oct 31 12:10:05 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:06 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_pamProxy crashed with crash type:32512 2017 Oct 31 12:10:06 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:06 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_controller crashed with crash type:32512 2017 Oct 31 12:10:06 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:06 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_controller - pmon 2017 Oct 31 12:10:06 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dme crashed with crash type:32512 2017 Oct 
31 12:10:06 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:06 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_dme - pmon 2017 Oct 31 12:10:07 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dcosAG crashed with crash type:32512 2017 Oct 31 12:10:07 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:07 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_dcosAG - pmon 2017 Oct 31 12:10:07 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_bladeAG crashed with crash type:32512 2017 Oct 31 12:10:07 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:07 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_bladeAG - pmon 2017 Oct 31 12:10:07 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_portAG crashed with crash type:32512 2017 Oct 31 12:10:07 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:07 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_portAG - pmon 2017 Oct 31 12:10:08 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_hostagentAG crashed with crash type:32512 2017 Oct 31 12:10:08 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:08 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_hostagentAG - pmon 2017 Oct 31 12:10:08 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_nicAG crashed with crash type:32512 2017 Oct 31 12:10:08 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:08 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_nicAG - pmon 2017 Oct 31 12:10:08 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_extvmmAG crashed with crash type:32512 2017 Oct 31 12:10:08 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:08 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_extvmmAG - pmon 2017 Oct 31 12:10:09 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_cliD crashed with crash type:32256 2017 Oct 31 12:10:09 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:09 %$ VDC-1 %$ 
%USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_cliD - pmon 2017 Oct 31 12:10:09 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_pamProxy crashed with crash type:32512 2017 Oct 31 12:10:09 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH 2017 Oct 31 12:10:09 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Restart count exhausted for process: svc_sam_pamProxy - pmon 2017 Oct 31 12:10:27 %$ VDC-1 %$ %CALLHOME-2-EVENT: httpd.sh crashed with crash type:0 2017 Oct 31 12:10:27 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH System is coming up ... Please wait ... System is coming up ... Please wait ... System is coming up ... Please wait ... 2017 Oct 31 12:10:46 %$ VDC-1 %$ %VDC_MGR-2-VDC_ONLINE: vdc 1 has come online System is coming up ... Please wait ... nohup: appending output to `nohup.out' 2017 Oct 31 12:11:01 switch %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Running in PIO stats mode - carmelusd 2017 Oct 31 12:11:01 switch %$ VDC-1 %$ %CALLHOME-2-EVENT: httpd.sh crashed with crash type:0 2017 Oct 31 12:11:01 switch %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH ---- Basic System Configuration Dialog ---- This setup utility will guide you through the basic configuration of the system. Only minimal configuration including IP connectivity to the Fabric interconnect and its clustering mode is performed through these steps. Type Ctrl-C at any time to abort configuration and reboot system. To back track or make modifications to already entered values, complete input till end of section and answer no when prompted to apply configuration. Enter the configuration method. (console/gui) ? Type 'reboot' to abort configuration and reboot system or hit enter to continue. (reboot/<CR>) ? 2017 Oct 31 12:11:34 switch %$ VDC-1 %$ %CALLHOME-2-EVENT: httpd.sh crashed with crash type:0 2017 Oct 31 12:11:34 switch %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH
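A wall of console output like the one above is easier to read once the crash events are grouped per process. Here is a minimal Python sketch that does this for pasted log text. The function names are hypothetical, and decoding the "crash type" as a POSIX wait status (exit code in the high byte) is my assumption, not something Cisco documents here. Under that assumption, 32512 and 32256 correspond to exit codes 127 and 126, which shells conventionally use for "command not found" and "not executable". That would fit the "cannot open shared object file" messages in the log.

```python
import re
from collections import Counter

# Matches lines such as:
#   %CALLHOME-2-EVENT: svc_sam_dme crashed with crash type:32512
CRASH_RE = re.compile(r"%CALLHOME-2-EVENT: (\S+) crashed with crash type:(\d+)")


def summarize_crashes(log_text):
    """Count crash events per (process, crash type) in a console log."""
    return Counter((m.group(1), int(m.group(2))) for m in CRASH_RE.finditer(log_text))


def exit_code_guess(crash_type):
    """ASSUMPTION: the crash type is a POSIX wait status, exit code in the high byte."""
    return crash_type >> 8


# A few events copied from the log above (flattened, as in the console capture).
sample = (
    "2017 Oct 31 12:09:54 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dme crashed with crash type:32512 "
    "2017 Oct 31 12:09:54 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH "
    "2017 Oct 31 12:09:56 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_cliD crashed with crash type:32256 "
    "2017 Oct 31 12:09:57 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dme crashed with crash type:32512"
)

for (process, crash_type), count in sorted(summarize_crashes(sample).items()):
    print(f"{process}: {count}x crash type {crash_type} "
          f"(exit code guess {exit_code_guess(crash_type)})")
```

The plain SW_CRASH lines carry no process name and are skipped on purpose.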
We had to fix the boot partitions. During the boot process you have to press “Ctrl” + “L” repeatedly to break into the loader:
N5000 BIOS v.3.6.0, Wed 05/09/2012, 03:15 PM
989CB4B4B4B4B4B4B4B4B4B49999999299A0A2A3A0A2A3B2 B2Version 2.00.1201. Copyright (C) 2009 American Megatrends, Inc.
User break into bootloader
loader> dir
bootflash:
span.log
ucs-6100-k9-kickstart.5.0.3.N2.2.02q.bin
ucs-6100-k9-system.5.0.3.N2.2.02q.bin
chassis.img
pnuos
nuova-sim-mgmt-nsg.0.1.0.001.bin
chassis2.img
fexth.bin
installables
sysdebug
distributables_hdr
cores
techsupport
mts.log
vdc_2
vdc_3
vdc_4
distributables
initial_setup.log
license
received
Now we booted the kickstart image from the bootflash:
loader> boot bootflash:/installables/switch/ucs-6100-k9-kickstart.5.2.3.N2.2.27c.bin
Booting kickstart image: bootflash:/installables/switch/ucs-6100-k9-kickstart.5.2.3.N2.2.27c.bin....
...............................................................................
...........................................Image verification OK
Usage: init 0123456SsQqAaBbCcUu
INIT: [ 10.668160] I2C - Mezz absent
Starting system POST.....
Executing Mod 1 1 SEEPROM Test:...done (0 seconds)
Executing Mod 1 1 GigE Port Test:....done (32 seconds)
Executing Mod 1 1 PCIE Test:.................done (0 seconds)
Mod 1 1 Post Completed Successfully
POST is completed
can't create lock file /var/lock/mtab~207: No such file or directory (use -n flag to override)
S10mount-ramfs.supnuovaca Mounting /isan 3000m
Mounted /isan
Creating /callhome..
Mounting /callhome..
Creating /callhome done.
Callhome spool file system init done.
nohup: redirecting stderr to stdout
autoneg unmodified, ignoring
autoneg unmodified, ignoring
Checking all filesystems..... done.
Checking NVRAM block device ... done
The startup-config won't be used until the next reboot.
.
Loading system software
No system image
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2016, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are owned by other third parties and used and distributed under license.
Certain components of this software are licensed under the GNU General Public License (GPL) version 2.0 or the GNU Lesser General Public License (LGPL) Version 2.1.
A copy of each such license is available at http://www.opensource.org/licenses/gpl-2.0.php and http://www.opensource.org/licenses/lgpl-2.1.php
At this point we had to configure the management IP of the FI:
switch(boot)# conf terminal
Enter configuration commands, one per line. End with CNTL/Z.
switch(boot)(config)# interface mgmt 0
switch(boot)(config-if)# ip address 10.0.0.2 255.255.255.0
switch(boot)(config-if)# no shutdown
switch(boot)(config-if)# exit
switch(boot)(config)# ip default-gateway 10.0.0.254
switch(boot)(config)# exit
To get the debug plugin onto the switch, you have to copy it from a TFTP server:
switch(boot)# copy tftp://[my-tftp-ip]/ucs-dplug.5.2.3.N2.2.27c.gbin workspace:debug_plugin/ucs-dplug.5.2.3.N2.2.27c.gbin
Trying to connect to tftp server......
Connection to server Established. Copying Started.....
|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\
TFTP get operation was successful
Copy complete, now saving to disk (please wait)...
switch(boot)#
In the following step we loaded the debug plugin, unmounted the filesystems and repaired them:
switch(boot)# copy workspace:debug_plugin/ucs-dplug.5.2.3.N2.2.27c.gbin xy
Copy complete, now saving to disk (please wait)...
switch(boot)# load xy
Loading plugin version 5.2(3)N2(2.27c)
###############################################################
Warning: debug-plugin is for engineering internal use only!
For security reason, plugin image has been deleted.
###############################################################
Successfully loaded debug-plugin!!!
Linux(debug)# umount /dev/mtdblock2
Linux(debug)# umount /dev/mtdblock3
Linux(debug)# umount /dev/sda3
Linux(debug)# umount /dev/sda4
Linux(debug)# umount /dev/sda5
Linux(debug)# umount /dev/sda6
Linux(debug)# umount /dev/sda7
umount: /dev/sda7: not mounted
Linux(debug)# umount /dev/sda8
Linux(debug)# umount /dev/sda9
umount: /dev/sda9: not found
Linux(debug)# e2fsck -y /dev/sda3
e2fsck 1.35 (28-Feb-2004)
/dev/sda3: clean, 1746/2125760 files, 1832918/4247184 blocks
Linux(debug)# e2fsck -y /dev/sda7
e2fsck 1.35 (28-Feb-2004)
Couldn't find ext2 superblock, trying backup blocks...
/dev/sda7 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (24237, counted=21663). Fix? yes
Free blocks count wrong for group #1 (32461, counted=32442). Fix? yes
Free blocks count wrong for group #3 (32461, counted=32459). Fix? yes
Free blocks count wrong for group #8 (30266, counted=24180). Fix? yes
Free blocks count wrong for group #13 (32463, counted=32462). Fix? yes
Free blocks count wrong for group #14 (32462, counted=32463). Fix? yes
Free blocks count wrong for group #23 (32463, counted=32460). Fix? yes
Free blocks count wrong for group #26 (32460, counted=32456). Fix? yes
Free blocks count wrong for group #27 (32461, counted=32460). Fix? yes
Free blocks count wrong for group #29 (32463, counted=32462). Fix? yes
Free blocks count wrong (982140, counted=973450). Fix? yes
Free inodes count wrong for group #0 (9684, counted=9682). Fix? yes
Free inodes count wrong for group #1 (9696, counted=9693). Fix? yes
Directories count wrong for group #1 (0, counted=1). Fix? yes
Free inodes count wrong for group #3 (9696, counted=9694). Fix? yes
Directories count wrong for group #3 (0, counted=2). Fix? yes
Free inodes count wrong for group #8 (9689, counted=9687). Fix? yes
Free inodes count wrong for group #13 (9696, counted=9695). Fix? yes
Directories count wrong for group #13 (0, counted=1). Fix? yes
Free inodes count wrong for group #14 (9695, counted=9696). Fix? yes
Directories count wrong for group #14 (1, counted=0). Fix? yes
Free inodes count wrong for group #23 (9696, counted=9692). Fix? yes
Directories count wrong for group #23 (0, counted=1). Fix? yes
Free inodes count wrong for group #26 (9693, counted=9689). Fix? yes
Directories count wrong for group #26 (1, counted=2). Fix? yes
Free inodes count wrong for group #27 (9696, counted=9695). Fix? yes
Directories count wrong for group #27 (0, counted=1). Fix? yes
Free inodes count wrong for group #29 (9696, counted=9694). Fix? yes
Directories count wrong for group #29 (0, counted=1). Fix? yes
Free inodes count wrong (300543, counted=300523). Fix? yes
/dev/sda7: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sda7: 53/300576 files (1.9% non-contiguous), 28596/1002046 blocks
Linux(debug)# e2fsck -n -f /dev/sda8
e2fsck 1.35 (28-Feb-2004)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sda8: 19/501952 files (0.0% non-contiguous), 25374/1002046 blocks
Linux(debug)#
Linux(debug)# e2fsck -n -f /dev/sda9
e2fsck 1.35 (28-Feb-2004)
e2fsck: No such file or directory while trying to open /dev/sda9
The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
Linux(debug)#
Linux(debug)#
Linux(debug)# tune2fs -j /dev/sda3
tune2fs 1.35 (28-Feb-2004)
The filesystem already has a journal.
Linux(debug)# reboot
INIT:
INIT: Sending processes the TERM signal
Linux(debug)# INIT: /isanboot/sbin/loadplugin: line 172: 1490 Hangup $AUTORUN
switch(boot)# Sending all processes the TERM signal...
Sending all processes the KILL signal...
Saving random seed:
Syncing hardware clock to system time
Unmounting file systems:
mount: you must specify the filesystem type
mount: /var not mounted already, or bad option
Please stand by while rebooting the system...
[ 1131.078240] Restarting system.
[ 1131.114620] machine restart
[ 1131.147864] Resetting board (uc)
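To see at a glance how much e2fsck actually repaired on a partition, the "count wrong" prompts can be tallied from a saved copy of the output. This is only a rough Python sketch with hypothetical function names. It assumes the message wording shown in the e2fsck 1.35 output above and is no substitute for reading the full log.

```python
import re
from collections import Counter

# The three kinds of counter mismatch that appear in the e2fsck output above.
FIX_RE = re.compile(r"(Free blocks|Free inodes|Directories) count wrong")


def count_fixes(e2fsck_output):
    """Tally the 'count wrong' messages by kind."""
    return Counter(FIX_RE.findall(e2fsck_output))


def was_modified(e2fsck_output):
    """True if e2fsck reported that it changed the filesystem."""
    return "FILE SYSTEM WAS MODIFIED" in e2fsck_output


# A short excerpt from the /dev/sda7 run above.
sample = """\
Free blocks count wrong for group #0 (24237, counted=21663). Fix? yes
Free blocks count wrong (982140, counted=973450). Fix? yes
Free inodes count wrong for group #1 (9696, counted=9693). Fix? yes
Directories count wrong for group #1 (0, counted=1). Fix? yes
/dev/sda7: ***** FILE SYSTEM WAS MODIFIED *****
"""

print(dict(count_fixes(sample)), "modified:", was_modified(sample))
```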
After two reboots the FI had its config back:
UCS1-B# show cluster state
Cluster Id: 0x357b36b2b45611e1-0xbaac547fee935324
B: UP, SUBORDINATE
A: UP, PRIMARY
HA NOT READY
Waiting for response from device. Device count, expected: 3, active: 2
Detailed state of the device selected for HA storage:
Chassis 1, serial: MYSERIALNO, state: inactive
Chassis 2, serial: MYSERIALNO, state: active
Chassis 3, serial: MYSERIALNO, state: active
Fabric B, Unable to connect to local chassis-shared-storage management interface : MYSERIALNO
Warning: there are pending management I/O errors on one or more devices, failover may not complete
UCS1-B#
2017 Oct 31 15:36:27 UCS1-B %$ VDC-1 %$ %SATCTRL-FEX1-2-SATCTRL: IOM-1 Module 1: Cold boot
2017 Oct 31 15:36:34 UCS1-B %$ VDC-1 %$ %PFMA-2-FEX_STATUS: Fex 1 is online (Serial number )
2017 Oct 31 15:36:34 UCS1-B %$ VDC-1 %$ %NOHMS-2-NOHMS_ENV_FEX_ONLINE: FEX-1 On-line
2017 Oct 31 15:36:34 UCS1-B %$ VDC-1 %$ %CALLHOME-2-EVENT: FEX_ONLINE
2017 Oct 31 15:36:34 UCS1-B %$ VDC-1 %$ %PFMA-2-FEX_STATUS: Fex 1 is online (Serial number )
2017 Oct 31 15:40:22 UCS1-B %$ VDC-1 %$ %PORT-2-IF_DOWN_ERROR_DISABLED: %$VSAN 3370%$ Interface vfc1217 is down (Error disabled) server 1/5, VHBA vHBA_0B
2017 Oct 31 15:40:22 UCS1-B %$ VDC-1 %$ %PORT-2-IF_DOWN_ERROR_DISABLED: %$VSAN 3370%$ Interface vfc1295 is down (Error disabled) server 1/7, VHBA vHBA_0B
2017 Oct 31 15:40:22 UCS1-B %$ VDC-1 %$ %PORT-2-IF_DOWN_ERROR_DISABLED: %$VSAN 3370%$ Interface vfc1177 is down (Error disabled) server 1/6, VHBA vHBA_0B
2017 Oct 31 15:40:23 UCS1-B %$ VDC-1 %$ %PORT-2-IF_DOWN_ERROR_DISABLED: %$VSAN 3370%$ Interface vfc1153 is down (Error disabled) server 1/8, VHBA vHBA_0B
2017 Oct 31 15:40:24 UCS1-B %$ VDC-1 %$ %PORT-2-IF_DOWN_ERROR_DISABLED: %$VSAN 3370%$ Interface vfc1569 is down (Error disabled) server 1/1, VHBA vHBA_0B
UCS1-B#
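If you capture "show cluster state" while waiting for the cluster to settle, a few lines of Python can pull out what matters: whether HA is ready, how many devices are missing, and which chassis is inactive. This is a sketch with hypothetical helper names. It assumes the field wording shown in the output above, and that a healthy cluster prints "HA READY" instead of "HA NOT READY".

```python
import re

CHASSIS_RE = re.compile(r"Chassis (\d+), serial: (\S+), state: (\w+)")
DEVICE_COUNT_RE = re.compile(r"Device count, expected: (\d+), active: (\d+)")


def ha_ready(output):
    """ASSUMPTION: a healthy cluster reports the literal text 'HA READY'."""
    return "HA READY" in output


def inactive_chassis(output):
    """Return the chassis numbers whose state is anything other than 'active'."""
    return [int(num) for num, _serial, state in CHASSIS_RE.findall(output)
            if state != "active"]


def missing_devices(output):
    """Return expected minus active device count, or None if the line is absent."""
    match = DEVICE_COUNT_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)) - int(match.group(2))


# Excerpt of the output above.
sample = """\
B: UP, SUBORDINATE
A: UP, PRIMARY
HA NOT READY
Waiting for response from device. Device count, expected: 3, active: 2
Chassis 1, serial: MYSERIALNO, state: inactive
Chassis 2, serial: MYSERIALNO, state: active
Chassis 3, serial: MYSERIALNO, state: active
"""

print("HA ready:", ha_ready(sample))
print("Missing devices:", missing_devices(sample))
print("Inactive chassis:", inactive_chassis(sample))
```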
At this point the TAC engineer told me he had fixed this case.
Whooot???? This does not look like everything is fine! OK, the FI has its config, but there are still errors.
Nevertheless, I had to open a new TAC case to get these errors fixed:
[FSM:STAGE:RETRY:]: external VM manager extension-key configuration on local fabric(FSM-STAGE:sam:dme:ExtvmmMasterExtKeyConfig:SetPeer) F16898
[FSM:STAGE:REMOTE-ERROR]: Result: service-unavailable Code: unspecified Message: Error syncing extension key(sam:dme:ExtvmmMasterExtKeyConfig:SetPeer) F78338
[FSM:STAGE:REMOTE-ERROR]: Result: service-unavailable Code: unspecified Message: Error syncing extension key(sam:dme:ExtvmmProviderConfig:SetPeer) F78319
[FSM:STAGE:RETRY:]: external VM manager configuration on peer fabric(FSM-STAGE:sam:dme:ExtvmmProviderConfig:SetPeer) F16879
[FSM:FAILED]: external VM manager extension-key configuration(FSM:sam:dme:ExtvmmMasterExtKeyConfig). Remote-Invocation-Error: Error syncing extension key F999938
[FSM:FAILED]: external VM manager configuration(FSM:sam:dme:ExtvmmProviderConfig). Remote-Invocation-Error: Error syncing extension key F999919
At this point I was very disappointed with TAC support.
Digging further, I found bug CSCvf27661.
So I opened a new TAC case to exchange those SSH keys.
First, to see the error:
UCS1-A# scope system
UCS1-A /system # show managed-entity detail
Managed Entity:
    Fabric ID: A
    Leadership: Primary
    State: Up
    Umbilical State: Full
    HA Ready: Yes
    SSH Internal Root Pub Key Checksum: aLongChecksum1
    SSH Internal Root Pub Key Size: 225
    SSH Internal Auth Keys Checksum: aLongChecksum2
    SSH Internal Auth Keys Size: 225
    SSH Internal Keys Status: Matched

    Fabric ID: B
    Leadership: Subordinate
    State: Up
    Umbilical State: Full
    HA Ready: Yes
    SSH Internal Root Pub Key Checksum: aLongChecksum3
    SSH Internal Root Pub Key Size: 219
    SSH Internal Auth Keys Checksum: aLongChecksum1
    SSH Internal Auth Keys Size: 225
    SSH Internal Keys Status: Mismatched
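The same check can be scripted against a saved copy of this output, for example to confirm after the fix that both fabrics report "Matched". Below is a minimal Python sketch. The helper names are hypothetical, and it assumes the "Key: value" layout shown above, with each fabric's block starting at its "Fabric ID" line.

```python
def parse_managed_entities(output):
    """Split 'show managed-entity detail' output into one dict per fabric."""
    fabrics = {}
    current = None
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # no 'Key: value' pair on this line
        key, value = key.strip(), value.strip()
        if key == "Fabric ID":
            current = fabrics.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return fabrics


def mismatched_fabrics(fabrics):
    """Fabric IDs whose internal SSH key status is not 'Matched'."""
    return sorted(fid for fid, fields in fabrics.items()
                  if fields.get("SSH Internal Keys Status") != "Matched")


def key_size_gap(fabric):
    """Auth keys size minus root pub key size, in bytes."""
    return (int(fabric["SSH Internal Auth Keys Size"])
            - int(fabric["SSH Internal Root Pub Key Size"]))


# Shortened copy of the output above.
sample = """\
Managed Entity:
    Fabric ID: A
    SSH Internal Root Pub Key Size: 225
    SSH Internal Auth Keys Size: 225
    SSH Internal Keys Status: Matched
    Fabric ID: B
    SSH Internal Root Pub Key Size: 219
    SSH Internal Auth Keys Size: 225
    SSH Internal Keys Status: Mismatched
"""

fabrics = parse_managed_entities(sample)
for fid in mismatched_fabrics(fabrics):
    print(f"Fabric {fid}: keys mismatched, size gap {key_size_gap(fabrics[fid])} bytes")
```

For fabric B the gap is 6 bytes, which happens to be the length of "(none)". That would be consistent with the missing key comment described further down, though it is only an observation on my part.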
Get the keys matching on both systems by editing them with vi 😉
Linux(debug)# cat /root/.ssh/id_rsa.pub
ssh-rsa AveryLongSSHKey1== root@(none)
Linux(debug)# cat /var/home/samdme/.ssh/authorized_keys
ssh-rsa AveryLongSSHKey2== root@(none)
I can't say how weirdly the TAC engineer went about this; it took more than an hour to edit these two keys. And he also made a mistake! Be sure your key ends with “root@(none)”.
In my case the “(none)” was missing in one key. I changed it after the TAC session and all errors were gone.
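Since a hand-edited key line is easy to get wrong, it can help to check the trailing comment before rebooting anything. Here is a small Python sketch with hypothetical function names. It assumes the usual one-line OpenSSH public key layout of type, base64 data and comment, and only verifies the comment, not the key material itself.

```python
def key_comment(pubkey_line):
    """Return the comment (third field) of a one-line OpenSSH public key, or ''."""
    parts = pubkey_line.strip().split(None, 2)
    return parts[2] if len(parts) == 3 else ""


def has_expected_comment(pubkey_line, expected="root@(none)"):
    """True if the key line ends with exactly the expected comment."""
    return key_comment(pubkey_line) == expected


# Placeholder keys in the same style as the output above.
good = "ssh-rsa AveryLongSSHKey1== root@(none)"
bad = "ssh-rsa AveryLongSSHKey2== root@"

for line in (good, bad):
    status = "ok" if has_expected_comment(line) else "comment is wrong or missing"
    print(f"{line!r}: {status}")
```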
Hope this helps you …
Leave a comment and share!
UPDATE:
Please also check CSCva31113. This explained my issue exactly.
One thought on “UCS Upgrade failed”
Hello Chris,
I read your post. It is very helpful. I have similar problem on clients UCS FI 2248UP. Can you provide to me debug plugin ucs-dplug.5.2.3.N2.2.27c.gbin if you can. I will appreciate this.
Best regards
Bosko Kecman