Have you ever seen an error message and not known exactly what was happening? If you work in IT, it likely happens weekly. Most of the time we just try to figure out what’s causing the error and correct it enough to get on with our day. But what about the people who need to understand the underlying issue and get a permanent fix done?
Christopher Hart works on the Cisco TAC helpdesk, and he had a very interesting problem to solve. The original issue was an error that he saw when he tried to save a configuration file to the startup disk. He thought it might be related to hardware failure and started to investigate. However, the deeper he dug into the problem, the more interesting it became.
In his write-up, you can learn a lot more about how the system stores files and writes in-memory configuration files to persistent storage, as well as how programming interacts with users in a fascinating way:
Next, I applied almost all of the configuration from one of the customer’s switches. I sanitized certain bits and pieces of the configuration (such as user accounts, mgmt0 interface configuration, etc.) but left mostly everything else. Sanitizing very little of the configuration is important because if some element of configuration is needed to reproduce the issue, then we need to leave as much of it as intact as possible. We don’t currently know what specific part of the configuration is important. If we already knew this, we wouldn’t need to reproduce this issue in the first place – therefore, we need to leave as much of the configuration intact as possible so that we don’t accidentally ruin our chances of reproducing the issue.