The Significance of Proactive Monitoring to Minimize Negative User Impact and Optimize User Satisfaction
July 10, 2015
Given the complexity of a UC deployment with its variety of software, servers, circuits, gateways, routers, switches, and devices, user-impacting issues are understandably unavoidable. When you consider the number of potential steps between a problem occurring and it ultimately being addressed, and factor in human latency, the calculus of optimizing user experience by minimizing time to resolution can be daunting, to say the least. The ideal IT Pro environment eliminates as many steps as possible between the issue and its resolution and condenses the time between each remaining step. Proactive tools give IT Pros the best chance at success.
Hindsight may indeed be 20/20, and looking back at an incident may eventually make the cause of a system failure obvious. Wouldn’t it be better, though, to close the barn door before the horses are out? Given the complexity of UC, I believe a mixed metaphor is forgivable.
Steps to Resolution
The theory of problem resolution is fairly straightforward: the problem is reported, the root cause is identified, and the issue is resolved. The reality is much more complex.
It is important to accept the reality that system failure is inevitable. There are too many moving parts to not have something go wrong occasionally. The key is to identify these failures, remove them as quickly as possible, and control the user impact.
Unless there are monitoring tools in place, the only way the IT Pro knows if the system fails is when users are impacted. The time between system failure and user impact is, for the most part, inconsequential. It is once users are impacted that IT is held accountable in the user’s mind. At this point, the key is how quickly an impacted user reports the failure so IT is aware they are being evaluated.
If the world were fair, the clock would start for IT once the issue was reported. Unfortunately, the reality is that the clock starts ticking once the first user is impacted. In the ideal scenario, the issue is reported as soon as the first user is impacted by the system failure. In the nightmare scenario, users do not initially report the failure, the number of impacted users continues to increase, and overall dissatisfaction with the UC deployment begins to grow exponentially. The ideal scenario is rare; the nightmare is all too common.
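To make the two clocks concrete, here is a minimal sketch in Python. The timestamps are hypothetical values chosen for illustration, and the variable names are assumptions rather than fields from any particular ticketing or monitoring product. It compares the time IT would measure (report to restoration) with the time the user actually experiences (first impact to restoration).

```python
from datetime import datetime, timedelta

# Hypothetical timestamps for a single incident (illustrative values only).
first_user_impact = datetime(2015, 7, 10, 9, 0)
issue_reported = datetime(2015, 7, 10, 11, 30)
functionality_restored = datetime(2015, 7, 10, 13, 0)


def elapsed(start: datetime, end: datetime) -> timedelta:
    """Return the time between two events, rejecting out-of-order input."""
    if end < start:
        raise ValueError("end precedes start")
    return end - start


# The clock IT would like to be judged by: report to restoration.
it_clock = elapsed(issue_reported, functionality_restored)

# The clock the user actually applies: first impact to restoration.
user_clock = elapsed(first_user_impact, functionality_restored)

# The gap between the two is time lost before IT even knew about the failure.
reporting_delay = elapsed(first_user_impact, issue_reported)

print(f"IT's clock:      {it_clock}")
print(f"User's clock:    {user_clock}")
print(f"Reporting delay: {reporting_delay}")
```

With these made-up values, more than half of the time the user spends waiting passes before the issue is reported at all.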
Only once the system failure has been reported can IT begin to investigate the issue. Much like a criminal investigation, the success of a UC failure investigation depends on how quickly the report is made and how much evidence it contains. If a user opens a ticket immediately after a bad call and can report on the time of the call, the parties involved, and which modalities were used, the IT Pro can focus their search and stands a better chance of quickly identifying the root cause and remediating the situation. If the user waits a week and reports a series of poor calls with fuzzy details, the crime scene becomes much broader and the investigation much longer.
Unless the report includes data about the specific call that exhibited the failure, IT must try to recreate the failure. The IT Pro must be watching the system when the failure surfaces in order to truly identify the root cause. If the failure is constant, this can be easy. If the failure was an isolated incident, it will be impossible to recreate the situation. On the bright side, if the failure was an isolated incident, it likely will not happen again. The absolute worst case for the IT Pro is the intermittent issue. Trying to recreate an intermittent failure and correctly identify the root cause makes looking for a needle in a haystack seem like a cakewalk.
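The effect of a precise report on the size of the "crime scene" can be sketched in a few lines of Python. Everything here is hypothetical: the `CallRecord` type, the `candidate_calls` helper, the user names, and the sample data are assumptions made for illustration, not the schema of any real UC platform's call records.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical, simplified call detail record."""
    start: datetime
    caller: str
    callee: str
    modality: str  # e.g. "audio" or "video"


def candidate_calls(records, user, window_start, window_end,
                    other_party=None, modality=None):
    """Return the calls an investigator would have to examine for a report."""
    matches = []
    for r in records:
        if user not in (r.caller, r.callee):
            continue
        if not (window_start <= r.start <= window_end):
            continue
        if other_party is not None and other_party not in (r.caller, r.callee):
            continue
        if modality is not None and r.modality != modality:
            continue
        matches.append(r)
    return matches


# Made-up data: three calls a day by "alice" over seven days (July 3 to 9).
records = [
    CallRecord(
        start=datetime(2015, 7, 3 + day, hour),
        caller="alice",
        callee="carol" if hour == 14 else "bob",
        modality="video" if hour == 11 else "audio",
    )
    for day in range(7)
    for hour in (9, 11, 14)
]

# Prompt, detailed ticket: time, other party, and modality are all known.
precise = candidate_calls(
    records, "alice",
    datetime(2015, 7, 9, 13, 45), datetime(2015, 7, 9, 14, 15),
    other_party="carol", modality="audio",
)

# Late, fuzzy ticket: "some of my calls were bad last week".
fuzzy = candidate_calls(
    records, "alice",
    datetime(2015, 7, 3), datetime(2015, 7, 10),
)

print(len(precise), "candidate call(s) from the precise report")
print(len(fuzzy), "candidate call(s) from the fuzzy report")
```

In this toy data set, the detailed report points at a single call, while the vague one leaves every call from the week under suspicion.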
Root Cause Analysis
If the IT Pro is fortunate enough to recreate the user scenario and the user-experienced failure, the next step is to determine the root cause of the failure. Considering the variety of paths and components that a UC conversation passes through, it can be daunting to find the exact location and circumstances that created the failure. Often, root cause analysis is the longest part of the resolution process. It can also be the most frustrating because there is no satisfying progress to report to users or management who may be monitoring the situation.
Once the root cause has been accurately identified, the issue can finally be resolved. Fortunately, an accurate root cause identification makes the resolution fairly quick. If nothing else, it is generally predictable, and the IT Pro can articulate the plan to resolution and report progress against it.
Once the resolution has been completely implemented, the system functionality is restored. If only restoration were the end of the process...
Once the system restoration has been completed, the IT Pro can notify the impacted user(s) that the problem has been identified, the issue has been resolved, and functionality has been restored. User notification is an important step because it lets users know they can start trusting their communications platform again. It also reinforces to users that IT is responsive and capable of responding to and resolving user issues.
Though user notification is an important step in the process, it is not the most significant. It merely opens the door for the linchpin of the process: user trust. Before users can begin to trust the UC system again, they must first acknowledge that the issue has been resolved and that they should not expect the same problem to recur.
There are a lot of variables that determine the time between user acknowledgement of issue resolution and user trust in the deployment. Time between user impact and acknowledgement is critical, but so is the user’s perception of the magnitude of the impact. The user’s trust in the system before the failure is another important consideration. Regardless, the ultimate goal of the IT Pro is to restore user satisfaction.
There are 11 steps from system failure to user satisfaction, with progression between each subject to human latency and the potential for human error. The mission of the IT Pro can indeed seem daunting. Is there anything that can bring hope? Fortunately, there is.
Be sure to check out our next blog post where we explore methods and tools to reduce the overall time to resolution by bypassing certain steps and reducing human latency in those that remain.