When I started in IT, the most glorified and revered person in the field was the firefighter. We all knew that one person who, amid audible alarms, smoking disk drives and the clamoring of users at the computer room door, would sit Zen-like at the monitor. They would quickly plow through whatever the issue was and restore order and productivity to the kingdom of bits and bytes. Well, that King of IT is dead. Long live the King.
The new King of IT is more akin to Smokey the Bear, doing everything possible to prevent the fire and its collateral damage. The new King relies on three primary tools and practices to keep an IT event from shutting down your business. The first is a repeatable, reliable process. It is the foundation for moving your support model from reactive to proactive. On top of that foundation of reliable processes, you have an intuitive, proactive monitoring tool with analytics. Proactive monitoring doesn't just report outages; it looks for trends and uses analytics to predict issues before they impact end users. You can then use this information to feed the next tier: automation. Automation, whether through scripts, cron, job schedulers, or an agent-based monitoring tool that can run jobs, addresses the predicted event and implements a workaround, often before it impacts end users. Let's look at those three tiers in detail.
Repeatable, Reliable Process
A reliable process is one that resolves an issue effectively and can be repeated with the same result. The key to building it is a thorough discovery of your environment: understand the business drivers behind your systems and business processes, so that the resolution process you develop for each event takes your business needs into consideration.
One boss I had used to say that whenever someone did something they shouldn't have, or handled a situation wrongly and caused an outage, the cause was purely the lack of a reliable process. I agree with that about 90% of the time. Usually the problem is the lack of good documentation and a thorough process for the resolver to follow. In some cases, though, the processes are in place and the resolver, thinking "I got this," ignores the procedure and decides to handle it their own way. So, on top of reliable processes, you need a culture that encourages and rewards their use. This can be a required checklist, performance metrics based on how consistently staff use and update the procedures, or some other program that encourages people to follow the processes.
One way to evaluate a managed service provider (MSP) is to review its documentation. If all of the documentation is about fixing things, it is a reactive MSP. You want to see a healthy mix that includes processes for preventing issues, for example a procedure for responding to a table space alert raised at a lower threshold, rather than waiting until the table space is 100% full.
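To make the table space example concrete, here is a minimal sketch of alerting at a lower threshold instead of at 100%. The 80% and 90% thresholds, the `classify_tablespace` function, and the response text in `RESPONSES` are all hypothetical; a real procedure would use thresholds agreed with your business and your own documented runbook steps.

```python
# Hypothetical thresholds: the point is to alert well before 100% full.
WARN_PCT = 80.0
CRITICAL_PCT = 90.0

# Hypothetical mapping from alert level to the documented response procedure.
RESPONSES = {
    "ok": "no action",
    "warning": "follow the table space extension procedure during business hours",
    "critical": "page the on-call DBA and extend the table space now",
}


def classify_tablespace(used_mb: float, total_mb: float) -> str:
    """Return 'ok', 'warning', or 'critical' for a table space's fill level."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    pct = used_mb * 100 / total_mb  # percent of capacity in use
    if pct >= CRITICAL_PCT:
        return "critical"
    if pct >= WARN_PCT:
        return "warning"
    return "ok"


if __name__ == "__main__":
    level = classify_tablespace(850, 1000)  # 85% full
    print(level, "->", RESPONSES[level])
```

Tying each alert level to a named procedure is the design choice that matters here: the alert points the resolver at the reliable process rather than leaving the response to improvisation.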
Proactive Monitoring Tool with Analytics
Think about how useless a purely reactive monitoring tool is. You get an alert that a network switch is down based on, for example, a 15-minute polling interval, so the alert may arrive as much as 15 minutes after the failure. By 13 minutes after the failure, 25 people have called the Service Desk and others are probably texting IT. Even with 5-minute polling, end users know about the issue before IT does, and that is never good. Proactive means intuitively learning from past events, understanding how they affect service delivery, and warning that a repeat event may occur shortly. Proactive also means looking at usage trends, predicting spikes, forecasting usage, and using those forecasts to manage your environments through automation. Proactive monitoring is a learning process; it doesn't come out of the box smart. It starts from best practices but becomes most effective once it has learned your environment and uses analytics and AI to develop predictive scripting.
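Forecasting usage can start much simpler than AI. As a minimal sketch, assuming you have periodic usage samples (for example, percent of a disk or table space used, taken hourly), a plain least-squares linear trend can estimate how long until a limit is reached and raise a warning ahead of time. The function names and the 24-hour lead time are hypothetical, and a straight-line fit is only a stand-in for the richer analytics described above; it will not capture seasonal spikes.

```python
from typing import Optional, Sequence, Tuple


def fit_linear_trend(times: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    if s_tt == 0:
        raise ValueError("sample times must not all be equal")
    s_tv = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = s_tv / s_tt
    return slope, mean_v - slope * mean_t


def hours_until_limit(times: Sequence[float], values: Sequence[float],
                      limit: float) -> Optional[float]:
    """Hours from the latest sample until the fitted trend reaches `limit`.

    Returns None when usage is flat or falling (no forecast crossing).
    """
    slope, intercept = fit_linear_trend(times, values)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - max(times))


def should_warn(times: Sequence[float], values: Sequence[float],
                limit: float, lead_time_hours: float = 24.0) -> bool:
    """Warn proactively if the limit is forecast within the lead time."""
    eta = hours_until_limit(times, values, limit)
    return eta is not None and eta <= lead_time_hours


if __name__ == "__main__":
    hours = [0, 1, 2, 3]
    usage_pct = [50, 55, 60, 65]
    print(hours_until_limit(hours, usage_pct, 90))  # trend reaches 90% five hours after the last sample
    print(should_warn(hours, usage_pct, 90))
```

The warning fires while there is still time to act, which is exactly what a polling-only tool cannot do.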
Automation
Automation has many benefits: it is faster; it is more reliable at following an exact sequence of instructions; and it is cheaper. Automation doesn't require your admins to stop strategic activities or development work, spend two hours troubleshooting an issue, and then implement and validate a fix. Automation doesn't skip a step if you have built it on the reliable processes in your foundation. Lastly, automation responds immediately; it doesn't need 30 minutes to wake up in the middle of the night and get online.
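One way to picture "automation doesn't skip a step" is a documented procedure encoded as an ordered list of steps that run in sequence, halting and escalating at the first failure. This is a minimal sketch: `run_runbook`, the step names in the usage note, and the escalation message are hypothetical and stand in for whatever your scheduler or agent-based tool actually executes.

```python
from typing import Callable, List, Tuple

# A step is a name plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_runbook(steps: List[Step]) -> List[str]:
    """Run steps strictly in order and stop at the first failure.

    Returns a log of what ran, so every execution can be reviewed later.
    """
    log: List[str] = []
    for name, action in steps:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            # Never continue past a failed step; hand off to a human instead.
            log.append(f"halted at {name}; escalate to on-call")
            break
    return log


if __name__ == "__main__":
    # Hypothetical steps: check free space, purge temp files, validate the fix.
    steps: List[Step] = [
        ("check_free_space", lambda: True),
        ("purge_temp_files", lambda: True),
        ("validate_free_space", lambda: True),
    ]
    for entry in run_runbook(steps):
        print(entry)
```

Ending the runbook with a validation step mirrors what an admin would do by hand: implement the fix, then confirm it worked.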
Automation must be managed. If automation is fixing the same issue every night, use the analytics in your monitoring tools to find the root cause, and then fix it. Here again you need a reliable process for reviewing automation logs and identifying where systemic changes are needed to prevent repeat incidents.
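Reviewing automation logs for repeat fixes can itself be partly automated. The sketch below assumes a hypothetical log format of one remediation per line, `DATE TIME HOST ISSUE ACTION`, and flags any host and issue pair that automation fixed on at least `min_days` distinct days. Those pairs are the candidates for a root-cause review; the format, function name, and threshold are illustrative assumptions, not the output of any particular tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def recurring_fixes(log_lines: Iterable[str], min_days: int = 3) -> List[Tuple[str, str, int]]:
    """Return (host, issue, distinct_days) for fixes repeated on >= min_days days.

    Assumes each line looks like: 'DATE TIME HOST ISSUE ACTION' (hypothetical).
    """
    days_seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for line in log_lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # skip malformed lines rather than failing the review
        date, _time, host, issue = parts[:4]
        days_seen[(host, issue)].add(date)
    return sorted(
        (host, issue, len(days))
        for (host, issue), days in days_seen.items()
        if len(days) >= min_days
    )


if __name__ == "__main__":
    # Hypothetical sample log lines for illustration only.
    sample = [
        "2024-05-01 02:00 db01 tablespace_full extended",
        "2024-05-02 02:00 db01 tablespace_full extended",
        "2024-05-03 02:00 db01 tablespace_full extended",
        "2024-05-03 03:00 web02 service_down restarted",
    ]
    for host, issue, days in recurring_fixes(sample):
        print(f"{host} {issue}: fixed on {days} days; investigate root cause")
```

Counting distinct days rather than raw entries keeps one noisy night from looking like a chronic problem.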
1+1+1 is greater than 3
As you can see, these three IT management components work together to maximize value and effectiveness. Each tier depends on the other two. Together they form a living management organism that provides reliable, consistent service to your end users and customers.
How do you use this information? If you are looking for an MSP, this is the mindset you want to see in their presentations and in the demos of their tools and processes. Look for proactive practices; otherwise, you will just be hiring someone else for your end users to be frustrated with while they wait for an issue to be resolved after the fact. Find an MSP with this delivery mindset, and you will be able to spend your time and money on building your business, not simply maintaining the status quo.