![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
GroundWork Open Source Solutions, Inc.
2200 Powell Street Ste. 350 Emeryville CA 94608 ·
Email: [email protected] | Tel: (866) 899-4342
![]() Monitoring System Design PrinciplesMonitoring tools are critical to quickly identify and address problems that affect an IT organization’s service to its users. These tools scan for problems with network, systems, and application resources that a company relies on for its business. However, IT managers face significant challenges deploying and managing these systems to their maximum benefit. Key issues include: 1) high costs for commercial software licenses and maintenance fees; 2) waste from unused “shelf ware” and/or multiple tools with redundant functionality; 3) dependence on expensive consultants for installation and support; 4) lengthy and expensive system deployment, and 5) high resource cost associated with system management, administration, and support. Moreover, monitoring systems frequently are not configured to produce critically needed input to the IT business processes used for management of service levels, availability, capacity, incidents, problems, and operations. Well designed monitoring systems, those integrated with standard IT management practices, are key to achieving world class IT service effectiveness and are a major driver of corporate productivity and competitiveness. The design principles described below can be incorporated into a system that is simple and inexpensive to deploy, manage, and operate. The Hierarchy of Monitoring FunctionsMost operational challenges associated with monitoring are associated with deploying more monitors and features than are necessary to achieve effective systems management. Failure to methodically design the monitoring process creates systems that are expensive and unwieldy to deploy, manage and maintain. Figure 1. Hierarchy of Monitoring Functions |
Key Infrastructure Issue |
Implication for Monitoring System |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Monitoring system deployment should be conservative and simple. The goal is to use as few monitors as possible to minimize complexity and cost, while delivering maximum IT management and business value to the organization. This suggests an incremental approach of deploying the most important monitors first, and configuring and deploying them to take advantage of their full reporting, analysis, and dashboard and other capabilities.
Installing incrementally allows the discoveries, experience and early wins associated with the first deployment to drive the design choices made in later deployments. Results from an initial deployment often surface unanticipated issues requiring attention in later deployment while satisfying other issues in unexpected ways. For example, availability monitors deployed in an initial project will provide a baseline of metrics that help determine which additional monitors would provide the most business benefit. Installing the highest impact functions first will also provide information that accelerates integration of the system’s output data into the organization’s business processes.
The resulting system should make use of all data derived from each of the monitors. Monitors left unconfigured, or those whose output is ignored, typically generate expensive and distracting false alarms. Configuration options should include group assignments, filter assignments, threshold values, retry interval and limits, dependencies, scheduled maintenance, performance data, schedules, notification media choices, and the logical structure and schedule of escalation.
The relative utility of availability statistics illustrates the importance of a complete configuration. If the dependencies between network component availability and the collection of monitoring data are not correctly configured and maintained, the resulting availability statistics will be meaningless. Component outages will be recorded not because the components are inoperative, but because they are not available to be monitored.
As with most computer systems, the successful design of monitoring systems is highly dependent on the recognition that human operators can determine the success or failure of a project. Here are a few important factors to consider:
Alarm Fatigue. Getting paged in the middle of the night to respond to false alarms is wearing and reduces responsiveness to real problems. There are a number of ways to minimize the critical alarms requiring employees to be paged during and outside of business hours. These include: 1) automatic alarm verification; 2) automatic restarts (event handlers); 3) priority-based notifications; 4) escalation intervals adjusted for time of day; 5) “follow-the-sun” techniques, and 6) disabling of alarm channels during planned maintenance. Alarms should be enabled selectively: not all monitoring channels require alarms, and many should result in email-only notifications.
Calibration Approach. The set points for many alarms are affected by network propagation rates and typical variations of these rates. Set points should be determined by setting the alarms high and calibrating down rather than the other way around. This will allow the system to detect hard failures while avoiding nuisance alarms. This is particularly important during the initial days of installation.
Web Access. The monitoring system must permit responding personnel to have remote web access to minimize inconvenience associated with after hours support.
NOC Outsourcing. A well-designed monitoring system which generates a lot of real alarm traffic can exhaust a busy staff during periods of rapid change or significant infrastructure expansion. Consider keeping staff fresher during these times by outsourcing non-business hour level one alarm response to a managed service provider (MSP). Alarm response can be brought back in-house when system stability and associated costs allow.
Change Management. Manage the change in expectations for IT employee performance during the design process with good project management techniques.
Information technology organizations, working with experienced architects to apply sound principles to the design and deployment of monitoring systems, achieve significantly greater business benefits from monitoring, including higher application availability, higher employee productivity, and lower IT capital and operating costs. Whether you are using expensive monitoring software or open source software tools, the system’s value will be determined by the quality of the design and implementation plan, and the degree to which your organization’s business processes use the output – not the expense or features of the software purchased.