June 2017 | 9 | CIOReview

From Good to Great: The Path to Improved System and Application Uptime

However, IT organizations are often hampered by a lack of end-to-end visibility into their production environments, especially when those environments span different data centers, vendors, and even internal IT teams. Case in point: in the survey referenced above, nearly three-quarters of respondents cited using more than 10 monitoring tools, encompassing Network Performance Management (NPM), Application Performance Management (APM), and log data analytics, to discover issues. An unwieldy number of silo-specific tools inhibits service triage. The modern computing environment needs centralized monitoring that provides a "single pane of glass" view spanning applications, infrastructure, and security. Fortunately, many application monitoring tools available today, such as AppDynamics, let you integrate infrastructure monitoring tools into a unified application monitoring dashboard that provides this single pane of glass view.

The Role of Automation and Machine Learning

The triage-to-fix process in this new world outpaces what human operators can handle on their own. Therefore, the key to meeting 99.9 percent uptime is to reduce reliance on humans and use machine learning to support more resilient infrastructure, along with sophisticated monitoring and alerting that identifies and mitigates issues before they cause downtime. Monitoring tools generate an extensive amount of data, but how does IT separate what's really critical from what's just a false alarm? This is where today's leading-edge technologies, such as automation and machine learning, can improve system performance and drive down costs. It's not uncommon to have hundreds or thousands of systems and applications generating millions of log events per day. Big Data analytics approaches can detect anomalies and exceptions and isolate data that deviates from the model of normal behavior.
This automation allows organizations to scale operations with fewer human resources, creates a sensing, responsive, autonomic fabric that proactively detects performance anomalies, and saves troubleshooting time by offering more guidance toward probable root causes.

In today's business environment, downtime is lost opportunity and lost revenue. Putting the right people, processes, and tools in place provides the service assurance that helps organizations build highly resilient information technology, delivering the computing power and uptime businesses need today.
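
To make the idea of separating critical signals from false alarms more concrete, here is a minimal sketch of one simple way to flag data that deviates from a baseline. It is an illustration only, not the method used by any product named in this article. The function name `find_anomalies`, the sample hourly error counts, and the cutoff of 3.5 are all assumptions chosen for the example; the technique is a modified z-score based on the median and the median absolute deviation (MAD).

```python
from statistics import median


def find_anomalies(counts, threshold=3.5):
    """Return the positions of values that sit far from the median.

    Uses a modified z-score: 0.6745 * |x - median| / MAD.
    The threshold of 3.5 is a common rule of thumb, assumed here.
    """
    med = median(counts)
    # Median absolute deviation: the typical distance from the median.
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        # No typical spread at all: flag anything that differs from the median.
        return [i for i, c in enumerate(counts) if c != med]
    return [
        i for i, c in enumerate(counts)
        if 0.6745 * abs(c - med) / mad > threshold
    ]


# Hypothetical error-event counts per hour from one application's logs.
hourly_errors = [12, 15, 11, 14, 13, 240, 12, 16, 14, 13]
print(find_anomalies(hourly_errors))  # prints [5]
```

Only the hour with 240 events is flagged; the ordinary hour-to-hour variation stays below the cutoff. Median-based statistics are used in this sketch because a single large spike distorts them far less than it would a mean and standard deviation.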