AI for IT Operations
AI is beginning to transform IT operations in significant ways, with a measurable impact on the bottom line. This article discusses how embedding AI into IT operations can transform them. It covers key use cases across infrastructure and application deployment, management of deployed environments, and remediation of issues, and then walks through an example of AI-assisted IT operations.
Current Environment
Below are some of the key processes in many IT operations environments that AI can improve, because they often deal with large amounts of structured and unstructured data:
· Incident identification and management: IT systems generate large numbers of events, and understanding them, including filtering out the noise, can be time consuming. AI can identify many of the relevant events so staff can focus on more important activities.
· Root cause analysis: Performing root cause analysis often requires understanding different types of data from several systems. The analysis often requires subject matter experts (SMEs) who understand the domain, and it is prone to human error. AI can complement the efforts of the SMEs and enable less skilled staff to identify the root causes of issues more quickly.
· Remediation: Systems need to be corrected based on the findings of the root cause analysis. This includes updating code, creating new test cases with relevant data, deploying a test environment, and final deployment to production. This is often a manual process and is ripe for automation built on AI.
· Predictive maintenance: Most IT environments are reactive, and finding a problem before it occurs can significantly improve operations. Providing the right data to AI predictive analytics systems can enable them to better forecast upcoming issues so they can be resolved before they happen.
· Vulnerabilities: Threats evolve, and IT departments struggle to identify and resolve them. AI can continuously analyze the IT environment to identify potential threats and offer solutions.
· Manual tasks: Many manual tasks, such as ticket resolution, patch management, and software updates, can be automated more effectively when built on top of AI.
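To make the predictive maintenance idea concrete, here is a minimal sketch in Python. It assumes hourly disk-usage samples and uses a simple least-squares trend line as a stand-in for a real predictive analytics system. The function name `hours_until_full` and the sample values are hypothetical and chosen only for illustration.

```python
from statistics import mean

def hours_until_full(usage_percent, limit=100.0):
    """Fit a straight line to hourly usage samples and estimate how many
    hours remain after the last sample until `limit` is reached.
    Returns None when no exhaustion can be predicted."""
    n = len(usage_percent)
    if n < 2:
        return None  # not enough data to fit a trend
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(usage_percent)
    denom = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar)
                for x, y in zip(xs, usage_percent)) / denom
    if slope <= 0:
        return None  # usage is flat or shrinking
    intercept = y_bar - slope * x_bar
    hour_at_limit = (limit - intercept) / slope
    return max(0.0, hour_at_limit - (n - 1))

# Hypothetical samples: disk usage grows by two points per hour.
samples = [70.0, 72.0, 74.0, 76.0, 78.0]
remaining = hours_until_full(samples)
print(f"Estimated hours until the disk is full: {remaining}")
```

A production system would typically use richer models and more signals, but the principle is the same: forecast from history and raise a ticket before the threshold is crossed.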
Use Cases
Each of the above processes has multiple use cases. Four processes have been chosen to illustrate some key AI use cases that enterprises are building to improve IT operations:
Incident Identification
• Automate ticket creation from email, chat, and self-service portals
• Use AI to classify and route requests to the right support team
• Retrieve knowledge base articles and suggest resolutions
• Escalate unresolved issues with full context to support engineers
• Update ticket status and notify users through integrated channels
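The classify-and-route step above can be sketched as follows. This is a minimal illustration in Python, assuming a keyword lookup as a placeholder for a trained classification model. The team names, keywords, and the `create_ticket` helper are all hypothetical.

```python
import re

# Hypothetical routing table: support team -> keywords that suggest it.
ROUTING_RULES = {
    "network": ("vpn", "wifi", "dns", "latency"),
    "identity": ("password", "login", "mfa", "locked"),
    "hardware": ("laptop", "monitor", "keyboard", "battery"),
}
DEFAULT_TEAM = "service-desk"

def classify(text):
    """Pick the team whose keywords overlap most with the request text.
    A real deployment would call a trained model here instead."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    scores = {team: len(words & set(keywords))
              for team, keywords in ROUTING_RULES.items()}
    best_team = max(scores, key=scores.get)
    return best_team if scores[best_team] > 0 else DEFAULT_TEAM

def create_ticket(channel, text):
    """Build a ticket record from an inbound email, chat, or portal request."""
    return {"channel": channel, "summary": text,
            "team": classify(text), "status": "open"}

ticket = create_ticket("email", "I am locked out and my password reset fails.")
print(ticket["team"])
```

Requests that match no rule fall back to the general service desk, which keeps a human in the loop for anything the classifier does not recognize.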
Incident Management
• Automate ticket creation and routing from monitoring alerts
• Use AI to classify and prioritize incidents
• Trigger remediation workflows for common issues
• Enable conversational resolution through integrated virtual agents
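As a rough sketch of how alert prioritization and remediation triggering might fit together, the Python below scores an alert, assigns a priority, and looks up a workflow for known issue types. The severity weights, alert fields, and workflow names are assumptions made for illustration, and the rule-based scoring stands in for an AI model.

```python
# Hypothetical weights and workflow names, for illustration only.
SEVERITY_WEIGHT = {"critical": 3, "warning": 2, "info": 1}
WORKFLOWS = {"disk_full": "cleanup_temp_files",
             "service_down": "restart_service"}

def prioritize(alert):
    """Map severity and customer impact to a priority label."""
    score = SEVERITY_WEIGHT.get(alert["severity"], 1)
    if alert.get("customer_facing"):
        score += 2
    if score >= 4:
        return "P1"
    return "P2" if score == 3 else "P3"

def handle_alert(alert):
    """Create an incident record and choose an automated workflow for
    known issue types; everything else is routed to the on-call engineer."""
    incident = {"source": alert["source"], "type": alert["type"],
                "priority": prioritize(alert)}
    workflow = WORKFLOWS.get(alert["type"])
    incident["action"] = f"run:{workflow}" if workflow else "route:on-call"
    return incident

incident = handle_alert({"source": "monitoring", "type": "service_down",
                         "severity": "critical", "customer_facing": True})
print(incident)
```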
Root Cause Analysis
• Augment event management through techniques such as alert grouping, correlation, and impact radius analysis
• Use Gen AI for release summaries, usage insights, bottleneck identification, and value stream metrics
• Provide a single pane of glass that uses AI to map dependencies across user interactions, frontend code, backend services, and databases
• Use machine-learning-driven network performance management and observability to reduce noise, find anomalies proactively, and improve signal clarity and mean time to detect
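Alert grouping and impact radius analysis can be illustrated with a short sketch. The Python below assumes alerts carry a service name and a timestamp in seconds, and that a dependency map is available. The service names and the five-minute window are hypothetical, and real tools typically learn these correlations instead of using fixed rules.

```python
from collections import deque

# Hypothetical dependency map: service -> services that depend on it.
DEPENDENTS = {
    "database": ["orders-api"],
    "orders-api": ["web-frontend"],
    "web-frontend": [],
}

def group_alerts(alerts, window_seconds=300):
    """Group alerts for the same service that arrive within
    `window_seconds` of the previous alert in that group."""
    groups, open_group = [], {}
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        group = open_group.get(alert["service"])
        if group and alert["ts"] - group[-1]["ts"] <= window_seconds:
            group.append(alert)
        else:
            group = [alert]
            groups.append(group)
            open_group[alert["service"]] = group
    return groups

def impact_radius(service, dependents=DEPENDENTS):
    """Breadth-first walk to find every service downstream of a failure."""
    seen, queue = set(), deque([service])
    while queue:
        for nxt in dependents.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

alerts = [{"service": "database", "ts": 0},
          {"service": "database", "ts": 60},
          {"service": "orders-api", "ts": 100},
          {"service": "database", "ts": 1000}]
groups = group_alerts(alerts)
print(len(groups), sorted(impact_radius("database")))
```

Grouping reduces four raw alerts to three candidate incidents here, and the impact radius shows which downstream services an engineer should check first.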
Remediation
• Generate runbooks and automation based on recommendations
• Generate DevOps pipelines for remediated components
• Generate automation scripts for middleware IT automation, runbooks, and standard operating procedures
• Simplify Ansible playbook creation
• Create test plans and test data
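One way to picture playbook generation is as a validated recommendation rendered into a template. The Python sketch below assumes the AI system returns a structured recommendation with `hosts` and `service` fields (a hypothetical format) and renders a small Ansible playbook that restarts the service with the `ansible.builtin.service` module. A generative model would usually produce more varied output, so treat this as an illustration of the validation step and not as how any particular product works.

```python
import re

# Allow only simple names so generated YAML cannot be altered by odd input.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")

PLAYBOOK_TEMPLATE = """\
- name: Restart {service} after a failed health check
  hosts: {hosts}
  become: true
  tasks:
    - name: Restart {service}
      ansible.builtin.service:
        name: {service}
        state: restarted
"""

def render_playbook(recommendation):
    """Validate a recommendation and render it as an Ansible playbook."""
    hosts = recommendation["hosts"]
    service = recommendation["service"]
    for value in (hosts, service):
        if not SAFE_NAME.fullmatch(value):
            raise ValueError(f"unsafe value in recommendation: {value!r}")
    return PLAYBOOK_TEMPLATE.format(hosts=hosts, service=service)

# Hypothetical recommendation, as it might come from a root cause analysis.
playbook = render_playbook({"hosts": "webservers", "service": "nginx"})
print(playbook)
```

Validating the fields before rendering matters because generated automation runs with elevated privileges; a reviewer should still approve the playbook before it reaches production.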
Example
A sample use case in which AI is embedded into IT operations is outlined below. It assumes a human in the loop, but for less complex issues the entire process can be automated with AI.
• An incident occurs, and AI analyzes the related events and generates the relevant tickets.
• The correct support person is identified and given the relevant information, along with any insights the AI system can gather about the incident.
• The support person identifies the root cause using an expert chatbot that has been trained on relevant system data, together with existing AI tools, including those built on traditional AI/ML models.
• Once the root cause is identified, the product support team is contacted to fix the problem. The team determines what code needs to be created or updated and goes through the CI/CD process to update the application, create the relevant test data, test the application, and release it to production.
The figure below provides a high-level flow of the above steps.
Figure 1: High-level flow of the IT operations use case embedded with AI
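The human-in-the-loop gate in this flow can be sketched in a few lines of Python. Everything here is a simplified assumption: the event format, the `support-l2` assignee, and the root cause heuristic (the most frequently reported component) are placeholders for the AI services described in the steps, and `approve` stands in for the support person's decision.

```python
from collections import Counter

def suggest_root_cause(events):
    """Placeholder for the expert chatbot: pick the component that
    appears most often in the related events."""
    component, _ = Counter(e["component"] for e in events).most_common(1)[0]
    return component

def run_incident_flow(events, approve):
    """Walk one incident through ticketing, routing, root cause
    suggestion, and a human approval gate before the CI/CD hand-off."""
    ticket = {"summary": f"{len(events)} related events", "events": events}
    ticket["assignee"] = "support-l2"  # placeholder for AI routing
    ticket["root_cause"] = suggest_root_cause(events)
    # The human reviewer decides whether the suggestion is good enough.
    if approve(ticket):
        ticket["status"] = "sent-to-ci-cd"
    else:
        ticket["status"] = "needs-investigation"
    return ticket

events = [{"component": "db"}, {"component": "db"}, {"component": "api"}]
result = run_incident_flow(events, approve=lambda t: t["root_cause"] == "db")
print(result["root_cause"], result["status"])
```

For less complex issues, the `approve` callable could be replaced by an automated policy check, which corresponds to the fully automated variant mentioned above.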
Enterprises are at different stages of the journey to integrate AI into IT operations. The benefits include fewer IT help desk tickets, improved productivity of IT help desk staff, containment of IT requests by AI agents without the need for a live advisor, improved mean time to resolution, proactive issue prevention, increased system availability, and reduced time to remediate issues. Together these can significantly improve ROI, depending on the use case being implemented and the enterprise's maturity in integrating AI into its IT Ops environment. Given these benefits, every IT operations team should examine how AI can be embedded into its current processes, or how those processes can be modified to include AI.
About the Authors
Utpal Mangla
Utpal Mangla (MBA, PEng, CMC, ITCP, PMP, ITIL, CSM, FBCS) is a General Manager responsible for Telco Industry & EDGE Clouds in IBM. Prior to that, he (utpalmangla.com) was the VP, Senior Partner and Global Leader of TME Industry’s Centre of Competency. In addition, Utpal led the “Innovation Practice” focusing on AI, 5G EDGE, Hybrid Cloud and Blockchain technologies for clients worldwide. In his role as a senior business executive with P&L responsibility and a thought leader in emerging technologies, Utpal’s mission is to fuel growth by building, scaling and implementing differentiated, competitive service and solution offerings that meet the business imperatives of customers. Under Utpal’s leadership, IBM recently achieved the mission of scaling to make “Watson AI Impact 1.5 Billion Consumers” and the creation of “Industry Blockchain platforms”. Utpal is a Master Inventor and is at the forefront of making Hybrid Cloud and 5G/EDGE real for enterprises globally. Utpal has been with IBM (and PwC) since 1998. With 20+ years of experience, Utpal is a highly motivated and dynamic leader who thrives in challenging environments. He is reputed for his trust, problem solving and organizational skills. A recipient of numerous client excellence awards, he is recognized as “IBM Top Talent”. Utpal is a regular speaker at industry forums and at university and business conferences globally, including MWC, THINK, TMForum, Dreamforce, Cannes, Fierce 5G and CEM Telecoms. With 50+ articles, Utpal contributes to industry blogs, analyst reports and emerging marketplace trends. He has been quoted in Fortune, Bloomberg, GSMA, LF and BusinessWire. Utpal is an active contributor and member of the FORBES council and the AI Think Tank at Cognitive World, is the current chair of the ISSIP Strategy Council, is a member of CompTIA’s IoT Advisory leadership, and was on the board of ATIS. Utpal is also a member of IBM’s Executive Partner Promotion committee and its Talent Ecosystem and 5G EDGE Acceleration teams.
Utpal is on the advisory boards of Penn State Univ and Rochester Institute of Tech. An active STEM volunteer and P-TECH mentor dedicated to “Pathways in Technology, Early College”, Utpal supports education outreach initiatives through Univ of Toronto and Prof. Engineers Ontario. Utpal holds a Bachelor’s degree in Computer Science Engg from Pune Univ (with highest honours) and an MBA from Northwestern Univ’s Kellogg Graduate School of Management. He completed executive studies at Harvard Business School’s strategic leadership, Wharton School’s financial value creation and Stanford Business School’s entrepreneurial leadership programs.
John Thomas
John Thomas is part of IBM's automation sales team and has prior experience working with a startup that scaled from $0 to $20M annual revenue and successful mid-size companies that were acquired by IBM. He has generated millions in pipeline and collaborated with customers, partners and IBM teams to help organizations modernize their IT environment through infrastructure automation and security solutions.
Joel George
Joel George is a Solutions Engineer at IBM where he helps clients modernize their operations through AI, automation, and security solutions. A Computer Science honors graduate of UT Dallas and National Merit Scholar, he has built his career across Fortune 500 companies with hands-on experience in data engineering, machine learning, cloud infrastructure, and AI security. He brings both a deep technical and client-facing perspective to the opportunities and challenges of AI in the enterprise.