AI Ops Architect

IBM Computing
Old Toronto
CAD 60,000 - 80,000
Job description

At IBM, work is more than a job - it's a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you've never thought possible. Are you ready to lead in this new era of technology and solve some of the world's most challenging problems? If so, let's talk.

Your Role and Responsibilities

We are looking for an AIOps Architect to lead the development and deployment of AI-enhanced solutions for IT operations. In this role, you will architect cloud-native, compliant platforms that integrate AIOps, cognitive computing, and machine learning models to improve infrastructure performance, reduce downtime, and enhance system observability. You will design scalable, secure, and resilient systems, develop automated operations, and implement robust security practices to ensure compliance and operational excellence.

As an AIOps Architect, you will guide clients in their digital transformation, utilizing state-of-the-art technologies to build intelligent operations platforms that drive efficiency, enhance system reliability, and support business growth.

Core Responsibilities

  1. Architect and deploy hybrid, multi-cloud, and cloud-native solutions to support payments transformation, aligning infrastructure, systems, networking, and data center strategies.
  2. Architect and implement comprehensive Solution Architectures, High-Level Designs (HLD), and Low-Level Designs (LLD) that ensure seamless integration of cloud-native technologies, AI-enhanced monitoring, and automation tools, adhering to best practices in security, compliance, and governance.
  3. Develop and deploy strategies to enhance scalability, resilience, and operational efficiency across hybrid and multi-cloud environments, integrating automation, observability, and robust security protocols to support seamless, high-performing, and compliant systems.
  4. Design and implement solutions that optimize cloud operations, infrastructure management, application performance, DevOps pipelines, security frameworks, network architecture, MLOps, and LLMOps.
  5. Apply deep expertise in monitoring tools (AppDynamics, Dynatrace, Splunk, Instana, QRadar, AWS CloudWatch, Azure Monitor, Google Cloud Operations Suite), with a focus on LLM observability and security for real-time analytics and anomaly detection.
  6. Develop advanced monitoring and observability frameworks leveraging LLM observability and security, enabling robust tracking of application performance, anomaly detection, and real-time analytics for Large Language Models and other AI/ML workloads.
  7. Integrate supervised learning models for predictive analytics, employing techniques such as data cleaning, event correlation, and root cause analysis to generate actionable insights that drive proactive incident resolution and optimize system performance.
  8. Design and implement IT Service Management (ITSM) and ITIL frameworks, encompassing incident management, problem management, change management, and service level management, to standardize operational workflows and enhance service reliability.
  9. Utilize AI/ML models, including machine learning-based anomaly detection and reinforcement learning, to automate incident response, performance tuning, and infrastructure scaling, reducing Mean Time to Detection (MTTD) and Mean Time to Resolution (MTTR).
  10. Engineer robust security architectures that include Cloud Native Application Protection Platforms (CNAPP), Zero Trust Network Access (ZTNA), and fully automated DevSecOps pipelines, ensuring compliance with stringent regulatory requirements and maintaining security posture across multi-cloud ecosystems.
  11. Design and deploy High Availability (HA) and Disaster Recovery (DR) solutions using distributed architectures, multi-zone redundancy, data replication, and automated failover, ensuring minimal service disruption and business continuity in multi-region deployments.
  12. Implement chaos engineering practices, conducting FURPS (Functionality, Usability, Reliability, Performance, Supportability) testing to identify potential failure points, validate system resilience, and ensure seamless recovery under high-stress conditions.
  13. Lead end-to-end project lifecycle management, including agile project methodologies, DevOps pipelines, resource allocation, risk management, and milestone tracking, to ensure the successful deployment of scalable, robust, and secure solutions aligned with client objectives.

Required Technical and Professional Expertise

  1. 8+ years of experience in the design, delivery, and scaling of complex, large-scale IT projects, with a focus on cutting-edge technology solutions across hybrid, multi-cloud, and on-premises environments.
  2. 3+ years of technical leadership as a solution architect, driving the design, integration, and management of hybrid cloud solutions, including seamless coordination across various cloud environments.
  3. Demonstrated success in leading highly complex projects, from initial solution design through to deployment, managing diverse teams, coordinating multiple vendors, and ensuring alignment with strategic business goals.
  4. Strong background in architecting complex, multi-cloud systems, leveraging hyperscalers (AWS, Azure, IBM Cloud, Google Cloud), with experience in multi-region deployments, multi-cloud networking, and cross-cloud service integration.
  5. Proven expertise in designing cloud-native solutions with microservices, containers (Docker, Podman), and orchestration platforms (Kubernetes, OpenShift), ensuring modular, scalable, and resilient deployments.
  6. In-depth understanding of regulatory compliance, security frameworks, and best practices in designing secure, resilient architectures.
  7. Familiarity with integrating AI/ML models to enhance monitoring, incident response, and predictive maintenance processes.
  8. Expertise in emerging technologies, such as AI-enhanced operations, automation frameworks, and cloud-native security, to future-proof systems and improve operational efficiency.

Preferred Technical and Professional Expertise

Same as above

About Business Unit

IBM Consulting is IBM's consulting and global professional services business, with market-leading capabilities in business and technology transformation. With deep expertise in many industries, we offer strategy, experience, technology, and operations services to many of the most innovative and valuable companies in the world. Our people are focused on accelerating our clients' businesses through the power of collaboration. We believe in the power of technology, responsibly used, to help people, partners and the planet.

Your Life @ IBM

In a world where technology never stands still, we understand that dedication to our clients' success, innovation that matters, and trust and personal responsibility in all our relationships live in what we do as IBMers as we strive to be the catalyst that makes the world work better.

Being an IBMer means you'll be able to learn and develop yourself and your career, you'll be encouraged to be courageous and experiment every day, all whilst having continuous trust and support in an environment where everyone can thrive whatever their personal or professional background.

Our IBMers are growth-minded, always staying curious, open to feedback, and learning new information and skills to constantly transform themselves and our company. They are trusted to provide ongoing feedback to help other IBMers grow, and to collaborate with colleagues using a team-focused approach that includes different perspectives to drive exceptional outcomes for our customers. The courage our IBMers have to make critical decisions every day is essential to IBM becoming the catalyst for progress, always embracing challenges with the resources they have to hand, a can-do attitude, and an outcome-focused approach in everything they do.

Are you ready to be an IBMer?

About IBM

IBM's greatest invention is the IBMer. We believe that through the application of intelligence, reason and science, we can improve business, society and the human condition, bringing the power of an open hybrid cloud and AI strategy to life for our clients and partners around the world.

Restlessly reinventing since 1911, we are not only one of the largest corporate organizations in the world, we're also one of the biggest technology and consulting employers, with many of the Fortune 50 companies relying on the IBM Cloud to run their business.

At IBM, we pride ourselves on being an early adopter of artificial intelligence, quantum computing and blockchain. Now it's time for you to join us on our journey to being a responsible technology innovator and a force for good in the world.

Location Statement

Must have the ability to work in Canada without sponsorship.

For additional information about location requirements, please discuss with the recruiter following submission of your application.

Being You @ IBM

IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
