AVP, Site Reliability Engineer, Core Banking Technology, Group Technology | Singapore, SG

DBS Bank Limited

Singapore

SGD 125,000 - 150,000

Job description

Business Function
Group Technology enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group Technology, we manage the majority of the Bank's processes and aspire to delight our business partners through our multiple banking delivery channels.

Responsibilities

End-to-end handling of production issues in Open Systems applications across varying technologies, with work ranging from incident analysis, code fixes, SIT testing, UAT test planning (scope, duration) and UAT support for users through to source promotion to production. Facilitate/drive recovery calls for major incidents and coordinate with multiple teams to drive resolution.
Application improvements, ranging from performance and operational improvements to the identification and remediation of system and process toil.
Ensure timely service restoration within SLAs. Drive Root Cause Analysis (RCA) with technology partners after incident resolution and facilitate RCA reviews.
Ensure preventive and detective measures for the applications are identified and implemented; identify persistent or recurring problems and recommend creative solutions.
Work in a stretch role, taking production issues forward from analysis to production fix/deployment.
Follow production support processes and provide input to strengthen them.
Provide status updates to leads and stakeholders, and work with vendors to review the design/fix/enablement for production deployment.
Own communication for incidents (SLA breaches, application major incidents, logistics issues) and incident escalation, and be responsible for communications with management.
Coordinate recurring issues and ensure long-term resolution through proper Incident and Problem Management.
Work with various teams, such as Infrastructure and development, to resolve complex incidents and analyse their root causes.
Strong stakeholder management skills, with a main focus on continuous service improvement, consistent delivery and production stability.
Toil automation: automate manual activities/processes for production teams (automation experience required).
Implement Site Reliability Engineering principles.
Manage the identification and development of monitoring and of process/systemic improvements to increase the reliability of production systems.
Build automation and observability tools to detect, troubleshoot and recover systems faster, improving the reliability and resiliency of production systems.
Build predictive tools and solutions using machine learning capabilities, making the best use of available logs and instrumentation.
Strong communication skills; understands and works well within a global team and ensures proper handoff of incidents and their details.

Requirements

7+ years of strong experience in the banking industry, including at least 5 years in a Run-the-Bank (RTB) lead role, with a proven track record of working in Open Systems production in a multi-country environment.
Sound understanding of RDBMS, Unix and large banking applications.
Experience managing Open Systems (ecosystems) and multi-country production environments.
Good knowledge of production infrastructure, performance monitoring, fine-tuning and reporting tools.
Experience implementing Site Reliability Engineering principles for performance, reliability, monitoring and alerting in a production environment, and in automating toil and building automation solutions.
Familiarity with the vendor product Quadient or equivalent statement generation engines, JBoss, MariaDB, EDB Postgres, Java (on Linux), Kafka and MQ.
Experience in supporting critical applications using API-driven cloud-native technologies (e.g. AWS, PCF, OpenShift).
Good hands-on experience with AIX, Linux, OpenShift, AWS, PCF and Kubernetes.
Good hands-on experience with databases: Oracle, EDB, Postgres and MariaDB.
Good hands-on experience with Java, Spring Boot and Python.
Knowledge of middleware systems such as Kafka and Redis.
Experience in Incident, Change and Problem Management.
Good working experience with Elasticsearch, Logstash, Grafana/Kibana, AppDynamics, etc.
Production automation experience (required): automating manual activities/processes for production teams. Experience automating with Python is an added advantage.
Responsible for production deployments, ensuring governance and controls, and acting as a gatekeeper for any changes to the production environment.
Occasionally required to provide weekend/out-of-office-hours support as per the rota or requirements.

We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognises your achievements.