Title: Incident Manager
Department: Engineering
Location: Remote within EU
About Block Labs
Block Labs is a leading force in the Web3 space, incubating, investing in, and accelerating top-tier fintech, crypto and iGaming projects. With a mission to shape the future of decentralized technology, we partner with visionary startups to raise funding, refine product-market fit, and grow their audiences. Our diverse team drives innovation, using deep industry expertise and an extensive network to empower the next wave of blockchain-driven companies. At Block Labs, we’re passionate about turning bold ideas into breakthrough success.
About The Role
In this strategic role, you will be the cornerstone of our Technical Support Operations, overseeing Incident and Problem Management processes that uphold our rigorous reliability standards. Our global team is responsible for managing incidents from detection to resolution, ensuring that service outages, critical bugs, and security issues are promptly and effectively addressed. Working closely with Platform Engineering and other cross-functional teams, you will drive clear, user-first communications during disruptions and lead efforts to continuously improve our support processes. This role is essential for delivering the highest value to the organization through operational excellence.
Key Responsibilities
- Serve as the Incident Commander for critical incidents across various domains, coordinating cross-functional teams to drive prompt resolution.
- Oversee Incident and Problem Management processes, from initial detection and triage through to root cause analysis and remediation.
- Ensure timely escalation of incidents to the appropriate experts, including software engineers, platform engineers, or external partners.
- Manage daily operations of the technical support team, aligning processes with ITIL best practices to maintain consistent, high-quality service.
- Collaborate with Platform Engineering, product teams, and senior management to provide accurate situation reports and facilitate effective internal and external communications during incidents.
- Lead post-incident reviews and drive continuous improvements in incident response strategies, processes, and tooling.
- Prepare detailed incident reports and conduct Root Cause Analysis (RCA) discussions to prevent future occurrences.
- Foster a user-first approach by ensuring that all external communications during service disruptions are clear, timely, and informative.
- Participate in on-call rotations to provide continuous support and ensure rapid response during incidents.
About You
- 5+ years of experience managing major incidents in mission-critical or always-on environments.
- Experience with Web3 platforms and technologies, including non-custodial wallets and related integrations, is a strong plus; iGaming experience would be an advantage.
- Proven ability to independently lead multiple incidents concurrently with minimal support.
- Strong understanding of application development, system architectures, and cloud environments.
- Familiarity with infrastructure concepts, including physical, virtual, and containerized compute platforms.
- Practical experience with modern monitoring and telemetry tools such as Splunk, Prometheus, or Grafana.
- Basic data analysis skills using SQL or similar tools.
- Excellent task management and communication skills, with the ability to remain composed under pressure.
- Experience handling diverse incident types, including technical, security, and privacy incidents, as well as crisis management.
- Familiarity with distributed architectures and system interdependencies in a cloud environment.
- Proven experience in managing public-facing communications, including status pages and social media updates during incidents.
- A proactive, ownership-driven mindset with a commitment to continuous improvement in incident management processes.
Are you ready to take our Engineering function to the next level?