Site Reliability Engineer

Be among the first applicants.

Selby Jennings

London

Remote

GBP 80,000 - 100,000

Yesterday

Job description

Site Reliability Engineer - Global Quant Hedge Fund - London (Remote)

Our client is a global quantitative and systematic hedge fund that leverages software engineering, data engineering, and financial engineering to drive innovation in crypto trading. They are seeking a Site Reliability Engineer (SRE) with a background in crypto trading to play a key role in ensuring the reliability, performance, and scalability of their high-frequency cryptocurrency trading systems.

In this role, you will be responsible day to day for maintaining system health, monitoring performance, resolving issues, and automating processes. You will be an integral part of a team dedicated to ensuring the smooth operation of mission-critical systems in a fast-paced 24/7 crypto trading environment.

The ideal candidate will have hands-on experience in a crypto trading setting and a passion for managing high-performance systems. They should be detail-oriented, proactive in problem-solving, and thrive in the dynamic, high-pressure world of crypto trading.

Example projects include performance optimisation, reliability engineering, incident management, and data integrity.

Responsibilities:

Monitor and maintain trading system health
Identify, triage, and resolve issues in real time
Develop automation scripts and tools
Implement and maintain monitoring and alerting systems
Conduct root cause analysis and implement preventive measures
Optimise trading system performance through analysis and tuning
Maintain documentation for operational procedures and system architecture
Assist in planning and scaling trading infrastructure

Requirements:

Bachelor's degree in Computer Science, Software Engineering, or related field
Experience in financial trading systems or high-frequency environments
Strong proficiency in Python for scripting and automation
Experience with Linux systems administration
Knowledge of network protocols and troubleshooting techniques
Familiarity with SQL/NoSQL databases and real-time data processing
Experience with monitoring tools (e.g., Prometheus, Grafana)