We are a high-frequency proprietary trading firm with offices worldwide, seeking a skilled Senior Site Reliability Engineer to join our High Performance Computing team.
About the Role
You will develop and support our large-scale compute and storage platform, designed to solve demanding business and financial problems through computer modelling, simulation, and analysis.
Key Responsibilities
Deploy, operate, and support HPC infrastructure, including diverse, distributed on-premises and cloud storage, schedulers (e.g. HTCondor or SLURM), and container orchestration platforms such as Kubernetes.
Manage hardware and software vendor relationships.
Requirements
To succeed in this role, you will need:
Solid Linux administration experience, preferably supporting large-scale research infrastructure.
Experience managing medium- to large-scale platforms such as Kubernetes or Mesos.
Hands-on experience with at least one programming language, preferably Python.
A degree (or equivalent) in Computer Science or a related field.
Benefits
We offer a competitive base salary of SGD 250-275k, along with:
Performance-based bonuses.
Generous benefits, including medical insurance and gym membership.
A collaborative and friendly environment with smart, highly engaged colleagues.
Relaxed, dress-down office culture, with breakfast, lunch, and snacks provided.