Site Reliability Engineer (Marketplace) - Sea Labs

Sea
Singapore
SGD 100,000 - 125,000
Job description

The Engineering and Technology team is at the core of the Shopee platform development. The team is made up of a group of passionate engineers from all over the world, striving to build the best systems with the most suitable technologies. Our engineers do not merely solve problems at hand; we build foundations for a long-lasting future. We don't limit ourselves on what we can or can't do; we take matters into our own hands even if it means drilling down to the bottom layer of the computing platform. Shopee's hyper-growing business scale has transformed most "innocent" problems into huge technical challenges, and there is no better place to experience it first-hand if you love technologies as much as we do.

About the Team

The mission of the SRE (Site Reliability Engineering) team is to keep Shopee running efficiently and sustainably 24x7, and to build and maintain large-scale, highly available, high-performance distributed systems. The team combines traditional software engineering with technical operations. SRE engineers dive deep into Shopee's development lines to ensure that the system remains highly scalable as it rapidly evolves.

From the perspective of stability and performance, the team's scope covers the design of business applications, basic platform components (middleware, container scheduling, caching, object storage, etc.), OS optimization, and data center and network optimization. We use engineering and service-oriented approaches to streamline the inefficient, complicated operations of the traditional operations and maintenance model, and we are committed to building a sound monitoring system that makes incident handling more efficient.

Job Description

  1. Deep dive into development lines, learn and understand the mechanism of every application component, and promote product scalability, stability, and performance.
  2. Set up, manage, and maintain Shopee product/middleware/big-data applications and services.
  3. Perform regular and ad-hoc server-side deployments, improve performance, and troubleshoot issues.
  4. Design and develop automated technical operation platforms.
  5. Manage capacity and resources.
  6. Own full-chain stress testing to improve application performance and remove redundancy.
  7. Prepare routine operation documentation.

Qualifications

Education: Bachelor's degree or above in Computer Science, Engineering, Information Systems, or related fields.

Experience: More than 2 years of relevant experience (candidates with no working experience are welcome to apply).

Technical Skills:

  • Extensive, hands-on knowledge of Linux operating systems (Ubuntu, CentOS, etc.).
  • Highly familiar with computer networks (TCP/IP, DNS, etc.), computer organization, and operating systems.
  • Hands-on experience with at least one programming language: Bash, Python, Go.
  • Strong analytical and problem-solving skills with the ability to thrive in a dynamic work environment.
  • Passionate, with a strong sense of responsibility.
  • Fast learner and a good team player.
  • Agile and detail-oriented.

Optional but Preferred Skills:

  • Experience with automation tools such as Ansible or SaltStack.
  • Experience with monitoring tools like Prometheus, Zabbix, Grafana, etc.
  • Experience with load balancing tools like LVS, Nginx, OpenResty, or HAProxy.
  • Experience with container technology such as Docker or Kubernetes.
  • Experience with High Availability system design and Server Deployment Process.
  • Experience with SRE.
  • Experience with Ops PaaS platform or Ops automation platform (e.g., CMDB).