As a member of the Site Reliability Engineering team, you will work on large-scale system design and troubleshooting, and you will be fluent in systems programming and/or automation. You will want to tackle the complex problems of scale that are unique to Tokopedia.
* Design, write and deliver software to improve the availability, scalability, latency, and efficiency of Tokopedia's services.
* Solve problems related to mission-critical services and build automation to prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions.
* Influence and create new designs, architectures, standards and methods for large-scale distributed systems.
* Engage in service capacity planning and demand forecasting, software performance analysis and system tuning.
* Perform periodic on-call duties using a follow-the-sun model.
* Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience.
* Experience in one or more of C, C++, Java, Perl, Python, or Go, or scripting experience in Shell and Perl.
* Experience working with Unix/Linux systems from kernel to shell and beyond, with experience working with system libraries, file systems, and client-server protocols.
* Networking: experience with network theory and protocols (e.g., TCP/IP, UDP, ICMP), MAC addresses, IP packets, DNS, OSI layers, and load balancing.