As a member of the Site Reliability Engineering team, you will work on large-scale
system design and troubleshooting, and be fluent in systems programming and/or
automation. You will have a desire to tackle the complex problems of scale that
are unique to Tokopedia.
* Design, write and deliver software to improve the availability, scalability,
latency, and efficiency of Tokopedia's services.
* Solve problems related to mission-critical services and build automation to
prevent problem recurrence, with the goal of automating response to all
non-exceptional service conditions.
* Influence and create new designs, architectures, standards and methods for
large-scale distributed systems.
* Engage in service capacity planning and demand forecasting, software
performance analysis and system tuning.
* Conduct periodic on-call duties using a follow-the-sun model.
* Bachelor's degree in Computer Science or related technical field, or
equivalent practical experience.
* Experience in one or more of C, C++, Java, Perl, Python, or Go, or scripting
experience in Shell.
* Experience working with Unix/Linux systems from kernel to shell and beyond,
with experience working with system libraries, file systems, and
* Networking: experience with network theory (e.g., TCP/IP, UDP, ICMP), MAC
addresses, IP packets, DNS, OSI layers, and load balancing.