Placement: Jakarta, Bandung, Yogyakarta (relocation provided)
* Design, develop, and maintain our data infrastructure.
* Optimize and modify data flows and pipelines to handle the 3Vs of big data (Volume, Velocity, Variety).
* Develop custom ETL jobs to cater to specific requirements.
* Coordinate with other departments (Commercial, Marketing, etc.) to fulfil or adapt to their data requirements and requests.
* Make sure the end users of the data (Analysts, Data Scientists, etc.) can query it seamlessly for their use cases.
* Explore and learn new technologies that can complement or replace parts of our current stack to improve it.
* Background in server-side software development in a Linux environment (front-end skills are a plus).
* A degree in Computer Science/Engineering/Mathematics is a good start, but not a must.
* Not scared of reading technical documentation or source code.
* Programming language:
  * Python
* Relevant experience:
  * Google Cloud Platform data infrastructure (BigQuery, Dataproc, Dataflow)
  * Hadoop (HDFS, MapReduce, YARN)
  * Hadoop file formats & compression (Parquet, ORC, Snappy, gzip)
  * SQL on Hadoop (Hive, Spark SQL, Impala)
  * NoSQL (Bigtable, HBase, Cassandra)
  * RDBMS (MySQL)
  * Distributed processing engines (Spark, Flink)
  * Data ingestion & message processing (RabbitMQ, ActiveMQ, ZeroMQ, Kafka, Flume)
  * Stream processing (Spark Streaming, Storm)