Job Description:
You will be responsible for managing, securing, and scaling our ETL infrastructure in AWS and GCP. You will also provide development support for various backend systems such as our CMS/API and ETL data pipeline.
An effective Data Engineer communicates clearly with the Product & Engineering team while developing and maintaining the ETL pipelines that support our data and analytics products. The ideal candidate can prioritize tasks and complete them on time.
Key responsibilities
- Collaborate with our engineers, researchers, executives, and designers to develop new data capabilities using SQL, database systems such as PostgreSQL, and data warehousing solutions within AWS and GCP.
- Transform unstructured data into clean, structured data.
- Collaborate with our DevOps and Research teams to use ETL systems that help users analyze data relevant to specific business problems.
- Use data APIs to enable data scientists and analysts to query key data.
- Maintain the security and health of our data pipelines.
- Conduct code reviews and engage in pair programming as part of an Agile development lifecycle.
- Contribute to CMS development and other backend systems using PHP and/or Python.
- Work with Amazon MWAA (Managed Workflows for Apache Airflow) and Redash.
Qualifications
- 7+ years of experience in data pipeline engineering.
- Familiarity with programming languages such as Python and Scala.
- Knowledge of source control software such as Git.
- Working knowledge of Redis, Redshift, and BigQuery.
- Experience with business intelligence products such as Google Data Studio, Looker, Domo, or Tableau.
- Experience using development environments such as Docker, Vagrant, VirtualBox, or similar platforms.
- Knowledge of algorithms and data structures that support data filtering and optimization.
- Strong familiarity with Unix/Linux operating systems and the command line.
- Knowledge of digital assets, crypto, and blockchain technologies.
Nice to have
- Experience with CI/CD tools that support the development lifecycle.
- Experience collaborating with engineers to build and maintain the infrastructure required for feature development.
- Experience providing development support for a CMS and other backend systems.
- Familiarity with managing data pipelines on AWS or other cloud providers (Azure, GCP).
- Familiarity with the AWS infrastructure suite (ECS, ElastiCache, RDS, ALB, EC2, IAM, CloudSearch) or other distributed systems.
- Familiarity with managing resources using infrastructure-as-code (IaC) tools such as Terraform, Ansible, Puppet, or Chef.
- Knowledge of building Docker images.
- Industry-standard certifications.
- Experience with the Atlassian suite.
- Proficiency in scripting.