As a Data Engineer for our advanced analytics platforms, your main responsibilities will be to:
- Design and implement patterns of practice for productized, portable, modular, instrumented, CI/CD-automated and highly performant data ingestion pipelines that leverage structured streaming techniques, processing both batch and streamed data in unstructured, semi-structured and structured form, using Apache Spark, Delta Lake, Delta Engine, Hive and other relevant tech stacks (see the sketch after this list)
- Ensure that data ingestion pipelines built with these patterns validate and profile inbound data reliably, identify anomalous data and trigger appropriate remediation actions by operations staff when needed
- Filter, consolidate and contextualize ingested data, and further aggregate it according to data analytics and ML requirements
- Use agile development practices, and continually improve development methods with the goal of automating the build, integration, deployment and monitoring of ingestion, enrichment and ML pipelines
- Using your expertise and influence, help establish patterns of practice for the above, and encourage their adoption by software and data engineering teams across the company
- Monitor and optimize the performance of data pipelines to ensure efficient data processing and storage
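For illustration only, here is a minimal sketch of the kind of Structured Streaming ingestion step described above, written in PySpark against Delta Lake. The schema, storage paths and table layout are hypothetical placeholders, not a description of our actual environment.

```python
# Minimal sketch of a Spark Structured Streaming ingestion step writing to Delta Lake.
# All paths, the schema and the app name below are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("ingest-events").getOrCreate()

# Declare an explicit schema so malformed records surface instead of being silently inferred.
schema = StructType([
    StructField("event_id", StringType(), False),
    StructField("event_type", StringType(), True),
    StructField("event_time", TimestampType(), True),
    StructField("payload", StringType(), True),
])

# Read semi-structured JSON as a stream from cloud storage (bucket path is hypothetical).
events = (
    spark.readStream
    .schema(schema)
    .json("s3://example-bucket/raw/events/")
)

# Append to a Delta table with checkpointing so the pipeline can restart safely.
query = (
    events.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
    .start("s3://example-bucket/delta/events/")
)
```

The checkpoint location is what makes restarts and reliable delivery to the Delta table straightforward, which supports the validation and remediation goals listed above.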
Education, Experience, and Licensing:
- Advanced degree in computer science preferred; at a minimum, a bachelor's degree and 5 years of programming proficiency in at least one modern programming language (e.g. Java, C#, JavaScript) and at least one other high-level language such as Python or SQL
- Expert-level proficiency with agile software development and continuous integration / continuous deployment (CI/CD) methodologies, along with supporting tools such as Git (GitLab) and Terraform
- 5+ years of experience in big data engineering roles, developing and maintaining ETL and ELT pipelines for data warehousing and for on-premises and cloud data lake environments
- 3+ years of experience with AWS platform services, including S3, EC2, Database Migration Service (DMS), RDS, EMR, Redshift, Lambda, DynamoDB, CloudWatch and CloudTrail
- Strong technical collaboration and communication skills
Technical Qualifications:
- Proficiency with functional programming methods and their appropriate use in distributed systems
- Expert proficiency with data management fundamentals and data storage principles
- Expert proficiency with foundational AWS services, including S3 and EC2, ECS and EKS, IAM and CloudWatch
- Experience with Databricks, including the use of Databricks notebooks, clusters, and jobs (a minimal job-definition sketch follows this list)
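As a rough illustration of the Databricks jobs item above, here is a hedged sketch of creating a single-task notebook job through the Databricks Jobs API 2.1 from Python. The workspace URL, token handling, notebook path and cluster sizing are assumptions for illustration only.

```python
# Hypothetical sketch: create a one-task Databricks job via the Jobs API 2.1.
# The workspace URL, token, notebook path and cluster settings are placeholders.
import os
import requests

workspace_url = os.environ["DATABRICKS_HOST"]   # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]          # personal access token (placeholder)

job_spec = {
    "name": "nightly-events-ingest",
    "tasks": [
        {
            "task_key": "ingest_events",
            "notebook_task": {"notebook_path": "/Repos/data-eng/pipelines/ingest_events"},
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "m5.xlarge",
                "num_workers": 2,
            },
        }
    ],
}

# Submit the job definition; the response includes the new job's ID.
resp = requests.post(
    f"{workspace_url}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```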
Other Qualifications:
- Demonstrated curiosity and an ability to learn new skills on an ongoing, sustained basis
- A demonstrated systems perspective when analyzing problems: thinking about overall operation, failure modes and how to address these problems proactively
- A strong sense of the importance of documentation, and of not having to learn things twice
- Ability to work in an agile product team environment and balance a diverse set of stakeholder requests
- Excellent oral and written communication skills, with an ability to break down complex technical systems to help business partners understand the value
- Strong technical collaboration and communication skills, as well as the ability to drive cultural change and adoption of best practices through community participation
- Ability to collaborate with other teams across the company, defining technology roadmaps and sharing experiences and lessons learned for continual improvement
- Excellent problem-solving and troubleshooting skills
- Process-oriented with great documentation skills
- Experience with data visualization tools and techniques
- Familiarity with machine learning frameworks and libraries