Summarized Accountability: This role provides leadership in extending the enterprise data platform to support the Rich Products business in mainland China. Beyond supporting a reliable and performant data platform, the role is responsible for building highly available data pipelines that enable deeper analysis and reporting, following industry best practices. The senior data engineer ensures that best practices and standards are defined and established, and prioritizes and monitors both performance and cost management for the mainland China instance of the enterprise data platform.
Key Responsibilities:
- Overseeing junior and mid-level data engineering activities
- Supporting the building, monitoring, and optimization of data pipelines
- Driving best practices and standards for the Enterprise Data Lake(house)
- Designing, deploying, and maintaining the data platform within the market
- Collaborating with the enterprise data and analytics teams to develop alternative solutions when underlying tech stack capabilities differ
- Developing and monitoring data quality metrics to support remediation and governance improvements (a minimal illustrative sketch follows this list)
- Ensuring enterprise data is cataloged and documented to support analytic workloads
- Enabling the platform to support data integration, BI, and advanced AI and ML workloads
- Implementing and maintaining best practices in DataOps, MLOps, and LLMOps to ensure operational efficiency
- Ensuring adherence to data governance and compliance with local Chinese regulations and international data standards
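As a minimal sketch of what developing and monitoring data quality metrics can look like in practice, the PySpark snippet below computes row counts and null rates for a hypothetical orders table and appends a timestamped snapshot to a metrics table. The table names, columns, and schema are illustrative assumptions, not specifics of the Rich Products platform.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_metrics").getOrCreate()

# Hypothetical source table; substitute the catalog/schema used in your environment.
orders = spark.table("sales.orders")

# Simple completeness metrics: total rows plus null rates for key columns.
metrics = orders.agg(
    F.count("*").alias("row_count"),
    (F.sum(F.col("customer_id").isNull().cast("int")) / F.count("*")).alias("customer_id_null_rate"),
    (F.sum(F.col("order_date").isNull().cast("int")) / F.count("*")).alias("order_date_null_rate"),
)

# Append a timestamped snapshot so the metrics can be trended and alerted on.
(metrics
 .withColumn("captured_at", F.current_timestamp())
 .write.mode("append")
 .saveAsTable("monitoring.dq_metrics"))
```

Metrics captured this way can be trended over time and wired into alerting to drive remediation and governance improvements.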
Technical Skills:
- Strong leadership skills; results-driven and a strategic problem solver
- Hands-on and well-versed in Python/PySpark coding and scripting
- Strong expertise and hands-on experience with Azure Data Factory and Azure Databricks
- Experience with Agile methodology and Azure DevOps
- Databricks Lakehouse architecture experience strongly preferred
- Strong expertise in data modeling and design
- Understanding of the process and importance of building data quality rules and expectations within data pipeline jobs and workflows (a minimal illustrative sketch follows this list)
- Familiarity with software and system engineering design principles and standards (e.g., TDD)
- Foundational knowledge of data management architectures such as Data Warehouse, Data Lake, and Data Hub, and of supporting processes such as Data Integration, Governance, and Metadata Management
- Ability to design, build, and manage data pipelines for data structures encompassing data transformation, data models, data quality and observability, schemas, metadata, and job management
- Experience with data integration technologies such as ETL/ELT, data replication/CDC, message-oriented data movement, API design and access, stream data integration, and data virtualization
- Basic understanding of machine learning algorithms and approaches
- Knowledge of and proficiency with one or more popular languages and frameworks such as SQL, Python, Scala, Apache Spark, and command-line tooling
- Ability to automate development via CI/CD patterns and processes
- Experience with version control and repository management systems (Git experience preferred)
- Experience with IDEs and source code editors (e.g., Visual Studio Code)
- Ability to develop and monitor performance metrics and data SLAs
- Ability to oversee and optimize costs associated with platform operations and usage
- Ability to formulate and maintain disaster recovery and business continuity strategies
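As an illustration of embedding data quality rules and expectations inside a pipeline job, the sketch below applies simple row-level expectations in PySpark and routes failing records to a quarantine table before the silver-layer write. The Delta format assumes a Databricks/Lakehouse runtime, and all table names and rules are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("silver_orders").getOrCreate()

# Hypothetical bronze table; names and rules are illustrative only.
bronze = spark.table("bronze.orders_raw")

# Declare row-level expectations as a single boolean column.
checked = bronze.withColumn(
    "is_valid",
    F.col("order_id").isNotNull()
    & F.col("order_date").isNotNull()
    & F.col("quantity").between(1, 10000),
)

valid = checked.filter(F.col("is_valid")).drop("is_valid")
rejected = checked.filter(~F.col("is_valid")).drop("is_valid")

# Promote clean records to the silver layer and quarantine the rest for remediation.
valid.write.format("delta").mode("append").saveAsTable("silver.orders")
rejected.write.format("delta").mode("append").saveAsTable("quarantine.orders_rejected")
```

Quarantining rather than silently dropping records keeps failures available for remediation and keeps the quality rules visible within the job itself.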
Education:
- A bachelor's or master's degree in computer science, statistics, data management, information systems, or a related quantitative field is preferred.