Job Responsibilities:
1. Responsible for data processing, analysis and development work in Python and SQL, building efficient and stable data processing workflows to meet the company's various business data requirements, including extract, transform and load (ETL/ELT) operations.
2. Proficient in Apache Spark (SQL) and Apache Flink (SQL), with extensive experience applying them to ETL/ELT tasks; capable of large-scale distributed data processing and real-time data stream transformation to ensure data accuracy and timeliness (see the batch-ETL sketch after this list).
3. Responsible for relational databases such as MySQL and PostgreSQL, including database design, optimization and performance tuning; able to design sound database architectures for business needs, optimize query statements and improve database performance so that systems remain stable under high concurrency and large data volumes.
4. Skilled in operating the Aliyun (Alibaba Cloud) platform, using its services and tools for data storage, computing and deployment; keeps the data platform secure, reliable and efficient in the cloud environment, and configures and manages cloud resources flexibly according to different business needs.
5. Practical experience using Databricks (Delta Lake) to manage large-scale data lakes; responsible for data lake architecture design, data organization, storage optimization and data security, enabling efficient data storage, access and analysis in support of enterprise data asset management and utilization.
6. Experienced in real-time streaming ETL and CDC (Change Data Capture); capable of real-time data processing and synchronization to keep data consistent across systems and provide timely insight for real-time decision-making (see the CDC sketch after this list).
7. Extensive experience building data engineering projects on a variety of infrastructures; proficient in optimizing, maintaining and upgrading existing data projects to improve performance, scalability and stability as the company's business and data needs evolve.
8. Responsible for designing, implementing and optimizing Kafka-based data pipelines for high-throughput, low-latency event streaming; proficient in the Kafka Streams API for real-time data transformation and processing, enabling seamless integration with downstream systems such as Spark or Flink.
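As a rough illustration of the Spark SQL ETL and Delta Lake work described in items 2 and 5 above, the sketch below reads a source table, aggregates it with Spark SQL and writes the result to a Delta table. It is a minimal sketch only: the table names (raw.orders, analytics.orders_daily), column names and filter are hypothetical, and it assumes Delta Lake is available in the Spark session (as it is on Databricks).

```python
# Minimal PySpark batch ETL sketch: extract -> transform (Spark SQL) -> load to Delta.
# All table and column names are illustrative placeholders, not part of this posting.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("orders_daily_etl")  # hypothetical job name
    .getOrCreate()
)

# Extract: expose the raw metastore table as a temporary view for SQL.
spark.read.table("raw.orders").createOrReplaceTempView("orders")

# Transform: aggregate with plain Spark SQL.
daily = spark.sql("""
    SELECT order_date,
           country,
           COUNT(*)    AS order_cnt,
           SUM(amount) AS gross_amount
    FROM orders
    WHERE order_status = 'COMPLETED'
    GROUP BY order_date, country
""")

# Load: overwrite the target Delta table.
(
    daily.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("analytics.orders_daily")
)
```

Registering the source as a temporary view keeps the transformation in plain SQL, which matches the Spark (SQL) emphasis in this role.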
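Similarly, the streaming ETL / CDC work in item 6 could look roughly like the PyFlink sketch below, which declares a MySQL CDC source and an upsert-kafka sink in Flink SQL and streams change events between them. Host, credentials, database, table and topic names are all hypothetical, and the sketch assumes the flink-sql-connector-mysql-cdc and Kafka connector jars are already on the Flink classpath.

```python
# Hedged PyFlink sketch: MySQL CDC source -> upsert-kafka sink, expressed in Flink SQL.
# Connection details, table names and the topic are illustrative assumptions only.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: capture row-level changes from a MySQL table via the mysql-cdc connector.
t_env.execute_sql("""
    CREATE TABLE orders_src (
        id BIGINT,
        amount DECIMAL(10, 2),
        order_status STRING,
        PRIMARY KEY (id) NOT ENFORCED
    ) WITH (
        'connector' = 'mysql-cdc',
        'hostname' = 'localhost',
        'port' = '3306',
        'username' = 'flink',
        'password' = 'example-password',
        'database-name' = 'shop',
        'table-name' = 'orders'
    )
""")

# Sink: publish the change stream as upserts to a Kafka topic.
t_env.execute_sql("""
    CREATE TABLE orders_sink (
        id BIGINT,
        amount DECIMAL(10, 2),
        order_status STRING,
        PRIMARY KEY (id) NOT ENFORCED
    ) WITH (
        'connector' = 'upsert-kafka',
        'topic' = 'orders_changes',
        'properties.bootstrap.servers' = 'localhost:9092',
        'key.format' = 'json',
        'value.format' = 'json'
    )
""")

# Continuously synchronize source changes into the sink.
t_env.execute_sql("INSERT INTO orders_sink SELECT * FROM orders_src")
```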
Job Requirements:
1. Proficiency in Python and SQL.
2. Familiarity with ETL/ELT using Apache Spark (SQL) and Apache Flink (SQL) (both required).
3. Proficiency with relational databases (e.g., MySQL, PostgreSQL); experience in database design, optimization and performance tuning.
4. Hands-on experience with the Aliyun (Alibaba Cloud) platform.
5. Hands-on experience with Databricks (Delta) and Apache Paimon for managing large-scale data lakes (both required).
6. Real-time streaming ETL and CDC experience.
7. Proficiency in building data engineering projects on various infrastructures, with the ability to optimize and maintain them.
8. Hands-on experience with Kafka architecture (topics, partitions, brokers, consumer groups) and tuning configurations (e.g., acks, linger.ms, compression.type); a hedged producer-configuration sketch follows this list.
9. 3+ years of related working experience.
10. Fluent in English reading and writing.
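To make requirement 8 concrete, here is a small sketch of the producer settings it names (acks, linger.ms, compression.type) using the confluent-kafka Python client. The broker address, topic and chosen values are illustrative assumptions, not values this posting prescribes.

```python
# Hedged sketch of a throughput-oriented Kafka producer configuration.
# Broker address, topic and the specific values are illustrative assumptions.
from confluent_kafka import Producer

conf = {
    "bootstrap.servers": "localhost:9092",  # hypothetical broker
    "acks": "all",              # wait for all in-sync replicas (durability over latency)
    "linger.ms": 20,            # batch messages for up to 20 ms to improve throughput
    "compression.type": "lz4",  # compress batches to cut network and storage cost
}

producer = Producer(conf)

def delivery_report(err, msg):
    """Log per-message delivery success or failure."""
    if err is not None:
        print(f"delivery failed: {err}")
    else:
        print(f"delivered to {msg.topic()} [{msg.partition()}]")

# Produce one sample event and flush pending batches before exiting.
producer.produce("events", key="user-1", value=b'{"action":"login"}',
                 on_delivery=delivery_report)
producer.flush()
```

acks=all favors durability over latency, while linger.ms and compression.type trade a small per-batch delay for higher throughput and lower network cost; real values should come from load testing against the actual pipeline.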