Job Description
- Oversee daily operations of the data platform, including system monitoring, troubleshooting, performance optimization, and disaster recovery management.
- Participate in requirement reviews, version releases, and deployment processes for the data platform.
- Develop and maintain automated deployment tools, monitoring/alerting systems, and operational scripts to improve efficiency.
- Prepare technical documentation, operation manuals, and maintenance reports so that system operations are traceable.
- Establish and implement operational standards, monitoring strategies, and emergency plans to enhance system reliability.
- Perform performance tuning, root cause analysis, and issue resolution for data clusters to ensure timely and accurate data processing.

Qualifications
- Technical Skills:
  - Proficient in Java or Python, with strong Shell scripting skills.
  - Advanced Linux system administration and Shell scripting.
  - Hands-on experience with Hadoop, Flink, YARN, Hive, etc. Familiarity with DolphinScheduler, Apache Doris, Superset, or Tableau is a plus.
  - Expertise in big data platform deployment, tuning, and automation tool development.
- Soft Skills:
  - Strong documentation and communication skills.
  - Team-oriented mindset with resilience and problem-solving capabilities.
  - Passion for learning and adapting to emerging technologies.
- Bachelor's degree or higher in Computer Science or a related field.
- 5+ years of IT work experience, including at least 2 years in operations roles.
- Proficient in Java/Python/Shell programming.
- Familiar with Linux OS and Shell scripting.
- Solid understanding of SQL and database management.
- Familiarity with big data components (e.g., Hadoop, Spark, Hive, Kafka).
- Hands-on experience with monitoring tools (e.g., Prometheus, Grafana, Zabbix) and log analysis systems (e.g., ELK Stack).