SRE Engineer, Nanjing: job posting from 南京爱普瑞斯软件有限公司 (via 51米多多招聘网)

Overview
We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to join our dynamic team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our systems and infrastructure. You will work closely with development, operations, and product teams to implement robust solutions that enhance the overall user experience and drive operational excellence.
________________________________________
Key Responsibilities
• System Design & Architecture:
o Design, build, and maintain scalable and resilient infrastructure systems.
o Collaborate with cross-functional teams to integrate SRE best practices into software development and deployment.
• Automation & Monitoring:
o Develop and implement automation tools to streamline operations, deployment, and system monitoring.
o Implement robust monitoring, logging, and alerting systems to detect and respond to system anomalies.
• Incident Management:
o Lead incident response efforts, perform root cause analysis, and implement preventive measures.
o Participate in on-call rotations to ensure prompt resolution of production issues.
• Performance & Capacity Planning:
o Optimize system performance through proactive analysis and capacity planning.
o Meet and maintain service level objectives (SLOs) and service level agreements (SLAs).
• Continuous Improvement:
o Drive initiatives for continuous improvement in reliability, performance, and operational efficiency.
o Advocate for and implement best practices in system reliability, security, and efficiency.
________________________________________
Required Qualifications
• Experience:
o 3+ years of experience in Site Reliability Engineering, DevOps, or a related operations role.
• Technical Skills:
o Proficiency with cloud platforms such as Azure or AWS.
o Strong experience with containerization and orchestration tools (e.g., Docker, Kubernetes).
o Expertise in automation and scripting languages (e.g., Python, Bash).
o Solid understanding of networking, operating systems (Linux/Unix), and database management.
• Operational Excellence:
o Experience with CI/CD pipelines, Infrastructure as Code (e.g., Terraform, Ansible), and monitoring tools (e.g., Dynatrace, Prometheus, Grafana).
o Proven track record in incident management and implementing robust disaster recovery strategies.
• Soft Skills:
o Excellent problem-solving skills and attention to detail.
o Strong communication and collaboration abilities.
o Ability to thrive in a fast-paced, dynamic environment.
________________________________________
Preferred Qualifications
• Advanced degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
• Certifications in cloud platforms (e.g., Microsoft Certified: Azure Solutions Architect Expert, AWS Certified Solutions Architect).
• Experience working in a microservices-based architecture.
• Familiarity with security best practices and compliance requirements.