Overview

We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to join our team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our systems and infrastructure. You will work closely with development, operations, and product teams to implement robust solutions that improve the user experience and drive operational excellence.
________________________________________
Key Responsibilities

• System Design & Architecture:
  o Design, build, and maintain scalable and resilient infrastructure systems.
  o Collaborate with cross-functional teams to integrate SRE best practices into software development and deployment.
• Automation & Monitoring:
  o Develop and implement automation tools to streamline operations, deployment, and system monitoring.
  o Implement robust monitoring, logging, and alerting systems to detect and respond to system anomalies.
• Incident Management:
  o Lead incident response efforts, perform root cause analysis, and implement preventive measures.
  o Participate in on-call rotations to ensure prompt resolution of production issues.
• Performance & Capacity Planning:
  o Optimize system performance through proactive analysis and capacity planning.
  o Meet and maintain service level objectives (SLOs) and service level agreements (SLAs).
• Continuous Improvement:
  o Drive initiatives for continuous improvement in reliability, performance, and operational efficiency.
  o Advocate for and implement best practices in system reliability, security, and efficiency.
________________________________________
Required Qualifications

• Experience:
  o 3+ years of experience in a Site Reliability Engineering, DevOps, or related operational role.
• Technical Skills:
  o Proficiency with cloud platforms such as Azure or AWS.
  o Strong experience with containerization and orchestration tools (e.g., Docker, Kubernetes).
  o Expertise in automation and scripting languages (e.g., Python, Bash).
  o Solid understanding of networking, operating systems (Linux/Unix), and database management.
• Operational Excellence:
  o Experience with CI/CD pipelines, Infrastructure as Code (e.g., Terraform, Ansible), and monitoring tools (e.g., Dynatrace, Prometheus, Grafana).
  o Proven track record in incident management and implementing robust disaster recovery strategies.
• Soft Skills:
  o Excellent problem-solving skills and attention to detail.
  o Strong communication and collaboration abilities.
  o Ability to thrive in a fast-paced, dynamic environment.
________________________________________
Preferred Qualifications

• Advanced degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
• Certifications in cloud platforms (e.g., Microsoft Certified: Azure Solutions Architect Expert, AWS Certified Solutions Architect).
• Experience working in a microservices-based architecture.
• Familiarity with security best practices and compliance requirements.