Job Title – Site Reliability Engineering Lead
What Skills & Experience You Should Bring
● Education: Bachelor’s or Master’s degree in computer engineering, computer science, or a related field.
● Experience: 7+ years in Site Reliability Engineering, DevOps, or cloud infrastructure roles, with at least 2 years in a leadership or mentoring capacity.
● Deep AWS expertise (EC2, S3, RDS, IAM, VPC, Lambda, CloudFormation/Terraform, etc.).
● Strong knowledge of Infrastructure-as-Code (IaC) using Terraform, AWS CDK, or CloudFormation.
● Proven experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, or similar).
● Proficiency in containerization and orchestration (Docker, Kubernetes, ECS, or EKS).
● Expertise in monitoring and observability tools (Datadog, New Relic, Prometheus, Grafana, ELK, CloudWatch, etc.).
● Strong scripting or programming background (Python, Bash, or Go).
● Sound understanding of networking, security, and identity/access management in the cloud.
● Experience designing high-availability and disaster recovery strategies for critical workloads.
● Excellent communication, problem-solving, and leadership skills with the ability to influence across teams.
Desired Skills
● AWS or other cloud certification (Solutions Architect, DevOps Engineer, etc.).
● Experience with AIOps, serverless architectures, and event-driven systems.
● Familiarity with FinOps practices and cost optimization frameworks.
● Experience with SaaS monitoring and alerting tools (Datadog, New Relic, Sumo Logic, PagerDuty).
● Exposure to Atlassian tools (Jira, Confluence, Bitbucket).
● Experience with SQL/NoSQL databases.
● Proven track record of leading cross-functional reliability initiatives or platform-wide automation projects.