Position Summary
Serve as a Staff DevOps Engineer specializing in AWS and Kubernetes to design, implement, and optimize scalable, secure cloud-native infrastructure. Lead PoC initiatives, oversee monitoring solutions, and translate SOX compliance into actionable cloud implementation plans. Build a comprehensive team knowledge base, provide technical leadership for cloud migration/security/DevOps best practices, and drive operational excellence.
Job Responsibilities
- Lead design, implementation, and management of Kubernetes clusters on AWS EKS, ensuring high availability, scalability, and security (autoscaling, monitoring, logging, security policies).
- Spearhead proof-of-concept (PoC) initiatives for new tools/environments.
- Manage Kubernetes cluster lifecycle: upgrades, patch management, version control, and performance optimization.
- Support teams deploying/optimizing applications on Kubernetes (container orchestration and service mesh).
- Design and implement monitoring/alerting using CloudWatch, Prometheus, and Datadog.
- Create observability standards/dashboards using AI/AIOps and SRE agents for anomaly detection, alert noise reduction, and automated root cause analysis.
- Develop/maintain Infrastructure as Code (Terraform or AWS CDK) and implement CI/CD pipelines for deployment and image management.
- Design/implement security solutions and translate SOX compliance into actionable cloud implementation plans.
- Lead cloud migration and modernization of legacy applications with cross-functional teams.
- Mentor junior engineers; lead knowledge-sharing initiatives and maintain documentation/runbooks.
- Stay current with emerging AWS services; optimize cost-efficiency.
- Proactively identify inefficiencies and drive process improvement.
- Participate in on-call rotations to respond to critical/emergency issues.
Required Qualifications
- Bachelor's degree in Computer Science/IT or related field.
- 7+ years in DevOps or Site Reliability Engineering.
- 5+ years of hands-on experience with AWS services and cloud architecture.
- 5+ years of hands-on experience with Kubernetes (cluster management, troubleshooting, optimization).
- Proficiency in at least one language (Python, Go, Java).
- Strong Infrastructure as Code experience (Terraform, CloudFormation, AWS CDK).
- Deep containerization/orchestration knowledge (Docker, Kubernetes) including security best practices.
- CI/CD experience (GitLab CI, GitHub Actions).
- Strong networking knowledge in cloud environments.
- Excellent problem-solving and troubleshooting skills.
- Proven PoC leadership.
- Experience creating/maintaining technical documentation/knowledge bases.
- Ability to identify operational inefficiencies and improve processes.
- Strong analytical, communication, and mentoring skills.
- Proficient in Microsoft Office (Word, Excel, Outlook) and basic business internet use.