About the Role
The Site Reliability Engineering (SRE) team architects, builds, and maintains the rock-solid infrastructure that applications rely on. We work closely with development teams to ensure scalability, reliability, and efficiency. This collaboration empowers us to deliver exceptional customer experiences while enabling developers to focus on building great features.
What You Will Do:
- Deploy, automate, maintain, and manage cloud-based and on-premises production systems.
- Understand our architecture at a high level and systematically document new and existing requirements so that projects are delivered smoothly and without miscommunication.
- Work closely with the information security and infrastructure teams to ensure that we adopt security best practices.
- Ensure the availability, performance, scalability, and security of production systems.
- Troubleshoot and resolve system issues across platform and application domains.
- Suggest architectural improvements and recommend process optimizations.
- Evaluate new technologies to enhance the infrastructure stack.
- Ensure that violations of system security policies are properly remediated.
- Drive and implement automated provisioning and scaling of servers, along with testing and compliance checks using automation tools.
- Handle operational tasks, including on-call duties, alerts, and incident management.
What We Are Looking For:
- At least 2 years of engineering experience.
- Bachelor’s or Master’s degree in a relevant field (e.g., IT, Computer Science) or a proven track record in DevOps.
- A strong willingness to continuously upgrade your skills and stay up to date with the latest DevOps trends.
- Experience with cloud-native tools (e.g., Kubernetes, Docker, Nginx, OpenTelemetry) is a plus.
- Experience managing cloud servers (AWS, GCP).
- A desire to transition into engineering management is a plus.
- Experience with on-premises physical servers, databases, and storage solutions (MySQL, PostgreSQL, Redis) is a plus.
- Familiarity with Infrastructure as Code (IaC) tools (Terraform, Pulumi) is a plus.