Sr. Director, Global Site Reliability Engineering

Cboe Digital • Full-time • Lenexa, KS • $174.25k - $215.25k / year • 1m ago

Job Description:

The Sr. Director of Global Site Reliability Engineering (“SRE”) will be responsible for overseeing the global reliability, scalability, and performance of Cboe’s critical infrastructure and services. This role will lead a team of engineers across multiple regions including APAC, North America, and Europe, driving operational excellence and ensuring that our systems are designed, built, and operated with a focus on resilience, availability, and cost efficiency. The Cboe SRE team is a highly skilled technical unit responsible for platform engineering, configuration management, implementation, capacity planning, performance tuning and analysis, troubleshooting, and process automation.

The Sr. Director of Global SRE is a thought leader who influences how Cboe thinks about engineering to define future directions around process, architecture, automation, and quality. They will drive consistent solutions and processes across global teams, providing mentorship within and outside of the SRE team.

The Sr. Director of Global SRE will maximize efficiencies across the SRE functional unit, lead significant projects, drive consensus, and implement or migrate to new technologies or solutions. They will collaborate closely with operations, engineering, business, and global leaders to define and implement SRE best practices and strategies that align with the company’s growth objectives.

Responsibilities:

Global Leadership & Strategy:

Develop and execute the global SRE strategy, ensuring alignment with business goals and technology roadmaps across all regions.

Operational Excellence:

Drive initiatives that enhance the availability, scalability, and performance of global infrastructure and services, minimizing downtime and service disruptions.
Maximize efficient use of resources to meet Cboe’s business objectives.
Define and enhance policies and procedures that are supported by the SRE team (e.g., Capacity Planning Policies & Procedures, Change Management Policies & Procedures, Disaster Recovery Plans, …).

Cross-Functional Collaboration:

Work closely with software engineering, infrastructure, operations, security, and business teams to design and implement reliable and secure services.

Team Development:

Build, mentor, and lead a high-performing, globally distributed SRE team of 30+ associates, fostering a culture of collaboration, continuous improvement, and technical innovation.
Develop a strong global leadership team while maintaining a keen eye on succession planning.

Incident Response and Learning Reviews:

Lead the operational response and troubleshooting efforts for critical incidents, ensuring timely resolution.
Drive root cause analysis efforts and implement long-term systemic improvements by executing on lessons learned through the Cboe Learning Review process.
Ensure incident management records are descriptive and accurate, and that all regulatory and compliance reporting obligations are effectively met.

Automation & Efficiency:

Spearhead automation efforts to improve operational workflows, reducing manual intervention and improving system uptime.
Decompose workloads across the team to maximize efficiency and make effective use of Jira for project/task management.

Capacity Planning:

Drive timely implementation of capacity planning decisions, avoiding the need for last-minute heroic efforts to avert a capacity-limiting issue.
Support the budget planning process to ensure capacity budget is sufficient to cover annual infrastructure growth and performance needs.
Participate in quarterly Capacity Planning meetings led by the SRE team, ensuring that any necessary follow-up is promptly addressed.

Monitoring & Analytics:

Oversee the implementation of monitoring and alerting systems, ensuring proactive detection and resolution of issues before they impact customers or result in reportable compliance/regulatory issues.

Cost Management:

Support optimization of infrastructure expenses by developing cost-efficient strategies for resource allocation, cloud usage, and scaling.

Risk Management & Compliance:

Ensure that all systems and processes supported by the SRE team meet or exceed regulatory, compliance, and security standards.
Monitor disaster recovery and business continuity plans to ensure they are well-developed and regularly tested.
Test all changes to systems and platform functionality prior to deployment to production environments.

Requirements:

Bachelor’s degree in Computer Science, Computer Engineering, Software Engineering, or a related discipline. Master’s preferred.
Minimum 15 years of experience in a Technical/Operations role with a significant focus on SRE, DevOps, Software Engineering, Systems, Network, or Database Administration, or a related discipline.
Minimum 10 years of leadership experience.
Ideal candidates will have 3+ years of experience leading a global team.
Experience supporting cloud infrastructures (e.g., AWS, Azure, Google Cloud), containerization (e.g., Docker, Kubernetes), monitoring tools (e.g., Prometheus, Datadog, Grafana), and automation frameworks (e.g., Terraform, Pulumi, Ansible, …).
Excellent listening, written, and verbal communication skills, including the ability to explain complex technical concepts to non-technical stakeholders.
Must possess strong analytical, quantitative, and research skills. 
Intellectually curious.
Excellent organizational and time management skills.

** This is a hybrid role physically located in Kansas City. You must be willing to relocate to Kansas City if you do not already reside there. **

Our pay ranges are determined by a number of factors, including, but not limited to, role, experience, level, and location. The national new hire base pay range for this job in the United States is $174,250-$215,250. This range represents the minimum and maximum base pay the company expects to offer for new hires working in the position full time. If you live in one of the following areas or if you work in a Cboe office in the following areas, the range may be higher according to the geographic differentials listed below:

US Geographic Differentials:

110%: Austin TX, Chicago IL, Denver CO, San Diego CA

115%: Los Angeles CA, Seattle WA

120%: Boston MA, Washington DC

125%: New York City NY

130%: San Francisco CA

Within the range, individual pay is determined by a number of factors, including, but not limited to, work location, job-related skills, experience, and relevant education or training. In addition to base pay, our total rewards program includes an annual variable pay program and benefits including healthcare (medical, dental and vision), 401(k) with a generous company match, life and disability insurance, paid time off, market-leading tuition assistance, and much more! Your recruiter will provide more details about the total compensation package, including variable pay and benefits, during the hiring process. For further information on our total rewards program, visit TOTAL REWARDS @CBOE.

Any communication from Cboe regarding this position will only come from a Cboe recruiter who has a @cboe.com email or via LinkedIn Recruiter. Cboe does not use any other third-party communication tools for recruiting purposes.