Principal Site Reliability Engineer

Fidelity

Administration, Software Engineering

Durham, NC, USA

Posted on Mar 25, 2026

Job Description:

Position Description:

Combines operational excellence with development experience to deliver resilient services at high scale and high availability. Builds reliability into the ecosystem by applying best practices in Resiliency Engineering, Automation, Observability, and Chaos Testing. Streamlines and accelerates the software delivery cycle by using DevOps practices and toolchains. Integrates Site Reliability Engineering (SRE) practices (Observability and Chaos) with DevOps processes and delivery pipelines to stop bad code from reaching production. Ensures business-critical enterprise systems are continuously available to internal and external customers. Implements technical standardization and process refinements within the engineering organization and for Site Reliability Engineers. Collaborates with production support teams to define and implement processes for the identification, collection, and analysis of incident data. Brings together technical, procedural, and financial data to reduce toil and increase efficiency.

Primary Responsibilities:

Develops Chaos Testing capabilities using multiple chaos tools (AWS Fault Injection Service (FIS), Chaos Mesh, and Chaosd) and Chaos Toolkit.
Develops and enhances the organization’s internal Chaos Framework to streamline chaos executions and reporting.
Provides specialized technical expertise in the adoption of Chaos Engineering by application teams.
Runs chaos tests on business-critical applications and observes the results to understand their weaknesses and increase application resiliency.
Activates Observability for critical applications with recommended Service Level Indicators and Service Level Objectives for latency, availability, error rate, etc.
Utilizes modern monitoring tools (Datadog, Splunk, Catchpoint, etc.) to reduce mean time to detect an issue and improve response times.
Creates CI/CD pipelines with security and quality checks using the Application Lifecycle Management toolchain. Helps integrate Chaos and Observability with CI/CD pipelines.
Automates repetitive activities using scripting languages (Python, Groovy, etc.).
Implements and supports solutions based on the AWS and Azure cloud platforms and on Kubernetes container orchestration.
Onboards and evaluates new cloud services that help enhance the resiliency of the cloud ecosystem. Serves as a liaison for vendor engagement.
Participates in incident management, problem management and incident postmortems.
Takes part in peer code reviews, providing qualitative feedback.
Builds processes and capabilities to adapt and respond to risks and disruptions while maintaining business operations and recovering data with minimal disruption.
Coaches peer SREs and application teams on SRE and DevOps.
Applies Agile methodologies to the team’s projects, delivering work in incremental and iterative steps.

Education and Experience:

Bachelor’s degree in Computer Science, Engineering, Information Technology, Information Systems, or a closely related field (or foreign education equivalent) and five (5) years of experience as a Principal Site Reliability Engineer (or closely related occupation) implementing resilient container and cloud-based applications and infrastructure solutions, using DevOps or SRE practices, in a financial services environment.

Or, alternatively, Master’s degree (or foreign education equivalent) in Computer Science, Engineering, Information Technology, Information Systems, or a closely related field and three (3) years of experience as a Principal Site Reliability Engineer (or closely related occupation) implementing resilient container and cloud-based applications and infrastructure solutions, using DevOps or SRE practices, in a financial services environment.

Skills and Knowledge:

Candidate must also possess:

Demonstrated Expertise (“DE”) improving application resiliency by implementing chaos engineering to build a system's capability to withstand turbulent conditions in production, using Chaos Mesh, Chaosd, Azure Chaos Studio, AWS FIS, or Gremlin; and driving automation to implement scalable approaches for the planning, design, execution, and reporting of chaos testing using Jenkins pipelines, standard frameworks, data visualization, and dashboards.
DE implementing advanced observability practices and techniques in production and pre-production environments, at scale, using Datadog, Splunk, or Catchpoint; tracking the error budget, proactively identifying issues, and minimizing Mean Time to Repair (MTTR); and balancing customer expectations by implementing Service-Level Indicators (SLIs) and Service-Level Objectives (SLOs) using logs, traces, monitors, and synthetic tests.
DE migrating and maintaining cloud applications and creating cloud solutions using Amazon Web Services (AWS) or Azure cloud services; implementing infrastructure as code for cloud; onboarding new AWS or Azure services with required reviews and security controls in non-production and production environments; and researching the evolving cloud ecosystem to adopt machine-learning-based tools (AWS DevOps Guru) to boost AIOps abilities.
DE implementing CI/CD pipelines in both production and non-production environments using Application Lifecycle Management (ALM) tools (JIRA, GitHub, Jenkins, SonarQube, Artifactory, or uDeploy) to enable faster code delivery, enhanced software quality, reliability, and security; and developing products, and core and common capabilities for the organization to reduce toil and drive standardization, using containerization and orchestration technologies (Docker or Kubernetes), Infrastructure as Code (IaC) tools, scripting languages (Python or Groovy), and engineering best practices.

Certifications:

Category:

Information Technology

Most roles at Fidelity are Hybrid, requiring associates to work onsite every other week (all business days, M-F) in a Fidelity office. This does not apply to Remote or fully Onsite roles. Some roles may have unique onsite requirements. Please consult with your recruiter for the specific expectations for this position.

Please be advised that Fidelity’s business is governed by the provisions of the Securities Exchange Act of 1934, the Investment Advisers Act of 1940, the Investment Company Act of 1940, ERISA, numerous state laws governing securities, investment, and retirement-related financial activities, and the rules and regulations of numerous self-regulatory organizations, including FINRA, among others. Those laws and regulations may restrict Fidelity from hiring and/or associating with individuals with certain criminal histories.

Benefits that balance life and work

From our fully paid parental leave to our on-site health and wellness centers, our benefits support the belief that the more balance you have, the better you can achieve your goals.

Company overview

At Fidelity, we are passionate about making our financial expertise broadly accessible and effective in helping people live the lives they want. We are a privately held company that places a high degree of value in creating and nurturing a work environment that attracts the best talent and reflects our commitment to our associates. We are proud of our diverse and inclusive workplace where we respect and value our associates for their unique perspectives and experience.

Reasonable accommodations

Fidelity will reasonably accommodate applicants with disabilities who need adjustments to participate in the application or interview process. To initiate a request for an accommodation, contact the HR Accommodation Team by sending an email to accommodations@fmr.com, or by calling 800-835-5099, prompt 2, option 3.

Equal opportunity employer

Fidelity Investments is an equal opportunity employer. We believe that the most effective way to attract, develop, and retain a diverse workforce is to build an enduring culture of inclusion and belonging.

Hybrid work schedule

Fidelity’s hybrid working model blends the best of both onsite and offsite work experiences. Working onsite is important for our business strategy and our culture. We also value the benefits that working offsite offers associates. Most hybrid roles require associates to work onsite all business days of every other week in a Fidelity office.

Applicant screening

At Fidelity, we value honesty, integrity, and the safety of our associates and customers within a heavily regulated industry. Certain roles may require candidates to go through a preliminary credit check during the screening process. Candidates who are presented with a Fidelity offer will need to go through a background investigation and may be asked to provide additional documentation as requested. This investigation includes, but is not limited to, a criminal, civil litigation, and regulatory review, and an employment, education, and credit review (role dependent). These investigations will account for 7 years or more of history, depending on the role. Where permitted by federal or state law, Fidelity will also conduct a pre-employment drug screen, which will review for the following substances: amphetamines, THC (marijuana), cocaine, opiates, and phencyclidine.
