Job Description:
Position Description:
Automates the running, building, and development of applications using scripting languages such as Python and Shell. Coordinates systems using Infrastructure as Code (IaC) tools (IAM, ARM, Terraform, and Chef). Deploys applications in a DevOps environment using cloud computing and DevOps concepts (CI/CD pipelines). Utilizes modern monitoring tools such as Datadog, Prometheus, and Splunk. Confers with systems analysts, engineers, programmers, and others to design systems and to obtain information on project limitations and capabilities, performance requirements, and interfaces.
Primary Responsibilities:
Provides high-scale, highly available, and resilient delivery services using automation and Infrastructure as Code.
Builds reliability using resiliency engineering, automation, observability, and chaos tests.
Implements advanced observability practices and techniques at scale.
Maintains and interprets large datasets using query languages and visualization tools.
Troubleshoots new software, methods, and practices and brings them to developers.
Defines and executes a comprehensive reliability and observability strategy available to customers.
Brings together technical, procedural, and financial data to reduce toil and increase efficiency.
Executes plans for technical standardization and process refinement within the engineering organization.
Troubleshoots stack-wide engineering issues related to hardware, software, network, applications, and cloud service providers.
Analyzes user needs and software requirements to determine feasibility of design within time and cost constraints.
Confers with data processing or project managers to obtain information on limitations or capabilities for data processing projects.
Consults with customers or other departments on project status, proposals, or technical issues, such as software system design or maintenance.
Confers with systems analysts and other software engineers/developers to design systems and to obtain information on project limitations and capabilities, performance requirements and interfaces.
Develops and coordinates software system tests and validation procedures, programs, and documentation.
Education and Experience:
Bachelor’s degree (or foreign education equivalent) in Computer Science, Engineering, Information Technology, Information Systems, Mathematics, Physics, or a closely related field and five (5) years of experience as a Principal Site Reliability Engineer (or closely related occupation) designing and developing reliability, performance, and scalability of enterprise-wide full stack applications (ensuring seamless integration and high availability) using Datadog, ELK, and Prometheus in a financial services environment.
Or, alternatively, Master’s degree (or foreign education equivalent) in Computer Science, Engineering, Information Technology, Information Systems, Mathematics, Physics, or a closely related field and three (3) years of experience as a Principal Site Reliability Engineer (or closely related occupation) designing and developing reliability, performance, and scalability of enterprise-wide full stack applications (ensuring seamless integration and high availability) using Datadog, ELK, and Prometheus in a financial services environment.
Skills and Knowledge:
Candidate must also possess:
Demonstrated Expertise (“DE”) designing, architecting, and building scalable and resilient N-tier software solutions, and creating E2E plans for critical services according to DevOps practices, using .Net, Java, Python, Docker, and Kubernetes.
DE delivering high-scale, highly available, and resilient services according to automation and Infrastructure-as-Code (IaC) methodologies, using OpenTelemetry (OTEL), Datadog, Splunk, Prometheus, and ELK.
DE building cloud-based platforms for consumption at an enterprise level, using AWS (EKS, Lambda, EMR, and CloudFormation) and Azure services.
DE developing microservices on the EKS platform and maintaining CI/CD pipelines using DevOps technologies (GitHub, Artifactory, Sonar, Jenkins/Jenkins Core, and Terraform).
Certifications:
Category:
Information Technology
Fidelity’s hybrid working model blends the best of both onsite and offsite work experiences. Working onsite is important for our business strategy and our culture. We also value the benefits that working offsite offers associates. Most hybrid roles require associates to work onsite every other week (all business days, M-F) in a Fidelity office.
Please be advised that Fidelity’s business is governed by the provisions of the Securities Exchange Act of 1934, the Investment Advisers Act of 1940, the Investment Company Act of 1940, ERISA, numerous state laws governing securities, investment and retirement-related financial activities and the rules and regulations of numerous self-regulatory organizations, including FINRA, among others. Those laws and regulations may restrict Fidelity from hiring and/or associating with individuals with certain Criminal Histories.