Job Description:
Principal Cloud Engineer
The Role
As a Principal Cloud Engineer, you blend deep experience in public Cloud (AWS & Azure) Container Platform services and Kubernetes services with resiliency engineering expertise and a passion for delivering results. Our Container Platform management engineering group within Enterprise Infrastructure DCS (Distributed Compute and Storage) combines Container Platform Operations Excellence with the Development Experience to deliver highly available, resilient services at scale by using automation and Infrastructure as Code. We build reliability into our ecosystem by applying best practices in Resiliency Engineering, Automation, Observability & Chaos Testing.
This individual should have hands-on experience developing and maintaining CI/CD pipelines, supporting and maintaining EKS, maintaining the Stash repository, automating manual processes, and improving release processes. This individual will partner with product delivery teams across the firm to help them onboard applications onto the Cloud, providing suggestions and helping with the right infrastructure configurations. They will be actively involved in transforming the support operating model as the application infrastructure changes rapidly from an on-prem and AWS-native model to Kubernetes (EKS).
- Creating scalable solutions and automation to monitor the Container Platform health and establish signals to drive understanding of our public Cloud and Container Platform environments.
- Strengthening event-based, data-driven operational processes with Fidelity support and incident management teams for our cloud ecosystem.
- Maintaining the Kubernetes platform, including EKS and AKS cluster creation and rehydration services across the enterprise, and helping with Helm charts and deployments.
The Expertise and The Skills You Bring:
- Bachelor’s Degree or equivalent in a technology-related field (e.g. Computer Science, Engineering, etc.) required.
- Production experience running Kubernetes workloads on EKS/AKS/RKS (Rancher)
- Experience managing and maintaining Kubernetes Clusters on EKS/AKS and RKS.
- Experience creating and deploying Helm charts & libraries
- 10+ years of hands-on experience deploying and/or supporting highly distributed multi-tiered systems at scale.
- 8+ years of experience in Cloud development (AWS/Azure) and migration skills
- Experience building and deploying Docker images including Docker Compose
- Hands-on experience with Jenkins Core, including authoring and maintaining declarative CI/CD pipelines and libraries
- Proficiency with UNIX operating systems and shell scripting
- Programming experience in Python and/or Golang (both preferred).
- Experience with distributed version control systems, Git preferred.
- Experience crafting and maintaining logging, monitoring, and alerting capabilities using tools like Datadog and Splunk
- 10+ years of experience in software development with Python, NodeJS, or Java with a focus on SDLC and automation
- Practical experience in building cloud hosted and native applications for the enterprise. Maintains a deep understanding of a wide variety of AWS/Azure services that support reliability, observability, and automation/orchestration.
- Possess at least one cloud certification (AWS or Azure preferred).
- Experience in incident/crisis management and supporting mission critical applications.
- Solid understanding of Cloud Computing and DevOps concepts including CI/CD pipelines
- Hands-on Kubernetes skills and knowledge
- Hands-on experience with one or more observability tools (Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, etc.)
- Ability to automate with various scripting languages (Python, Shell scripting, etc.)
- Experience managing systems using infrastructure as code tools (IAM, ARM, Terraform, Chef)
- Ability to triage, execute root cause analysis, and be decisive under pressure.
- Experience managing and interpreting large datasets using query languages and visualization tools (Power BI/Tableau), with a data-driven approach to analyzing cloud health events and alerts at scale.
The Team:
You work in partnership with the Fidelity Container Platform teams and with other observability, resiliency, and reliability engineering teams across the enterprise. You will work with development teams across Fidelity to understand their usage of and need for cloud services and to drive reliable, intelligent adoption and ongoing usage of cloud.
The base salary range for this position is $107,000-216,000 USD per year. Placement in the range will vary based on job responsibilities and scope, geographic location, candidate’s relevant experience, and other factors.
Base salary is only part of the total compensation package. Depending on the position and eligibility requirements, the offer package may also include bonus or other variable compensation.
We offer a wide range of benefits to meet your evolving needs and help you live your best life at work and at home. These benefits include comprehensive health care coverage and emotional well-being support, market-leading retirement, generous paid time off and parental leave, charitable giving employee match program, and educational assistance including student loan repayment, tuition reimbursement, and learning resources to develop your career. Note, the application window closes when the position is filled or unposted.
Please be advised that Fidelity’s business is governed by the provisions of the Securities Exchange Act of 1934, the Investment Advisers Act of 1940, the Investment Company Act of 1940, ERISA, numerous state laws governing securities, investment and retirement-related financial activities and the rules and regulations of numerous self-regulatory organizations, including FINRA, among others. Those laws and regulations may restrict Fidelity from hiring and/or associating with individuals with certain Criminal Histories.
Most roles at Fidelity are Hybrid, requiring associates to work onsite every other week (all business days, M-F) in a Fidelity office. This does not apply to Remote or fully Onsite roles.