Principal Application Engineer (SRE)


In-Office: Riverwoods, Illinois


About This Role

Discover. A brighter future.

With us, you’ll do meaningful work from Day 1. Our collaborative culture is built on three core behaviors: We Play to Win, We Get Better Every Day & We Succeed Together. And we mean it: we want you to grow and make a difference at one of the world’s leading digital banking and payments companies. We value what makes you unique so that you have an opportunity to shine.

Come build your future, while being the reason millions of people find a brighter financial future with Discover.

Job Description:

At Discover, be part of a culture where diversity, teamwork and collaboration reign. Join a company that is just as focused on its employees as it is on its customers and is consistently awarded for both. We’re all about people, and our employees are why Discover is a great place to work. Be the reason we help millions of consumers build a brighter financial future and achieve yours along the way with a rewarding career.

As a Principal Site Reliability Engineer, you’ll tap into your passion for finding and fixing inefficiencies to solve our reliability and performance issues. You’ll work on projects including CI/CD and improved data monitoring, and partner with our internal product group to help build and define our SRE practice within our Rewards Product Area.


Develop and run SRE-owned tooling and observability using automation such as CI/CD and Kubernetes.

Build monitoring that alerts on symptoms rather than on outages.

Improve operability, latency, capacity planning, change management, and MTTR (Mean Time to Repair).

Debug production issues across services and levels of the stack.

Plan the growth and reliability of services, including end-to-end and performance testing.

Be hands-on in creating self-healing and/or self-service solutions via automation and tooling.

Improve monitoring (Datadog, AppD, etc.) and build new smart metrics.

Develop a relationship with a product group and help define their SLOs/SLIs.

Work directly with AppDev to improve the product’s non-functional characteristics and production readiness.

Be on an on-call rotation to respond to “Code Red” incidents and help restore customer-impacting services.

Plan, design, analyze, and debug distributed systems.

Provide emergency response either by being on call or by reacting to symptoms according to monitoring, escalating when needed.

Identify significant projects that result in substantial improvements in reliability, cost savings and/or revenue.

Influence the product roadmap and work with engineering and product counterparts to improve the resiliency and reliability of the product.

Proactively work on efficiency and capacity planning to set clear requirements and optimize system resource usage.

Identify parts of the system that do not scale, provide immediate palliative measures, and drive long-term resolution of these issues.

Identify Service Level Indicators (SLIs) that will align the team to meet the service level objectives.

Minimum Qualifications

At a minimum, here’s what we need from you:

Bachelor’s – Computer Science or related

6 Years – Information Technology, (Software) Engineering, or related

Internal applicants only: technical proficiency rating of proficient on the Dreyfus engineering scale

Preferred Qualifications

Bonus Points If You Have:

5 years SRE experience

Think about systems: edge cases, failure modes, behaviors, specific implementations.

As an engineer, when you see something broken, you cannot help but fix it.

Have an urge to document everything so you do not need to learn the same thing twice.

Strong knowledge of SDLC (System Development Life Cycle)

Strong knowledge of SRE tool kits such as Datadog, AppD, Moogsoft, or similar tools.

Strong knowledge of Git, Docker, Kubernetes, Jenkins, AWS (Amazon Web Services), or similar technologies

Know how to use configuration management systems like Chef or Ansible

Good understanding of ServiceNow, JIRA or similar reporting tools.

Have strong programming skills in one or more of the following languages: C, Ruby, Python, Shell, Java

Good understanding of hybrid infrastructure

External applicants will be required to complete a technical interview.



The base pay for this position generally ranges between to . Additional incentives may be provided as part of a market competitive total compensation package. Factors, such as but not limited to, geographical location, relevant experience, education, and skill level may impact the pay for this position.


We also offer a range of benefits and programs based on eligibility. These benefits include:

Paid Parental Leave

Paid Time Off

401(k) Plan

Medical, Dental, Vision, & Health Savings Account

STD, Life, LTD and AD&D

Recognition Program

Education Assistance

Commuter Benefits

Family Support Programs

Employee Stock Purchase Plan

Learn more at .

What are you waiting for? Apply today!

All Discover employees place our customers at the very center of our work. To deliver on our promises to our customers, each of us contributes every day to a culture that values compliance and risk management.

Discover is committed to a diverse and inclusive workplace. Discover is an equal opportunity employer and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status, or other legally protected status. (Know Your Rights)