Principal Site Reliability Engineer

arcadia · Remote (USA)

Posted about 1 month ago

Skills

Site Reliability Engineering · System Reliability · Platform Engineering · Incident Response · SLOs · Error Budgets · Automation · Self-service Workflows · GitOps · Observability · Resilience · Disaster Recovery · Infrastructure Security · Data Transformation · PHI Handling

Job description

Arcadia is dedicated to happier, healthier days for all. We believe that there is a better healthcare world – one powered by data. Our platform transforms complex, diverse data into a unified foundation for health, helping organizations deliver better care, boost revenue, and lower costs.

We’re a team of fiercely driven individuals committed to making healthcare more sustainable—and we’re looking for passionate people to help us get there.
 
For more information, visit arcadia.io

Why This Role Is Important to Arcadia

Love building reliable systems and want to make a difference?

Arcadia’s customers rely on us to securely process and deliver high-value healthcare insights. Reliability, availability, performance, and security are foundational to trust—especially when systems support critical workflows and handle PHI. As a Principal Site Reliability Engineer, you’ll set reliability strategy across teams, drive cross-cutting platform improvements, and ensure we can scale delivery without scaling operational burden.
 
What Success Looks Like
In 3 months

Build deep context on Arcadia’s platform, production risks, and operational practices. Participate in on-call/incident response and quickly improve signal quality for at least one critical domain (dashboards, alerts, traces, runbooks). Identify a high-leverage reliability initiative and align stakeholders on scope, success metrics, and milestones.

In 6 months

Establish SLOs/error budgets for key customer journeys, drive operational readiness standards for launches, and lead remediation for recurring incidents with measurable reductions in customer impact and MTTR. Deliver major toil-reduction improvements via automation and self-service workflows.

In 12 months

Own and execute a reliability program with cross-org impact (e.g., GitOps delivery guardrails, observability platform evolution, resilience/DR improvements, or secure infrastructure controls). Influence architecture decisions, establish org-wide operational standards, and mentor Staff engineers—raising the reliability and security bar across Arcadia.