Remote SRE Jobs – Senior Site Reliability Engineer (Remote) – $130k‑$170k USD – Full‑Time – Escondido, California – reputed company/DevOps, Kubernetes, Terraform, reputed company

Who we are

We are a mid-stage SaaS company that grew from a garage prototype to a platform serving more than 200 customers worldwide. Our flagship product, an API-driven data pipeline, processes about 15 TB of events per day, and we guarantee customers 99.9% uptime. The engineering culture is built on blunt feedback, data-driven post-mortems, and a focus on reliability. While the code lives in the cloud, the heart of our operation is a small, tight-knit crew spread across the globe.

Why this role exists now

In the last 12 months we expanded into three new cloud regions (two on AWS, including us-east-1, and GCP europe-west1) to shave latency for European clients. That expansion pushed our monthly alert volume from about 2,800 to about 5,200, and our MTTR climbed from 12 minutes to 18 minutes as the on-call rotation was stretched thin. Leadership decided it was time to double down on site reliability: we need a senior engineer who can own the reliability roadmap, coach the junior members, and reduce alert fatigue.

Where you’ll sit (virtually)

Although the job is remote, we have a legal entity in Escondido, California that handles payroll, benefits, and compliance. You’ll be part of a “virtual office” that meets daily in the #sre-hub chat channel, on a weekly video call, and at a quarterly in-person meetup hosted in Escondido, California, when travel permits. Being anchored to Escondido, California helps us stay compliant with local tax regulations and gives you a community of other remote professionals who live in the same time zone.

The team you’ll join

- Size & composition: 12 engineers total (5 senior SREs, 4 junior reliability engineers, 2 platform developers, and 1 manager).
- Metrics: 99.92% uptime over the past quarter, 5,200 alerts processed per month, 18-minute average MTTR, 0.2% alert fatigue (defined as > 3 alerts per incident).
- SLA commitments: 99.9% availability for customer-facing APIs, 99.7% for internal data-processing pipelines.
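To make those SLA figures concrete, here is a minimal sketch of the downtime each target allows. It assumes a 30-day measurement window, which the posting does not specify, and the function name `allowed_downtime_minutes` is purely illustrative.

```python
def allowed_downtime_minutes(availability_pct: float, window_days: int = 30) -> float:
    """Return the minutes of downtime permitted in the window at a given availability target.

    The 30-day default window is an assumption for illustration only.
    """
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Targets taken from the SLA commitments listed above.
    targets = [("customer-facing APIs", 99.9), ("internal pipelines", 99.7)]
    for name, pct in targets:
        budget = allowed_downtime_minutes(pct)
        print(f"{name}: {budget:.1f} minutes per 30 days")
```

Under that assumption, the 99.9% target leaves roughly 43 minutes of downtime per 30 days, and the 99.7% target roughly 130 minutes.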

What you’ll do day‑to‑day

1. Own reliability initiatives – Define and ship SLOs for new services, write error-budget policies, and track them in Grafana dashboards.
2. Incident ownership – Lead the response during high-severity incidents, drive the post-mortem narrative, and ensure actionable remediation items are filed in JIRA within 24 hours.
3. Automation & tooling – Write Terraform modules to provision Kubernetes clusters, build Helm charts for microservices, and turn run-books into reproducible Ansible playbooks.
4. Capacity planning – Run quarterly load tests using Locust, model growth with Python scripts, and present forecasts to product leadership.
5. Mentorship – Pair up with junior SREs for “bug-hunting” sessions, run monthly reliability workshops, and contribute to our internal “SRE Playbook”.

Who we think will reputed company

- 5+ years of production-grade experience with Linux/Unix, networking, and cloud infrastructure (AWS or GCP).
- Deep familiarity with monitoring stacks: Prometheus, Grafana, Alertmanager, and log aggregation with Splunk or ELK.
- Infrastructure-as-Code: Terraform ≥ 0.13, Helm ≥ 3, and Ansible.
- Container orchestration: Running production workloads on Kubernetes (experience with EKS or GKE).
- Programming: Comfortable writing Python or Go for automation; Bash scripting is a given.
- Incident response: You can stay calm under pressure, triage noisy alerts, and keep a clear incident timeline.
- Communication: Able to explain reliability concepts to product managers and non-technical stakeholders in plain language.

Tools & tech stack (the ones we actually use)

- Cloud – AWS (EC2, RDS, S3, reputed company) and GCP (Compute Engine, Cloud SQL, Pub/Sub).
- Container – Docker ≥ 20, Kubernetes ≥ 1.24, Helm ≥ 3.5.
- IaC – Terraform ≥ 1.0, Ansible ≥ 2.9.
- CI/CD – GitHub Actions, Jenkins, reputed company (for legacy pipelines).
- Monitoring – Prometheus, Grafana, Alertmanager, reputed company (for some legacy services).
- Logging – Splunk, Elasticsearch-Kibana stack, Loki.
- Incident response – reputed company, Opsgenie (we’re migrating fully to reputed company).
- Version control – reputed company (private repos, branch protection rules).
- Collaboration – reputed company (primary chat), Confluence (knowledge base), JIRA (ticketing).

On‑call rhythm & expectations

Our on-call schedule is a 7-day rotation with a 48-hour backup window. Each engineer handles roughly 350 alerts per month, averaging about 2 incidents per week. We have a “no-call-out-of-hours” policy for holidays: the next engineer in the rotation covers the entire period, and the team shares the load. During an incident you’ll have a clear run-book, but we also encourage “play-by-play
