[Remote] Senior Engineering Manager, Compute
Note: This is a remote role open to candidates in the USA. Temporal Technologies is an innovative company focused on simplifying code and enhancing developer experiences through its open-source programming model. The company is seeking a Senior Engineering Manager for its Compute team to lead the development of a reliable compute layer for AI workloads, ensuring operational excellence and strategic direction.
Responsibilities
- Strategic direction for Compute: Own the strategy and standards of excellence for the compute layer that the world's agents run on, across design, delivery, and operations. Build a culture of ownership, quality, and customer-first decision-making
- Technical leadership: Lead, hire, and grow a high-ownership team; roll up your sleeves and go deep into the trenches, staying close to design docs and code rather than managing from a distance. Coach engineers, level them up, and clear the friction that slows them down
- Roadmap & trajectory: Drive the arc from today's compute toward the next generation of compute platforms. Ground prioritization in customer and design-partner feedback, and turn ambiguous, fast-moving requirements into predictable, iterative delivery
- Operational excellence: When you run frontier AI in production, reliability *is* the product. Own operations, run on-call and incident response, and drive blameless postmortems and the systemic fixes that prevent recurrence
- Technical depth: Guide the hard architectural decisions for large-scale, multi-tenant compute, where technical concerns cut across workload isolation and security, scheduling, fleet efficiency / utilization / goodput, and performance, while ensuring the platform is reliable and efficient for the workloads that depend on it
- Capacity, supply & economics: Own utilization, capacity and supply planning, and the cost-per-unit-of-compute and margin profile of the fleet, across CPU compute today and accelerated compute ahead
- Cross-team & customer execution: Partner with leadership, Product, SDK, UX/DX, Security, and design-partner customers to align priorities and unblock delivery. Communicate progress, tradeoffs, and risk clearly to technical and non-technical audiences alike
Skills
- Proven experience leading software engineering teams that build and operate large-scale compute platforms or fleets, with strong operational practices
- 12+ years in software and/or infrastructure engineering, including 7+ years of people management and demonstrated ownership of delivery and live-site outcomes
- Deep distributed-systems and compute-infrastructure expertise, with the hands-on judgment to guide architecture and execution up close rather than from a distance
- Experience operating multi-tenant compute that other people's production workloads depend on
- Bachelor's degree in Computer Science or related field, or equivalent practical experience; advanced degree a plus
- Excellent communication skills, with the ability to partner across engineering, product, and leadership and fold customer feedback into the roadmap
- Strong leadership, coaching, and performance management; ability to grow engineers and build a healthy, accountable, high-ownership team
- Excellence in execution: planning, prioritization, and delivering iterative milestones in an ambiguous, fast-moving environment while managing unplanned work
- Fleet thinking: utilization, goodput, capacity and supply planning, and cost discipline as first-class engineering concerns
- Live-site reliability craft: on-call, incident management & response, and postmortem-driven continuous improvement
- Strong command of the building blocks of a compute platform: multi-tenant isolation and security, scheduling, and resource management
- Ability to review and raise the bar on technical artifacts (design docs, code reviews) across a distributed-systems codebase
- MicroVMs and virtualization (Firecracker, gVisor, Edera) or managed-compute primitives (AWS Fargate, GCP Cloud Run, AWS Lambda), and/or Kubernetes internals
- Building serverless or hosted-compute products from 0 to 1, including the rapid-delivery-vs-durable-platform tradeoffs that come with it
- Multi-cloud delivery across AWS and GCP
- Cold-start, warm-pool, and scheduling/latency optimization for on-demand compute
- Agent sandboxes, secure execution of untrusted code, or other AI-agent infrastructure
- GPU / accelerated compute: fractional GPUs (MIG, MPS, time-slicing), GPU scheduling, training vs. inference fleets, and multi-tenant GPU isolation
Benefits
- This role is eligible to participate in Temporal's equity plan
- Unlimited PTO, 12 Holidays + 2 Floating Holidays
- 100% Premiums Coverage for Medical, Dental, and Vision
- AD&D, LT & ST Disability, and Life Insurance (Standard & Supplemental Available)
- Empower 401K Plan
- Additional Perks for Learning & Development, Lifestyle Spending, In-Home Office Setup, Professional Memberships, WFH Meals, Internet Stipend and more!
- Paid Time Off (PTO) and Benefits outside the United States vary by country, and are issued in partnership with Remote.com.
- Temporal offers perks to all international employees for learning & career development, a lifestyle spending account, in-home office setup (in addition to company-issued hardware), professional memberships, work-from-home meals, and access to the Calm app for mental wellness.
- Occasional travel may be required for company events, team offsites, and other meaningful moments that bring us together.
- $3,600 / Year Work from Home Meals
- $1,800 / Year Professional Enrichment (Career Development & Professional Memberships)
- $1,200 / Year Lifestyle Spending Account
- $1,000 / Year In-Home Office Setup (in addition to Temporal-issued equipment: laptop, monitor, keyboard, mouse, trackpad, and extension power cable, at no cost to you)
- $74 / Month Reimbursement for Internet
- Calm App Subscription for Mental Health & Wellness