[Remote] AI Engineer
Note: This is a remote position open to candidates in the USA. In Tandem builds technology that helps families manage their routines and navigate transitions. The company is seeking an AI Engineer to maintain and optimize its AI infrastructure, run self-hosted inference stacks, and develop user-facing features that help families coordinate their daily activities.
Responsibilities
- Run and optimize our self-hosted inference stack
- Run the inference serving layer on our own GPU hardware: choose and tune the serving stack (vLLM, SGLang, TensorRT-LLM) for high throughput and low latency
- Optimize aggressively: tensor parallelism, quantization (FP8, AWQ, GPTQ), KV-cache and prefix caching, continuous batching, speculative decoding, concurrency tuning
- Serve multiple models and features off shared hardware: multi-LoRA, routing, and request scheduling that balances internal workloads against latency-sensitive product traffic
- Make our AI workloads efficient: improve latency, throughput, and GPU utilization so we get the most out of what we run
- Build the visibility: instrument performance and usage across our AI surfaces so there's clear data on how everything is running
- Surface the technical tradeoffs (performance, latency, efficiency) so the people making the calls have what they need to make them
- Ship the in-app agent layer that helps families coordinate: proactive nudges, smart suggestions, agents that summarize, draft, schedule, and act for busy parents
- Build the substrate underneath: tools, memory, orchestration, guardrails, and evaluation harnesses, integrated cleanly with production APIs alongside our architecture team
- Work in nimble pairs with feature owners, standing up whatever's needed to test an idea, including a vibe-coded UI when that's the fastest path to a real customer. Ship rough, learn fast, harden what works
Skills
- 5+ years shipping production software, including meaningful applied AI or ML work
- Demonstrated experience running and optimizing self-hosted LLMs on dedicated multi-GPU hardware: a serving stack (vLLM, SGLang, or TensorRT-LLM) and the optimization that comes with it (tensor parallelism, quantization, batching, KV cache)
- A track record of optimizing inference performance and efficiency (latency, throughput, GPU utilization)
- Strong Python and engineering fundamentals, with the full-stack range to stand up a quick UI, and a genuine desire to work on app-layer features, not only infra
- Hands-on with agent frameworks (Claude Agent SDK, LangGraph, or similar), LLM APIs, embeddings, and RAG
- Comfortable with AWS and the devops this role owns: Docker, CI/CD, monitoring, and observability
- Experience building internal tooling or platforms others depend on. Bonus for Slack apps, MCP, or agent orchestration at team scale
Benefits
- Medical: In Tandem pays 100% of the premium for employees AND 99% for all additional family members
- 401k: Up to a 4% match with immediate vesting
- Paid leave for all new parents
- Learning & Development stipend for employees
- Paid Time Off: 11 Holidays + Winter Break (3 Days) + Volunteer Time Off (1 Day) + Floating Holiday (1 Day)
- Personal Time Off: 15 days for 0-1 years of employment, 20 days for 1-3 years of employment
- Supportive and flexible working environment – work from anywhere!
Company Overview