Job Description

AI Inference Software Engineer — Stealth AI Systems Startup

Base Salary Range: $200,000-$300,000

Location: San Francisco (Onsite)

A stealth-stage AI systems company is redefining the performance boundaries of inference at scale. As generative AI models become larger and more complex, inference is emerging as the core bottleneck in production environments. This team is building a vertically integrated stack—from low-level GPU kernels to developer-friendly APIs—that dramatically improves inference speed, efficiency, and scalability.

Spun out of cutting-edge academic research and backed by deep industry experience across distributed systems, machine learning infrastructure, and hardware design, they are focused on enabling production-grade AI with minimal latency and maximal throughput. Their platform integrates seamlessly with modern ML frameworks like PyTorch and LangChain, allowing teams to deploy and monitor workloads in seconds.

They are looking for a Software Engineer focused on AI inference performance to help build and optimize the core runtime infrastructure powering these systems. This role sits at the intersection of deep learning, systems engineering, and GPU performance.

What You’ll Do

Implement and evaluate advanced inference optimization techniques, including quantization, KV caching, and FlashAttention
Design and build systems for distributing inference workloads efficiently across multiple GPUs and nodes
Profile and benchmark large-scale models to identify bottlenecks across the software and hardware stack
Optimize CUDA kernels and GPU memory usage to improve performance across a wide variety of AI models
Collaborate closely with research and systems engineers to push the limits of model serving infrastructure

What They’re Looking For

Proficiency with CUDA and experience writing or optimizing GPU kernels
Strong background in Python and C++ development
Hands-on experience with PyTorch, TensorFlow, or similar deep learning frameworks
Knowledge of distributed systems or model-serving platforms at scale
Familiarity with performance tuning, benchmarking tools, and profiling techniques

Nice to Have

Graduate degree in computer science, engineering, or a related field
Experience with compiler frameworks such as MLIR or Triton
Exposure to vLLM, ONNX, or custom model runtimes

This is a rare opportunity to work on core infrastructure for AI systems with a team solving some of the hardest performance challenges in the field.

AI Performance Software Engineer Job at Signify Technology, San Francisco, CA
