SENIOR AI INFRASTRUCTURE PERFORMANCE ENGINEER
TITLE: SENIOR AI INFRASTRUCTURE PERFORMANCE ENGINEER
LOCATION: UNITED STATES — REMOTE
Who We Are:
GigaIO has invented the first truly composable universal dynamic AI memory fabric, empowering users to accelerate AI and engineering/scientific workloads on demand through our revolutionary SuperNODE architecture. As a global leader in accelerated infrastructure, we offer an open platform that lets users quickly deploy leading-edge infrastructure and capitalize on all the ways AI will move their businesses forward.
Does getting in on the ground floor of a data center technology that is disrupting AI and HPC make your heart beat a little faster? Does the excitement of joining a team of exceptionally talented and motivated technologists at a well-funded startup sound attractive? If so, please read on.
What You Will Do:
We are seeking a research-driven Senior AI Infrastructure Performance Engineer to contribute to the design, development, and optimization of high-performance AI systems across our rackscale and edge product lines. In this hands-on role, you will analyze, benchmark, and tune AI infrastructure to achieve best-in-class efficiency, scalability, and reliability in production environments.
The ideal candidate brings deep technical expertise in AI infrastructure, data center and HPC operations, and performance engineering, along with a research-oriented mindset for solving complex system-level challenges.
Key Responsibilities:
- Infrastructure Design & Development: Contribute to the architecture and development of AI infrastructure solutions that power enterprise-scale AI and HPC workloads, ensuring scalability, reliability, and performance.
- Performance Optimization: Analyze, benchmark, and tune rackscale and edge systems to maximize throughput, efficiency, and utilization across diverse workloads.
- HPC Systems Engineering: Engineer and optimize high-performance computing environments, leveraging advanced interconnects, GPU acceleration, and composable infrastructure.
- Cross-Functional Collaboration: Partner with AI researchers, data scientists, and platform engineers to align infrastructure performance with evolving AI model and application requirements.
- Research & Innovation: Apply and extend GigaIO’s patented technology to tackle emerging AI infrastructure challenges, contribute to internal R&D initiatives, and share insights through publications or technical conferences.
Qualifications:
- Education: Master’s or Ph.D. in Computer Science, Computer Engineering, or a related field.
- Experience: 2+ years of experience in HPC, AI infrastructure, or performance engineering.
Core Technical Expertise:
- CUDA or ROCm: Strong proficiency in GPU programming and optimization.
- Large Language Models (LLMs): Understanding of distributed training, inference scaling, and system-level requirements.
- PyTorch: Experience developing and optimizing models using PyTorch.
- Scaling Compute: Proven ability to scale and manage compute resources for large-scale AI workloads.
Nice to Have:
- Experience with AI communication libraries (e.g., NCCL, RCCL)
- Familiarity with AI serving frameworks (e.g., vLLM, SGLang)
- Exposure to Linux kernel driver development or low-level system performance tuning
What We Offer:
GigaIO offers an excellent compensation (including equity) and benefits package, and we have built a caring, unique culture that focuses on providing development opportunities for all of our employees. Apply to this role and learn more about why you want to work at GigaIO!