About the Role
Grainger's Product Discovery team seeks a seasoned Manager, Applied ML Scientist to drive cutting-edge generative AI solutions. We're building AI agents that assist our human customer service agents in real time during customer phone calls. These AI agents need to surface product information, detect customer sentiment, recommend next-best actions, and automate post-call documentation. This role leads the team building the ML systems that power these capabilities; we are looking for someone equally comfortable managing a team and contributing hands-on.
You'll work at the intersection of real-time ML inference, event-driven architectures, and contact center operations. The voice channel is in early-stage development, so you'll have significant influence over architectural decisions and technical direction.
Chicago, IL is the preferred location, with a hybrid work environment (2 days per week at our Merchandise Mart office in downtown Chicago). We will also consider highly qualified remote candidates who are willing to travel to Chicago for onboarding and occasional team meetings. Onsite onboarding will be required.
You Will:
Stand up and develop a team to support the customer service voice project
Work with Product leaders to understand business objectives and communicate those to your team
Manage relationships with Software Engineering, Machine Learning Operations, and Data Engineering
Design, train, and deploy ML models for voice-specific use cases: real-time intent classification, sentiment/tonality detection, escalation prediction, and conversational Q&A
Build and optimize production inference pipelines with tight latency requirements
Develop event-driven data pipelines that process streaming transcription data from Genesys through EventBridge into persistent state stores
Implement model monitoring, evaluation frameworks, and drift detection for voice-specific metrics
Collaborate with Software Engineering on API design, WebSocket integrations, and UI data contracts
Build automated retraining pipelines using call outcome feedback and human-labeled escalation data
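To make the event-driven pipeline responsibility above concrete, here is a minimal sketch of a handler that consumes an EventBridge-style envelope carrying a transcription chunk and appends it to per-call state. The event schema (`detail.callId`, `detail.utterance`), the event source name, and the in-memory store standing in for a persistent state store such as ElastiCache are all hypothetical illustrations, not Genesys's or Grainger's actual interfaces:

```python
from collections import defaultdict

# In-memory stand-in for a persistent state store (e.g. ElastiCache):
# keys are call IDs, values are ordered lists of utterances.
call_state = defaultdict(list)

def handle_transcription_event(event: dict) -> int:
    """Consume one EventBridge-style envelope carrying a transcription
    chunk and append it to the call's running transcript.

    Returns the number of utterances stored for that call so far.
    The envelope shape here is a hypothetical example, not the
    actual Genesys event schema.
    """
    detail = event["detail"]
    call_id = detail["callId"]
    call_state[call_id].append(detail["utterance"])
    return len(call_state[call_id])

# Example envelope, loosely modeled on an EventBridge event.
sample = {
    "source": "genesys.transcription",
    "detail-type": "TranscriptChunk",
    "detail": {"callId": "c-123", "utterance": "I need a replacement filter"},
}
```

In production this handler would run behind Lambda or a stream consumer, with the transcript persisted so downstream models (intent, sentiment, escalation) can read a consistent view of the call.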
You Have:
5+ years in ML Engineering/Applied ML with at least 2 years deploying models to production at scale
Hands-on experience with LLM/SLM fine-tuning and prompt engineering, and sound judgment about when to use each given latency/cost tradeoffs
Production model serving experience: Triton Inference Server, vLLM, TensorRT, or similar low-latency serving infrastructure
Strong Python fundamentals with experience in PyTorch for model development
Agentic AI frameworks: LangGraph, LangChain, or custom agent orchestration
Event-driven architecture experience: Kafka, AWS EventBridge, SQS, or similar message/event systems
Data pipeline development with Spark, Airflow, or equivalent for batch processing and dataset creation
AWS fluency: Lambda, EventBridge, SQS, ElastiCache, Aurora, S3, or an equivalent cloud ML stack
Experience with MLOps tooling: MLflow, Weights & Biases, or similar for experiment tracking, model registry, and monitoring
Preferred:
Prior experience building out Machine Learning teams
Contact center or telephony domain experience (Genesys, Amazon Connect, Twilio, Five9)
Speech-to-text / ASR systems and handling transcription noise in downstream models
Real-time streaming ML (as opposed to batch-only)
Infrastructure-as-Code (Terraform) and CI/CD for ML systems
Distributed training (e.g. DeepSpeed) for fine-tuning larger models