About the Role
QA & Performance Engineering Lead (AI/LLM Focus)

The Role
We are seeking a high-caliber QA and Performance Engineering Lead to spearhead the testing strategy for enterprise-grade AI and LLM solutions. In this role, you will define the architecture for functional, non-functional, and performance testing, ensuring that complex AI agent workflows and large-scale applications meet the highest standards of reliability and compliance. You will act as a bridge between traditional QA excellence and the cutting-edge requirements of GenAI evaluation.

Core Responsibilities & Technical Expertise
Strategic QA Leadership: Draw on 10 years of experience leading enterprise-wide testing initiatives in Fortune 500 environments to design comprehensive QA architectures.
AI/LLM Specialized Evaluation: Implement advanced model-assessment metrics, including BLEU, ROUGE, and perplexity, as well as specialized scoring for hallucination and grounding rates.
Performance & Resilience Engineering: Build frameworks for load, stress, and chaos testing to ensure system stability under extreme conditions and peak workloads.
Automation & Orchestration: Engineer robust CI/CD test pipelines using Azure DevOps or GitHub Actions, with a focus on automated API testing (Pytest/Postman) and integrated test harnesses.
Agentic Workflow Validation: Design testing strategies for multi-step AI agents, covering tool chaining, orchestration, and context-injection accuracy.
Data Governance & Compliance: Apply deep knowledge of data lineage (Purview/Unity Catalog) and maintain the strict traceability and auditability standards required in regulated industries.
Lifecycle Management: Oversee model release gates, registry promotions, and the management and versioning of synthetic datasets.

Key Deliverables
Unified Testing Framework: A standardized taxonomy and coverage model spanning unit, integration, E2E, and AI agent workflow testing.
AI Evaluation Suite: A comprehensive suite for validating model consistency, toxicity, and correctness, supported by Proof-of-Concept (PoC) validations.
Automated Performance Harness: Scalable workload models designed for peak-load scenarios and resiliency benchmarking.
Smart Quality Gates: Automated pass/fail scoring mechanisms embedded directly into release pipelines across all quality dimensions.
Advanced Observability: 'Golden Dashboards' that track real-time metrics such as latency-per-thought, grounding quality, and functional pass rates.

Professional Profile
Expertise in enterprise QA architecture (functional, non-functional, and performance).
Deep understanding of the ML/LLM lifecycle and model promotion pipelines.
Strong background in regulated industries, ensuring compliance and audit readiness.
Hands-on experience with synthetic data generation and dataset versioning.