About the Role
In this role, you will lead development of a High Temperature Operating Life (HTOL) test software system. You will design, implement, and maintain a scalable multi-chassis testing platform that runs automated stress and performance tests, with real-time monitoring and comprehensive data collection.
Responsibilities
System Design & Development: Architect, build, and maintain a scalable multi-chassis HTOL testing system.
Orchestration: Develop containerized, Python-based services for chassis coordination and management that can be deployed at scale.
Hardware Monitoring & Management: Create hardware abstraction layers and APIs that model the underlying hardware and expose the capabilities needed to monitor and manage it.
Data Management: Develop data collection pipelines for sensor data and performance metrics.
Deployment & Updates: Build automated deployment and testing pipelines that follow CI/CD best practices.
Front-End Collaboration: Work closely with the front-end team to integrate backend APIs smoothly with its applications.
Testing & Documentation: Write automated tests that verify system reliability and performance; maintain clear, concise troubleshooting documentation.
Performance & Reliability: Continuously monitor and optimize performance to reduce response times and improve scalability; maintain high uptime in production environments; establish capacity planning procedures.
Required Skills
BS with 12+ years of experience, or MS with 8+ years of experience, in Computer Science, Electrical Engineering, or a related field.
Expert-level Python; knowledge of web frameworks such as FastAPI, Flask, or Django; strong understanding of API design principles and best practices.
Experience with containerization and orchestration technologies such as Docker and Docker Compose.
Experience with one or more databases such as MongoDB, PostgreSQL, Redis, or a time-series database.
Familiarity with testing frameworks such as pytest, and with integration and performance testing tools.
Experience with CI/CD tools such as GitHub Actions/Runners and Infrastructure as Code tools such as Ansible.
Experience with hardware integration or embedded systems, including interfacing with BMCs, FPGAs, temperature sensors, and thermal and power management systems.
Nice-to-Have Skills
Familiarity with real-time data handling and communication technologies such as gRPC, TCP/IP, WebSockets, or message brokers.
Experience with high-availability, mission-critical systems.
Experience in the semiconductor industry, such as wafer-level testing, burn-in systems, or reliability testing.
Agile/Scrum certification.
Experience building backend services for web applications built with frameworks such as Next.js; proficiency in JavaScript/TypeScript.
Tech Stack
Python, FastAPI, Flask, Django, API design, Docker, Docker Compose, MongoDB, PostgreSQL, Redis, pytest, integration testing, performance testing, CI/CD, GitHub Actions, Ansible, hardware integration, embedded systems, BMCs, FPGAs, temperature sensors, thermal management, power management systems