About the Role
<p><strong>Platform / Site Reliability Engineer (SRE)</strong><br />
<br />
Our client is transforming industries through cutting-edge technology. Their platform leverages AI, automation, and scalable systems to solve complex real-world problems.</p>
<p>As a Platform / Site Reliability Engineer (SRE), you will play a key role in establishing and enhancing the engineering platform. You’ll help ensure the reliability, scalability, and efficiency of our systems while developing tools that improve engineering productivity.</p>
<p>You will help define and shape the platform strategy, set best practices, and drive initiatives that enhance developer experience, system performance, and operational efficiency.</p>
<hr />
<p><strong>What You’ll Be Doing</strong></p>
<ul>
<li>
<p><strong>DevOps &amp; Infrastructure</strong>: Design, implement, and maintain scalable infrastructure to support engineering needs.</p>
</li>
<li>
<p><strong>CI/CD Optimization</strong>: Improve continuous integration and deployment pipelines built with AWS CDK, and define requirements for deployment and database migration tooling.</p>
</li>
<li>
<p><strong>Release Tracking &amp; Deployment</strong>: Establish visibility into release cycles, implement automation to streamline deployments, and ensure smooth rollouts.</p>
</li>
<li>
<p><strong>Site Reliability &amp; Observability</strong>: Implement monitoring, logging, and alerting systems to ensure high availability and performance.</p>
</li>
<li>
<p><strong>Internal Tooling</strong>: Build and maintain tools that improve developer efficiency, automate repetitive tasks, and enhance productivity.</p>
</li>
<li>
<p><strong>Security &amp; Compliance</strong>: Ensure infrastructure and deployments align with security best practices, with attention to SOC, ISO, and GDPR requirements.</p>
</li>
</ul>
<hr />
<p><strong>Experience</strong></p>
<ul>
<li>
<p>7+ years of technical experience, including 5+ years as an SRE or in a similar role. Startup experience is a plus.</p>
</li>
<li>
<p>Deep expertise in AWS, including Fargate and Kubernetes for container orchestration.</p>
</li>
<li>
<p>Strong experience with CI/CD pipelines, particularly using AWS CDK.</p>
</li>
<li>
<p>Proficiency with observability tools (Datadog, Prometheus, Grafana).</p>
</li>
<li>
<p>Strong knowledge of scaling strategies and highly available architectures.</p>
</li>
<li>
<p>Proficiency in scripting/automation with Python, Bash, or TypeScript.</p>
</li>
<li>
<p>Familiarity with security best practices and compliance frameworks (SOC, ISO, GDPR).</p>
</li>
<li>
<p>Strong collaboration skills and ability to work cross-functionally.</p>
</li>
</ul>
<hr />
<p><strong>Our Tech Stack</strong></p>
<ul>
<li>
<p><strong>Infrastructure</strong>: AWS, Fargate, Redis, PostgreSQL, SQS, CDK, GitHub, Retool</p>
</li>
<li>
<p><strong>Backend</strong>: Django REST framework, Celery</p>
</li>
<li>
<p><strong>Frontend</strong>: Next.js, Tailwind CSS</p>
</li>
<li>
<p><strong>LLM Integrations</strong>: OpenAI, Claude, AWS Bedrock</p>
</li>
</ul>