About the Role
<span><span><b>Job Title: Machine Learning Engineer – MLOps Lead<br />
Duration: Contract role<br />
Location: Remote, United States</b></span></span><br />
<br />
<span><span><b>Role Mission</b></span></span><br />
<span><span>You are being hired to productionize machine learning at scale: eliminating fragile pilot models, building hardened MLOps pipelines, and delivering compliant, monitored, and continuously improving ML systems that directly support business operations.</span></span><br />
<span><span>Your success is measured not by “knowing tools,” but by deploying, stabilizing, and scaling real ML systems in production.</span></span><br />
<br />
<span><span><b>First-Year Outcomes (What You Must Deliver)</b></span></span><br />
<span><span><b>Within First 30 Days</b></span></span>
<ul>
<li class="MsoNoSpacing"><span><span>Fully assess current ML pipelines, data flows, and deployment architecture</span></span></li>
<li class="MsoNoSpacing"><span><span>Identify the top three reliability, security, and performance risks in the current ML lifecycle</span></span></li>
<li class="MsoNoSpacing"><span><span>Produce a documented MLOps modernization roadmap</span></span></li>
</ul>
<br />
<span><span><b>Within 90 Days</b></span></span><br />
<span><span><b>You will:</b></span></span>
<ul>
<li class="MsoNoSpacing"><span><span>Stand up standardized CI/CD pipelines for model training, validation, and deployment</span></span></li>
<li class="MsoNoSpacing"><span><span>Implement automated monitoring, alerting, and versioning across active production models</span></span></li>
<li class="MsoNoSpacing"><span><span>Deploy at least one business-critical ML model into hardened production pipelines</span></span></li>
<li class="MsoNoSpacing"><span><span>Establish security, audit, and compliance controls for model governance</span></span></li>
<li class="MsoNoSpacing"><span><span>Reduce model deployment cycle time by 30–50%</span></span></li>
</ul>
<br />
<span><span><b>Within 180 Days</b></span></span><br />
<span><span><b>You will:</b></span></span>
<ul>
<li class="MsoNoSpacing"><span><span>Operate a fully standardized enterprise MLOps framework (based on MLflow, Kubeflow, or Airflow)</span></span></li>
<li class="MsoNoSpacing"><span><span>Enable continuous retraining and automated rollback capability</span></span></li>
<li class="MsoNoSpacing"><span><span>Achieve ≥ 99.5% model uptime</span></span></li>
<li class="MsoNoSpacing"><span><span>Establish a retraining cadence that improves model accuracy and reliability quarter-over-quarter</span></span></li>
<li class="MsoNoSpacing"><span><span>Mentor junior engineers and codify ML engineering standards</span></span></li>
</ul>
<br />
<span><span><b>Ongoing Success Metrics</b></span></span>
<table class="Table">
<thead>
<tr>
<td><span><span><b>Metric</b></span></span></td>
<td><span><span><b>Target</b></span></span></td>
</tr>
</thead>
<tbody>
<tr>
<td><span><span>Production model uptime</span></span></td>
<td><span><span>≥ 99.5%</span></span></td>
</tr>
<tr>
<td><span><span>Model deployment cycle time</span></span></td>
<td><span><span>↓ 30–50%</span></span></td>
</tr>
<tr>
<td><span><span>Automated pipeline coverage</span></span></td>
<td><span><span>100%</span></span></td>
</tr>
<tr>
<td><span><span>Compliance audit readiness</span></span></td>
<td><span><span>Continuous</span></span></td>
</tr>
<tr>
<td><span><span>Model accuracy improvement</span></span></td>
<td><span><span>QoQ measurable gains</span></span></td>
</tr>
</tbody>
</table>
<br />
<span><span><b>What You Will Build</b></span></span>
<ul>
<li class="MsoNoSpacing"><span><span>End-to-end MLOps pipelines (data → training → testing → deployment → monitoring → retraining)</span></span></li>
<li class="MsoNoSpacing"><span><span>Kubernetes-based model serving platforms</span></span></li>
<li class="MsoNoSpacing"><span><span>Cloud ML platforms (Vertex AI / SageMaker / Azure ML)</span></span></li>
<li class="MsoNoSpacing"><span><span>CI/CD automation for ML systems</span></span></li>
<li class="MsoNoSpacing"><span><span>Model observability and alerting using Prometheus / Grafana</span></span></li>
<li class="MsoNoSpacing"><span><span>Secure, version-controlled ML governance frameworks</span></span></li>
</ul>
<br />
<span><span><b>Required Experience (Performance Evidence)</b></span></span><br />
<span><span><b>You must have:</b></span></span>
<ul>
<li class="MsoNoSpacing"><span><span>Proven delivery of production ML pipelines (not just experiments)</span></span></li>
<li class="MsoNoSpacing"><span><span>Built CI/CD for ML models in Kubernetes environments</span></span></li>
<li class="MsoNoSpacing"><span><span>Implemented monitoring, retraining, and version governance</span></span></li>
<li class="MsoNoSpacing"><span><span>Delivered at least one enterprise-scale ML deployment</span></span></li>
<li class="MsoNoSpacing"><span><span>Hands-on experience with MLflow / Kubeflow / Airflow</span></span></li>
<li class="MsoNoSpacing"><span><span>Experience deploying ML to production in the cloud (AWS, GCP, or Azure)</span></span></li>
<li class="MsoNoSpacing"><span><span>Strong Python engineering background</span></span></li>
</ul>