Site Reliability Engineer (SRE)

Colorado, US (via direct)

// Job Type

Full Time

// Salary

Not disclosed

// Posted

3 months ago

// Seniority

manager

// Work Mode

onsite

// Experience

12+ years

About the Role

US Work Authorization Requirement:
Candidates must be legally authorized to work in the United States without employer sponsorship. This includes, but is not limited to, U.S. Citizens, Permanent Residents, and other individuals with valid U.S. work authorization.

Job Description:
We are seeking a highly experienced Site Reliability Engineer (SRE) with a strong Java development background to lead reliability initiatives and ensure the stability, scalability, and performance of mission-critical systems. This role blends deep hands-on engineering with leadership, ownership, and a proactive approach to reliability and operations. The ideal candidate has evolved from a strong developer into an SRE/DevOps leader, understands production systems deeply, and can partner effectively with development, platform, and operations teams.

Key Responsibilities:
- Design, build, and maintain highly reliable, scalable, and fault-tolerant systems in production environments.
- Embed reliability best practices (SLOs, SLIs, error budgets) into the software development lifecycle.
- Work closely with development teams on Java Spring Boot microservices to improve operability and resilience.
- Automate operational workflows to reduce manual effort and improve system efficiency.
- Monitor system health, performance, and availability; proactively identify risks and bottlenecks.
- Lead incident management, on-call support, and root cause analysis for production issues.
- Drive continuous improvement initiatives focused on availability, scalability, and performance.
- Support and oversee release and deployment activities, including after-hours support when required.
- Champion best practices around CI/CD, infrastructure as code, and cloud-native operations.
- Mentor engineers and provide technical leadership across SRE and development teams.
- Collaborate with stakeholders to align reliability goals with business priorities.

Required Qualifications:
- 12+ years of IT experience in SRE, DevOps, or Production Engineering
- Strong Java development experience (Java 17+, Spring Boot Microservices, Spring Web)
- Hands-on experience with OpenShift (OCP), Kubernetes, and Docker
- Strong expertise in MongoDB (data modeling, design, optimization)
- Experience with Apache Kafka and event-driven architectures
- Working knowledge of Oracle Database
- Familiarity with BDD practices
- Solid experience with CI/CD, automation, and IaC (Terraform, Ansible)
- Exposure to AI-assisted development tools (e.g., GitHub Copilot)
- Excellent troubleshooting skills in high-pressure production environments
- Strong communication, collaboration, and ownership mindset

Preferred Qualifications:
- Experience with monitoring and observability tools such as Prometheus, Grafana, and the ELK stack.
- Knowledge of security best practices, compliance standards, and production hardening.
- Prior experience leading or mentoring SRE teams or guiding engineers in reliability practices.

Tech Stack

oracle, java, mongodb, docker, kubernetes, terraform, ansible, microservices

Interested in this job?

Use our AI to tailor your resume for this Site Reliability Engineer (SRE) position at JPS Tech Solutions LLC.