About the Role
Who are we looking for?
Required Skills & Experience
Experience with Prometheus, Grafana, and alerting pipelines.
Operational knowledge of AWS services (EC2, ECS/EKS, RDS, S3, CloudWatch).
Experience with Pingdom, Cloudflare, or similar uptime/performance monitoring tools.
Understanding of distributed systems, microservices, and cloud-native architecture.
Experience with log aggregation or observability tools (ELK, Loki, etc.).
Strong analytical mindset and problem-solving skills.
Clear documentation and communication skills.
What We Offer
Competitive contractor rates.
B2B open-ended contract.
Full-time work with long-term prospects.
Exposure to modern cloud and observability tooling.
Opportunity to shape platform monitoring and reliability practices.
Strong collaboration with the platform, DevOps, and operations teams.
Clear progression path toward SRE or Platform Engineering roles.
Knowledge-sharing opportunities.
Dynamic culture and the chance to work alongside industry experts.
Enthusiastic and energetic working environment.
Flat structure.
No dress code.
Sounds good? Please submit your CV in English using the "Apply" button.
What will you be doing?
Role Purpose
The Platform Observability Analyst ensures the performance and reliability of Amelco’s platforms through strong monitoring, alerting, and operational insight. The role focuses on observability, incident response, and proactive system stability, providing early detection of issues and supporting rapid resolution.
Key Responsibilities
Own and maintain dashboards for system health, performance, and uptime.
Manage Prometheus, Grafana, Pingdom, Cloudflare, and AWS CloudWatch monitoring.
Manage alerts, adjust thresholds, and configure notifications in line with operational SLAs.
Monitor system metrics and logs proactively.
Respond to system alerts and operational issues.
Take immediate action on critical incidents, mitigate medium-impact issues, and escalate major events to Incident Management.
Document all alerts, actions, and resolutions.
Identify trends and early warning signs in system performance.
Recommend improvements for monitoring, alerting, and operational efficiency.
Support post-incident reviews and maintain operational documentation and runbooks.
Work closely with L2/L3 Support, Incident Management, and DevOps teams.
Provide clear technical insights to stakeholders.
The role involves an evening and overnight shift pattern.
Tech Stack
Prometheus, Grafana, AWS, EC2, ECS/EKS, RDS, S3, CloudWatch, Pingdom, Cloudflare, ELK, Loki, distributed systems, microservices, cloud-native architecture