RSS is a forward-thinking company in search of a talented DevOps Specialist. While prior experience in retail, logistics, production, or industry is a significant advantage, we value a strong passion for innovation and a commitment to excellence above all else.
The ideal candidate will have robust experience in IT monitoring, incident detection, and automation, coupled with excellent communication skills to work effectively across multiple teams.
This role is crucial to ensuring that our applications and services run smoothly, minimizing downtime, and swiftly addressing any issues that arise. Our goal is to operate our systems as predictively as possible, using monitoring to avoid incidents before they occur.
More specifically, the key responsibilities of the DevOps IT Monitoring and Incident Detection Specialist are:
IT Monitoring:
* Implement and manage monitoring solutions, drawing on experience with one or more of the following technologies: ELK stack, Splunk, Dynatrace, Datadog, New Relic, Grafana, or similar tools.
* Continuously optimize monitoring configurations to enhance visibility and alert accuracy.
* Develop dashboards and reports to provide insights into system performance and health.
* Help the Incident Manager analyze and detect trends that can, slowly but surely, lead to incidents.
Cloud Management:
* Utilize practical experience with major cloud providers such as Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure to integrate and maintain monitoring solutions.
* Ensure cloud environments are monitored effectively to detect and resolve issues promptly.
Incident Detection:
* Analyze the collected data to identify trends and new metrics that need to be monitored, so that we can better predict incidents.
* Work closely with the Incident Manager to identify trends and to understand the root cause behind an issue (where relevant to monitoring).
* Develop and implement fault-detection metrics and alerts in applications.
* Integrate alerts into our CRM (Freshservice from Freshworks) to link monitoring with alerting and incident response.
* Work with the Incident Manager on post-incident reviews and implement lessons learned to prevent recurrence.
Collaboration and Communication:
* Work collaboratively with cross-functional teams, including application owners, developers, and IT support staff, to integrate applications with the monitoring platform.
* Communicate effectively to ensure all stakeholders are informed and aligned during incident resolution and system optimizations.
* Provide technical guidance and support to application teams to maximize the effectiveness of monitoring solutions.