RSS is a forward-thinking company in search of a talented DevOps Specialist. While prior experience in retail, logistics, production, or industry is a significant advantage, we value a strong passion for innovation and a commitment to excellence above all else.
The ideal candidate will have robust experience in IT monitoring, incident detection and automation, coupled with excellent communication skills to work effectively across multiple teams.
This role is crucial in ensuring our applications and services run smoothly, minimizing downtime, and swiftly addressing any issues that arise. Our goal is to operate our systems as predictively as possible, using monitoring to prevent incidents before they occur.
More specifically, the key responsibilities of the DevOps - IT Monitoring and Incident Detection Specialist are:
IT Monitoring:
* Implement and manage monitoring solutions, drawing on experience with one or more of the following technologies: ELK stack, Splunk, Dynatrace, Datadog, New Relic, Grafana, or similar tools.
* Continuously optimize monitoring configurations to enhance visibility and alert accuracy.
* Develop dashboards and reports to provide insights into system performance and health.
* Help the Incident Manager analyze and detect trends that can, slowly but surely, lead to incidents.
Cloud Management:
* Utilize practical experience with major cloud providers such as Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure to integrate and maintain monitoring solutions.
* Ensure cloud environments are monitored effectively to detect and resolve issues promptly.
Incident Detection:
* Analyze collected data to identify trends and new metrics that need to be monitored so that we can better predict incidents.
* Work closely with the Incident Manager to identify trends and to understand the root cause of an issue (where relevant to monitoring).
* Develop and implement fault-detection metrics and alerts in applications.
* Integrate alerts with our CRM (Freshservice from Freshworks) to link monitoring with alerting and incident response.
* Work with the Incident Manager on post-incident reviews and implement lessons learned to prevent recurrence.
Collaboration and Communication:
* Work collaboratively with cross-functional teams, including application owners, developers, and IT support staff, to integrate applications with the monitoring platform.
* Communicate effectively to ensure all stakeholders are informed and aligned during incident resolution and system optimizations.
* Provide technical guidance and support to application teams to maximize the effectiveness of monitoring solutions.
Automation:
* Leverage automation tools such as Terraform, Python, Bash, and CI/CD pipelines to streamline monitoring and incident management processes.
* Develop scripts and automation workflows to enhance system efficiency and reliability.
Documentation:
* Ensure proper documentation of each monitored system and its associated alerts.
* Ensure that alert thresholds and priority levels are correctly defined and documented.
Autonomous Working:
* Demonstrate the ability to work independently and take initiative as confidence in the technology stack grows.
* Proactively identify opportunities for improvement and drive projects to enhance IT operations.
Qualifications:
* Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
* Proven experience with one or more IT monitoring technologies (ELK stack, Splunk, Dynatrace, Datadog, New Relic, Grafana, or similar).
* Hands-on experience with one of the major cloud providers (GCP, AWS, Azure).
* Excellent communication and collaboration skills, with the ability to work across multiple teams.
* Proficiency in automation tools and scripting languages such as Terraform, Python, Bash, and CI/CD pipelines.
* Ability to work independently and manage time effectively in a fast-paced environment.
* Eagerness to learn and a strong appetite for innovative technologies and changing market trends.
What We Offer:
* A permanent and pivotal role within our organization
* Competitive salary, company car, and fuel card
* Ongoing training and growth opportunities
* Accessible head office with convenient parking
* Flexible working hours and the option to work remotely for an optimal work-life balance
Join us in shaping the future of IT monitoring and incident detection.
Apply now to be part of a dynamic team committed to innovation and excellence.
Your expertise will drive our success!