By Nicolas Bosc
In these times of lockdown and enforced remote working, the robustness of your information system (IS) and the patience of your users are being put to the test. It is therefore time to set out a monitoring strategy that will satisfy not only your users but also your management and your IT teams.
Log files are not only useful for investigating after an error or an application crash. With the right monitoring tools and a dedicated strategy, you can detect anomalies early, anticipate application overloads or spot the first signs of an attack. Splunk, ELK, Grafana + Prometheus: do these names sound like jargon? We explain how they work!
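To make the idea concrete, here is a minimal Python sketch that turns raw log lines into an error count per minute and flags the minutes above a fixed threshold. The log format, service names and threshold are hypothetical assumptions chosen for illustration; tools such as Splunk or ELK parse and aggregate logs with their own pipelines and query languages.

```python
import re
from collections import Counter

# Hypothetical log format assumed for this sketch:
# "<YYYY-MM-DDTHH:MM:SS> <LEVEL> <service> <message>"
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2} (\w+) (\S+) (.*)$")


def errors_per_minute(lines):
    """Count ERROR entries per minute (timestamp truncated to HH:MM)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # ignore lines that do not follow the assumed format
        minute, level, _service, _message = match.groups()
        if level == "ERROR":
            counts[minute] += 1
    return counts


def minutes_over_threshold(counts, threshold):
    """Return, in chronological order, the minutes whose error count exceeds threshold."""
    # ISO-style timestamps sort lexicographically in chronological order.
    return sorted(minute for minute, n in counts.items() if n > threshold)


sample = [
    "2020-04-01T10:15:02 INFO web-frontend GET /home 200",
    "2020-04-01T10:15:32 ERROR payment-service Timeout calling bank API",
    "2020-04-01T10:16:01 ERROR payment-service Timeout calling bank API",
    "2020-04-01T10:16:07 ERROR payment-service Timeout calling bank API",
    "2020-04-01T10:16:45 ERROR auth-service Invalid token",
    "not a log line",
]
counts = errors_per_minute(sample)
print(minutes_over_threshold(counts, threshold=2))  # ['2020-04-01T10:16']
```

A fixed threshold is the crudest possible rule: it must be tuned by hand for each service and ignores normal daily variations in traffic.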
The first concrete benefit of these tools is that they can monitor data across every layer of your IS: network, system, application server, web server and application. The collected data can therefore benefit all of your IT teams, and even your business teams. These solutions (commercial or open-source) also include powerful anomaly-detection and machine-learning algorithms. By analysing the correlations between events, your monitoring instance can help prevent potential incidents and detect them even faster in the future.
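The vendors' algorithms are not described here, but the underlying idea of a learned baseline can be illustrated with a much simpler technique: a rolling z-score. The minimal Python sketch below flags a response time that deviates from the mean of the previous `window` samples by more than `threshold` standard deviations. The function name, window size, threshold and sample data are assumptions for illustration, not how Splunk, ELK or Prometheus actually implement anomaly detection.

```python
from statistics import mean, stdev


def zscore_anomalies(values, window=10, threshold=3.0):
    """Return the indices whose value deviates from the mean of the
    preceding `window` values by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        z = (values[i] - mu) / sigma
        if abs(z) > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical response times in milliseconds, with one spike at index 12.
response_ms = [120, 118, 125, 122, 119, 121, 124, 117, 123, 120, 122, 119, 480, 121, 118]
print(zscore_anomalies(response_ms))  # [12]
```

Once the spike enters the window it inflates the baseline for the following samples, which is one reason more robust statistics (such as the median) are often preferred in practice.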
The potential of these tools is almost limitless, but to add real value for your users and give them the KPIs they need, you will have to think carefully about governance, because the needs of your development teams, your marketing team and your management are very different:
- Development team > Regression detection, the status of development environments, response time and performance of developed features, downtime, etc.
- Product Owner > Adoption of new services (new features), the evolution of the quality of critical functionalities, definition of the stroke reduction strategy
- IT management > High-level assessment of quality of service, billing strategy, measuring the success of new projects, capacity planning, reporting
- Marketing team > Creation of personas, analysis and correlation of user behaviour, reduction of churn rate
- Support team > Real-time centralized monitoring to reduce incident response time (global, end-to-end view), quick and easy root-cause analysis, monitoring of production environment quality, business service performance, web service quality, etc.
- Security > Prevention of security breaches, ease of performing security audits, analysis of user behaviour via Artificial Intelligence (Machine Learning), obsolescence management, IAM, malware and information theft detection.
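As an example of how raw monitoring data becomes one of the KPIs listed above, the minimal Python sketch below computes two indicators relevant to the development and support teams: availability (the share of successful health checks) and a 95th-percentile response time using the nearest-rank method. The `Check` data model and the sample values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Check:
    """One synthetic health-check result (hypothetical data model)."""
    ok: bool
    latency_ms: float


def availability(checks: List[Check]) -> float:
    """Percentage of successful checks."""
    if not checks:
        raise ValueError("no checks recorded")
    return 100.0 * sum(c.ok for c in checks) / len(checks)


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p%
    of the samples are less than or equal to it (0 < p <= 100)."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    # Multiplying before dividing keeps p * n exact for integer p.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


checks = [Check(True, float(ms)) for ms in range(100, 190, 10)]  # 9 successes
checks.append(Check(False, 5000.0))  # 1 failure (timeout)

ok_latencies = [c.latency_ms for c in checks if c.ok]
print(availability(checks))          # 90.0
print(percentile(ok_latencies, 95))  # 180.0
```

Percentiles are usually preferred to averages for response times because a few very slow requests, which users do notice, barely move the mean.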
These intelligent tools allow dynamic monitoring of your IS and can adapt automatically as it evolves. With the rise of Infrastructure as a Service and technologies such as Kubernetes, infrastructure is constantly being created and torn down, and the monitoring agents offered by these tools follow these changes automatically.
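The general idea behind this automatic tracking is service discovery: the monitoring system regularly asks the platform (for example the Kubernetes API) which targets currently exist and reconciles that list with what it is already watching. The minimal Python sketch below shows only the reconciliation step; the `reconcile` function and the pod addresses are hypothetical, and real tools such as Prometheus implement discovery through their own configuration and APIs.

```python
from typing import Set, Tuple


def reconcile(monitored: Set[str], discovered: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compare currently monitored targets with freshly discovered ones.

    Returns (to_start, to_stop): targets to begin monitoring, and targets
    that have disappeared and should no longer be monitored."""
    to_start = discovered - monitored
    to_stop = monitored - discovered
    return to_start, to_stop


# Hypothetical pod addresses returned by two successive discovery rounds.
monitored: Set[str] = set()
rounds = [
    {"10.0.0.11:8080", "10.0.0.12:8080"},  # initial deployment
    {"10.0.0.12:8080", "10.0.0.13:8080"},  # pod .11 replaced by pod .13
]
for discovered in rounds:
    to_start, to_stop = reconcile(monitored, discovered)
    print("start:", sorted(to_start), "stop:", sorted(to_stop))
    # Apply the changes so the next round compares against the new state.
    monitored = (monitored - to_stop) | to_start
```

Because the comparison is repeated on every round, no one has to edit the monitoring configuration by hand when pods are rescheduled or scaled.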