How to accidentally write a monitoring system (another one)

6 July 2024 at 10:50

It’s funny how this works out for me: my pet projects happen by chance. There is no final goal, only an impulse: “Oh! This sounds interesting, how could it be done?” And that's it: “sleep is for the weak”, “beer on Friday? Of course not!” and so on. As they say, there is no goal, only the path. This story began in much the same way... It was getting dark. I had nothing to do at work. I needed to set up monitoring for a number of servers and services, but because of the heavy bureaucracy in the company this was not easy, and the monitoring system itself was built on SNMP. But where do you get SNMP from a self-written service? Then a brilliant idea came to mind: write it myself. Besides, it didn’t look complicated: monitor ports and HTTP, and send an alert somewhere. “Why not,” I thought, and I would learn more Python along the way. And so it appeared...

Simple monitoring that somehow does something, shows something, and even has a console tool:

Old SMON

A couple of years later, I remembered that I had homemade monitoring and thought: why not add it to my main pet project, Roxy-WI? No sooner said than done. After all, the more features the better! Over time, though, monitoring got cramped inside Roxy-WI: I had to develop the web interface on one side and monitoring on the other. So that neither side would outweigh the other, I decided to move monitoring into a separate project. Say hello to RMON! Yes... my naming is so-so.

RMON status page

Pfft... yet another monitoring system, how many are there already?

100500? Yes, perhaps so. People probably said the same about Prometheus at one time: “Why, when there is Zabbix?!”, and before that about Zabbix: “Why, when there are SNMP, MRTG and Nagios?!” Yes, they exist, but why not? Maybe this one will turn out better. Of course, I don’t put RMON in the same category as these monitoring systems, not yet. But what if we can do something better ;)?

What do I see as the “competitive advantage” of RMON over existing monitoring systems, primarily over Prometheus (as an industry standard) and Uptime Kuma (as closer in functionality)? There are, in my opinion, at least five main killer features:

  1. Agents: you can install several of them both inside and outside the perimeter and monitor availability from several points. Agents can be combined into “regions” to balance checks, and can be moved between groups.
  2. API.
  3. Role-based agent access model.
  4. Easy installation and configuration, a web interface, and status pages.
  5. 7 HTTP connection metrics, plus SSL certificate expiry monitoring.

There are also ping, DNS record, and TCP checks. I plan to add more check types in the future.
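To make the kind of check discussed here concrete, below is a minimal sketch of a TCP port check and a "days until the certificate expires" calculation. This is not RMON's actual code: the function names `tcp_port_open`, `fetch_not_after` and `days_until_expiry` are made up for illustration, and it assumes that certificate monitoring means tracking the expiry date. It uses only the Python standard library.

```python
import socket
import ssl
from datetime import datetime, timezone


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def fetch_not_after(host, port=443, timeout=5.0):
    """Return the certificate's 'notAfter' string, e.g. 'Jan 11 00:00:00 2030 GMT'."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Whole days left before the 'notAfter' timestamp (negative if expired)."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    if now is None:
        now = datetime.now(timezone.utc)
    return (expires - now).days
```

A checker could call `days_until_expiry(fetch_not_after("example.com"))` on a schedule and send an alert once the number drops below some threshold, say 14 days. The hostname and the threshold are placeholders, not RMON defaults.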

New RMON

We've seen it all before

Yes, agents are essentially what you get with Prometheus and Blackbox exporter: Blackbox exporters can also be installed at different points and used to monitor from there, more or less the same thing. Yes, Uptime Kuma is even easier to set up and also has a web interface. The API can be replaced with Ansible, for example. But there is a catch: it is neither here nor there. You can’t hand someone a playbook and say: “Don’t create anything on those exporters, you’re bad!” You would have to run several instances to separate access, plus the person needs to be trained to work with Ansible. It is also impossible to automate work with checks. More precisely, it is most likely possible, but only with crutches and a high barrier to entry.

In conclusion, for those who will write: “The Web sucks, the console is everything!”

Yes, sometimes it is, and sometimes it is not. Sometimes even the most advanced and technologically correct solutions are not suitable. Sometimes it’s a shame to waste time and resources, sometimes you don’t want to dive deep, and sometimes you need to “get everything done in 2 minutes.” And sometimes advanced solutions are simply not needed and it is more convenient to work with simpler ones. We should start from the specific situation and not force everyone into one framework: “%UserName%, use only %ProgramName% for every situation in life!”

P.S. If you want to try it, write to me. I’ll be happy to show and explain :).

submitted by /u/aidaho6