lanefu

netdata is awesome


I just recently came across netdata. Have you all seen it before? It's pretty great. It's kind of like a free version of Datadog.

 

I was wondering if there would be interest in replacing the armbianmonitor -r rpimonitor with it, or augmenting it as another install option.

2 hours ago, lanefu said:

I was wondering if there would be interest in replacing the armbianmonitor -r rpimonitor with it

 

Only if you love monitoring mistake N°1: your monitoring is so heavy that it affects the way your system behaves. The purpose of RPi Monitor is to explore system behavior, e.g. to adjust tunables to get sufficient ondemand governor behavior (which is broken on several platforms now, but literally no one cares since all remaining devs are busy adding new devices and fancy features). If your monitoring is so heavy that your system constantly clocks at the upper speeds, how would you be able to draw reasonable conclusions?

 

Just like you should benchmark every benchmark you're using, you should monitor your monitoring solution of choice (quite simple with armbianmonitor -m). Checking out netdata 3 years ago led to the above conclusions when testing on weak SBCs.

 

Also, SBC stuff like CPU temperature and cpufreq scaling is missing. Netdata will show you CPU utilization only, since it's meant for servers that run at their highest clock speeds all the time. Which SBC is busier: the one reporting 10% CPU utilization while clocking at 1200 MHz, or the one reporting 20% while remaining at 480 MHz? Netdata's output is useless on systems with cpufreq scaling. It's only great for servers and for operators who know what they're doing. As such, it should never be too easy to install.
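
To make the comparison above concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the work a core gets done scales linearly with utilization times clock speed. The function name `effective_mhz` is hypothetical and is not part of netdata or armbianmonitor.

```python
def effective_mhz(utilization_pct: float, clock_mhz: float) -> float:
    """Rough 'work done' figure: utilization scaled by the current clock.

    Assumption for illustration only: throughput is linear in clock speed.
    """
    return utilization_pct * clock_mhz / 100


# The two boards from the post above.
board_a = effective_mhz(10, 1200)  # 10% busy at 1200 MHz
board_b = effective_mhz(20, 480)   # 20% busy at 480 MHz

print(f"board A: {board_a:.0f} MHz-equivalent of work")
print(f"board B: {board_b:.0f} MHz-equivalent of work")
print("busier board:", "A" if board_a > board_b else "B")
```

Under that assumption, the board reporting the lower utilization percentage is actually doing more work, which is why a utilization graph on its own can mislead on devices with cpufreq scaling.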

 

For those interested: you can play around with it on ODROID bench: https://forum.odroid.com/viewtopic.php?f=29&t=32257#p246987 (please keep in mind that the four instances are S922X installations and this SoC is as capable as Intel Atom designs, far more capable than the average SBC Armbian supports).

19 hours ago, tkaiser said:

 

Only if you love monitoring mistake N°1: your monitoring is so heavy that it affects the way your system behaves. The purpose of RPi Monitor is to explore system behavior, e.g. to adjust tunables to get sufficient ondemand governor behavior (which is broken on several platforms now, but literally no one cares since all remaining devs are busy adding new devices and fancy features). If your monitoring is so heavy that your system constantly clocks at the upper speeds, how would you be able to draw reasonable conclusions?

 


 

I feel like you missed the part where I said that netdata is awesome.

 

Anyway, in all seriousness, thanks for clarifying the purpose of RPi Monitor. Its longer polling interval, lighter footprint, etc. would definitely make more sense than netdata. armbianmonitor -m is my go-to on all my SBCs for checking utilization.

 

For the sake of providing a general defense of netdata: my seat-of-the-pants testing on my OPi Prime, while glancing at armbianmonitor, htop, etc., has shown its footprint to be pretty minimal, and much less jarring than when Observium's SNMP poller hits it. It looks like there are several fairly easy ways to add more metrics to it. I got Consul and Nomad data via its built-in statsd collector. If you haven't used it on your own gear in a few years, it's probably worth another look for other potential use cases.

 

 


Netdata is interesting, but most of their focus is on x86, where there is a fair amount of resources.

It runs on ARM, but just be mindful and tweak the settings as needed.


Hi,

 

I am the founder of netdata.

 

I just wanted to add to this discussion that netdata can run perfectly well on extremely weak devices, of any architecture, if you pay attention to a couple of settings that affect its resource consumption.

 

The first is the data collection frequency. The default is "per second" (`update every = 1`), which may be quite expensive in CPU resources for very weak devices. Configuring netdata with `update every = 2` will cut its CPU utilization in half, while still maintaining a very high granularity compared to any other solution. For extremely weak devices this setting can be set to 5 or even more. For example, in my tests on an RPi 1b, setting it to 5 seconds provided the best results.

 

Memory footprint can also be controlled using the `history` setting. This defaults to 3600 data collection points. If you set `update every = 5`, also setting `history = 720` will still provide 1 hour of data, but netdata will need just 1/5 of the memory.
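
The relationship between `update every`, `history` and retention described above is simple arithmetic. The following Python sketch works it through for a one-hour window. The helper name `history_points` is made up for this example, and the "share of default memory" figure simply assumes, as the post states, that memory grows in proportion to the number of stored points.

```python
def history_points(retention_seconds: int, update_every: int) -> int:
    """Number of data points needed to keep `retention_seconds` of data
    when one point is collected every `update_every` seconds."""
    if update_every < 1:
        raise ValueError("update_every must be at least 1 second")
    return retention_seconds // update_every


DEFAULT_HISTORY = 3600  # default number of points, per the post above
ONE_HOUR = 3600         # seconds of data we want to keep

for update_every in (1, 2, 5):
    points = history_points(ONE_HOUR, update_every)
    share = points / DEFAULT_HISTORY
    print(f"update every = {update_every}: history = {points} "
          f"({share:.0%} of the default memory)")
```

For `update every = 5` this gives `history = 720`, matching the figures quoted above.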

 

Also, netdata has configurable data collection modules. Most notably, `apps.plugin` should be disabled completely on very weak devices. This is as expensive as the netdata daemon itself.
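
As a rough illustration of how the three settings above fit together, the Python sketch below renders a small configuration fragment. The setting names `update every` and `history` come from the post. Placing them under a `[global]` section, and disabling `apps.plugin` with `apps = no` under `[plugins]`, is an assumption about the `netdata.conf` layout, so check the configuration file shipped with your own netdata version before copying anything. The function `render_netdata_conf` is hypothetical.

```python
import configparser
import io


def render_netdata_conf(update_every: int, history: int,
                        enable_apps_plugin: bool) -> str:
    """Render an INI-style fragment with the settings discussed above.

    Section and key placement are assumptions, not taken from the thread.
    """
    config = configparser.ConfigParser()
    config["global"] = {
        "update every": str(update_every),
        "history": str(history),
    }
    config["plugins"] = {
        "apps": "yes" if enable_apps_plugin else "no",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Values suggested in the post for very weak devices.
print(render_netdata_conf(update_every=5, history=720,
                          enable_apps_plugin=False))
```

Generating the fragment from code is only a convenience for provisioning several boards; editing the file by hand achieves the same thing.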

 

Last, keep in mind that you should examine netdata's resource usage while netdata does not have any viewers on its dashboard. This is the "permanent impact" of netdata on the target system. If there are viewers, the netdata daemon will use additional resources to serve them, but this only happens while you view the dashboard. The data collection frequency also affects the resources required by viewers, since the dashboards query the server at the same frequency at which data is collected.

 

If you need help to configure netdata properly for weak IoT devices, I would be glad to help.

 

Thank you!

23 hours ago, lanefu said:

I feel like you missed the part where I said that netdata is awesome

 

Nope. Netdata is awesome. All I tried to explain is why 'armbianmonitor -r' was an attempt to generate insights about SBC behavior 3 years ago and why netdata is not sufficient for this purpose. Once you look at the results, the data collection approach completely changes system behavior, which makes it useless for this use case.

 

21 minutes ago, ktsaou said:

If you need help to configure netdata properly for weak IoT devices, I would be glad to help

 

IMO you should take care of cpufreq scaling on this class of devices, and if netdata is supposed to generate insights and not just fancy graphs, you might want to explore EAS (Energy Aware Scheduling).

21 minutes ago, ktsaou said:

 

 

Last, keep in mind that you should examine netdata's resource usage while netdata does not have any viewers on its dashboard. This is the "permanent impact" of netdata on the target system. If there are viewers, the netdata daemon will use additional resources to serve them, but this only happens while you view the dashboard. The data collection frequency also affects the resources required by viewers, since the dashboards query the server at the same frequency at which data is collected.

 


 

Hey @ktsaou thanks for chiming in.

 

I see your point about the viewers...  By running netdata in headless mode and shipping the metrics to another instance for reviewing them, the footprint would be consistent.  

 

Would you be able to speak to TK's concern about CPU load readings being skewed by changing CPU frequency scaling?

 

Also do you foresee adding armhf and aarch64 builds to your nightlies, or whatever is retrieved by the kickstart install script?  

1 hour ago, lanefu said:

I see your point about the viewers...  By running netdata in headless mode and shipping the metrics to another instance for reviewing them, the footprint would be consistent.  

 

Yes, of course.

 

1 hour ago, lanefu said:

Would you be able to speak to TK's concern about CPU load readings being skewed by changing CPU frequency scaling?

 

Netdata reports whatever the kernel reports. We almost never change the metrics artificially. But we report both CPU utilization % and CPU frequency.

 

The reason is simple: assuming that the CPU will scale is wrong. The processor may not increase its speed for a number of reasons, including thermal protection, scaling governor policy and user settings. If everything works as expected, CPU frequency will be increased by the kernel long before utilization hits 10% or 20%.

 

So, if scaling works, you will never see 100% on a CPU running at 480MHz when it can go up to 1200MHz.

 

But if we did this artificially, it would be possible to see 40% on a CPU that could go up to 1200MHz, but for some reason runs constantly at 480MHz with 100% utilization. This situation would be totally unacceptable, and it cannot happen with the current netdata.
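
The 40% figure above follows from scaling the reported utilization by the ratio of current to maximum clock. Here is a short Python sketch of that hypothetical normalization. This is exactly the adjustment the post says netdata does not apply, and `normalized_utilization` is an illustrative name, not a netdata function.

```python
def normalized_utilization(util_pct: float, cur_mhz: float,
                           max_mhz: float) -> float:
    """Utilization rescaled as if the CPU were running at its maximum clock.

    Hypothetical adjustment, shown only to illustrate why it can hide a
    saturated CPU that is stuck at a low frequency.
    """
    return util_pct * cur_mhz / max_mhz


# CPU pinned at 480 MHz (e.g. by thermal protection) and fully busy.
raw = 100.0
adjusted = normalized_utilization(raw, cur_mhz=480, max_mhz=1200)

print(f"kernel-reported utilization: {raw:.0f}%")
print(f"frequency-normalized value:  {adjusted:.0f}%")
```

A dashboard showing the normalized value would suggest plenty of headroom on a core that has none, which is why reporting raw utilization and CPU frequency as separate metrics is the safer choice.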

 

1 hour ago, lanefu said:

Also do you foresee adding armhf and aarch64 builds to your nightlies, or whatever is retrieved by the kickstart install script?  

 

We don't provide them, just because Travis does not support these architectures. We are open to help though. If you can make a PR integrating netdata CI with a third-party service that can build such binary files, we would be glad to provide them.

 

13 minutes ago, ktsaou said:

We don't provide them, just because Travis does not support these architectures. We are open to help though. If you can make a PR integrating netdata CI with a third-party service that can build such binary files, we would be glad to provide them.

 

Okay, good to know. I'll explore some options, including just cross-compiling with the appropriate toolchain. Would you need it to build on every commit or merge to satisfy CI, or just as a nightly?

1 minute ago, lanefu said:

Would you need it to build on every commit or merge to satisfy CI, or just as a nightly?

 

Nightly is probably better for this, otherwise it will slow down all PRs.

 

6 hours ago, ktsaou said:

But if we did this artificially, it would be possible to see 40% on a CPU that could go up to 1200MHz, but for some reason runs constantly at 480MHz with 100% utilization. This situation would be totally unacceptable, and it cannot happen with the current netdata.

 

Keep in mind that this is a key requirement for a monitoring system.

It would be tragic to see CPU utilization of 40% instead of 100% on a system that has thermal issues and whose processor does not actually scale.

 

On 3/6/2019 at 5:59 PM, ktsaou said:

If you need help to configure netdata properly for weak IoT devices, I would be glad to help.

 

Netdata has shown up in a couple of upstreams - and there, the devices do have limited RAM/Flash Storage...

 

https://openwrt.org/packages/pkgdata/netdata

 

So one has to be aware of the limited environment. Netdata is great for x86/amd64 desktop distros, but scaling down into the SBC space, let alone Linux-oriented SoCs, brings additional concerns such as architecture (MIPS big endian vs. little endian is a good example).

 

I actually stopped running Netdata on my little hotel amd64 server, as it was banging the SSD pretty hard with Apache stats, and I was concerned about it burning up the SSD with misc stuff written back to the logs.

 

Looking at minimal ARM devices, one has to assume that memory might be limited and that KSM might not be available as a kernel build option. Then there is the impact on storage devices: uSD cards and eMMCs have limited write cycles. Going to WiSoCs like the ones OpenWrt supports, NOR flash can be even worse, as NOR has a much more limited write time compared to raw NAND.

 

It's a cool app - but in many ways, just not scaled on the lower end...

 


Hi,

 

netdata supports both big and little endian machines.

 

On 3/17/2019 at 3:08 AM, sfx2000 said:

I was concerned about it burning up the SSD with misc stuff written back to the logs

 

You should probably disable the apache plugin, or configure your apache to run without an access log file.

 

On 3/17/2019 at 3:08 AM, sfx2000 said:

one has to assume that memory might be limited

 

You can size the netdata database and disable unwanted plugins. Generally, netdata will run just fine with very limited memory.

We have also paid special attention to keeping netdata's memory usage predictable. For example, while collecting data, netdata does not allocate any memory at all, so its memory consumption should stay fixed.

 

On 3/17/2019 at 3:08 AM, sfx2000 said:

impact on storage devices: uSD cards and eMMCs have limited write cycles. Going to WiSoCs like the ones OpenWrt supports, NOR flash can be even worse, as NOR has a much more limited write time compared to raw NAND.

 

Netdata does not write to disks while it runs.

So, this should not be a problem.

 
