chwe Posted September 20, 2018
1 hour ago, ag123 said: ok testing out desktop x-windows typing from armbian ... nightly upgraded to 5.60 kernel
testing testing testing.. You need:
a github account
a bit of time
a SBC armbian supports
git clone https://github.com/armbian/testings
cd testings
./createreport.sh
you don't even need to fork the repo in the browser, we'll do it for you from command line.. Same for the PR, everything is done from console.
4 hours ago, botfap said: DNS servers have been changed today, probably just waiting for propagation although it should have happened by now.
is back now even for google users...
ag123 Posted September 20, 2018
17 minutes ago, chwe said: testing testing testing.. You need: a github account a bit of time a SBC armbian supports git clone https://github.com/armbian/testings cd testings ./createreport.sh you don't even need to fork the repo in the browser, we'll do it for you from command line.. Same for the PR, everything is done from console. is back now even for google users...
ok my entry is here https://github.com/ag88/testings/blob/20180921-orangepipc-next/orangepipc-next.report
armbianmonitor -u apparently returns a blank
chwe Posted September 20, 2018
4 minutes ago, ag123 said: ok my entry is here https://github.com/ag88/testings/blob/20180921-orangepipc-next/orangepipc-next.report armbianmonitor -u apparently returns a blank
I guess that's because your DNS resolver still has issues with ix.io... Can we assume that
Quote: armbianmonitor -u https://pastebin.com/3NLDHaFs (truncated due to size limits, the latest entry is at bottom)
is still correct? (I edited your PR, filling in armbianmonitor)
ag123 Posted September 20, 2018
1 minute ago, chwe said: I guess that's due to your DNS resolver has still issues with ix.io... Can we assume that https://pastebin.com/3NLDHaFs is still correct? (I edited your PR filling in armbianmonitor)
ok you can use that
Moklev Posted September 22, 2018
Still stability issues on stable v5.60 (v5.59 upgraded to 5.60). I cannot get an uptime beyond 3-12 hours (the board becomes unreachable via ssh or web GUI). SD card and PSU are OK. The system ran stable for weeks on the old v5.38 (January 2018) with the same setup.
Orange Pi Zero 512/v1.4 system log: http://ix.io/1ngD
ag123 Posted September 22, 2018
@Moklev i'd suggest monitoring and tracking down the root cause of the ssh woes. one way is to connect via a serial console when ssh fails
https://forum.armbian.com/topic/8237-serial-console-access-via-usb-uartgadget-mode-on-linuxwindowsosx/
try to connect a cable via the micro usb OTG port to see if you get a linux console presented. if that works when ssh fails you could then 'get into' the board and do some checks like armbianmonitor -u (if the network fails you may need to run armbianmonitor -U and save the logs to a file)
among the things to check:
/sbin/sysctl -a | grep vm.swappiness
vm.swappiness = 100
check in /etc/sysctl.conf if it is defined there. you may like to reduce that value to 50 (or even 0 for tests) to see if it makes a difference
there could also be other ways, like hacking up a script to ping the router every 5 minutes and, if the ping fails, capture a log like armbianmonitor -U > logfile
------
on a side note, i'd think 5.60 is still a useful 'milestone' to have
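The "ping the router and capture a log" idea from the post above could be sketched roughly as follows. This is only an illustration, not something posted in the thread: the router address, the 300 second interval, the log directory and the function names are all assumptions to adapt to your own network.

```shell
#!/bin/bash
# Hypothetical watchdog sketch. ROUTER, INTERVAL and LOGDIR are placeholders.
ROUTER="${ROUTER:-192.168.1.1}"
INTERVAL="${INTERVAL:-300}"
LOGDIR="${LOGDIR:-/root}"

# Succeeds if the router answers a single ping within 5 seconds.
router_reachable() {
    ping -c 1 -W 5 "$ROUTER" >/dev/null 2>&1
}

# Save diagnostics to a timestamped file, flush it to storage, print its name.
capture_log() {
    logfile="$LOGDIR/netfail-$(date +%Y%m%d-%H%M%S).log"
    armbianmonitor -U >"$logfile" 2>&1 || echo "armbianmonitor failed" >>"$logfile"
    sync
    echo "$logfile"
}

# Check the router every INTERVAL seconds, forever.
watch_router() {
    while true; do
        router_reachable || capture_log
        sleep "$INTERVAL"
    done
}

# Only start the endless loop when explicitly asked to.
if [ "${1:-}" = "run" ]; then
    watch_router
fi
```

Saved under a name of your choice and started with the argument `run` (for example in the background from a root shell), it writes one log file per failed check. The `sync` matters because unflushed writes are lost when the board hangs.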
ag123 Posted September 22, 2018
on kernel sunxi-next 4.14.70 (armbian 5.59 upgraded to 5.60 in nightly builds), orange pi pc
i'm trying to set up the power tactile button to send a 'key' as input, i.e. using acpid to shutdown
i did an
$ ls /dev/input/by-path
platform-1ee0000.hdmi-event platform-1f02000.ir-event
the gpio key is absent, hence regardless of the dt overlays i apply, i don't receive any keys
the things i checked are /boot/dtb/sun8i-h3-orangepi-pc.dtb decompiled (dtc -I dtb -O dts /boot/dtb/sun8i-h3-orangepi-pc.dtb)
Quote:
r_gpio_keys {
    compatible = "gpio-keys";
    pinctrl-names = "default";
    pinctrl-0 = <0x3a>;
    sw4 {
        label = "sw4";
        linux,code = <0x100>;
        gpios = <0x36 0x0 0x3 0x1>;
    };
};
the above looks ok, however in /boot/config-4.14.70-sunxi
#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADC is not set
...
# CONFIG_KEYBOARD_GPIO is not set
# CONFIG_KEYBOARD_GPIO_POLLED is not set
i think CONFIG_KEYBOARD_GPIO needs to be configured as at least 'm' (module) or 'y' (built in). without it gpio key inputs won't appear in /dev/input/by-path as the input driver is missing
one main use of that button is by (orange pi *) users who want to use the 'power' button to shut down the os via acpid, but it could be useful in other situations as well
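The kernel config check described above can be repeated on any installed kernel with a small helper. This is a sketch only: the function name `kconfig_state` is made up here, and it assumes the distribution ships the kernel config as /boot/config-<version>, as in the post.

```shell
#!/bin/bash
# Report how a kernel option is set in a config file.
# $1 = option name, $2 = path to the kernel config file.
kconfig_state() {
    if grep -q "^$1=y" "$2"; then
        echo "built-in"
    elif grep -q "^$1=m" "$2"; then
        echo "module"
    else
        echo "not set"
    fi
}

# Check the running kernel if its config file is present.
if [ -r "/boot/config-$(uname -r)" ]; then
    kconfig_state CONFIG_KEYBOARD_GPIO "/boot/config-$(uname -r)"
fi
```

With the config quoted in the post this reports "not set", which matches the missing gpio-keys entry under /dev/input/by-path.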
Moklev Posted September 22, 2018
2 hours ago, ag123 said: vm.swappiness = 100 check in /etc/sysctl.conf if it is defined there you may like to reduce that value to 50 (or even for tests 0) to see if it made a difference
Interesting... I'll change the swap threshold to a more conservative value (30) and test it for 24-48 h.
ag123 Posted September 22, 2018
i think @Igor is preparing the release of 5.60 images. known issues found here could be resolved after that, e.g. users can switch to 'nightly' to get the incremental updates
Igor (Author) Posted September 22, 2018
1 hour ago, ag123 said: i think @Igor is preparing the release of 5.60 images, known issues found here could be resolved after that e.g. for users to switch to 'nightly' to get the incremental updates
I had weeks of long shifts due to the last major upgrade and I will do my best to do nothing for some time. Then I have a pile of non-technical tasks to cope with.
Moklev Posted September 23, 2018
@ag123 The good thing: reducing the value to 30 (vm.swappiness = 30) seems to solve the problem. Now I have ~22 h uptime without any hang...
The bad one: heavy zram usage on my AMD/Debian 9.5 microserver freezes the system (both gnome shell and ssh). :-| I think zram has some problems with Debian...
tkaiser Posted September 26, 2018
On 9/22/2018 at 2:34 PM, Moklev said: Orange Pi Zero 512/v1.4 system log: http://ix.io/1ngD
Do you use this board to boil water? 75°C SoC temperature reported at boot? I don't know how the thresholds are defined for 'emergency shutdown' but in this situation some more CPU activity alone can result in Armbian shutting down.
If you want to diagnose a problem you need to diagnose it. Some problems arise over time. 'Worked fine for months and now unstable' happens for various reasons. Human beings then usually blame the last change they remember (like upgrading the OS) instead of looking for the culprit.
Hopefully it still works but in your situation I would immediately install RPi-Monitor using
armbianmonitor -r
You then get a monitor instance running on your board and can look at what happens and happened: https://www.cnx-software.com/2016/03/17/rpi-monitor-is-a-web-based-remote-monitor-for-arm-development-boards-such-as-raspberry-pi-and-orange-pi/
On 9/23/2018 at 4:55 PM, Moklev said: The good thing: reducing the value to 30 (vm.swappiness = 30) seems to solve the problem. Now I got ~22 h uptime without any hang...
Hahaha! I run a bunch of Ubuntu and Debian servers with lowered DRAM, huge memory overcommitment (300% and not just the laughable 50% as with Armbian defaults) and of course vm.swappiness set to 100. No problem whatsoever.
If you think swapping is the culprit you need to monitor swap usage! At least your http://ix.io/1ngD output shows memory and swap usage that is not critical at all. If you want to find out whether this is related to swap you need to run something like 'iostat 600 >/root/iostat.log' and this in the background:
while true ; do
    echo -e "\n$(date)" >>/root/free.log
    free -m >>/root/free.log
    sleep 600
done
Then check these logs to see whether swapping happened. An alternative would be adjusting RPi-Monitor templates but while this is quite easy nobody will do this of course.
On 9/23/2018 at 4:55 PM, Moklev said: The bad one: an heavy zram usage on my AMD/Debian 9.5 microserver freeze the system (both gnome shell and ssh). :-| I think zram has some problems with Debian...
Zram is a kernel thing and not related to any userland stuff at all. In other words: you have the same set of problems on a MicroServer and an ARM SBC running different software stacks? Are SBC and MicroServer connected to the same power outlet?
Unfortunately logging in Armbian is currently broken (shutdown logging and ramlog, reported by @dmeey and @eejeel) but nobody cares (though @lanefu self-assigned the Github issue). Great timing to adjust some memory related behavior while accepting that the relevant logging portions at shutdown, which allow us to see what's happening 'in the field', do not work any more.
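Checking the resulting free.log for swapping can be automated with a short awk filter. The sketch below is an illustration, not part of the original post: the helper name `max_swap_used` is invented, and it assumes the log contains plain `free -m` output whose swap line starts with "Swap:".

```shell
#!/bin/bash
# Print the highest "used" swap value (in MiB) found in a free -m log.
# $1 = path to the log file written by the monitoring loop.
max_swap_used() {
    awk '$1 == "Swap:" { if ($3 > max) max = $3 } END { print max + 0 }' "$1"
}
```

Running `max_swap_used /root/free.log` after a hang prints a single number. A value of 0 means no swapping was ever recorded, which would rule swap out as the trigger for that run.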
Moklev Posted September 26, 2018
1 hour ago, tkaiser said: Do you use this board to boil water? 75°C SoC temperature reported at boot? [...]
That was after a warm reboot... ~70-72°C is a badly reported temperature; the correct value, measured with a Fluke thermometer, is ~65°C on the SoC heatsink. That's normal: the SBC has been working as a visual motion analyzer (1 h264 HD stream) 24/7 since mid 2017.
Quote: Hopefully it still works but in your situation I would immediately install RPi-Monitor using armbianmonitor -r
Yes, I've started monitoring...
Quote: Zram is a kernel thing and not related to any userland stuff at all. In other words: you have the same set of problems on a MicroServer and an ARM SBC running different software stacks? Are SBC and MicroServer connected to the same power outlet?
They are not... totally different purposes, software stacks... and power outlets. Anyway, my OPZ now works stable again with Armbian 5.60 (with vm.swappiness set to 30-60).
tkaiser Posted September 27, 2018
23 hours ago, Moklev said: Yes, I've started monitoring...
So can you please provide output from both 'free -m' and 'armbianmonitor -u' now?
Once you switch back to vm.swappiness=100 the VM monitoring script should be adjusted to
while true ; do
    echo -e "\n$(date)" >>/root/free.log
    free -m >>/root/free.log
    sync
    sleep 60
done
(same with iostat -- also switching from 600 seconds to 60. And the 'sync' call above is very important since with default Armbian settings we have a 600 second commit interval, so the last monitoring entries won't be written to disk without this sync call happening)
Moklev Posted September 27, 2018
Debian 9.5 AMD/Microserver (vm.swappiness=60)

iostat.log

root@tubserver:~# cat iostat.log
Linux 4.17.0-0.bpo.3-amd64 (tubserver)  27/09/2018  _x86_64_  (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,09    0,01    0,15    0,29    0,00   99,45

Device:     tps   kB_read/s   kB_wrtn/s    kB_read    kB_wrtn
sda        1,55       24,94       27,24   21613817   23605307
sdb        1,23        4,76       27,24    4122642   23605307
md3        0,22        0,77        0,13     669789     109496
md1        0,00        0,00        0,00       2304          0
md2        0,36       26,05       19,72   22569513   17083692
md0        1,55        2,89        7,29    2501781    6313004
sdc        0,18        1,12        1,80     973561    1560460
zram0      0,01        0,00        0,03       2660      29588
zram1      0,01        0,00        0,04       2904      38420

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,10    0,00    0,15    0,73    0,00   99,02

Device:     tps   kB_read/s   kB_wrtn/s    kB_read    kB_wrtn
sda        2,81        0,05       68,93         28      41359
sdb        2,77        0,00       68,93          0      41359
md3        0,00        0,00        0,00          0          0
md1        0,00        0,00        0,00          0          0
md2        0,00        0,00        0,00          0          0
md0        4,50        0,05       68,74         28      41244
sdc        0,00        0,00        0,00          0          0
zram0      0,00        0,01        0,00          4          0
zram1      0,00        0,00        0,00          0          0

free.log

root@tubserver:~# cat free.log

gio 27 set 2018, 11.22.53, CEST
        total    used    free   shared   buff/cache   available
Mem:     1743     257     123       26         1362        1278
Swap:    2776      55    2721

gio 27 set 2018, 11.24.12, CEST
        total    used    free   shared   buff/cache   available
Mem:     1743     258     121       26         1363        1276
Swap:    2776      55    2721

gio 27 set 2018, 11.27.01, CEST
        total    used    free   shared   buff/cache   available
Mem:     1743     260     155       26         1327        1274
Swap:    2776      54    2722

gio 27 set 2018, 11.37.01, CEST
        total    used    free   shared   buff/cache   available
Mem:     1743     260     154       26         1328        1274
Swap:    2776      54    2722

Tomorrow for OPZ/Armbian data... both for vm.swappiness=30 and 100 (and armbianmonitor -u)
tkaiser Posted September 27, 2018
14 minutes ago, Moklev said: zram0 0,00 0,01 0,00 4 0 zram1 0,00 0,00 0,00 0 0
Sorry, this thing is doing nothing and especially not swapping. And I'm really not interested in what a MicroServer somewhere on this earth is doing. It's only about Armbian and the claim that vm.swappiness=100 would crash your board. So as already said, it would be great if you could provide the output from the two commands I asked for now.
1 hour ago, tkaiser said: So can you please provide output from both 'free -m' and 'armbianmonitor -u' now?
Since your board is supposed to crash after switching back to vm.swappiness=100, it then gets interesting to have a look at the two logs (submitted via pastebin.com or some similar online pasteboard service).
ag123 Posted September 27, 2018
i think the somewhat elevated temperatures of say 70-75 deg C on H3 socs can be tolerated. the main trouble is that many of these boards ship without a heatsink
the main thing about running at these temperatures is that the board needs to run in a well ventilated environment, e.g. it should not be enclosed in a box that limits simple convection heat transfer (i.e. 70-75 deg should be an upper load limit, and temperatures must stabilise in this range during high loads and must not increase further)
i've an orange pi one that i found rather commonly runs at those temperatures without a heatsink. i'd imagine it may even be possible for it to go to boiling point, 100 deg C, with the soc still running, but to prevent damage at those temperatures it would be necessary to throttle the frequency (e.g. reduce to 400 MHz) and lower the voltage (as low as possible, e.g. below 1.1 V). a shutdown should be unnecessary
70-75 deg C should be considered 'normal' operating temperatures for H3 socs running at 1.2 GHz under high loads
the situation can be improved quite a bit even with a rather small heat sink, especially at the higher temperatures
https://forum.armbian.com/topic/8203-orangepione-h3-an-experiment-with-small-heatsink/
70-75 deg could be rather easily reached in hot summer weather, as heat transfer is a function of the temperature difference to ambient
a heat sink needs to be combined with good ventilation to remove all that heat. as in my experiment, heat sink + *good ventilation* prevents the soc from reaching the higher temperatures, and at steady state it lets the soc run at a marginally lower temperature than without a heat sink
the problem i have with the orange pi h3 socs is that *after* the os is shut down, the h3 soc and the board can overheat to such an extreme that temperatures are well above 100 deg C. i almost had a board go up in flames. the only safe / precautionary way for now is to disconnect power once the status led indicates that the os has shut down, and not to issue a shutdown remotely if you can't unplug power from the board
tkaiser Posted September 27, 2018
25 minutes ago, ag123 said: i think the somewhat elevated temperatures of say 70-75 deg C on H3 socs can be tolerated
Huh? Can we please stay focused here on an Armbian release upgrade and potential show-stoppers?
You brought up the claim that vm.swappiness settings cause crashes without any evidence whatsoever. Let's focus now on collecting some data related to this and keep temperature babbling outside. We all know that certain boards report wrong temperatures anyway (OPi Zero rev 1.4 for example) and the only area where this gets interesting is whether those wrongly reported temperatures cause an emergency shutdown (the kernel's cpufreq framework defines that at a specific critical temperature a shutdown is initiated). So obviously with wrong thermal readouts as we experience on some sunxi boards/platforms we get a 'denial of service' behavior under load.
Using the wrong hardware for specific purposes is not the focus of this thread (e.g. using a board with just 512 MB for desktop linux -- that's not a 'use case', that's just plain weird)
ag123 Posted September 27, 2018
i think the issue with vm.swappiness is that some of the processes, e.g. sshd, after being swapped out (i.e. not resident in memory), may not respond or may respond too slowly to connects over the network. i.e. for some reason, external connects do not cause a swapped out process image to be reinstated in memory in a normal manner. if that's the case any attempt to connect over ssh fails
reducing vm.swappiness in this case makes it more likely that sshd remains resident in memory rather than being swapped out, hence it alleviates the problem, but this likely won't resolve it entirely
tkaiser Posted September 27, 2018
17 minutes ago, ag123 said: i think the issue about the vm.swappiness is that some of the processes, e.g. sshd after it is swapped out i.e. not resident in memory, may not respond or respond too slowly to connects over the network
WTF? Can we please stop developing weird theories but focus on what's happening (which requires a diagnostic attempt collecting some data)?
Without swap it could be possible that sshd gets killed by the oom-killer (very very very unlikely though because the memory footprint of sshd is pretty low) but if it were possible that swapping in Linux results in daemons not responding any more or not fast enough we all would've stopped using Linux a long time ago.
And can you please try to understand that we're not talking about 'swap on crappy storage' (HDD) any more but about zram: that's just decompressing swapped out pages back into another memory area which is lightning fast compared to the old attempts with swap on HDD.
Moklev Posted September 27, 2018
First log (reduced because it was too large): Orange Pi Zero (vm.swappiness=30) https://ghostbin.com/paste/vxqdd
ag123 Posted September 27, 2018
1 hour ago, tkaiser said: WTF? Can we please stop developing weird theories but focus on what's happening (which requires a diagnostic attempt collecting some data)? Without swap it could be possible that the sshd gets quit by the oom-killer (very very very unlikely though because the memory footprint of sshd is pretty low) but if it would be possible that swapping in Linux results in daemons not responding any more or not fast enough we all would've stopped using Linux a long time ago. And can you please try to understand that we're not talking about 'swap on crappy storage' (HDD) any more but about zram: that's just decompressing swapped out pages back into another memory area which is lightning fast compared to the old attempts with swap on HDD.
i won't be able to prove this easily, but if setting vm.swappiness to values less than 100 alleviates the issue, could it point to some possible issue related to or with zram? some google searches draw a blank: various similar issues are found but none identical to this, i.e. zram isn't specifically mentioned.
the other thing would be that it may be necessary to have the oom killer log the processes killed into /var/log/messages (i.e. by syslog or journald) for post mortem analysis of what actually happened, e.g. is sshd literally killed? i think the oom killer currently only logs to dmesg, which means that the traces are lost once the board/soc is reset / restarted.
i'm not too sure where such logs could be configured to be captured (kswapd?) such that if a paged out / swapped memory page could not be restored the event would be logged in syslog etc
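Whether the oom killer was involved can be checked by searching saved kernel messages for its typical log lines. The following is a hedged sketch rather than a confirmed Armbian procedure: the helper name `oom_events` is invented, and it assumes the kernel messages survived the crash in some file (for example a persisted kern.log or journal), which depends on how logging is configured on the board.

```shell
#!/bin/bash
# Print kernel log lines that indicate oom-killer activity.
# Arguments are log files; with no arguments it reads standard input.
oom_events() {
    grep -iE 'out of memory|invoked oom-killer' "$@"
}
```

Typical uses would be `oom_events /var/log/kern.log` or, where journald keeps logs across reboots, `journalctl -k -b -1 | oom_events` to inspect the previous boot. No output means no oom kill was recorded in that source, not necessarily that none happened.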
Moklev Posted September 27, 2018
vm.swappiness=100 https://ghostbin.com/paste/vrt5t
With vm.swappiness=30: system works fine
With vm.swappiness=100: system works fine for a random time (3-12 h), then hangs with ssh unreachable, yellow ethernet led fixed on, pihole/motioneye/rpi monitor web pages unreachable.
Hardware: OrangePI Zero v1.4 - Sandisk uSD 16GB U1 A1 (checked, good health) EXT4 - Toshiba USB Stick 32GB (checked, good health) F2FS, USB PSU FriendlyARM 5V/3A (checked, good health).
tkaiser Posted September 27, 2018
Let's stop here. I won't further waste my time with this 'report' (which is just a bunch of theories, assumptions and a weird methodology to back an idea).
@Moklev: I was asking for armbianmonitor -u output but got only a redacted/censored variant (the line numbers are there for a reason). I was asking for the output of 'free -m' NOW, as in 'with your vm.swappiness=30 or 60 setting'. Instead you rebooted (why?! You can adjust vm.swappiness at any time, no reboot needed).
Providing 'free -m' output directly after a reboot is pointless as you see ZERO swapping happened. And providing a log containing information from 14.36.28 until 14.53.31 is pointless too.
14 minutes ago, Moklev said: With vm.swappiness=100: system work fine for a random time (3-12 h), then hang with ... rpi monitor web pages unreachable
Ok
On 9/26/2018 at 10:19 AM, Moklev said: Yes, I've started monitoring...
Yesterday you started monitoring. Now you report having switched from vm.swappiness=30 to 100. But you're able to report that with the 100 setting your board froze and even the RPi Monitor web page was not accessible. So you tested within the last 29 hours already twice with the 100 setting and were able to report your board crashing (since you talk about '3-12 h' -- if these 3-12 h are some anecdotal story from days ago I'm not interested. It's only relevant what happens now with some monitoring installed that is able to provide insights).
I'm interested in resolving real problems if there are any. What happens here is developing weird theories and trying to back them. Unless I get data I won't look any further into this. If you had RPi-Monitor running the graphs are available even after a reboot, so you can share them, focussing on the last hour prior to the latest crash (cpufreq, temperature, load).
Also if 'ssh unreachable, yellow ethernet led fixed on, pihole/motioneye/rpi monitor web pages unreachable' is the symptom then you are reporting that your board froze or crashed (and not that sshd stopped being responsive).
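The remark that vm.swappiness can be changed on a running system can be illustrated with two tiny helpers. This is only a sketch: the function names and the overridable SWAPPINESS_FILE variable are assumptions added for illustration, and writing the value needs root. On Linux, writing to /proc/sys/vm/swappiness has the same effect as `sysctl -w vm.swappiness=<value>` and does not survive a reboot.

```shell
#!/bin/bash
# Kernel interface for the setting. Overridable only to allow dry runs.
SWAPPINESS_FILE="${SWAPPINESS_FILE:-/proc/sys/vm/swappiness}"

# Print the value the running kernel is using right now.
show_swappiness() {
    cat "$SWAPPINESS_FILE"
}

# Change the value on the running system (root required, no reboot needed).
set_swappiness() {
    echo "$1" >"$SWAPPINESS_FILE"
}
```

For a test run as root, `set_swappiness 100` followed by `show_swappiness` switches the setting and confirms it, so monitoring that is already running keeps collecting data across the change.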
ag123 Posted September 27, 2018
after some searches i've only found a related but different issue
https://github.com/armbian/build/issues/219
that relates mainly to kswapd taking 100% cpu. i found quite a few instances in google searches not related to armbian. that particular issue seemed to be alleviated by vm.swappiness=0, but given that it was raised back in 2016 and concerns a different kernel (3.4.x), it may not be directly relevant
i think for the time being selecting a vm.swappiness in the mid range between 0 and 100 seems to alleviate the related issues surfaced currently
tkaiser Posted September 27, 2018
5 minutes ago, ag123 said: i think for the time being selecting a vm.swappiness in a mid range between 0 to 100 seem to alleviate related issues surfaced currently
Are you trolling? Which 'related issues'? I only know of one overheating Orange Pi Zero that for whatever reasons is/was unstable. Zero useful efforts taken to diagnose this single problem other than a huge waste of time right now.
ag123 Posted September 27, 2018
nope, i'm referring to the non-responding sshd, rpi-monitor freezes etc. that seemed to be alleviated by selecting a moderate vm.swappiness, say 50
tkaiser Posted September 27, 2018
Ok, this is pure madness here. Won't look into this thread for some time.
@ag123 it's really nice that you're creative and able to develop theories about Linux being broken in general (breaking the system by swapping out important daemons -- I guess you also think kernel developers behave that moronic that they allow the kernel itself to be swapped out?). It also is nice that you ignore reality (@Moklev having reported a system freeze and not sshd being unresponsive). It also doesn't matter that much that out of one single report of 'freezes since some time' for reasons yet unknown you create a proof that there's something wrong with the new vm.swappiness default in Armbian and talk about 'related issues surfaced currently' (issues --> plural). I simply won't read these funny stories any more since they take time that is better spent on something useful.
Here are the download stats: https://dl.armbian.com/_download-stats/ Check when 5.60 was released, count the number of downloads, count the reports of boards freezing/crashing in the forum and you know about 'issues surfaced currently'.
Moklev Posted September 27, 2018
30 minutes ago, tkaiser said: I was asking for armbianmonitor -u output but got only a redacted/censored variant (the line numbers are there for a reason). I was asking for what's the output of 'free -m' NOW. As in 'with your vm.swappiness=30 or 60 setting). Instead you rebooted (why?! You can adjust vm.swappiness all the time, no reboot needed).
That's a problem: "armbianmonitor -u" is broken and all free services (pastebin, ghostbin, etc...) are limited to 512kB-1,5MB. Right now it is not possible to upload all the necessary data.
30 minutes ago, tkaiser said: Providing 'free -m' output directly after a reboot is pointless as you see ZERO swapping happened. And providing a log containing information from 14.36.28 until 14.53.31 is pointless too. Ok Yesterday you started monitoring. Now you report having switched from vm.swappiness=30 to 100. But you're able to report that with 100 settings your board froze and even RPi Monitor web page not being accessible. So you tested within the last 29 hours already twice with 100 settings and were able to report your board crashing (since you talk about '3-12 h' -- if these 3-12h is some anecdotical story from days ago I'm not interested in. It's only relevant what happens now with some monitoring installed able to provide insights).
The SBC crashed on 20.09 (boot at 13:05, crashed at 15:35*), on 21.09 (boot at 14:00, crashed at 17:37*) and on 22.09 (boot at 13:35, crashed at 23:12*).
(*): timestamp of the last picture taken and processed.
On Sunday (23.09) I changed vm.swappiness to 30 and the problem was solved.
Now I need the following directions to help you:
- when to run the "free -m" command
- how to send the log if armbianmonitor does not work
- how long to run the indicated scripts
I need at least 1 or 2 weeks to do everything...