rick Posted December 1, 2016

So I thought I had a stable system with my OPI PC on the 4.7.3-sun8i kernel (for BTRFS support), but the OOM killer keeps silently triggering. I can see it on the serial terminal and in the logs.

So I have done some troubleshooting. I added the following to /etc/sysctl.conf:

# try a value that will actually use the swap
vm.swappiness=10
# ensure that some memory is always available, held back from file caching
vm.min_free_kbytes=64000
# kill the offending task that requested the memory and not the task with the most memory to give up
vm.oom_kill_allocating_task=1

to see if there is an obvious trigger culprit.

So, what I found is that the system's memory usually looks like this when running my system restore script:

              total        used        free      shared  buff/cache   available
Mem:        1026284      195292      123140        3284      707852      577196
Swap:       1097724       23340     1074384

So it is not a memory shortage. It is triggered by random things that are not using much memory, like:

[13457.612306] Out of memory (oom_kill_allocating_task): Kill process 6285 (sessionclean) score 0 or sacrifice child
[13457.622666] Killed process 6285 (sessionclean) total-vm:1460kB, anon-rss:76kB, file-rss:0kB, shmem-rss:0kB
[15248.539294] Out of memory (oom_kill_allocating_task): Kill process 6593 (phpquery) score 0 or sacrifice child
[15248.549794] Killed process 6593 (phpquery) total-vm:1460kB, anon-rss:104kB, file-rss:0kB, shmem-rss:0kB

It's killed sh, watch, ls, .... all small potatoes. Sometimes a big one like apache or mysql too. Any ideas? Kernel bug or something I can fix?

It does seem to like to crash during this command in a script that restores a folder under /var/www from a remote tgz. I have been running the restore over and over to crash the system with it. It only OOMs sometimes:

(ssh $destuser "cat $destdir/www" |pv|unpigz| tar -xp ) || abort

Any ideas? I tried to figure it out myself but I am in over my head so.....
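To make the "small potatoes" pattern easier to see over a long run, the kill messages can be tallied from a saved kernel log. This is only a minimal sketch: the function name `summarize_oom_kills` and the `SAMPLE` constant are made up for illustration, and the regular expression assumes the "Killed process" line format shown in this post (other kernel versions may print it differently).

```python
import re
from collections import Counter

# Matches lines in the format seen in this thread, e.g.:
# [13457.622666] Killed process 6285 (sessionclean) total-vm:1460kB, anon-rss:76kB, ...
# Assumption: the format is the one printed by this 4.7.3 kernel.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def summarize_oom_kills(log_text):
    """Return (Counter of victim names, list of (name, anon_rss_kB))."""
    victims = Counter()
    sizes = []
    for m in KILLED_RE.finditer(log_text):
        victims[m.group("name")] += 1
        sizes.append((m.group("name"), int(m.group("anon_rss"))))
    return victims, sizes


# The two kills quoted in the post, used here as sample input.
SAMPLE = """\
[13457.612306] Out of memory (oom_kill_allocating_task): Kill process 6285 (sessionclean) score 0 or sacrifice child
[13457.622666] Killed process 6285 (sessionclean) total-vm:1460kB, anon-rss:76kB, file-rss:0kB, shmem-rss:0kB
[15248.539294] Out of memory (oom_kill_allocating_task): Kill process 6593 (phpquery) score 0 or sacrifice child
[15248.549794] Killed process 6593 (phpquery) total-vm:1460kB, anon-rss:104kB, file-rss:0kB, shmem-rss:0kB
"""

if __name__ == "__main__":
    victims, sizes = summarize_oom_kills(SAMPLE)
    for name, rss in sizes:
        print(f"{name}: anon-rss {rss} kB")
    print("kills per process name:", dict(victims))
```

Feeding it a full saved log instead of `SAMPLE` would show whether the victims are consistently tiny processes, which is what the two entries above suggest.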
rick Posted December 3, 2016

These OPI PC boards might have to be abandoned as unstable. I can't get it to run stable with the 4.7.3 kernel and it crashes on boot with the 4.8.3. I have replaced the power supply with a 3A buck-boost device and fat leads that will ensure a steady 5.15V at the input and 5.05V at the USB ports. It does not seem to be a power issue, and it seems that the power was good (4.93V to 5.07V at the USB port output) before too.

Ok, so I made my btrfs root fs a raid1, ran the restore script again, and started to get this under the extra load:

[ 3641.664542] kernel BUG at lib/dynamic_queue_limits.c:26!
[ 3641.669931] Internal error: Oops - BUG: 0 [#1] SMP ARM
[ 3641.675133] Modules linked in: xt_multiport evdev ir_lirc_codec lirc_dev sunxi_cir sun8i_ths cpufreq_dt uio_pdrv_genirq uio thermal_sys ip6t_REJECT s
[ 3641.721723] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.7.3-sun8i #5
[ 3641.729390] Hardware name: Allwinner sun8i Family
[ 3641.734166] task: c0d05f00 ti: c0d00000 task.ti: c0d00000
[ 3641.739653] PC is at dql_completed+0x100/0x17c
[ 3641.744176] LR is at sun8i_emac_poll+0x26c/0x6a0
[ 3641.748867] pc : [<c0562de0>] lr : [<c06353dc>] psr: 800e0113
[ 3641.748867] sp : c0d01e14 ip : 00000000 fp : 00015600
[ 3641.760456] r10: 00000040 r9 : ef061518 r8 : ef061000
[ 3641.765748] r7 : 00000001 r6 : 02853a24 r5 : 00000000 r4 : 02853a24
[ 3641.772341] r3 : 00000000 r2 : 00000c00 r1 : 00000066 r0 : ee00c680
[ 3641.778939] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 3641.786145] Control: 10c5387d Table: 50d6006a DAC: 00000051
[ 3641.791959] Process swapper/0 (pid: 0, stack limit = 0xc0d00210)
[ 3641.798033] Stack: (0xc0d01e14 to 0xc0d02000)
[ 3641.802462] 1e00: ee00c600 00000000 00000066
[ 3641.810759] 1e20: 00000001 ef061000 ef061518 c06353dc 00000000 ef6ac280 00000040 c0d080cc
[ 3641.819057] 1e40: ef061518 2ea4a000 c0b2dee8 c0c62b80 00000100 ef061518 c0635170 0005194e
[ 3641.827353] 1e60: 0000012c c0d02100 00000040 c0d01e90 2ea4a000 c076513c ef6acb80 c0c62b80
[ 3641.835650] 1e80: c0d49faf c0d03380 c0b2dee8 c0b31c0c c0d01e90 c0d01e90 c0d01e98 c0d01e98
[ 3641.843946] 1ea0: 00000001 00000000 00000003 c0d00000 c0d0208c c0d02080 00000100 c0d02080
[ 3641.852244] 1ec0: 40000003 c0124218 00000000 ef1e8000 c0d01ec8 c0d4ba40 0000000a 0005194d
[ 3641.860541] 1ee0: c0d02100 00200000 c0d0250c ffffe000 00000000 00000000 00000001 ef008000
[ 3641.868836] 1f00: f0803000 c0d49d5b c0d0250c c01245f0 c0c604bc c016b6e4 c0d1e368 c0d02848
[ 3641.877136] 1f20: f080200c c0d01f50 f0802000 c010148c c0108160 600e0013 ffffffff c0d01f84
[ 3641.885437] 1f40: c0c61578 c0d02504 c0d49d5b c010be14 00000001 00000000 00000000 c0119680
[ 3641.893737] 1f60: c0d00000 c0d0249c 00000000 00000000 c0c61578 c0d02504 c0d49d5b c0d0250c
[ 3641.902037] 1f80: 600e0013 c0d01fa0 c010815c c0108160 600e0013 ffffffff 00000051 00000000
[ 3641.910337] 1fa0: c0d00000 c015ca20 c0d01fb4 c0d01fa8 00000000 ffffffff 00000000 c0c00c7c
[ 3641.918637] 1fc0: ffffffff ffffffff 00000000 c0c00690 00000000 c0c48a30 c0d4b294 c0d02480
[ 3641.926934] 1fe0: c0c48a2c c0d079e4 4000406a 410fc075 00000000 4000807c 00000000 00000000
[ 3641.935267] [<c0562de0>] (dql_completed) from [<c06353dc>] (sun8i_emac_poll+0x26c/0x6a0)
[ 3641.943492] [<c06353dc>] (sun8i_emac_poll) from [<c076513c>] (net_rx_action+0x1f4/0x2e4)
[ 3641.951715] [<c076513c>] (net_rx_action) from [<c0124218>] (__do_softirq+0xfc/0x218)
[ 3641.959589] [<c0124218>] (__do_softirq) from [<c01245f0>] (irq_exit+0xb8/0x118)
[ 3641.967026] [<c01245f0>] (irq_exit) from [<c016b6e4>] (__handle_domain_irq+0x60/0xb4)
[ 3641.974982] [<c016b6e4>] (__handle_domain_irq) from [<c010148c>] (gic_handle_irq+0x48/0x8c)
[ 3641.983457] [<c010148c>] (gic_handle_irq) from [<c010be14>] (__irq_svc+0x54/0x70)
[ 3641.991038] Exception stack(0xc0d01f50 to 0xc0d01f98)
[ 3641.996163] 1f40: 00000001 00000000 00000000 c0119680
[ 3642.004458] 1f60: c0d00000 c0d0249c 00000000 00000000 c0c61578 c0d02504 c0d49d5b c0d0250c
[ 3642.012747] 1f80: 600e0013 c0d01fa0 c010815c c0108160 600e0013 ffffffff
[ 3642.019459] [<c010be14>] (__irq_svc) from [<c0108160>] (arch_cpu_idle+0x38/0x3c)
[ 3642.026982] [<c0108160>] (arch_cpu_idle) from [<c015ca20>] (cpu_startup_entry+0x1b8/0x214)
[ 3642.035380] [<c015ca20>] (cpu_startup_entry) from [<c0c00c7c>] (start_kernel+0x390/0x39c)
[ 3642.043674] Code: e3520000 1a000002 e1a03005 eaffffdd (e7f001f2)
[ 3642.049843] ---[ end trace e15ecc4671d2fcf3 ]---
[ 3642.054528] Kernel panic - not syncing: Fatal exception in interrupt
[ 3642.060989] CPU3: stopping
[ 3642.063818] CPU: 3 PID: 8407 Comm: unpigz Tainted: G D W 4.7.3-sun8i #5
[ 3642.071483] Hardware name: Allwinner sun8i Family
[ 3642.076301] [<c010ea00>] (unwind_backtrace) from [<c010b350>] (show_stack+0x10/0x14)
[ 3642.084172] [<c010b350>] (show_stack) from [<c05381e4>] (dump_stack+0x84/0x98)
[ 3642.091519] [<c05381e4>] (dump_stack) from [<c010d888>] (handle_IPI+0x170/0x190)
[ 3642.099035] [<c010d888>] (handle_IPI) from [<c01014cc>] (gic_handle_irq+0x88/0x8c)
[ 3642.106722] [<c01014cc>] (gic_handle_irq) from [<c010c110>] (__irq_usr+0x50/0x80)
[ 3642.114302] Exception stack(0xd0df1fb0 to 0xd0df1ff8)
[ 3642.119427] 1fa0: 0003e715 000000c6 00057212 0004727e
[ 3642.127721] 1fc0: 00000012 00055bcf 0005515d 00043d05 00098650 00098eb0 bee3d264 00098120
[ 3642.136011] 1fe0: 00055bd3 bee3d0f0 00000001 b6eee882 200e0030 ffffffff
[ 3642.142696] CPU2: stopping
[ 3642.145492] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D W 4.7.3-sun8i #5
[ 3642.153157] Hardware name: Allwinner sun8i Family
[ 3642.157956] [<c010ea00>] (unwind_backtrace) from [<c010b350>] (show_stack+0x10/0x14)
[ 3642.165820] [<c010b350>] (show_stack) from [<c05381e4>] (dump_stack+0x84/0x98)
[ 3642.173165] [<c05381e4>] (dump_stack) from [<c010d888>] (handle_IPI+0x170/0x190)
[ 3642.180680] [<c010d888>] (handle_IPI) from [<c01014cc>] (gic_handle_irq+0x88/0x8c)
[ 3642.188367] [<c01014cc>] (gic_handle_irq) from [<c010be14>] (__irq_svc+0x54/0x70)
[ 3642.195946] Exception stack(0xef0a5f88 to 0xef0a5fd0)
[ 3642.201074] 5f80: 00000001 00000000 00000000 c0119680 ef0a4000 c0d0249c
[ 3642.209368] 5fa0: 00000000 00000000 c0c61578 c0d02504 c0d49d5b c0d0250c e6a93acc ef0a5fd8
[ 3642.217651] 5fc0: c010815c c0108160 60030013 ffffffff
[ 3642.222793] [<c010be14>] (__irq_svc) from [<c0108160>] (arch_cpu_idle+0x38/0x3c)
[ 3642.230313] [<c0108160>] (arch_cpu_idle) from [<c015ca20>] (cpu_startup_entry+0x1b8/0x214)
[ 3642.238697] [<c015ca20>] (cpu_startup_entry) from [<4010156c>] (0x4010156c)
[ 3642.245730] CPU1: stopping
[ 3642.248526] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G D W 4.7.3-sun8i #5
[ 3642.256190] Hardware name: Allwinner sun8i Family
[ 3642.260988] [<c010ea00>] (unwind_backtrace) from [<c010b350>] (show_stack+0x10/0x14)
[ 3642.268854] [<c010b350>] (show_stack) from [<c05381e4>] (dump_stack+0x84/0x98)
[ 3642.276199] [<c05381e4>] (dump_stack) from [<c010d888>] (handle_IPI+0x170/0x190)
[ 3642.283713] [<c010d888>] (handle_IPI) from [<c01014cc>] (gic_handle_irq+0x88/0x8c)
[ 3642.291400] [<c01014cc>] (gic_handle_irq) from [<c010be14>] (__irq_svc+0x54/0x70)
[ 3642.298980] Exception stack(0xef0a3f88 to 0xef0a3fd0)
[ 3642.304109] 3f80: 00000001 00000000 00000000 c0119680 ef0a2000 c0d0249c
[ 3642.312403] 3fa0: 00000000 00000000 c0c61578 c0d02504 c0d49d5b c0d0250c 00000000 ef0a3fd8
[ 3642.320686] 3fc0: c010815c c0108160 60070013 ffffffff
[ 3642.325826] [<c010be14>] (__irq_svc) from [<c0108160>] (arch_cpu_idle+0x38/0x3c)
[ 3642.333345] [<c0108160>] (arch_cpu_idle) from [<c015ca20>] (cpu_startup_entry+0x1b8/0x214)
[ 3642.341726] [<c015ca20>] (cpu_startup_entry) from [<4010156c>] (0x4010156c)
[ 3642.348759] Rebooting in 10 seconds..
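The oops places the BUG in dql_completed, reached from sun8i_emac_poll (PC and LR in the register dump). To read that chain without wading through the hex, the backtrace frames can be pulled out of a saved log. A minimal sketch, assuming the ARM backtrace line format shown in this post; `call_chain`, `FRAME_RE` and `SAMPLE` are names invented for the example, and the function expects the excerpt of a single trace rather than the whole panic output.

```python
import re

# One ARM backtrace frame as printed in this thread:
# [<c0562de0>] (dql_completed) from [<c06353dc>] (sun8i_emac_poll+0x26c/0x6a0)
# Assumption: frames always have the "(callee) from [<addr>] (caller+off/size)" shape.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] \((?P<func>[^)]+)\) from "
    r"\[<[0-9a-f]+>\] \((?P<caller>[^)+]+)"
)


def call_chain(trace_text):
    """Return function names from innermost callee to outermost caller."""
    chain = []
    for m in FRAME_RE.finditer(trace_text):
        if not chain:
            # The first frame's callee is where the fault happened.
            chain.append(m.group("func"))
        chain.append(m.group("caller"))
    return chain


# First three frames of the CPU 0 trace quoted in the post.
SAMPLE = """\
[ 3641.935267] [<c0562de0>] (dql_completed) from [<c06353dc>] (sun8i_emac_poll+0x26c/0x6a0)
[ 3641.943492] [<c06353dc>] (sun8i_emac_poll) from [<c076513c>] (net_rx_action+0x1f4/0x2e4)
[ 3641.951715] [<c076513c>] (net_rx_action) from [<c0124218>] (__do_softirq+0xfc/0x218)
"""

if __name__ == "__main__":
    print(" <- ".join(call_chain(SAMPLE)))
```

On the sample this lists dql_completed, sun8i_emac_poll, net_rx_action and __do_softirq in that order, i.e. the crash sits in the network receive path rather than in anything memory related. Whether that is connected to the OOM kills is not something the log alone shows.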
rick Posted December 3, 2016

Ok, so I discovered and installed OPI-monitor. When top/free look like the following, OPI-monitor's graph says available memory is ZERO!!!!! That would explain the OOM invocations.

              total        used        free      shared  buff/cache   available
Mem:        1026284      109608      613652       13444      303024      680576
Swap:       1097724           0     1097724

How can this be? Any ideas? Fixes?
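One way to check which of the two numbers to believe is to read the kernel's own figures from /proc/meminfo and compare them with what the graph shows at the same moment. A minimal sketch under stated assumptions: `parse_meminfo` and `available_kb` are hypothetical helper names, and the fallback sum is only a rough estimate for kernels that do not export MemAvailable, not the kernel's own calculation. How the monitoring tool derives its value is not known from this thread.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value} (values in kB where a unit is given)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_kb(info):
    """Prefer the kernel's MemAvailable; otherwise fall back to a rough estimate."""
    if "MemAvailable" in info:
        return info["MemAvailable"]
    # Rough estimate only (assumption): free memory plus buffers and page cache.
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("MemTotal: ", info.get("MemTotal"), "kB")
    print("MemFree:  ", info.get("MemFree"), "kB")
    print("available:", available_kb(info), "kB")
```

If this prints a healthy "available" figure while the graph still reads zero, the graph is probably looking at a different field, and the zero would not by itself explain the OOM kills.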