rick Posted December 1, 2016

So I thought I had a stable system with my OPI PC on the 4.7.3-sun8i kernel (for BTRFS support), but the OOM killer keeps silently triggering. I can see it on the serial terminal and in the logs.

So I have done some troubleshooting. I added the following to /etc/sysctl.conf to see if there is an obvious culprit triggering it:

vm.swappiness=10 # try a value that will actually use the swap
vm.min_free_kbytes=64000 # ensure that some memory is always available, held back from file caching.
vm.oom_kill_allocating_task=1 # kill the offending task that requested the memory and not the task with the most memory to give up

What I found is that the system's memory usually looks like this when running my system restore script:

              total        used        free      shared  buff/cache   available
Mem:        1026284      195292      123140        3284      707852      577196
Swap:       1097724       23340     1074384

So it is not a memory shortage. It is triggered by random things that are not using much memory, like:

[13457.612306] Out of memory (oom_kill_allocating_task): Kill process 6285 (sessionclean) score 0 or sacrifice child
[13457.622666] Killed process 6285 (sessionclean) total-vm:1460kB, anon-rss:76kB, file-rss:0kB, shmem-rss:0kB
[15248.539294] Out of memory (oom_kill_allocating_task): Kill process 6593 (phpquery) score 0 or sacrifice child
[15248.549794] Killed process 6593 (phpquery) total-vm:1460kB, anon-rss:104kB, file-rss:0kB, shmem-rss:0kB

It's killed sh, watch, ls, .... all small potatoes. Sometimes a big one like apache or mysql too.

Any ideas? Kernel bug or something I can fix?

It does seem to like to crash during this command, in a script that restores a folder under /var/www from a remote tgz. I have been running the restore over and over to crash the system with it. It only OOMs sometimes:

(ssh $destuser "cat $destdir/www" |pv|unpigz| tar -xp ) || abort

I tried to figure it out myself but I am in over my head, so.....
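For anyone wanting to tabulate which processes the OOM killer picked, here is a minimal sketch in Python that pulls the "Killed process" records out of a saved kernel log. The function name `parse_oom_kills` and the dictionary keys are made up for illustration, and the regular expression only assumes the line layout shown in the log excerpt above; other kernel versions may print these lines differently.

```python
import re

# Matches the "Killed process" line as it appears in the log excerpt above.
# The layout is assumed from that excerpt, not from any kernel documentation.
KILLED_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kills(log_text):
    """Return one dict per 'Killed process' record found in log_text."""
    kills = []
    for match in KILLED_RE.finditer(log_text):
        kills.append(
            {
                "timestamp": float(match.group("ts")),
                "pid": int(match.group("pid")),
                "comm": match.group("comm"),
                "total_vm_kb": int(match.group("total_vm")),
                "anon_rss_kb": int(match.group("anon_rss")),
            }
        )
    return kills


# The two records quoted in the post above.
SAMPLE_LOG = """\
[13457.612306] Out of memory (oom_kill_allocating_task): Kill process 6285 (sessionclean) score 0 or sacrifice child
[13457.622666] Killed process 6285 (sessionclean) total-vm:1460kB, anon-rss:76kB, file-rss:0kB, shmem-rss:0kB
[15248.539294] Out of memory (oom_kill_allocating_task): Kill process 6593 (phpquery) score 0 or sacrifice child
[15248.549794] Killed process 6593 (phpquery) total-vm:1460kB, anon-rss:104kB, file-rss:0kB, shmem-rss:0kB
"""

if __name__ == "__main__":
    for kill in parse_oom_kills(SAMPLE_LOG):
        print(kill["timestamp"], kill["comm"], kill["anon_rss_kb"], "kB anon-rss")
```

Feeding it a full saved log instead of `SAMPLE_LOG` would show at a glance whether the victims are consistently tiny, as they are in the two records quoted here.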
rick (Author) Posted December 3, 2016

These OPI PC boards might have to be abandoned as unstable. I can't get it to run stably with the 4.7.3 kernel, and it crashes on boot with 4.8.3.

I have replaced the power supply with a 3A buck-boost device and fat leads that ensure a steady 5.15V at the input and 5.05V at the USB ports. It does not seem to be a power issue, and it seems that the power was good before too (4.93V to 5.07V at the USB port output).

Ok, so I made my btrfs root fs a raid1, ran the above, and started to get this under the extra load:

[ 3641.664542] kernel BUG at lib/dynamic_queue_limits.c:26!
[ 3641.669931] Internal error: Oops - BUG: 0 [#1] SMP ARM
[ 3641.675133] Modules linked in: xt_multiport evdev ir_lirc_codec lirc_dev sunxi_cir sun8i_ths cpufreq_dt uio_pdrv_genirq uio thermal_sys ip6t_REJECT s
[ 3641.721723] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.7.3-sun8i #5
[ 3641.729390] Hardware name: Allwinner sun8i Family
[ 3641.734166] task: c0d05f00 ti: c0d00000 task.ti: c0d00000
[ 3641.739653] PC is at dql_completed+0x100/0x17c
[ 3641.744176] LR is at sun8i_emac_poll+0x26c/0x6a0
[ 3641.748867] pc : [<c0562de0>] lr : [<c06353dc>] psr: 800e0113
[ 3641.748867] sp : c0d01e14 ip : 00000000 fp : 00015600
[ 3641.760456] r10: 00000040 r9 : ef061518 r8 : ef061000
[ 3641.765748] r7 : 00000001 r6 : 02853a24 r5 : 00000000 r4 : 02853a24
[ 3641.772341] r3 : 00000000 r2 : 00000c00 r1 : 00000066 r0 : ee00c680
[ 3641.778939] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 3641.786145] Control: 10c5387d Table: 50d6006a DAC: 00000051
[ 3641.791959] Process swapper/0 (pid: 0, stack limit = 0xc0d00210)
[ 3641.798033] Stack: (0xc0d01e14 to 0xc0d02000)
[ 3641.802462] 1e00: ee00c600 00000000 00000066
[ 3641.810759] 1e20: 00000001 ef061000 ef061518 c06353dc 00000000 ef6ac280 00000040 c0d080cc
[ 3641.819057] 1e40: ef061518 2ea4a000 c0b2dee8 c0c62b80 00000100 ef061518 c0635170 0005194e
[ 3641.827353] 1e60: 0000012c c0d02100 00000040 c0d01e90 2ea4a000 c076513c ef6acb80 c0c62b80
[ 3641.835650] 1e80: c0d49faf c0d03380 c0b2dee8 c0b31c0c c0d01e90 c0d01e90 c0d01e98 c0d01e98
[ 3641.843946] 1ea0: 00000001 00000000 00000003 c0d00000 c0d0208c c0d02080 00000100 c0d02080
[ 3641.852244] 1ec0: 40000003 c0124218 00000000 ef1e8000 c0d01ec8 c0d4ba40 0000000a 0005194d
[ 3641.860541] 1ee0: c0d02100 00200000 c0d0250c ffffe000 00000000 00000000 00000001 ef008000
[ 3641.868836] 1f00: f0803000 c0d49d5b c0d0250c c01245f0 c0c604bc c016b6e4 c0d1e368 c0d02848
[ 3641.877136] 1f20: f080200c c0d01f50 f0802000 c010148c c0108160 600e0013 ffffffff c0d01f84
[ 3641.885437] 1f40: c0c61578 c0d02504 c0d49d5b c010be14 00000001 00000000 00000000 c0119680
[ 3641.893737] 1f60: c0d00000 c0d0249c 00000000 00000000 c0c61578 c0d02504 c0d49d5b c0d0250c
[ 3641.902037] 1f80: 600e0013 c0d01fa0 c010815c c0108160 600e0013 ffffffff 00000051 00000000
[ 3641.910337] 1fa0: c0d00000 c015ca20 c0d01fb4 c0d01fa8 00000000 ffffffff 00000000 c0c00c7c
[ 3641.918637] 1fc0: ffffffff ffffffff 00000000 c0c00690 00000000 c0c48a30 c0d4b294 c0d02480
[ 3641.926934] 1fe0: c0c48a2c c0d079e4 4000406a 410fc075 00000000 4000807c 00000000 00000000
[ 3641.935267] [<c0562de0>] (dql_completed) from [<c06353dc>] (sun8i_emac_poll+0x26c/0x6a0)
[ 3641.943492] [<c06353dc>] (sun8i_emac_poll) from [<c076513c>] (net_rx_action+0x1f4/0x2e4)
[ 3641.951715] [<c076513c>] (net_rx_action) from [<c0124218>] (__do_softirq+0xfc/0x218)
[ 3641.959589] [<c0124218>] (__do_softirq) from [<c01245f0>] (irq_exit+0xb8/0x118)
[ 3641.967026] [<c01245f0>] (irq_exit) from [<c016b6e4>] (__handle_domain_irq+0x60/0xb4)
[ 3641.974982] [<c016b6e4>] (__handle_domain_irq) from [<c010148c>] (gic_handle_irq+0x48/0x8c)
[ 3641.983457] [<c010148c>] (gic_handle_irq) from [<c010be14>] (__irq_svc+0x54/0x70)
[ 3641.991038] Exception stack(0xc0d01f50 to 0xc0d01f98)
[ 3641.996163] 1f40: 00000001 00000000 00000000 c0119680
[ 3642.004458] 1f60: c0d00000 c0d0249c 00000000 00000000 c0c61578 c0d02504 c0d49d5b c0d0250c
[ 3642.012747] 1f80: 600e0013 c0d01fa0 c010815c c0108160 600e0013 ffffffff
[ 3642.019459] [<c010be14>] (__irq_svc) from [<c0108160>] (arch_cpu_idle+0x38/0x3c)
[ 3642.026982] [<c0108160>] (arch_cpu_idle) from [<c015ca20>] (cpu_startup_entry+0x1b8/0x214)
[ 3642.035380] [<c015ca20>] (cpu_startup_entry) from [<c0c00c7c>] (start_kernel+0x390/0x39c)
[ 3642.043674] Code: e3520000 1a000002 e1a03005 eaffffdd (e7f001f2)
[ 3642.049843] ---[ end trace e15ecc4671d2fcf3 ]---
[ 3642.054528] Kernel panic - not syncing: Fatal exception in interrupt
[ 3642.060989] CPU3: stopping
[ 3642.063818] CPU: 3 PID: 8407 Comm: unpigz Tainted: G D W 4.7.3-sun8i #5
[ 3642.071483] Hardware name: Allwinner sun8i Family
[ 3642.076301] [<c010ea00>] (unwind_backtrace) from [<c010b350>] (show_stack+0x10/0x14)
[ 3642.084172] [<c010b350>] (show_stack) from [<c05381e4>] (dump_stack+0x84/0x98)
[ 3642.091519] [<c05381e4>] (dump_stack) from [<c010d888>] (handle_IPI+0x170/0x190)
[ 3642.099035] [<c010d888>] (handle_IPI) from [<c01014cc>] (gic_handle_irq+0x88/0x8c)
[ 3642.106722] [<c01014cc>] (gic_handle_irq) from [<c010c110>] (__irq_usr+0x50/0x80)
[ 3642.114302] Exception stack(0xd0df1fb0 to 0xd0df1ff8)
[ 3642.119427] 1fa0: 0003e715 000000c6 00057212 0004727e
[ 3642.127721] 1fc0: 00000012 00055bcf 0005515d 00043d05 00098650 00098eb0 bee3d264 00098120
[ 3642.136011] 1fe0: 00055bd3 bee3d0f0 00000001 b6eee882 200e0030 ffffffff
[ 3642.142696] CPU2: stopping
[ 3642.145492] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D W 4.7.3-sun8i #5
[ 3642.153157] Hardware name: Allwinner sun8i Family
[ 3642.157956] [<c010ea00>] (unwind_backtrace) from [<c010b350>] (show_stack+0x10/0x14)
[ 3642.165820] [<c010b350>] (show_stack) from [<c05381e4>] (dump_stack+0x84/0x98)
[ 3642.173165] [<c05381e4>] (dump_stack) from [<c010d888>] (handle_IPI+0x170/0x190)
[ 3642.180680] [<c010d888>] (handle_IPI) from [<c01014cc>] (gic_handle_irq+0x88/0x8c)
[ 3642.188367] [<c01014cc>] (gic_handle_irq) from [<c010be14>] (__irq_svc+0x54/0x70)
[ 3642.195946] Exception stack(0xef0a5f88 to 0xef0a5fd0)
[ 3642.201074] 5f80: 00000001 00000000 00000000 c0119680 ef0a4000 c0d0249c
[ 3642.209368] 5fa0: 00000000 00000000 c0c61578 c0d02504 c0d49d5b c0d0250c e6a93acc ef0a5fd8
[ 3642.217651] 5fc0: c010815c c0108160 60030013 ffffffff
[ 3642.222793] [<c010be14>] (__irq_svc) from [<c0108160>] (arch_cpu_idle+0x38/0x3c)
[ 3642.230313] [<c0108160>] (arch_cpu_idle) from [<c015ca20>] (cpu_startup_entry+0x1b8/0x214)
[ 3642.238697] [<c015ca20>] (cpu_startup_entry) from [<4010156c>] (0x4010156c)
[ 3642.245730] CPU1: stopping
[ 3642.248526] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G D W 4.7.3-sun8i #5
[ 3642.256190] Hardware name: Allwinner sun8i Family
[ 3642.260988] [<c010ea00>] (unwind_backtrace) from [<c010b350>] (show_stack+0x10/0x14)
[ 3642.268854] [<c010b350>] (show_stack) from [<c05381e4>] (dump_stack+0x84/0x98)
[ 3642.276199] [<c05381e4>] (dump_stack) from [<c010d888>] (handle_IPI+0x170/0x190)
[ 3642.283713] [<c010d888>] (handle_IPI) from [<c01014cc>] (gic_handle_irq+0x88/0x8c)
[ 3642.291400] [<c01014cc>] (gic_handle_irq) from [<c010be14>] (__irq_svc+0x54/0x70)
[ 3642.298980] Exception stack(0xef0a3f88 to 0xef0a3fd0)
[ 3642.304109] 3f80: 00000001 00000000 00000000 c0119680 ef0a2000 c0d0249c
[ 3642.312403] 3fa0: 00000000 00000000 c0c61578 c0d02504 c0d49d5b c0d0250c 00000000 ef0a3fd8
[ 3642.320686] 3fc0: c010815c c0108160 60070013 ffffffff
[ 3642.325826] [<c010be14>] (__irq_svc) from [<c0108160>] (arch_cpu_idle+0x38/0x3c)
[ 3642.333345] [<c0108160>] (arch_cpu_idle) from [<c015ca20>] (cpu_startup_entry+0x1b8/0x214)
[ 3642.341726] [<c015ca20>] (cpu_startup_entry) from [<4010156c>] (0x4010156c)
[ 3642.348759] Rebooting in 10 seconds..
rick (Author) Posted December 3, 2016

Ok, so I discovered and installed OPI-monitor. When top/free look like the following, OPI-monitor's graph says available memory is ZERO! That would explain the OOM invocations.

              total        used        free      shared  buff/cache   available
Mem:        1026284      109608      613652       13444      303024      680576
Swap:       1097724           0     1097724

How can this be? Any ideas? Fixes?
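One way to cross-check a monitoring graph against what `free` itself reports is to parse the `free` output and compute the available share directly. The following is a rough Python sketch; the names `parse_free` and `available_percent` are hypothetical, and it only assumes the six-column layout shown above. How OPI-monitor derives its own "available" figure is not known from this thread, so this only confirms the `free` side of the comparison.

```python
def parse_free(text):
    """Parse `free` output (layout as shown above) into {row: {column: kB}}."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    columns = lines[0].split()  # total, used, free, shared, buff/cache, available
    table = {}
    for line in lines[1:]:
        parts = line.split()
        row_name = parts[0].rstrip(":")
        values = [int(value) for value in parts[1:]]
        # The Swap row has only three values; zip stops at the shorter list.
        table[row_name] = dict(zip(columns, values))
    return table


def available_percent(table):
    """Share of total RAM that `free` reports as available, in percent."""
    mem = table["Mem"]
    return 100.0 * mem["available"] / mem["total"]


# The figures quoted in the post above.
SAMPLE_FREE = """\
              total        used        free      shared  buff/cache   available
Mem:        1026284      109608      613652       13444      303024      680576
Swap:       1097724           0     1097724
"""

if __name__ == "__main__":
    table = parse_free(SAMPLE_FREE)
    print("available: %.1f%% of RAM" % available_percent(table))
```

With the numbers from the post this comes out at roughly two thirds of RAM available, which is hard to reconcile with a graph reading of zero and suggests the two tools are not measuring the same thing.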