
Hanging on kernel load after Shutdown/Reboot/Kernel panic


Lore


I have run into this issue multiple times and the process flow is this:

 

Flash image "Armbian_20.11.6_Helios64_buster_current_5.9.14.img" to a micro SD card.

Run through the config (installing OMV with SAMBA)

Switch over to OMV to configure the share / media so it's available on my network

Runs for some time (this last install was running for maybe a few weeks before it triggered a Kernel panic)

Power off.

Power on.

Board boots, hangs on "Starting Kernel..."

 

Is there some trick I am missing to get this thing to actually boot / reboot correctly?

 

Troubleshooting info:

No jumpers are shorted.

I had a similar issue with a previous img as well.


I was able to get it booting.  The following is the error that was causing it to hang.

 

Begin: Running /scripts/local-premount ... Scanning for Btrfs filesystems
done.
Begin: Will now check root file system ... fsck from util-linux 2.33.1
[/sbin/fsck.ext4 (1) -- /dev/mmcblk0p1] fsck.ext4 -a -C0 /dev/mmcblk0p1 
/dev/mmcblk0p1: recovering journal
/dev/mmcblk0p1 contains a file system with errors, check forced.
/dev/mmcblk0p1: Inode 753 seems to contain garbage.  

/dev/mmcblk0p1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
	(i.e., without -a or -p options)
fsck exited with status code 4
done.
Failure: File system check of the root filesystem failed
The root filesystem on /dev/mmcblk0p1 requires a manual fsck

 

I then just ran

fsck.ext4 /dev/mmcblk0p1

 

 

After that it ran and booted to OMV on its own.
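For anyone reading the "fsck exited with status code 4" line in the log above: per the fsck(8) man page the exit status is a sum of flag bits, and 4 means "filesystem errors left uncorrected", which is why the boot stops and asks for a manual run. A small sketch that decodes a status value (the function name decode_fsck_status is made up for illustration):

```shell
#!/bin/sh
# decode_fsck_status STATUS: print the meaning of each bit set in an fsck
# exit status, using the bit values documented in fsck(8).
decode_fsck_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for entry in \
        "1:filesystem errors corrected" \
        "2:system should be rebooted" \
        "4:filesystem errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:checking canceled by user request" \
        "128:shared-library error"
    do
        bit=${entry%%:*}    # number before the colon
        msg=${entry#*:}     # text after the colon
        if [ $(( status & bit )) -ne 0 ]; then
            echo "$msg"
        fi
    done
    return 0
}
```

Calling `decode_fsck_status 4` prints "filesystem errors left uncorrected", which matches the failure shown in the boot log.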


So, I came down tonight to the system in the same state, halted on the FSCK.

 

Seems to have booted fine after I manually ran it, but if it reboots every few days and I have to go do the FSCK every time, I might actually cry.


@Lore you could add fsck.repair=yes to the extraargs line in /boot/armbianEnv.txt

 

extraargs=earlyprintk ignore_loglevel fsck.repair=yes

it will run

fsck.ext4 -y /dev/mmcblk0p1

automatically.
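If you would rather script the change than edit the file by hand, here is a minimal sketch. It assumes the file uses a single extraargs= line as in the example above; the function name add_fsck_repair is made up for illustration, and it leaves a .bak copy next to the file before touching it.

```shell
#!/bin/sh
# add_fsck_repair FILE: make sure the extraargs line in FILE carries
# fsck.repair=yes. Does nothing if an fsck.repair= option is already there.
add_fsck_repair() {
    env_file="$1"
    if grep -q '^extraargs=.*fsck\.repair=' "$env_file"; then
        return 0                                  # already configured
    fi
    cp "$env_file" "$env_file.bak"                # keep a backup copy
    if grep -q '^extraargs=' "$env_file"; then
        # Append the option to the end of the existing extraargs line.
        sed 's/^extraargs=.*/& fsck.repair=yes/' "$env_file.bak" > "$env_file"
    else
        # No extraargs line yet: add one (assumes the file ends in a newline).
        printf 'extraargs=fsck.repair=yes\n' >> "$env_file"
    fi
}
```

It would be run as root with `add_fsck_repair /boot/armbianEnv.txt`, followed by a reboot for the new kernel arguments to take effect.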

 

---

You have another issue (I'd say a bigger one): why do you get filesystem corruption so often?

Does it always happen after a reboot?

Could you monitor/log the serial console during a reboot? Maybe some processes are being killed forcefully by systemd because they miss the shutdown timeout.

 

 


The corruption seems to happen over time. If I leave it running for a few days it ends up rebooting itself, and I'll have to FSCK again before it will boot normally.

I just did a FSCK, so I can try to reboot manually, but I don't think that will show me any corruption. If I leave it to do its thing with the console up and monitoring, I can hopefully capture the output, and I'll update.


So, it rebooted itself.  Here is the reboot output.

 


[79083.780521] Unable to handle kernel paging request at virtual address 7bff0000f77baf8c
[79083.781213] Mem abort info:
[79083.781458]   ESR = 0x96000004
[79083.781727]   EC = 0x25: DABT (current EL), IL = 32 bits
[79083.782189]   SET = 0, FnV = 0
[79083.782457]   EA = 0, S1PTW = 0
[79083.782733] Data abort info:
[79083.782985]   ISV = 0, ISS = 0x00000004
[79083.783319]   CM = 0, WnR = 0
[79083.783580] [7bff0000f77baf8c] address between user and kernel address ranges
[79083.784202] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[79083.784688] Modules linked in: softdog governor_performance rfkill quota_v2 quota_tree r8152 snd_soc_hdmi_codec hantro_vpu(C) zstd rockchip_rga rockchip_vdec(C) v4l2_h264 videobuf2_dma_contig leds_pwm videobuf2_vmalloc v4l2_mem2mem gpio_charger videobuf2_dma_sg panfrost snd_soc_rockchip_i2s pwm_fan rockchipdrm videobuf2_memops videobuf2_v4l2 snd_soc_core fusb302 gpu_sched tcpm dw_mipi_dsi snd_pcm_dmaengine typec videobuf2_common snd_pcm dw_hdmi analogix_dp drm_kms_helper snd_timer cec snd sg videodev rc_core soundcore mc drm drm_panel_orientation_quirks gpio_beeper cpufreq_dt zram nfsd auth_rpcgss nfs_acl lockd grace sunrpc ledtrig_netdev lm75 ip_tables x_tables autofs4 raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx realtek md_mod dwmac_rk stmmac_platform stmmac mdio_xpcs adc_keys
[79083.791149] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G         C        5.9.14-rockchip64 #20.11.4
[79083.791913] Hardware name: Helios64 (DT)
[79083.792257] pstate: 80000085 (Nzcv daIf -PAN -UAO BTYPE=--)
[79083.792753] pc : tick_sched_handle.isra.19+0xc/0x58
[79083.793179] lr : tick_sched_timer+0x58/0xb0
[79083.793543] sp : ffff800011b03e00
[79083.793833] x29: ffff800011b03e00 x28: 000047ed20604465 
[79083.794298] x27: ffff0000f77baa40 x26: 0000000000000002 
[79083.794761] x25: 0000000000000080 x24: ffff80001150f018 
[79083.795225] x23: ffff80001183a1c8 x22: ffff800010147f20 
[79083.795688] x21: ffff800011bcbdf0 x20: 000047ed206048f4 
[79083.796151] x19: 7bff0000f77baf40 x18: 0000000000000000 
[79083.796615] x17: 0000000000000000 x16: 0000000000000000 
[79083.797078] x15: 0000000000000000 x14: 0000000000000000 
[79083.797541] x13: 000000000000018c x12: 0000000000000000 
[79083.798004] x11: 0000000000000040 x10: ffff80001185ea30 
[79083.798468] x9 : ffff80001185ea28 x8 : ffff0000f6800270 
[79083.798931] x7 : 0000000000000000 x6 : 000001ba05a22d6f 
[79083.799395] x5 : 00ffffffffffffff x4 : ffff8000e62a3000 
[79083.799858] x3 : ffff80001183a000 x2 : 0000000000000004 
[79083.800321] x1 : ffff800011bcbef8 x0 : 7bff0000f77baf40 
[79083.800785] Call trace:
[79083.801002]  tick_sched_handle.isra.19+0xc/0x58
[79083.801399]  tick_sched_timer+0x58/0xb0
[79083.801737]  __hrtimer_run_queues+0x104/0x3b0
[79083.802118]  hrtimer_interrupt+0xf4/0x250
[79083.802471]  arch_timer_handler_phys+0x30/0x40
[79083.802861]  handle_percpu_devid_irq+0xa0/0x2b8
[79083.803257]  generic_handle_irq+0x30/0x48
[79083.803609]  __handle_domain_irq+0x94/0x108
[79083.803977]  gic_handle_irq+0xc8/0x170
[79083.804307]  el1_irq+0xb8/0x180
[79083.804582]  arch_cpu_idle+0x14/0x20
[79083.804897]  do_idle+0x1fc/0x270
[79083.805181]  cpu_startup_entry+0x24/0x60
[79083.805525]  secondary_start_kernel+0x16c/0x180
[79083.805923] Code: d503201f d503233f a9bf7bfd 910003fd (39413002) 
[79083.806461] ---[ end trace 9432c2eec90ab0ec ]---
[79083.806864] Kernel panic - not syncing: Fatal exception in interrupt
[79083.807419] SMP: stopping secondary CPUs
[79083.807766] Kernel Offset: disabled
[79083.808071] CPU features: 0x0240022,2000200c
[79083.808444] Memory Limit: none
[79083.808719] Rebooting in 90 seconds..

 


On 2/4/2021 at 10:26 PM, Lore said:

Here is a pastebin of me just pressing the reset button.  Full unedited output.

 

https://pastebin.com/jB1xJjLE

 

It booted normally but I did notice that there was a FSCK that happened.

Yes, that is expected. Pressing the reset button forces a system reset and skips the graceful shutdown procedure, one step of which flags the filesystem as clean. On the next boot the filesystem is therefore seen as dirty, and fsck runs to check for corruption.
 

On 2/5/2021 at 9:00 PM, Lore said:

So, it rebooted itself.  Here is the reboot output.

 


 

Have you tried limiting the CPU speed and changing the governor?

Change the CPU governor to performance, and optionally reduce the CPU speed to keep the system running cool.

 

armbian-config > System > CPU

 

Minimum CPU speed = 1200000

Maximum CPU speed = 1200000

CPU governor = performance
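To confirm the settings above actually took effect, you can read them back from the kernel's cpufreq sysfs interface. This is only a sketch: the function name show_cpufreq is made up, and it assumes the usual layout with one policy directory per CPU cluster under /sys/devices/system/cpu/cpufreq.

```shell
#!/bin/sh
# show_cpufreq [ROOT]: print governor and min/max frequency (in kHz) for
# every cpufreq policy found under ROOT (defaults to the sysfs location).
show_cpufreq() {
    root="${1:-/sys/devices/system/cpu/cpufreq}"
    for policy in "$root"/policy*; do
        [ -d "$policy" ] || continue              # skip if nothing matched
        printf '%s: governor=%s min=%s max=%s\n' \
            "$(basename "$policy")" \
            "$(cat "$policy/scaling_governor")" \
            "$(cat "$policy/scaling_min_freq")" \
            "$(cat "$policy/scaling_max_freq")"
    done
}
```

With the values above applied, each line printed by `show_cpufreq` should report governor=performance with min and max both at 1200000.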


@aprayoga, @gprovost,

One thing that has worked for me since I got the board around November of last year is to install the package 'tuned' and set the profile to 'server-powersave'.

The two BIG cores can go up to 1.8GHz and the little cores go up to 1.4GHz under heavy usage without any lockups - most of the time in idle they stay at 408MHz.

 

apt -y install tuned
tuned-adm profile server-powersave
systemctl restart tuned

 


6 hours ago, snakekick said:

Hello @SIGSEGV, is tuned just another CPU scaling governor?

Did you have problems before you switched to tuned, and did it solve them?

 

I did have a few lockups when running my setup scripts that actually install a few of the packages I needed/wanted to run on the Helios64. It would lock sometimes while idle.

'tuned' is one of the packages installed in our cloud environments, and some of its options are available through the Cockpit management interface. I wouldn't describe 'tuned' as a CPU scaling governor.

The one thing I noticed was that if I configured tuned in the early steps of installation, my scripts would finish running their tasks and I didn't have lockups (if tuned was not configured early in the process, it would freeze at random points).

 

tuned project - their website description is as follows

Tuned is a system tuning service for Linux. It:

monitors connected devices using the udev device manager
tunes system settings according to a selected profile
supports various types of configuration like sysctl, sysfs, or kernel boot command line parameters, which are integrated in a plug-in architecture
supports hot plugging of devices and can be controlled from the command line or through D-Bus, so it can be easily integrated into existing administering solutions: for example, with Cockpit
can be run in no-daemon mode with limited functionality (for example, no support for D-Bus, udev, tuning of newly created processes, and so on) for systems with reduced resources
stores all its configuration cleanly in one place – in the Tuned profile – instead of having configuration on multiple places and in custom scripts

 

Take my word with a grain of salt - this is only my experience with the Helios64.

 

 


22 hours ago, gprovost said:

Are you trying to ssh with root user ?

I have tried with both root and the account I created at install.  Both give the same reject response.  Using both SSH and Console.

 

10 hours ago, SIGSEGV said:

@aprayoga, @gprovost,

One thing that has worked for me since I got the board around November of last year is to install the package 'tuned' and set the profile to 'server-powersave'.


 

I'll give Tuned a shot... If I can ever get logged back in.


1 hour ago, Lore said:

I have tried with both root and the account I created at install.  Both give the same reject response.  Using both SSH and Console.

If you can still connect to the OMV web portal, then go check the user you created and be sure they have the right to connect via SSH.


7 minutes ago, gprovost said:

If you can still connect to the OMV web portal, then go check the user you created and be sure they have the right to connect via SSH.

The web portal doesn't work anymore. Whether that is because of the corruption issues or the bad passwords, I am not sure.

