linux-image-legacy-sunxi=24.5.1 (kernel 6.1.92) is broken: stuck at "Starting kernel ..."

mikhailai · June 17, 2024

The Linux kernel contained in the latest "linux-image-legacy-sunxi" (version 24.5.1) package appears to be broken to the point of locking up right from the start. It prints "Starting kernel ...", and no more messages appear even with "verbosity=7" set in "armbianEnv.txt". The "linux-image-legacy-sunxi" version 24.2.1 boots just fine.

Here are the steps to reproduce the problem. I've done this on an "Orange Pi One" board, but exactly the same issue occurs on the (community-maintained) Banana Pi M1.

1. Download and write the Armbian image to a MicroSD card.

2. Connect the serial console, boot the board, finish setup, do all the upgrades: everything works fine at this point.

3. Set "verbosity=7" in the "armbianEnv.txt", reboot and observe the kernel messages. At this point, the "linux-image-current-sunxi", version 24.5.1 (kernel 6.6.31) is installed.

4. Install "armbian-config" and use it to switch to "linux-image-legacy-sunxi=24.2.1 (6.1.77)". Observe that the board boots up fine.

5. Now switch to "linux-image-legacy-sunxi=24.5.1 (6.1.92)". The boot process now gets stuck at "Starting kernel ..." message.

So as a summary:

* "linux-image-current-sunxi" version 24.5.1 with 6.6.31 kernel: boots fine.

* "linux-image-legacy-sunxi" version 24.2.1 with "6.1.77" kernel: boots fine.

* "linux-image-legacy-sunxi" version 24.5.1 with "6.1.92" kernel: broken: stuck at "Starting kernel ..." message.

I wonder if anyone could check what could have happened with "linux-image-legacy-sunxi" in the latest Armbian build.

going · June 17, 2024

31 minutes ago, mikhailai said:

I wonder if anyone could check what could have happened with "linux-image-legacy-sunxi" in the latest Armbian build.

The last time these patches were changed:

Date:   Wed Mar 27 20:50:41 2024

Obviously, patches need to be rebased to the new kernel version and conflicts need to be fixed.

If you are ready to volunteer to support these patches, I can tell you how to do it.

Regards.

mikhailai · June 18, 2024

I can try doing a one-off fix for the current Armbian release, but I cannot commit to supporting these patches going forward: I'm very short on time right now. LMK if you're still interested in giving me the information. I guess I should start off by reading the documentation on building Armbian (I have never built an image).

going · June 18, 2024

54 minutes ago, mikhailai said:

I can try doing one-off fix for the current Armbian release

That's enough.
You don't need to build an image.

It is enough to build the kernel package, install it in the OS, and check that it works.

I'll write the instructions.

Stephen Graf · June 18, 2024

I just tried to build a legacy image for orangepione and it fails. I'll try again later.

https://paste.armbian.com/ijiyegidak

[🚸] Command failed, retrying in 15s [ apt_find_upstream_package_version_and_download_url base-files ]
curl: (28) Operation timed out after 10306 milliseconds with 0 bytes received

Stephen Graf · June 19, 2024

On 6/17/2024 at 10:52 AM, mikhailai said:

stuck at "Starting kernel ..." message.

I did manage to build a minimal legacy image (24.8.0-trunk, sunxi-legacy:6.1.94) from the current Armbian build system and it gets stuck at the "Starting kernel" message.

putty.txt

going · June 19, 2024

54 minutes ago, Stephen Graf said:

I did manage to build a minimal legacy image (24.8.0-trunk, sunxi-legacy:6.1.94)

Will you be able to publish part of the kernel build log?

The part that reports on the application of patches.

9 hours ago, Stephen Graf said:

I just tried to build a legacy image for orangepione and it fails.

We don't need this build logic path.

Force the build system to always build the kernel package:

./compile.sh test ARTIFACT_IGNORE_CACHE="yes" kernel

Configuration file:

~/build$ cat userpatches/config-test.conf 
display_alert "Common settings for Armbian OS images" "setting default values" "info"
#declare -g USE_MAINLINE_GOOGLE_MIRROR="yes"
declare -g SYNC_CLOCK="no"
declare -g INSTALL_HEADERS="no"
declare -g WIREGUARD="no"
declare -g VENDOR="Armbian_community"
declare -g VENDORURL="https://github.com/armbian/build"
declare -g VENDORDOCS="https://docs.armbian.com"
declare -g VENDORSUPPORT="https://community.armbian.com/"
declare -g VENDORPRIVACY="https://duckduckgo.com/"
declare -g VENDORBUGS="https://github.com/armbian/community/issues"
declare -g VENDORLOGO="armbian-logo"
declare -g MAINTAINERMAIL=info@armbian.com
declare -g MAINTAINER="The-going"
declare -g COMPRESS_OUTPUTIMAGE="sha,img,xz"
declare -g IMAGE_XZ_COMPRESSION_RATIO=5

declare -g EXPERT="yes"
#declare -g KERNEL_CONFIGURE=yes
#declare -g DONT_BUILD_ARTIFACTS="firmware,full_firmware,fake_ubuntu_advantage_tools,armbian-config,armbian-zsh,armbian-plymouth-theme"

#Upload the log file to the armbian website.
#SHARE_LOG=yes
#ARTIFACT_IGNORE_CACHE="yes"

KERNEL_GIT=shallow

RELEASE=bookworm
BOARD=bananapim64

BRANCH=current

BUILD_DESKTOP=no
BUILD_MINIMAL=yes

P.S.

Edit: BOARD=XXXX BRANCH=YYYYY

Stephen Graf · June 19, 2024

14 hours ago, going said:
./compile.sh test ARTIFACT_IGNORE_CACHE="yes" kernel

@going

I compiled with your test script for legacy orangepione. There was no image produced. The curl command to upload the log file did not work and uploading the log file to this message also failed.

I did pull the attached patches section from the log file.

Can I email the files to you directly?

build_log_patches.txt

ColorfulRhino · June 19, 2024

You can paste logs here: https://paste.armbian.com/

(It's basically hastebin)

Stephen Graf · June 19, 2024

3 minutes ago, ColorfulRhino said:

You can paste logs here:

@ColorfulRhino No, it says "something went wrong" when I try to save. The files are over 2MB long.

Stephen Graf · June 19, 2024

@going

Cut the log file by taking out all the kernel build log entries.

https://paste.armbian.com/ibamekatak

ColorfulRhino · June 20, 2024

Strange, I don't see any error. The build log at the end says your file should be saved as

output/debs/linux-image-legacy-sunxi_24.8.0-trunk_armhf__6.1.94-Seb44-D54a0-Pee76-C2446H5c21-HK01ba-V014b-Bf15a-R448a.deb

in your build folder. Is this output/debs/ folder empty?

mikhailai · June 21, 2024

Ok, returning to the original question. I did some bisection, and the problem appears to be a 6.1.x kernel bug as opposed to something being broken on the Armbian side.

Disclaimer: I did not use a proper Armbian build; rather just took the kernel code from "linux-6.1.y" branch and used "config-6.1.77-legacy-sunxi".

So here are my results:

The v6.1.87 is booting fine: the same way as "linux-image-legacy-sunxi" version 24.2.1.
The v6.1.88 is broken with the same symptoms as "linux-image-legacy-sunxi" version 24.5.1.
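
For anyone who wants to repeat this kind of narrowing-down: the post does not say how the versions were searched, but a plain binary search over the point releases is one way to do it. Below is a minimal sketch of that idea. `first_bad` and `hangs_at_boot` are hypothetical names; `hangs_at_boot` is only a stand-in for "build the tag, boot it, watch the serial console" and simply hard-codes the result reported above.

```python
def first_bad(versions, is_bad):
    """Return the first entry of `versions` for which is_bad() is true.

    Assumes versions[0] is known good, versions[-1] is known bad, and
    every version after the first bad one is bad as well.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Point releases between the two packaged kernels (6.1.77 and 6.1.92).
releases = [f"v6.1.{n}" for n in range(77, 93)]


def hangs_at_boot(tag):
    """Stand-in for a real build-and-boot test (hard-coded outcome)."""
    return int(tag.rsplit(".", 1)[1]) >= 88


print(first_bad(releases, hangs_at_boot))  # v6.1.88
```

The search is only meaningful if the failure is monotonic, i.e. once a release hangs, all later ones hang too. If a bug comes and goes with unrelated code changes, a search like this can point at an innocent commit.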

The culprit is the following commit:

07b37f227c8daa27e68f57b1c691fab34a06731e (HEAD) random: handle creditable entropy from atomic process context

commit 07b37f227c8daa27e68f57b1c691fab34a06731e
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Wed Apr 17 13:38:29 2024 +0200

    random: handle creditable entropy from atomic process context
    
    commit e871abcda3b67d0820b4182ebe93435624e9c6a4 upstream.
    
    The entropy accounting changes a static key when the RNG has
    initialized, since it only ever initializes once. Static key changes,
    however, cannot be made from atomic context, so depending on where the
    last creditable entropy comes from, the static key change might need to
    be deferred to a worker.
    
    Previously the code used the execute_in_process_context() helper
    function, which accounts for whether or not the caller is
    in_interrupt(). However, that doesn't account for the case where the
    caller is actually in process context but is holding a spinlock.
    
    This turned out to be the case with input_handle_event() in
    drivers/input/input.c contributing entropy:
    
      [<ffffffd613025ba0>] die+0xa8/0x2fc
      [<ffffffd613027428>] bug_handler+0x44/0xec
      [<ffffffd613016964>] brk_handler+0x90/0x144
      [<ffffffd613041e58>] do_debug_exception+0xa0/0x148
      [<ffffffd61400c208>] el1_dbg+0x60/0x7c
      [<ffffffd61400c000>] el1h_64_sync_handler+0x38/0x90
      [<ffffffd613011294>] el1h_64_sync+0x64/0x6c
      [<ffffffd613102d88>] __might_resched+0x1fc/0x2e8
      [<ffffffd613102b54>] __might_sleep+0x44/0x7c
      [<ffffffd6130b6eac>] cpus_read_lock+0x1c/0xec
      [<ffffffd6132c2820>] static_key_enable+0x14/0x38
      [<ffffffd61400ac08>] crng_set_ready+0x14/0x28
      [<ffffffd6130df4dc>] execute_in_process_context+0xb8/0xf8
      [<ffffffd61400ab30>] _credit_init_bits+0x118/0x1dc
      [<ffffffd6138580c8>] add_timer_randomness+0x264/0x270
      [<ffffffd613857e54>] add_input_randomness+0x38/0x48
      [<ffffffd613a80f94>] input_handle_event+0x2b8/0x490
      [<ffffffd613a81310>] input_event+0x6c/0x98
    
    According to Guoyong, it's not really possible to refactor the various
    drivers to never hold a spinlock there. And in_atomic() isn't reliable.
    
    So, rather than trying to be too fancy, just punt the change in the
    static key to a workqueue always. There's basically no drawback of doing
    this, as the code already needed to account for the static key not
    changing immediately, and given that it's just an optimization, there's
    not exactly a hurry to change the static key right away, so deferal is
    fine.
    
    Reported-by: Guoyong Wang <guoyong.wang@mediatek.com>
    Cc: stable@vger.kernel.org
    Fixes: f5bda35fba61 ("random: use static branch for crng_ready()")
    Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 5d1c8e1c99b5..fd57eb372d49 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -683,7 +683,7 @@ static void extract_entropy(void *buf, size_t len)
 
 static void __cold _credit_init_bits(size_t bits)
 {
-	static struct execute_work set_ready;
+	static DECLARE_WORK(set_ready, crng_set_ready);
 	unsigned int new, orig, add;
 	unsigned long flags;
 
@@ -699,8 +699,8 @@ static void __cold _credit_init_bits(size_t bits)
 
 	if (orig < POOL_READY_BITS && new >= POOL_READY_BITS) {
 		crng_reseed(); /* Sets crng_init to CRNG_READY under base_crng.lock. */
-		if (static_key_initialized)
-			execute_in_process_context(crng_set_ready, &set_ready);
+		if (static_key_initialized && system_unbound_wq)
+			queue_work(system_unbound_wq, &set_ready);
 		wake_up_interruptible(&crng_init_wait);
 		kill_fasync(&fasync, SIGIO, POLL_IN);
 		pr_notice("crng init done\n");
@@ -870,8 +870,8 @@ void __init random_init(void)
 
 	/*
 	 * If we were initialized by the cpu or bootloader before jump labels
-	 * are initialized, then we should enable the static branch here, where
-	 * it's guaranteed that jump labels have been initialized.
+	 * or workqueues are initialized, then we should enable the static
+	 * branch here, where it's guaranteed that these have been initialized.
 	 */
 	if (!static_branch_likely(&crng_is_ready) && crng_init >= CRNG_READY)
 		crng_set_ready(NULL);

The code change is rather simple: it switches from using "execute_in_process_context" to "queue_work", but that switch is causing the lock-up. I don't have enough knowledge to debug why it is happening: suspect some sort of a deadlock.

I've tried taking the "random.c" from the 6.6.34 kernel and doing hacky modifications to get it to compile on 6.1.y: that fixed the problem, so I'm guessing the "random.c" on the 6.1.y branch is not in a good state.

Does anyone have suggestions on how to proceed from here?

going · June 21, 2024

On 21.06.2024 at 09:54, mikhailai said:

Does anyone have suggestions on how to proceed from here?

Analysis:

linux-stable> git log --pretty=oneline v6.1.87..07b37f227c8daa27e68f57b1c691fab34a06731e | wc -l
8

Maybe we will do the following:

1) Freeze the outdated kernel to version 6.1.87.

diff --git a/config/sources/families/include/sunxi64_common.inc b/config/sources/families/include/sunxi64_common.inc
index 18775666..e37fe516 100644
--- a/config/sources/families/include/sunxi64_common.inc
+++ b/config/sources/families/include/sunxi64_common.inc
@@ -25,6 +25,7 @@ case $BRANCH in
 
        legacy)
                declare -g KERNEL_MAJOR_MINOR="6.1" # Major and minor versions of this kernel.
+               declare -g KERNELBRANCH="tag:v6.1.87"
                ;;
 
        current)
diff --git a/config/sources/families/include/sunxi_common.inc b/config/sources/families/include/sunxi_common.inc
index 93b14ab8..f6261767 100644
--- a/config/sources/families/include/sunxi_common.inc
+++ b/config/sources/families/include/sunxi_common.inc
@@ -26,6 +26,7 @@ case $BRANCH in
 
        legacy)
                declare -g KERNEL_MAJOR_MINOR="6.1" # Major and minor versions of this kernel.
+               declare -g KERNELBRANCH="tag:v6.1.87"
                ;;
 
        current)

2) Rework the patches (re-extract them) for this kernel version.

3) Leave this kernel in that state, and fix the cause in the current 6.6 kernel, if it is present there.

going · June 21, 2024

On 17.06.2024 at 20:52, mikhailai said:

"linux-image-current-sunxi" version 24.5.1 with 6.6.31 kernel: boots fine.

3 hours ago, mikhailai said:

The culprit is the following commit:

07b37f227c8daa27e68f57b1c691fab34a06731e (HEAD) random: handle creditable entropy from atomic process context

This patch in the 6.6 kernel is present after the v6.6.28 tag 998f52a860555a9f02242bc0a4b3e9b47d47dc11

I think the problem lies elsewhere.

going · June 21, 2024

On 20.06.2024 at 02:21, Stephen Graf said:

Cut the log file by taking out all the kernel build log entries.

https://paste.armbian.com/ibamekatak

Summary: kernel patching: 498 total patches; 498 applied; 81 with problems; 80 needs_rebase; 4 not_mbox

This line indicates that problems exist, but says nothing about what kind of problems they are. Line offsets? Fuzz?

In such cases, an individual hunk can end up applied to a different node in the DTS or to a different function in the C code.

Only a person who reads the source code of the file and reads the patch file can detect the problem.

mikhailai · June 21, 2024

6 hours ago, going said:

This patch in the 6.6 kernel is present after the v6.6.28 tag 998f52a860555a9f02242bc0a4b3e9b47d47dc11

I think the problem lies elsewhere.

True, but the "random.c" on the 6.6 branch contains a bunch of other changes not present in 6.1 (15 commits, to be precise). I suppose the change "random: handle creditable entropy from atomic process context" works well with these commits, but is broken without some of them.

In fact, I kind-of confirmed that, per my comment below.

10 hours ago, mikhailai said:

I've tried taking the "random.c" from the 6.6.34 kernel and doing hacky modifications to get it to compile on 6.1.y: that fixed the problem, so I'm guessing the "random.c" on the 6.1.y branch is not in a good state.

Overall, this looks plausible. The change was originally done and tested on the mainline, with all other changes being present. Then it was cherry-picked into 6.6 and 6.1 branches, where it received more limited testing that did not catch the problem. I'm guessing the problem does not show up on x86 and shows up on armhf. It could be timing dependent, so only shows up under specific circumstances.

I'm hoping there would be just a few commits (ideally just one) that could be cherry-picked into 6.1 branch to make it work.

going · June 21, 2024

50 minutes ago, mikhailai said:

I'm hoping there would be just a few commits (ideally just one) that could be cherry-picked into 6.1 branch to make it work.

Okay, I get it.
Can we just take these few patches from the 6.6 kernel and add them to the 6.1 kernel?

It is better if they are in the form in which they already exist in 6.6.

I mean, the ones you have already tested.

mikhailai · July 10, 2024

Hi y'all. Sorry for disappearing for a while: it really took some time to investigate, but now I'm pretty sure I've found the root cause.

TLDR:

* My analysis above is wrong. The bug is already present in v6.1.87, and the change to "drivers/char/random.c" has nothing to do with it: it just happens to trigger the bug.

* The problem only manifests itself when the ftrace "mcount" call instruction for the "_raw_spin_unlock_irqrestore" function straddles two instruction cache lines. This happens when the "_raw_spin_unlock_irqrestore" address ends in (hex) 1c, 3c, 5c, ... fc (see System.map for that).

* Because of this, arbitrary changes to the kernel code may make the problem appear or disappear. In other words, the hang may look fixed, but then show up again later.

* The bug is present throughout the whole 6.1.y kernel branch, as well as the 6.6.y branch. I did not check the mainline or earlier branches.

* The problem does not appear when the kernel is compiled with GCC 9, which is the default cross-compiler on Ubuntu 20.04 (Focal).

* See the bottom of this post for the correct fix. I'll try to get it into the upstream Linux kernel.

LONG VERSION:

Problem: Linux kernel hangs in early boot on 32-bit ARM platform, when ftrace 4-byte "mcount" function call location for "_raw_spin_unlock_irqrestore" function straddles icache lines.

The problem persists through the whole 6.1.y kernel branch and likely beyond. I could also reproduce it in the 6.6.y branch with a bit more "nop" placement (see below).

ROOT CAUSE ANALYSIS:

The hang is inside:
start_kernel -> ftrace_init -> ftrace_process_locs -> ftrace_update_code.

It hangs when it updates the ftrace location (by calling "ftrace_nop_initialize") for the entry for:

_raw_spin_unlock_irqrestore

The reason is the following:

"ftrace_nop_initialize" calls "ftrace_init_nop", which on 32-bit ARM goes to "ftrace_make_nop".
"ftrace_make_nop" calls "ftrace_modify_code" that calls "__patch_text", that in-turn calls "__patch_text_real" (defined in "arch/arm/kernel/patch.c") with remap=true.

After writing the actual instruction, "__patch_text_real" does the following:

    if (waddr != addr) {
        flush_kernel_vmap_range(waddr, twopage ? size / 2 : size);
        patch_unmap(FIX_TEXT_POKE0, &flags);
    }

    flush_icache_range((uintptr_t)(addr),
               (uintptr_t)(addr) + size);

The "patch_unmap" calls the above-mentioned "_raw_spin_unlock_irqrestore".

Herein lies the problem. If the function being patched is "_raw_spin_unlock_irqrestore" itself, it gets invoked BEFORE the icache is flushed, so the function may contain an invalid instruction formed from the updated and non-updated pieces of the instruction residing in different cache lines. Whether the error occurs strongly depends on other factors: that's why it worked for earlier 6.1.y kernels. Necessary factors:

1. The ftrace location for "_raw_spin_unlock_irqrestore" is NOT 4-byte aligned, and the 4 bytes at this location straddle an instruction cache line (0x20) boundary. I.e. the pg->records[i].ip (hex) value ends in: 0x1e, 0x3e, 0x5e, ... 0xfe. For that function, this value is offset from the function address by 2 bytes.

2. The previous ftrace entry needs to be updated as well. That is probably needed to get the icache into an inconsistent state. For the reproduced hangs, the previous entry is inside "_raw_write_unlock_irqrestore" (unlike "_raw_spin_unlock_irqrestore", it is NOT invoked while "ftrace_update_code" is executing).

3. The problem is present with the GCC 10, 11 and 12 cross-compilers. It does not happen when the kernel is compiled with GCC 9, even when condition (1) is satisfied. I'm not sure what the reason is: it could be different generated code, or condition (2) being different, leading to the cache NOT getting into an inconsistent state. Note that the default cross-compiler on Ubuntu 22.04 (Jammy) is GCC 11, while the default on Ubuntu 20.04 (Focal) is GCC 9.
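
To make condition (1) and the ordering problem a bit more concrete, here is a toy model. It is not a description of real ARM cache behaviour: it only assumes the 32-byte line size quoted above, two made-up 4-byte instruction encodings, and a made-up cache state in which only the second line still holds old bytes. The names `straddles` and `fetch` are illustrative only.

```python
LINE = 0x20  # icache line size quoted in the post (32 bytes)


def straddles(ip, size=4, line=LINE):
    """True if the `size` bytes starting at `ip` cross a cache-line boundary."""
    return ip // line != (ip + size - 1) // line


def fetch(ip, size, memory, icache):
    """Read `size` bytes at `ip`, taking each byte from a cached line if
    that line is present in `icache`, otherwise from `memory`."""
    out = bytearray()
    for addr in range(ip, ip + size):
        line_no, offset = divmod(addr, LINE)
        if line_no in icache:
            out.append(icache[line_no][offset])
        else:
            out.append(memory[addr])
    return bytes(out)


OLD = bytes.fromhex("aaaaaaaa")  # placeholder for the original call instruction
NEW = bytes.fromhex("bbbbbbbb")  # placeholder for the patched-in nop

func_addr = 0x1C    # function address ending in ...1c
ip = func_addr + 2  # ftrace location, 2 bytes into the function
memory = bytearray(2 * LINE)
memory[ip:ip + 4] = OLD

# Assumed cache state: only the second line is cached, with the old bytes.
icache = {1: bytes(memory[LINE:2 * LINE])}

memory[ip:ip + 4] = NEW  # the patch lands in memory
before_flush = fetch(ip, 4, memory, icache)
icache.clear()           # stands in for flush_icache_range()
after_flush = fetch(ip, 4, memory, icache)

print(straddles(ip))       # True
print(before_flush.hex())  # bbbbaaaa: half new, half old
print(after_flush.hex())   # bbbbbbbb
```

In this model, executing the function between the patch and the flush sees an instruction that is neither the old nor the new one, which is the situation described above.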

Note, the condition (1) can be achieved by increasing/decreasing code size of certain functions. The following algorithm can be used.

1. Add 4 "nop" instructions at a time (e.g. asm("nop;nop;nop;nop; ");) to the "try_to_generate_entropy" function in "drivers/char/random.c", until the "_raw_spin_unlock_irqrestore" address ends in x8 or xc, where "x" is odd (e.g. ...1c, ...3c, ...5c).

2. If it ends in 8, add 2 more "nop" instructions to one of the lock functions inside the "__lock_text_start" section: check System.map to see which one comes earlier.
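
A small helper can save some manual checking of System.map while following these steps. This is only a sketch: it assumes the usual three-column System.map layout (address, type, symbol), the sample addresses are made up, and the hypothetical `bytes_until_trigger` ignores any alignment padding the linker may insert, so treat its result as a hint rather than a guarantee.

```python
LINE = 0x20  # icache line size quoted in the post (32 bytes)


def symbol_address(system_map_text, symbol):
    """Return the address of `symbol` from System.map text, or None."""
    for entry in system_map_text.splitlines():
        fields = entry.split()
        if len(fields) == 3 and fields[2] == symbol:
            return int(fields[0], 16)
    return None


def bytes_until_trigger(func_addr, line=LINE):
    """Bytes of padding that would move func_addr to an address ending in
    line - 4 (0x1c, 0x3c, ... for 32-byte lines), the case described above."""
    return (line - 4 - func_addr) % line


# Made-up excerpt, for illustration only.
SAMPLE = """\
c0a014f8 T _raw_write_unlock_irqrestore
c0a01518 T _raw_spin_unlock_irqrestore
"""

addr = symbol_address(SAMPLE, "_raw_spin_unlock_irqrestore")
print(hex(addr), "low byte:", hex(addr & 0xFF),
      "padding needed:", bytes_until_trigger(addr))
```

With a real build, the text would come from reading the System.map file produced next to the kernel image; a result of 0 means the function address already ends in one of the triggering values.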

PROPOSED FIX:

The fix is really simple: just swap the order of "patch_unmap" and "flush_icache_range" in the above code snippet (from "arch/arm/kernel/patch.c", "__patch_text_real" function). I.e. replace the above code snippet with:

	if (waddr != addr)
		flush_kernel_vmap_range(waddr, twopage ? size / 2 : size);

	flush_icache_range((uintptr_t)(addr),
			   (uintptr_t)(addr) + size);

	/* Can only call 'patch_unmap' after flushing dcache and icache,
	 * because it calls 'raw_spin_unlock_irqrestore', but that may
	 * happen to be the very function we're currently patching
	 * (as it happens during the ftrace init).
	 */
	if (waddr != addr)
		patch_unmap(FIX_TEXT_POKE0, &flags);

going · July 12, 2024

On 11.07.2024 at 00:31, mikhailai said:

PROPOSED FIX:

It usually looks like this:

diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
index e9e828b6bb30..ce0fd3aeb575 100644
--- a/arch/arm/kernel/patch.c
+++ b/arch/arm/kernel/patch.c
@@ -101,11 +101,18 @@ void __kprobes __patch_text_real(void *addr, unsigned int insn, bool remap)
 
        if (waddr != addr) {
                flush_kernel_vmap_range(waddr, twopage ? size / 2 : size);
-               patch_unmap(FIX_TEXT_POKE0, &flags);
        }
 
        flush_icache_range((uintptr_t)(addr),
                           (uintptr_t)(addr) + size);
+
+       /* Can only call 'patch_unmap' after flushing dcache and icache,
+        * because it calls 'raw_spin_unlock_irqrestore', but that may
+        * happen to be the very function we're currently patching
+        * (as it happens during the ftrace init).
+        */
+       if (waddr != addr)
+               patch_unmap(FIX_TEXT_POKE0, &flags);
 }
 
 static int __kprobes patch_text_stop_machine(void *data)

@Gunjan Gupta You might want to take a look at this.

Gunjan Gupta · July 14, 2024

This sounds more like it. I lack the kernel knowledge required to review it myself. I would suggest to try submitting the patch with the explanation in the cover letter to mainline kernel and see if we can get it accepted there.

going · July 14, 2024

On 11.07.2024 at 00:31, mikhailai said:

Hereby lies the problem. If it's patching the "_raw_spin_unlock_irqrestore", it invokes the function BEFORE flushing the icache, so there is a possibility of that function having an invalid code created by the combination of the updated and non-updated pieces of the instruction residing in different cache lines. The occurrence of the error strongly depends on other factors: that's why it worked for earlier 6.1.y kernels. Necessary factors:

The ftrace location for "_raw_spin_unlock_irqrestore" is NOT 4-byte aligned and 4 bytes at this location straddle the instruction cache line (0x20) boundaries. I.e. the pg->records[i]->ip (hex) value ends on: 0x1e, 0x3e, 0x5e, ... 0xfe. For that function, this value is offset from the function address by 2 bytes.

The previous Ftrace entry needs to be updated as well. That is probably needed to get the icache into inconsistent state. For the reproduced hangs, the previous entry is inside the "_raw_write_unlock_irqrestore" (unlike _raw_spin_unlock_irqrestore, it is NOT being invoked when "ftrace_update_code" is executing).

The problem is present for (cross-compiler) GCC 10, 11, 12. It does not happen when the kernel is compiled with GCC 9, even when condition (1) is satisfied. Not sure what is the reason: could be different code or condition (2) being different, leading to cache NOT get into an inconsistent state. Note, the default cross-compiler on Ubuntu 22.04 (Jammy) is GCC 11, while the default compiler on Ubuntu 20.04 (Focal) is GCC 9.

Note, the condition (1) can be achieved by increasing/decreasing code size of certain functions. The following algorithm can be used.

Add 4 "nop" instructions at a time to "drivers/char/random.c", "try_to_generate_entropy" function, until "_raw_spin_unlock_irqrestore" address ends on -x8, or -xC, where "x" is odd. E.g. ...1c, ...3c, ...5c, etc. E.g. asm("nop;nop;nop;nop; ");

If it ends on 8, add 2 more "nop" instructions to one of the lock functions inside the "__lock_text_start" section: see the System.map on which one comes first/earlier.

PROPOSED FIX:

The fix is really simple: just swap the order of "patch_unmap" and "flush_icache_range" in the above code snippet (from "arch/arm/kernel/patch.c", "__patch_text_real" function). I.e. replace the above code snippet with:

@mikhailai Thanks for this detailed study of the problem.

After you posted about this issue, I did the following, which is just a revert of the specified commit:

sunxi-6.1: Revert: handle entropy from atomic process context

Your changes went in today. I did not know whom to list as the author, so I used your forum account name.

PR: #6945

going · July 14, 2024

2 hours ago, Gunjan Gupta said:

I would suggest to try submitting the patch with the explanation in the cover letter to mainline kernel

There is a 99.99% chance that our email will be ignored.
Our kernel is not vanilla.
Our compiler comes from Ubuntu, which means we would have to report it there.

But even there we will be ignored, because we are not using the Ubuntu kernel.

With respect

Gunjan Gupta · July 14, 2024

52 minutes ago, going said:

There is a 99.99% chance that our email will be ignored.

I know our kernel is quite different from mainline, but I don't think they will silently ignore the patches. We might get some review comments that either acknowledge the problem or point to some other place where the problem could be. The "drivers/rtc: rtc-sun6i: AutoCal Internal OSC Clock" patch series is a nice example where the mail was not simply ignored. Granted, it's not merged; still, I think the knowledge gathered from external review would be valuable.

Also, the main reason for asking was that it might benefit someone else who is not using Armbian but runs into a similar problem from time to time.

PS: it's just advice, and it's up to the contributor to take it or leave it. I am not using Armbian on any of my SBCs anyway.

mikhailai · July 30, 2024

Ok, I've filed the kernel bug:
https://bugzilla.kernel.org/show_bug.cgi?id=219089

And submitted the patch:
https://lore.kernel.org/linux-arm-kernel/20240729165036.7368-1-mikhailai@gmail.com/

Let's see whether anyone would pay attention to it. Unfortunately, I'm not sure if I ended-up CC-ing all the right people.

The merge window seems to have just closed, so it may take 2 months before the patch gets pulled into the mainline, plus whatever time it takes to make it into the release branches.

Also, I'm not really sure where a non-vanilla kernel or compiler would come into play. I could certainly reproduce the problem with the vanilla 6.10 (mainline) Linux kernel. I DID use the Ubuntu compiler (on Linux Mint), but I'm 99.99% certain it will behave identically if compiled on Debian with the same GCC version.

going · July 30, 2024

5 hours ago, mikhailai said:

Ok, I've filed the kernel bug:
https://bugzilla.kernel.org/show_bug.cgi?id=219089

Congratulations.

We will be watching, but the first comment is already available.

mikhailai · July 30, 2024

I saw the reply on the bug, but decided to just send a patch (that refers to the bug) and copy it to the correct lists. I'll follow up if there would be no replies to the patch.
