Jump to content

mikhailai

Members
  • Posts

    10
  • Joined

  • Last visited

Reputation Activity

  1. Like
    mikhailai got a reaction from going in linux-image-legacy-sunxi=24.5.1 (kernel 6.1.92) is broken: stuck at "Starting kernel ..."   
    Hi y'all. Sorry for disappearing for a while: it really took some time to investigate, but now I'm pretty sure I've found the root cause.
     
    TLDR:
    My analysis above is wrong. The bug is present in v6.1.87, and change to "drivers/char/random.c" has nothing to do with it: just accidentally happens to trigger the bug. The problem only manifests itself when the ftrace "mcount" call instruction for _raw_spin_unlock_irqrestore function in the kernel code straddles the instruction cache lines. This happens when "_raw_spin_unlock_irqrestore" address ends on (hex): 1c, 3c, 5c, ... fc (see System.map for that). Due to above arbitrary changes to the kernel code may trigger this problem to appear or disappear. In other words, the hang may look fixed, but then show up later. It is present through the whole 6.1.y kernel branch, as well as 6.6.y branch. I did not check the mainline or earlier branches. The problem does not appear when the kernel is compiled with GCC 9, which is a default cross-compiler on Ubuntu 20.04 (Focal). See the bottom of this post for the correct fix. I'll try to get into the upstream Linux kernel. LONG VERSION:
     
    Problem: Linux kernel hangs in early boot on 32-bit ARM platform, when ftrace 4-byte "mcount" function call location for "_raw_spin_unlock_irqrestore" function straddles icache lines.
     
    The problem persist through the whole 6.1.y kernel branch and likely beyond. Could also reproduce it in the 6.6.y branch with a bit more "nop" placement (see below).
     
    ROOT CAUSE ANALYSIS:

    The hang is inside:
    start_kernel -> ftrace_init -> ftrace_process_locs -> ftrace_update_code.
     
    It hangs when it updates the ftrace location (by calling "ftrace_nop_initialize") for the entry for:
    _raw_spin_unlock_irqrestore The reason is the following:
    "ftrace_nop_initialize" calls "ftrace_init_nop", which on 32-bit ARM goes to "ftrace_make_nop". "ftrace_make_nop" calls "ftrace_modify_code" that calls "__patch_text", that in-turn calls "__patch_text_real" (defined in "arch/arm/kernel/patch.c") with remap=true. After writing the actual instruction, "__patch_text_real" does the following:      if (waddr != addr) {         flush_kernel_vmap_range(waddr, twopage ? size / 2 : size);         patch_unmap(FIX_TEXT_POKE0, &flags);     }     flush_icache_range((uintptr_t)(addr),                (uintptr_t)(addr) + size);  
    The "patch_unmap" calls the above-mentioned "_raw_spin_unlock_irqrestore". Hereby lies the problem. If it's patching the "_raw_spin_unlock_irqrestore", it invokes the function BEFORE flushing the icache, so there is a possibility of that function having an invalid code created by the combination of the updated and non-updated pieces of the instruction residing in different cache lines. The occurrence of the error strongly depends on other factors: that's why it worked for earlier 6.1.y kernels. Necessary factors:
    The ftrace location for "_raw_spin_unlock_irqrestore" is NOT 4-byte aligned and 4 bytes at this location straddle the instruction cache line (0x20) boundaries. I.e. the pg->records[i]->ip (hex) value ends on: 0x1e, 0x3e, 0x5e, ... 0xfe. For that function, this value is offset from the function address by 2 bytes. The previous Ftrace entry needs to be updated as well. That is probably needed to get the icache into inconsistent state. For the reproduced hangs, the previous entry is inside the "_raw_write_unlock_irqrestore" (unlike _raw_spin_unlock_irqrestore, it is NOT being invoked when "ftrace_update_code" is executing). The problem is present for (cross-compiler) GCC 10, 11, 12. It does not happen when the kernel is compiled with GCC 9, even when condition (1) is satisfied. Not sure what is the reason: could be different code or condition (2) being different, leading to cache NOT get into an inconsistent state. Note, the default cross-compiler on Ubuntu 22.04 (Jammy) is GCC 11, while the default compiler on Ubuntu 20.04 (Focal) is GCC 9. Note, the condition (1) can be achieved by increasing/decreasing code size of certain functions. The following algorithm can be used.
    Add 4 "nop" instructions at a time to "drivers/char/random.c", "try_to_generate_entropy" function, until "_raw_spin_unlock_irqrestore" address ends on -x8, or -xC, where "x" is odd. E.g. ...1c, ...3c, ...5c, etc. E.g. asm("nop;nop;nop;nop; "); If it ends on 8, add 2 more "nop" instructions to one of the lock functions inside the "__lock_text_start" section: see the System.map on which one comes first/earlier. PROPOSED FIX:
    The fix is really simple: just swap the order of "patch_unmap" and "flush_icache_range" in the above code snippet (from  "arch/arm/kernel/patch.c", "__patch_text_real" function). I.e. replace the above code snippet with:
    if (waddr != addr) flush_kernel_vmap_range(waddr, twopage ? size / 2 : size); flush_icache_range((uintptr_t)(addr), (uintptr_t)(addr) + size); /* Can only call 'patch_unmap' after flushing dcache and icache, * because it calls 'raw_spin_unlock_irqrestore', but that may * happen to be the very function we're currently patching * (as it happens during the ftrace init). */ if (waddr != addr) patch_unmap(FIX_TEXT_POKE0, &flags);  
  2. Like
    mikhailai got a reaction from going in linux-image-legacy-sunxi=24.5.1 (kernel 6.1.92) is broken: stuck at "Starting kernel ..."   
    Ok, returning to the original question. I did some dissection, and the problem appears to be a 6.1.x kernel bug as opposed to something being broken on the Armbian side.
    Disclaimer: I did not use a proper Armbian build; rather just took the kernel code from "linux-6.1.y" branch and used "config-6.1.77-legacy-sunxi".
     
    So here are my results:
    The v6.1.87 is booting fine: the same way as "linux-image-legacy-sunxi" version 24.2.1. The v6.1.88 is broken with the same symptoms as "linux-image-legacy-sunxi" version 24.5.1. The culprit is the following commit:
    07b37f227c8daa27e68f57b1c691fab34a06731e (HEAD) random: handle creditable entropy from atomic process context
    commit 07b37f227c8daa27e68f57b1c691fab34a06731e Author: Jason A. Donenfeld <Jason@zx2c4.com> Date: Wed Apr 17 13:38:29 2024 +0200 random: handle creditable entropy from atomic process context commit e871abcda3b67d0820b4182ebe93435624e9c6a4 upstream. The entropy accounting changes a static key when the RNG has initialized, since it only ever initializes once. Static key changes, however, cannot be made from atomic context, so depending on where the last creditable entropy comes from, the static key change might need to be deferred to a worker. Previously the code used the execute_in_process_context() helper function, which accounts for whether or not the caller is in_interrupt(). However, that doesn't account for the case where the caller is actually in process context but is holding a spinlock. This turned out to be the case with input_handle_event() in drivers/input/input.c contributing entropy: [<ffffffd613025ba0>] die+0xa8/0x2fc [<ffffffd613027428>] bug_handler+0x44/0xec [<ffffffd613016964>] brk_handler+0x90/0x144 [<ffffffd613041e58>] do_debug_exception+0xa0/0x148 [<ffffffd61400c208>] el1_dbg+0x60/0x7c [<ffffffd61400c000>] el1h_64_sync_handler+0x38/0x90 [<ffffffd613011294>] el1h_64_sync+0x64/0x6c [<ffffffd613102d88>] __might_resched+0x1fc/0x2e8 [<ffffffd613102b54>] __might_sleep+0x44/0x7c [<ffffffd6130b6eac>] cpus_read_lock+0x1c/0xec [<ffffffd6132c2820>] static_key_enable+0x14/0x38 [<ffffffd61400ac08>] crng_set_ready+0x14/0x28 [<ffffffd6130df4dc>] execute_in_process_context+0xb8/0xf8 [<ffffffd61400ab30>] _credit_init_bits+0x118/0x1dc [<ffffffd6138580c8>] add_timer_randomness+0x264/0x270 [<ffffffd613857e54>] add_input_randomness+0x38/0x48 [<ffffffd613a80f94>] input_handle_event+0x2b8/0x490 [<ffffffd613a81310>] input_event+0x6c/0x98 According to Guoyong, it's not really possible to refactor the various drivers to never hold a spinlock there. And in_atomic() isn't reliable. So, rather than trying to be too fancy, just punt the change in the static key to a workqueue always. There's basically no drawback of doing this, as the code already needed to account for the static key not changing immediately, and given that it's just an optimization, there's not exactly a hurry to change the static key right away, so deferal is fine. Reported-by: Guoyong Wang <guoyong.wang@mediatek.com> Cc: stable@vger.kernel.org Fixes: f5bda35fba61 ("random: use static branch for crng_ready()") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> diff --git a/drivers/char/random.c b/drivers/char/random.c index 5d1c8e1c99b5..fd57eb372d49 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -683,7 +683,7 @@ static void extract_entropy(void *buf, size_t len) static void __cold _credit_init_bits(size_t bits) { - static struct execute_work set_ready; + static DECLARE_WORK(set_ready, crng_set_ready); unsigned int new, orig, add; unsigned long flags; @@ -699,8 +699,8 @@ static void __cold _credit_init_bits(size_t bits) if (orig < POOL_READY_BITS && new >= POOL_READY_BITS) { crng_reseed(); /* Sets crng_init to CRNG_READY under base_crng.lock. */ - if (static_key_initialized) - execute_in_process_context(crng_set_ready, &set_ready); + if (static_key_initialized && system_unbound_wq) + queue_work(system_unbound_wq, &set_ready); wake_up_interruptible(&crng_init_wait); kill_fasync(&fasync, SIGIO, POLL_IN); pr_notice("crng init done\n"); @@ -870,8 +870,8 @@ void __init random_init(void) /* * If we were initialized by the cpu or bootloader before jump labels - * are initialized, then we should enable the static branch here, where - * it's guaranteed that jump labels have been initialized. + * or workqueues are initialized, then we should enable the static + * branch here, where it's guaranteed that these have been initialized. */ if (!static_branch_likely(&crng_is_ready) && crng_init >= CRNG_READY) crng_set_ready(NULL);  
    The code change is rather simple: it switches from using "execute_in_process_context" to "queue_work", but that switch is causing the lock-up. I don't have enough knowledge to debug why it is happening: suspect some sort of a deadlock.
     
    I've tried taking the "random.c" from the 6.6.34 kernel and doing hacky modifications to get to to compile on 6.1.y: that fixed the problem, so I'm guessing the "random.c" on the 6.1.y branch is not in a good state.
     
    Does anyone have suggestions on how to proceed from here?
  3. Like
    mikhailai got a reaction from guidol in linux-image-legacy-sunxi=24.5.1 (kernel 6.1.92) is broken: stuck at "Starting kernel ..."   
    Ok, returning to the original question. I did some dissection, and the problem appears to be a 6.1.x kernel bug as opposed to something being broken on the Armbian side.
    Disclaimer: I did not use a proper Armbian build; rather just took the kernel code from "linux-6.1.y" branch and used "config-6.1.77-legacy-sunxi".
     
    So here are my results:
    The v6.1.87 is booting fine: the same way as "linux-image-legacy-sunxi" version 24.2.1. The v6.1.88 is broken with the same symptoms as "linux-image-legacy-sunxi" version 24.5.1. The culprit is the following commit:
    07b37f227c8daa27e68f57b1c691fab34a06731e (HEAD) random: handle creditable entropy from atomic process context
    commit 07b37f227c8daa27e68f57b1c691fab34a06731e Author: Jason A. Donenfeld <Jason@zx2c4.com> Date: Wed Apr 17 13:38:29 2024 +0200 random: handle creditable entropy from atomic process context commit e871abcda3b67d0820b4182ebe93435624e9c6a4 upstream. The entropy accounting changes a static key when the RNG has initialized, since it only ever initializes once. Static key changes, however, cannot be made from atomic context, so depending on where the last creditable entropy comes from, the static key change might need to be deferred to a worker. Previously the code used the execute_in_process_context() helper function, which accounts for whether or not the caller is in_interrupt(). However, that doesn't account for the case where the caller is actually in process context but is holding a spinlock. This turned out to be the case with input_handle_event() in drivers/input/input.c contributing entropy: [<ffffffd613025ba0>] die+0xa8/0x2fc [<ffffffd613027428>] bug_handler+0x44/0xec [<ffffffd613016964>] brk_handler+0x90/0x144 [<ffffffd613041e58>] do_debug_exception+0xa0/0x148 [<ffffffd61400c208>] el1_dbg+0x60/0x7c [<ffffffd61400c000>] el1h_64_sync_handler+0x38/0x90 [<ffffffd613011294>] el1h_64_sync+0x64/0x6c [<ffffffd613102d88>] __might_resched+0x1fc/0x2e8 [<ffffffd613102b54>] __might_sleep+0x44/0x7c [<ffffffd6130b6eac>] cpus_read_lock+0x1c/0xec [<ffffffd6132c2820>] static_key_enable+0x14/0x38 [<ffffffd61400ac08>] crng_set_ready+0x14/0x28 [<ffffffd6130df4dc>] execute_in_process_context+0xb8/0xf8 [<ffffffd61400ab30>] _credit_init_bits+0x118/0x1dc [<ffffffd6138580c8>] add_timer_randomness+0x264/0x270 [<ffffffd613857e54>] add_input_randomness+0x38/0x48 [<ffffffd613a80f94>] input_handle_event+0x2b8/0x490 [<ffffffd613a81310>] input_event+0x6c/0x98 According to Guoyong, it's not really possible to refactor the various drivers to never hold a spinlock there. And in_atomic() isn't reliable. So, rather than trying to be too fancy, just punt the change in the static key to a workqueue always. There's basically no drawback of doing this, as the code already needed to account for the static key not changing immediately, and given that it's just an optimization, there's not exactly a hurry to change the static key right away, so deferal is fine. Reported-by: Guoyong Wang <guoyong.wang@mediatek.com> Cc: stable@vger.kernel.org Fixes: f5bda35fba61 ("random: use static branch for crng_ready()") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> diff --git a/drivers/char/random.c b/drivers/char/random.c index 5d1c8e1c99b5..fd57eb372d49 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -683,7 +683,7 @@ static void extract_entropy(void *buf, size_t len) static void __cold _credit_init_bits(size_t bits) { - static struct execute_work set_ready; + static DECLARE_WORK(set_ready, crng_set_ready); unsigned int new, orig, add; unsigned long flags; @@ -699,8 +699,8 @@ static void __cold _credit_init_bits(size_t bits) if (orig < POOL_READY_BITS && new >= POOL_READY_BITS) { crng_reseed(); /* Sets crng_init to CRNG_READY under base_crng.lock. */ - if (static_key_initialized) - execute_in_process_context(crng_set_ready, &set_ready); + if (static_key_initialized && system_unbound_wq) + queue_work(system_unbound_wq, &set_ready); wake_up_interruptible(&crng_init_wait); kill_fasync(&fasync, SIGIO, POLL_IN); pr_notice("crng init done\n"); @@ -870,8 +870,8 @@ void __init random_init(void) /* * If we were initialized by the cpu or bootloader before jump labels - * are initialized, then we should enable the static branch here, where - * it's guaranteed that jump labels have been initialized. + * or workqueues are initialized, then we should enable the static + * branch here, where it's guaranteed that these have been initialized. */ if (!static_branch_likely(&crng_is_ready) && crng_init >= CRNG_READY) crng_set_ready(NULL);  
    The code change is rather simple: it switches from using "execute_in_process_context" to "queue_work", but that switch is causing the lock-up. I don't have enough knowledge to debug why it is happening: suspect some sort of a deadlock.
     
    I've tried taking the "random.c" from the 6.6.34 kernel and doing hacky modifications to get to to compile on 6.1.y: that fixed the problem, so I'm guessing the "random.c" on the 6.1.y branch is not in a good state.
     
    Does anyone have suggestions on how to proceed from here?
  4. Like
    mikhailai got a reaction from Bernie_O in K-worker problem on A20 based boards   
    Actually, there is a better way to eliminate this CPU load without disabling the ADC driver and losing the ability to access the SOC temperature. The Linux thermal management periodically polls the SOC temperature, and you can disable it by adding the following line to /etc/rc.local:
    echo disabled > /sys/devices/virtual/thermal/thermal_zone0/mode  
    Details/background:
    The A20 ADC driver is very inefficient: it seems to be using a busy loop (as mentioned here: https://linux-sunxi.narkive.com/1UetD5rh/bug-report-kworker-issue-for-mailline-kernel-4-12) The Linux thermal management periodically polls the SOC temperature (runs the ADC) at around 0.9Hz. I presume this would be useful for controlling the CPU fan, but probably not applicable to most A20 systems. Disabling this polling (as above) removes the CPU load, while still allowing inefficient one-off reads of the CPU temperature, as well as other ADC usage.
×
×
  • Create New...

Important Information

Terms of Use - Privacy Policy - Guidelines