Hi y'all. Sorry for disappearing for a while: it really took some time to investigate, but now I'm pretty sure I've found the root cause.
TLDR:
My analysis above is wrong. The bug is already present in v6.1.87, and the change to "drivers/char/random.c" has nothing to do with it: it just happens to trigger the bug by accident.
The problem only manifests when the ftrace "mcount" call instruction for the _raw_spin_unlock_irqrestore function straddles an instruction cache line boundary. This happens when the "_raw_spin_unlock_irqrestore" address ends in (hex): 1c, 3c, 5c, ... fc (see System.map for that).
Because of this, arbitrary changes to the kernel code may make the problem appear or disappear. In other words, the hang may look fixed, but then show up again later.
It is present through the whole 6.1.y kernel branch, as well as 6.6.y branch. I did not check the mainline or earlier branches.
The problem does not appear when the kernel is compiled with GCC 9, which is the default cross-compiler on Ubuntu 20.04 (Focal).
See the bottom of this post for the correct fix. I'll try to get it into the upstream Linux kernel.
LONG VERSION:
Problem: the Linux kernel hangs in early boot on 32-bit ARM platforms when the 4-byte ftrace "mcount" call location for the "_raw_spin_unlock_irqrestore" function straddles an icache line boundary.
The problem persists through the whole 6.1.y kernel branch and likely beyond. I could also reproduce it in the 6.6.y branch with a bit more "nop" placement (see below).
ROOT CAUSE ANALYSIS:
The hang is inside:
start_kernel -> ftrace_init -> ftrace_process_locs -> ftrace_update_code.
It hangs when it updates the ftrace location (by calling "ftrace_nop_initialize") for the entry for:
_raw_spin_unlock_irqrestore
The reason is the following:
"ftrace_nop_initialize" calls "ftrace_init_nop", which on 32-bit ARM goes to "ftrace_make_nop".
"ftrace_make_nop" calls "ftrace_modify_code", which calls "__patch_text", which in turn calls "__patch_text_real" (defined in "arch/arm/kernel/patch.c") with remap=true.
After writing the actual instruction, "__patch_text_real" does the following:
    if (waddr != addr) {
            flush_kernel_vmap_range(waddr, twopage ? size / 2 : size);
            patch_unmap(FIX_TEXT_POKE0, &flags);
    }
    flush_icache_range((uintptr_t)(addr),
                       (uintptr_t)(addr) + size);
"patch_unmap" calls the above-mentioned "_raw_spin_unlock_irqrestore".
Herein lies the problem. When the function being patched is "_raw_spin_unlock_irqrestore" itself, the code invokes that function BEFORE flushing the icache. So the CPU may execute an invalid instruction made up of updated and non-updated pieces that reside in different cache lines. Whether the error actually occurs depends strongly on other factors, which is why earlier 6.1.y kernels worked. Necessary factors:
(1) The ftrace location for "_raw_spin_unlock_irqrestore" is NOT 4-byte aligned, and the 4 bytes at this location straddle an instruction cache line (0x20 bytes) boundary. I.e. the pg->records[i]->ip (hex) value ends in: 0x1e, 0x3e, 0x5e, ... 0xfe. For that function, this value is offset from the function address by 2 bytes.
(2) The previous ftrace entry needs to be updated as well. That is probably needed to get the icache into an inconsistent state. For the reproduced hangs, the previous entry is inside "_raw_write_unlock_irqrestore" (unlike _raw_spin_unlock_irqrestore, it is NOT invoked while "ftrace_update_code" is executing).
(3) The kernel is built with (cross-compiler) GCC 10, 11 or 12. The hang does not happen when the kernel is compiled with GCC 9, even when condition (1) is satisfied. I'm not sure why: GCC 9 could generate different code, or condition (2) could differ so that the cache does NOT get into an inconsistent state. Note that the default cross-compiler on Ubuntu 22.04 (Jammy) is GCC 11, while the default on Ubuntu 20.04 (Focal) is GCC 9.
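The address arithmetic behind condition (1) can be sketched in a few lines of C. This is only an illustration: the helper names are hypothetical (they are not kernel functions), and the 0x20 line size, the 4-byte patch size and the 2-byte offset of the ftrace location are taken from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions taken from the post, not read from the hardware. */
#define ICACHE_LINE_SIZE 0x20u /* 32-byte instruction cache line */
#define PATCH_SIZE       4u    /* size of the patched "mcount" call */
#define FTRACE_IP_OFFSET 2u    /* ftrace location = function address + 2 */

/* True if the PATCH_SIZE bytes starting at ip span two cache lines. */
static bool straddles_icache_line(uint32_t ip)
{
    uint32_t first_line = ip / ICACHE_LINE_SIZE;
    uint32_t last_line = (ip + PATCH_SIZE - 1u) / ICACHE_LINE_SIZE;

    return first_line != last_line;
}

/* True if a function at func_addr (as listed in System.map) has its
 * ftrace location straddling a cache line, i.e. meets condition (1). */
static bool func_meets_condition_1(uint32_t func_addr)
{
    return straddles_icache_line(func_addr + FTRACE_IP_OFFSET);
}
```

With these assumptions, a function address ending in 0x1c gives an ftrace location ending in 0x1e, whose 4 bytes cover 0x1e..0x21 and therefore cross the boundary at 0x20. An address ending in 0x18 gives 0x1a..0x1d, which stays inside one line.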
Note that condition (1) can be forced by increasing or decreasing the code size of certain functions. The following procedure can be used:
Step 1: Add 4 "nop" instructions at a time to the "try_to_generate_entropy" function in "drivers/char/random.c", e.g. asm("nop;nop;nop;nop; ");, until the "_raw_spin_unlock_irqrestore" address ends in -x8 or -xC, where "x" is odd. E.g. ...1c, ...3c, ...5c, etc.
Step 2: If it ends in 8, add 2 more "nop" instructions to one of the lock functions inside the "__lock_text_start" section: check System.map to see which one comes earlier.
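To avoid eyeballing System.map after every rebuild, the check can be automated. The following is a rough sketch, assuming the usual "address type name" layout of System.map lines; the function and enum names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names describing what to do next in the procedure above. */
enum nop_step {
    NOP_DONE,                /* address ends in x c, x odd: bug should trigger */
    NOP_ADD_2_TO_LOCK_FUNC,  /* address ends in x 8, x odd: add 2 more nops */
    NOP_ADD_4_TO_RANDOM      /* anything else: add 4 more nops and rebuild */
};

/* Parse one System.map-style line ("c0a1231c T some_symbol").
 * Returns true and stores the address if the symbol name matches. */
static bool parse_symbol_line(const char *line, const char *wanted,
                              uint32_t *addr_out)
{
    unsigned long addr;
    char type;
    char name[128];

    if (sscanf(line, "%lx %c %127s", &addr, &type, name) != 3)
        return false;
    if (strcmp(name, wanted) != 0)
        return false;
    *addr_out = (uint32_t)addr;
    return true;
}

/* Decide the next step from the position inside a 0x20-byte cache line. */
static enum nop_step next_nop_step(uint32_t func_addr)
{
    switch (func_addr & 0x1fu) {
    case 0x1c:
        return NOP_DONE;
    case 0x18:
        return NOP_ADD_2_TO_LOCK_FUNC;
    default:
        return NOP_ADD_4_TO_RANDOM;
    }
}
```

A small driver would read System.map line by line with fgets(), pass each line to parse_symbol_line() with "_raw_spin_unlock_irqrestore" as the wanted name, and print the result of next_nop_step() for the match.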
PROPOSED FIX:
The fix is really simple: just swap the order of "patch_unmap" and "flush_icache_range" in the above code snippet (from "arch/arm/kernel/patch.c", "__patch_text_real" function). I.e. replace the above code snippet with:
    if (waddr != addr)
            flush_kernel_vmap_range(waddr, twopage ? size / 2 : size);
    flush_icache_range((uintptr_t)(addr),
                       (uintptr_t)(addr) + size);
    /* Can only call 'patch_unmap' after flushing dcache and icache,
     * because it calls 'raw_spin_unlock_irqrestore', but that may
     * happen to be the very function we're currently patching
     * (as it happens during the ftrace init).
     */
    if (waddr != addr)
            patch_unmap(FIX_TEXT_POKE0, &flags);
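Why the ordering matters can be illustrated with a deliberately simplified user-space model. This is NOT how a real ARM icache works and none of these names exist in the kernel: it only mimics the hypothesis above, namely that one cache line still holds old bytes while the neighbouring line is fetched fresh from memory.

```c
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u
#define NUM_LINES 2u
#define MEM_SIZE  (LINE_SIZE * NUM_LINES)

/* Toy CPU: "memory" plus a per-line instruction cache copy. */
struct toy_cpu {
    uint8_t mem[MEM_SIZE];
    uint8_t icache[MEM_SIZE];
    int line_valid[NUM_LINES];
};

/* Instruction fetch: fill a line from memory only if it is not cached. */
static void toy_fetch(struct toy_cpu *cpu, uint32_t addr,
                      uint8_t *out, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++) {
        uint32_t a = addr + i;
        uint32_t line = a / LINE_SIZE;

        if (!cpu->line_valid[line]) {
            memcpy(&cpu->icache[line * LINE_SIZE],
                   &cpu->mem[line * LINE_SIZE], LINE_SIZE);
            cpu->line_valid[line] = 1;
        }
        out[i] = cpu->icache[a];
    }
}

/* Patch memory only; the cached copy is left untouched (stale). */
static void toy_patch(struct toy_cpu *cpu, uint32_t addr,
                      const uint8_t *insn, uint32_t len)
{
    memcpy(&cpu->mem[addr], insn, len);
}

/* Stand-in for an icache flush: drop every cached line. */
static void toy_flush_icache(struct toy_cpu *cpu)
{
    memset(cpu->line_valid, 0, sizeof(cpu->line_valid));
}
```

In this model, suppose an old instruction sits at offset 0x1e, only the first line is cached, and toy_patch() then writes a new instruction at 0x1e. A toy_fetch() at 0x1e before toy_flush_icache() returns two old bytes followed by two new bytes; the same fetch after the flush returns the new instruction in full. That is the difference between calling "patch_unmap" before and after "flush_icache_range".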