
ESPRESSOBin Board not booting due to kernel panic if SATA connected


Rötti


Hello everyone,

 

I own two ESPRESSOBin V5 boards.

To each of them I attached an XCSOURCE® MiniPCIe SATA 3.0 AC696 extension card via MiniPCIe.

This is the link to Amazon: https://www.amazon.de/dp/B06XRG2TGV

 

I tested several images from https://www.armbian.com/espressobin/#kernels-archive-all

Unfortunately, all the old images were deleted last week, so I could not continue testing.

 

Kernels tested 8 weeks ago, plus the latest two this week:
- 5.10.9-mvebu64  #21.02.0-hirsute (trunk) <-- not working
- 5.8.18-mvebu64  #20.11.6-bionic <-- not working
- 5.8.18-mvebu64  #20.11.3-focal <-- not working
- 5.8.18-mvebu64  #20.11.3-bionic <-- not working
- 5.8.6-mvebu64  #20.08.2-focal <-- not working
- 4.14.135-mvebu64 #19.11.3-bionic <-- working

 

 

Here is the whole UART-dump:

TIM-1.0
WTMI-devel-18.12.0-a0a1cb8
WTMI: system early-init
SVC REV: 3, CPU VDD voltage: 1.155V
NOTICE:  Booting Trusted Firmware
NOTICE:  BL1: v1.5(release):1f8ca7e (Marvell-devel-18.12.2)
NOTICE:  BL1: Built : 09:48:09, Feb 20 2019
NOTICE:  BL1: Booting BL2
NOTICE:  BL2: v1.5(release):1f8ca7e (Marvell-devel-18.12.2)
NOTICE:  BL2: Built : 09:48:10, Feb 20 2019
NOTICE:  BL1: Booting BL31
NOTICE:  BL31: v1.5(release):1f8ca7e (Marvell-devel-18.12.2)
NOTICE:  BL31: Built : 09:4

U-Boot 2018.03-devel-18.12.3-gc9aa92c-armbian (Feb 20 2019 - 09:45:04 +0100)

Model: Marvell Armada 3720 Community Board ESPRESSOBin
       CPU     1000 [MHz]
       L2      800 [MHz]
       TClock  200 [MHz]
       DDR     800 [MHz]
DRAM:  2 GiB
Comphy chip #0:
Comphy-0: USB3          5 Gbps
Comphy-1: PEX0          2.5 Gbps
Comphy-2: SATA0         6 Gbps
Target spinup took 0 ms.
AHCI 0001.0300 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
flags: ncq led only pmp fbss pio slum part sxs
PCIE-0: Link up
MMC:   sdhci@d0000: 0, sdhci@d8000: 1
Loading Environment from SPI Flash... SF: Detected w25q32dw with page size 256 Bytes, erase size 4 KiB, total 4 MiB
OK
Model: Marvell Armada 3720 Community Board ESPRESSOBin
Net:   eth0: neta@30000 [PRIME]
Hit any key to stop autoboot:  0
starting USB...
USB0:   Register 2000104 NbrPorts 2
Starting the controller
USB XHCI 1.00
USB1:   USB EHCI 1.00
scanning bus 0 for devices... 1 USB Device(s) found
scanning bus 1 for devices... 1 USB Device(s) found
       scanning usb for storage devices... 0 Storage Device(s) found

## Loading init Ramdisk from Legacy Image at 01100000 ...
   Image Name:   uInitrd
   Image Type:   AArch64 Linux RAMDisk Image (gzip compressed)
   Data Size:    10750023 Bytes = 10.3 MiB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 06000000
   Booting using the fdt blob at 0x6000000
   Loading Ramdisk to 7ebea000, end 7f62a847 ... OK
   Using Device Tree in place at 0000000006000000, end 00000000060059cd

Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[    0.000000] Linux version 5.8.18-mvebu64 (root@beast) (aarch64-linux-gnu-gcc (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 8.3.0, GNU ld (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 2.32.0.20190321) #20.11.3 SMP PREEMPT Fri Dec 11 21:10:52 CET 2020
[    0.000000] Machine model: Globalscale Marvell ESPRESSOBin Board
[    0.000000] earlycon: ar3700_uart0 at MMIO 0x00000000d0012000 (options '')
[    0.000000] printk: bootconsole [ar3700_uart0] enabled
Loading, please wait...
Starting version 245.4-4ubuntu3.3
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... Scanning for Btrfs filesystems
done.
Begin: Will now check root file system ... fsck from util-linux 2.34
[/usr/sbin/fsck.ext4 (1) -- /dev/mmcblk0p1] fsck.ext4 -a -C0 /dev/mmcblk0p1
/dev/mmcblk0p1: clean, 41739/1828336 files, 439779/7502824 blocks
done.
done.
Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ... done.
[    3.694604] Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP
[    3.699465] Modules linked in: tag_edsa mv88e6xxx dsa_core bridge stp llc phy_mvebu_a3700_comphy
[    3.708518] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.8.18-mvebu64 #20.11.3
[    3.716037] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
[    3.722685] Workqueue: events free_work
[    3.726614] pstate: 00000085 (nzcv daIf -PAN -UAO BTYPE=--)
[    3.732352] pc : ahci_single_level_irq_intr+0x1c/0x90
[    3.737549] lr : __handle_irq_event_percpu+0x5c/0x168
[    3.742737] sp : ffffffc0113bbd10
[    3.746142] x29: ffffffc0113bbd10 x28: ffffff807d48b700
[    3.751608] x27: 0000000000000060 x26: ffffffc010f085e8
[    3.757073] x25: ffffffc0113075a5 x24: ffffff8079101800
[    3.762539] x23: 000000000000002d x22: ffffffc0113bbdd4
[    3.768004] x21: 0000000000000000 x20: ffffffc011465008
[    3.773470] x19: ffffff8079381600 x18: 0000000000000000
[    3.778936] x17: 0000000000000000 x16: 0000000000000000
[    3.784401] x15: 000000d2c010fc50 x14: 0000000000000323
[    3.789867] x13: 00000000000002d4 x12: 0000000000000000
[    3.795332] x11: 0000000000000040 x10: ffffffc011282dd8
[    3.800798] x9 : ffffffc011282dd0 x8 : ffffff807d000270
[    3.806263] x7 : 0000000000000000 x6 : 0000000000000000
[    3.811729] x5 : ffffffc06ea93000 x4 : ffffffc0113bbe10
[    3.817196] x3 : ffffffc06ea93000 x2 : ffffff8079101a80
[    3.822661] x1 : ffffff8078803e00 x0 : 000000000000002d
[    3.828126] Call trace:
[    3.830642]  ahci_single_level_irq_intr+0x1c/0x90
[    3.835478]  __handle_irq_event_percpu+0x5c/0x168
[    3.840315]  handle_irq_event_percpu+0x38/0x90
[    3.844885]  handle_irq_event+0x48/0xe0
[    3.848828]  handle_simple_irq+0x94/0xd0
[    3.852860]  generic_handle_irq+0x30/0x48
[    3.856985]  advk_pcie_irq_handler+0x214/0x240
[    3.861552]  __handle_irq_event_percpu+0x5c/0x168
[    3.866389]  handle_irq_event_percpu+0x38/0x90
[    3.870959]  handle_irq_event+0x48/0xe0
[    3.874900]  handle_fasteoi_irq+0xb8/0x170
[    3.879112]  generic_handle_irq+0x30/0x48
[    3.883234]  __handle_domain_irq+0x64/0xc0
[    3.887447]  gic_handle_irq+0xc8/0x168
[    3.891298]  el1_irq+0xb8/0x180
[    3.894524]  unmap_kernel_range_noflush+0x128/0x188
[    3.899540]  remove_vm_area+0xac/0xd0
[    3.903303]  __vunmap+0x48/0x298
[    3.906618]  free_work+0x44/0x60
[    3.909937]  process_one_work+0x1e8/0x360
[    3.914057]  worker_thread+0x44/0x480
[    3.917820]  kthread+0x154/0x158
[    3.921135]  ret_from_fork+0x10/0x34
[    3.924812] Code: a90153f3 f9401022 f9400854 91002294 (b9400293)
[    3.931087] ---[ end trace 98b323414bb99c99 ]---
[    3.935829] Kernel panic - not syncing: Fatal exception in interrupt
[    3.942368] SMP: stopping secondary CPUs
[    3.946403] Kernel Offset: disabled
[    3.949985] CPU features: 0x240002,2000200c
[    3.954283] Memory Limit: none
[    3.957424] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
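The first oops line reports `synchronous external abort: 96000210`; on arm64 that hex value is the Exception Syndrome Register (ESR). A minimal Python sketch of decoding it, based on the ESR_ELx field layout in the Arm Architecture Reference Manual. The helper name and the deliberately short lookup tables are ours, not a kernel API:

```python
# Decode an arm64 ESR (Exception Syndrome Register) value from a kernel oops.
# Field layout per the Arm Architecture Reference Manual (ESR_ELx):
#   EC  = bits [31:26]  exception class
#   IL  = bit  [25]     instruction length (1 = 32-bit instruction)
#   ISS = bits [24:0]   instruction-specific syndrome
# For data aborts (EC 0x24 / 0x25) the ISS further contains:
#   EA   = bit  [9]     external abort type (implementation defined)
#   WnR  = bit  [6]     1 = fault on a write, 0 = fault on a read
#   DFSC = bits [5:0]   data fault status code

EC_NAMES = {
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

# Deliberately incomplete: only a couple of codes; see the Arm ARM for all.
DFSC_NAMES = {
    0x10: "Synchronous External abort, not on translation table walk",
    0x21: "Alignment fault",
}


def decode_esr(esr: int) -> dict:
    ec = (esr >> 26) & 0x3F
    info = {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not tabulated here"),
        "il": (esr >> 25) & 0x1,
        "iss": esr & 0x1FFFFFF,
    }
    if ec in (0x24, 0x25):
        iss = info["iss"]
        info["dfsc"] = iss & 0x3F
        info["dfsc_name"] = DFSC_NAMES.get(info["dfsc"], "not tabulated here")
        info["write"] = bool((iss >> 6) & 0x1)
        info["ea"] = (iss >> 9) & 0x1
    return info


if __name__ == "__main__":
    # Value from the oops line "synchronous external abort: 96000210"
    print(decode_esr(0x96000210))
```

For 0x96000210 this yields EC 0x25 (a data abort taken in the kernel itself), WnR 0 (a read) and DFSC 0x10 (synchronous external abort, not on a translation table walk): a memory read that the bus answered with an error, inside `ahci_single_level_irq_intr` according to the `pc` line of the oops.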

 

The boards boot up if I don't plug any SATA HDDs into the extension card.

 

I hope this helps. If you need any other information, just let me know; I'm absolutely willing to help. But please be aware that I'm a software developer coming from Windows trying to get into Linux. And I have no clue about kernel patching/compiling etc. Sorry!

 

Thank you very, very much in advance! You're doing an awesome job.

 

Sincerely Rötti


10 hours ago, Rötti said:

And I have no clue about kernel patching/compiling etc. Sorry!


We have no time and absolutely no budget to cover bugs on this very buggy hardware. Officially, Armbian doesn't have any maintainer(s) for this board anymore, so it's basically community supported, as is. @Pali is trying to bring up mainline U-Boot support and that's about it. I don't know how far along things are, or whether upgrading U-Boot helps in this case. It's the only alternative, so it's worth trying.

 

This is just a bug like any other - moved to bug tracker forums.

 

10 hours ago, Rötti said:

But please be aware that I'm a software developer coming from Windows trying to get into Linux.


You couldn't choose worse hardware :(

 

10 hours ago, Rötti said:

I'm absolutely willing to help.


So help.


@Rötti This looks like an AHCI or PCIe issue. Please report the bug to the linux-ide@vger.kernel.org mailing list, where there are more SATA/AHCI developers who could help you debug the issue. It may also be an A3720-related PCIe issue, so send me an email and I can provide you with an A3720 PCIe patch which could fix some stability issues.


Hello @Igor, hello @Pali

 

Thank you for the information, and thank you for moving the topic into the right forum.

On 1/25/2021 at 8:24 AM, Igor said:

You couldn't choose worse hardware 

Again, thanks for that information, but I'm still here to get Armbian running even on this buggy hardware ;-)

 

On 1/25/2021 at 8:24 AM, Igor said:

So help.

If you expect me to do more than write to the Linux kernel mailing list and the EspressoBIN forum, please point me in the right direction and I will help.

For example, do you have any backups of all previous images? If so, where can I find them? I'd like to try all images to narrow down when the bug was introduced.
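Narrowing down a regression over a list of images is a search for the first failing version. If failures persist once they start, a binary search needs only about log2(n) test boots instead of one per image. A minimal Python sketch under those assumptions; `boots_ok` is a hypothetical stand-in for the manual step of flashing an image, attaching the SATA disk and watching the UART, and both function names are ours:

```python
from typing import Callable, Optional, Sequence, Tuple


def kernel_key(image: str) -> Tuple[int, ...]:
    """'4.14.135-mvebu64' -> (4, 14, 135), so that 5.10 sorts after 5.8."""
    version = image.split("-", 1)[0]
    return tuple(int(part) for part in version.split("."))


def first_bad(images: Sequence[str],
              boots_ok: Callable[[str], bool]) -> Optional[str]:
    """Return the oldest failing image, assuming every newer one fails too.

    boots_ok is a hypothetical callback standing in for the manual test:
    flash the image, attach the SATA disk, watch the UART for a panic.
    """
    ordered = sorted(images, key=kernel_key)
    lo, hi = 0, len(ordered)  # everything before lo boots, everything from hi on fails
    while lo < hi:
        mid = (lo + hi) // 2
        if boots_ok(ordered[mid]):
            lo = mid + 1  # mid works, so the regression is newer
        else:
            hi = mid      # mid fails, so the regression is mid or older
    return ordered[lo] if lo < len(ordered) else None


if __name__ == "__main__":
    tested = ["5.10.9-mvebu64", "5.8.18-mvebu64", "5.8.6-mvebu64",
              "4.18.16-mvebu64", "4.14.135-mvebu64"]
    # Simulated outcome: only kernels older than 4.18 boot with SATA attached
    print(first_bad(tested, lambda img: kernel_key(img) < (4, 18)))
```

The example call only simulates the test results, using the kernel versions from the first post plus the 4.14.135 (working) / 4.18.16 (failing) boundary reported later in this thread.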

 

Update:

@Pali Thank you so much for pointing me to this mailing list. This was exactly what I was looking for. I wrote to them and will report back as soon as there is new information.

Furthermore, I added a post to the EspressoBIN forum, which is currently under review. As soon as it's approved, I'll link the post here.

 

Thanks in advance.

Edited by Rötti
Saw the later post of Pali

Hello @lampra,

 

Thanks a lot for this awesome hint! I had been using this link: https://redirect.armbian.com/region/EU/espressobin, where I found the archived images, but they were removed approx. 2 weeks ago.

Feeling so dumb: the archive button is so big that my brain's built-in ad filter just blanked it out ;-)

 

This is going to help a lot! I'll be able to find out in which exact version the bug was introduced.

 

 


@Rötti Are you able to boot from the onboard SATA? (not from the extension card via MiniPCIe).

I couldn't boot from SATA a year ago, so I gave up and used OpenWrt, but I would prefer to use Armbian (I can't test newer releases as it is installed far away from home). 

 


Hello guys,

 

On 1/27/2021 at 12:22 PM, Pali said:

@Rötti This looks like an AHCI or PCIe issue. Please report the bug to the linux-ide@vger.kernel.org mailing list, where there are more SATA/AHCI developers who could help you debug the issue. It may also be an A3720-related PCIe issue, so send me an email and I can provide you with an A3720 PCIe patch which could fix some stability issues.

 

@Pali I posted the problem to the linux-ide kernel mailing list as proposed, but unfortunately received no answer.
Here is the link: https://www.spinics.net/lists/linux-ide/msg60178.html

 

Furthermore, I was able to narrow down the kernel version and the exact Armbian image version where it broke:
Armbian 19.11.3 with Kernel 4.14.135 <- last version which was working
Armbian      5.65 with Kernel 4.18.16   <- first version which is not working

 

 

On 1/31/2021 at 3:28 AM, lampra said:

@Rötti Are you able to boot from the onboard SATA? (not from the extension card via MiniPCIe).

I couldn't boot from SATA a year ago, so I gave up and used OpenWrt, but I would prefer to use Armbian (I can't test newer releases as it is installed far away from home). 

 

@lampra I'm not booting from SATA. I'm still booting from the SD card, but I get a kernel panic merely by plugging a SATA cable into the PCIe SATA controller. I'm sorry I don't have any further information about your issue.


On 1/31/2021 at 3:28 AM, lampra said:

I couldn't boot from SATA a year ago, so I gave up and used OpenWrt, but I would prefer to use Armbian (I can't test newer releases as it is installed far away from home). 

Do you mean loading the firmware directly from SATA (without NOR)? Or loading the firmware from NOR and only loading the kernel from SATA?

 

Last week I updated the documentation on how to build and store firmware on a SATA disk (it needs a special partition), see: https://trustedfirmware-a.readthedocs.io/en/latest/plat/marvell/armada/build.html (search for "SATA device boot")

 

And last year I tested that U-Boot loaded from NOR is able to access a SATA disk and load the kernel from it without needing a uSD card. If it does not work, post the U-Boot output from the serial console.


On 2/7/2021 at 11:19 PM, Rötti said:

@Pali I posted the problem to the linux-ide kernel mailing list as proposed, but unfortunately received no answer.
Here is the link: https://www.spinics.net/lists/linux-ide/msg60178.html

 

Furthermore, I was able to narrow down the kernel version and the exact Armbian image version where it broke:
Armbian 19.11.3 with Kernel 4.14.135 <- last version which was working
Armbian      5.65 with Kernel 4.18.16   <- first version which is not working

 

I have looked at the email you sent to the mailing list https://lore.kernel.org/linux-ide/cbbb2496501fed013ccbeba524e8d573@posteo.de/T/#u and you did not provide enough information. At least the output of lspci -nn -vv is needed to correctly identify the type of your PCIe SATA controller. Also, the dmesg output for the period between [ 0.000000] and [ 3.694604] is missing. Please provide this information (to the mailing list).
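The `-nn` flag makes lspci print the numeric `[vendor:device]` IDs next to the names, and those IDs are what identify the controller chip regardless of the name on the card's box. A small Python sketch that pulls the IDs out of the non-indented header lines; the class and function names are ours, and the sample input is a made-up example (IDs `abcd:...`), not output from this board:

```python
import re
from typing import List, NamedTuple


class PciDevice(NamedTuple):
    slot: str
    class_code: str
    vendor_id: str
    device_id: str


# Header line shape printed by `lspci -nn`:
#   <slot> <class name> [<class>]: <device name> [<vendor>:<device>] (rev ..)
# The greedy ".*" makes the [vendor:device] group match the last such pair.
HEADER_RE = re.compile(
    r"^(?P<slot>\S+) .*?\[(?P<cls>[0-9a-f]{4})\]: .* "
    r"\[(?P<ven>[0-9a-f]{4}):(?P<dev>[0-9a-f]{4})\]"
)


def parse_lspci_nn(text: str) -> List[PciDevice]:
    devices = []
    for line in text.splitlines():
        if not line or line[0].isspace():
            continue  # -vv detail lines are indented; skip them
        m = HEADER_RE.match(line)
        if m:
            devices.append(PciDevice(m["slot"], m["cls"], m["ven"], m["dev"]))
    return devices


# Hypothetical sample, NOT output from the board in this thread.
SAMPLE = (
    "00:00.0 PCI bridge [0604]: Example Bridge [abcd:0001]\n"
    "01:00.0 SATA controller [0106]: Example SATA Controller [abcd:1234]"
    " (rev 01) (prog-if 01 [AHCI 1.0])\n"
    "\tSubsystem: Example Subsystem [abcd:5678]\n"
)

if __name__ == "__main__":
    for dev in parse_lspci_nn(SAMPLE):
        print(dev)
```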


Hi Pali,

 

Deeply sorry for the long delay, but we were struggling with a Covid-19-related issue within the family.

 

On 2/10/2021 at 1:50 PM, Pali said:

 

I have looked at the email you sent to the mailing list https://lore.kernel.org/linux-ide/cbbb2496501fed013ccbeba524e8d573@posteo.de/T/#u and you did not provide enough information. At least the output of lspci -nn -vv is needed to correctly identify the type of your PCIe SATA controller. Also, the dmesg output for the period between [ 0.000000] and [ 3.694604] is missing. Please provide this information (to the mailing list).

I added the lspci -nn -vv output to the mailing list.

But I could not find the corresponding dmesg output. After unplugging the SATA cable and rebooting, I was able to log in and look at /var/dmesg, but I couldn't find any information from around the time of the crash.

 

On 3/5/2021 at 10:41 PM, Pali said:

@Rötti: Also, please boot the Linux kernel with the console=ttyMV0,115200 earlycon=ar3700_uart,0xd0012000 command line option so that the output on UART contains the full boot log.

 

As you can see in the output below I already have these parameters in the console variable.
Is there a special way to boot with this parameter, or is it automatically used
when I call 'boot' because of 'set_bootargs' which contains 'console' already?

 

Marvell>> printenv
arch=arm
baudrate=115200
board=mvebu_armada-37xx
board_name=mvebu_armada-37xx
boot_a_script=ext4load ${boot_interface} ${devnum}:1 ${scriptaddr} ${prefix}boot.scr;source ${scriptaddr};
boot_prefixes=/ /boot/
boot_targets=usb mmc1 mmc0
bootcmd=for target in ${boot_targets}; do run bootcmd_${target}; done
bootcmd_mmc0=setenv devnum 0; setenv boot_interface mmc; run scan_dev_for_boot;
bootcmd_mmc1=setenv devnum 1; setenv boot_interface mmc; run scan_dev_for_boot;
bootcmd_usb=setenv devnum 0; usb start;setenv boot_interface usb; run scan_dev_for_boot;
bootdelay=2
console=console=ttyMV0,115200 earlycon=ar3700_uart,0xd0012000
cpu=armv8
eth1addr=00:51:82:11:22:01
eth2addr=00:51:82:11:22:02
eth3addr=00:51:82:11:22:03
ethact=neta@30000
ethaddr=00:51:82:11:22:00
ethprime=eth0
extra_params=pci=pcie_bus_safe
fdt_addr=0x6000000
fdt_addr_r=0x6f00000
fdt_high=0xffffffffffffffff
fdt_name=fdt.dtb
fdtcontroladdr=7f62d490
gatewayip=10.4.50.254
get_images=tftpboot $kernel_addr_r $image_name; tftpboot $fdt_addr_r $fdt_name; run get_ramfs
get_ramfs=if test "${ramfs_name}" != "-"; then setenv ramdisk_addr_r 0x8000000; tftpboot $ramdisk_addr_r $ramfs_name; else setenv ramdisk_addr_r -;fi
hostname=marvell
image_name=Image
initrd_addr=0x1100000
initrd_image=uInitrd
initrd_size=0x2000000
ipaddr=0.0.0.0
kernel_addr=0x7000000
kernel_addr_r=0x7000000
loadaddr=0x8000000
netdev=eth0
netmask=255.255.255.0
ramdisk_addr_r=0x8000000
ramfs_name=-
root=root=/dev/nfs rw
rootpath=/srv/nfs/
scan_dev_for_boot=for prefix in ${boot_prefixes}; do echo ${prefix};run boot_a_script; done
scriptaddr=0x6d00000
serverip=0.0.0.0
set_bootargs=setenv bootargs $console $root ip=$ipaddr:$serverip:$gatewayip:$netmask:$hostname:$netdev:none nfsroot=$serverip:$rootpath,tcp,v3 $extra_params $cpuidle
soc=mvebu
stderr=serial@12000
stdin=serial@12000
stdout=serial@12000
vendor=Marvell

Environment size: 1962/65532 bytes

 

I thank you very much for your awesome support!

 

 


14 minutes ago, Rötti said:

As you can see in the output below I already have these parameters in the console variable.
Is there a special way to boot with this parameter, or is it automatically used
when I call 'boot' because of 'set_bootargs' which contains 'console' already?

 

If you call the 'boot' command, it executes the 'bootcmd' variable. And if you trace 'bootcmd' through your printenv output, it becomes clear that 'set_bootargs' is not called on this path.
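That trace can be done mechanically: follow every `run` from `bootcmd` through the printenv output. A rough Python sketch, assuming only the two constructs this environment uses (`run <var>` and `for x in ${list}; do ...; done`); it does not interpret `if`, `setenv` side effects, or the external boot.scr that `source` loads, so it only shows what the environment itself can reach. The function names are ours:

```python
import re
from typing import Dict, List

# Subset of the printenv output from the previous post (values copied as shown).
ENV: Dict[str, str] = {
    "boot_targets": "usb mmc1 mmc0",
    "boot_prefixes": "/ /boot/",
    "bootcmd": "for target in ${boot_targets}; do run bootcmd_${target}; done",
    "bootcmd_mmc0": "setenv devnum 0; setenv boot_interface mmc; run scan_dev_for_boot;",
    "bootcmd_mmc1": "setenv devnum 1; setenv boot_interface mmc; run scan_dev_for_boot;",
    "bootcmd_usb": "setenv devnum 0; usb start;setenv boot_interface usb; run scan_dev_for_boot;",
    "scan_dev_for_boot": "for prefix in ${boot_prefixes}; do echo ${prefix};run boot_a_script; done",
    "boot_a_script": "ext4load ${boot_interface} ${devnum}:1 ${scriptaddr} ${prefix}boot.scr;source ${scriptaddr};",
    "set_bootargs": "setenv bootargs $console $root ip=$ipaddr:$serverip:$gatewayip:$netmask:$hostname:$netdev:none nfsroot=$serverip:$rootpath,tcp,v3 $extra_params $cpuidle",
}

FOR_RE = re.compile(r"for\s+(\w+)\s+in\s+([^;]+);")
RUN_RE = re.compile(r"\brun\s+([^;\s]+)")
VAR_RE = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def expand(text: str, env: Dict[str, str], loop_vars: Dict[str, str]) -> str:
    """Substitute ${name} / $name from loop variables first, then the env."""
    def sub(m):
        name = m.group(1) or m.group(2)
        return loop_vars.get(name, env.get(name, ""))
    return VAR_RE.sub(sub, text)


def reachable(env: Dict[str, str], start: str = "bootcmd") -> List[str]:
    """Env variables reachable from `start` through 'run', unrolling for-loops."""
    seen: List[str] = []
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen or name not in env:
            continue
        seen.append(name)
        script = env[name]
        # e.g. {"target": ["usb", "mmc1", "mmc0"]}
        loops = {var: expand(vals, env, {}).split()
                 for var, vals in FOR_RE.findall(script)}
        bindings = [{var: v} for var, vals in loops.items() for v in vals] or [{}]
        for target in RUN_RE.findall(script):
            for binding in bindings:
                stack.append(expand(target, env, binding))
    return seen


if __name__ == "__main__":
    found = reachable(ENV)
    print(sorted(found))
    print("set_bootargs reached:", "set_bootargs" in found)
```

Everything reachable ends in `boot_a_script`, whose `source ${scriptaddr}` hands control to the external script, which is exactly where the next step of the investigation has to continue.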

 

It seems that your 'bootcmd' ends in the 'boot_a_script' variable, which loads an external boot script (from the uSD card?), and that script boots the kernel. The script can do anything, including setting new variables, so it may or may not put 'console' into 'bootargs'. You need to investigate it.

 

You could try unsetting 'console' (i.e. booting without console=ttyMV0,115200); maybe it helps. For recent kernels, this console parameter should not be needed.
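After a successful boot (e.g. with the SATA disk unplugged), the command line the kernel actually received can be read from `/proc/cmdline`, which shows whether the boot script passed `console=`/`earlycon=` at all. A minimal Python sketch; the `shlex`-based splitting only approximates the kernel's own parser, and the function name is ours:

```python
import shlex
from typing import Dict, List, Optional


def parse_cmdline(cmdline: str) -> Dict[str, List[Optional[str]]]:
    """Split a kernel command line into {key: [values]}.

    Keys may repeat (console= often does), so values are kept in a list;
    bare flags such as 'rw' map to [None]. shlex only approximates the
    kernel's own quoting rules.
    """
    params: Dict[str, List[Optional[str]]] = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        params.setdefault(key, []).append(value if sep else None)
    return params


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            params = parse_cmdline(f.read())
    except OSError:
        print("/proc/cmdline is not available on this system")
    else:
        for key in ("console", "earlycon"):
            print(key, "->", params.get(key, "not set"))
```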


This is an issue in the ASMedia SATA controller card, not in the Espressobin PCIe. The card announces support for 512-byte PCIe payloads, but when the PCIe controller is configured for such a large payload size, the card causes a system crash. We have reproduced this issue on another platform too.

 

Marek sent a kernel patch which adds a quirk for this ASMedia SATA controller that limits the maximum payload size to 256 bytes https://lore.kernel.org/linux-pci/20210317115924.31885-1-kabel@kernel.org/T/#u, which should work around this issue.
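PCIe encodes Max_Payload_Size as a 3-bit field where the value n stands for 128 << n bytes, so the only legal sizes are powers of two from 128 to 4096 bytes, and 256 is the step directly below the 512 bytes the card advertises. A simplified Python sketch of that encoding and of the idea behind such a quirk (capping one device's payload size). The function names are ours, and `pick_mps` is an illustration, not how the kernel actually computes MPS:

```python
from typing import Optional

# PCIe Max_Payload_Size encoding, shared by Device Capabilities bits [2:0]
# (largest size supported) and Device Control bits [7:5] (size in use):
#   0 -> 128 B, 1 -> 256 B, 2 -> 512 B, 3 -> 1024 B, 4 -> 2048 B, 5 -> 4096 B
# Encodings 6 and 7 are reserved.


def mps_bytes(field: int) -> int:
    """Decode the 3-bit MPS field into a size in bytes."""
    if not 0 <= field <= 5:
        raise ValueError(f"reserved MPS encoding: {field}")
    return 128 << field


def mps_field(size: int) -> int:
    """Encode a payload size in bytes; only powers of two from 128 to 4096 exist."""
    if not 128 <= size <= 4096 or size & (size - 1):
        raise ValueError(f"not a valid PCIe MPS: {size}")
    return size.bit_length() - 8  # 128 == 2**7 has bit_length 8 -> field 0


def pick_mps(device_max: int, link_partner_max: int,
             quirk_limit: Optional[int] = None) -> int:
    """Simplified choice: the smaller of both ends, optionally capped by a quirk.

    Not the kernel's actual algorithm (that depends on the pci= MPS policy);
    it only illustrates why capping one device at 256 B avoids 512 B payloads.
    """
    size = min(device_max, link_partner_max)
    if quirk_limit is not None:
        size = min(size, quirk_limit)
    mps_field(size)  # validate: raises ValueError for impossible sizes
    return size


if __name__ == "__main__":
    # Hypothetical: card advertises 512 B and the host side also allows 512 B.
    print(pick_mps(512, 512))                   # without a quirk
    print(pick_mps(512, 512, quirk_limit=256))  # with a 256 B cap
```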


Hello Pali,

 

As I said on the kernel mailing list: thanks to you and Marek.

What I'd like to ask is: what do the workflow and the next steps look like?

 

I mean, how long does it take to get the patch into the kernel (weeks, months)?

How likely is it to be rejected?

As soon as there is a new kernel, will there be any nightly builds from the Armbian side?

Are those changes flowing into Armbian directly or are there backports?

 

Thank you in advance.


2 minutes ago, Rötti said:

I mean, how long does it take to get the patch into the kernel (weeks, months)?

 

It would be merged either into the next -rc version or into the next mainline version (depending on how the maintainers decide). See https://www.kernel.org/ for the currently released versions, and see https://www.kernel.org/category/faq.html for the question "When will the next kernel be released?"

 

After it is merged into an -rc or mainline version, this patch (because it is marked as a bugfix) would automatically also be included in all longterm versions.

 

2 minutes ago, Rötti said:

How likely is it to be rejected?

 

Unlikely. If it is rejected, that would mean the patch needs to be updated (to fix issues), and Marek or I will do it.

 

For the rest of the Armbian-related questions, ask the Armbian people.

 


The new patch got into the current branch. I compiled Armbian with:

sudo ./compile.sh BOARD=espressobin BRANCH=current RELEASE=focal BUILD_MINIMAL=no BUILD_DESKTOP=no KERNEL_ONLY=no KERNEL_CONFIGURE=no COMPRESS_OUTPUTIMAGE=sha,gpg,img

 

And I can confirm that the bug is gone now and the ASMedia-based SATA controller chips are working again.

@Pali Thank you for organizing the patch.

 

@Werner Thank you for your support and help getting the patch into Armbian.
