
How to make ESPRESSObin v7 stable?


Anders


> We need to identify the patch Armbian is missing ...

I thought that would be simple. It turns out it is not. How to build 4.4.52 is documented at http://wiki.espressobin.net/tiki-index.php?page=Build+From+Source+-+Kernel, but that code is vastly different from what's in mainline Linux.

For example, this patch sounds interesting:

commit 6640985b0697f08d5106e6c4cd66dc61ec5e6a2d
Author: Victor Gu <xigu@marvell.com>
Date:   Wed Sep 20 10:00:49 2017 +0800

    fix: regulator: armada-37xx: overwrite CPU voltage values in 1000MHZ
    
    The original CPU voltage values from load 1 to load 3 are too low for
    EspressoBin board with Armada-37xx SoC when CPU is 1000MHZ, which leads
    to instability that CPU gets stuck soon during dynamic voltage scaling.
    In order to fix this issue, this patch adds the compatible string for
    EspressoBin AVS, and update the CPU voltage values from load 1 to load 3
    in 1000MHZ mode accordingly, the value is updated from original 1.05v
    to 1.108v.
    
    Change-Id: Iae22cb3bb243b3345e7426e859313139637f09e7
    Signed-off-by: Victor Gu <xigu@marvell.com>

diff --git a/Documentation/devicetree/bindings/regulator/armada3700-regulator.txt b/Documentation/devicetree/bindings/regulator/armada3700-regulator.txt
index 7ed7a619..5a853dd6 100644
--- a/Documentation/devicetree/bindings/regulator/armada3700-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/armada3700-regulator.txt
@@ -1,7 +1,7 @@
 Marvell Armada 3700 voltage regulator by AVS
 
 Required properties:
-- compatible: must be "marvell,armada-3700-avs"
+- compatible: must be "marvell,armada-3700-avs" or "marvell,armada-3700-espressobin-avs"
 - reg: avs register address, which is used to control CPU voltage
 - clocks: CPU core clock to get the MAX CPU frequency
 - any required generic properties defined in regulator.txt
diff --git a/drivers/regulator/armada-37xx-regulator.c b/drivers/regulator/armada-37xx-regulator.c
index bd3b9504..1185f6a5 100644
--- a/drivers/regulator/armada-37xx-regulator.c
+++ b/drivers/regulator/armada-37xx-regulator.c
@@ -274,6 +274,7 @@ static int armada_3700_avs_probe(struct platform_device *pdev)
        struct regulator_config config = { };
        struct regulator_dev *rdev;
        u32 max_cpu_freq;
+       int idx;
 
        avs = devm_kzalloc(&pdev->dev, sizeof(*avs), GFP_KERNEL);
        if (!avs) {
@@ -315,6 +316,13 @@ static int armada_3700_avs_probe(struct platform_device *pdev)
                avs->freq_level = CPU_FREQ_LEVEL_800MHZ;
        } else if (max_cpu_freq == CPU_FREQ_1000MHZ) {
                avs->freq_level = CPU_FREQ_LEVEL_1000MHZ;
+               /*
+                * Overwrite the VDD values from load 1 to load 3 in 1000MHZ
+                * for EspressoBin, otherwize the CPU gets stuck.
+                */
+               if (of_device_is_compatible(np, "marvell,armada-3700-espressobin-avs"))
+                       for (idx = VDD_SET1; idx <= VDD_SET3; idx++)
+                               voltage_m_tbl[avs->freq_level][idx] = 1108;
        } else if (max_cpu_freq == CPU_FREQ_1200MHZ) {
                avs->freq_level = CPU_FREQ_LEVEL_1200MHZ;
        } else {
@@ -399,6 +407,7 @@ static const struct dev_pm_ops armada_3700_avs_pm_ops = {
 
 static const struct of_device_id armada3700_avs_of_match[] = {
        { .compatible = "marvell,armada-3700-avs", },
+       { .compatible = "marvell,armada-3700-espressobin-avs", },
        {}
 };

But the file drivers/regulator/armada-37xx-regulator.c is nowhere to be found in mainline or espressobin kernels.
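For reference, the core of what this patch does fits in a few lines of plain C. This is a minimal user-space sketch, not the driver code. It assumes one row of the AVS voltage table (the 1000 MHz row, in millivolts, indexed by load level, as voltage_m_tbl[avs->freq_level][idx] in the diff suggests). The names apply_espressobin_override, VDD_LOAD_COUNT and ESPRESSOBIN_MIN_MV are hypothetical. The only values taken from the commit are the original 1050 mV and the new 1108 mV.

```c
#include <stdbool.h>
#include <string.h>

/* Load levels as used by the patch: VDD_SET0 is the full-speed level and
 * VDD_SET1..VDD_SET3 are the slower levels. The numbering mirrors the diff;
 * the constants themselves are a hypothetical re-creation. */
enum { VDD_SET0, VDD_SET1, VDD_SET2, VDD_SET3, VDD_LOAD_COUNT };

#define ESPRESSOBIN_MIN_MV 1108 /* value written by the patch (1.108 V) */

/* Sketch of the override: when the AVS node uses the EspressoBin-specific
 * compatible string, raise the load 1..3 entries of the 1000 MHz row to
 * 1108 mV. Returns true if the row was modified. */
bool apply_espressobin_override(int row_mv[VDD_LOAD_COUNT], const char *compatible)
{
    if (strcmp(compatible, "marvell,armada-3700-espressobin-avs") != 0)
        return false;
    for (int idx = VDD_SET1; idx <= VDD_SET3; idx++)
        row_mv[idx] = ESPRESSOBIN_MIN_MV;
    return true;
}
```

One simplification: the real driver calls of_device_is_compatible(), which matches any entry in the node's compatible list, whereas strcmp() here compares against a single string.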

 

 

Also, I just tried building Linux 5.2.0-rc4-next-20190614. It runs fine and reports 1 GHz, but 7-Zip still reports 800 MHz.
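The two numbers come from different places: the cpufreq figure is what the kernel believes it has set, while 7-Zip times its own work on the CPU. Here is a minimal sketch for reading the kernel's side. It assumes the standard cpufreq sysfs attributes, which report kHz: scaling_cur_freq, and cpuinfo_cur_freq (the latter is usually readable only by root). The helper name read_cpufreq_mhz is hypothetical.

```c
#include <stdio.h>

/* Reads a cpufreq sysfs attribute (value in kHz) and returns it in MHz,
 * or -1 if the file cannot be opened or does not start with a number. */
long read_cpufreq_mhz(const char *path)
{
    FILE *f = fopen(path, "r");
    long khz;
    if (f == NULL)
        return -1;
    int ok = fscanf(f, "%ld", &khz) == 1;
    fclose(f);
    return ok ? khz / 1000 : -1;
}
```

On the board you would pass /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq. If that reads 1000 while 7-Zip measures about 800, the CPU is probably not actually running at the frequency cpufreq reports.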

Edited by Anders
Remove irrelevant bug. Linux 5.2.0-rc4-next-20190614.

1 hour ago, ebin-dev said:

Did you try this with the 800_800 or 600_600 bootloader?

With 800_800 it works fine so far. There are some strange messages in dmesg, though: http://ix.io/1KDu

They may be genuine bugs in linux-next (including the "Kernel stack is corrupted in: write_irq_affinity" error): https://lkml.org/lkml/2019/5/31/761

 

Do you want me to test 600_600 as well?

Edited by Anders
Add link to mailing list

On 6/1/2019 at 4:25 PM, ebin-dev said:

 

No thanks, that's fine so far.

If you have some time for further tests on your V7 EspressoBin, you could switch to kernel 4.14.79 (the current kernel provided by Marvell; the cpufreq patch is not applied): in 'armbian-config', switch to stable builds (System -> Stable) and, once that has finished, switch from next to other kernels (System -> Other kernels -> 4.14.79).

Edit: Marvell's kernel 4.14.79 is an option; it is stable at least on V5 EspressoBins with bootloader 1000_800. But the real CPU frequency reported by 7-Zip is also only 800 MHz instead of 1000 MHz (the same behaviour as kernels 4.19.4x and 5.x).


Here's an update:

I've been running Armbian with nightly kernels (currently 4.19.50-mvebu64) and the flash-image-ddr4-1g-1cs-800_800-2019-05-21.bin U-Boot, and it has been entirely stable for some time now. The only two remaining problems are:

1) U-Boot sometimes gets stuck. I just reset the board until it boots.

2) With flash-image-ddr4-1g-1cs-1000_800-2019-05-21.bin I'm not getting 1 GHz (also tested with 5.2.0-rc4-next-20190614).

To help address the last problem, I just posted in the Espressobin forums: http://espressobin.net/forums/topic/mainline-linux-not-running-at-1ghz/

UPDATE: wtf, espressobin deleted that thread from their forum. Here's a repost.

 

Hopefully someone is able to reach out to Marvell and make them fix the bug.

 

Thanks for all the help so far :)

Edited by Anders
espressobin.net deleted my thread. Update link.

Just to let you know: I sent back the V7 and got a refund (I had ordered it from Amazon, so that wasn't an issue), then ordered a V5 directly from Globalscale, which arrived quickly. It's running Armbian at a conservative 800 MHz for now (I am more interested in stability than raw speed) and seems stable (37 h uptime so far), giving me good speeds of almost 80 MB/s over Gigabit Ethernet with an old SATA HDD. So I guess this will be the prototype for my Simple Home Server start-up!


So, guys, I see here that you also have a problem with the boot getting stuck very early in the process.

My V7 board gets stuck here:

TIM-1.0
WTMI-devel-18.12.1-e6bb176
WTMI: system early-init
SVC REV: 5, CPU VDD voltage: 1.155V

 

more often than not (I'd say there's about a 70% chance of getting stuck here). Once it boots, it is relatively stable with Buster.
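When comparing boards, it helps to pull the WTMI voltage out of a serial log rather than eyeball it: this log shows 1.155V, while other boards in this thread report 1.050V. Here is a small sketch. It assumes the line always has the form "CPU VDD voltage: <volts>V", as in the logs quoted in this thread. The helper parse_vdd_mv is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Extracts the voltage from a WTMI boot line such as
 * "SVC REV: 5, CPU VDD voltage: 1.155V" and returns it in millivolts,
 * or -1 if the line does not contain the expected text. */
int parse_vdd_mv(const char *line)
{
    const char *key = "CPU VDD voltage:";
    const char *p = strstr(line, key);
    double volts;
    if (p == NULL || sscanf(p + strlen(key), "%lf", &volts) != 1)
        return -1;
    return (int)(volts * 1000.0 + 0.5); /* round to the nearest mV */
}
```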

How did you solve this issue?

 

Also, I have the same problem of the CPU not running faster than 800 MHz.


Not sure if this is related, but the Netgate team is seeing an unusually high rate of instability failures on the SG-1100, which is a rebranded EspressoBin. I think they say this is related to the power system on the board and caused by a problem with a supplier's component. It seems possible that the same supply-chain issue has affected the EspressoBin.

 

https://forum.netgate.com/topic/144636/sg-1100-intermittent-reboots


I've been looking at something related to this: recompiling and modifying atf/A3700-marvell-utils to build a viable U-Boot. From what I've figured out so far, it seems the voltage (VDD) applied by Marvell is too low by default.

Does anyone know where the modification to ddr4-*cs-/*g.txt from http://wiki.espressobin.net/tiki-download_file.php?fileId=216 came from? Namely, the following change:

+;Step9: DDRPHY Driver/Receiver & DQS internal Pullup/Pulldown settings
+;WRITE: 0xC0001004 0xD0133449

+WRITE: 0xC0001004 0xD0677449

 

That is the only difference between the EspressoBin-provided DDR init and the mainline Marvell A3700 DDR init code.
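To see exactly which bits that change touches, XOR the old and new values. Here is a minimal sketch with helper names of my own. It only identifies the changed bit positions. It says nothing about what those DDRPHY fields mean, since the register layout isn't given here.

```c
#include <stdint.h>

/* Bits that differ between two 32-bit register values. */
uint32_t changed_bits(uint32_t before, uint32_t after)
{
    return before ^ after;
}

/* Writes the indices of the set bits in mask (lowest first) into out,
 * which must hold 32 entries, and returns how many were written. */
int set_bit_indices(uint32_t mask, int out[32])
{
    int n = 0;
    for (int bit = 0; bit < 32; bit++)
        if (mask & (UINT32_C(1) << bit))
            out[n++] = bit;
    return n;
}
```

For 0xD0133449 versus 0xD0677449, the XOR is 0x00744000, which means bits 14, 18, 20, 21 and 22 changed.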

At the moment I'm running an EspressoBin V7 1000/800 using WTMI-devel-18.12.1-e6bb176 / 2018.03-devel-18.12.3-gc9aa92ce70, with a modification that bumps the default AVS voltage from 1.032 V to 1.155 V. So far this looks more stable.

If anyone wants the resulting builds, let me know. I cannot use the latest WTMI-devel-18.12.1-e6bb176 builds provided at https://dl.armbian.com/espressobin/u-boot/, since they randomly fail to boot. I rebuilt with debugging enabled, and so far there have been no stalls.


My V7 arrived working fine. Then, over time, I started hitting these random reboots.

I tried slower bootloaders and the latest Armbian, with no luck.

One interesting thing I found is that after not touching it for several months, it started up and worked OK. I got it to boot into the OS, and then after a few minutes it reset again. The time it stayed up seemed to grow shorter and shorter.

I left it unplugged overnight and tried powering it up in the morning, and now it will not even get through the init sequence to a console prompt.

The 470 µF/16 V capacitor SEEMS fine (no evidence of leaking).

Any ideas?

[  OK  ] Started Create Volatile Files and Directories.
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Reached target System Time Synchronized.
[  OK  ] Started Entropy daemon using the HAVEGE algorithm.
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[  OK  ] Found device /dev/ttyMV0.
[  OK  ] Started Raise network interfaces.
[  OK  ] St
TIM-1.0
WTMI-devel-18.12.1-e6bb176
WTMI: system early-init
SVC REV: 5, CPU VDD voltage: 1.050V
NOTICE:  Booting Trusted Firmware
NOTICE:  BL1: v1.5(release):1f8ca7e (Marvell-devel-18.12.2)
NOTICE:  BL1: Built : 16:25:52, May 21 2019
NOTICE:  BL1: Booting BL2
NOTICE:  BL2: v1.5(release):1f8ca7e (Marvell-devel-18.12.2)
NOTICE:  BL2: Built : 16:25:53, May 21 2019
NOTICE:  BL1: Booting BL31
NOTICE:  BL31: v1.5(release):1f8ca7e (Marvell-devel-18.12.2)
NOTICE:  BL31: Built : 16:25:56, May

U-Boot 2018.03-devel-18.12.3-gc9aa92c-armbian (Feb 20 2019 - 09:45:04 +0100)

Model: Marvell Armada 3720 Community Board ESPRESSOBin
       CPU     600 [MHz]
       L2      800 [MHz]
       TClock  200 [MHz]
       DDR     600 [MHz]
DRAM:  1 GiB
Comphy chip #0:
Comphy-0: USB3          5 Gbps    
Comphy-1: PEX0          2.5 Gbps  
Comphy-2: SATA0         6 Gbps    
SATA link 0 timeout.
AHCI 0001.0300 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
flags: ncq led only pmp fbss pio slum part sxs
PCIE-0: Link down
MMC:   sdhci@d0000: 0, sdhci@d8000: 1
Loading Environment from SPI Flash... SF: Detected mx25u3235f with page size 256 Bytes, erase size 64 KiB, total 4 MiB
OK
Model: Marvell Armada 3720 Community Board ESPRESSOBin
Net:   eth0: neta@30000 [PRIME]
Hit any key to stop autoboot:  0
starting USB...
USB0:   Register 2000104 NbrPorts 2
Starting the controller
USB XHCI 1.00

 


Quote

One interesting thing I found is that after not touching it for several months, it started up and worked OK. I got it to boot into the OS, and then after a few minutes it reset again. The time it stayed up seemed to grow shorter and shorter.

I left it unplugged overnight and tried powering it up in the morning, and now it will not even get through the init sequence to a console prompt.

That's exactly the problem I was having. The capacitor only leaks under certain conditions, and by "leak" I don't mean leaking electrolyte or bulging, but leaking from V through to GND when the temperature changes.

Yes, I noticed the same thing happening to me: over time the problem got worse, presumably as the capacitor degraded further.

I would desolder the capacitor and replace it with a like-for-like part; I'd bet that would fix your board.

I've told GlobalScale how I fixed it, but they haven't responded to confirm the defective component, and I don't expect them to.


10 hours ago, mu-b said:

That's exactly the problem I was having. The capacitor only leaks under certain conditions, and by "leak" I don't mean leaking electrolyte or bulging, but leaking from V through to GND when the temperature changes.

I tried putting the board in the fridge to bring the overall temperature down and then booting it. This did not seem to help at all, and the reboot happened before the kernel had even fully initialized.

I'm worried this is not my issue; I wish there were a way to confirm it.

But I'm willing to try replacing the capacitor, although it doesn't look like an easy replacement. Any tips?


Quote

I tried putting the board in the fridge to bring the overall temperature down and then booting it. This did not seem to help at all, and the reboot happened before the kernel had even fully initialized.

Yes, I had a fan on the device, and it made it reboot continuously and more frequently (often before the device had a chance to complete even TIM/WTMI/U-Boot, which rules out the Linux kernel).

It's not that difficult. If you have any experience with a soldering iron, it'll take 10-15 minutes going slowly; the hardest part is desoldering the original component. YouTube is your friend here, with plenty of tips on "decapping".

On another note, the device completed 2 threads of 'memtester 400 128' alongside 'stress --cpu 2' and has now been up for 3 days without a hiccup.

Link to comment
Share on other sites

On 10/7/2019 at 4:04 AM, mu-b said:

Yes, I had a fan on the device, and it made it reboot continuously and more frequently (often before the device had a chance to complete even TIM/WTMI/U-Boot, which rules out the Linux kernel).

It's not that difficult. If you have any experience with a soldering iron, it'll take 10-15 minutes going slowly; the hardest part is desoldering the original component. YouTube is your friend here, with plenty of tips on "decapping".

On another note, the device completed 2 threads of 'memtester 400 128' alongside 'stress --cpu 2' and has now been up for 3 days without a hiccup.

So I'm waiting for the correct capacitor to arrive (470 µF, 16 V, 108C), but I tried booting the board without it (since it was really just a filter anyway), and the same thing happened.

I also tried with a 470 µF 25 V 85C one jammed in, and it still rebooted.

So it doesn't look like this will work :/

EC1 is off the 12 V rail, in the leftmost column.


Yes, it seems the instability is still there, although it was stable for 4 days straight.

I'm still asking GlobalScale to explain the faulty part. With any luck they will respond. I can't see why they wouldn't: the device is a "community board", so they rely on the community for support, and it would be rather hypocritical to refuse to share what they know about the hardware with that community.

What version number do you have?

It's the small text at the bottom of the case, on the right-hand side between the two barcodes.

I've seen two: 0702 and 0710.

On the 0702, the whole bottom is one big heatsink.

The 0710 has 3 small heatsinks and a bottom full of holes.

The capacitor also seems to differ between the two.

I also noticed that the board does not stay powered when only the USB console cable is plugged in (the V5 and the earlier V7 would both show power but not actually boot).

So they must have reworked some of the power circuitry on the new board!

I have a 0702 board (it doesn't boot from USB power alone, and it sometimes gets stuck at the CPU VDD line during boot), and I also started experiencing this reboot issue about a month ago. Before that it worked happily for months without any uptime issues, even on rather hot days (~40 °C). The onset may have coincided with the room it's in getting relatively chilly at night (~5 °C).

@mu-b, you seem to have put an amazing effort into investigating this. Thank you!

Have you been able to:

  1. Get a reply from GlobalScale on this matter?
  2. Increase the voltage with a custom U-Boot? My board also reports "CPU VDD voltage: 1.050V".

Is there anything I could try to help with?

I'm away at the moment and cannot get the version number from the board. I suspect the dodgy board is the older revision, as I have two of the newer ones and their capacitor is a different brand.

- GlobalScale didn't reply, though they know of the problem and essentially admit it.

- Yes, I recompiled a modified TIM/WTMI that changes the AVS settings at boot and raises the voltage slightly, in increments. I can supply these builds for you. I originally did this to get the latest versions of everything onto the board, as the Armbian-built versions caused random failures/lock-ups in TIM/WTMI; my builds have never done this (I suspect differences in compiler version).

Hello! We found out that this stability issue is not specific to the EspressoBin V7 but also affects other boards with the Armada 3720 SoC. The patch from Victor Gu mentioned above, which sets the minimum voltage to 1.108 V for 500 MHz and lower speeds, seems to fix this issue.
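To connect that with the commit: on a 1000 MHz board, the cpufreq-info output later in this thread lists 1000, 500, 250 and 200 MHz. Those are 1000 MHz divided by 1, 2, 4 and 5, so "load 1 to load 3" in Victor Gu's commit are the 500, 250 and 200 MHz steps. Here is a small sketch of that mapping. The divider table is inferred from that output rather than copied from the driver, and the function names are hypothetical.

```c
#include <stdbool.h>

#define NUM_LOADS 4

/* Dividers inferred from the cpufreq-info output on a 1000 MHz board
 * (1000, 500, 250, 200 MHz). Load 0 is the full-speed level. */
static const int dividers_1000mhz[NUM_LOADS] = { 1, 2, 4, 5 };

/* Frequency in MHz of a load level on a 1000 MHz board, or -1 if the
 * load index is out of range. */
int load_freq_mhz(int load)
{
    if (load < 0 || load >= NUM_LOADS)
        return -1;
    return 1000 / dividers_1000mhz[load];
}

/* True for the load levels whose voltage the patch raises to 1108 mV. */
bool gets_1108mv_floor(int load)
{
    return load >= 1 && load <= 3;
}
```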

 

Marek has prepared new patches for the mainline kernel, which include Victor's patch and other fixes for the Armada 3720 cpufreq driver. See:

https://lore.kernel.org/linux-arm-kernel/20201009125711.0176752a@kernel.org/

Could you please test whether these patches finally fix your instability issues on the EspressoBin V7 as well?

Hi @Pali

 

Thank you for looking at this problem!

 

I updated my u-boot to flash-image-ddr4-1g-1cs-1000_800-2021-01-03.bin (sha256: e3a9d9605d5a9ad1ff848985c18b1ce41c2dddfffcc8f8364f2d57d833e652bb), and built your kernel like this:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/pali/linux.git --depth 1 -b a3720-cpufreq-issues
$ export ARCH=arm64
$ export CROSS_COMPILE=aarch64-linux-gnu-
$ curl https://raw.githubusercontent.com/armbian/build/master/config/kernel/linux-mvebu64-current.config > .config
$ make oldconfig    # picked the default for every new option
$ make -j $(nproc)
$ cp arch/arm64/boot/Image /mnt/sdcard/boot/
$ cp arch/arm64/boot/dts/marvell/armada-3720-espressobin.dtb /mnt/sdcard/boot/dtb/marvell/
$ make modules_install INSTALL_MOD_PATH=/mnt/sdcard/

 

This is the result:

root@espressobin:~# uname -a
Linux espressobin 5.11.0-rc1+ #1 SMP PREEMPT Sun Jan 31 23:35:17 CET 2021 aarch64 aarch64 aarch64 GNU/Linux
root@espressobin:~# cpufreq-info 
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to cpufreq@vger.kernel.org, please.
analyzing CPU 0:
  driver: cpufreq-dt
  CPUs which run at the same hardware frequency: 0 1
  CPUs which need to have their frequency coordinated by software: 0 1
  maximum transition latency: 0.97 ms.
  hardware limits: 200 MHz - 1000 MHz
  available frequency steps: 200 MHz, 250 MHz, 500 MHz, 1000 MHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, performance, schedutil
  current policy: frequency should be within 200 MHz and 1000 MHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 1000 MHz (asserted by call to hardware).
  cpufreq stats: 200 MHz:34.71%, 250 MHz:13.20%, 500 MHz:6.26%, 1000 MHz:45.83%  (2015)
analyzing CPU 1:
  driver: cpufreq-dt
  CPUs which run at the same hardware frequency: 0 1
  CPUs which need to have their frequency coordinated by software: 0 1
  maximum transition latency: 0.97 ms.
  hardware limits: 200 MHz - 1000 MHz
  available frequency steps: 200 MHz, 250 MHz, 500 MHz, 1000 MHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, performance, schedutil
  current policy: frequency should be within 200 MHz and 1000 MHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 500 MHz (asserted by call to hardware).
  cpufreq stats: 200 MHz:34.71%, 250 MHz:13.20%, 500 MHz:6.26%, 1000 MHz:45.83%  (2015)
  
root@espressobin:~# 7za b

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,2 CPUs LE)

LE
CPU Freq:   974   997   993   996   997   997   996   997

RAM size:     983 MB,  # CPU hardware threads:   2
RAM usage:    441 MB,  # Benchmark threads:      2

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:        879   153    560    856  |      21904   199    941   1870
23:        871   154    576    888  |      21518   199    938   1863
24:        863   154    603    928  |      21147   199    935   1857
25:        857   154    634    979  |      20885   199    934   1859
----------------------------------  | ------------------------------
Avr:             154    593    913  |              199    937   1862
Tot:             176    765   1387
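The "CPU Freq:" line is the figure this thread has been using as the reality check. As far as I understand, 7-Zip estimates it by timing its own loop rather than asking cpufreq. Here is a small sketch that averages those samples from a captured log line. The helper avg_7zip_cpu_freq is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Parses the "CPU Freq:" line printed by `7za b` and returns the average of
 * its samples in MHz, or -1.0 if no samples are found. */
double avg_7zip_cpu_freq(const char *line)
{
    const char *p = strstr(line, "CPU Freq:");
    if (p == NULL)
        return -1.0;
    p += strlen("CPU Freq:");
    long sum = 0;
    int count = 0;
    for (;;) {
        char *end;
        long v = strtol(p, &end, 10); /* skips leading whitespace */
        if (end == p)
            break; /* no more numbers on the line */
        sum += v;
        count++;
        p = end;
    }
    return count > 0 ? (double)sum / count : -1.0;
}
```

Fed the line above, it gives roughly 993 MHz, compared with the ~800 MHz seen earlier in the thread with the older kernels.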

 

Yay! 1 GHz is finally working and stable on the EspressoBin V7!

And the new U-Boot seems to boot every time!

This is great, thanks! Time to push to mainline Linux, or do you want me to run more tests?

3 minutes ago, Anders said:

Yay! 1 GHz is finally working and stable on the EspressoBin V7!

And the new U-Boot seems to boot every time!

This is great, thanks! Time to push to mainline Linux, or do you want me to run more tests?

Perfect! Thank you for testing. Would you mind replying to that mailing-list email with a "Tested-by: your name" line? If it is stable on your board, I think no other tests are needed. You may want to enable the ondemand governor and leave the board running under normal conditions for a longer time, to check that you do not see any other issues.
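On the board, switching governors is normally a single write as root to the cpufreq policy's scaling_governor file. For completeness, here is a minimal C sketch that writes the governor and reads it back to confirm the change took effect. The path in the usage note assumes the usual cpufreq sysfs layout, with both cores sharing one policy as the cpufreq-info output above shows. The helper set_governor is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Writes a governor name to a cpufreq scaling_governor file and reads it back.
 * Returns 0 on success, -1 on any I/O error or if the value did not stick. */
int set_governor(const char *path, const char *governor)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int write_ok = fprintf(f, "%s\n", governor) > 0;
    if (fclose(f) != 0 || !write_ok) /* sysfs rejects bad names on flush */
        return -1;

    char buf[64] = "";
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int read_ok = fgets(buf, sizeof buf, f) != NULL;
    fclose(f);
    if (!read_ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
    return strcmp(buf, governor) == 0 ? 0 : -1;
}
```

For example, set_governor("/sys/devices/system/cpu/cpufreq/policy0/scaling_governor", "ondemand"), run as root.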

Link to comment
Share on other sites

On 2/1/2021 at 1:09 AM, Pali said:

 

Perfect! Thank you for testing. Would you mind replying to that mailing-list email with a "Tested-by: your name" line? If it is stable on your board, I think no other tests are needed. You may want to enable the ondemand governor and leave the board running under normal conditions for a longer time, to check that you do not see any other issues.

Done! https://lore.kernel.org/linux-arm-kernel/cf766197-666f-3d7d-3b9e-ba512619004e@gmail.com/
