Anders

How to make ESPRESSObin v7 stable?


OK -  kernel 4.4.52-armada-17.10.4-g719fc86-dirty  has some other known (dts) issues.

 

We need to identify the patch Armbian is missing ...


> We need to identify the patch Armbian is missing ...

I thought that would be simple. Turns out it is not. How to build 4.4.52 is documented here http://wiki.espressobin.net/tiki-index.php?page=Build+From+Source+-+Kernel but the code is vastly different from what's in mainline Linux.

For example, this patch sounds interesting:

commit 6640985b0697f08d5106e6c4cd66dc61ec5e6a2d
Author: Victor Gu <xigu@marvell.com>
Date:   Wed Sep 20 10:00:49 2017 +0800

    fix: regulator: armada-37xx: overwrite CPU voltage values in 1000MHZ
    
    The original CPU voltage values from load 1 to load 3 are too low for
    EspressoBin board with Armada-37xx SoC when CPU is 1000MHZ, which leads
    to instability that CPU gets stuck soon during dynamic voltage scaling.
    In order to fix this issue, this patch adds the compatible string for
    EspressoBin AVS, and update the CPU voltage values from load 1 to load 3
    in 1000MHZ mode accordingly, the value is updated from original 1.05v
    to 1.108v.
    
    Change-Id: Iae22cb3bb243b3345e7426e859313139637f09e7
    Signed-off-by: Victor Gu <xigu@marvell.com>

diff --git a/Documentation/devicetree/bindings/regulator/armada3700-regulator.txt b/Documentation/devicetree/bindings/regulator/armada3700-regulator.txt
index 7ed7a619..5a853dd6 100644
--- a/Documentation/devicetree/bindings/regulator/armada3700-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/armada3700-regulator.txt
@@ -1,7 +1,7 @@
 Marvell Armada 3700 voltage regulator by AVS
 
 Required properties:
-- compatible: must be "marvell,armada-3700-avs"
+- compatible: must be "marvell,armada-3700-avs" or "marvell,armada-3700-espressobin-avs"
 - reg: avs register address, which is used to control CPU voltage
 - clocks: CPU core clock to get the MAX CPU frequency
 - any required generic properties defined in regulator.txt
diff --git a/drivers/regulator/armada-37xx-regulator.c b/drivers/regulator/armada-37xx-regulator.c
index bd3b9504..1185f6a5 100644
--- a/drivers/regulator/armada-37xx-regulator.c
+++ b/drivers/regulator/armada-37xx-regulator.c
@@ -274,6 +274,7 @@ static int armada_3700_avs_probe(struct platform_device *pdev)
        struct regulator_config config = { };
        struct regulator_dev *rdev;
        u32 max_cpu_freq;
+       int idx;
 
        avs = devm_kzalloc(&pdev->dev, sizeof(*avs), GFP_KERNEL);
        if (!avs) {
@@ -315,6 +316,13 @@ static int armada_3700_avs_probe(struct platform_device *pdev)
                avs->freq_level = CPU_FREQ_LEVEL_800MHZ;
        } else if (max_cpu_freq == CPU_FREQ_1000MHZ) {
                avs->freq_level = CPU_FREQ_LEVEL_1000MHZ;
+               /*
+                * Overwrite the VDD values from load 1 to load 3 in 1000MHZ
+                * for EspressoBin, otherwize the CPU gets stuck.
+                */
+               if (of_device_is_compatible(np, "marvell,armada-3700-espressobin-avs"))
+                       for (idx = VDD_SET1; idx <= VDD_SET3; idx++)
+                               voltage_m_tbl[avs->freq_level][idx] = 1108;
        } else if (max_cpu_freq == CPU_FREQ_1200MHZ) {
                avs->freq_level = CPU_FREQ_LEVEL_1200MHZ;
        } else {
@@ -399,6 +407,7 @@ static const struct dev_pm_ops armada_3700_avs_pm_ops = {
 
 static const struct of_device_id armada3700_avs_of_match[] = {
        { .compatible = "marvell,armada-3700-avs", },
+       { .compatible = "marvell,armada-3700-espressobin-avs", },
        {}
 };

But the file drivers/regulator/armada-37xx-regulator.c is nowhere to be found in mainline or espressobin kernels.
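
For anyone trying to follow what the Marvell patch actually changes, here is a rough Python model of the overwrite step. It is only an illustration, not kernel code: the function name and the dict layout are made up for this sketch, and the only numbers taken from the commit are the 1.05 V to 1.108 V change for loads 1 to 3 at 1000 MHz.

```python
# Illustrative model of the override added to armada_3700_avs_probe(); not kernel code.
# The dict layout and function name are hypothetical. The compatible strings, the
# VDD_SET1..VDD_SET3 range and the 1050 -> 1108 mV change come from the commit above.

ESPRESSOBIN_AVS = "marvell,armada-3700-espressobin-avs"
GENERIC_AVS = "marvell,armada-3700-avs"
OVERRIDDEN_SETS = ("VDD_SET1", "VDD_SET2", "VDD_SET3")


def apply_vdd_override(vdd_mv, compatible, max_cpu_freq_mhz):
    """Return a copy of the per-load VDD table (in mV) after applying the patch logic."""
    patched = dict(vdd_mv)
    if max_cpu_freq_mhz == 1000 and compatible == ESPRESSOBIN_AVS:
        for vdd_set in OVERRIDDEN_SETS:
            patched[vdd_set] = 1108
    return patched


# Loads 1 to 3 at 1000 MHz before the patch, per the commit message (1.05 V).
before = {"VDD_SET1": 1050, "VDD_SET2": 1050, "VDD_SET3": 1050}

print(apply_vdd_override(before, ESPRESSOBIN_AVS, 1000))
print(apply_vdd_override(before, GENERIC_AVS, 1000))
```

The point of the model is that the higher voltage only kicks in when the device tree uses the ESPRESSObin-specific compatible string and the maximum CPU frequency is 1000 MHz; other combinations keep the stock values.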

 

 

Also, I just tried building Linux 5.2.0-rc4-next-20190614, which runs fine and reports 1GHz, but 7zip still reports 800MHz.
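
This kind of mismatch (kernel says 1 GHz, benchmark says 800 MHz) can be checked mechanically. A minimal sketch, assuming the standard cpufreq sysfs file `/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq` (value in kHz) is present, and that you pass in the MHz figure your benchmark measured. The helper names and the 10% tolerance are arbitrary choices for this example.

```python
from pathlib import Path

# Standard cpufreq sysfs location; it may be absent if cpufreq is not enabled.
SCALING_CUR_FREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")


def read_reported_mhz(path=SCALING_CUR_FREQ):
    """Return the frequency the kernel reports for cpu0, in MHz (sysfs holds kHz)."""
    return int(path.read_text().strip()) / 1000.0


def frequency_mismatch(reported_mhz, measured_mhz, tolerance=0.10):
    """True if the measured frequency deviates from the reported one by more than tolerance."""
    return abs(reported_mhz - measured_mhz) > tolerance * reported_mhz


# The case from this post: kernel reports 1000 MHz, benchmark measures about 800 MHz.
print(frequency_mismatch(1000.0, 800.0))
print(frequency_mismatch(800.0, 800.0))
```

On the board itself you would call `frequency_mismatch(read_reported_mhz(), measured)` with the measured value taken from the benchmark output.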

Edited by Anders
Remove irrelevant bug. Linux 5.2.0-rc4-next-20190614.

39 minutes ago, Anders said:

Also I just tried building Linux 5.2.0-rc2-next-20190531, which fails on boot:

 

Did you try this with the 800_800 or 600_600 bootloader?

1 hour ago, ebin-dev said:

Did you try this with the 800_800 or 600_600 bootloader?

With 800_800 it works fine so far. Some strange messages in dmesg though: http://ix.io/1KDu

They may be actual bugs in linux-next though (including the "Kernel stack is corrupted in: write_irq_affinity" error) https://lkml.org/lkml/2019/5/31/761

 

Do you want me to test 600_600 as well?

Edited by Anders
Add link to mailing list

1 hour ago, Anders said:

With 800_800 it works fine so far. Some strange messages in dmesg though: http://ix.io/1KDu

 

Do you want me to test 600_600 as well? 

 

No thanks - that's fine so far.

On 6/1/2019 at 4:25 PM, ebin-dev said:

 

No thanks - that's fine so far.

 

If you have some time for further tests on your V7 ESPRESSObin, you could switch to kernel 4.14.79 (the current kernel provided by Marvell; the cpufreq patch is not applied). In 'armbian-config', switch to stable builds (System->Stable) and, once that has finished, switch from next to another kernel (System->Other kernels->4.14.79).

 

Edit: Marvell's kernel 4.14.79 is an option: it is stable at least on V5 ESPRESSObins with bootloader 1000_800. But the real CPU frequency reported by 7zip is also only 800MHz instead of 1000MHz (same behaviour as kernels 4.19.4x and 5.x).


Here's an update:

I've been running Armbian with nightly kernels (currently 4.19.50-mvebu64) and the flash-image-ddr4-1g-1cs-800_800-2019-05-21.bin U-Boot image, and it has been entirely stable for some time now. The only two remaining problems are:

1) U-Boot sometimes gets stuck. I just reset the board until it boots.

2) With flash-image-ddr4-1g-1cs-1000_800-2019-05-21.bin I'm not getting 1GHz. (Also tested with 5.2.0-rc4-next-20190614).

 

To help address the last problem, I just posted in the Espressobin forums: http://espressobin.net/forums/topic/mainline-linux-not-running-at-1ghz/

 

Hopefully someone is able to reach out to Marvell and make them fix the bug.

 

Thanks for all the help so far :)


Just to let you know: I sent back the V7, got a refund (I had it ordered at Amazon, so it wasn't an issue) and ordered a V5 directly at Globalscale, which arrived quickly. It's running on conservative 800MHz Armbian for now (I am more interested in stability than excessive speed) and seems stable (uptime 37h by now), giving me good speeds over GBitEthernet of almost 80MB/s with an old SATA HDD. So I guess this will be the prototype for my Simple Home Server start-up!


So, guys, I see here that you also have the problem of boot getting stuck very early in the process.

My v7 board gets stuck here

TIM-1.0
WTMI-devel-18.12.1-e6bb176
WTMI: system early-init
SVC REV: 5, CPU VDD voltage: 1.155V

 

more often than not (I'd say there's a 70% chance of getting stuck here). Once it boots it is relatively stable with Buster.
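
Taking that rough 70% figure at face value, and assuming each power-on attempt fails independently (which may well not hold for a hardware fault), a little arithmetic shows how many resets to expect. The function names are just for this sketch.

```python
def expected_attempts(p_stuck):
    """Mean number of power-on attempts until the first successful boot."""
    return 1.0 / (1.0 - p_stuck)


def p_boot_within(p_stuck, attempts):
    """Probability of at least one successful boot within the given number of attempts."""
    return 1.0 - p_stuck ** attempts


p = 0.70  # rough estimate quoted in the post, not a measured value
print(round(expected_attempts(p), 2))   # 3.33
print(round(p_boot_within(p, 5), 3))    # 0.832
```

So under those assumptions the board would need a bit more than three attempts on average, and five attempts would succeed about five times out of six.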

How did you solve this issue?

 

Also, I have the same problem of the CPU not running faster than 800MHz.


Not sure if this is related, but the Netgate team is seeing an unusually high instability failure percentage on the SG1100, which is a branded Espressobin.  They say this is related to the power system on the board I think, and caused by a supplier component problem. I imagine it is possible the same supply chain issue has impacted the Espressobin. 

 

https://forum.netgate.com/topic/144636/sg-1100-intermittent-reboots


I've been looking at something related to this, recompiling and modifying atf/A3700-marvell-utils to build a viable U-Boot. From what I've figured out already, it seems the voltage (VDD) applied by Marvell is too low by default.

 

Does anyone have any visibility into where the modification to ddr4-*cs-/*g.txt from http://wiki.espressobin.net/tiki-download_file.php?fileId=216 came from? Namely the following line:

+;Step9: DDRPHY Driver/Receiver & DQS internal Pullup/Pulldown settings
+;WRITE: 0xC0001004 0xD0133449

+WRITE: 0xC0001004 0xD0677449

 

That is the only difference between the Espressobin provided ddr init and the mainline Marvell A3700 ddr init code.
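
To narrow down what that one write changes, the two values can be XORed to list the bits that differ. This is plain arithmetic on the two constants from the diff; what those bits mean in the DDR PHY register at 0xC0001004 is not something this sketch knows, so the field meanings would have to come from Marvell documentation.

```python
OLD_VALUE = 0xD0133449  # the commented-out write in the diff
NEW_VALUE = 0xD0677449  # the value written by the ESPRESSObin variant


def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [bit for bit in range(32) if diff & (1 << bit)]


print(hex(OLD_VALUE ^ NEW_VALUE))            # 0x744000
print(differing_bits(OLD_VALUE, NEW_VALUE))  # [14, 18, 20, 21, 22]
```

Only five bits differ, all in the middle two bytes of the register value.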

At the moment I'm running an ESPRESSObin v7 1000/800 using WTMI-devel-18.12.1-e6bb176 / 2018.03-devel-18.12.3-gc9aa92ce70 with a modification to bump the default AVS voltage from 1.032V to 1.155V. This looks more stable so far.
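
To check which VDD a given boot actually used, the WTMI line on the serial console can be parsed. A small sketch, assuming the line format seen in the boot logs posted in this thread ("SVC REV: 5, CPU VDD voltage: 1.155V"); the regex and function name are my own, and other firmware versions may print the line differently.

```python
import re

# Matches the WTMI line seen on the serial console, e.g.
#   SVC REV: 5, CPU VDD voltage: 1.155V
VDD_LINE = re.compile(r"SVC REV:\s*(\d+),\s*CPU VDD voltage:\s*([0-9.]+)V")


def parse_vdd(console_text):
    """Return (svc_rev, vdd_volts) from WTMI console output, or None if the line is absent."""
    match = VDD_LINE.search(console_text)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


sample = """TIM-1.0
WTMI-devel-18.12.1-e6bb176
WTMI: system early-init
SVC REV: 5, CPU VDD voltage: 1.155V
"""
print(parse_vdd(sample))  # (5, 1.155)
```

Feeding it a captured console log makes it easy to tell a build with the raised voltage from a stock one.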

 

If anyone wants the resulting builds, let me know. I cannot use the latest https://dl.armbian.com/espressobin/u-boot/ provided versions of WTMI-devel-18.12.1-e6bb176 since they randomly fail to boot. I rebuilt with debugging enabled and have seen no stalls so far.

 

On 7/23/2019 at 9:21 PM, JDL said:

Not sure if this is related, but the Netgate team is seeing an unusually high instability failure percentage on the SG1100, which is a branded Espressobin.  They say this is related to the power system on the board I think, and caused by a supplier component problem. I imagine it is possible the same supply chain issue has impacted the Espressobin. 

 

https://forum.netgate.com/topic/144636/sg-1100-intermittent-reboots

 


My V7 arrived working fine. Then over time I started hitting these random reboots.

Tried slower bootloaders and the latest Armbian with no luck.

One interesting thing I found: after not touching it for several months, it started up and worked OK. I got it to boot and go into the OS, and then after a few minutes it reset again. The time it stayed up seemed to grow shorter and shorter.

I left it unplugged overnight and tried powering it up in the morning, and it will not even get through the INIT sequence to a console prompt.

The 470uF/16V capacitor SEEMS fine (no evidence of leaking).

 

Any ideas?

 

[  OK  ] Started Create Volatile Files and Directories.
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Reached target System Time Synchronized.
[  OK  ] Started Entropy daemon using the HAVEGE algorithm.
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[  OK  ] Found device /dev/ttyMV0.
[  OK  ] Started Raise network interfaces.
[  OK  ] St�TIM-1.0
WTMI-devel-18.12.1-e6bb176
WTMI: system early-init
SVC REV: 5, CPU VDD voltage: 1.050V
NOTICE:  Booting Trusted Firmware
NOTICE:  BL1: v1.5(release):1f8ca7e (Marvell-devel-18.12.2)
NOTICE:  BL1: Built : 16:25:52, May 21 2019
NOTICE:  BL1: Booting BL2
NOTICE:  BL2: v1.5(release):1f8ca7e (Marvell-devel-18.12.2)
NOTICE:  BL2: Built : 16:25:53, May 21 2019
NOTICE:  BL1: Booting BL31
NOTICE:  BL31: v1.5(release):1f8ca7e (Marvell-devel-18.12.2)
NOTICE:  BL31: Built : 16:25:56, May

U-Boot 2018.03-devel-18.12.3-gc9aa92c-armbian (Feb 20 2019 - 09:45:04 +0100)

Model: Marvell Armada 3720 Community Board ESPRESSOBin
       CPU     600 [MHz]
       L2      800 [MHz]
       TClock  200 [MHz]
       DDR     600 [MHz]
DRAM:  1 GiB
Comphy chip #0:
Comphy-0: USB3          5 Gbps    
Comphy-1: PEX0          2.5 Gbps  
Comphy-2: SATA0         6 Gbps    
SATA link 0 timeout.
AHCI 0001.0300 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
flags: ncq led only pmp fbss pio slum part sxs
PCIE-0: Link down
MMC:   sdhci@d0000: 0, sdhci@d8000: 1
Loading Environment from SPI Flash... SF: Detected mx25u3235f with page size 256 Bytes, erase size 64 KiB, total 4 MiB
OK
Model: Marvell Armada 3720 Community Board ESPRESSOBin
Net:   eth0: neta@30000 [PRIME]
Hit any key to stop autoboot:  0
starting USB...
USB0:   Register 2000104 NbrPorts 2
Starting the controller
USB XHCI 1.00

 

Quote

One interesting thing I found: after not touching it for several months, it started up and worked OK. I got it to boot and go into the OS, and then after a few minutes it reset again. The time it stayed up seemed to grow shorter and shorter.

I left it unplugged overnight and tried powering it up in the morning, and it will not even get through the INIT sequence to a console prompt.

That's the exact problem I was having. The capacitor only leaks under certain conditions, and by leak I don't mean leaking electrolytic fluid or bulging, but leaking V through to GND when the temperature changes.

Yes, I noticed the same thing: over time the problem would get worse, presumably as the capacitor degraded further.

I would desolder the capacitor and replace it with a like-for-like part. I'd bet that would fix your board.

I've informed GlobalScale how I fixed it. There has been no response from them confirming the defective component, and I don't expect them to confirm it either.

10 hours ago, mu-b said:

That's the exact problem I was having. The capacitor only leaks under certain conditions, and by leak I don't mean leaking electrolytic fluid or bulging, but leaking V through to GND when the temperature changes.

 

I tried placing the bin in the fridge to bring the overall temperature down and then tried to boot it up. This did not seem to help at all, and the reboot happened before the kernel had even fully initialized.

I'm worried this is not my issue. I wish there was a way to confirm it.

But I'm willing to try replacing the capacitor, though it seems to be a not-so-easy replacement. Any tips?

 

 

Quote

I tried placing the bin in the fridge to bring the overall temperature down and then tried to boot it up. This did not seem to help at all, and the reboot happened before the kernel had even fully initialized.

Yes, I had a fan on the device and it would cause it to reboot continuously and more frequently (often before the device had a chance to complete even TIM/WTMI/U-Boot, which rules out the Linux kernel).

It's not that difficult. If you have any experience with a soldering iron it'll take 10-15 minutes going slowly; the hardest part is desoldering the original component. In this case, YouTube is your friend: plenty of tips on 'decapping'.

 

On another note, the device completed 2 threads of memtester 400 128 alongside stress --cpu 2 and has now been up for 3 days without a hiccup.
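
For anyone wanting to repeat that soak test, here is one way to launch the same mix from a script. It is a sketch only: it assumes the memtester and stress binaries are installed and on the PATH, and the helper names are made up. The arguments mirror the ones mentioned above (two memtester instances of 400 MB for 128 loops each, plus stress with two CPU workers).

```python
import subprocess


def build_commands(memtester_instances=2, mem_mb=400, loops=128, cpu_workers=2):
    """Build argv lists for the memtester instances and the stress run."""
    commands = [["memtester", str(mem_mb), str(loops)] for _ in range(memtester_instances)]
    commands.append(["stress", "--cpu", str(cpu_workers)])
    return commands


def run_all(commands):
    """Start every command in parallel, wait for all of them, and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


for cmd in build_commands():
    print(" ".join(cmd))

# To actually run the test (typically as root so memtester can lock memory):
#   exit_codes = run_all(build_commands())
# Note: stress keeps running until it is stopped, so wait() on it blocks
# until you interrupt it.
```

A non-zero exit code from memtester would point at memory errors, which helps separate RAM problems from plain resets.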

On 10/7/2019 at 4:04 AM, mu-b said:

Yes, I had a fan on the device and it would cause it to reboot continuously and more frequently (often before the device had a chance to complete even TIM/WTMI/U-Boot, which rules out the Linux kernel).

It's not that difficult. If you have any experience with a soldering iron it'll take 10-15 minutes going slowly; the hardest part is desoldering the original component. In this case, YouTube is your friend: plenty of tips on 'decapping'.

 

On another note, the device completed 2 threads of memtester 400 128 alongside stress --cpu 2 and has now been up for 3 days without a hiccup.

So I'm waiting for the correct capacitor to arrive (470uF 16V 108C), but I tried booting the board up without it (since it was really just a filter anyway) and the same thing happened.

Tried also with a 470uF 25V 85C jammed in, and it still rebooted.

So it doesn't look like this will work :/

EC1 is off the 12V rail, leftmost column.

image.png.02d3f30d5d75de95a50bc939f22166ab.png

 


Something else that seems kind of interesting...

I'm sitting at a Marvell>> U-Boot prompt and it's not rebooting.

It's been an hour now.

 

 


Yes it seems instability is still there, albeit it was stable for 4 days straight.

 

I'm still asking GlobalScale for their explanation of the faulty part. With any luck they will respond. I can't see why they wouldn't since the device is a 'community board' and thus they rely on the community for support so it would be rather hypocritical to refuse to disclose knowledge relating to the hardware to the community.


What version number do you have?

It's the little text at the bottom of the case, on the right-hand side between the two barcodes.

I've seen two: 0702 and 0710.

0702 had the bottom as one big heatsink.

0710 has 3 small heatsinks and a bottom filled with holes.

Also, the capacitor seems to be different in both.

I also noticed that the board does not stay powered when only the USB console cable is plugged in (V5 and earlier V7 boards would both show power but not actually boot).

So they must have redone some power stuff on the new board!
