lanefu

Espressobin support development efforts


Built! Installed! Port order flipped! That was easy!

 

I'm leaving it running to make sure I don't see any freezes or kernel panics.

21 hours ago, ebin-dev said:

So - I can't confirm that the patched kernel would work without issues on a V5_0_1 EspressoBin with the current boot loader.

 

@FlashBurn @ebin-dev

 

Should we consider the patch PR a WIP and leave open until the v5 stability is figured out?

18 hours ago, lanefu said:

Should we consider the patch PR a WIP and leave open until the v5 stability is figured out?

 

Since the patch is not working for all EspressoBins, I would suggest considering it a WIP or deleting it. Marvell has already allocated resources to solve the issue.

 

Edit: I have tested the patched kernel 4.19.20 on three V5 EspressoBins, one used and two new ones. All of them produce the same issue (a kernel panic "Internal error: synchronous parity or ECC error" with bootloader 1000_800). With this patch applied we risk losing the installed base of V3-V5 EspressoBins. It is simply not enough to change parts of the cpufreq code; the problem has a wider scope and the bootloader probably needs some adjustments too.


The patches which were sent to the kernel mailing list are the same as the one in my pull request.

 

Since the newer kernels (without my patch) do not work for me either, this is not a problem with the patches but with something that is wrong in the kernel.

 

Edit:

 

Some more notes regarding the pull request.

 

Situation now:

 

Firmware running 1000 MHz, real cpu frequency is 800 MHz.

 

Situation with the patch from the pull request:

 

Firmware running 1000 MHz, real cpu frequency is 1000 MHz.

 

So if a board that used to run fine no longer runs at 1000 MHz, just flash the firmware with an 800 MHz CPU frequency: it will have the same performance as before, and everyone else can run their board at the correct CPU frequency.

 

Just to be clear, I don't see why the pull request should not be applied. It fixes 2 bugs and things now work as expected, but it may also show that some boards are not running up to their specs.


I built kernel v4.19.20 as described above and it seems to be running stably on my (previously known stable) v5 and v7 hardware at 1GHz.

The only change I made to "patches" was swapping wan and lan1 (which, I think, had been backed out of next).

The v7 (my main focus) has been up over 48 hours and the v5 less than 12 hours, but I'm leaving it running and will update tomorrow.

 


@spqr

 

I have to be picky, but your board does not run at 1 GHz but at 800 MHz, and that is why it is stable ;) We still don't know whether our boards run stably at 1 GHz without fixing the CPU scaling bug. But as your board boots just fine at 1 GHz, it is doing better than my V7 board.


Sorry, I didn't know this. I assumed that because I used the u-boot 1000-800 (ddr3 1 GB RAM 2 chip) for the v5 I was running at 1GHz.

I'm really only messing with the v5 as a "lab experiment" because I've been ordering v7 for my application.

(FYI, the v5 I have has now been up well over 24 hours on the kernel I built, with no panics or freezes.)

Stability on v7 has been fairly elusive and I don't know if that was just bad luck or some actual hardware issue that isn't being properly handled.

I'm awaiting answers from Globalscale but continuing to make (software) progress on my project with the v5 and three working v7 units I have.


Um... wait, really I'm not running at 1GHz? This is my v5.

 

root@espressobin:/var/log# tail armbian-hardware-monitor.log
                name: SS08G

### Boot system health:

Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU
20:11:06: 1000MHz  1.18  78%  37%  38%   0%   2%   0% -75000°C
20:11:07: 1000MHz  1.16  60%   7%  50%   1%   1%   0% -75000°C
20:11:07: 1000MHz  1.16  60%   7%  50%   3%   0%   0% -75000°C
20:11:07: 1000MHz  1.16  67%   9%  56%   1%   0%   0% -75000°C
20:11:08: 1000MHz  1.16  93%  12%  79%   2%   0%   0% -75000°C


Hey, this is just kind of a general question. I have a v4 board; does it have an additional set of irregularities that I need to worry about? Like, are there assumptions for the v5 that I can't make for the v4?


Just my 2-cents on the ethernet port ordering issue...

Even though boards prior to v7 had the port naming flipped, it seems more "industry standard" to have the wan port on the left.

Now that Globalscale is shipping cases, and the metal version has WAN screen-printed on the left port (which is also separated from the other two), would it really offend any v5 users if the order changes?

I think it would be good to just commit that change unless some v5 users really feel strongly about it...

 

1 hour ago, spqr said:

Now that globalscale is shipping cases and the metal version has WAN screen printed on the left one (which is also separated off from the

 

Continuity is a fantastic justification. One ask: come up with an easy-to-understand paragraph explaining the difference that we can add to the board download page, etc.


I just tried out a hunch, and this is the result of running sbc-bench stably at 1000 MHz:

 

sbc-bench.sh log output

 

If anyone wants to test this kernel I could upload it to dropbox for everyone to download.

 

As my board was behaving somewhat unpredictably, my assumption was that there is a hardware stability problem, but only with the 1000 MHz setting. So my guess was that the core voltage is to blame, but as the board does run fine at the full 1000 MHz, it has to be a problem with the core voltage at the lower frequencies. So I built a kernel without AVS (adaptive voltage scaling) enabled. This seems to solve the stability problems on my board. The next step would be to keep AVS from using such low voltages, but at the moment I don't have the time to try this out. I will report these findings to the Linux kernel developer of the cpufreq driver and we will see if he can fix it.

 


Interesting.  Is the kernel developer connected with Globalscale or Marvell?

I returned my batch of failing units to Globalscale and they have reproduced and are looking into the problem.

It could be completely unrelated as well... but maybe not.


As far as I know, Bootlin is the company which writes the kernel code for Marvell.

 

Interesting that they can reproduce the problem, so maybe this gets more attention.


There is a patch for AVS on the espressobin site.

I'm not sure if this was somehow missed in the Ubuntu and other builds. This patch was specific to v7 of the board.

found here: http://wiki.espressobin.net/tiki-index.php?page=Build+From+Source+-+Kernel#Kernel_version_4.14

in the zip file...

I'm going to build/test in Armbian.

 

From 0bb9aab4c7706139a8d18f67a07c462fcfcf7327 Mon Sep 17 00:00:00 2001
From: Victor Gu <xigu@marvell.com>
Date: Wed, 20 Sep 2017 10:00:49 +0800
Subject: [PATCH 3/8] fix: regulator: armada-37xx: overwrite CPU voltage values
 in 1000MHZ

The original CPU voltage values from load 1 to load 3 are too low for
EspressoBin board with Armada-37xx SoC when CPU is 1000MHZ, which leads
to instability that CPU gets stuck soon during dynamic voltage scaling.
In order to fix this issue, this patch adds the compatible string for
EspressoBin AVS, and update the CPU voltage values from load 1 to load 3
in 1000MHZ mode accordingly, the value is updated from original 1.05v
to 1.108v.

Change-Id: Iae22cb3bb243b3345e7426e859313139637f09e7
Signed-off-by: Victor Gu <xigu@marvell.com>
---
 .../devicetree/bindings/regulator/armada3700-regulator.txt       | 2 +-
 drivers/regulator/armada-37xx-regulator.c                        | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/regulator/armada3700-regulator.txt b/Documentation/devicetree/bindings/regulator/armada3700-regulator.txt
index 7ed7a61..5a853dd 100644
--- a/Documentation/devicetree/bindings/regulator/armada3700-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/armada3700-regulator.txt
@@ -1,7 +1,7 @@
 Marvell Armada 3700 voltage regulator by AVS
 
 Required properties:
-- compatible: must be "marvell,armada-3700-avs"
+- compatible: must be "marvell,armada-3700-avs" or "marvell,armada-3700-espressobin-avs"
 - reg: avs register address, which is used to control CPU voltage
 - clocks: CPU core clock to get the MAX CPU frequency
 - any required generic properties defined in regulator.txt
diff --git a/drivers/regulator/armada-37xx-regulator.c b/drivers/regulator/armada-37xx-regulator.c
index bd3b950..1185f6a 100644
--- a/drivers/regulator/armada-37xx-regulator.c
+++ b/drivers/regulator/armada-37xx-regulator.c
@@ -274,6 +274,7 @@ static int armada_3700_avs_probe(struct platform_device *pdev)
     struct regulator_config config = { };
     struct regulator_dev *rdev;
     u32 max_cpu_freq;
+    int idx;
 
     avs = devm_kzalloc(&pdev->dev, sizeof(*avs), GFP_KERNEL);
     if (!avs) {
@@ -315,6 +316,13 @@ static int armada_3700_avs_probe(struct platform_device *pdev)
         avs->freq_level = CPU_FREQ_LEVEL_800MHZ;
     } else if (max_cpu_freq == CPU_FREQ_1000MHZ) {
         avs->freq_level = CPU_FREQ_LEVEL_1000MHZ;
+        /*
+         * Overwrite the VDD values from load 1 to load 3 in 1000MHZ
+         * for EspressoBin, otherwize the CPU gets stuck.
+         */
+        if (of_device_is_compatible(np, "marvell,armada-3700-espressobin-avs"))
+            for (idx = VDD_SET1; idx <= VDD_SET3; idx++)
+                voltage_m_tbl[avs->freq_level][idx] = 1108;
     } else if (max_cpu_freq == CPU_FREQ_1200MHZ) {
         avs->freq_level = CPU_FREQ_LEVEL_1200MHZ;
     } else {
@@ -399,6 +407,7 @@ static const struct dev_pm_ops armada_3700_avs_pm_ops = {
 
 static const struct of_device_id armada3700_avs_of_match[] = {
     { .compatible = "marvell,armada-3700-avs", },
+    { .compatible = "marvell,armada-3700-espressobin-avs", },
     {}
 };
 
--
1.9.1


This file does not exist anymore with a current kernel, but the code got merged with the cpufreq driver.

 

Nice find though, because this particular patch is either missing or somewhere I did not find it.


I was just hunting around and could find no *regulator.c file for Marvell and was wondering what happened. :-)

I've been running armbian on one of the three "stable" v7 devices I had and it is no longer "stable."

I'm not getting kernel panics, but it sporadically reboots... I suspect it may be this same cause.

My application for the espressobin really requires long term stable operation, so I hope we can get to the bottom of this.

Can you post where you apply this patch so I can also test it with you?

 


I found the AVS tables etc. in armada-37xx-cpufreq.c, as you mentioned.

If I understand the patch correctly, it just sets level 1-3 to 1108 while leaving level 4 whatever it was.

(basically making 1108 the lowest voltage it ever goes to at 1000MHz)

I wonder if Marvell has anything in the chip errata about this, or if they will be fixing it?

 

How does this fit in with the 1000MHz is really 800MHz that you explained to me above?

Is the code actually executing the voltage selection for 1000MHz even though the chip is really running at 800?

 


It seems that in the current code, the simplest patch would be to change the value of MIN_VOLT_MV to 1108.

Is that too "brute force?"

16 minutes ago, spqr said:

It seems that in the current code, the simplest patch would be to simply change the value of  MIN_VOLT_MV to 1108.

Is that too "brute force?"


what are the risks?


The only side effect would be slightly increased power consumption at idle, also for boards running other clock frequencies. I don't have a sense of how many non-1GHz EspressoBins are out in the world.

As it stands, many v7 boards kernel panic, freeze or reboot with any kernel missing the equivalent of this patch.

I have v5 and v7 and all are 1GHz (though if you read above, apparently it really is 800 MHz internally, and I'm waiting for FlashBurn to confirm that the cpufreq code executed the 1000 MHz code path).

27 minutes ago, spqr said:

Only side effect would be slightly increased power consumption at idle, for other clock frequencies. I don't have a sense for how many non 1GHz espressobins are out in the world.

As it stands many v7 boards kernel panic, freeze or reboot with all kernels missing the equivalent of this patch.

I have v5 and v7 and all are 1GHz (though if you read above, apparently it really is 800 MHz internally and I'm waiting for FlashBurn to confirm the cpufreq code executed the 1000 MHz code path.

 

I've got a v4 and a v5, and I've often run 800 because I just don't know what I'm supposed to run.

Is there any reason I shouldn't use 1000_800? I barely understand the clocking stuff here.


I'm fuzzy on it myself. I always thought my boards were running at 1GHz until FlashBurn explained otherwise.

I hope he will chime in soon on this point.


Here it comes ;)

 

It works like this: you have a base frequency of, e.g., 1000 MHz and 4 load levels. For every load level you can define a divider and a core voltage.

 

Base frequency | Divider | Real frequency | Core voltage
1000 MHz       | 1       | 1000 MHz       | 1.10 V
1000 MHz       | 2       | 500 MHz        | 1.05 V
1000 MHz       | 4       | 250 MHz        | 1.00 V
1000 MHz       | 5       | 200 MHz        | 0.95 V

 

Without the patches for the cpufreq driver, the kernel thinks it uses a base frequency of 1000 MHz, but in reality it looks like this:

 

Base frequency | Divider | Real frequency | Core voltage
800 MHz        | 1       | 800 MHz        | 1.10 V
800 MHz        | 2       | 400 MHz        | 1.05 V
800 MHz        | 4       | 200 MHz        | 1.00 V
800 MHz        | 5       | 160 MHz        | 0.95 V

 

As the CPU is in reality running at a lower frequency, it is also not a problem that the core voltage of the lower load levels is not high enough for the frequencies the kernel assumes.

 

The problem is not the core voltage of the highest load level but that of the lower load levels (I don't know precisely which one).

 

I hope I could explain it to you, if something is still not clear, just ask.


Thanks!

I tried the MIN_VOLT_MV patch overnight and it didn't address the spontaneous rebooting that I'm seeing on one of my v7 boards.

Thanks to your description above I see that setting MV to 1108 is effectively the same as disabling AVS anyway. :-)

The beauty of the convoluted code was that I couldn't tell what the unchanged levels were. ;-)

Sadly, I have another unstable v7 board on my hands. :-(

 

UPDATE... actually it looks like when I went to install the updated kernel deb that the process was interrupted by... a spontaneous reboot.

So it actually hadn't installed my new kernel with the voltage patch. I have it installed and running now. We will see if this fixes the reboot problem.

 

UPDATE-2... up over 5 hours with the MV patch, stay tuned.

 

UPDATE-3... no go, rebooted after 8:49 :-(

 

This appears to have nothing to do with the voltage.

 

I really don't get it because this unit stayed running for weeks when I first got it.

Meanwhile my v5 unit with the same build of Armbian has been up 26 days.
