[Fixed] Clearfog[Pro] Huge timedrift between hw and sysclock


Heisath

Hi,

 

I have a problem with my ClearfogPro. It worked fine for some time, but now there's something wrong with the system clock and I am unsure whether this is caused by the hardware or the software.

 

I set the time with the date command, then copy it to the rtc with hwclock --systohc...

 

After a few moments the system clock and the hardware clock are already a few seconds apart. The system clock seems to be about 1/3 second too slow...

 

So after a few minutes it's completely off, which of course causes ntp to stop working.

I tried fixing this with adjtimex, but the settable value range is not large enough...

 

This is the output of timedatectl about 5 seconds after running hwclock --hctosys:

Local time: Tue 2018-02-06 19:52:37 CET
Universal time: Tue 2018-02-06 18:52:37 UTC
RTC time: Tue 2018-02-06 18:52:44

The RTC is pretty accurate btw. But the system time keeps lagging behind..

 

I also tried a clean Armbian image, but the problem persists.

 

For the moment I worked around this by using crontab to copy the hardware time to the system time every 5 minutes, but this can't be the solution.

 

Is this a bug in the current builds or rather a hardware problem?

 

 

Greetings,

count-doku

 


37 minutes ago, count-doku said:

 

Is this a bug in the current builds or rather a hardware problem?


An unknown problem at this moment.  Have you tried on both kernels? 4.4.y and 4.14.y?


Hi,

 

Thanks for the fast reply. I have tested with clean images from https://dl.armbian.com/clearfogpro/archive/ and can confirm that it is a bug in the "stretch next" 4.14.(14) kernel.

The "stretch default" image with 4.4.112 has a working system clock.

 

Whew, am I relieved that my board isn't broken.

 

Is it possible to fix this in the next kernel?

 

Edit: I will try to build the 4.15.y kernel and update, to check whether this already fixes the issue.

Edit Edit: Updated to 4.14.17, the problem still persists. Couldn't get a build with 4.15.y.

 

Greetings,

count-doku


6 hours ago, count-doku said:

Is it possible to fix this in the next kernel?


This is yet another issue on our big list of issues. The first stage of solving is recognizing the issue (checked), then it has to be replicated on our board, checked if it is an upstream problem or hw related ... 

 

I am not sure, and can't promise, that it will be solved in the next release.

 


Oh, I did not expect you to fix this in the next kernel. I am already amazed that the support for Armbian is so much better than from the SolidRun guys.

Take your time!

 

For reproducing the bug, just take your images:

https://dl.armbian.com/clearfogpro/archive/Armbian_5.38_Clearfogpro_Debian_stretch_default_4.4.112.7z

https://dl.armbian.com/clearfogpro/archive/Armbian_5.38_Clearfogpro_Debian_stretch_next_4.14.14.7z

 

burn them onto an SD card and try:

:> date --set "07 FEB 10:11:13"
:> hwclock --systohc
:> timedatectl 

Repeat the timedatectl after a minute or so and you can already see the drift...
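
To put a number on the drift instead of comparing timedatectl output by eye, something like the following could be used. It is only a sketch: it assumes the RTC is rtc0, that it keeps UTC, and that /sys/class/rtc/rtc0/since_epoch is readable. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: measure how far the system clock drifts from the RTC.
# Assumptions (not from the thread): the RTC is rtc0, it keeps UTC, and
# /sys/class/rtc/rtc0/since_epoch is readable. All function names are hypothetical.

# Current RTC time in seconds since the epoch.
rtc_epoch() {
    cat /sys/class/rtc/rtc0/since_epoch
}

# Current system time in seconds since the epoch.
sys_epoch() {
    date +%s
}

# drift_seconds SYS_START RTC_START SYS_END RTC_END
# Prints how many seconds the system clock gained (positive) or lost
# (negative) relative to the RTC between the two samples.
drift_seconds() {
    echo $(( ($3 - $1) - ($4 - $2) ))
}

# measure_drift INTERVAL
# Samples both clocks, sleeps INTERVAL seconds, samples again and prints the drift.
measure_drift() {
    s0=$(sys_epoch); r0=$(rtc_epoch)
    sleep "$1"
    s1=$(sys_epoch); r1=$(rtc_epoch)
    drift_seconds "$s0" "$r0" "$s1" "$r1"
}
```

After sourcing the file, `measure_drift 60` prints the drift accumulated over roughly one minute. A negative value means the system clock is falling behind the RTC, which is what the affected 4.14 images should show.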

 

 

Is it possible to tell the armbian ./compile.sh to use the new 4.15.y Kernel? Then I would try this myself.

Yesterday it selected 4.14.y for next and dev options.

 

Greetings,

count-doku

 


21 minutes ago, count-doku said:

Is it possible to tell the armbian ./compile.sh to use the new 4.15.y Kernel? Then I would try this myself.

Should be possible if you delete or disable additional wireless patches and set the branch to 4.15 manually.

 

BTW, looks like starting with 4.15 it will be possible to manually calibrate the RTC on Armada 38x, though I'm assuming that this is for HW related clock drifting, so if it works fine in 4.4.x this may not be related.


I've just reread the first post. If the RTC time is correct but the system time is not, then maybe this is related; the A388 SoC datasheet needs to be checked to see if it has SSCG too.

 

Edit:

 

Quote

This series allows to correct the affected clock when the SSCG is enabled. This drift can happen on all the mvebu SoC on the cpu clock block (ie cpu, ddr and l2 cache). Currently the only notable effect is for the Armada 370 because this SoC use the l2cache clock as source for the timer. That's why even if the series allow any of the mvebu SoC to benefit to this correction, Armada 370 is the only user of it.

... and just recently (in 4.14) Cortex-A9 global timer support was added for A38x, so it may be related.


I read the thread posted by zador and it seems like this is exactly the problem.

 

After checking the Marvell A38x Functional Spec, I would say the A388 also has SSCG.

See the A38x-Functional-Spec-PU0A.pdf at 

https://marvellcorp.wufoo.com/forms/marvell-armada-38x-functional-specifications/

(or re-uploaded at https://jannis.imserv.org/nextcloud/index.php/s/y39mH3eA6D4jTRI )

 

On page 65 it says that there is SSCG functionality.

 

 


I did some tests and looks like our forward-ported DFS/cpufreq code affects the system timer (which is not that surprising since looks like the CPU and global timer have the same parent clock or at least the same clock controller).

Since DFS doesn't help much with the temperature and there are only 2 OPPs anyway it can be disabled, I'll push the relevant changes soon.


Morning,

 

I just tested your changes zador, and the time is working properly once again! Now also ntp is synchronising etc...

 

Many thanks!

 

One thing though: with the cpufreq ondemand governor my Clearfog was running at 666 MHz about 90% of the time, so now the power consumption will have roughly doubled.

As "The dynamic power (switching power) dissipated per unit of time by a chip is C·V²·A·f, where C is the capacitance being switched per clock cycle, V is voltage, A is the Activity Factor indicating the average number of switching events undergone by the transistors in the chip (as a unitless quantity) and f is the switching frequency." (Wikipedia)
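
As a rough worked example of that formula: assuming C, V and A are the same at both operating points (an assumption, the thread does not say whether the core voltage changes), the dynamic power scales linearly with f. Taking 666 MHz as the low point and the 1.33 GHz maximum that cpufreq-info reports later in this thread as the high point:

```latex
\[
\frac{P_{\text{high}}}{P_{\text{low}}}
  = \frac{C\,V^{2}\,A\,f_{\text{high}}}{C\,V^{2}\,A\,f_{\text{low}}}
  = \frac{f_{\text{high}}}{f_{\text{low}}}
  \approx \frac{1333\ \text{MHz}}{666\ \text{MHz}}
  \approx 2
\]
```

This only covers the switching power of the CPU clock domain, not leakage and not the rest of the board.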

 

 

So now I have to decide if I'd rather use ntpdate + hwclock every few minutes or let the Clearfog run at full speed the whole time :/

If I were to re-enable dfs, I only would have to take the files out from the disable_dfs folder and copy them one layer higher, correct?

https://github.com/armbian/build/tree/master/patch/kernel/mvebu-next

 

 

Greetings,

count-doku

 

Btw. just donated, grab yourself a beer, awesome work!

 

 

 


1 minute ago, count-doku said:

If I were to re-enable dfs, I only would have to take the files out from the disable_dfs folder and copy them one layer higher, correct?

https://github.com/armbian/build/tree/master/patch/kernel/mvebu-next

Yes. Though I didn't try disabling the A9 timer to fix the drifting - without it system may use a different, more stable clock source.

 

1 minute ago, count-doku said:

So now the power consumption will have roughly doubled.

Consumption by the CPU cores in idle most likely will be doubled (+/- due to different effects like higher leakage currents at a higher chip temperature), but other consumers (i.e. DRAM and DRAM controller, Ethernet MACs and PHYs, CESA, USB, SATA and PCIe controllers) will still consume the same amount of power.


4 minutes ago, count-doku said:

Do you plan to really fix the drifting in a future release or just wait for an upstream fix?

Since DFS unfortunately never reached upstream, it's unlikely to get fixed there, and if the problem is indeed related to the A9 global timer I'm not sure if it can be easily fixed at all (without disabling the timer).


Ah ok now I understand. Unfortunately I don't remember checking for time drifting before 4.14...

 

If you provide some patch / version for testing, I will gladly try it out.

 

EDIT:

I just compared the ClearfogPro's power consumption in my configuration as a router/NAS. With the old patch it required about 15 W in standby (switching etc. on, HDDs turned off, 1 mPCIe SATA card, 1 WiFi card), and now it requires about 20 W.

So it increased quite a bit. Unfortunately it is in this state most of the time, so having DFS working would reduce the consumption to about 3/4 (15 W instead of 20 W).


6 hours ago, count-doku said:

If you provide some patch / version for testing, I will gladly try it out.

Basically reverting this (just deleting the relevant DT node is enough)

 

Made some tests:

with DFS and with the global timer

[    0.000025] Switching to timer-based delay loop, resolution 1ns
[    3.121206] clocksource: Switched to clocksource arm_global_timer

observed the system time drift as expected

 

with DFS and without the global timer

[    0.000000] Switching to timer-based delay loop, resolution 40ns
[    2.598221] clocksource: Switched to clocksource armada_370_xp_clocksource

no system time drift as far as I see (waited for ~20 minutes)
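
The clocksource the kernel ended up with can also be checked on a running system through sysfs, without searching dmesg. A minimal sketch, assuming the standard clocksource0 sysfs directory; CLOCKSOURCE_DIR and the function names are hypothetical and only there so the path can be overridden:

```shell
#!/bin/sh
# Sketch: inspect the kernel clocksource through sysfs.
# CLOCKSOURCE_DIR is a hypothetical override; the default is the usual sysfs path.
CLOCKSOURCE_DIR=${CLOCKSOURCE_DIR:-/sys/devices/system/clocksource/clocksource0}

# Print the clocksource currently in use, e.g. arm_global_timer.
current_clocksource() {
    cat "$CLOCKSOURCE_DIR/current_clocksource"
}

# Print the space-separated list of clocksources the kernel registered.
available_clocksources() {
    cat "$CLOCKSOURCE_DIR/available_clocksource"
}

# has_clocksource NAME
# Succeeds if NAME appears in the list of available clocksources.
has_clocksource() {
    for cs in $(available_clocksources); do
        [ "$cs" = "$1" ] && return 0
    done
    return 1
}
```

If `has_clocksource armada_370_xp_clocksource` succeeds while `current_clocksource` prints arm_global_timer, writing the other name into current_clocksource as root might be a way to switch without touching the device tree. That was not tested in this thread, so treat it as a guess.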


Great! This fixes everything. 

Time is running as it should, ntp is working and the cpu clocks down if not needed.

 

Out of interest though, could there be any disadvantage of not using the a9 global timer?

 

And where is the maximum frequency for the CPU configured? In the device tree? Or does it come directly from the hardware?

Because my cpufreq-info says the maximum clock is 1.33 GHz, but SolidRun says it should be 1.6 GHz...
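
For reference, the limits that cpufreq-info prints can also be read straight from sysfs. This does not show where the limit is configured, only what the kernel currently exposes. A small sketch, assuming the usual cpufreq directory for cpu0; CPUFREQ_DIR and the function names are hypothetical:

```shell
#!/bin/sh
# Sketch: read the frequency limits cpufreq exposes for cpu0.
# CPUFREQ_DIR is a hypothetical override; the default is the usual sysfs path.
CPUFREQ_DIR=${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}

# khz_to_mhz KHZ
# cpufreq reports frequencies in kHz; convert to whole MHz (integer division).
khz_to_mhz() {
    echo $(( $1 / 1000 ))
}

# Maximum frequency the cpufreq driver reports for the hardware, in MHz.
cpuinfo_max_mhz() {
    khz_to_mhz "$(cat "$CPUFREQ_DIR/cpuinfo_max_freq")"
}

# Maximum frequency the current policy allows, in MHz.
scaling_max_mhz() {
    khz_to_mhz "$(cat "$CPUFREQ_DIR/scaling_max_freq")"
}
```

If `cpuinfo_max_mhz` already prints about 1332, the 1.33 GHz limit comes from what the driver was told about the hardware and not from a governor or policy setting.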

 

Greetings, 

count-doku


This topic is now closed to further replies.