[Fixed] Clearfog[Pro] Huge timedrift between hw and sysclock

Heisath · February 6, 2018

Hi,

I have a problem with my ClearfogPro. It worked fine for some time, but now there's something wrong with the system clock and I am unsure whether this is caused by the hardware or the software.

I set the time with the date command, then copy it to the rtc with hwclock --systohc...

After a few moments the system and hardware times are already a few seconds apart. The system clock seems to be about 1/3 second too slow...

So after a few minutes it's completely off, which of course causes ntp to stop working.

I tried fixing this by using adjtimex, but the settable value range is not large enough...
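A rough calculation shows why adjtimex runs out of range here. This sketch assumes that "about 1/3 second too slow" means per second of real time, and that the kernel's tick adjustment is limited to roughly ±10% (±100000 ppm) of nominal. That limit is the commonly documented one, not something stated in this thread, and the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the adjtimex tool itself.

# drift_ppm LOST_SECONDS ELAPSED_SECONDS
# Prints the clock error in parts per million.
drift_ppm() {
    awk -v lost="$1" -v elapsed="$2" \
        'BEGIN { printf "%.0f\n", lost / elapsed * 1000000 }'
}

# within_tick_range PPM
# Succeeds if |PPM| fits the assumed +/-10% (100000 ppm) tick limit.
within_tick_range() {
    [ "${1#-}" -le 100000 ]
}
```

With these assumptions, `drift_ppm 1 3` gives about 333333 ppm, which `within_tick_range` rejects, so a tick correction alone could not cancel a drift of that size.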

This is the output of timedatectl about 5 seconds after doing hwclock --hctosys:

Local time: Tue 2018-02-06 19:52:37 CET
Universal time: Tue 2018-02-06 18:52:37 UTC
RTC time: Tue 2018-02-06 18:52:44

The RTC is pretty accurate btw. But the system time keeps lagging behind..
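The gap between the two clocks in the output above can be computed from the printed timestamps. This is a minimal sketch that assumes GNU date (for the `-d` option); `clock_offset` is a hypothetical helper name.

```shell
#!/bin/sh
# clock_offset SYSTEM_UTC RTC_UTC
# Prints RTC time minus system time in seconds.
# A positive result means the system clock is behind the RTC.
# Assumes GNU date, which parses "YYYY-MM-DD HH:MM:SS" given to -d.
clock_offset() {
    sys=$(date -u -d "$1" +%s) || return 1
    rtc=$(date -u -d "$2" +%s) || return 1
    echo $(( rtc - sys ))
}
```

Called as `clock_offset "2018-02-06 18:52:37" "2018-02-06 18:52:44"` with the Universal and RTC times from the post, it prints 7.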

I also tried a clean Armbian image, but the problem persists.

For the moment I worked around this by using crontab to copy the hardware time to the system time every 5 minutes, but this can't be the solution.

Is this a bug in the current builds or rather a hardware problem?

Greetings,

count-doku

Igor · February 6, 2018

37 minutes ago, count-doku said:

Is this a bug in the current builds or rather a hardware problem?

An unknown problem at this moment. Have you tried on both kernels? 4.4.y and 4.14.y?

Heisath · February 6, 2018

Hi,

thanks for the fast reply. I have tested with clean images from https://dl.armbian.com/clearfogpro/archive/ and can confirm that it is a bug in the "stretch next" 4.14.(14) Kernel.

The "stretch default" image 4.4.112 contains a working system clock.

Whew, am I relieved that my board ain't broken.

Is it possible to fix this in the next kernel?

Edit: I will try to build the 4.15.y kernel and update, to check whether this already fixes the issue.

Edit 2: Updated to 4.14.17, the problem still persists. Couldn't get a build with 4.15.y.

Greetings,

count-doku

Igor · February 7, 2018

6 hours ago, count-doku said:

Is it possible to fix this in the next kernel?

This is yet another issue on our big list of issues. The first stage of solving is recognizing the issue (checked), then it has to be replicated on our board, checked if it is an upstream problem or hw related ...

I am not sure it will be solved in the next release, and I can't promise that.

Heisath · February 7, 2018

Oh, I did not expect you to fix this in the next kernel. I am already amazed that the support for Armbian is so much better than from the SolidRun guys.

Take your time!

For reproducing the bug, just take your images:

https://dl.armbian.com/clearfogpro/archive/Armbian_5.38_Clearfogpro_Debian_stretch_default_4.4.112.7z

https://dl.armbian.com/clearfogpro/archive/Armbian_5.38_Clearfogpro_Debian_stretch_next_4.14.14.7z

burn them onto an SD card and try:

:> date --set "07 FEB 10:11:13"
:> hwclock --systohc
:> timedatectl

Repeat the timedatectl after a minute or so and you can already see the drift...
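The same check can be scripted instead of comparing timedatectl output by eye. This is a sketch, assuming the RTC is exposed at /sys/class/rtc/rtc0/since_epoch (the usual sysfs location, though the rtc0 index is an assumption for this board); the function names are hypothetical.

```shell
#!/bin/sh
# Assumed location of the RTC's epoch counter; override if the board differs.
RTC_EPOCH_FILE="${RTC_EPOCH_FILE:-/sys/class/rtc/rtc0/since_epoch}"

# drift_seconds SYS_START RTC_START SYS_END RTC_END
# Prints how far the system clock advanced relative to the RTC.
# A negative result means the system clock ran slower than the RTC.
drift_seconds() {
    echo $(( ($3 - $1) - ($4 - $2) ))
}

# measure_drift [INTERVAL_SECONDS]
# Samples both clocks, waits, samples again and prints the drift.
measure_drift() {
    interval="${1:-60}"
    sys_start=$(date +%s)
    rtc_start=$(cat "$RTC_EPOCH_FILE") || return 1
    sleep "$interval"
    sys_end=$(date +%s)
    rtc_end=$(cat "$RTC_EPOCH_FILE") || return 1
    drift_seconds "$sys_start" "$rtc_start" "$sys_end" "$rtc_end"
}
```

Running `measure_drift 60` on an affected image would be expected to print a negative number if the system clock lags as described. Both clocks are read with one-second resolution, so results of ±1 are within measurement noise.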

Is it possible to tell the armbian ./compile.sh to use the new 4.15.y Kernel? Then I would try this myself.

Yesterday it selected 4.14.y for next and dev options.

Greetings,

count-doku

zador.blood.stained · February 7, 2018

21 minutes ago, count-doku said:

Is it possible to tell the armbian ./compile.sh to use the new 4.15.y Kernel? Then I would try this myself.

Should be possible if you delete or disable additional wireless patches and set the branch to 4.15 manually.

BTW, looks like starting with 4.15 it will be possible to manually calibrate the RTC on Armada 38x, though I'm assuming that this is for HW related clock drifting, so if it works fine in 4.4.x this may not be related.

zador.blood.stained · February 7, 2018

I've just reread the first post: if the RTC time is correct but the system time is not, then maybe this is related. The A388 SoC datasheet needs to be checked to see if it has SSCG too.

Edit:

Quote

This series allows to correct the affected clock when the SSCG is enabled. This drift can happen on all the mvebu SoC on the cpu clock block (ie cpu, ddr and l2 cache). Currently the only notable effect is for the Armada 370 because this SoC use the l2cache clock as source for the timer. That's why even if the series allow any of the mvebu SoC to benefit to this correction, Armada 370 is the only user of it.

... and just recently (in 4.14) Cortex-A9 global timer support was added for A38x, so it may be related.

Heisath · February 7, 2018

I read the thread posted by zador and it seems like this is exactly the problem.

After checking the Marvell A38x functional spec, I would say the A388 also has SSCG.

See the A38x-Functional-Spec-PU0A.pdf at

https://marvellcorp.wufoo.com/forms/marvell-armada-38x-functional-specifications/

(or reupped https://jannis.imserv.org/nextcloud/index.php/s/y39mH3eA6D4jTRI )

On page 65 it says that there is SSCG functionality.

zador.blood.stained · February 7, 2018

... so now it has to be determined whether all boards are affected and whether it is related to the A9 global timer or anything else. I'll try to do some tests later.

zador.blood.stained · February 7, 2018

I did some tests and looks like our forward-ported DFS/cpufreq code affects the system timer (which is not that surprising since looks like the CPU and global timer have the same parent clock or at least the same clock controller).

Since DFS doesn't help much with the temperature and there are only 2 OPPs anyway it can be disabled, I'll push the relevant changes soon.

Heisath · February 8, 2018

Morning,

I just tested your changes zador, and the time is working properly once again! Now also ntp is synchronising etc...

Many thanks!

One thing though: with cpufreq ondemand my Clearfog was running at 666 MHz about 90% of the time. So now the power consumption should have roughly doubled.

As "The dynamic power (switching power) dissipated per unit of time by a chip is C·V²·A·f, where C is the capacitance being switched per clock cycle, V is voltage, A is the Activity Factor indicating the average number of switching events undergone by the transistors in the chip (as a unitless quantity) and f is the switching frequency." (Wikipedia)
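Plugging the two clock speeds into that formula gives the "about doubled" estimate. This sketch assumes C, A and the voltage stay the same at both frequencies, which the thread does not confirm; the optional voltage arguments and the function name are hypothetical.

```shell
#!/bin/sh
# dynamic_power_ratio F_OLD F_NEW [V_OLD V_NEW]
# Prints P_new / P_old for P = C * V^2 * A * f, with C and A cancelling out.
# Voltages default to 1, i.e. they are assumed unchanged.
dynamic_power_ratio() {
    awk -v f1="$1" -v f2="$2" -v v1="${3:-1}" -v v2="${4:-1}" \
        'BEGIN { printf "%.2f\n", (v2 * v2 * f2) / (v1 * v1 * f1) }'
}
```

`dynamic_power_ratio 666 1333` prints 2.00, so under these assumptions the dynamic power of the CPU cores doubles when they stay at the higher clock.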

So now I have to decide if I'd rather use ntpdate + hwclock every few minutes or let the Clearfog run at full speed the whole time :/

If I were to re-enable DFS, I would only have to take the files out of the disable_dfs folder and copy them one level higher, correct?

https://github.com/armbian/build/tree/master/patch/kernel/mvebu-next

Greetings,

count-doku

Btw. just donated, grab yourself a beer, awesome work!

zador.blood.stained · February 8, 2018

1 minute ago, count-doku said:

If I were to re-enable DFS, I would only have to take the files out of the disable_dfs folder and copy them one level higher, correct?

https://github.com/armbian/build/tree/master/patch/kernel/mvebu-next

Yes. Though I didn't try disabling the A9 timer to fix the drifting - without it system may use a different, more stable clock source.

1 minute ago, count-doku said:

So now the power consumption should have roughly doubled.

Consumption by the CPU cores in idle most likely will be doubled (+/- due to different effects like higher leakage currents at a higher chip temperature), but other consumers (i.e. DRAM and DRAM controller, Ethernet MACs and PHYs, CESA, USB, SATA and PCIe controllers) will still consume the same amount of power.

Heisath · February 8, 2018

OK, I will test and measure the consumption.

Do you plan to really fix the drifting in a future release or just wait for an upstream fix?

zador.blood.stained · February 8, 2018

4 minutes ago, count-doku said:

Do you plan to really fix the drifting in a future release or just wait for an upstream fix?

Since DFS never reached upstream unfortunately it's unlikely to get fixed, and if the problem is indeed related to the A9 global timer I'm not sure if it can be easily fixed at all (without disabling the timer).

Heisath · February 8, 2018

And disabling the timer is bad?

zador.blood.stained · February 8, 2018

5 minutes ago, count-doku said:

And disabling the timer is bad?

Support for it was added only in 4.14 so it worked without it before, but we didn't test the system clock for drifting - so the combination of DFS with no global timer needs to be tested first.

Heisath · February 8, 2018

Ah ok now I understand. Unfortunately I don't remember checking for time drifting before 4.14...

If you provide some patch / version for testing, I will gladly try it out.

EDIT:

Just compared the ClearfogPro power consumption in my configuration as router / NAS. With the old patch it required about 15 W in standby (switching etc. on, HDDs turned off, 1 mPCIe SATA card, 1 WiFi card) and now about 20 W.

So it increased quite a bit. And unfortunately it is in this state most of the time, so having DFS working would decrease consumption to 3/4...
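These two readings can be combined with the earlier point that only the CPU cores scale with the clock. As a back-of-the-envelope sketch, assume the CPU share exactly doubles between the two measurements and everything else stays constant (an idealisation, not a measured fact); `split_power` is a hypothetical helper.

```shell
#!/bin/sh
# split_power LOW_TOTAL_W HIGH_TOTAL_W
# Model: low = cpu + other, high = 2 * cpu + other.
# Prints the implied CPU share at the low clock and the constant remainder.
split_power() {
    awk -v low="$1" -v high="$2" 'BEGIN {
        cpu_low = high - low     # the extra power equals the CPU share at low clock
        other = low - cpu_low    # consumers that do not scale with the CPU clock
        printf "cpu_low=%g other=%g\n", cpu_low, other
    }'
}
```

`split_power 15 20` prints `cpu_low=5 other=10`, meaning that under this simple model roughly 5 W of the 15 W standby figure would be attributable to the CPU cores.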

zador.blood.stained · February 8, 2018

6 hours ago, count-doku said:

If you provide some patch / version for testing, I will gladly try it out.

Basically reverting this (just deleting the relevant DT node is enough)

Made some tests:

with DFS and with the global timer

[    0.000025] Switching to timer-based delay loop, resolution 1ns
[    3.121206] clocksource: Switched to clocksource arm_global_timer

observed the system time drift as expected

with DFS and without the global timer

[    0.000000] Switching to timer-based delay loop, resolution 40ns
[    2.598221] clocksource: Switched to clocksource armada_370_xp_clocksource

no system time drift as far as I see (waited for ~20 minutes)
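To check which of the two clocksources a running system ended up with, the kernel exposes it in sysfs and in the boot log. This is a sketch; the sysfs path is the standard Linux one, and the helper names are hypothetical.

```shell
#!/bin/sh
# Standard sysfs directory for clocksource controls; overridable for testing.
CLOCKSOURCE_DIR="${CLOCKSOURCE_DIR:-/sys/devices/system/clocksource/clocksource0}"

# Prints the clocksource the kernel is using right now.
current_clocksource() {
    cat "$CLOCKSOURCE_DIR/current_clocksource"
}

# Reads kernel log lines on stdin and prints the last clocksource switched to.
clocksource_from_log() {
    sed -n 's/.*clocksource: Switched to clocksource \(.*\)$/\1/p' | tail -n 1
}
```

For example, `dmesg | clocksource_from_log` should print `arm_global_timer` on a kernel showing the first log excerpt above and `armada_370_xp_clocksource` on one showing the second.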

Igor · February 8, 2018

Shall we include this in the update? I had already prepared one right before I saw this thread. It's currently on hold.

zador.blood.stained · February 8, 2018

7 minutes ago, Igor said:

Shall we include this in the update? I had already prepared one right before I saw this thread. It's currently on hold.

Yes, but we could reenable DFS and disable the global timer to get better power consumption - this needs to be pushed to the armbian/build repository first.

Igor · February 8, 2018

OK, push it and I'll recompile kernel once again.

Edit: Done.

Heisath · February 9, 2018

Great! This fixes everything.

Time is running as it should, ntp is working and the cpu clocks down if not needed.

Out of interest though, could there be any disadvantage of not using the A9 global timer?

And where is the maximum frequency for the CPU configured? Device Tree? Or does it come directly from hardware?

Because my cpufreq-info says the maximum clock is 1.33 GHz but SolidRun says it should be 1.6 GHz...

Greetings,

count-doku

zador.blood.stained · February 9, 2018

2 hours ago, count-doku said:

And where is the maximum frequency for the CPU configured? Device Tree? Or does it come directly from hardware?

As I understand it is defined in hardware, and since it's not defined in DT the kernel may display the frequency incorrectly.
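The figure that cpufreq-info displays can also be read directly from sysfs. This sketch only shows what the cpufreq driver reports, which, as noted above, may not match the real hardware clock. It assumes the standard cpufreq sysfs layout for cpu0; the helper names are hypothetical.

```shell
#!/bin/sh
# Standard cpufreq sysfs directory for the first core; overridable for testing.
CPUFREQ_DIR="${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}"

# khz_to_ghz KHZ
# Converts a cpufreq value (reported in kHz) to GHz with two decimals.
khz_to_ghz() {
    awk -v khz="$1" 'BEGIN { printf "%.2f\n", khz / 1000000 }'
}

# Prints the maximum frequency the cpufreq driver reports, in GHz.
reported_max_ghz() {
    khz=$(cat "$CPUFREQ_DIR/cpuinfo_max_freq") || return 1
    khz_to_ghz "$khz"
}
```

A reported value of 1332000 kHz (an illustrative number, not taken from the thread) would come out as 1.33, while 1600000 kHz would come out as 1.60.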

Heisath · February 10, 2018

Ok, it's not important anyways, even if it "just" runs at 2x 1.33GHz it's still fast enough.
