Heisath Posted February 6, 2018

Hi,

I have a problem with my ClearfogPro. It worked fine for some time, but now there's something wrong with the system clock and I am unsure whether this is caused by the hardware or the software.

I set the time with the date command, then copy it to the RTC with hwclock --systohc. After a few moments the system and hardware times are already a few seconds apart. The system clock seems to be about 1/3 second too slow, so after a few minutes it's completely off. This causes ntp to stop working, of course. I tried fixing this with adjtimex, but the settable value range is not large enough.

This is the output of timedatectl about 5 seconds after doing hwclock --hctosys:

      Local time: Tue 2018-02-06 19:52:37 CET
  Universal time: Tue 2018-02-06 18:52:37 UTC
        RTC time: Tue 2018-02-06 18:52:44

The RTC is pretty accurate, by the way, but the system time keeps lagging behind. I also tried a clean Armbian image, but the problem persists. For the moment I've worked around this by using crontab to copy the hardware time to the system time every 5 minutes, but this can't be the solution.

Is this a bug in the current builds or rather a hardware problem?

Greetings, count-doku
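For anyone wondering why adjtimex cannot absorb a drift of this size, here is a rough Python sketch of the arithmetic. It reads "about 1/3 second too slow" as the system clock losing roughly 1/3 s per real second, which is an interpretation, and the two limits are assumptions taken from the adjtimex man pages for a kernel with USER_HZ=100 (frequency offset about ±500 ppm, tick 9000..11000 µs, i.e. ±10 %).

```python
# Rough check: can adjtimex absorb a clock that loses ~1/3 s per second?
# Both limits are assumptions based on the adjtimex man pages (USER_HZ=100).
FREQ_LIMIT_PPM = 500.0        # frequency offset range, about +/-500 ppm
TICK_LIMIT_PPM = 100_000.0    # tick 9000..11000 us, i.e. +/-10 %


def drift_ppm(elapsed_reference_s, elapsed_system_s):
    """Drift of the system clock relative to a reference clock, in ppm."""
    return (elapsed_system_s - elapsed_reference_s) / elapsed_reference_s * 1e6


def correctable(ppm):
    """True if tick and frequency adjustment together could cover the drift."""
    return abs(ppm) <= FREQ_LIMIT_PPM + TICK_LIMIT_PPM


# Assumed symptom: the system clock advances ~40 s while the RTC advances 60 s.
observed = drift_ppm(60.0, 40.0)
print(f"drift: {observed:.0f} ppm, correctable: {correctable(observed)}")
```

Under these assumptions the drift is several times larger than the widest adjustment, so periodic stepping (the crontab workaround) is the only thing that masks it.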
Igor Posted February 6, 2018

37 minutes ago, count-doku said: "Is this a bug in the current builds or rather a hardware problem?"

An unknown problem at this moment. Have you tried both kernels, 4.4.y and 4.14.y?
Heisath (Author) Posted February 6, 2018

Hi, thanks for the fast reply.

I have tested with clean images from https://dl.armbian.com/clearfogpro/archive/ and can confirm that it is a bug in the "stretch next" 4.14.14 kernel. The "stretch default" image with 4.4.112 has a working system clock. Whew, am I relieved that my board ain't broken.

Is it possible to fix this in the next kernel?

Edit: I will try to build the 4.15.y kernel, update, and check whether that already fixes the issue.

Edit 2: Updated to 4.14.17, the problem still persists. Couldn't get a build with 4.15.y.

Greetings, count-doku
Igor Posted February 7, 2018

6 hours ago, count-doku said: "Is it possible to fix this in the next kernel?"

This is yet another issue on our big list of issues. The first stage of solving it is recognizing the issue (checked). Then it has to be replicated on our board and checked to see whether it is an upstream problem or hardware related. I can't promise it will be solved in the next release.
Heisath (Author) Posted February 7, 2018

Oh, I did not expect you to fix this in the next kernel. I am already amazed that the support for Armbian is so much better than from the SolidRun guys. Take your time!

To reproduce the bug, just take your images:

https://dl.armbian.com/clearfogpro/archive/Armbian_5.38_Clearfogpro_Debian_stretch_default_4.4.112.7z
https://dl.armbian.com/clearfogpro/archive/Armbian_5.38_Clearfogpro_Debian_stretch_next_4.14.14.7z

Burn them onto an SD card and try:

    :> date --set "07 FEB 10:11:13"
    :> hwclock --systohc
    :> timedatectl

Repeat the timedatectl after a minute or so and you can already see the drift.

Is it possible to tell the Armbian ./compile.sh to use the new 4.15.y kernel? Then I would try this myself. Yesterday it selected 4.14.y for both the next and dev options.

Greetings, count-doku
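To put a number on the drift when repeating timedatectl, the offset between the "Universal time" and "RTC time" lines can be computed from the command output. The following is a minimal Python sketch, assuming the output layout shown in the first post; the function name rtc_offset_seconds is made up for this example.

```python
from datetime import datetime


def rtc_offset_seconds(timedatectl_output):
    """Return RTC time minus system (universal) time, in seconds."""
    stamps = {}
    for line in timedatectl_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key in ("Universal time", "RTC time"):
            # Value looks like "Tue 2018-02-06 18:52:37 UTC"; keep date and time.
            date_part, time_part = value.split()[1:3]
            stamps[key] = datetime.strptime(
                f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S"
            )
    return (stamps["RTC time"] - stamps["Universal time"]).total_seconds()


# Output copied from the first post of this thread.
sample = """\
      Local time: Tue 2018-02-06 19:52:37 CET
  Universal time: Tue 2018-02-06 18:52:37 UTC
        RTC time: Tue 2018-02-06 18:52:44
"""
print(rtc_offset_seconds(sample))
```

Running it on two captures taken a known interval apart and dividing the change in offset by that interval gives the drift rate.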
zador.blood.stained Posted February 7, 2018

21 minutes ago, count-doku said: "Is it possible to tell the Armbian ./compile.sh to use the new 4.15.y kernel? Then I would try this myself."

Should be possible if you delete or disable the additional wireless patches and set the branch to 4.15 manually.

BTW, it looks like starting with 4.15 it will be possible to manually calibrate the RTC on Armada 38x, though I'm assuming that this is for hardware-related clock drifting, so if it works fine in 4.4.x this may not be related.
zador.blood.stained Posted February 7, 2018

I've just reread the first post. If the RTC time is correct but the system time is not, then maybe this is related; the A388 SoC datasheet needs to be checked to see if it has SSCG too.

Edit, quoting the patch series:

"This series allows to correct the affected clock when the SSCG is enabled. This drift can happen on all the mvebu SoC on the cpu clock block (ie cpu, ddr and l2 cache). Currently the only notable effect is for the Armada 370 because this SoC use the l2cache clock as source for the timer. That's why even if the series allow any of the mvebu SoC to benefit to this correction, Armada 370 is the only user of it."

... and just recently (in 4.14) Cortex-A9 global timer support was added for A38x, so it may be related.
Heisath (Author) Posted February 7, 2018

I read the thread posted by zador and it seems like this is exactly the problem. After checking the Marvell A38x functional spec, I would say the A388 also has SSCG.

See A38x-Functional-Spec-PU0A.pdf at https://marvellcorp.wufoo.com/forms/marvell-armada-38x-functional-specifications/ (or re-uploaded at https://jannis.imserv.org/nextcloud/index.php/s/y39mH3eA6D4jTRI ). Page 65 says that there is SSCG functionality.
zador.blood.stained Posted February 7, 2018

... so now it has to be determined whether all boards are affected and whether it is related to the A9 global timer or anything else. I'll try to do some tests later.
zador.blood.stained Posted February 7, 2018

I did some tests and it looks like our forward-ported DFS/cpufreq code affects the system timer. That is not too surprising, since the CPU and the global timer appear to have the same parent clock, or at least the same clock controller.

Since DFS doesn't help much with the temperature and there are only 2 OPPs anyway, it can be disabled. I'll push the relevant changes soon.
Heisath (Author) Posted February 8, 2018

Morning,

I just tested your changes, zador, and the time is working properly once again! ntp is synchronising now as well. Many thanks!

One thing though: with the cpufreq ondemand governor my Clearfog was running at 666 MHz about 90% of the time, so now the power consumption will have about doubled. As Wikipedia puts it:

"The dynamic power (switching power) dissipated per unit of time by a chip is C·V²·A·f, where C is the capacitance being switched per clock cycle, V is voltage, A is the Activity Factor indicating the average number of switching events undergone by the transistors in the chip (as a unitless quantity) and f is the switching frequency."

So now I have to decide whether I'd rather use ntpdate + hwclock every few minutes or let the Clearfog run at full speed the whole time :/

If I were to re-enable DFS, I would only have to take the files out of the disable_dfs folder and copy them one level up, correct? https://github.com/armbian/build/tree/master/patch/kernel/mvebu-next

Greetings, count-doku

Btw., I just donated. Grab yourself a beer, awesome work!
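The "about doubled" estimate follows directly from the quoted formula if only f changes. A small Python sketch of that ratio, assuming the voltage stays the same at both operating points (not confirmed anywhere in this thread) and taking 1333 MHz as the full-speed clock:

```python
def dynamic_power(c, v, a, f):
    """Dynamic (switching) power: P = C * V^2 * A * f."""
    return c * v ** 2 * a * f


# Placeholder values: C, V and A are unknown for this SoC and cancel out
# in the ratio as long as they are the same at both frequencies.
C, V, A = 1.0, 1.0, 1.0
F_LOW_HZ = 666e6     # ondemand idle frequency reported above
F_HIGH_HZ = 1333e6   # assumed full-speed frequency (2 x 666 MHz)

ratio = dynamic_power(C, V, A, F_HIGH_HZ) / dynamic_power(C, V, A, F_LOW_HZ)
print(f"dynamic power ratio: {ratio:.2f}")
```

This only covers the switching power of the CPU clock domain; leakage and the rest of the board are not part of the formula.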
zador.blood.stained Posted February 8, 2018

1 minute ago, count-doku said: "If I were to re-enable DFS, I would only have to take the files out of the disable_dfs folder and copy them one level up, correct? https://github.com/armbian/build/tree/master/patch/kernel/mvebu-next"

Yes. Though I didn't try disabling the A9 timer to fix the drifting. Without it the system may use a different, more stable clock source.

1 minute ago, count-doku said: "So now the power consumption will have about doubled."

Consumption by the CPU cores in idle will most likely be doubled (give or take other effects, like higher leakage currents at a higher chip temperature), but other consumers (i.e. DRAM and DRAM controller, Ethernet MACs and PHYs, CESA, USB, SATA and PCIe controllers) will still consume the same amount of power.
Heisath (Author) Posted February 8, 2018

OK, I will test and measure the power consumption. Do you plan to really fix the drifting in a future release, or just wait for an upstream fix?
zador.blood.stained Posted February 8, 2018

4 minutes ago, count-doku said: "Do you plan to really fix the drifting in a future release, or just wait for an upstream fix?"

Since DFS never reached upstream, unfortunately it's unlikely to get fixed there. And if the problem is indeed related to the A9 global timer, I'm not sure it can be easily fixed at all (without disabling the timer).
zador.blood.stained Posted February 8, 2018

5 minutes ago, count-doku said: "And disabling the timer is bad?"

Support for it was added only in 4.14, so the system worked without it before, but we didn't test the system clock for drifting back then. So the combination of DFS with no global timer needs to be tested first.
Heisath (Author) Posted February 8, 2018

Ah OK, now I understand. Unfortunately I don't remember checking for time drift before 4.14. If you provide some patch / version for testing, I will gladly try it out.

EDIT: I just compared the ClearfogPro power consumption in my configuration as router / NAS. With the old patch it required about 15 W in standby (switching etc. on, HDDs turned off, 1 mPCIe SATA card, 1 Wi-Fi card) and now it needs about 20 W. So it increased quite a bit. Unfortunately it is in this state most of the time, so having DFS working would reduce consumption to 3/4.
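As a back-of-the-envelope check on these two readings, one can ask what CPU share would explain a 15 W to 20 W jump. The Python sketch below assumes that the entire difference comes from the CPU part doubling (cpu_scale=2.0) and that everything else on the board stays constant; both are simplifications, and split_idle_power is a made-up helper name.

```python
def split_idle_power(total_dfs_w, total_full_w, cpu_scale=2.0):
    """Split board power into a CPU part (at the low clock) and everything else.

    Assumes total_full = rest + cpu_scale * cpu_low and total_dfs = rest + cpu_low.
    """
    cpu_low_w = (total_full_w - total_dfs_w) / (cpu_scale - 1.0)
    rest_w = total_dfs_w - cpu_low_w
    return cpu_low_w, rest_w


# Readings from the post above: about 15 W with DFS, about 20 W without.
cpu_low_w, rest_w = split_idle_power(15.0, 20.0)
print(f"CPU at low clock: ~{cpu_low_w:.0f} W, rest of board: ~{rest_w:.0f} W")
print(f"DFS / no-DFS ratio: {15.0 / 20.0:.2f}")
```

The split is only as good as the assumptions; a wall-socket reading also includes power supply losses, so the real CPU share may well be smaller.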
zador.blood.stained Posted February 8, 2018

6 hours ago, count-doku said: "If you provide some patch / version for testing, I will gladly try it out."

Basically reverting this (just deleting the relevant DT node is enough).

Made some tests:

With DFS and with the global timer:

    [ 0.000025] Switching to timer-based delay loop, resolution 1ns
    [ 3.121206] clocksource: Switched to clocksource arm_global_timer

Observed the system time drift, as expected.

With DFS and without the global timer:

    [ 0.000000] Switching to timer-based delay loop, resolution 40ns
    [ 2.598221] clocksource: Switched to clocksource armada_370_xp_clocksource

No system time drift as far as I can see (waited for ~20 minutes).
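Besides grepping dmesg, the active clocksource can be read from sysfs on a running system. A minimal Python sketch, assuming the standard Linux clocksource0 sysfs directory is present; the helper names are made up for this example, and the "drifts" check only encodes what the tests in this thread found.

```python
from pathlib import Path

# Standard Linux sysfs location for clocksource selection.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def clocksource_info(base=CLOCKSOURCE_DIR):
    """Return (current clocksource, list of available clocksources)."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def is_global_timer(current):
    """True for the clocksource that drifted with DFS enabled in this thread."""
    return current == "arm_global_timer"


if CLOCKSOURCE_DIR.exists():
    current, available = clocksource_info()
    print(f"current: {current}, available: {', '.join(available)}")
    if is_global_timer(current):
        print("arm_global_timer is active; check for drift if DFS is enabled")
```

On a board with the fix applied this should report armada_370_xp_clocksource as the current source.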
Igor Posted February 8, 2018

Shall we include this in the update? I had already prepared one right before I saw this thread. It's currently on hold.
zador.blood.stained Posted February 8, 2018

7 minutes ago, Igor said: "Shall we include this in the update? I had already prepared one right before I saw this thread. It's currently on hold."

Yes, but we could re-enable DFS and disable the global timer to get better power consumption. This needs to be pushed to the armbian/build repository first.
Igor Posted February 8, 2018

OK, push it and I'll recompile the kernel once again.

Edit: Done.
Heisath (Author) Posted February 9, 2018

Great! This fixes everything. Time is running as it should, ntp is working, and the CPU clocks down when not needed.

Out of interest though, could there be any disadvantage to not using the A9 global timer?

And where is the maximum frequency for the CPU configured? Device tree? Or does it come directly from the hardware? My cpufreq-info says the maximum clock is 1.33 GHz, but SolidRun says it should be 1.6 GHz.

Greetings, count-doku
zador.blood.stained Posted February 9, 2018

2 hours ago, count-doku said: "And where is the maximum frequency for the CPU configured? Device tree? Or does it come directly from the hardware?"

As I understand it, it is defined in hardware, and since it's not defined in the DT the kernel may display the frequency incorrectly.
Heisath (Author) Posted February 10, 2018

OK, it's not important anyway. Even if it "just" runs at 2x 1.33 GHz, it's still fast enough.