
pine64: massive date/time clock problem


linda


On 8/20/2019 at 2:32 PM, martinayotte said:

Do you have "CONFIG_SUN50I_ERRATUM_UNKNOWN1=y" in /boot/config-4.19.63-sunxi64 ?

 

 

I followed all the threads and checked all the config settings. I have this one and all the others you usually ask about enabled.

 

And for the record, it happened again this morning.

 


I got tired of trying to clean up after the Pine A64/Armbian clock skips and shut off the board for a few months.

 

Then I came across this Linux 5.3 patch for the Allwinner clock:

https://github.com/torvalds/linux/commit/c950ca8c35eeb32224a63adc47e12f9e226da241

 

The stock 5.3-rc5 kernel built on the pine64 and installed without much trouble. It's been a day and a half under fairly heavy load (heavy load increases the frequency of clock skips) with no skips yet, so I am cautiously optimistic.

 

The new clock-fix kernel flag (already set in the tarball below) is:

CONFIG_SUN50I_ERRATUM_UNKNOWN1=y

 

If someone wants to try this on their own board without having to compile the kernel, I tarred up the kernel, modules, and /boot bits.

 

https://donp.org/linux-5.3-rc5-pinea64.tar.xz

 

By moving the /boot symlinks, the board should boot into this kernel:

 

lrwxrwxrwx  1 root root       11 Aug 21  2116 dtb -> dtb-5.3-rc5
lrwxrwxrwx  1 root root       17 Oct 13  2114 Image -> vmlinuz-5.3.0-rc5
lrwxrwxrwx  1 root root       17 Oct 13  2114 uInitrd -> uInitrd-5.3.0-rc5

 

Haha, you can see from the symlink timestamps that they were created after a clock skip.
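A minimal sketch of the symlink switch described above, assuming the kernel, initrd, and dtb files from the tarball are already unpacked in /boot under the names shown in the listing. The function name `point_boot_links` is made up for illustration; the directory is a parameter so it can be tried on a scratch copy first.

```shell
#!/bin/sh
# Hypothetical helper: repoint the three /boot symlinks at the 5.3-rc5 files.
# The target names are the ones shown in the listing above.
point_boot_links() {
    boot_dir=$1
    # -s: symbolic link
    # -f: replace an existing link
    # -n: treat an existing link as a plain file, so a link that points at a
    #     directory is replaced instead of a new link being created inside it
    ln -sfn dtb-5.3-rc5 "$boot_dir/dtb" &&
    ln -sfn vmlinuz-5.3.0-rc5 "$boot_dir/Image" &&
    ln -sfn uInitrd-5.3.0-rc5 "$boot_dir/uInitrd"
}

# Usage (as root): point_boot_links /boot
```

It may be worth noting where the old links pointed before switching, so the previous kernel can be restored if the board does not come up.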

 

Don

 


Hi guys!
Facing the same issue :)
 

$ date  
Thu Oct 25 18:17:22 UTC 2114

$ uname -a
Linux pine64so-node6 4.19.57-sunxi64 #5.90 SMP Fri Jul 5 18:38:49 CEST 2019 aarch64 aarch64 aarch64 GNU/Linux

$ grep CONFIG_SUN50I_ERRATUM_UNKNOWN1 /boot/config-4.19.57-sunxi64 
CONFIG_SUN50I_ERRATUM_UNKNOWN1=y
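
For anyone repeating this check on a different kernel version, here is a small sketch that looks for the flag in a given config file. The name `has_erratum_flag` is hypothetical, and the path in the usage line assumes the config of the running kernel is installed as `/boot/config-<release>`, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Armbian): succeed if the given kernel
# config file has the A64 timer erratum workaround enabled.
has_erratum_flag() {
    # -q: no output, only an exit status; -x: match the whole line
    grep -qx 'CONFIG_SUN50I_ERRATUM_UNKNOWN1=y' "$1"
}

# Usage (assumes the config of the running kernel lives in /boot):
#   has_erratum_flag "/boot/config-$(uname -r)" && echo "workaround enabled"
```

As the output above shows, the flag being set on 4.19.57 does not by itself mean the skips are gone.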

 


I compiled the kernel (branch next 4.19.y) with a patch based on https://github.com/arctype-co/linux/commit/5fcb4e57eeaa4d670ef4acf5818c6fe16aa0d3d0

 

Strikes me as a more stable workaround, but I'm still evaluating.

 

 

Patch:

--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -328,16 +328,17 @@
  * number of CPU cycles in 3 consecutive 24 MHz counter periods.
  */
 #define __sun50i_a64_read_reg(reg) ({                                  \
-       u64 _val;                                                       \
-       int _retries = 150;                                             \
+       u64 _old, _new;                                                 \
+       int _retries = 200;                                             \
                                                                        \
        do {                                                            \
-               _val = read_sysreg(reg);                                \
+               _old = read_sysreg(reg);                                \
+                _new = read_sysreg(reg);                               \
                _retries--;                                             \
-       } while (((_val + 1) & GENMASK(9, 0)) <= 1 && _retries);        \
+       } while (unlikely(_old != _new) && _retries);                   \
                                                                        \
        WARN_ON_ONCE(!_retries);                                        \
-       _val;                                                           \
+       _new;                                                           \
 })
 
 static u64 notrace sun50i_a64_read_cntpct_el0(void)

 


On 9/11/2019 at 9:15 PM, ams.br said:

I compiled the kernel (branch next 4.19.y) with a patch based on https://github.com/arctype-co/linux/commit/5fcb4e57eeaa4d670ef4acf5818c6fe16aa0d3d0

 

Strikes me as a more stable workaround, but I'm still evaluating.

 

 


 

 

 

After 14 days, the date remains stable. The patch apparently worked.


16 hours ago, langerma said:

how can i compile this in my kernel? or are there any packages?

 

On a Linux machine (Armbian recommends Ubuntu), download the latest Armbian build source:

git clone https://github.com/armbian/build.git

Copy the patch file arm_arch_timer.patch to the folder userpatches/kernel/sunxi-next (more details in the documentation), then run the compile script:

./compile.sh

Choose "U-boot and kernel packages" and apply.

Choose "Do not change the kernel configuration" for a standard kernel, or "Show a kernel configuration menu before compilation" to customize your kernel (recommended for advanced users), and apply.

Choose your board.

Choose "next".

Apply, then have a coffee and wait. After compilation, the deb packages are generated in the folder output/debs. Copy them to your Armbian board and run dpkg -i *.deb

More details at https://docs.armbian.com/Developer-Guide_Build-Preparation/

arm_arch_timer.patch
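
The non-interactive part of the steps above can be sketched as a small script. This is only a sketch: `stage_kernel_patch` is a made-up helper name, and it assumes arm_arch_timer.patch has been saved locally. The clone, build, and install commands are left as comments because they need network access and a long build.

```shell
#!/bin/sh
# Hypothetical helper: copy a kernel patch into the Armbian build tree's
# user patch folder named in the post (userpatches/kernel/sunxi-next).
# $1: path to the patch file, $2: path to the cloned build directory
stage_kernel_patch() {
    patch_file=$1
    build_dir=$2
    dest="$build_dir/userpatches/kernel/sunxi-next"
    mkdir -p "$dest" && cp "$patch_file" "$dest/"
}

# The surrounding steps from the post, as comments:
#   git clone https://github.com/armbian/build.git
#   stage_kernel_patch arm_arch_timer.patch build
#   cd build && ./compile.sh        # then answer the menus as described
#   dpkg -i *.deb                   # as root on the board, in the folder
#                                   # holding the debs copied from output/debs
```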


On 10/4/2019 at 11:25 AM, langerma said:

@ams.br seems like your patch works!

 

1 hour ago, markolonius said:

I was able to successfully build with the patch this time as well.   Been running solid since!  Can we get this patch submitted into official builds?

Same here, running stable for 8 days. Before the patch, it usually lasted 1 or 2 days.


Thank you ams.br for the solution.

I run 7 pine64so on the clusterboard, and two of the SoCs had this problem. I would get the error almost immediately under heavy CPU usage.

Now my test SoC has been running for 20 hours straight at 99% CPU usage (SETI@home) and the date is still in 2019.

Cool stuff. Thank you once again.


@ams.br Allow me to join the chorus. Thank you for posting your easy-to-follow instructions. I was able to build the current Armbian kernel with your patch and install it without any issue at all. The system has been up 1 day and 11 hours. I finally took the step of installing your patch after the system time-jumped twice over the weekend.

 

I can't say for sure yet that it fixes the problem for me, but I can say that your directions made it a piece of cake to build and install, and so far, so good.


Thanks!

 

Updated 2019-11-18: My patched Pine has been running for 7 days and 12 hours now without issue, which is the longest it has ever run without experiencing the time jump.


I haven't had much time to follow-up on this.

 

FWIW the upstream patches should already include an improved version of the patch above. To clarify, it is not a fix but rather a loop that reads the hardware timer repeatedly until two consecutive reads return the same value. Its overhead differs from that of the fix it replaces. For instance, there must always be at least two reads.
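
As a rough userspace analogy only (the real loop is the C macro in the patch and reads a system register), the read-until-two-match idea can be sketched in shell. `read_stable` and the reader command passed to it are hypothetical names; the retry budget of 200 is the value used in the patch.

```shell
#!/bin/sh
# Userspace analogy of the patched loop, NOT kernel code.
# Usage: read_stable <command that prints one counter value>
read_stable() {
    retries=200                      # same budget as the patch
    while :; do
        old=$("$@")                  # first read
        new=$("$@")                  # second read, right after the first
        retries=$((retries - 1))
        # stop once two consecutive reads agree, or the budget runs out
        if [ "$old" = "$new" ] || [ "$retries" -eq 0 ]; then
            break
        fi
    done
    # the kernel macro warns (once) when the budget is exhausted
    [ "$retries" -gt 0 ] || echo "read_stable: retry budget exhausted" >&2
    printf '%s\n' "$new"
}
```

Even in the best case the reader command runs twice per call, which is the "at least two reads" overhead mentioned above.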

 

I am wondering about the solution that the patch above replaces, since it only uses a nine-bit mask, and it was found that this should probably be 11 bits.
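
For reference, the condition that the patch earlier in this thread removes is `((_val + 1) & GENMASK(9, 0)) <= 1`. Below is a small sketch, using shell arithmetic, of which counter values that condition treats as suspect. `needs_retry` is a made-up name, and 0x3FF is GENMASK(9, 0) written out (bits 9 down to 0). The optional second argument is purely an experiment knob for trying a wider mask; it is not taken from any kernel version.

```shell
#!/bin/sh
# Hypothetical helper mirroring the stock check ((val + 1) & mask) <= 1.
# Succeeds when the low bits of val are all ones or all zeros, i.e. when
# the stock workaround would discard the read and try again.
# $1: counter value (decimal or 0x hex), $2: mask width in bits (default 10)
needs_retry() {
    bits=${2:-10}
    mask=$(( (1 << $bits) - 1 ))     # 10 bits -> 0x3FF, same as GENMASK(9, 0)
    [ $(( ($1 + 1) & $mask )) -le 1 ]
}

# Usage:
#   needs_retry 0x3FF && echo "would be re-read"     # low bits all ones
#   needs_retry 0x401 || echo "would be accepted"
```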

 

I forget why that solution was not used as-is upstream, but I do remember a concern that two reads close enough together could return the same value before the hardware timer actually changed (i.e., the same bad value could be read twice in a row).

 

Now that we're a few versions newer than the last time I checked, it probably warrants another review. FWIW I haven't seen the issue with the more recent kernels, while previously I had one A64 that couldn't get through half a day.

 


I installed the latest Armbian Buster today, updated, switched to the latest 5.4.2 kernel via armbian-config, and installed k3s (a minimal k8s distro) on it. After a few minutes: 2113!

I will try the attached patch, but for the record, the problem is still unresolved for me in 5.4.2.


16 hours ago, martinayotte said:

I've just added the patch, and will do testing when my new build is finished :

 

https://github.com/armbian/build/commit/d688244fea291e81a42c64aaf3588ee1f83b741c

 

Thanks!!!

 

How can I update the kernel to the latest build? Is there a repo with nightly kernel builds? Is there a way to check whether the patch is there as a 'regular user'? Thanks again! :)


4 hours ago, Igor said:


There are. See armbian-config and remember that those builds are not tested.

I've updated to the latest one (19.11.3.346) and I can confirm the fix is not there yet (judging by the commit time and the build time, it probably didn't make it into the build). To check, I downloaded the sources (https://apt.armbian.com/pool/main/l/linux-source-5.4.2-dev-sunxi64/linux-source-dev-sunxi64_19.11.3.346_all.deb) and the patch doesn't seem to be there. It would be nice if the debs had a changelog, or if one were shown somewhere.

I'll wait for the next build :)

Thanks!!!
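
One way to make the "is the patch there?" check repeatable is to search the timer driver source for the new loop condition. This is a sketch under assumptions: `timer_source_is_patched` is a hypothetical name, and the kernel source is assumed to be unpacked somewhere already (the layout inside the linux-source deb is not shown in this thread).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given arm_arch_timer.c already contains
# the two-read loop from the patch (it looks for the new loop condition).
timer_source_is_patched() {
    # -q: no output, only an exit status; -F: fixed string, not a regex
    grep -qF '_old != _new' "$1"
}

# Usage:
#   if timer_source_is_patched path/to/drivers/clocksource/arm_arch_timer.c
#   then echo "patched"; else echo "not patched"; fi
```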


I've tried to build the kernel with the current sources but it seems to fail to apply the patch. From the log file:

Processing file /root/build/patch/kernel/sunxi-dev/fix-a64-timejump.patch
1 out of 1 hunk FAILED -- saving rejects to file drivers/clocksource/arm_arch_timer.c.rej

It is just a clean git clone of the armbian/build repository, built with the following flags:

./compile.sh BOARD=pine64 BRANCH=dev KERNEL_ONLY=yes KERNEL_CONFIGURE=no

The rej file:

# find ~/build/cache/sources -type f -iname "*arm_arch_timer.c.rej*"
/root/build/cache/sources/linux-mainline/orange-pi-5.4/drivers/clocksource/arm_arch_timer.c.rej

# cat /root/build/cache/sources/linux-mainline/orange-pi-5.4/drivers/clocksource/arm_arch_timer.c.rej
--- drivers/clocksource/arm_arch_timer.c
+++ drivers/clocksource/arm_arch_timer.c
@@ -342,16 +342,17 @@ static u64 notrace arm64_858921_read_cntvct_el0(void)
  * number of CPU cycles in 3 consecutive 24 MHz counter periods.
  */
 #define __sun50i_a64_read_reg(reg) ({					\
-	u64 _val;							\
-	int _retries = 150;						\
+	u64 _old, _new;                                                 \
+	int _retries = 200;						\
 									\ 									\
 	do {								\
-		_val = read_sysreg(reg);				\
+		_old = read_sysreg(reg);				\
+		_new = read_sysreg(reg);				\
 		_retries--;						\
-	} while (((_val + 1) & GENMASK(9, 0)) <= 1 && _retries);	\
+	} while (unlikely(_old != _new) && _retries);			\
 									\
 	WARN_ON_ONCE(!_retries);					\
-	_val;								\
+	_new;								\
 })
 
 static u64 notrace sun50i_a64_read_cntpct_el0(void)

I don't know why this file lives in the `orange-pi-5.4` folder.

Just in case, I also extracted the source from the deb file, and the code is not patched.

 

Any help? Thanks!!!
