
A64 date/time clock issue



4 minutes ago, Petee said:

Not sure where you are at Martin.

12h36 EST, Quebec ...

I've tried again Armbian_5.79.190418_Pine64_Ubuntu_bionic_dev_5.0.7.img and network is working fine for me.

Did you try not using 3.3V, and instead using your normal PSU or a 5V supply attached to GPIO pins 4/6?

(I'm leaving too, going to supermarket ...)

 


Yes, per your instruction I did not use the 3.3VDC pin on the JTAG port and used a 2 A external PSU connected to the Euler bus.

 

Aside from the time sync issue (well the time going way off) the network port just disconnects randomly after a day or so with the default Gb setting or adjusted to Mb setting.

 

Does this mean that I have a bad Pine64 here?

 

With the original image posted on the Pine64 wiki / forum I never had an issue with the legacy Ubuntu 16.04.  I did update that original build to Ubuntu 18.04 and it worked OK, but I did have some issues with it.

 

Had a few minutes here to type between this and that....

 

 


10 minutes ago, Petee said:

Does this mean that I have a bad Rock64 here?

Don't confuse me : Pine64 != Rock64 ...

 

I don't understand why using Armbian_5.79.190418_Pine64_Ubuntu_bionic_dev_5.0.7.img.

On my side, here is what "dmesg | grep eth" gives:

root@pine64:~# dmesg | grep eth
[    4.651697] dwmac-sun8i 1c30000.ethernet: PTP uses main clock
[    4.662672] dwmac-sun8i 1c30000.ethernet: Linked as a consumer to regulator.6
[    4.677055] dwmac-sun8i 1c30000.ethernet: Current syscon value is not the default 6 (expect 0)
[    4.685758] dwmac-sun8i 1c30000.ethernet: No HW DMA feature register supported
[    4.693011] dwmac-sun8i 1c30000.ethernet: RX Checksum Offload Engine supported
[    4.700246] dwmac-sun8i 1c30000.ethernet: COE Type 2
[    4.705231] dwmac-sun8i 1c30000.ethernet: TX Checksum insertion supported
[    4.712026] dwmac-sun8i 1c30000.ethernet: Normal descriptors
[    4.717702] dwmac-sun8i 1c30000.ethernet: Chain mode enabled
[   18.403292] dwmac-sun8i 1c30000.ethernet eth0: No Safety Features support found
[   18.403309] dwmac-sun8i 1c30000.ethernet eth0: No MAC Management Counters available
[   18.403316] dwmac-sun8i 1c30000.ethernet eth0: PTP not supported by HW
[   31.712399] dwmac-sun8i 1c30000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[   31.712430] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

 


Weird... via the serial terminal I see:

 

root@pine64:~# dmesg | grep eth
[    0.000000] psci: probing for conduit method from DT.
[    2.795398] dwmac-sun8i 1c30000.ethernet: PTP uses main clock
[    2.795483] dwmac-sun8i 1c30000.ethernet: Linked as a consumer to regulator.6
[    2.796023] dwmac-sun8i 1c30000.ethernet: Current syscon value is not the default 6 (expect 0)
[    2.796045] dwmac-sun8i 1c30000.ethernet: No HW DMA feature register supported
[    2.796050] dwmac-sun8i 1c30000.ethernet: RX Checksum Offload Engine supported
[    2.796056] dwmac-sun8i 1c30000.ethernet: COE Type 2
[    2.796061] dwmac-sun8i 1c30000.ethernet: TX Checksum insertion supported
[    2.796067] dwmac-sun8i 1c30000.ethernet: Normal descriptors
[    2.796072] dwmac-sun8i 1c30000.ethernet: Chain mode enabled
[   11.869154] dwmac-sun8i 1c30000.ethernet eth0: Could not attach to PHY
[   11.875870] dwmac-sun8i 1c30000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
[   12.010109] dwmac-sun8i 1c30000.ethernet eth0: Could not attach to PHY
[   12.016825] dwmac-sun8i 1c30000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
[   12.102540] dwmac-sun8i 1c30000.ethernet eth0: Could not attach to PHY
[   12.109185] dwmac-sun8i 1c30000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
[   12.145562] dwmac-sun8i 1c30000.ethernet eth0: Could not attach to PHY
[   12.152178] dwmac-sun8i 1c30000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
[   12.191148] dwmac-sun8i 1c30000.ethernet eth0: Could not attach to PHY
[   12.197780] dwmac-sun8i 1c30000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
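The -19 in the stmmac_open lines above is a negated Linux errno value. A minimal Python sketch for pulling such codes out of saved dmesg text and naming them; the helper name decode_kernel_errors is hypothetical, and the errno table used is that of the machine running the script, assumed here to be Linux:

```python
import errno
import os
import re

# Matches the "(error: -19)" suffix the driver prints in the log above.
ERROR_RE = re.compile(r"\(error: -(\d+)\)")


def decode_kernel_errors(dmesg_text):
    """Return sorted (code, errno name, description) tuples found in the text."""
    codes = {int(m.group(1)) for m in ERROR_RE.finditer(dmesg_text)}
    return [(c, errno.errorcode.get(c, "UNKNOWN"), os.strerror(c)) for c in sorted(codes)]


# Two lines copied from the log above.
sample = (
    "[   11.869154] dwmac-sun8i 1c30000.ethernet eth0: Could not attach to PHY\n"
    "[   11.875870] dwmac-sun8i 1c30000.ethernet eth0: stmmac_open: "
    "Cannot attach to PHY (error: -19)\n"
)

for code, name, text in decode_kernel_errors(sample):
    print(code, name, text)
```

On Linux, errno 19 is ENODEV ("No such device"), which fits the driver's complaint that it could not attach to a PHY; it does not by itself say why the PHY was not found.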

Booting up with the standard release updated to the nightly, I see:


 

root@ICS-Pine64:~# dmesg | grep eth
[    0.000000] psci: probing for conduit method from DT.
[    2.021202] dwmac-sun8i 1c30000.ethernet: PTP uses main clock
[    2.021289] dwmac-sun8i 1c30000.ethernet: Linked as a consumer to regulator.6
[    2.021480] dwmac-sun8i 1c30000.ethernet: Current syscon value is not the default 6 (expect 0)
[    2.021494] dwmac-sun8i 1c30000.ethernet: No HW DMA feature register supported
[    2.021499] dwmac-sun8i 1c30000.ethernet: RX Checksum Offload Engine supported
[    2.021505] dwmac-sun8i 1c30000.ethernet: COE Type 2
[    2.021510] dwmac-sun8i 1c30000.ethernet: TX Checksum insertion supported
[    2.021515] dwmac-sun8i 1c30000.ethernet: Normal descriptors
[    2.021521] dwmac-sun8i 1c30000.ethernet: Chain mode enabled
[   11.662540] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   11.665029] dwmac-sun8i 1c30000.ethernet eth0: No Safety Features support found
[   11.665042] dwmac-sun8i 1c30000.ethernet eth0: No MAC Management Counters available
[   11.665049] dwmac-sun8i 1c30000.ethernet eth0: PTP not supported by HW
[   11.665527] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   12.674853] dwmac-sun8i 1c30000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[   12.674889] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

 


11 minutes ago, Petee said:

Booting up with the standard release to updated nightly see:

I'm confused about which one is which ...

Is the nightly Armbian_5.79.190418_Pine64_Ubuntu_bionic_dev_5.0.7.img now working for you?

 


Is the nightly Armbian_5.79.190418_Pine64_Ubuntu_bionic_dev_5.0.7.img now working for you?

 

No.  If I write the nightly image to an SD card it does not boot and shows the errors above via the terminal session.

 

root@pine64:~# [  312.018172] dwmac-sun8i 1c30000.ethernet eth0: Could not attach to PHY
[  312.024768] dwmac-sun8i 1c30000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
[  312.064009] dwmac-sun8i 1c30000.ethernet eth0: Could not attach to PHY

If I write the standard release Pine64 image to the SD card the Pine64 boots up fine. 

 

If I update the standard release to the new nightly build via armbian-config it boots up fine which doesn't make sense. 

 

Isn't it the same kernel being installed in both of these cases?

 

1 - writing standard release then updating to current nightly build

2 - current nightly build write


3 minutes ago, Petee said:

No.  If I write the nightly image to an SD card it does not boot and shows the errors above via the terminal session.

Strange, since I've done the same thing twice this morning, and I didn't have this issue.

4 minutes ago, Petee said:

Isn't it the same kernel being updated with the

 

1 - writing standard release then updating to current nightly build

2 - current nightly build write

I don't know; I never do "updates", I always do a fresh install.

 

With this updated standard image, what does "uname -a" give you? What is the output of "ls -l /boot"?

 


Actually, looking at the difference here... this is the standard release updated to the nightly, which boots fine ...

 

See this on boot via serial...

 

U-Boot SPL 2018.11-rc3-armbian (Feb 08 2019 - 11:37:16 +0100)
 

root@ICS-Pine64:~# uname -a
Linux ICS-Pine64 4.19.35-sunxi64 #5.79.190418 SMP Thu Apr 18 01:36:11 CEST 2019 aarch64 aarch64 aarch64 GNU/Linux

 

root@ICS-Pine64:~# ls -l /boot
total 34144
-rw-r--r-- 1 root root      153 Apr 18 16:19 armbianEnv.txt
-rw-r--r-- 1 root root     1536 Feb 10 02:54 armbian_first_run.txt.template
-rw-r--r-- 1 root root   230454 Feb 10 02:54 boot.bmp
-rw-r--r-- 1 root root     2970 Feb 10 02:51 boot.cmd
-rw-r--r-- 1 root root     4882 Feb 10 02:54 boot-desktop.png
-rw-rw-r-- 1 root root     3042 Feb 10 02:55 boot.scr
-rw-r--r-- 1 root root   153079 Apr 17 18:36 config-4.19.35-sunxi64
lrwxrwxrwx 1 root root       19 Apr 18 05:38 dtb -> dtb-4.19.35-sunxi64
drwxr-xr-x 3 root root     4096 Apr 18 05:37 dtb-4.19.35-sunxi64
lrwxrwxrwx 1 root root       23 Apr 18 05:37 Image -> vmlinuz-4.19.35-sunxi64
-rw-r--r-- 1 root root  8624976 Apr 18 05:43 initrd.img-4.19.35-sunxi64
-rw-r--r-- 1 root root  3040151 Apr 17 18:36 System.map-4.19.35-sunxi64
lrwxrwxrwx 1 root root       23 Apr 18 05:43 uInitrd -> uInitrd-4.19.35-sunxi64
-rw-r--r-- 1 root root  8625040 Apr 18 05:43 uInitrd-4.19.35-sunxi64
-rwxr-xr-x 1 root root 14245896 Apr 17 18:36 vmlinuz-4.19.35-sunxi64

 


Gb transfer speeds are fine...

 

Pine64:~# iperf -c 192.168.244.171
------------------------------------------------------------
Client connecting to 192.168.244.171, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.244.149 port 57916 connected with 192.168.244.171 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.08 GBytes   926 Mbits/sec

 


3 minutes ago, Petee said:

Linux ICS-Pine64 4.19.35-sunxi64 #5.79.190418 SMP Thu Apr 18 01:36:11 CEST 2019 aarch64 aarch64 aarch64 GNU/Linux

Where does the ICS in "Linux ICS-Pine64 4.19.35-sunxi64" come from? It should normally be seen as "Linux pine64 4.19.35-sunxi64"

Anyway, so the update brings in the latest NEXT kernel, which seems to also have the timer fix. Could you confirm by running "test_timer"?

 


I changed the host name from pine64 to ICS-Pine64.

 

root@ICS-Pine64:~# uname -a
Linux ICS-Pine64 4.19.35-sunxi64 #5.79.190418 SMP Thu Apr 18 01:36:11 CEST 2019 aarch64 aarch64 aarch64 GNU/Linux

 

Welcome to ARMBIAN 5.78.190415 nightly Ubuntu 18.04.2 LTS 4.19.35-sunxi64

 

root@ICS-Pine64:~# ./test_timer
TAP version 13
# number of cores: 4
ok 1 same timer frequency on all cores
# timer frequency is 24000000 Hz (24 MHz)
# time1: 7d5fffff7, time2: 7d5e7c200, diff: -1588727
not ok 2 native counter reads are monotonic # 1 errors
# min: -1588727, avg: 8, max: 824
ok 3 Linux counter reads are monotonic # 0 errors
# min: 541, avg: 74002, max: 734418224158
# core 0: counter value: 34044992124 => 1418 sec
# core 0: offsets: back-to-back: 14, b-t-b synced: 9, b-t-b w/ delay: 11
# core 1: counter value: 34044993581 => 1418 sec
# core 1: offsets: back-to-back: 11, b-t-b synced: 8, b-t-b w/ delay: 10
# core 2: counter value: 34044995467 => 1418 sec
# core 2: offsets: back-to-back: 11, b-t-b synced: 8, b-t-b w/ delay: 10
# core 3: counter value: 34044996666 => 1418 sec
# core 3: offsets: back-to-back: 10, b-t-b synced: 8, b-t-b w/ delay: 10
1..3
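The numbers in this output can be cross-checked by hand. A small Python sketch, using only values printed above (the function names are illustrative, not part of test_timer), showing that the reported diff is simply time2 minus time1 and that the per-core "sec" figure is the raw counter value divided by the 24 MHz timer frequency:

```python
TIMER_HZ = 24_000_000  # "timer frequency is 24000000 Hz" from the output above


def counter_diff(time1_hex, time2_hex):
    """Signed difference time2 - time1 of two raw counter reads given as hex strings."""
    return int(time2_hex, 16) - int(time1_hex, 16)


def counter_to_seconds(counter_value, hz=TIMER_HZ):
    """Whole seconds represented by a raw counter value."""
    return counter_value // hz


diff = counter_diff("7d5fffff7", "7d5e7c200")
print(diff)                              # negative: the second read came back lower
print(abs(diff) / TIMER_HZ)              # size of the backwards step, in seconds
print(counter_to_seconds(34044992124))   # core 0 counter value
```

A negative diff means the second of two consecutive reads returned a smaller value than the first, which is exactly what "not ok 2 native counter reads are monotonic" reports.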

 


1 minute ago, Petee said:

not ok 2 native counter reads are monotonic # 1 errors

So, it is not fixed ? ...

Then, let's check the DTB: what appears when running "fdtdump /boot/dtb/allwinner/sun50i-a64-pine64-plus.dtb | grep erra"?

it should show :

        allwinner,erratum-unknown1;

If not, let's edit the DT, by first installing a DT compiler that supports symbols:

wget http://ftp.debian.org/debian/pool/main/d/device-tree-compiler/device-tree-compiler_1.4.7-3_arm64.deb
dpkg -i device-tree-compiler_1.4.7-3_arm64.deb

Then, back up the DTB and decompile it into a DTS:

cp /boot/dtb/allwinner/sun50i-a64-pine64-plus.dtb /boot/dtb/allwinner/sun50i-a64-pine64-plus.dtb-ORIGINAL
dtc -I dtb -O dts -o sun50i-a64-pine64-plus.dts /boot/dtb/allwinner/sun50i-a64-pine64-plus.dtb

Edit the resulting "sun50i-a64-pine64-plus.dts", find the "timer" node, add the line "allwinner,erratum-unknown1;" into that node and save.

Recompile the DTB from the updated DTS :

dtc -I dts -O dtb -o /boot/dtb/allwinner/sun50i-a64-pine64-plus.dtb sun50i-a64-pine64-plus.dts

Then reboot and retest "test_timer" ...
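For anyone who would rather script the manual edit of the decompiled DTS than do it in an editor, here is a rough Python sketch. It is only a text-level helper under stated assumptions: the function name add_timer_erratum is made up for illustration, it assumes the node is written exactly as "timer {" on its own line (as in dtc's decompiled output), and it does not replace the dtc decompile and recompile steps.

```python
import re

PROP = "allwinner,erratum-unknown1;"


def add_timer_erratum(dts_text):
    """Insert PROP into the 'timer {' node unless it is already there."""
    lines = dts_text.splitlines()
    for i, line in enumerate(lines):
        # Only the plain "timer {" node; nodes such as "timer@... {" do not match.
        if re.match(r"^\s*timer\s*\{\s*$", line):
            # Scan the node body up to its closing brace.
            for j in range(i + 1, len(lines)):
                if lines[j].strip() == PROP:
                    return dts_text  # already present, leave the text untouched
                if lines[j].strip() == "};":
                    break
            indent = re.match(r"^\s*", line).group(0) + "\t"
            lines.insert(i + 1, indent + PROP)
            return "\n".join(lines) + "\n"
    raise ValueError("no 'timer {' node found")


example = '/ {\n\ttimer {\n\t\tcompatible = "arm,armv8-timer";\n\t};\n};\n'
print(add_timer_erratum(example))
```

Running the helper a second time on its own output changes nothing, so it is safe to apply to a DTS that already carries the property.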

 


root@ICS-Pine64:~# fdtdump / boot/dtb/allwinner/sun50i-a64-pine64-plus.dtb | grep erra

**** fdtdump is a low-level debugging tool, not meant for general use.
**** If you want to decompile a dtb, you probably want
****     dtc -I dtb -O dts <filename>

Couldn't open blob from '/ boot/dtb/allwinner/sun50i-a64-pine64-plus.dtb': No such file or directory
FATAL ERROR: could not read: / boot/dtb/allwinner/sun50i-a64-pine64-plus.dtb
root@ICS-Pine64:/boot/dtb/allwinner# ls
overlay                              sun50i-h5-nanopi-k1-plus.dtb
sun50i-a64-amarula-relic.dtb         sun50i-h5-nanopi-m1-plus2.dtb
sun50i-a64-bananapi-m64.dtb          sun50i-h5-nanopi-neo2.dtb
sun50i-a64-nanopi-a64.dtb            sun50i-h5-nanopi-neo2-v1.1.dtb
sun50i-a64-olinuxino.dtb             sun50i-h5-nanopi-neo-core2.dtb
sun50i-a64-orangepi-win.dtb          sun50i-h5-nanopi-neo-plus2.dtb
sun50i-a64-pine64.dtb                sun50i-h5-orangepi-pc2.dtb
sun50i-a64-pine64-lts.dtb            sun50i-h5-orangepi-prime.dtb
sun50i-a64-pine64-plus.dtb           sun50i-h5-orangepi-zero-plus2.dtb
sun50i-a64-pine64-plus.dtb-ORIGINAL  sun50i-h5-orangepi-zero-plus.dtb
sun50i-a64-pinebook.dtb              sun50i-h6-orangepi-lite2.dtb
sun50i-a64-sopine-baseboard.dtb      sun50i-h6-orangepi-one-plus.dtb
sun50i-a64-teres-i.dtb               sun50i-h6-pine-h64.dtb
sun50i-h5-libretech-all-h3-cc.dtb

 

Getting an error here:

 

root@ICS-Pine64:/boot# git-work/dtc/dtc -I dtb -O dts -o sun50i-a64-pine64-plus.dts /boot/dtb/allwinner/sun50i-a64-pine64-plus.dtb
-bash: git-work/dtc/dtc: No such file or directory

 

root@ICS-Pine64:/boot/dtb/allwinner# git-work/dtc/dtc
-bash: git-work/dtc/dtc: No such file or directory

 


19 minutes ago, Petee said:

Couldn't open blob from '/ boot/dtb/allwinner/sun50i-a64-pine64-plus.dtb': No such file or directory

You've mistakenly added a <space> between / and boot ...

19 minutes ago, Petee said:

git-work/dtc/dtc

Sorry, copy/paste error ... I'm using a self-compiled dtc ... I've re-edited the above commands, removing this prefix path ...


Ahh...thank you Martin...


 

root@ICS-Pine64:~# fdtdump /boot/dtb/allwinner/sun50i-a64-pine64-plus.dtb | grep erra

**** fdtdump is a low-level debugging tool, not meant for general use.
**** If you want to decompile a dtb, you probably want
****     dtc -I dtb -O dts <filename>

        allwinner,erratum-unknown1;
root@ICS-Pine64:/boot/dtb/allwinner# dtc -I dtb -O dts -o sun50i-a64-pine64-plus.dts /boot/dtb/allwinner/sun50i-a64-pine64-plus.dtb
sun50i-a64-pine64-plus.dts: Warning (avoid_unnecessary_addr_size): /soc/spi@1c68000/spi-flash@0: unnecessary #address-cells/#size-cells without "ranges" or child "reg" property
sun50i-a64-pine64-plus.dts: Warning (graph_child_address): /soc/lcd-controller@1c0c000/ports/port@0: graph node has single child node 'endpoint@0', #address-cells/#size-cells are not necessary

root@ICS-Pine64:/boot/dtb/allwinner# nano sun50i-a64-pine64-plus.dts

 

Apologies for being dense here....

 

Found the timer stuff..

 

add the line "allwinner,erratum-unknown1;" into that node and save.

 

timer {
        compatible = "arm,armv8-timer";
        allwinner,erratum-unknown1;
        interrupts = < 0x01 0x0d 0xf04 0x01 0x0e 0xf04 0x01 0x0b 0xf04 0x01 0x0a 0xf04 >;
};

It is already there when I looked at the file sun50i-a64-pine64-plus.dts.

 

What do I do next?

 


31 minutes ago, Petee said:

What do I do next?

So, even though you got 1 error with "test_timer", let's pray that in the long term you won't get any of the 95-year jumps ...

So, you can now proceed to install your own application/use-case ...

 


Interestingly I've never had a successful run with Andre's test_timer:

 


user@pine64:~/src$ ./test_timer 
TAP version 13
# number of cores: 4
ok 1 same timer frequency on all cores
# timer frequency is 24000000 Hz (24 MHz)
# time1: b525ffff9, time2: b52583dff, diff: -508410
# time1: b52effff9, time2: b52e83dff, diff: -508410
# time1: b54ba0ff9, time2: b54ba0e00, diff: -505
not ok 2 native counter reads are monotonic # 3 errors
# min: -508410, avg: 6, max: 15879
# diffs: -660875, -20792, -660834, -660875
not ok 3 Linux counter reads are monotonic # 4 errors
# min: -660875, avg: 484, max: 71084
# core 0: counter value: 48987577802 => 2041 sec
# core 0: offsets: back-to-back: 23, b-t-b synced: 6, b-t-b w/ delay: 17
# core 1: counter value: 48987579166 => 2041 sec
# core 1: offsets: back-to-back: 13, b-t-b synced: 6, b-t-b w/ delay: 8
# core 2: counter value: 48987581662 => 2041 sec
# core 2: offsets: back-to-back: 8, b-t-b synced: 6, b-t-b w/ delay: 9
# core 3: counter value: 48987582921 => 2041 sec
# core 3: offsets: back-to-back: 8, b-t-b synced: 7, b-t-b w/ delay: 12
1..3

 

 

This board too is a 2GB PineA64+ from the initial Kickstarter campaign.  Using the latest DEV pre-built kernel via 'apt upgrade', with the UNKNOWN1 erratum active:

 


user@pine64:~/src$ dpkg --list linux-image-dev-sunxi64
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=================================
ii  linux-image-de 5.79.190419  arm64        Linux kernel, version 5.0.7-sunxi

user@pine64:~/src$ cat /proc/version 
Linux version 5.0.7-sunxi64 (root@nightly) (gcc version 7.4.1 20181213 [linaro-7.4-2019.02 revision 56ec6f6b99cc167ff0c2f8e1a2eed33b1edc85d4] (Linaro GCC 7.4-2019.02)) #5.79.190419 SMP Thu Apr 18 12:08:24 CEST 2019

user@pine64:~/src$ dmesg | egrep -i errat
[    0.000000] arch_timer: Enabling global workaround for Allwinner erratum UNKNOWN1

 

 

I haven't seen any of the known symptoms from this issue (i.e. the date jumping by a multiple of 95 years) as long as the erratum workaround is active, but test_timer has never passed fully.  For now I've chosen to leave this aside since I'm not seeing any other symptoms.

 


Yes I do have an RTC backup battery attached, but that really only applies when power is physically removed from the board.  With just a 'reboot' the RTC remains powered.

 

This particular timer issue also is not related to the RTC - it is a problem with a hardware timer (really just a counter) that is part of the SoC itself.  When the symptoms manifest, the kernel (aka "system") time will jump by a multiple of 95 years and after that the kernel is no longer able to sync back to the hardware RTC which is independent.
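The thread does not say where the figure of 95 years comes from. One plausible reading, offered here as an assumption and not something stated by any poster, is that it is the full span of a 56-bit counter ticking at the 24 MHz that test_timer reports. The arithmetic as a short Python sketch (the 56-bit width is the assumption; the names are illustrative):

```python
COUNTER_BITS = 56                      # assumption: usable width of the counter
TIMER_HZ = 24_000_000                  # 24 MHz, as reported by test_timer in this thread
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def wrap_period_years(bits=COUNTER_BITS, hz=TIMER_HZ):
    """Years represented by 2**bits ticks of a counter running at hz."""
    return (2 ** bits) / hz / SECONDS_PER_YEAR


print(round(wrap_period_years(), 1))
```

Under that assumption, a bad read that is interpreted as the counter wrapping would move the kernel's notion of time by one such span, which would match the "multiple of 95 years" symptom.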

 

I had also submitted PR #1329 following the thread "Kernel config: Change RTC_HCTOSYS as default?", and it has since been merged.  As a result there should no longer be any need to explicitly use 'hwclock -s' to set the system clock from the RTC.  Instead, during boot, before init is even spawned, a message should be seen indicating the system clock has been set from the RTC:

user@pine64:~$ dmesg | grep -w rtc
[    1.975054] sun6i-rtc 1f00000.rtc: registered as rtc0
[    1.975070] sun6i-rtc 1f00000.rtc: RTC enabled
[    2.031287] rtc-ldo: supplied by regulator-dummy
[    4.079694] sun6i-rtc 1f00000.rtc: setting system clock to 2019-04-19T01:29:14 UTC (1555637354)

 


Another interesting thing: I have been running my kernel without any patch related to the issue and have not seen any problem on my NanoPi A64, although I don't usually run it for a long time, so the issue could still manifest in time. I had a comment about this on the linux-sunxi list a long time ago; maybe I was just lucky. I will have a look at your PR.


Again - the RTC and the timer in question are two separate entities and are unrelated.  The issue here is not with the RTC but with a completely different counter in the SoC.

 

The fix for the timer issue here was formally added to mainline in the 5.1.y branch for the A64.  It has been backported to the 5.0.y and 4.19.y branches in parts, split between the fork of upstream mainline by megous that Armbian uses and the Armbian patches applied on top of that branch.  The nightly builds and the packaged kernel updates have had these for a short while, and to see whether the workaround has been applied and is active you can use:

 

user@pine64:~$ dmesg | egrep -i errat
[    0.000000] arch_timer: Enabling global workaround for Allwinner erratum UNKNOWN1

The 'UNKNOWN1' erratum is the issue in question.
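The same check can be scripted against saved dmesg output, for example when comparing logs from several boots. A minimal Python sketch; the function name and the regular expression are my own and only assume the message format shown in the line above:

```python
import re

# Based on the single message format quoted above; other kernels may word it differently.
ERRATUM_RE = re.compile(r"arch_timer: Enabling \w+ workaround for Allwinner erratum (\S+)")


def active_allwinner_errata(dmesg_text):
    """Return the Allwinner erratum names whose workaround was reported as enabled."""
    return [m.group(1) for m in ERRATUM_RE.finditer(dmesg_text)]


sample = (
    "[    0.000000] arch_timer: Enabling global workaround for "
    "Allwinner erratum UNKNOWN1\n"
)
print(active_allwinner_errata(sample))
```

An empty list means the message was not found in the text given, which is not proof the workaround is absent: the kernel ring buffer may simply have rotated past the early boot lines.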

 

There had been a previous fix that had been around for a long time and was part of what eventually became the formal mainline fix.  It was the period between these two where this issue had reappeared but should be resolved now.

 

Those are separate from enabling RTC use by the kernel during boot to set the system clock which is a different and independent timer.

 

It should be noted that trying to use 'hwclock -s' after another active process (such as ntpd, chrony, or systemd-timesyncd) has initiated synchronization to the RTC from the system clock can cause issues.  NTPd can normally recover but the others perhaps not so much.

 

 


I should probably put in an update. . . . 

 

After running stable for over a week I tried disabling the tickless kernel mode (the standard modern default) to reduce jitter:

user@pine64:~$ egrep nohz /boot/armbianEnv.txt
extraargs=nohz=off

user@pine64:~$ cat /proc/cmdline 
root=UUID=c80325d7-1d25-4151-85c3-47343b473fae rootwait rootfstype=ext4 console=ttyS0,115200 console=tty1 panic=10 consoleblank=0 loglevel=7 ubootpart=0f940383-01 usb-storage.quirks=0x2537:0x1066:u,0x2537:0x1068:u nohz=off  cgroup_enable=memory swapaccount=1
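To confirm from a script that the extra argument really reached the kernel, the command line can be split into name/value pairs. A small Python sketch under simple assumptions: the helper name parse_cmdline is hypothetical, values containing quoted spaces are not handled, and when a name repeats (as console= does here) only the last value is kept:

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# The command line shown above, copied verbatim (including its double spaces).
sample = (
    "root=UUID=c80325d7-1d25-4151-85c3-47343b473fae rootwait rootfstype=ext4 "
    "console=ttyS0,115200 console=tty1 panic=10 consoleblank=0 loglevel=7 "
    "ubootpart=0f940383-01 usb-storage.quirks=0x2537:0x1066:u,0x2537:0x1068:u "
    "nohz=off  cgroup_enable=memory swapaccount=1"
)
params = parse_cmdline(sample)
print(params.get("nohz"))
```

On a live system the same function could be fed the contents of /proc/cmdline instead of the sample string.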

 

Sadly this once again brought about instability.  With a non-tickless kernel I start to see rcu_sched self-detected stalls such as I noted in the thread "DEV (5.0.y) on PineA64+ seeing occasional CPU stalls".  At some point the system clock will jump by 95 years and a reboot ends up being required, potentially a forced one (bad things start to happen).

 

Interestingly, with a non-tickless kernel Andre's test_timer shows considerably different results, confirming there is indeed still a problem despite the workaround being present and active:


user@pine64:~/src$ dmesg | egrep UNKNOWN
[    0.000000] arch_timer: Enabling global workaround for Allwinner erratum UNKNOWN1

user@pine64:~/src$ ./test_timer 
TAP version 13
# number of cores: 4
ok 1 same timer frequency on all cores
# timer frequency is 24000000 Hz (24 MHz)
# time1: a266aaff9, time2: a266aae00, diff: -505
# time1: a26836ff8, time2: a26836e00, diff: -504
# time1: a26881ff9, time2: a26881e00, diff: -505
# time1: a26884ff9, time2: a26884e00, diff: -505
# time1: a26a67ff9, time2: a26a641ff, diff: -15866
# time1: a26c9bff9, time2: a26c9be00, diff: -505
# time1: a27085ff9, time2: a27085e00, diff: -505
# time1: a270a3ff9, time2: a270a3e00, diff: -505
# time1: a270c1ff9, time2: a270c1dff, diff: -506
# time1: a271e4ff9, time2: a271e4e00, diff: -505
# time1: a2726bff9, time2: a2726be00, diff: -505
# time1: a2733aff9, time2: a2733ae00, diff: -505
# time1: a27d99ff8, time2: a27d99e00, diff: -504
# time1: a27e62ff9, time2: a27e62e00, diff: -505
# time1: a27e89ff9, time2: a27e89e00, diff: -505
# time1: a2819eff9, time2: a2819ee00, diff: -505
# too many errors, stopping reports
not ok 2 native counter reads are monotonic # 41 errors
# min: -15866, avg: 6, max: 508423
# diffs: -660834, -20792, -20792, -20792, -20792, -20792, -20792, -20792, -20792, -20792, -20792, -20792, -20792, -660833, -20791, -20791
# too many errors, stopping reports
not ok 3 Linux counter reads are monotonic # 33 errors
# min: -660834, avg: 510, max: 66197833
# core 0: counter value: 43968317728 => 1832 sec
# core 0: offsets: back-to-back: 12, b-t-b synced: 6, b-t-b w/ delay: 12
# core 1: counter value: 43968319238 => 1832 sec
# core 1: offsets: back-to-back: 8, b-t-b synced: 6, b-t-b w/ delay: 9
# core 2: counter value: 43968321292 => 1832 sec
# core 2: offsets: back-to-back: 9, b-t-b synced: 6, b-t-b w/ delay: 9
# core 3: counter value: 43968322532 => 1832 sec
# core 3: offsets: back-to-back: 10, b-t-b synced: 7, b-t-b w/ delay: 9
1..3
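Since several test_timer runs are being compared in this thread, it can help to reduce each run to its pass/fail lines. A minimal Python sketch that parses only the "ok" / "not ok" lines of the TAP output above; the function name summarize_tap and the tuple layout are illustrative choices, not part of test_timer:

```python
import re

# "ok N description" or "not ok N description # M errors"
TAP_RE = re.compile(r"^(ok|not ok) (\d+) (.*?)(?: # (\d+) errors)?$")


def summarize_tap(output):
    """Return (test number, passed, description, error count) for each TAP result line."""
    results = []
    for line in output.splitlines():
        m = TAP_RE.match(line.strip())
        if m:
            status, num, desc, errors = m.groups()
            results.append((int(num), status == "ok", desc, int(errors) if errors else 0))
    return results


# Result lines copied from the run above; comment lines are ignored by the parser.
sample = """TAP version 13
ok 1 same timer frequency on all cores
# timer frequency is 24000000 Hz (24 MHz)
not ok 2 native counter reads are monotonic # 41 errors
not ok 3 Linux counter reads are monotonic # 33 errors
1..3
"""
for num, passed, desc, errors in summarize_tap(sample):
    print(num, "PASS" if passed else "FAIL", desc, errors)
```

Keeping the error counts alongside the pass/fail flag makes the difference between the tickless and non-tickless runs easy to see at a glance.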

 

 

Unfortunately I likely won't have any time to look at this until after the weekend.

 

I'm still working towards getting the Pine64 back as a stable stratum-1 NTP server with a GPS-based reference clock.  With Kernel PPS (requires non-tickless kernel) and proper offset adjustments this was working very well on 4.14.y (and earlier).

 

Oh - and to add to how the RTC got brought into this discussion, though it is not really related:  The kernel has a capability to keep the RTC in sync with its system time, primarily for use with NTP or other external service that will discipline the system time.  When the symptom of the issue here manifests with the date jumping by a multiple of 95 years, the system time is no longer within the valid range for the RTC so any further attempts to synchronize the RTC from the system time result in failure:

[28410.996865] sun6i-rtc 1f00000.rtc: rtc only supports year in range 1970 - 2033

 

The above message will only appear in the live kernel message buffer (aka 'dmesg') and on the system console, so it can be easily missed.  Failures in setting the RTC (aka "hwclock") are only another indirect symptom of the issue and would be expected.
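To make the reasoning concrete: once the system time has jumped, the year it reports falls outside what the RTC driver says it can store. A tiny Python sketch using the range from the kernel message above and the approximate jump size mentioned in this thread (the names are illustrative):

```python
RTC_MIN_YEAR, RTC_MAX_YEAR = 1970, 2033  # from the sun6i-rtc message above
JUMP_YEARS = 95                          # approximate size of one jump, per this thread


def rtc_can_store(year):
    """True if the year is inside the range the RTC driver reports as supported."""
    return RTC_MIN_YEAR <= year <= RTC_MAX_YEAR


def year_after_jumps(year, jumps):
    """Calendar year after the given number of 95-year jumps."""
    return year + jumps * JUMP_YEARS


print(rtc_can_store(2019))
print(year_after_jumps(2019, 1), rtc_can_store(year_after_jumps(2019, 1)))
```

Even a single jump from 2019 lands well past 2033, so every later attempt to write the system time to the RTC is rejected.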

 

 


I am still fine here after one day of running:

 

The hardware clock is now not in sync with the system time after 24 hours:

 

root@ICS-Pine64:~# date
Sat Apr 20 06:44:53 CDT 2019
root@ICS-Pine64:~# hwclock
2019-04-20 06:44:57.138


 

root@ICS-Pine64:~# ./test_timer
TAP version 13
# number of cores: 4
ok 1 same timer frequency on all cores
# timer frequency is 24000000 Hz (24 MHz)
ok 2 native counter reads are monotonic # 0 errors
# min: 8, avg: 8, max: 4201
ok 3 Linux counter reads are monotonic # 0 errors
# min: 541, avg: 560, max: 94959
# core 0: counter value: 2816701110464 => 117362 sec
# core 0: offsets: back-to-back: 9, b-t-b synced: 8, b-t-b w/ delay: 10
# core 1: counter value: 2816701117040 => 117362 sec
# core 1: offsets: back-to-back: 10, b-t-b synced: 9, b-t-b w/ delay: 11
# core 2: counter value: 2816701118687 => 117362 sec
# core 2: offsets: back-to-back: 10, b-t-b synced: 9, b-t-b w/ delay: 10
# core 3: counter value: 2816701120142 => 117362 sec
# core 3: offsets: back-to-back: 9, b-t-b synced: 9, b-t-b w/ delay: 11
1..3

Personally with these little issues I would not utilize the Pine64 as an NTP server with PPS.  That is me though.

 

Currently running an NTP server here using BSD on an Intel CPU and it is working great.

 


ntpq> sysinfo
associd=0 status=041d leap_none, sync_uhf_radio, 1 event, kern,
system peer:        GPS_NMEA(0)
system peer mode:   client
leap indicator:     00
stratum:            1
log2 precision:     -24
root delay:         0.000
root dispersion:    1.150
reference ID:       GPS
reference time:     e0658a99.f937808b  Sat, Apr 20 2019  7:10:01.973
system jitter:      0.001422
clock jitter:       0.001
clock wander:       0.005
broadcast delay:    -50.000
symm. auth. delay:  0.000

 

Off of the main topic....

 

Doing an audible hourly chime here using SAPI or Alexa plus 15 touchscreens displaying time and they are in sync just fine.

Pull down NOAA weather maps on a cron schedule (when satellite passes over head) which is really time specific. 

 

Collect antique clocks and my old regulator pendulum clock is in sync just fine with NTP server (a few other clocks from the 1800's are totally off though).

 

Same with the use of geophones here; fun stuff.

 

I have always been into NTP / PPS satellite time sync here (since the 1990's), for use with an airline flight-vectoring program used around the world and for precise measurements of incoming aeroplanes (Unix software).

 

 

 

 


10 minutes ago, Petee said:

I am still fine here after one day of running:

 

The hardware clock is now not in sync with the system time after 24 hours:

 

root@ICS-Pine64:~# date
Sat Apr 20 06:44:53 CDT 2019
root@ICS-Pine64:~# hwclock
2019-04-20 06:44:57.138


 

Personally with these little issues I would not utilize the Pine64 as an NTP server with PPS.  That is me though.

 

Currently running an NTP server here using BSD on an Intel CPU and it is working great.

 

 

 

If your hardware clock is not being synced from your system clock then something is broken.  Running 'hwclock -s' after systemd-timesyncd or ntpd has started is one of those things that will cause such breakage, but there can be other reasons.  Having an RTC isn't of much value if it won't be kept decently accurate.  Most RTCs are intended to be periodically updated by the host OS, as they use low-cost components and a design that is not very accurate.

 

As for using a Pine64 as an NTP server - it actually works very well.  I've had this pine64 doing exactly that with KPPS with 4.14.y and earlier kernels with great success, and with a total power utilization of less than 3.5W.   There is something with 4.19.y and later kernels that is not right, and trying to work with NTP is just one use case that is exposing this.

 

This is perhaps not a "little issue" - again it is not itself a time/date problem but rather a problem when reading a  hardware counter (one of many) in the SoC.  That counter is used by the kernel for core functionality, of which tracking system time is just one use.  If some are experiencing an issue while others are not, using the same kernel, then there is something about the difference in workloads that is exposing an un-mitigated issue that can be triggered at any time.

 


Thank you WindySea.

 

If your hardware clock is not being synced from your system clock then something is broken.  Running 'hwclock -s' after systemd-timesyncd or ntpd has started is one of those things that will cause such breakage, but there can be other reasons.  Having an RTC isn't of much value if it won't be kept decently accurate.  Most RTCs are intended to be periodically updated by the host OS, as they use low-cost components and a design that is not very accurate.

 

I created a once-a-day cron job to sync the system time to the hardware time.

 

root@ICS-Pine64:~# date
Sat Apr 20 10:05:00 CDT 2019
root@ICS-Pine64:~# hwclock
2019-04-20 10:05:06.192985-0500

 

As for using a Pine64 as an NTP server - it actually works very well.  I've had this pine64 doing exactly that with KPPS with 4.14.y and earlier kernels with great success, and with a total power utilization of less than 3.5W.   There is something with 4.19.y and later kernels that is not right, and trying to work with NTP is just one use case that is exposing this.

 

Understood.  Is your PPS off now with the newer kernels?  What pin on the Pine64 are you using for PPS?

 

Here have added an NTP/PPS GPS serial connection to the PFSense box.  Historically the NTP server here was an autonomous device. 

 

 


 

Started the whole NTP server endeavor with GPS antenna mounted on the roof of a two story home in the 1990's using a Trimble GPS module (from a tank).  Then with the SURE GPS moved the antenna to the attic of the two story home.  Now have the antenna next to a glass block window in the basement of the two story home and it works great seeing 9 satellites just fine.  Syncs in about 30 seconds these days. 

 

In the automobiles with GPS I am doing something similar to sync time and position with a computer connected to the HU (BMW).   The car computer connects to the OBD2 and bus ports (with my DIY setup I can almost drive them remotely these days).

 

The pfSense box today has two WAN ports / ISP connections and 6 LAN ports.

 

It's too big and uses too much power today running on a low powered iSeries CPU.  I would like to take this to an ARM based smaller footprint computer.

I am test running OpenWRT on a NEXX microrouter which is about 1/2" tall and 1" X 2" sized.  I have added an RTC/Battery to this device (tiny thing).

 

This is perhaps not a "little issue" - again it is not itself a time/date problem but rather a problem when reading a hardware counter (one of many) in the SoC. That counter is used by the kernel for core functionality, of which tracking system time is just one use. If some are experiencing an issue while others are not, using the same kernel, then there is something about the difference in workloads that is exposing an unmitigated issue that can be triggered at any time.
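Since the symptom tends to appear days after boot, one way to catch the moment it happens is to log any large step in the wall clock. The following is only a sketch of that idea: the function names, the 10 second sampling interval and the 60 second threshold are all arbitrary choices for illustration. It watches system time only, which is just one consumer of the counter, and a legitimate step by ntpd would be reported too.

```shell
#!/bin/sh
# Succeeds when two clock readings (whole epoch seconds) are further apart,
# in either direction, than the allowed number of seconds.
# Arguments: previous reading, current reading, allowed difference.
clock_jumped() {
    delta=$(( $2 - $1 ))
    if [ "$delta" -lt 0 ]; then
        delta=$(( -delta ))
    fi
    [ "$delta" -gt "$3" ]
}

# Hypothetical watcher: sample the wall clock every 10 s and report any step
# larger than 60 s. It is not started automatically; call watch_clock to run it.
watch_clock() {
    prev=$(date +%s)
    while sleep 10; do
        cur=$(date +%s)
        if clock_jumped "$prev" "$cur" 60; then
            echo "clock jumped from $prev to $cur"
        fi
        prev=$cur
    done
}
```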

 

Here time will tell. I am currently at over 24 hours on the nightly build. That said, the "time" issue would crop up days after a cold boot. And connecting the running OS with HA automation software and the combo alarm/automation panel (OmniPro 2) wreaked havoc here with scheduled events. I might also turn the Pine64 into a weather computer for connection to my Davis weather station and NOAA downloads, and move this microserver over to a TV box.

 


2 hours ago, Petee said:

 

I created a once-a-day cron job to sync the system time to the hardware time.

[ - snip - ]

Understood.  Is your PPS off now with the newer kernels?  What pin on the Pine64 are you using for PPS?

 

 

You really shouldn't need to set a cron job to sync the system time back to the hardware clock since this is already a builtin kernel feature.  It just appears that feature was disabled (broken) by something.


user@pine64:~$ sudo hwclock --compare
hw-time      system-time         freq-offset-ppm   tick
1555786702   1555786700.608212
1555786712   1555786710.609443               123      1
1555786722   1555786720.610197                99      1
1555786732   1555786730.610937                91      1
1555786742   1555786740.611641                86      1
1555786752   1555786750.612358                83      1
1555786762   1555786760.613084                81      1
^C

user@pine64:~$ sudo hwclock && date "+%F %T.%N"
2019-04-20 15:04:32.793358-0400
2019-04-20 15:04:32.615001945
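To turn the `hwclock --compare` listing into a plain offset per sample, something like the sketch below can be used. The function name `compare_offsets` is made up, and it assumes the two-column layout shown above (RTC seconds, then system seconds); other util-linux versions may print it differently.

```shell
#!/bin/sh
# Read `hwclock --compare` output on stdin and print, for each sample line,
# the RTC time minus the system time in seconds. The header line and
# anything else that does not start with two numbers is skipped.
compare_offsets() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9.]+$/ { printf "%.6f\n", $1 - $2 }'
}

# Example (needs root and a working RTC, so it is left commented out):
#   sudo hwclock --compare | compare_offsets
```

Fed the first and last samples from the listing above, this gives 1.391788 and 1.386916, so the RTC moved by roughly 4.9 ms against the system clock in 60 seconds, which matches the reported 81 ppm.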

 

 

When ntpd or systemd-timesyncd start they will set a flag in the kernel to enable this feature:  ntpd will do so after it obtains sync but systemd-timesyncd does so almost immediately.  Running 'hwclock -s' after this will in turn disable this feature again.  After some time ntpd may discover this (if the kernel tells it) and try to reset the flag, but systemd-timesyncd will not.
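As far as I know, the flag in question is the STA_UNSYNC bit (0x0040) in the kernel's NTP status word: while it is set the kernel treats the clock as unsynchronized and does not write it back to the RTC. A minimal sketch for checking a status word, such as the `status 0x...` value that ntptime prints, could look like this. The function name is made up for illustration.

```shell
#!/bin/sh
# Succeeds (exit 0) when the STA_UNSYNC bit (0x0040) is clear in an
# ntp_adjtime() status word, i.e. the kernel regards the clock as
# synchronized. Argument: the status word, e.g. 0x2001.
clock_marked_synced() {
    [ $(( $1 & 0x0040 )) -eq 0 ]
}

# Example with a status of 0x2001 (PLL,NANO):
if clock_marked_synced 0x2001; then
    echo "kernel considers the clock synchronized"
else
    echo "STA_UNSYNC is set, RTC updates are off"
fi
```

If the status shows UNSYNC some time after the time daemon has started, something (such as a stray 'hwclock -s') has probably reset it.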

 

When using systemd-timesyncd (the standard default out-of-the-box) you should never use hwclock -s.  It only causes bad things to happen, and is why /etc/init.d/hwclock.sh and its many incantations explicitly check for running systemd and will exit without taking any action if so.  Before systemd that standard script would run before ntpd and all was good.  Then came systemd and that caused breakage so the extra checks were put in.  There are many discussions on this elsewhere so there's no need to discuss further here:  The summary is if you are using systemd do not run 'hwclock -s'.  Ever.

 

When using ntpd (and after systemd-timesyncd has been appropriately disabled/removed), 'hwclock -s' may be run once but only when it may be assured that this happens before ntpd is started.  That isn't as simple as it sounds with systemd either :( . This is also discussed in many other forums so no need to rehash that here.  I do have a proper implementation for a raspberry pi with an I2C-based RTC that requires this - it is doable but it is a bit more complex than simply putting a one-liner in /etc/rc.local

 

 

In short, all current armbian kernels for platforms with an on-board RTC, including the PineA64(+), should have the proper configuration for the kernel to read the RTC and set the system time during its early startup, as is standard.  This removes virtually any need to ever use 'hwclock -s' manually or during boot.  Whether using systemd-timesyncd or ntpd the system time will already be correct, the appropriate daemon will begin synchronizing the system time from a reliable source, and will enable the kernel to periodically update the hardware clock from the system clock on its own as well (every 11 minutes IIRC).  There should be no need to manually sync time in either direction.
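To see whether a given kernel was built with this behaviour, the relevant options are CONFIG_RTC_HCTOSYS (set system time from the RTC at boot) and CONFIG_RTC_SYSTOHC (let the kernel write system time back to the RTC). The sketch below assumes the kernel config is installed as /boot/config-$(uname -r), which may not be true on every image. The function name is made up.

```shell
#!/bin/sh
# Print the RTC sync settings found in a kernel config file.
# Argument: path to the config file.
rtc_kernel_options() {
    grep -E '^CONFIG_RTC_(HCTOSYS|SYSTOHC)(_DEVICE)?=' "$1"
}

# Assumption: the running kernel's config is available under /boot.
CONFIG_FILE="/boot/config-$(uname -r)"
if [ -r "$CONFIG_FILE" ]; then
    rtc_kernel_options "$CONFIG_FILE" || echo "no RTC sync options found"
fi
```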

 

 

I am using pin PH9 for PPS on a pine64 since that is one that is interrupt-capable (not all are):

 

user@pine64:~$ egrep pps_pin /boot/armbianEnv.txt
param_pps_pin=PH9
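Once the pin is configured, a quick sanity check is to look at the timestamps the kernel PPS layer records. This sketch assumes the source appears as pps0 and that the assert edge is the on-time edge; neither is confirmed in this thread. The sysfs file holds one value of the form seconds.nanoseconds#sequence, and the function (name made up) pulls out the sub-second part.

```shell
#!/bin/sh
# Print the sub-second part, in nanoseconds, of a PPS assert value of the
# form "seconds.nanoseconds#sequence".
pps_fraction_ns() {
    stamp=${1%%#*}      # drop the "#sequence" suffix
    frac=${stamp#*.}    # keep the digits after the dot
    # Strip leading zeros so the result reads as a plain number.
    echo "$frac" | sed 's/^0*//; s/^$/0/'
}

# Assumption: the PPS source is pps0.
PPS_FILE=/sys/class/pps/pps0/assert
if [ -r "$PPS_FILE" ]; then
    echo "last pulse at +$(pps_fraction_ns "$(cat "$PPS_FILE")") ns"
fi
```

With the clock disciplined to the pulse, the value should sit very close to 0 or to 999999999, and the sequence number should go up by one each second.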

Using a GPS as a reference clock without PPS is not very useful unless it is the only source of time.  There is significant delay that must be accurately compensated for ("fudge time2" with the .20 NMEA GPS driver), plus there will always be quite a bit of jitter especially if the GPS output isn't reduced to the single sentence desired.  I've found that typically the offsets are close to 1/2 second even with 115Kbaud.

 

Using kernel ("hard") PPS can provide much lower jitter than NTP ("soft") PPS, but requires that a custom kernel be built with this capability enabled. That in turn requires the kernel to be configured at compile time as non-tickless, since the two configurations are not currently compatible. A standard kernel may also be configured at boot time to run non-tickless by adding 'nohz=off' to the kernel command line arguments. Running non-tickless can help reduce jitter but will increase power usage (slightly).
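A small sketch for confirming that the running kernel was actually booted with 'nohz=off' follows. The function name is made up. On Armbian images the usual place to add the argument is, I believe, an extraargs= line in /boot/armbianEnv.txt, but treat that as an assumption and check your own boot script.

```shell
#!/bin/sh
# Succeeds when the given kernel command line contains the argument nohz=off.
cmdline_has_nohz_off() {
    case " $1 " in
        *" nohz=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -r /proc/cmdline ]; then
    if cmdline_has_nohz_off "$(cat /proc/cmdline)"; then
        echo "booted with nohz=off"
    else
        echo "nohz=off not present"
    fi
fi
```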

 

For now I am using NTP PPS and attempting to use non-tickless on the pine64.   This gets good results on its own:


user@pine64:~$ /usr/local/sbin/ntptime && ntpq -c rv
ntp_gettime() returns code 0 (OK)
  time e065e417.a321a2e8  Sat, Apr 20 2019 14:31:51.637, (.637232256),
  maximum error 2000 us, estimated error 3 us, TAI offset 37
ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset 0.787 us, frequency 27.737 ppm, interval 1 s,
  maximum error 2000 us, estimated error 3 us,
  status 0x2001 (PLL,NANO),
  time constant 5, precision 1.000 us, tolerance 500 ppm,
associd=0 status=0415 leap_none, sync_uhf_radio, 1 event, clock_sync,
version="ntpd 4.2.8p13@1.3847-o Tue Apr 16 23:50:07 UTC 2019 (1)",
processor="aarch64", system="Linux/5.0.7-sunxi64", leap=00, stratum=1,
precision=-20, rootdelay=0.000, rootdisp=1.030, refid=PPS,
reftime=e065e415.391ac159  Sat, Apr 20 2019 14:31:49.223,
clock=e065e417.a556d15a  Sat, Apr 20 2019 14:31:51.645, peer=40755, tc=5,
mintc=3, offset=0.000800, frequency=27.737, sys_jitter=0.002519,
clk_jitter=0.003, clk_wander=0.000, tai=37, leapsec=201701010000,
expire=201912280000
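For logging these figures over time it helps to pull single values out of the `ntpq -c rv` output. A minimal sketch, with a made-up function name, is below. It only handles simple name=value entries such as offset or sys_jitter, not values that themselves contain commas (reftime, clock).

```shell
#!/bin/sh
# Print the value of one variable from `ntpq -c rv` output read on stdin.
# Argument: the variable name, e.g. offset or sys_jitter.
rv_value() {
    tr ',' '\n' | sed -n "s/^[[:space:]]*$1=//p" | head -n 1
}

# Example (needs a running ntpd, so it is left commented out):
#   ntpq -c rv | rv_value offset
```

ntpq reports offset and jitter in milliseconds, so the offset=0.000800 above is 0.8 microseconds, in line with the 0.787 us that ntptime shows.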

 

 

I had spent some time dialing-in the offset for the PPS, since there is more overhead with NTP discipline over kernel discipline.

 

 

The only issue is that I am hitting what appears to be the known timer issue on topic for this thread so that is where I want to try to look next.  With 4.14 (or earlier) kernels this is not an issue and the same mitigation appears to work well there but does not seem so complete with 4.19 and later.

 

 


Guest
This topic is now closed to further replies.