windysea

Members
  • Content Count

    41

About windysea

  • Rank
    Advanced Member

  1. An NTP stratum is not a measure of accuracy or precision; it is simply an indication of how many "hops" a given NTP server is from a reference clock. Stratum 0 is the reference clock itself (e.g. an atomic clock or GPS receiver), and stratum 1 is an NTP server directly using that reference clock for time synchronization. An SBC with a serial GPS can indeed be a stratum-1 server (the GPS being stratum 0), and there are many public postings on doing this. In fact, the NTPsec team is doing "research" on this topic and has published documentation on it. The reference implementation of NTPd is specifically designed to maintain accurate time regardless of the quality of the hardware timers, and today's "50-cent" parts are still more stable than parts that cost orders of magnitude more decades ago, when NTPd was first developed.
     Google's NTP project may use their own "atomic clocks", but their public NTP servers tend to be on the poor end with respect to jitter. They're intended to be "close enough", stable, and highly available, not highly accurate. For instance, their public NTP servers implement leap-smearing rather than advertising a leap second (when appropriate). For this reason Google strongly recommends against mixing their public NTP servers with other NTP sources in the same configuration (bad things can happen, and in fact have happened in the past). Google's NTP servers are also behind anycast load balancers; while this improves availability and simplifies end-device configuration, it actually degrades performance. In my own testing Google's NTP servers typically have higher jitter than most of the larger NTP Pool Project pools, which are already commonly used as defaults in many OS distributions.
     Configuring and building a non-tickless kernel is required in order to enable kernel PPS (aka "hard PPS"), which typically has far less jitter than "soft PPS". However, doing so even with the latest 5.1.y (DEV) kernels results in an unstable platform where the issue noted in this thread manifests fairly frequently. It may be that A64-based SBCs are simply not suitable for hosting NTP reference clocks and stratum-1 servers, but earlier kernels did not seem to have this issue, so a previous mitigation may have been lost along the way.
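As a small illustration of the "hops" definition, the following Python sketch computes a server's stratum as the number of hops from its chain of upstream sources back to the reference clock (stratum 0). The node names and the upstream table are hypothetical; a real daemon learns its upstream's stratum from NTP packets rather than from a table like this.

```python
from typing import Dict, Optional


def stratum(source: str, upstream: Dict[str, Optional[str]]) -> int:
    """Return the stratum of `source` by counting hops to a reference clock.

    `upstream` maps each node to the node it synchronizes from; a reference
    clock (GPS receiver, atomic clock, ...) maps to None and is stratum 0.
    Stratum says nothing about accuracy, only about distance in hops.
    """
    hops = 0
    seen = set()
    node = source
    while upstream[node] is not None:
        if node in seen:
            raise ValueError(f"synchronization loop at {node!r}")
        seen.add(node)
        node = upstream[node]
        hops += 1
    return hops


if __name__ == "__main__":
    # Hypothetical chain: GPS -> SBC reading the serial GPS -> LAN client.
    chain = {"gps0": None, "pine64": "gps0", "lan-client": "pine64"}
    print(stratum("pine64", chain))      # 1
    print(stratum("lan-client", chain))  # 2
```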
  2. Yes, that fix should already be part of the existing kernels; however, there are actually several timer-related issues on the A64, and it is not clear that they are all addressed even upstream. Even with all of the fixes that have been published I still see the issue recur occasionally. There is a thread in the Development section where this is (or was) being discussed.
  3. Thanks! I removed the patch from my local 'userpatches' and built a new -dev kernel using the default kernel config (no changes) and no local patches or changes. The kernel booted with expected network connectivity and all looks good. I'll verify tomorrow with the nightly builds as well.
  4. @martinayotte: I was researching this one since I experienced the same thing when updating from 5.0.y to 5.1.y. It looks like you found something similar and posted a query/comment to the upstream repo: https://github.com/megous/linux/commit/c797ad894874de33b2be4beeb1f5a1e6339a888d#comments
     I found the same thing you did: the drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c driver was missing a change made specifically in megous' branch for 5.0 that was not carried over to 5.1. Did you intend to include this as a local patch, or should I submit a pull request with a new patch for this in the Armbian tree? It looks like 0003-net-stmmac-dwmac-sun8i-support-RGMII-modes-with-PHY-.patch from patch/kernel/sunxi-next should do it.
  5. Beware that 'b' will literally trigger an immediate reboot without even attempting to sync the disks. That can, and in some cases likely will, cause corruption. For the OP's case this would not be an issue, since it looks like the root filesystem is already trashed, though perhaps other filesystems were mounted as well? A sync can be triggered the same way just before the reboot:
     root@host# echo s > /proc/sysrq-trigger
     root@host# echo b > /proc/sysrq-trigger
     The 's' can also be handy when /bin/sync is missing, to help prevent corruption on any additional filesystems that may be mounted when faced with a forced poweroff/reboot.
     For the OP: if your disk is already overwritten, then simply removing and re-applying power should not be an issue, unless you don't have physical access. If /bin/systemctl exists (it sounds like it may not here), you can try 'systemctl reboot --force', which is the "modern" systemd replacement for 'reboot -f'. Otherwise the /proc/sysrq-trigger step noted above would be the next option, followed by just pulling the power.
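The sync-then-reboot sequence can also be scripted. Below is a minimal Python sketch, assuming it runs as root on a kernel with SysRq support; the function name and the two-second pause are illustrative assumptions. The emergency sync requested by 's' is carried out asynchronously by the kernel, so the pause before 'b' is only a grace period, not a guarantee that the sync has finished. The writer is injectable so the ordering can be checked without actually rebooting.

```python
import time
from typing import Callable

SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def _write_trigger(path: str, key: str) -> None:
    # Each SysRq command is a single character written to the trigger file.
    with open(path, "w") as f:
        f.write(key)


def sync_then_reboot(
    write: Callable[[str, str], None] = _write_trigger,
    path: str = SYSRQ_TRIGGER,
    settle_seconds: float = 2.0,
) -> None:
    """Request an emergency sync ('s'), pause, then reboot immediately ('b').

    'b' alone reboots without syncing, so 's' is always written first.
    The pause is an assumed grace period for the asynchronous sync.
    """
    write(path, "s")
    time.sleep(settle_seconds)
    write(path, "b")
```

Calling `sync_then_reboot()` with its defaults reboots the machine right after the pause, just as the two `echo` commands would, so only run it when that is what you want.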
  6. The fixes to date have all been community-provided, based on observations and testing (i.e. not based on anything from the SoC vendor). Allwinner has been silent on this specific issue, which isn't really surprising. I did find a fix from Allwinner for a different timer issue, which occurs when trying to write (rather than read) one of the affected timer registers. That fix is not part of the mainline (or Armbian) kernels and does not immediately appear to be directly related, but it is part of what I hope to be able to look into.
  7. The mainline patch is almost identical to the previous patch used in alternate forks, including those used by Armbian. The mainline patch has only one change (other than the Kconfig symbol and matching macro): one fewer low-order bit is masked when testing for the condition under which the problem has been seen to occur, which actually gives the mainline patch more "coverage" than the previous one. The previous patch, which had been applied to 4.19.y and 5.0.y kernels, behaves similarly and was indeed working with 4.14.y and earlier, so something else may have changed. I have not had the opportunity to continue looking into this just yet. It is also possible that the original fix and the subsequent mainline fix were not complete, which is actually noted in the comments for the various commits. Disabling tickless operation, for instance, seems to exacerbate the issue significantly, though on the surface this would not be expected.
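To see why masking one fewer bit gives more coverage, here is a simplified Python model of the check: a counter read whose low-order bits are all ones or all zeros is treated as suspect and the counter is read again. This is an illustration, not the kernel code. The defaults (9 low bits, 150 reads) are assumptions meant to resemble the mainline workaround, the previous patch is modelled simply as masking one more bit (10), and `read_raw` is a hypothetical stand-in for the hardware register read.

```python
from typing import Callable


def is_suspect(value: int, low_bits: int) -> bool:
    """True if the low `low_bits` bits of a counter read are all ones or all zeros."""
    mask = (1 << low_bits) - 1
    # (value + 1) & mask is 0 for all ones and 1 for all zeros, so "<= 1" catches both.
    return ((value + 1) & mask) <= 1


def read_counter(read_raw: Callable[[], int], low_bits: int = 9, retries: int = 150) -> int:
    """Re-read the counter until a read is not suspect or the retry budget is spent."""
    while True:
        value = read_raw()
        retries -= 1
        if not is_suspect(value, low_bits) or retries <= 0:
            return value


if __name__ == "__main__":
    window = range(1 << 12)
    wider = {v for v in window if is_suspect(v, 10)}    # assumed previous patch: 10 bits
    narrower = {v for v in window if is_suspect(v, 9)}  # assumed mainline: 9 bits
    # Every value rejected by the wider mask is also rejected by the narrower one.
    print(len(wider), len(narrower), wider <= narrower)  # 8 16 True
```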
  8. You really shouldn't need a cron job to sync the system time back to the hardware clock, since this is already a built-in kernel feature. It just appears that the feature was disabled (broken) by something. When ntpd or systemd-timesyncd starts, it sets a flag in the kernel to enable this feature: ntpd does so after it obtains sync, but systemd-timesyncd does so almost immediately. Running 'hwclock -s' after this will in turn disable the feature again. After some time ntpd may discover this (if the kernel tells it) and try to reset the flag, but systemd-timesyncd will not.
     When using systemd-timesyncd (the standard out-of-the-box default) you should never use 'hwclock -s'. It only causes bad things to happen, and it is why /etc/init.d/hwclock.sh and its many variants explicitly check for a running systemd and exit without taking any action if so. Before systemd, that standard script would run before ntpd and all was good; then came systemd, which caused breakage, so the extra checks were put in. There are many discussions on this elsewhere, so there's no need to discuss it further here. The summary is: if you are using systemd, do not run 'hwclock -s'. Ever.
     When using ntpd (and after systemd-timesyncd has been appropriately disabled/removed), 'hwclock -s' may be run once, but only when it can be assured that this happens before ntpd is started. That isn't as simple as it sounds with systemd either, and it is also discussed in many other forums, so there's no need to rehash it here. I do have a proper implementation for a Raspberry Pi with an I2C-based RTC that requires this; it is doable, but it is a bit more complex than simply putting a one-liner in /etc/rc.local.
     In short, all current Armbian kernels for platforms with an on-board RTC, including the PineA64(+), should have the proper configuration for the kernel to read the RTC and set the system time during its early startup, as is standard. This removes virtually any need to ever use 'hwclock -s' manually or during boot. Whether using systemd-timesyncd or ntpd, the system time will already be correct, the appropriate daemon will begin synchronizing the system time from a reliable source, and it will also enable the kernel to periodically update the hardware clock from the system clock on its own (every 11 minutes IIRC). There should be no need to manually sync time in either direction.
     I am using pin PH9 for PPS on a Pine64, since that is one that is interrupt-capable (not all are):
     user@pine64:~$ egrep pps_pin /boot/armbianEnv.txt
     param_pps_pin=PH9
     Using a GPS as a reference clock without PPS is not very useful unless it is the only source of time. There is significant delay that must be accurately compensated for ("fudge time2" with the .20 NMEA GPS driver), plus there will always be quite a bit of jitter, especially if the GPS output isn't reduced to the single sentence desired. I've found that the offsets are typically close to 1/2 second, even at 115 kbaud.
     Using kernel ("hard") PPS can provide much lower jitter than NTP ("soft") PPS, but it requires building a custom kernel with this capability enabled. That in turn requires the kernel to be configured at compile time as non-tickless, since the two configurations are not currently compatible. A standard kernel may also be configured at boot time to run non-tickless by adding 'nohz=off' to the kernel command line arguments. Running non-tickless can help reduce jitter but will (slightly) increase power usage.
     For now I am using NTP PPS and attempting to run non-tickless on the Pine64. This gets good results on its own; I had spent some time dialing in the offset for the PPS, since there is more overhead with NTP discipline than with kernel discipline. The only issue is that I am hitting what appears to be the known timer issue on topic for this thread, so that is where I want to look next. With 4.14 (or earlier) kernels this is not an issue, and the same mitigation appears to work well there, but it does not seem as complete with 4.19 and later.
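The kernel "flag" described above is assumed here to be the STA_UNSYNC bit of the kernel clock status word: while it is clear, the kernel treats the clock as synchronized and periodically copies system time to the RTC (the 11-minute mode), and setting the clock, as 'hwclock -s' does, marks it unsynchronized again. The Python sketch below only decodes a status value you already have (for example as reported by `ntptime`); the helper names are hypothetical, and the bit values are those of the Linux `<linux/timex.h>` header.

```python
from typing import List

# Status bits as defined in the Linux <linux/timex.h> header.
STA_FLAGS = {
    0x0001: "PLL", 0x0002: "PPSFREQ", 0x0004: "PPSTIME", 0x0008: "FLL",
    0x0010: "INS", 0x0020: "DEL", 0x0040: "UNSYNC", 0x0080: "FREQHOLD",
    0x0100: "PPSSIGNAL", 0x0200: "PPSJITTER", 0x0400: "PPSWANDER",
    0x0800: "PPSERROR", 0x1000: "CLOCKERR", 0x2000: "NANO",
    0x4000: "MODE", 0x8000: "CLK",
}
STA_UNSYNC = 0x0040


def decode_status(status: int) -> List[str]:
    """Return the names of the status bits set in `status`, lowest bit first."""
    return [name for bit, name in sorted(STA_FLAGS.items()) if status & bit]


def rtc_sync_active(status: int) -> bool:
    # The periodic system-clock-to-RTC update only happens while the kernel
    # considers the clock synchronized, i.e. while STA_UNSYNC is clear.
    return not (status & STA_UNSYNC)
```

For example, `decode_status(0x2001)` gives `["PLL", "NANO"]` and `rtc_sync_active(0x2001)` is `True`, whereas a status with 0x0040 set would mean the RTC is not being updated.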
  9. If your hardware clock is not being synced from your system clock, then something is broken. Running 'hwclock -s' after systemd-timesyncd or ntpd has started is one of the things that will cause such breakage, but there can be other reasons. An RTC isn't of much value if it isn't kept reasonably accurate. Most RTCs are intended to be periodically updated by the host OS, since they use low-cost, not-very-accurate components and designs. As for using a Pine64 as an NTP server, it actually works very well. I've had this Pine64 doing exactly that with KPPS on 4.14.y and earlier kernels with great success, and with a total power draw of less than 3.5W. Something is not right with 4.19.y and later kernels, and trying to work with NTP is just one use case that exposes it. This is perhaps not a "little issue": again, it is not itself a time/date problem but rather a problem reading a hardware counter (one of many) in the SoC. That counter is used by the kernel for core functionality, of which tracking system time is just one use. If some users are experiencing the issue while others are not with the same kernel, then something about the difference in workloads is exposing an unmitigated issue that can be triggered at any time.
  10. I should probably put in an update. After running stable for over a week, I tried disabling tickless operation (the standard modern default) to reduce jitter:
      user@pine64:~$ egrep nohz /boot/armbianEnv.txt
      extraargs=nohz=off
      user@pine64:~$ cat /proc/cmdline
      root=UUID=c80325d7-1d25-4151-85c3-47343b473fae rootwait rootfstype=ext4 console=ttyS0,115200 console=tty1 panic=10 consoleblank=0 loglevel=7 ubootpart=0f940383-01 usb-storage.quirks=0x2537:0x1066:u,0x2537:0x1068:u nohz=off cgroup_enable=memory swapaccount=1
      Sadly this once again brought about instability. With a non-tickless kernel I start to see rcu_sched self-detected stalls, such as those I noted in the thread "DEV (5.0.y) on PineA64+ seeing occasional CPU stalls". At some point the system clock jumps by 95 years and a reboot ends up being required, potentially a forced one (bad things start to happen).
      Interestingly, with a non-tickless kernel Andy's test_timer shows considerably different results, confirming that there is indeed still a problem despite the workaround being present and active. Unfortunately I likely won't have any time to look at this until after the weekend. I'm still working towards getting the Pine64 back as a stable stratum-1 NTP server with a GPS-based reference clock. With kernel PPS (which requires a non-tickless kernel) and proper offset adjustments, this was working very well on 4.14.y (and earlier).
      Oh, and as to how the RTC got brought into this discussion, though it is not really related: the kernel can keep the RTC in sync with its system time, primarily for use with NTP or another external service that disciplines the system time. When the symptom of the issue here manifests, with the date jumping by a multiple of 95 years, the system time is no longer within the valid range for the RTC, so any further attempts to synchronize the RTC from the system time fail:
      [28410.996865] sun6i-rtc 1f00000.rtc: rtc only supports year in range 1970 - 2033
      The above message only appears in the live kernel message buffer (aka 'dmesg') and on the system console, so it can easily be missed. Failures in setting the RTC (aka "hwclock") are just another indirect symptom of the issue and would be expected.
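The "95 years" figure is consistent with simple arithmetic under two assumptions that are not stated in the posts: the A64 arch timer counts at 24 MHz, and the kernel clocksource for it uses a 56-bit mask. If elapsed ticks are computed modulo that width, a counter read that steps backwards by even one tick wraps to almost 2^56 ticks forward, which is about 95 years at 24 MHz. The Python sketch below is a plausible explanation rather than a confirmed root cause, and it also shows why the RTC update then fails: 2019 plus 95 years is far beyond the 2033 limit in the sun6i-rtc message.

```python
# Assumptions (not from the posts): 24 MHz counter, 56-bit clocksource mask.
COUNTER_HZ = 24_000_000
CLOCKSOURCE_BITS = 56
SECONDS_PER_YEAR = 365.25 * 86400
RTC_MAX_YEAR = 2033  # upper limit from the sun6i-rtc message quoted above


def masked_delta(now: int, last: int, bits: int = CLOCKSOURCE_BITS) -> int:
    """Elapsed ticks computed modulo the clocksource width.

    A read that goes backwards by even one tick wraps to a huge forward delta.
    """
    return (now - last) & ((1 << bits) - 1)


def delta_years(ticks: int, hz: int = COUNTER_HZ) -> float:
    return ticks / hz / SECONDS_PER_YEAR


def jumped_year(year: int, ticks: int) -> int:
    return year + int(delta_years(ticks))


if __name__ == "__main__":
    d = masked_delta(now=999, last=1000)  # counter appears to step back one tick
    print(round(delta_years(d), 1))       # 95.1
    y = jumped_year(2019, d)
    print(y, y > RTC_MAX_YEAR)            # 2114 True
```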
  11. Just for completeness and closure: PR #1329 implemented this for 'sunxi' and 'sunxi64' in both NEXT and DEV. Other platforms already have this feature configured. There should no longer be any need for additional automated or otherwise scripted enhancements to do a separate 'hwclock -s' after boot, as the kernel will do this itself very early, which is where it should be done when possible.
  12. Again, the RTC and the timer in question are two separate, unrelated entities. The issue here is not with the RTC but with a completely different counter in the SoC. The mainline fix for this timer issue was formally added to mainline for the A64 in the 5.1.y branch. It has been backported to the 5.0.y and 4.19.y branches, partly in the upstream mainline fork by megous that Armbian uses and partly in Armbian patches applied on top of that branch. The nightly builds and the packaged kernel updates have had these for a short while. To see whether the workaround has been applied and is active, you can use:
      user@pine64:~$ dmesg | egrep -i errat
      [ 0.000000] arch_timer: Enabling global workaround for Allwinner erratum UNKNOWN1
      The 'UNKNOWN1' erratum is the issue in question. A previous fix had been around for a long time and was part of what eventually became the formal mainline fix; it was in the period between these two that this issue reappeared, but it should be resolved now. Both are separate from enabling the kernel to use the RTC during boot to set the system clock, which involves a different and independent timer. Note that using 'hwclock -s' after another active process (such as ntpd, chrony, or systemd-timesyncd) has initiated synchronization of the RTC from the system clock can cause issues. NTPd can normally recover, but the others perhaps not so much.
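The same check can be done from a script. This Python sketch scans kernel log text for arch_timer workaround lines of the form shown in the `dmesg | egrep -i errat` output and reports whether the UNKNOWN1 workaround is listed. The function names are hypothetical, and the pattern assumes the message format shown above.

```python
import re
from typing import List, Tuple

# Matches lines such as:
#   arch_timer: Enabling global workaround for Allwinner erratum UNKNOWN1
ERRATUM_RE = re.compile(r"arch_timer: Enabling (\w+) workaround for (.+?) erratum (\S+)")


def arch_timer_errata(dmesg_text: str) -> List[Tuple[str, str, str]]:
    """Return (scope, vendor, erratum) for each arch_timer workaround line."""
    return ERRATUM_RE.findall(dmesg_text)


def unknown1_active(dmesg_text: str) -> bool:
    return any(erratum == "UNKNOWN1" for _, _, erratum in arch_timer_errata(dmesg_text))
```

Feed it the text of `dmesg` (reading the kernel log may require root if `kernel.dmesg_restrict` is set) or of a saved log.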
  13. Yes, I do have an RTC backup battery attached, but that really only matters when power is physically removed from the board; with just a 'reboot' the RTC remains powered. This particular timer issue is also not related to the RTC: it is a problem with a hardware timer (really just a counter) that is part of the SoC itself. When the symptoms manifest, the kernel (aka "system") time jumps by a multiple of 95 years, and after that the kernel is no longer able to sync back to the hardware RTC, which is independent.
      I had also submitted PR #1329 following the thread "Kernel config: Change RTC_HCTOSYS as default?", which has since been merged. As a result there should no longer be any need to explicitly use 'hwclock -s' to set the system clock from the RTC. Instead, a message should be seen during boot, before init is even spawned, indicating that the system clock has been set from the RTC:
      user@pine64:~$ dmesg | grep -w rtc
      [ 1.975054] sun6i-rtc 1f00000.rtc: registered as rtc0
      [ 1.975070] sun6i-rtc 1f00000.rtc: RTC enabled
      [ 2.031287] rtc-ldo: supplied by regulator-dummy
      [ 4.079694] sun6i-rtc 1f00000.rtc: setting system clock to 2019-04-19T01:29:14 UTC (1555637354)
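To look for that boot message programmatically, a small Python sketch can search the log and cross-check the printed UTC time against the epoch seconds in parentheses. The helper name is hypothetical, and the regular expression only matches the exact message format in the log above; other kernel versions may print the time differently.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Matches lines such as:
#   sun6i-rtc 1f00000.rtc: setting system clock to 2019-04-19T01:29:14 UTC (1555637354)
HCTOSYS_RE = re.compile(
    r"setting system clock to (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) UTC \((\d+)\)"
)


def hctosys_time(dmesg_text: str) -> Optional[datetime]:
    """Return the time the kernel set from the RTC at boot, or None if absent.

    Raises ValueError if the printed timestamp and the epoch seconds disagree.
    """
    m = HCTOSYS_RE.search(dmesg_text)
    if m is None:
        return None
    stamp = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    if int(stamp.timestamp()) != int(m.group(2)):
        raise ValueError("timestamp and epoch seconds disagree")
    return stamp
```

A `None` result means the kernel did not set the clock from the RTC during boot, which is the situation the RTC_HCTOSYS change is meant to avoid.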
  14. Interestingly, I've never had a successful run with Andre's test_timer. This board too is a 2GB PineA64+ from the initial Kickstarter campaign, running the latest DEV pre-built kernel via 'apt upgrade', with the UNKNOWN1 erratum workaround active. I haven't seen any of the known symptoms of this issue (i.e. the date jumping by a multiple of 95 years) as long as the erratum workaround is active, but test_timer has never fully passed. For now I've chosen to leave this aside, since I'm not seeing any other symptoms.
  15. The PineA64+ is actually based on an Allwinner A64. Yes it sounds like you will want a more recent kernel. This thread may be of some help: A64 date/time clock issue