
windysea

Members
  • Posts

    42

  1. An NTP stratum is related to neither accuracy nor precision. It simply indicates how many "hops" a given NTP server is from a reference clock. Stratum 0 is a reference clock (i.e. an atomic clock, GPS receiver, etc.). Stratum 1 is the NTP server that uses that reference clock directly for time synchronization. An SBC with a serial GPS can indeed be a stratum-1 server (the GPS being stratum 0), and there are many public write-ups on doing this. In fact, the NTPsec team is doing "research" on this topic and has published documentation about it. NTPd, the reference implementation, is designed specifically to maintain accurate time regardless of the quality of the underlying hardware timers. Today's "50-cent" parts are still more stable than the parts, orders of magnitude more expensive, that existed decades ago when NTPd was first developed. Google's NTP project may use its own "atomic clocks", but its public NTP servers tend to be on the poor end with respect to jitter. They are intended to be "close enough", stable, and highly available, not highly accurate. For instance, they implement leap smearing rather than advertising a leap second (when one is due). For this reason Google strongly recommends not mixing its public NTP servers with other NTP sources in the same configuration (bad things can happen, and in fact have happened in the past). Google's NTP servers also sit behind anycast load balancers. This improves availability and simplifies end-device configuration, but it actually degrades performance. In my own testing, Google's NTP servers typically show higher jitter than most of the larger NTP Pool Project pools, which are already commonly used as defaults in many OS distributions. Enabling kernel PPS (aka "hard PPS"), which typically has far less jitter than "soft PPS", requires configuring and building a non-tickless kernel.
However, doing so even with the latest 5.1.y (DEV) kernels results in an unstable platform where the issue discussed in this thread manifests fairly frequently. It may simply be that A64-based SBCs are not suitable for hosting NTP reference clocks and stratum-1 NTP servers. Earlier kernels did not seem to have this issue, though, so a previous mitigation may have been lost along the way.
  2. Beware that 'b' will literally trigger an immediate reboot without even attempting to sync the disks. That can cause corruption, and in some cases is likely to. For the OP's case this would not be an issue, since it looks like the root filesystem is already trashed, though perhaps other filesystems are involved? A sync can, however, be triggered the same way just before the reboot:
     root@host# echo s > /proc/sysrq-trigger
     root@host# echo b > /proc/sysrq-trigger
The 's' is also handy when /bin/sync itself is missing. It helps prevent corruption of any other mounted filesystems when you are faced with a forced power-off or reboot. For the OP: if your disk is already overwritten, then simply removing and re-applying power should not be an issue, unless you don't have physical access. If /bin/systemctl exists (it sounds like it may not here), you can try 'systemctl reboot --force', which is the "modern" systemd replacement for 'reboot -f'. Otherwise the /proc/sysrq-trigger step above is the next option, followed by just pulling the power.
  3. The fixes to date have all been community-provided, based on observations and testing (i.e. not on anything from the SoC vendor). Allwinner has been silent on this specific issue, which isn't really surprising. I did find a separate fix from Allwinner for a different timer issue, which occurs when writing (rather than reading) one of the affected timer registers. That fix is not part of the mainline (or Armbian) kernels and does not immediately appear to be directly related, but it is part of what I hope to look into.
  4. The mainline patch is almost identical to the previous patch used in alternate forks, including those used by Armbian. Apart from the Kconfig symbol and matching macro, the mainline patch makes only one change. It masks one fewer low-order bit when testing for the condition under which the problem has been seen to occur. This actually gives the mainline patch more "coverage" than the previous one. The previous patch, which had been applied to 4.19.y and 5.0.y kernels, behaves similarly but was indeed working with 4.14.y and earlier, so something else may have changed. I have not had the opportunity to look into this further yet. It is also possible that neither the original fix nor the subsequent mainline fix is complete, which is actually noted in the comments of the various commits. Disabling tickless operation, for instance, seems to exacerbate the issue significantly, though on the surface this would not be expected.
  5. You really shouldn't need a cron job to sync the system time back to the hardware clock, since this is already a built-in kernel feature. It just appears that the feature was disabled (broken) by something. When ntpd or systemd-timesyncd starts, it sets a flag in the kernel that enables this feature. ntpd does so after it obtains sync, but systemd-timesyncd does so almost immediately. Running 'hwclock -s' after this will in turn disable the feature again. After some time ntpd may discover this (if the kernel tells it) and try to reset the flag, but systemd-timesyncd will not. When using systemd-timesyncd (the standard out-of-the-box default), you should never use 'hwclock -s'. It only causes bad things to happen. That is why /etc/init.d/hwclock.sh and its many variants explicitly check for a running systemd and exit without taking any action if one is found. Before systemd, that standard script ran before ntpd and all was good. Then systemd came along and caused breakage, so the extra checks were put in. There are many discussions of this elsewhere, so there's no need to go further here. The summary: if you are using systemd, do not run 'hwclock -s'. Ever. When using ntpd (and after systemd-timesyncd has been appropriately disabled or removed), 'hwclock -s' may be run once, but only when you can be sure it happens before ntpd starts. With systemd that isn't as simple as it sounds either, and it is also discussed in many other forums, so no need to rehash it here. I do have a proper implementation for a Raspberry Pi with an I2C-based RTC that requires this. It is doable, but it is a bit more complex than simply putting a one-liner in /etc/rc.local. In short, all current Armbian kernels for platforms with an on-board RTC, including the PineA64(+), should have the proper configuration for the kernel to read the RTC and set the system time during early startup, as is standard.
This removes virtually any need to ever use 'hwclock -s' manually or during boot. Whether you use systemd-timesyncd or ntpd, the system time will already be correct. The appropriate daemon will begin synchronizing the system time from a reliable source, and it will also enable the kernel to periodically update the hardware clock from the system clock on its own (every 11 minutes, IIRC). There should be no need to manually sync time in either direction. I am using pin PH9 for PPS on a Pine64, since it is interrupt-capable (not all pins are):
     user@pine64:~$ egrep pps_pin /boot/armbianEnv.txt
     param_pps_pin=PH9
Using a GPS as a reference clock without PPS is not very useful unless it is the only source of time. There is significant delay that must be accurately compensated for ("fudge time2" with the NMEA GPS driver, refclock type 20). There will also always be quite a bit of jitter, especially if the GPS output isn't reduced to the single desired sentence. I've found the offsets are typically close to 1/2 second, even at 115 kbaud. Kernel ("hard") PPS can provide much lower jitter than NTP ("soft") PPS, but it requires building a custom kernel with this capability enabled. That in turn requires configuring the kernel as non-tickless at compile time, since the two configurations are not currently compatible. A standard kernel can also be made to run non-tickless at boot time by adding 'nohz=off' to the kernel command line. Running non-tickless can help reduce jitter but will (slightly) increase power usage. For now I am using NTP PPS and attempting to run non-tickless on the Pine64. This gets good results on its own. I had spent some time dialing in the PPS offset, since NTP discipline has more overhead than kernel discipline. The only problem is that I am hitting what appears to be the known timer issue this thread is about, so that is where I want to look next.
With 4.14 (or earlier) kernels this is not an issue, and the same mitigation appears to work well there. It does not seem to be as complete with 4.19 and later.
  6. If your hardware clock is not being synced from your system clock, then something is broken. Running 'hwclock -s' after systemd-timesyncd or ntpd has started is one thing that causes such breakage, but there can be other reasons. An RTC isn't of much value if it isn't kept reasonably accurate. Most RTCs are intended to be periodically updated by the host OS, since they use low-cost components and designs that are not very accurate. As for using a Pine64 as an NTP server, it actually works very well. I've had this Pine64 doing exactly that with KPPS on 4.14.y and earlier kernels with great success, at a total power draw of less than 3.5 W. Something is not right with 4.19.y and later kernels, and running NTP is just one use case that exposes it. This is perhaps not a "little issue". Again, it is not itself a time/date problem. It is a problem with reading a hardware counter, one of many in the SoC. The kernel uses that counter for core functionality, and tracking system time is just one of those uses. Some users may experience the issue while others do not, on the same kernel. In that case some difference in their workloads is exposing an unmitigated issue that can be triggered at any time.
  7. I should probably post an update. . . . After running stable for over a week, I tried disabling tickless operation (the standard modern default) to reduce jitter:
     user@pine64:~$ egrep nohz /boot/armbianEnv.txt
     extraargs=nohz=off
     user@pine64:~$ cat /proc/cmdline
     root=UUID=c80325d7-1d25-4151-85c3-47343b473fae rootwait rootfstype=ext4 console=ttyS0,115200 console=tty1 panic=10 consoleblank=0 loglevel=7 ubootpart=0f940383-01 usb-storage.quirks=0x2537:0x1066:u,0x2537:0x1068:u nohz=off cgroup_enable=memory swapaccount=1
Sadly, this once again brought about instability. With a non-tickless kernel I start to see rcu_sched self-detected stalls, like those I noted in "DEV (5.0.y) on PineA64+ seeing occasional CPU stalls". At some point the system clock jumps by 95 years, and a reboot ends up being required, potentially a forced one (bad things start to happen). Interestingly, with a non-tickless kernel Andre's test_timer shows considerably different results. This confirms there is indeed still a problem, even though the workaround is present and active. Unfortunately I likely won't have any time to look at this until after the weekend. I'm still working towards getting the Pine64 back as a stable stratum-1 NTP server with a GPS-based reference clock. With kernel PPS (which requires a non-tickless kernel) and proper offset adjustments, this was working very well on 4.14.y (and earlier). Oh, and regarding how the RTC got brought into this discussion, though it is not really related: the kernel can keep the RTC in sync with its system time. This is primarily for use with NTP or another external service that disciplines the system time.
When this issue's symptom appears and the date jumps by a multiple of 95 years, the system time is no longer within the RTC's valid range. Any further attempt to synchronize the RTC from the system time therefore fails:
     [28410.996865] sun6i-rtc 1f00000.rtc: rtc only supports year in range 1970 - 2033
The above message appears only in the live kernel message buffer (aka 'dmesg') and on the system console, so it is easily missed. Failures in setting the RTC (aka "hwclock") are just another indirect symptom of the issue and are to be expected.
  8. Just for completeness and closure: PR #1329 implemented this for 'sunxi' and 'sunxi64' in both NEXT and DEV. Other platforms already had this feature configured. There should no longer be any need for extra automated or scripted steps that run a separate 'hwclock -s' after boot. The kernel now does this itself very early, which is where it should be done when possible.
  9. Again, the RTC and the timer in question are two separate, unrelated entities. The issue here is not with the RTC but with a completely different counter in the SoC. The mainline fix for this timer issue on the A64 was formally added in the 5.1.y branch. It has been backported to the 5.0.y and 4.19.y branches. The pieces are split between megous's upstream mainline fork, which Armbian uses, and the Armbian patches applied on top of that branch. The nightly builds and the packaged kernel updates have had these for a short while. To check whether the workaround has been applied and is active, you can use:
     user@pine64:~$ dmesg | egrep -i errat
     [    0.000000] arch_timer: Enabling global workaround for Allwinner erratum UNKNOWN1
The 'UNKNOWN1' erratum is the issue in question. A previous fix had been around for a long time and became part of what eventually turned into the formal mainline fix. The issue reappeared in the period between these two fixes, but it should be resolved now. Both are separate from enabling the kernel to use the RTC during boot to set the system clock, since the RTC is a different and independent timer. Note that running 'hwclock -s' can cause issues if another active process (such as ntpd, chrony, or systemd-timesyncd) has already enabled synchronization of the RTC from the system clock. NTPd can normally recover, but the others perhaps not so much.
  10. Yes, I do have an RTC backup battery attached, but that only matters when power is physically removed from the board. Across a plain 'reboot' the RTC remains powered. This particular timer issue is also unrelated to the RTC. It is a problem with a hardware timer (really just a counter) that is part of the SoC itself. When the symptoms manifest, the kernel (aka "system") time jumps by a multiple of 95 years. After that, the kernel can no longer sync back to the hardware RTC, which is independent. I had also submitted PR #1329 following "Kernel config: Change RTC_HCTOSYS as default?", which has since been merged. As a result there should no longer be any need to explicitly use 'hwclock -s' to set the system clock from the RTC. Instead, a message should appear during boot, before init is even spawned, indicating that the system clock has been set from the RTC:
     user@pine64:~$ dmesg | grep -w rtc
     [    1.975054] sun6i-rtc 1f00000.rtc: registered as rtc0
     [    1.975070] sun6i-rtc 1f00000.rtc: RTC enabled
     [    2.031287] rtc-ldo: supplied by regulator-dummy
     [    4.079694] sun6i-rtc 1f00000.rtc: setting system clock to 2019-04-19T01:29:14 UTC (1555637354)
  11. Interestingly, I've never had a fully successful run with Andre's test_timer. This board, too, is a 2GB PineA64+ from the initial Kickstarter campaign. It runs the latest DEV pre-built kernel (via 'apt upgrade') with the UNKNOWN1 erratum workaround active. As long as the erratum workaround is active, I haven't seen any of the known symptoms of this issue (i.e. the date jumping by a multiple of 95 years). Even so, test_timer has never fully passed. For now I've chosen to set this aside, since I'm not seeing any other symptoms.
  12. On a Pine A64+ I've actually started to see frequent 'rcu_sched self-detected stall on CPU' messages again. This started with recent builds after the switch to the new toolchain, with no other changes. It affects both the Armbian nightlies and my own custom builds. I haven't tried going back to the previous toolchain yet, and I haven't seen any problems with either the system clock or the hardware clock. I have noticed that the workarounds for two other (unrelated) A53 errata, 843419 and 845719, are no longer activated on boot. Both are still configured and built into the kernel. Again, the only difference appears to be the toolchain used to build the kernel.
  13. Currently, all of the Armbian kernel defconfigs build the kernel configuration statically into the kernel itself. The standard Armbian build process also installs a copy of the .config file as /boot/config-<version>-<platform>. Are both required? Or should the kernel configurations be updated to build "Kernel .config support" as a loadable module (LKM) instead of building it in? Admittedly this is a minor item. The embedded configuration takes up very little memory. Still, many SBCs have limited memory to start with, and every byte the kernel doesn't use can help. This would amount to changing from:
     CONFIG_IKCONFIG=y
to:
     CONFIG_IKCONFIG=m
About the only thing that might "break" would be tools that run 'extract-ikconfig' or similar against the on-disk kernel binary to extract the built-in .config. However, the kernel configuration is already included in the linux-image-*.deb and installed under /boot as a separate text file, so this should not be needed. If this sounds reasonable, I can submit a PR for DEV covering all of the existing defconfigs in config/kernel. First, though, I wanted feedback on whether this would be useful or should be left as-is.
  14. In your dmesg output (or /var/log/messages), do you see any lines similar to:
     arch_timer: Enabling global workaround for Allwinner erratum UNKNOWN1
If not, you may want to update to the latest nightly build. A commit removed the activation of this fix, and the issue returned in the following nightly build. That has since been corrected, and the latest DEV build once again includes the mitigation for this issue.
  15. I presume this is not a brand-new install, correct? How are you able to edit /etc/shadow? If you are already logged in as root, you can just use 'passwd' to change the password. When running as root you do not need to provide the existing password. If you can log in as a user with 'sudo' access, you can use 'sudo -i' to get a root shell and then follow the step above. If you have mounted your SD card on another Linux instance, you should be able to use:
     passwd -R <path_to_mount_point> root
where <path_to_mount_point> is where your SD card is mounted (not the full path to the passwd/shadow files).
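Item 1 mentions that Google's public servers smear leap seconds instead of announcing them. Here is a minimal sketch illustrating why smeared and unsmeared sources disagree. It assumes a linear smear spread evenly over a 24-hour window. That window is my understanding of Google's published scheme, not something stated in this thread, and the helper names are hypothetical.

```python
# Sketch of a linear leap smear. Assumption: the extra leap second is spread
# evenly over a 24-hour window; smear_fraction() and smear_rate_ppm() are
# hypothetical helpers used only for illustration.
SMEAR_WINDOW_S = 86_400  # assumed 24-hour smear window


def smear_fraction(seconds_into_window: float, window_s: float = SMEAR_WINDOW_S) -> float:
    """Fraction of the leap second absorbed so far by a linear smear."""
    # Clamp so times before/after the window report 0.0 or 1.0.
    return min(max(seconds_into_window / window_s, 0.0), 1.0)


def smear_rate_ppm(window_s: float = SMEAR_WINDOW_S) -> float:
    """Frequency offset, in parts per million, that a smearing server applies."""
    return 1e6 / window_s


if __name__ == "__main__":
    # Halfway through the window, half of the leap second has been absorbed.
    print(smear_fraction(43_200))      # 0.5
    print(round(smear_rate_ppm(), 2))  # 11.57
```

Under this assumption, a client that mixes a smearing server with unsmeared sources sees them drift apart at roughly 11.6 ppm for the whole window. That is the kind of disagreement behind the advice not to mix them.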
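Item 2's sync-then-reboot sequence can also be driven from a script, for example when Python is still available but /bin/sync is not. This is a minimal sketch that assumes root privileges. The helper names, the injectable writer, and the two-second settle delay are illustrative choices of mine, not anything from the thread. The emergency sync requested by 's' is carried out asynchronously by the kernel, hence the pause before 'b'.

```python
import time
from typing import Callable

SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def write_sysrq(key: str, path: str = SYSRQ_TRIGGER) -> None:
    """Write a single SysRq command character to the trigger file (needs root)."""
    with open(path, "w") as trigger:
        trigger.write(key)


def sync_then_reboot(
    write: Callable[[str], None] = write_sysrq,
    settle_seconds: float = 2.0,  # assumed delay; not a documented kernel value
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Request an emergency sync, wait briefly, then force an immediate reboot."""
    write("s")             # 's': emergency sync of all mounted filesystems
    sleep(settle_seconds)  # the sync runs asynchronously; give it time to finish
    write("b")             # 'b': reboot immediately, without unmounting or syncing
```

A dry run with a recording writer, such as `sync_then_reboot(write=print, settle_seconds=0)`, shows the order of operations without touching the real trigger file.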
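Item 4 says that masking one fewer low-order bit gives the mainline check more "coverage". Here is a Python model of that kind of test. As I understand the A64 workaround, a counter read whose low bits are all ones or all zeros is treated as suspect and read again. Several details are assumptions for illustration and are not copied from either patch: the 10-bit mask for mainline, the 11-bit mask for the older patch, the retry limit, and the function names.

```python
from typing import Callable


def is_suspect(value: int, low_bits: int) -> bool:
    """True when the low `low_bits` bits of a counter read are all ones or all zeros."""
    mask = (1 << low_bits) - 1
    # Adding 1 maps all-ones to 0 and all-zeros to 1, so one comparison covers both.
    return ((value + 1) & mask) <= 1


def read_counter(read: Callable[[], int], low_bits: int = 10, retries: int = 150) -> int:
    """Re-read the counter while the value looks suspect, up to `retries` extra reads."""
    value = read()
    while is_suspect(value, low_bits) and retries > 0:
        value = read()
        retries -= 1
    return value


def count_suspect(low_bits: int, span: int = 4096) -> int:
    """How many counter values in [0, span) a given mask width would reject."""
    return sum(is_suspect(v, low_bits) for v in range(span))


if __name__ == "__main__":
    # Masking one fewer bit rejects twice as many values.
    print(count_suspect(10), count_suspect(11))  # 8 4
```

The 11-bit condition implies the 10-bit one. So every value the wider mask rejects is also rejected by the narrower mask, which is why the narrower variant is, if anything, more conservative.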
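Item 5 recommends reducing the GPS output to a single sentence. Part of the reason is plain serialization time: each extra sentence sent ahead of the one the driver uses adds delay that varies with the sentence contents. Here is a minimal sketch of that arithmetic. It assumes 8N1 framing (10 bits on the wire per character) and CR/LF-terminated sentences, and the helper name is hypothetical.

```python
def nmea_wire_time(sentence: str, baud: int, bits_per_char: int = 10) -> float:
    """Seconds needed to transmit one NMEA sentence plus its CR/LF terminator.

    bits_per_char=10 assumes 8N1 framing: 1 start bit, 8 data bits, 1 stop bit.
    """
    chars = len(sentence) + 2  # +2 for the trailing "\r\n"
    return chars * bits_per_char / baud


if __name__ == "__main__":
    # An 80-character line (78 visible characters plus CR/LF) at common baud rates.
    line = "$" + "X" * 77  # placeholder payload, not a real NMEA sentence
    for baud in (4800, 9600, 115200):
        print(baud, f"{nmea_wire_time(line, baud) * 1000:.2f} ms")
```

At 115200 baud a full line takes only about 7 ms. Serialization alone therefore accounts for a small fraction of the roughly half-second offset reported in item 5, and the remainder has to be calibrated with 'fudge time2'.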
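Items 7 and 10 both note that the clock jumps by a multiple of 95 years. Here is one plausible explanation, assuming the A64's architected timer runs at 24 MHz and is handled as a 56-bit clocksource (both are assumptions on my part). If a read comes back slightly lower than the previous one, the kernel's masked subtraction sees an almost complete wrap of the counter. The sketch below does that arithmetic and checks the result against the 1970 - 2033 RTC range from the sun6i-rtc message in item 7.

```python
COUNTER_BITS = 56          # assumed clocksource mask width
COUNTER_HZ = 24_000_000    # assumed A64 architected-timer frequency
MASK = (1 << COUNTER_BITS) - 1
SECONDS_PER_YEAR = 365.25 * 86_400


def elapsed_ticks(now: int, last: int) -> int:
    """Masked delta, as a clocksource-style `(now - last) & mask` computes it."""
    return (now - last) & MASK


def jump_years(now: int, last: int) -> float:
    """Apparent elapsed time, in years, between two counter reads."""
    return elapsed_ticks(now, last) / COUNTER_HZ / SECONDS_PER_YEAR


def rtc_accepts(year: int, first: int = 1970, last: int = 2033) -> bool:
    """Range from the sun6i-rtc message: 'rtc only supports year in range 1970 - 2033'."""
    return first <= year <= last


if __name__ == "__main__":
    previous = 0x123457FF
    glitched = 0x12345000  # low bits read as zero, so the counter appears to go backwards
    years = jump_years(glitched, previous)
    print(f"{years:.2f} years")            # 95.14 years
    print(rtc_accepts(2019 + int(years)))  # False: 2114 is outside 1970-2033
```

Under these assumptions, each such backward step would add another ~95-year jump. That would be consistent with the observed jumps by a *multiple* of 95 years.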
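The check in items 9 and 14 can be scripted, for example as part of a post-upgrade sanity test. This is a minimal sketch that assumes `dmesg` is available and readable by the current user (some systems restrict it to root). The helper names are hypothetical; only the message text is taken from the output quoted in item 9.

```python
import re
import subprocess
from typing import List

ERRATUM_RE = re.compile(r"Enabling global workaround for Allwinner erratum (\S+)")


def active_allwinner_errata(kernel_log: str) -> List[str]:
    """Return the erratum names mentioned in arch_timer workaround messages."""
    return ERRATUM_RE.findall(kernel_log)


def read_kernel_log() -> str:
    """Capture the kernel ring buffer via dmesg (may require root)."""
    return subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
```

On a patched kernel, `"UNKNOWN1" in active_allwinner_errata(read_kernel_log())` should be true. If it is false, that matches the situation in item 14, where updating to the latest nightly build is the suggested fix.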
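The boot message in item 10 records the same instant twice: once as an ISO timestamp and once as a Unix epoch value. That makes it easy to sanity-check that the kernel set the system clock from the RTC. Here is a minimal sketch. The regular expression is written against the single sample line quoted in item 10 and may not match other kernels or RTC drivers.

```python
import re
from datetime import datetime, timezone
from typing import Optional, Tuple

HCTOSYS_RE = re.compile(
    r"setting system clock to (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) UTC \((\d+)\)"
)


def parse_hctosys(line: str) -> Optional[Tuple[datetime, int]]:
    """Extract (UTC datetime, epoch seconds) from an RTC 'setting system clock' line."""
    match = HCTOSYS_RE.search(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return stamp, int(match.group(2))


def is_consistent(line: str) -> bool:
    """True when the ISO timestamp and the epoch value describe the same instant."""
    parsed = parse_hctosys(line)
    return parsed is not None and int(parsed[0].timestamp()) == parsed[1]


sample = ("[    4.079694] sun6i-rtc 1f00000.rtc: setting system clock to "
          "2019-04-19T01:29:14 UTC (1555637354)")
print(is_consistent(sample))  # True
```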
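Item 13 notes that the configuration is also installed as /boot/config-<version>-<platform>. Tools can therefore fall back to that file when the in-kernel copy is unavailable. As I understand it, with CONFIG_IKCONFIG=m, /proc/config.gz appears only after the `configs` module is loaded, and only if CONFIG_IKCONFIG_PROC is set. Here is a minimal sketch. It assumes the /boot file name matches `uname -r`, as it does with Armbian's `<version>-<platform>` naming, and the helper names are hypothetical.

```python
import gzip
import os
import platform
from typing import Dict, Optional


def parse_kconfig(text: str) -> Dict[str, str]:
    """Map CONFIG_* symbols to their raw values ('y', 'm', quoted strings, or 'n' if unset)."""
    options: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def load_running_config(proc_path: str = "/proc/config.gz",
                        boot_dir: str = "/boot") -> Optional[Dict[str, str]]:
    """Prefer the in-kernel copy; fall back to /boot/config-$(uname -r)."""
    if os.path.exists(proc_path):
        with gzip.open(proc_path, "rt") as f:
            return parse_kconfig(f.read())
    boot_path = os.path.join(boot_dir, f"config-{platform.release()}")
    if os.path.exists(boot_path):
        with open(boot_path) as f:
            return parse_kconfig(f.read())
    return None
```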