windysea's Content - Armbian Community Forums

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

An NTP stratum is not related to accuracy or precision; it simply indicates how many "hops" a given NTP server is from a reference clock. Stratum 0 is a reference clock (i.e., an atomic clock, GPS receiver, etc.). Stratum 1 is an NTP server directly using that reference clock for time synchronization. An SBC with a serial GPS can indeed be a stratum-1 server (the GPS being stratum 0), and there are many public write-ups on doing this. In fact the NTPsec team is doing "research" on this topic and has published documentation on it.
The reference implementation of NTPd is specifically designed to maintain accurate time regardless of the quality of the hardware timers. Today's "50-cent" parts are still more stable than parts that cost orders of magnitude more decades ago, when NTPd was first developed.
Google's NTP project may use their own "atomic clocks", but their public NTP servers tend to be on the poor end with respect to jitter. They are intended to be "close enough", stable, and highly available, not highly accurate. Their public NTP servers, for instance, implement leap smearing rather than advertising a leap second (when appropriate). For this reason Google strongly recommends against mixing their public NTP servers with other NTP sources in one configuration (bad things can happen, and in fact have happened in the past). Google's NTP servers are also behind anycast load balancers. While this improves availability and simplifies end-device configuration, it actually degrades performance. In my own testing Google's NTP servers typically have higher jitter than most of the larger NTP Pool Project pools, which are already commonly used as defaults in many OS distributions.
Configuring and building a non-tickless kernel is required in order to enable kernel PPS (aka "hard PPS"), which typically has far less jitter than "soft PPS". However, doing so even with the latest 5.1.y (DEV) kernels results in an unstable platform where the issue noted in this thread manifests fairly frequently. It may be that A64-based SBCs are not suitable for hosting NTP reference clocks and stratum-1 NTP servers. Earlier kernels did not seem to have this issue, though, so a previous mitigation may simply have been lost along the way.
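Here is a minimal Python sketch that makes the hop-counting concrete. stratum_of is a hypothetical helper. The cap at 16 follows the NTPv4 convention that stratum 16 means "unsynchronized".

```python
from __future__ import annotations
from typing import Optional

UNSYNCHRONIZED = 16  # NTPv4: stratum 16 means "not synchronized"

def stratum_of(upstream_stratum: Optional[int]) -> int:
    """Stratum a server advertises, given the stratum of the source it follows.

    A reference clock (GPS receiver, atomic clock, ...) is stratum 0, a server
    disciplined directly by it is stratum 1, its clients are stratum 2, and so
    on. None means there is no usable upstream source at all.
    """
    if upstream_stratum is None:
        return UNSYNCHRONIZED
    return min(upstream_stratum + 1, UNSYNCHRONIZED)

gps = 0                          # serial GPS receiver: the reference clock
pine64 = stratum_of(gps)         # SBC reading the GPS directly: stratum 1
lan_client = stratum_of(pine64)  # machine syncing from the SBC: stratum 2
```

Nothing in the calculation involves jitter or offset, which is why stratum alone says nothing about accuracy.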

Reset board immediatly!!

windysea replied to Vindieselwalker's topic in Off-topic

Beware that 'b' will trigger an immediate reboot without even attempting to sync the disks. That can, and in some cases likely will, cause filesystem corruption. For the OP's case this would not be an issue, since it looks like the root filesystem is already trashed, though perhaps other filesystems are mounted? A sync may be triggered the same way just before the reboot:
root@host# echo s > /proc/sysrq-trigger
root@host# echo b > /proc/sysrq-trigger
The 's' can also be handy when /bin/sync is missing. It helps prevent corruption on any additional mounted filesystems when you are faced with a forced poweroff/reboot.
For the OP: if your disk is already overwritten, then simply removing and re-applying power should not be an issue, unless you don't have physical access. If /bin/systemctl exists (it sounds like it may not here), you can try 'systemctl reboot --force', which is the "modern" systemd replacement for 'reboot -f'. Otherwise the /proc/sysrq-trigger steps above would be the next option, followed by just pulling the power.
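Here is the same two-step sequence as a small Python sketch, for anyone scripting recovery on a box they can't reach physically. sysrq and sync_then_reboot are hypothetical names. The two-second pause is an arbitrary choice: the emergency sync is scheduled asynchronously by the kernel, so the pause improves the odds but guarantees nothing. If run for real as root, it will reboot the machine.

```python
import time

SYSRQ_TRIGGER = "/proc/sysrq-trigger"

def sysrq(cmd: str, trigger: str = SYSRQ_TRIGGER) -> None:
    """Send one SysRq command character through the trigger file (root only)."""
    if len(cmd) != 1:
        raise ValueError("SysRq commands are single characters")
    with open(trigger, "w") as f:
        f.write(cmd)

def sync_then_reboot(write=sysrq, settle_seconds: float = 2.0) -> None:
    """Emergency-sync all mounted filesystems, then reboot immediately."""
    write("s")                  # queue an emergency sync of all mounted filesystems
    time.sleep(settle_seconds)  # give the queued sync a chance to run
    write("b")                  # reboot now: no unmount, no further sync
```

The write parameter exists so the sequence can be exercised without touching /proc, for example sync_then_reboot(write=print, settle_seconds=0).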

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

The fixes to date have all been community-provided, based on observations and testing (i.e., not based on anything from the SoC vendor). Allwinner has been silent on this specific issue, which actually isn't really surprising. I did find a different fix from Allwinner for a different timer issue, which occurs when trying to write (rather than read) one of the affected timer registers. That fix is not part of the mainline (or Armbian) kernels and does not immediately appear to be directly related. Still, it is part of what I hope to look into.

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

The mainline patch is almost identical to the previous patch used in alternate forks, including those used by Armbian. Apart from the Kconfig symbol and matching macro, the mainline patch has only one change: one less low-order bit is masked during the test for the condition in which the problem has been seen to occur. That actually gives the mainline patch more "coverage" than the previous one. The previous patch, as applied to 4.19.y and 5.0.y kernels, behaves similarly, yet it did work with 4.14.y and earlier, so something else may have changed. I have not had the opportunity to look into this further just yet. It is also possible that neither the original fix nor the subsequent mainline fix is complete, which is actually noted in the comments for the various commits. Disabling tickless mode, for instance, seems to exacerbate the issue significantly, though on the surface this would not be expected.
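Here is a minimal Python sketch of the detection test, to illustrate why masking fewer bits means more coverage. It assumes the test has the "(value + 1) & mask <= 1" shape, which flags reads whose low bits are all 0s or all 1s. The bit widths are illustrative only and are not taken from either patch.

```python
def looks_unstable(value: int, low_bits: int) -> bool:
    """True if the low `low_bits` bits of a counter read are all 0s or all 1s.

    Adding 1 turns an all-1s pattern into all 0s (masked result 0) and an
    all-0s pattern into ...0001 (masked result 1), so "<= 1" catches both.
    """
    mask = (1 << low_bits) - 1
    return ((value + 1) & mask) <= 1

def coverage(low_bits: int) -> float:
    """Fraction of possible low-bit patterns the test flags: 2 out of 2**low_bits."""
    return 2 / (1 << low_bits)

# Masking one bit fewer flags twice as many reads, i.e. more "coverage".
for bits in (11, 10, 9):
    print(bits, coverage(bits))
```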

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

You really shouldn't need a cron job to sync the system time back to the hardware clock, since this is already a built-in kernel feature. It just appears that feature was disabled (broken) by something. When ntpd or systemd-timesyncd starts, it sets a flag in the kernel to enable this feature. ntpd does so after it obtains sync, but systemd-timesyncd does so almost immediately. Running 'hwclock -s' after this will in turn disable the feature again. After some time ntpd may discover this (if the kernel tells it) and try to reset the flag, but systemd-timesyncd will not.
When using systemd-timesyncd (the standard out-of-the-box default), you should never use 'hwclock -s'. It only causes bad things to happen. That is why /etc/init.d/hwclock.sh and its many incarnations explicitly check for a running systemd and exit without taking any action if they find one. Before systemd, that standard script would run before ntpd and all was good. Then systemd caused breakage, so the extra checks were put in. There are many discussions of this elsewhere, so there's no need to discuss it further here. The summary: if you are using systemd, do not run 'hwclock -s'. Ever.
When using ntpd (after systemd-timesyncd has been appropriately disabled/removed), 'hwclock -s' may be run once, but only when it can be guaranteed to happen before ntpd is started. That isn't as simple as it sounds with systemd either, and it is also discussed in many other forums, so there's no need to rehash it here. I do have a proper implementation for a Raspberry Pi with an I2C-based RTC that requires this. It is doable, but it is a bit more complex than simply putting a one-liner in /etc/rc.local.
In short, all current Armbian kernels for platforms with an on-board RTC, including the PineA64(+), should have the proper configuration for the kernel to read the RTC and set the system time during its early startup, as is standard. This removes virtually any need to ever use 'hwclock -s', either manually or during boot. Whether you use systemd-timesyncd or ntpd, the system time will already be correct. The appropriate daemon will then begin synchronizing the system time from a reliable source, and it will also enable the kernel to periodically update the hardware clock from the system clock on its own (every 11 minutes IIRC). There should be no need to manually sync time in either direction.
I am using pin PH9 for PPS on a Pine64, since it is interrupt-capable (not all pins are):
user@pine64:~$ egrep pps_pin /boot/armbianEnv.txt
param_pps_pin=PH9
Using a GPS as a reference clock without PPS is not very useful unless it is the only source of time. There is significant delay that must be accurately compensated for ("fudge time2" with the type 20 NMEA GPS driver). There will also always be quite a bit of jitter, especially if the GPS output isn't reduced to the single sentence desired. I've found that the offsets are typically close to 1/2 second, even at 115200 baud.
Using kernel ("hard") PPS can provide much lower jitter than NTP ("soft") PPS, but it requires a custom kernel built with this capability enabled. That in turn requires the kernel to be configured at compile time as non-tickless, since the two configurations are not currently compatible. A standard kernel may also be configured at boot time to run non-tickless by adding 'nohz=off' to the kernel command line. Running non-tickless can help reduce jitter but will (slightly) increase power usage.
For now I am using NTP PPS and attempting to run non-tickless on the Pine64. This gets good results on its own. I had spent some time dialing in the offset for the PPS, since there is more overhead with NTP discipline than with kernel discipline. The only issue is that I am hitting what appears to be the known timer issue this thread is about, so that is where I want to look next.
With 4.14 (or earlier) kernels this is not an issue. The same mitigation appears to work well there but does not seem as complete with 4.19 and later.
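For anyone scripting this, here is a hedged Python sketch that reads /boot/armbianEnv.txt as simple key=value lines. It reports the PPS pin and whether nohz=off was requested via extraargs. The key names come from the outputs in this thread. The helper names are hypothetical, and treating '#' lines as comments is an assumption.

```python
from __future__ import annotations

def parse_armbian_env(text: str) -> dict[str, str]:
    """Parse the key=value lines of /boot/armbianEnv.txt into a dict."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        env[key.strip()] = value.strip()
    return env

def pps_summary(env: dict[str, str]) -> dict[str, object]:
    """Report the PPS pin and whether nohz=off is passed via extraargs."""
    extra = env.get("extraargs", "").split()
    return {"pps_pin": env.get("param_pps_pin"), "nohz_off": "nohz=off" in extra}

def read_armbian_env(path: str = "/boot/armbianEnv.txt") -> dict[str, str]:
    with open(path) as f:
        return parse_armbian_env(f.read())
```

Splitting only on the first '=' matters here, since values such as "nohz=off" contain '=' themselves.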

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

If your hardware clock is not being synced from your system clock, then something is broken. Running 'hwclock -s' after systemd-timesyncd or ntpd has started is one thing that will cause such breakage, but there can be other reasons. An RTC isn't of much value if it isn't kept reasonably accurate. Most RTCs are intended to be periodically updated by the host OS, as they are built from low-cost, not very accurate components and designs.
As for using a Pine64 as an NTP server: it actually works very well. I've had this Pine64 doing exactly that, with KPPS, on 4.14.y and earlier kernels with great success, at a total power draw of less than 3.5W. Something with 4.19.y and later kernels is not right, and working with NTP is just one use case that exposes it.
This is perhaps not a "little issue". Again, it is not itself a time/date problem but rather a problem when reading a hardware counter (one of many) in the SoC. That counter is used by the kernel for core functionality, of which tracking system time is just one use. If some people experience the issue while others don't, on the same kernel, then something about the difference in workloads is exposing an unmitigated issue that can be triggered at any time.
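One way to see whether the kernel currently considers its clock synchronized is to read the STA_UNSYNC status bit via adjtimex(2) with modes set to 0, which only queries. A synchronized clock is the condition under which the kernel keeps writing the system time back to the RTC. Below is a minimal Python/ctypes sketch. It assumes glibc on 64-bit Linux (the structure below mirrors glibc's struct timex for that ABI) and a kernel with RTC_SYSTOHC enabled. The helper names are hypothetical.

```python
from __future__ import annotations
import ctypes
import ctypes.util
import os

STA_UNSYNC = 0x0040  # <sys/timex.h>: clock is not synchronized

class Timeval(ctypes.Structure):
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_usec", ctypes.c_long)]

class Timex(ctypes.Structure):
    # Mirrors glibc's struct timex on 64-bit Linux (assumption: LP64 ABI).
    _fields_ = [
        ("modes", ctypes.c_uint),
        ("offset", ctypes.c_long), ("freq", ctypes.c_long),
        ("maxerror", ctypes.c_long), ("esterror", ctypes.c_long),
        ("status", ctypes.c_int),
        ("constant", ctypes.c_long), ("precision", ctypes.c_long),
        ("tolerance", ctypes.c_long),
        ("time", Timeval),
        ("tick", ctypes.c_long), ("ppsfreq", ctypes.c_long),
        ("jitter", ctypes.c_long),
        ("shift", ctypes.c_int),
        ("stabil", ctypes.c_long), ("jitcnt", ctypes.c_long),
        ("calcnt", ctypes.c_long), ("errcnt", ctypes.c_long),
        ("stbcnt", ctypes.c_long),
        ("tai", ctypes.c_int),
        ("_reserved", ctypes.c_int * 11),  # padding reserved by the ABI
    ]

def kernel_considers_synced(status: int) -> bool:
    """True when STA_UNSYNC is clear.

    Assumption: the kernel's periodic system-to-RTC update (RTC_SYSTOHC, the
    "11 minute mode") is skipped while STA_UNSYNC is set.
    """
    return not (status & STA_UNSYNC)

def read_timex() -> Timex:
    """Query the kernel clock state; modes == 0 makes adjtimex() read-only."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    tx = Timex()  # ctypes zero-initializes, so tx.modes == 0
    if libc.adjtimex(ctypes.byref(tx)) == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return tx
```

On the board, if kernel_considers_synced(read_timex().status) returns False after 'hwclock -s' has been run, that would be consistent with the breakage described above.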

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

I should probably put in an update...
After running stable for over a week, I tried disabling tickless mode (the standard modern default) to reduce jitter:
user@pine64:~$ egrep nohz /boot/armbianEnv.txt
extraargs=nohz=off
user@pine64:~$ cat /proc/cmdline
root=UUID=c80325d7-1d25-4151-85c3-47343b473fae rootwait rootfstype=ext4 console=ttyS0,115200 console=tty1 panic=10 consoleblank=0 loglevel=7 ubootpart=0f940383-01 usb-storage.quirks=0x2537:0x1066:u,0x2537:0x1068:u nohz=off cgroup_enable=memory swapaccount=1
Sadly, this once again brought about instability. With a non-tickless kernel I start to see rcu_sched self-detected stalls like those I noted in "DEV (5.0.y) on PineA64+ seeing occasional CPU stalls". At some point the system clock jumps by 95 years, and a reboot ends up being required, potentially forcibly (bad things start to happen).
Interestingly, with a non-tickless kernel Andre's test_timer shows considerably different results. This confirms there is indeed still a problem despite the workaround being present and active.
Unfortunately I likely won't have any time to look at this until after the weekend. I'm still working towards getting the Pine64 back as a stable stratum-1 NTP server with a GPS-based reference clock. With kernel PPS (which requires a non-tickless kernel) and proper offset adjustments, this was working very well on 4.14.y (and earlier).
Oh, and on how the RTC got brought into this discussion, though it is not really related: the kernel can keep the RTC in sync with its system time, primarily for use with NTP or another external service that disciplines the system time. When the symptom of the issue here manifests, with the date jumping by a multiple of 95 years, the system time is no longer within the valid range for the RTC. Any further attempts to synchronize the RTC from the system time therefore fail:
[28410.996865] sun6i-rtc 1f00000.rtc: rtc only supports year in range 1970 - 2033
The above message only appears in the live kernel message buffer (aka 'dmesg') and on the system console, so it can easily be missed. Failures in setting the RTC (aka "hwclock") are just another indirect symptom of the issue and would be expected.
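The same check can be done against the running kernel rather than the config file. Below is a minimal Python sketch, with hypothetical helper names, that splits /proc/cmdline into parameters. It ignores quoting, which the kernel does allow but which does not appear in the command line above. When a parameter repeats, this sketch simply keeps the last occurrence.

```python
from __future__ import annotations

def parse_cmdline(cmdline: str) -> dict[str, str | None]:
    """Split a kernel command line into {param: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")  # 'root=UUID=...' keeps 'UUID=...'
        params[key] = value if sep else None
    return params

def running_non_tickless(cmdline: str) -> bool:
    """True if the command line asks for periodic ticks via nohz=off."""
    return parse_cmdline(cmdline).get("nohz") == "off"

def read_cmdline(path: str = "/proc/cmdline") -> str:
    with open(path) as f:
        return f.read()
```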

Research Kernel config: Change RTC_HCTOSYS as default?

windysea replied to windysea's topic in Advanced users - Development

Just for completeness and closure: PR #1329 implemented this for 'sunxi' and 'sunxi64' in both NEXT and DEV. Other platforms already have this feature configured. There should no longer be any need for additional automated or scripted enhancements to run a separate 'hwclock -s' after boot. The kernel now does this itself very early, which is where it should be done when possible.

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

Again, the RTC and the timer in question are two separate, unrelated entities. The issue here is not with the RTC but with a completely different counter in the SoC.
The mainline fix for the A64 timer issue was formally added to mainline in the 5.1.y branch. It has been pulled back to the 5.0.y and 4.19.y branches in various parts, split between the upstream mainline fork by megous used by Armbian and the Armbian patches applied on top of that branch. The nightly builds and the packaged kernel updates have had these for a short while. To see whether the workaround has been applied and is active, you can use:
user@pine64:~$ dmesg | egrep -i errat
[ 0.000000] arch_timer: Enabling global workaround for Allwinner erratum UNKNOWN1
The 'UNKNOWN1' erratum is the issue in question. A previous fix had been around for a long time and was part of what eventually became the formal mainline fix. It was in the period between these two that this issue reappeared, but it should be resolved now.
All of this is separate from enabling the kernel to use the RTC during boot to set the system clock, which involves a different and independent timer. Note that running 'hwclock -s' after another active process (such as ntpd, chrony, or systemd-timesyncd) has initiated synchronization of the RTC from the system clock can cause issues. NTPd can normally recover, but the others perhaps not so much.
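To check several boards (or several errata) at once, here is a small Python sketch that searches kernel log text for erratum identifiers. The exact wording of errata messages varies between kernel versions, so it deliberately matches only the identifiers (UNKNOWN1 here, or numbers such as 843419 and 845719). The function names are hypothetical. Reading dmesg may require root when kernel.dmesg_restrict is set.

```python
from __future__ import annotations
import subprocess

def errata_mentioned(log: str, identifiers: list[str]) -> dict[str, list[str]]:
    """Map each erratum identifier to the log lines mentioning it (case-insensitive)."""
    found = {ident: [] for ident in identifiers}
    for line in log.splitlines():
        lower = line.lower()
        for ident in identifiers:
            if ident.lower() in lower:
                found[ident].append(line.strip())
    return found

def read_dmesg() -> str:
    """Return the kernel message buffer (may need root if dmesg_restrict=1)."""
    return subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
```

An empty list for an identifier means only that the log does not mention it. That may be because the workaround is inactive, or because the buffer has already wrapped.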

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

Yes, I do have an RTC backup battery attached, but that really only matters when power is physically removed from the board. With just a 'reboot', the RTC remains powered. This particular timer issue is also not related to the RTC: it is a problem with a hardware timer (really just a counter) that is part of the SoC itself. When the symptoms manifest, the kernel (aka "system") time jumps by a multiple of 95 years. After that, the kernel is no longer able to sync back to the hardware RTC, which is independent.
I had also submitted PR #1329 following "Kernel config: Change RTC_HCTOSYS as default?", which has since been merged. As a result, there should no longer be any need to explicitly use 'hwclock -s' to set the system clock from the RTC. Instead, a message should appear during boot, before init is even spawned, indicating that the system clock has been set from the RTC:
user@pine64:~$ dmesg | grep -w rtc
[ 1.975054] sun6i-rtc 1f00000.rtc: registered as rtc0
[ 1.975070] sun6i-rtc 1f00000.rtc: RTC enabled
[ 2.031287] rtc-ldo: supplied by regulator-dummy
[ 4.079694] sun6i-rtc 1f00000.rtc: setting system clock to 2019-04-19T01:29:14 UTC (1555637354)
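Here is a hedged Python sketch that pulls the HCTOSYS message out of kernel log text and cross-checks the two representations it carries (ISO time and seconds since the epoch). The regular expression is based only on the line shown above. It also accepts a space separator, on the assumption that older kernels format the date that way. hctosys_from_dmesg is a hypothetical name.

```python
from __future__ import annotations
import re
from datetime import datetime, timezone

HCTOSYS_RE = re.compile(
    r"setting system clock to (\d{4}-\d{2}-\d{2})[T ](\d{2}:\d{2}:\d{2}) UTC \((\d+)\)"
)

def hctosys_from_dmesg(log: str):
    """Return (datetime, epoch) from the kernel's RTC_HCTOSYS message, or None."""
    m = HCTOSYS_RE.search(log)
    if m is None:
        return None  # kernel did not set the clock from the RTC (or log wrapped)
    when = datetime.strptime(f"{m.group(1)} {m.group(2)}", "%Y-%m-%d %H:%M:%S")
    when = when.replace(tzinfo=timezone.utc)
    epoch = int(m.group(3))
    # Both forms in the message should describe the same instant.
    if int(when.timestamp()) != epoch:
        raise ValueError(f"inconsistent HCTOSYS message: {m.group(0)}")
    return when, epoch
```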

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

Interestingly, I've never had a successful run with Andre's test_timer. This board is also a 2GB PineA64+ from the initial Kickstarter campaign. It runs the latest DEV pre-built kernel via 'apt upgrade', with the UNKNOWN1 erratum active. I haven't seen any of the known symptoms of this issue (i.e., a date jump by a multiple of 95 years) as long as the erratum workaround is active, but test_timer has never fully passed. For now I've chosen to leave this aside, since I'm not seeing any other symptoms.

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

On a Pine A64+ I've actually started to see frequent 'rcu_sched self-detected stall on CPU' messages again with recent builds since the switch to the new toolchain, without any other changes. This includes both the Armbian nightlies and my own custom builds. I haven't tried going back to the previous toolchain yet, nor have I seen any problems with either the system clock or the hardware clock. I have noticed that two other (unrelated) Cortex-A53 errata, 843419 and 845719, are no longer activated on boot, even though both are still configured and built into the kernel. Again, the only difference appears to be the toolchain used to build the kernel.

rfc Kernel .config support as LKM instead of static bind?

windysea posted a topic in Advanced users - Development

Currently all of the Armbian kernel defconfigs statically include the actual build configuration as part of the kernel. The standard Armbian build process also installs a copy of the .config file as /boot/config-<version>-<platform>. Are both required, or should the kernel configurations be updated to build Kernel .config support as a loadable module (LKM) instead of statically?
True, this is a minor item. The kernel configuration takes up a very small amount of memory, though on many SBCs with limited memory to start with, every byte not used by the kernel can help. This would amount to changing from:
CONFIG_IKCONFIG=y
to:
CONFIG_IKCONFIG=m
About the only thing that might "break" would be tools that use 'extract-ikconfig' or similar against the on-disk binary to extract the built-in .config. The kernel configuration is already included in the linux-image-*.deb and installed under /boot as a separate text file, though, so this should not be needed.
I can submit a PR for -DEV covering all of the existing defconfigs in config/kernel if this sounds reasonable. First, though, I wanted feedback on whether this would be useful or should be left as-is.
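Tools that currently rely on the in-kernel copy would need a fallback. Here is a hedged Python sketch of one. It uses /proc/config.gz when present, which requires CONFIG_IKCONFIG_PROC (and, with CONFIG_IKCONFIG=m, the 'configs' module loaded first). Otherwise it reads the /boot/config-<release> copy. It assumes the /boot file name matches 'uname -r', which is how the Armbian packages appear to name it. load_kernel_config is a hypothetical name.

```python
from __future__ import annotations
import gzip
import os

def load_kernel_config(proc_path: str = "/proc/config.gz", boot_dir: str = "/boot") -> str:
    """Return the running kernel's .config text.

    Prefer the in-kernel copy exposed at /proc/config.gz; fall back to the
    text copy installed by the kernel package as <boot_dir>/config-<release>.
    """
    if os.path.exists(proc_path):
        with gzip.open(proc_path, "rt") as f:
            return f.read()
    release = os.uname().release  # e.g. the same string 'uname -r' prints
    with open(os.path.join(boot_dir, f"config-{release}")) as f:
        return f.read()
```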

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

In your dmesg output (or /var/log/messages), do you see any lines similar to:
arch_timer: Enabling global workaround for Allwinner erratum UNKNOWN1
If not, you may want to update to the latest nightly build. A commit removed the activation of this fix, and the following nightly build had this issue return. That has since been corrected, and the latest DEV build once again has the mitigation for this issue.

Don't know the root password!!!

windysea replied to Vindieselwalker's topic in Beginners

I presume this is not a brand-new install, correct? How are you able to edit /etc/shadow? If you are already logged in as root, you can just use 'passwd' to change the password. When running as root you do not need to provide the existing password.
If you can log in as a user with 'sudo' access, you can use 'sudo -i' to gain a root shell and then follow the step above.
If you have mounted your SD card on another Linux instance, you should be able to use:
passwd -R <path_to_mount_point> root
where <path_to_mount_point> is where you have your SD card mounted (not the full path to the passwd/shadow files).

Research Kernel config: Change RTC_HCTOSYS as default?

windysea replied to windysea's topic in Advanced users - Development

Reviewing the defconfigs currently provided, it looks like most boards already have CONFIG_RTC_HCTOSYS=y along with the matching CONFIG_RTC_HCTOSYS_DEVICE="rtc0". So I think it should be fine to submit a PR against sunxi64-dev (used for Pine64 DEV). However, I noticed that pine64-default actually has CONFIG_RTC_HCTOSYS_DEVICE="rtc0" set but does not have CONFIG_RTC_HCTOSYS set. That hints that it had been enabled and was later disabled, since the former doesn't get defined until the latter is set to 'y':
linux-pine64-default.config:# CONFIG_RTC_HCTOSYS is not set
linux-pine64-default.config:CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
linux-sunxi64-dev.config:# CONFIG_RTC_HCTOSYS is not set
linux-sunxi64-next.config:# CONFIG_RTC_HCTOSYS is not set
Before I submit a PR to change this defconfig, is there a particular reason this should not be done? At worst, if the RTC cannot be read, the kernel should simply log a message and carry on as if this option weren't enabled. This is the only set of defconfigs (default/dev/next) that I found like this, which is why I'm curious.
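The grep above can be turned into a slightly more structured check. Here is a hedged Python sketch that parses defconfig text (both "CONFIG_X=value" and "# CONFIG_X is not set" lines) and flags the combination found in pine64-default. The helper names and the returned strings are hypothetical.

```python
from __future__ import annotations
import re

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")

def parse_kconfig(text: str) -> dict[str, str | None]:
    """Map CONFIG_* symbols to their value; explicitly unset symbols map to None."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            cfg[m.group(1)] = m.group(2).strip('"')  # drop quotes around strings
            continue
        m = _UNSET.match(line)
        if m:
            cfg[m.group(1)] = None
    return cfg

def hctosys_status(text: str) -> str:
    """Summarize RTC_HCTOSYS, flagging a device that is set while the option is off."""
    cfg = parse_kconfig(text)
    device = cfg.get("CONFIG_RTC_HCTOSYS_DEVICE")
    if cfg.get("CONFIG_RTC_HCTOSYS") == "y":
        return f"enabled ({device or 'no device set'})"
    if device:
        return f"disabled, but device {device} still set"
    return "disabled"
```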

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

Yes, that is the commit I referenced above that is now part of mainline. There are copies of it all over, but you can also find it in Linus' own repository as well as on the kernel.org repository. It has also been merged into the megous repo that is the source for the Armbian builds.
This has been running on my own DEV kernel build for over 15 hours and has been solid so far. With an unpatched DEV kernel, I had a date jump pretty consistently in less than four hours. With the NEXT kernel that has the patch for A64_UNSTABLE_TIMER, the occurrences varied from a few hours to a few days. In other words, this is the same known issue, with a more complete mitigation now applied to the mainline kernel upstream.
[Edit]: I've been running a 5.0.4 DEV kernel with the upstream fix for a bit over a day with no issues so far, but that is hardly definitive. The upstream solution adds logging if a stable timer read could not be obtained within 150 attempts. That is its upper bound, which is also described as about 3 cycles/counter increments. Hopefully any failure here will therefore be apparent in the kernel logs. I've also submitted a PR for the defconfig update for DEV kernels, which currently don't have the Armbian '0081-arm64-arch_timer-Workaround-for-Allwinner-A64-timer-.patch' applied (by default) anyway.

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

I noticed a new fix that appears to have just been backported from 5.1.y to 5.0.y. It replaces the fix enabled by the Kconfig option 'SUN50I_A64_UNSTABLE_TIMER'. The replacement fix extends the check by an additional bit, adds additional coverage, and adds a mitigation for another register that was seen to be affected. The last item had been mentioned in the SUN50I_A64_UNSTABLE_TIMER fix but was not addressed. The new Kconfig item from upstream is 'CONFIG_SUN50I_ERRATUM_UNKNOWN1'.
Not particularly important, but the SUN50I_A64_UNSTABLE_TIMER fix had been dropped in a recent commit that hadn't made it back to 5.0.y yet. That fix was still present in the kernels used in the Armbian builds, so that wouldn't have been the cause. I've built a new DEV (5.0.4) kernel with this latest update and will let it run for a while to see whether the issue is mitigated a little better.

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

I haven't had a chance to try that tool yet, but it looks like it is indeed intended to check for the instability in question here. This issue appears to be the one that should be resolved by the "fix" enabled with the earlier-referenced Kconfig item. Because of the instability, reads from the SoC's hardware architecture timer occasionally have some bits erroneously set for that one cycle, but always in an identifiable pattern. As a result, the bad value can be (mis)interpreted as a jump of approximately a multiple of 95 years, specifically. The mitigation applied is to re-read the timer in consecutive cycles if needed, since the problem is apparently transient for only a single cycle, and to return only what should be a good value.
This has been occurring for me every few days, and the clock has always jumped forward by 95 years. I had never seen the issue with a 4.14.y kernel, nor with a 4.9.y kernel. I'm intrigued and wish I had more time to spend, but I'm sticking with it to see if I can find what is happening. I noticed very recent commits related to handling the specific registers that are the culprits, CNTVCT and CNTPCT. I want to take a quick look at those to see whether they could be related to, and hopefully address, this particular issue.
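Here is a minimal Python simulation of the mitigation described above, for illustration only. It keeps re-reading until the low bits are no longer all 0s or all 1s, bounded at 150 attempts as in the upstream fix mentioned in this thread, and logs if the bound is hit. read_counter stands in for reading CNTVCT/CNTPCT. The 9-bit default width is an illustrative assumption, not a value taken from the patch.

```python
from __future__ import annotations
import logging

def stable_read(read_counter, low_bits: int = 9, retries: int = 150) -> int:
    """Re-read a counter until its low bits are not all 0s or all 1s.

    read_counter: zero-argument callable returning the raw counter value
    (a stand-in for the hardware register read).
    """
    mask = (1 << low_bits) - 1
    value = read_counter()
    attempts = 1
    # (value + 1) & mask is 0 for all-1s low bits and 1 for all-0s low bits.
    while ((value + 1) & mask) <= 1 and attempts < retries:
        value = read_counter()
        attempts += 1
    if ((value + 1) & mask) <= 1:
        logging.warning("no stable counter read after %d attempts", attempts)
    return value
```

A legitimate counter value can also end in all 0s or all 1s, so the loop does not reject such a value forever. It simply waits for the counter to advance past that window, which is why a small bound of a few counter increments is enough.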

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

Because the "jump" is by 95 years, it strongly suggests the known hardware issue. It is a random occurrence with the SoC. The existing kernel patches attempt to detect and avoid this bug by doing multiple reads (only a single read would be "bad" with this bug, but that's all it takes).
There isn't any way to confirm that this has truly been "fixed". When it occurs, it is a jump (forward or backward) by 95 years, which is immediately apparent: the date will suddenly be in the year 2019 + 95 = 2114, currently. Otherwise there is no way to predict when, or if, this might occur.
The existing fixes worked with previous kernels and do seem to be present here. It will now just take some deeper review to identify why they appear to no longer be effective. The actual commits have some decent descriptions of the problem and the fixes put in place. The new question I have is why the system clock can't later be corrected, if this is indeed a one-time random bad read as noted in the bug.
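Where could the oddly specific 95 years come from? Here is a back-of-the-envelope Python sketch. It assumes, without this being established in the thread, a 24 MHz counter and a kernel that computes elapsed ticks modulo 2^56. Under those assumptions, a read that lands just one tick behind the previous one wraps around into a forward jump of roughly 2^56 ticks.

```python
from __future__ import annotations

COUNTER_HZ = 24_000_000           # assumed A64 arch timer frequency
COUNTER_BITS = 56                 # assumed width of the clocksource mask
SECONDS_PER_YEAR = 365 * 24 * 3600

def apparent_jump_years(old: int, new: int) -> float:
    """Years of elapsed time the kernel would see if it computed (new - old) mod 2**56."""
    delta_ticks = (new - old) & ((1 << COUNTER_BITS) - 1)
    return delta_ticks / COUNTER_HZ / SECONDS_PER_YEAR

# A read that lands one tick *behind* the previous one:
print(round(apparent_jump_years(old=1_000_000, new=999_999), 1))  # 95.2
```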

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

Earlier today ntpd again became unable to set the clock. The system time does seem to have suddenly jumped:
root@pine64:/sys/kernel/debug# date
Thu May 17 16:29:30 EDT 2114
However, the RTC (hwclock) appears to be OK:
root@pine64:/# cat /sys/class/rtc/rtc0/{date,time}
2019-03-25
22:23:10
root@pine64:/proc/sys# hwclock --get
2019-03-25 18:23:12.526484-0400
Something is attempting to set the RTC, but apparently with a bad value:
[157688.666308] sun6i-rtc 1f00000.rtc: rtc only supports year in range 1970 - 2033
Attempts to set the system clock fail pretty consistently via either settimeofday() or clock_settime():
root@pine64:/proc/sys# strace date 032518322019 2>&1 | egrep sett
clock_settime(CLOCK_REALTIME, {tv_sec=1553553120, tv_nsec=0}) = -1 EINVAL (Invalid argument)
settimeofday({tv_sec=1553553120, tv_usec=0}, NULL) = -1 EINVAL (Invalid argument)
root@pine64:/proc/sys# strace hwclock --hctosys 2>&1 | egrep sett
settimeofday({tv_sec=1553553315, tv_usec=0}, {tz_minuteswest=240, tz_dsttime=0}) = -1 EINVAL (Invalid argument)
write(2, "settimeofday() failed", 21settimeofday() failed) = 21
tv_sec looks good, so the issue appears to be with the kernel. According to the man page, EINVAL from clock_settime() indicates the specified clock is not supported. That's not right, so some source perusal would be needed for more detail. According to the man page, EINVAL from settimeofday() indicates the timezone (or something else) is invalid. TZ should be NULL in modern implementations, and it is valid here, so this must be "something else". Not very helpful.
Setting the RTC appears to succeed (confirmed via strace):
root@pine64:/proc# hwclock --set --date="$(hwclock)"
root@pine64:/proc# echo $?
0
The next steps will need more debugging enabled in the kernel. I've confirmed that the previously applicable kernel configs and errata have all been applied. Perhaps the previous timer-related fixes have become less effective with the latest kernels?
Edit: I had forgotten to note that the "jump" is indeed 95 years, which suggests the A64 arch_timer bug is the cause:
root@pine64:/# cat /sys/class/rtc/rtc0/since_epoch && date +%s
1553558900
4556039097
root@pine64:/# expr 4556039097 - 1553558900
3002480197
root@pine64:/# expr 3002480197 / 31536000
95
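Here is the same comparison as a small Python sketch, using the same 365-day divisor as the expr commands. The helper names are hypothetical. Taking the absolute difference is an addition, so that a backward jump is reported too.

```python
from __future__ import annotations
import time

SECONDS_PER_YEAR = 31_536_000  # 365 days, the divisor used with expr above

def rtc_epoch(path: str = "/sys/class/rtc/rtc0/since_epoch") -> int:
    """Seconds since the epoch as kept by the RTC."""
    with open(path) as f:
        return int(f.read().strip())

def whole_years_apart(rtc_seconds: int, system_seconds: int) -> int:
    """Whole 365-day years between RTC and system time, in either direction."""
    return abs(system_seconds - rtc_seconds) // SECONDS_PER_YEAR

def current_gap_years() -> int:
    """Compare the live RTC with the live system clock."""
    return whole_years_apart(rtc_epoch(), int(time.time()))
```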

DEV (5.0.y) on PineA64+ seeing occasional CPU stalls

windysea posted a topic in Advanced users - Development

I have been running a "stock" DEV kernel (5.0.y) on my PineA64+ while waiting for the date-setting issue previously noted by others to occur. "Stock" here means built with 'compile.sh' using BRANCH="dev" and the PineA64 defconfig (BOARD="pine64" with no config changes).
A date/time setting problem happened a little while ago. As I was about to start figuring out where the kernel went wrong, I got a process crash with an apparent CPU stall error:
[154619.384626] rcu: INFO: rcu_sched self-detected stall on CPU
[154619.390319] rcu: 2-...0: (1 GPs behind) idle=78a/1/0x4000000000000002 softirq=1062220/1062259 fqs=1174
[154619.399880] rcu: (t=266523 jiffies g=2266349 q=1232)
[154619.405109] Task dump for CPU 2:
[154619.408421] systemd R running task 0 1 0 0x00000002
[154619.415553] Call trace:
[154619.418099] dump_backtrace+0x0/0x1b0
[154619.421849] show_stack+0x14/0x20
[154619.425252] sched_show_task+0x154/0x188
[154619.429259] dump_cpu_task+0x40/0x50
[154619.432919] rcu_dump_cpu_stacks+0xc4/0x104
[154619.437186] rcu_check_callbacks+0x694/0x760
[154619.441539] update_process_times+0x2c/0x58
[154619.445807] tick_sched_handle.isra.5+0x30/0x48
[154619.450420] tick_sched_timer+0x48/0x98
[154619.454339] __hrtimer_run_queues+0xe4/0x1f0
[154619.458695] hrtimer_interrupt+0xf4/0x2b0
[154619.462793] arch_timer_handler_phys+0x28/0x40
[154619.467322] handle_percpu_devid_irq+0x80/0x138
[154619.471936] generic_handle_irq+0x24/0x38
[154619.476029] __handle_domain_irq+0x5c/0xb0
[154619.480208] gic_handle_irq+0x58/0xa8
[154619.483954] el0_irq_naked+0x4c/0x54
I've been seeing these occasionally for random processes, but mostly with 'swapper'. This one is particularly problematic, since 'systemd' itself has crashed. Without systemd there is no equivalent 'init', and restarting a crashed systemd isn't always the best idea... but that's a different problem.
I'm going to leave the above aside for now. I will see if I can get the entire stack dump logged persistently (syslog to persistent storage) to look for any patterns, in case anyone else is seeing these. Enabling RCU tracing may be helpful, but that'll be for another time. The 'rcu' messages themselves are currently logged, but the call trace is not (something else to look into). In the meantime I'll dig into the date/time setting issue, which I'll document in a separate thread once I have more.
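Once the full traces are logged persistently, looking for patterns could start with something as simple as counting which task was running each time a stall was reported. Here is a hedged Python sketch of that. It assumes the log layout shown above: a stall line, then 'Task dump for CPU N:', then a line starting with the task name. stalled_tasks is a hypothetical name.

```python
from __future__ import annotations
import re
from collections import Counter

STALL_RE = re.compile(r"rcu_sched self-detected stall on CPU")
TASK_DUMP_RE = re.compile(r"Task dump for CPU (\d+):")
TIMESTAMP_RE = re.compile(r"^\s*\[[^\]]*\]\s*")

def stalled_tasks(log: str) -> Counter:
    """Count, per task name, the CPU stalls reported in kernel log text."""
    counts = Counter()
    lines = log.splitlines()
    in_stall = False
    for i, line in enumerate(lines):
        if STALL_RE.search(line):
            in_stall = True  # a stall report starts here
        elif in_stall and TASK_DUMP_RE.search(line) and i + 1 < len(lines):
            # The next line starts with the task name once the timestamp is removed.
            fields = TIMESTAMP_RE.sub("", lines[i + 1]).split()
            if fields:
                counts[fields[0]] += 1
            in_stall = False
    return counts
```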

March 25, 2019

Research Kernel config: Change RTC_HCTOSYS as default?

windysea posted a topic in Advanced users - Development

Some SBCs, such as the PineA64, provide an on-board RTC while others do not. For boards that do have one, it appears the kernel RTC driver is configured as a built-in rather than as a module, at least in some cases (I only checked a few). This in turn raises a question. For boards that include an on-board RTC and have the respective kernel driver configured as a built-in, should the kernel option to set the system clock from that RTC (RTC_HCTOSYS) also be enabled by default?
There is a dependency: the RTC driver must be built in, because the kernel reads the RTC rather early, before any loadable modules would have been loaded. That could preclude this from being a common default, but it doesn't necessarily need to. Where this can be done, it has the advantage of not requiring separate user-space accommodations.
For systemd-based distributions, there are sadly a very large number of discussions and proposals on how to set the system time from the RTC at boot, but also sadly no standard or common solution. /lib/udev/hwclock-set explicitly exits without setting the system clock if systemd is found. systemd will not read the RTC itself, since it expects to use a network time source (NTP) instead. Having the kernel set the system time from the RTC would happen a bit earlier than any userland option. That would give consistent and correct timestamps on files created/modified during startup, as well as in various logs. It may be neither a perfect nor a complete solution, but where possible, does this seem like something that could/should be done?

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

I'm seeing other issues, such as not being able to use a non-tickless kernel. If anything, a constant-rate (non-tickless) kernel should be more stable, but after 4.14.y it is wildly unstable.

A64 date/time clock issue

windysea replied to martinayotte's topic in Advanced users - Development

Thanks. This looks to be a busy weekend for me, but I would like to start digging into these more deeply. There do seem to be timer issues after 4.14.y. I haven't yet determined whether that could be due to missing kernel configs or to underlying kernel code changes. The former shouldn't be too hard to find (just tedious), but the latter might be more involved.

Everything posted by windysea
