windysea

November 28, 2019

I haven't had much time to follow-up on this.

FWIW the upstream patches should have included an improved version of the patch above already. To clarify it is not a fix but rather a loop to read the hardware timer repeatedly until there are two consecutive reads that are the same. It does add a different amount of overhead than the fix it is replacing. For instance there must always be at least two reads.
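The retry loop described above can be sketched in shell against simulated reads. This is only an illustration of the idea: `stable_read` is a hypothetical name, the input values are made up, and the real workaround lives in the kernel's C timer code rather than in anything like this.

```shell
# stable_read: hypothetical helper that consumes simulated counter reads
# (one per line on stdin) and prints the first value seen twice in a row.
# Returns non-zero if the input ends before two consecutive reads match.
stable_read() {
    prev=""
    while IFS= read -r cur; do
        if [ -n "$prev" ] && [ "$cur" = "$prev" ]; then
            printf '%s\n' "$cur"
            return 0
        fi
        prev=$cur
    done
    return 1
}

# The glitched read (355) is skipped because no later read repeats it.
# At least two reads are always consumed before a value is accepted.
printf '%s\n' 100 355 101 101 | stable_read
```

A loop of this shape accepts whatever value repeats, so it cannot tell a good value from a bad value that happens to be read twice in a row.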

I do wonder whether the solution above should be replaced, since it uses only a nine-bit mask and it was later found that the mask should probably be 11 bits.

I forget the reason that solution was not used as-is upstream, but I do remember there was concern that two reads close enough together could return the same value before the hardware timer actually changed (IE: the same bad value could be read twice consecutively).

Now that we're a few versions newer than the last time I checked it probably warrants another review. FWIW I haven't seen the issue with the more recent kernels while previously I had one A64 that couldn't get through a half day.

Advanced users - Development · June 11, 2019

On 5/29/2019 at 12:37 AM, sfx2000 said:

To get really stable local time - one is not going to get stratum 0 on an SBC, period, even with GPS and PPS... most of these cheap SBCs use an XO with 30 to 50 PPM accuracy (this is a 50 cent chip) - this is good enough for ethernet and WiFi - I'm looking at a part that is 50 ppb, which would normally be an OCXO (spendy), but this part is a MEMS device, and so far the data I've seen looks pretty good over time.

An ntp stratum is not related to accuracy nor precision - it is simply an indication of how many "hops" a given NTP server is from a reference clock. A stratum-0 is a reference clock (IE: atomic clock, GPS receiver, etc). A stratum-1 is the NTP server directly using that reference clock for time synchronization. An SBC with a serial GPS indeed can be a stratum-1 (the GPS would be stratum-0), and there are many public postings on doing this. In fact the NTPsec team is doing "research" on this topic and has published documentation regarding this.

The reference implementation of NTPd is designed specifically to maintain accurate time regardless of the quality of the hardware timers. Today's "50-cent" parts are still more stable than the parts, orders of magnitude more expensive, that were available decades ago when NTPd was first developed.

Google's NTP project may use their own "atomic clocks", but their public NTP servers tend to be on the poor end with respect to jitter. They're intended to be "close-enough", stable, and highly-available. They are not intended to be highly accurate. Their public NTP servers, for instance, implement leap-smearing rather than advertise a leap-second (when appropriate). For this reason Google strongly recommends not mixing their public NTP servers in a configuration with other NTP sources (bad things can happen, and in fact have happened in the past). Google's NTP servers also are behind anycast load-balancers. While this improves availability and end-device configuration simplicity it actually degrades performance.

In my own testing Google's NTP servers typically have higher jitter than most of the larger NTP pool project pools, the latter of which are already commonly used as defaults in many OS distributions.

Configuring and building a non-tickless kernel is required in order to enable kernel-pps (aka "hard pps"), which typically has far less jitter than "soft pps". However, doing so even with the latest 5.1.y (DEV) kernels results in an unstable platform where the issue noted in this thread will manifest fairly frequently. It may just be that A64-based SBCs are not suitable to host NTP reference clocks and stratum-1 NTP servers but earlier kernels did not seem to have this issue so it may just be that a previous mitigation got lost along the way.

June 10, 2019

6 hours ago, pkral78 said:

It seems that the upstream kernel (5.2rc4 at this point) already has the fix from Samuel Holland.

https://github.com/torvalds/linux/commit/c950ca8c35eeb32224a63adc47e12f9e226da241#diff-27e9d89068f8125140ea388216ae3140

Yes that fix should already be part of the existing kernels, however there are actually several timer-related issues on the A64 and it is not clear that they are all addressed even upstream.

Even with all of the fixes that have been published I still see the issue recur occasionally. There is a thread in the Development section where this is (or was) being discussed.

June 5, 2019

Thanks!

I removed the patch from my local 'userpatches' and built a new -dev kernel using the default kernel config (no changes) and no local patches or changes. The kernel booted with expected network connectivity and all looks good.

I'll verify tomorrow with the nightly builds as well.

June 5, 2019

@martinayotte - I was researching this one since I experienced the same when updating from 5.0.y to 5.1.y.

It looks like you had found similar and posted a query/comment to the upstream repo at https://github.com/megous/linux/commit/c797ad894874de33b2be4beeb1f5a1e6339a888d#comments

I had found the same as you - the drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c driver was missing a change made specifically in megous' branch for 5.0 but not carried to 5.1.

Did you intend to include this as a local patch or should I submit a pull request for a new patch to do this in the armbian tree?

It looks like 0003-net-stmmac-dwmac-sun8i-support-RGMII-modes-with-PHY-.patch from patch/kernel/sunxi-next should do it. . .

May 1, 2019

Beware that 'b' will literally trigger an immediate reboot without even attempting to sync the disks. That can, and in some cases likely will, cause corruption. For the OP's case this would not be an issue since it looks like the root filesystem is already trashed, though perhaps there may have been others?

A sync may be triggered similarly, however, just before the reboot:

root@host# echo s > /proc/sysrq-trigger
root@host# echo b > /proc/sysrq-trigger

The 's' can also be handy when /bin/sync is also missing, to help prevent corruption on additional filesystems that may be mounted when faced with forced poweroff/reboot.

For the OP: If your disk is already overwritten then simply removing and re-applying power should not be an issue, unless you don't have physical access.

If /bin/systemctl exists (it sounds like it may not here), you can try 'systemctl reboot --force'. That is the "modern" systemd replacement for 'reboot -f'

Otherwise the step noted above using /proc/sysrq-trigger would be the next option, followed by just pulling the power.

Advanced users - Development · May 1, 2019

The fixes to date have all been community-provided based on observations and testing (IE: not based on anything from the SoC vendor). Allwinner has been silent on this specific issue, which actually isn't really surprising.

I did find a fix from Allwinner for a different timer issue, which occurs when writing (rather than reading) one of the affected timer registers. That fix is not part of the mainline (or armbian) kernels. It does not immediately appear to be directly related, but it is part of what I hope to be able to look into.

Advanced users - Development · May 1, 2019

2 hours ago, sundbry said:

Unfortunately, the current mainline fix seems to be unreliable. I am running a fleet of these things (with a kernel built from source). The UNKNOWN1 patch is enabled and loaded, but this is the second time I have seen this happen on this kernel. It happens on different devices too - the second time was a different one than the first time I saw it. I don't know if the patch was just a statistical cross-your-fingers patch, I'm going to review it now, but that's what it seems like.

The mainline patch is almost identical to the previous patch used in alternate forks, including those used by Armbian. The mainline patch has only one change (other than the Kconfig symbol and matching macro): one less low-order bit is masked during the test for the condition when the problem has been seen to occur, which actually makes the mainline patch provide more "coverage" than the previous.

The previous patch that had been applied to 4.19.y and 5.0.y kernels has similar behavior but was indeed working with 4.14.y and earlier, so there may be something else that has changed. I have not had the opportunity to continue looking into this just yet.

It is also possible that the original fix and the subsequent mainline fix were not complete, which is actually noted in the comments for the various commits. Disabling a tickless kernel, for instance, seems to exacerbate the issue significantly though on the surface this would not be expected.

Advanced users - Development · April 20, 2019

2 hours ago, Petee said:

created a once a day cron job to sync the system time to the hardware time.

[ - snip - ]

Understood. Is your PPS off now with the newer kernels? What pin on the Pine64 are you using for PPS?

You really shouldn't need to set a cron job to sync the system time back to the hardware clock since this is already a builtin kernel feature. It just appears that feature was disabled (broken) by something.

user@pine64:~$ sudo hwclock --compare
hw-time      system-time         freq-offset-ppm   tick
1555786702   1555786700.608212
1555786712   1555786710.609443               123      1
1555786722   1555786720.610197                99      1
1555786732   1555786730.610937                91      1
1555786742   1555786740.611641                86      1
1555786752   1555786750.612358                83      1
1555786762   1555786760.613084                81      1
^C

user@pine64:~$ sudo hwclock && date "+%F %T.%N"
2019-04-20 15:04:32.793358-0400
2019-04-20 15:04:32.615001945
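The freq-offset-ppm column above can be reproduced from the first two columns. The sketch below assumes the figure is the system clock's gain relative to the RTC, measured from the first sample; `freq_offset_ppm` is a hypothetical helper name, and whether hwclock computes it exactly this way internally is an assumption, although the numbers agree with the listing.

```shell
# freq_offset_ppm: hypothetical helper. Reads "hw-time system-time" pairs
# (one per line) and, for every line after the first, prints how fast the
# system clock runs relative to the RTC since the first sample, in ppm.
freq_offset_ppm() {
    awk 'NR == 1 { hw0 = $1; sys0 = $2; next }
         { printf "%.0f\n", (($2 - sys0) - ($1 - hw0)) / ($1 - hw0) * 1e6 }'
}

# First three samples from the listing above; prints 123 then 99.
printf '%s\n' \
    '1555786702 1555786700.608212' \
    '1555786712 1555786710.609443' \
    '1555786722 1555786720.610197' | freq_offset_ppm
```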

When ntpd or systemd-timesyncd start they will set a flag in the kernel to enable this feature: ntpd will do so after it obtains sync but systemd-timesyncd does so almost immediately. Running 'hwclock -s' after this will in turn disable this feature again. After some time ntpd may discover this (if the kernel tells it) and try to reset the flag, but systemd-timesyncd will not.

When using systemd-timesyncd (the standard default out-of-the-box) you should never use hwclock -s. It only causes bad things to happen, and is why /etc/init.d/hwclock.sh and its many incantations explicitly check for running systemd and will exit without taking any action if so. Before systemd that standard script would run before ntpd and all was good. Then came systemd and that caused breakage so the extra checks were put in. There are many discussions on this elsewhere so there's no need to discuss further here: The summary is if you are using systemd do not run 'hwclock -s'. Ever.

When using ntpd (and after systemd-timesyncd has been appropriately disabled/removed), 'hwclock -s' may be run once but only when it can be assured that this happens before ntpd is started. That isn't as simple as it sounds with systemd either. This is also discussed in many other forums so no need to rehash that here. I do have a proper implementation for a Raspberry Pi with an I2C-based RTC that requires this - it is doable but it is a bit more complex than simply putting a one-liner in /etc/rc.local

In short, all current armbian kernels for platforms with an on-board RTC, including the PineA64(+), should have the proper configuration for the kernel to read the RTC and set the system time during its early startup, as is standard. This removes virtually any need to ever use 'hwclock -s' manually or during boot. Whether using systemd-timesyncd or ntpd the system time will already be correct, the appropriate daemon will begin synchronizing the system time from a reliable source, and will enable the kernel to periodically update the hardware clock from the system clock on its own as well (every 11 minutes IIRC). There should be no need to manually sync time in either direction.
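For anyone wanting to check the flag mentioned above: as far as I know it is the STA_UNSYNC bit (0x0040) in the kernel's time status word, which is the "status" value that ntptime prints. The sketch below only decodes that one bit; `rtc_autosync_on` is a hypothetical helper name, and it says nothing about whether the kernel was built with RTC update support in the first place.

```shell
# STA_UNSYNC is bit 0x0040 of the kernel time status word (see adjtimex(2)).
# rtc_autosync_on: hypothetical helper; succeeds when that bit is clear,
# which is the state in which the kernel periodically updates the RTC.
rtc_autosync_on() {
    [ $(( $1 & 0x0040 )) -eq 0 ]
}

# 0x2001 is a status word with only PLL and NANO set, so the bit is clear.
if rtc_autosync_on 0x2001; then
    echo "clock marked synchronized: kernel RTC updates are enabled"
else
    echo "STA_UNSYNC set: kernel will not update the RTC"
fi
```

A status word with 0x0040 set (for example 0x2041) would take the second branch.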

I am using pin PH9 for PPS on a pine64 since that is one that is interrupt-capable (not all are):

user@pine64:~$ egrep pps_pin /boot/armbianEnv.txt
param_pps_pin=PH9

Using a GPS as a reference clock without PPS is not very useful unless it is the only source of time. There is significant delay that must be accurately compensated for ("fudge time2" with the type 20 NMEA GPS driver), plus there will always be quite a bit of jitter especially if the GPS output isn't reduced to the single sentence desired. I've found that typically the offsets are close to 1/2 second even with 115Kbaud.

Using kernel ("hard") PPS can provide much lower jitter than NTP ("soft") PPS, but requires a custom kernel be built with this capability enabled. That in turn requires the kernel be configured at compile-time as non-tickless since the two configurations are not currently compatible. A standard kernel may also be configured at boot time to run non-tickless by adding 'nohz=off' to the kernel command line arguments. Running non-tickless can help reduce jitter but will increase power usage (slightly).
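A quick way to confirm that the boot-time option actually reached the kernel is to look for it in the command line. This is a minimal sketch; `has_nohz_off` is a hypothetical helper name and the sample command line is shortened for illustration.

```shell
# has_nohz_off: hypothetical helper; succeeds if the given kernel command
# line contains nohz=off as a whole, space-separated argument.
has_nohz_off() {
    case " $1 " in
        *" nohz=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# On a running system: has_nohz_off "$(cat /proc/cmdline)"
has_nohz_off "console=ttyS0,115200 panic=10 nohz=off swapaccount=1" &&
    echo "tickless mode disabled at boot"
```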

For now I am using NTP PPS and attempting to use non-tickless on the pine64. This gets good results on its own:

user@pine64:~$ /usr/local/sbin/ntptime && ntpq -c rv
ntp_gettime() returns code 0 (OK)
  time e065e417.a321a2e8  Sat, Apr 20 2019 14:31:51.637, (.637232256),
  maximum error 2000 us, estimated error 3 us, TAI offset 37
ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset 0.787 us, frequency 27.737 ppm, interval 1 s,
  maximum error 2000 us, estimated error 3 us,
  status 0x2001 (PLL,NANO),
  time constant 5, precision 1.000 us, tolerance 500 ppm,
associd=0 status=0415 leap_none, sync_uhf_radio, 1 event, clock_sync,
version="ntpd 4.2.8p13@1.3847-o Tue Apr 16 23:50:07 UTC 2019 (1)",
processor="aarch64", system="Linux/5.0.7-sunxi64", leap=00, stratum=1,
precision=-20, rootdelay=0.000, rootdisp=1.030, refid=PPS,
reftime=e065e415.391ac159  Sat, Apr 20 2019 14:31:49.223,
clock=e065e417.a556d15a  Sat, Apr 20 2019 14:31:51.645, peer=40755, tc=5,
mintc=3, offset=0.000800, frequency=27.737, sys_jitter=0.002519,
clk_jitter=0.003, clk_wander=0.000, tai=37, leapsec=201701010000,
expire=201912280000

I had spent some time dialing-in the offset for the PPS, since there is more overhead with NTP discipline over kernel discipline.

The only issue is that I am hitting what appears to be the known timer issue on topic for this thread so that is where I want to try to look next. With 4.14 (or earlier) kernels this is not an issue and the same mitigation appears to work well there but does not seem so complete with 4.19 and later.

Advanced users - Development · April 20, 2019

10 minutes ago, Petee said:

I am still fine here after one day of running:

The hardware clock is now not in sync with the system time after 24 hours:

root@ICS-Pine64:~# date
Sat Apr 20 06:44:53 CDT 2019
root@ICS-Pine64:~# hwclock
2019-04-20 06:44:57.138

Personally with these little issues I would not utilize the Pine64 as an NTP server with PPS. That is me though.

Currently running an NTP server here using BSD on an Intel CPU and it is working great.

If your hardware clock is not being synced from your system clock then something is broken. Running 'hwclock -s' after systemd-timesyncd or ntpd has started is one of those things that will cause such breakage but there can be other reasons. Having an RTC isn't of much value if it won't be kept decently accurate. Most RTCs are intended to be periodically updated by the host OS as they utilize low-cost and not very-high-accuracy components and design.

As for using a Pine64 as an NTP server - it actually works very well. I've had this pine64 doing exactly that with KPPS with 4.14.y and earlier kernels with great success, and with a total power utilization of less than 3.5W. There is something with 4.19.y and later kernels that is not right, and trying to work with NTP is just one use case that is exposing this.

This is perhaps not a "little issue" - again it is not itself a time/date problem but rather a problem when reading a hardware counter (one of many) in the SoC. That counter is used by the kernel for core functionality, of which tracking system time is just one use. If some are experiencing an issue while others are not, using the same kernel, then there is something about the difference in workloads that is exposing an un-mitigated issue that can be triggered at any time.

Advanced users - Development · April 19, 2019

I should probably put in an update. . . .

After running stable for over a week I tried to disable a tickless kernel (standard modern default) to reduce jitter:

user@pine64:~$ egrep nohz /boot/armbianEnv.txt
extraargs=nohz=off

user@pine64:~$ cat /proc/cmdline 
root=UUID=c80325d7-1d25-4151-85c3-47343b473fae rootwait rootfstype=ext4 console=ttyS0,115200 console=tty1 panic=10 consoleblank=0 loglevel=7 ubootpart=0f940383-01 usb-storage.quirks=0x2537:0x1066:u,0x2537:0x1068:u nohz=off  cgroup_enable=memory swapaccount=1

Sadly this once again brought about instability. With a non-tickless kernel I start to see rcu_sched self-detected stalls such as I noted in DEV (5.0.y) on PineA64+ seeing occasional CPU stalls. At some point the system clock will jump by 95 years and a reboot ends up being required, potentially forcibly (bad things start to happen).
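For reference, the 95-year figure is not arbitrary. As I understand it the kernel treats this counter as 56 bits wide, so a read that goes slightly backward is interpreted as the counter having wrapped, and the clock moves forward by nearly 2^56 ticks. The sketch below just does that arithmetic; `wrap_years` is a hypothetical helper name, and the 56-bit width and 24 MHz frequency are stated as assumptions.

```shell
# Assumptions: the counter is treated as 56 bits wide and ticks at 24 MHz.
# Years are taken as 365.25 days.
# wrap_years: hypothetical helper; prints how many years 2^bits ticks last.
wrap_years() {
    awk -v bits="$1" -v hz="$2" \
        'BEGIN { printf "%.1f\n", 2 ^ bits / hz / 86400 / 365.25 }'
}

wrap_years 56 24000000    # prints 95.1
```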

Interestingly, with a non-tickless kernel Andre's test_timer shows considerably different results, confirming there is indeed still a problem despite the workaround being present and active:

user@pine64:~/src$ dmesg | egrep UNKNOWN
[    0.000000] arch_timer: Enabling global workaround for Allwinner erratum UNKNOWN1

user@pine64:~/src$ ./test_timer 
TAP version 13
# number of cores: 4
ok 1 same timer frequency on all cores
# timer frequency is 24000000 Hz (24 MHz)
# time1: a266aaff9, time2: a266aae00, diff: -505
# time1: a26836ff8, time2: a26836e00, diff: -504
# time1: a26881ff9, time2: a26881e00, diff: -505
# time1: a26884ff9, time2: a26884e00, diff: -505
# time1: a26a67ff9, time2: a26a641ff, diff: -15866
# time1: a26c9bff9, time2: a26c9be00, diff: -505
# time1: a27085ff9, time2: a27085e00, diff: -505
# time1: a270a3ff9, time2: a270a3e00, diff: -505
# time1: a270c1ff9, time2: a270c1dff, diff: -506
# time1: a271e4ff9, time2: a271e4e00, diff: -505
# time1: a2726bff9, time2: a2726be00, diff: -505
# time1: a2733aff9, time2: a2733ae00, diff: -505
# time1: a27d99ff8, time2: a27d99e00, diff: -504
# time1: a27e62ff9, time2: a27e62e00, diff: -505
# time1: a27e89ff9, time2: a27e89e00, diff: -505
# time1: a2819eff9, time2: a2819ee00, diff: -505
# too many errors, stopping reports
not ok 2 native counter reads are monotonic # 41 errors
# min: -15866, avg: 6, max: 508423
# diffs: -660834, -20792, -20792, -20792, -20792, -20792, -20792, -20792, -20792, -20792, -20792, -20792, -20792, -660833, -20791, -20791
# too many errors, stopping reports
not ok 3 Linux counter reads are monotonic # 33 errors
# min: -660834, avg: 510, max: 66197833
# core 0: counter value: 43968317728 => 1832 sec
# core 0: offsets: back-to-back: 12, b-t-b synced: 6, b-t-b w/ delay: 12
# core 1: counter value: 43968319238 => 1832 sec
# core 1: offsets: back-to-back: 8, b-t-b synced: 6, b-t-b w/ delay: 9
# core 2: counter value: 43968321292 => 1832 sec
# core 2: offsets: back-to-back: 9, b-t-b synced: 6, b-t-b w/ delay: 9
# core 3: counter value: 43968322532 => 1832 sec
# core 3: offsets: back-to-back: 10, b-t-b synced: 7, b-t-b w/ delay: 9
1..3
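To put the time1/time2 lines above in perspective, the diff is simply time2 minus time1 in counter ticks, and at 24 MHz there are 24 ticks per microsecond. A minimal sketch, with `tick_diff` as a hypothetical helper name:

```shell
# tick_diff: hypothetical helper. Takes two hex counter values as printed
# by test_timer (time1, then time2) and prints time2 - time1 in ticks and
# in microseconds, assuming a 24 MHz counter.
tick_diff() {
    d=$(( 0x$2 - 0x$1 ))
    awk -v d="$d" 'BEGIN { printf "%d ticks, %.2f us\n", d, d / 24 }'
}

tick_diff a266aaff9 a266aae00    # prints: -505 ticks, -21.04 us
tick_diff a26a67ff9 a26a641ff    # the larger -15866 tick case
```

Any negative result means the second of two back-to-back reads returned an earlier value than the first, which is what the "reads are monotonic" tests are checking for.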

Unfortunately I likely won't have any time to look at this until after the weekend.

I'm still working towards getting the Pine64 back as a stable stratum-1 NTP server with a GPS-based reference clock. With Kernel PPS (requires non-tickless kernel) and proper offset adjustments this was working very well on 4.14.y (and earlier).

Oh - and to add to how the RTC got brought into this discussion, though it is not really related: The kernel has a capability to keep the RTC in sync with its system time, primarily for use with NTP or other external service that will discipline the system time. When the symptom of the issue here manifests with the date jumping by a multiple of 95 years, the system time is no longer within the valid range for the RTC so any further attempts to synchronize the RTC from the system time result in failure:

[28410.996865] sun6i-rtc 1f00000.rtc: rtc only supports year in range 1970 - 2033

The above message will only appear in the live kernel message buffer (aka 'dmesg') and on the system console, so it can be easily missed. Failures in setting the RTC (aka "hwclock") are only another indirect symptom of the issue and would be expected.

Advanced users - Development · April 19, 2019

Just for completeness and closure PR #1329 implemented this for 'sunxi' and 'sunxi64' in both NEXT and DEV. Other platforms already have this feature configured.

There should no longer be any need for additional automated or otherwise scripted enhancements to do a separate 'hwclock -s' after boot as the kernel will do this itself very early, where it should be done when possible.

Advanced users - Development · April 19, 2019

Again - the RTC and the timer in question are two separate entities and are unrelated. The issue here is not with the RTC but with a completely different counter in the SoC.

The fix for the timer issue here was formally added to the mainline in the 5.1.y branch for the A64. It has been pulled back to the 5.0.y and 4.19.y branches in pieces, partly in the fork of mainline by megous that Armbian uses and partly in Armbian patches applied on top of that fork. The nightly builds and the packaged kernel updates have had these for a short while. To see if the workaround has been applied and is active you can use:

user@pine64:~$ dmesg | egrep -i errat
[    0.000000] arch_timer: Enabling global workaround for Allwinner erratum UNKNOWN1

The 'UNKNOWN1' erratum is the issue in question.

There had been a previous fix that had been around for a long time and was part of what eventually became the formal mainline fix. It was the period between these two where this issue had reappeared but should be resolved now.

Those are separate from enabling RTC use by the kernel during boot to set the system clock which is a different and independent timer.

It should be noted that trying to use 'hwclock -s' after another active process (such as ntpd, chrony, or systemd-timesyncd) has initiated synchronization to the RTC from the system clock can cause issues. NTPd can normally recover but the others perhaps not so much.

Advanced users - Development · April 19, 2019

Yes I do have an RTC backup battery attached, but that really only applies when power is physically removed from the board. With just a 'reboot' the RTC remains powered.

This particular timer issue also is not related to the RTC - it is a problem with a hardware timer (really just a counter) that is part of the SoC itself. When the symptoms manifest, the kernel (aka "system") time will jump by a multiple of 95 years and after that the kernel is no longer able to sync back to the hardware RTC which is independent.

I had also submitted PR #1329 following Kernel config: Change RTC_HCTOSYS as default? which has since been merged. As a result there should be no need to otherwise explicitly use 'hwclock -s' to set the system clock from the RTC anymore. Instead, during boot and before init is even spawned, a message should be seen indicating the system clock has been set from the RTC:

user@pine64:~$ dmesg | grep -w rtc
[    1.975054] sun6i-rtc 1f00000.rtc: registered as rtc0
[    1.975070] sun6i-rtc 1f00000.rtc: RTC enabled
[    2.031287] rtc-ldo: supplied by regulator-dummy
[    4.079694] sun6i-rtc 1f00000.rtc: setting system clock to 2019-04-19T01:29:14 UTC (1555637354)

Advanced users - Development · April 19, 2019

Interestingly I've never had a successful run with Andre's test_timer:

user@pine64:~/src$ ./test_timer 
TAP version 13
# number of cores: 4
ok 1 same timer frequency on all cores
# timer frequency is 24000000 Hz (24 MHz)
# time1: b525ffff9, time2: b52583dff, diff: -508410
# time1: b52effff9, time2: b52e83dff, diff: -508410
# time1: b54ba0ff9, time2: b54ba0e00, diff: -505
not ok 2 native counter reads are monotonic # 3 errors
# min: -508410, avg: 6, max: 15879
# diffs: -660875, -20792, -660834, -660875
not ok 3 Linux counter reads are monotonic # 4 errors
# min: -660875, avg: 484, max: 71084
# core 0: counter value: 48987577802 => 2041 sec
# core 0: offsets: back-to-back: 23, b-t-b synced: 6, b-t-b w/ delay: 17
# core 1: counter value: 48987579166 => 2041 sec
# core 1: offsets: back-to-back: 13, b-t-b synced: 6, b-t-b w/ delay: 8
# core 2: counter value: 48987581662 => 2041 sec
# core 2: offsets: back-to-back: 8, b-t-b synced: 6, b-t-b w/ delay: 9
# core 3: counter value: 48987582921 => 2041 sec
# core 3: offsets: back-to-back: 8, b-t-b synced: 7, b-t-b w/ delay: 12
1..3

This board too is a 2GB PineA64+ from the initial Kickstarter campaign. Using the latest DEV pre-built kernel via 'apt upgrade', with the UNKNOWN1 erratum active:

user@pine64:~/src$ dpkg --list linux-image-dev-sunxi64
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=================================
ii  linux-image-de 5.79.190419  arm64        Linux kernel, version 5.0.7-sunxi

user@pine64:~/src$ cat /proc/version 
Linux version 5.0.7-sunxi64 (root@nightly) (gcc version 7.4.1 20181213 [linaro-7.4-2019.02 revision 56ec6f6b99cc167ff0c2f8e1a2eed33b1edc85d4] (Linaro GCC 7.4-2019.02)) #5.79.190419 SMP Thu Apr 18 12:08:24 CEST 2019

user@pine64:~/src$ dmesg | egrep -i errat
[    0.000000] arch_timer: Enabling global workaround for Allwinner erratum UNKNOWN1

I haven't seen any of the known symptoms from this issue (IE: date jump by multiple of 95 years) as long as the erratum workaround is active, but the test_timer has never passed fully. For now I've chosen to leave this aside since I'm not seeing any other symptoms.

April 17, 2019

The PineA64+ is actually based on an Allwinner A64.

Yes it sounds like you will want a more recent kernel. This thread may be of some help: A64 date/time clock issue

April 17, 2019

To circle back and follow-up on this thread. . .

As of today all of the Armbian default kernel configurations now include pps-gpio as a module by default so no custom kernel builds will be required. Simply configuring 'overlay=pps-gpio' in /boot/armbianEnv.txt will make this available. Optionally param_pps_* may be used to configure, such as pin. You'll need to wait for the next nightly builds if you don't want to actually build a kernel.

I had also identified the issue with the pin conflict (as I suspected above) and dug into the root cause further before submitting a PR to have it corrected. This is already in the existing nightly builds for -DEV so this is resolved as well. This appeared to only affect Allwinner A64 boards such as Pine 64 and should not be relevant for your Odroid C2.

I am currently using PPS once again on my PineA64+ successfully.

Advanced users - Development · April 16, 2019

On a Pine A64+ I've actually started to see frequent 'rcu_sched self-detected stall on CPU' again with recent builds since the switch to the new toolchain, without any other changes. This includes both the armbian nightlies as well as my own custom builds.

I haven't tried to go back to the previous toolchain yet, nor have I seen any problems with either the system clock or the hardware clock.

I have noticed that two other (unrelated) A53 errata are no longer activated on boot even though they are still both configured and built into the kernel: 843419 and 845719. Again the only difference appears to be the toolchain used to build the kernel.

Advanced users - Development · April 16, 2019

Currently all of the Armbian kernel defconfigs include the actual build configuration statically as part of the kernel. The standard Armbian build process also includes a copy of the .config file in /boot/config-<version>-<platform>.

Are both required, or should the kernel configurations be updated to make Kernel .config support an LKM instead of static bind?

True this is a minor item. The actual kernel configuration takes up a very small amount of memory, though on many SBCs with limited memory to start with every byte not used by the kernel can help.

This would amount to changing from:

CONFIG_IKCONFIG=y

to:

CONFIG_IKCONFIG=m

About the only thing that might "break" would be any tools that use 'extract-ikconfig' or similar against the on-disk binary to extract the built-in .config, but with the kernel configuration already included in the linux-image-*.deb and installed under /boot as a separate text file this should not be needed.
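Since the on-disk copy under /boot is plain text, checking how an option is set does not need the built-in config at all. A minimal sketch follows; `config_value` is a hypothetical helper name, and the assumption that the file name under /boot matches `uname -r` should be checked on the system in question.

```shell
# config_value: hypothetical helper. Reads a kernel .config on stdin and
# prints the value of the given option (for example y or m), or n when the
# option is absent or marked "is not set".
config_value() {
    awk -v opt="$1" -F= '
        $1 == opt { val = $2 }
        END { print (val == "" ? "n" : val) }'
}

# On an installed system (file name assumed to match the running kernel):
#   config_value CONFIG_IKCONFIG < "/boot/config-$(uname -r)"
printf '%s\n' '# CONFIG_RTC_HCTOSYS is not set' 'CONFIG_IKCONFIG=y' |
    config_value CONFIG_IKCONFIG
```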

I can submit a PR for -DEV if this sounds reasonable for all of the existing defconfigs in config/kernel but wanted feedback on whether this would be useful or whether this should be left as-is.

Advanced users - Development · April 10, 2019

In your dmesg output (or /var/log/messages) do you see any lines similar to:

arch_timer: Enabling global workaround for Allwinner erratum UNKNOWN1

If not you may want to update to the latest nightly build. There was a commit that removed the activation for this fix and the following nightly build had this issue return. That has since been corrected and the latest DEV build once again has the mitigation to avoid this issue.
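If you want to script that check, something along these lines should do. `erratum_active` is a hypothetical helper name; it only looks for the message text shown above, so if the kernel log buffer has wrapped since boot the line may no longer be there even though the workaround is active.

```shell
# erratum_active: hypothetical helper; reads kernel log text on stdin and
# succeeds if the UNKNOWN1 workaround message is present.
erratum_active() {
    grep -q 'Enabling global workaround for Allwinner erratum UNKNOWN1'
}

# On a live system: dmesg | erratum_active
echo '[    0.000000] arch_timer: Enabling global workaround for Allwinner erratum UNKNOWN1' |
    erratum_active && echo "workaround active"
```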

April 9, 2019

Presuming this is not a brand-new install, correct?

How are you able to edit /etc/shadow?

If you are already logged in as root you can just use 'passwd' to change the password. When running as root you do not need to provide the existing password to change a password.

If you can login as a user with 'sudo' access, you can use 'sudo -i' to gain a root shell then follow the step above.

If you have mounted your SD card on another linux instance, you should be able to use:

passwd -R <path_to_mount_point> root

<path_to_mount_point> is where you have your SD card mounted (not the full path to the passwd/shadow files)

March 29, 2019

You can build a new kernel with the option for now. The option via "make menuconfig" (done automatically by compile.sh if you set KERNEL_CONFIGURE=yes) is under Device Drivers -> PPS support. You'll want to enable 'PPS client using GPIO'

However I seem to be hitting a conflict when the pps-gpio module is loaded so I haven't submitted a PR for this to go back to enabled-by-default yet. I found some details that might be relevant and hope to get to this soon.

Advanced users - Development · March 27, 2019

Reviewing the defconfigs currently provided, it looks like most boards already have CONFIG_RTC_HCTOSYS=y, along with the matching CONFIG_RTC_HCTOSYS_DEVICE=rtc0 so I would think this should be good to submit a PR against for sunxi64-dev (used for pine64 DEV).

However I noticed that the pine64-default actually has CONFIG_RTC_HCTOSYS_DEVICE=rtc0 set but does not have CONFIG_RTC_HCTOSYS set. That would hint that it had been enabled and at some point later disabled, as the former doesn't get defined until the latter is set to 'y':

linux-pine64-default.config:# CONFIG_RTC_HCTOSYS is not set
linux-pine64-default.config:CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
linux-sunxi64-dev.config:# CONFIG_RTC_HCTOSYS is not set
linux-sunxi64-next.config:# CONFIG_RTC_HCTOSYS is not set

Before I try to submit a PR to change this defconfig, is there a particular reason this should not be done? At worst if the RTC is not able to be read the kernel should simply log a message and move along as if this option wasn't enabled.

This is the only set of defconfigs (default/dev/next) that I found like this, which is why I'm curious.

Advanced users - Development · March 27, 2019

Yes that is the commit that is now part of the mainline that I referenced above. There are copies of it all over but you can also find it in Linus' own repository now as well as on the kernel.org repository. This has also been merged into the megous repo that is the source for the armbian builds.

This has been running on my built DEV kernel for over 15 hours and has been solid so far. With an unpatched DEV kernel I had a date jump pretty consistently in less than four hours. With the NEXT kernel that has the patch for A64_UNSTABLE_TIMER the occurrences varied from a few hours to a few days.

In other words this is the same known issue with a now more complete mitigation applied to the mainline kernel upstream.

[Edit]:

I've been running a 5.0.4 DEV kernel with the upstream fix for a bit over a day with no issues so far, but that is hardly definitive. The upstream solution adds logging if a stable timer read could not be obtained within 150 attempts (its retry bound, which is also noted as 3 cycles/counter increments) so hopefully if there is a failure here it will be apparent via kernel logs.

I've also submitted a PR for the defconfig update for DEV kernels which currently don't have the armbian '0081-arm64-arch_timer-Workaround-for-Allwinner-A64-timer-.patch' applied (by default) anyway.

Advanced users - Development · March 26, 2019

I noticed there is a new fix that appears to have been just brought back from 5.1.y to 5.0.y to replace the fix enabled by setting Kconfig 'SUN50I_A64_UNSTABLE_TIMER'. The replacement fix extends the check by an additional bit, adds additional coverage, and adds an additional mitigation for another register that was seen to be affected. The last item had been referenced in the SUN50I_A64_UNSTABLE_TIMER fix but was not addressed.

The new Kconfig item from upstream is 'CONFIG_SUN50I_ERRATUM_UNKNOWN1'

Not particularly important, but the SUN50I_A64_UNSTABLE_TIMER fix had been dropped in a recent previous commit but hadn't made it back to 5.0.y yet. This fix was still present in the kernels used in the armbian builds so that wouldn't have been the cause. I've built a new DEV (5.0.4) kernel with this latest update and will let it run for a while to see if the issue has been mitigated a little better.
