Werner Posted October 30, 2020
3 minutes ago, dancgn said: I tried to install it through armbian-config, but it doesn't work.
You don't have to use armbian-config for such tasks. You can always install software the usual way, or however the target software's documentation describes. armbian-config may have been somewhat neglected in the past due to a lack of resources.
dancgn Posted October 30, 2020
48 minutes ago, Werner said: You don't have to use armbian-config for such tasks. You can always install software the usual way, or however the target software's documentation describes. armbian-config may have been somewhat neglected in the past due to a lack of resources.
I understand, but there is no official Docker image for arm64, so I tried it with armbian-config.
flower Posted October 30, 2020
1 hour ago, dancgn said: I understand, but there is no official Docker image for arm64, so I tried it with armbian-config.
The official ones work on arm64 without any problems. Only the reverse proxy doesn't.
dancgn Posted October 30, 2020
19 minutes ago, flower said: The official ones work on arm64 without any problems. Only the reverse proxy doesn't.
Hm, so I have to google a little bit. Thanks @flower.
This is what I get:
standard_init_linux.go:211: exec user process caused "exec format error"
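An "exec format error" at container start usually means the image was built for a different CPU architecture than the host. The sketch below is one way to compare the two; it is not taken from the thread. The function names are made up for illustration, and the image name passed to `check_image_arch` is whatever image was already pulled locally.

```shell
# Map a kernel machine name (as printed by `uname -m`) to the architecture
# name Docker records for images.
docker_arch_for() {
    case "$1" in
        aarch64|arm64) echo arm64 ;;
        armv7l|armv6l|armhf) echo arm ;;
        x86_64|amd64) echo amd64 ;;
        *) echo unknown ;;
    esac
}

# Compare the host architecture with the one recorded in a local image.
# "$1" is the name of an image that has already been pulled.
check_image_arch() {
    host=$(docker_arch_for "$(uname -m)")
    image=$(docker image inspect --format '{{.Architecture}}' "$1") || return 2
    echo "host=$host image=$image"
    [ "$host" = "$image" ]
}
```

On a Helios64 the host side should report arm64; an image that reports amd64 would explain the error.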
Victor B. Posted October 30, 2020
On 10/26/2020 at 10:35 PM, gprovost said: You're right. No excuse on our side: we are behind schedule and not up to expectations on software maturity. Maybe we should have stuck with LK4.4 (from Rockchip) and forgotten about Linux mainline for now :-/ However, we are still working on improving stability and we are optimistic that it will get better very soon. Right now, as you know, LK5.8.16 is for some reason (which we still can't figure out) unstable compared with 5.8.14. We also realized that the OMV install was removing some TX offloading tweaks. So there are a lot of little things here and there that we only discover along the way. BTW, regarding this crash, are you sure TX offload on eth1 was disabled?
I respect your transparency and your vision. Do you have a GitHub where I can contribute?
flower Posted October 30, 2020
1 hour ago, dancgn said: Hm, so I have to google a little bit. Thanks @flower. This is what I get: standard_init_linux.go:211: exec user process caused "exec format error"
Try this one: https://github.com/nextcloud/docker/tree/master/.examples/docker-compose/with-nginx-proxy-self-signed-ssl/mariadb/fpm
omgwtfssl doesn't work, though, so you have to handle the certificates yourself.
Here is my reverse proxy, which handles certs: https://github.com/flower1024/proxy (but you will have to tweak it: it also does DynDNS updates with spdyn.de and has a fixed Europe/Berlin time zone. It's not really built for sharing.)
dancgn Posted October 30, 2020
3 minutes ago, flower said: Try this one: https://github.com/nextcloud/docker/tree/master/.examples/docker-compose/with-nginx-proxy-self-signed-ssl/mariadb/fpm omgwtfssl doesn't work, though, so you have to handle the certificates yourself. Here is my reverse proxy, which handles certs: https://github.com/flower1024/proxy (but you will have to tweak it: it also does DynDNS updates with spdyn.de and has a fixed Europe/Berlin time zone. It's not really built for sharing.)
But isn't that for Nextcloud? I'm looking for a document program with full-text search and OCR, since I scan the letters I receive. I'm also fighting with my own Bitwarden server for passwords.
flower Posted October 30, 2020
2 minutes ago, dancgn said: But isn't that for Nextcloud? I'm looking for a document program with full-text search and OCR, since I scan the letters I receive. I'm also fighting with my own Bitwarden server for passwords.
Oh, sorry. I was confused and thought you were talking about Nextcloud.
djurny Posted October 30, 2020
Hi,
After fixing the LED issue, I started testing whether SnapRAID works. On the Helios4, SnapRAID ran into issues because of the number of files on the SnapRAID "array": 32-bit addressing constraints caused SnapRAID to bork out regularly. No matter how much configuration tweaking and trial and error I applied, it kept requiring more than 4GB of address space.
After running "sync" and "scrub" for the first time on the Helios64, I noticed a more than comfortable number of alleged ATA I/O errors like the ones below:
ata1.00: failed command: READ FPDMA QUEUED
Oct 29 21:13:46 localhost kernel: [642537.722453] ata1.00: exception Emask 0x2 SAct 0x80018000 SErr 0x400 action 0x6
Oct 29 21:13:46 localhost kernel: [642537.723136] ata1.00: irq_stat 0x08000000
Oct 29 21:13:46 localhost kernel: [642537.723514] ata1: SError: { Proto }
Oct 29 21:13:46 localhost kernel: [642537.723854] ata1.00: failed command: READ FPDMA QUEUED
Oct 29 21:13:46 localhost kernel: [642537.724351] ata1.00: cmd 60/00:78:40:bf:22/02:00:1e:00:00/40 tag 15 ncq dma 262144 in
Oct 29 21:13:46 localhost kernel: [642537.724351] res 40/00:f8:58:23:61/00:00:00:00:00/40 Emask 0x2 (HSM violation)
Oct 29 21:13:46 localhost kernel: [642537.725771] ata1.00: status: { DRDY }
Oct 29 21:13:46 localhost kernel: [642537.726120] ata1.00: failed command: READ FPDMA QUEUED
Oct 29 21:13:46 localhost kernel: [642537.726685] ata1.00: cmd 60/00:80:40:c1:22/02:00:1e:00:00/40 tag 16 ncq dma 262144 in
Oct 29 21:13:46 localhost kernel: [642537.726685] res 40/00:f8:58:23:61/00:00:00:00:00/40 Emask 0x2 (HSM violation)
Oct 29 21:13:46 localhost kernel: [642537.728134] ata1.00: status: { DRDY }
Oct 29 21:13:46 localhost kernel: [642537.728485] ata1.00: failed command: READ FPDMA QUEUED
Oct 29 21:13:46 localhost kernel: [642537.728989] ata1.00: cmd 60/08:f8:58:23:61/00:00:00:00:00/40 tag 31 ncq dma 4096 in
Oct 29 21:13:46 localhost kernel: [642537.728989] res 40/00:f8:58:23:61/00:00:00:00:00/40 Emask 0x2 (HSM violation)
Oct 29 21:13:46 localhost kernel: [642537.730432] ata1.00: status: { DRDY }
Oct 29 21:13:46 localhost kernel: [642537.730792] ata1: hard resetting link
Oct 29 21:13:47 localhost kernel: [642538.206413] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct 29 21:13:47 localhost kernel: [642538.207993] ata1.00: configured for UDMA/133
Oct 29 21:13:47 localhost kernel: [642538.208300] sd 0:0:0:0: [sdb] tag#15 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Oct 29 21:13:47 localhost kernel: [642538.208319] sd 0:0:0:0: [sdb] tag#15 Sense Key : 0x5 [current]
Oct 29 21:13:47 localhost kernel: [642538.208336] sd 0:0:0:0: [sdb] tag#15 ASC=0x21 ASCQ=0x4
Oct 29 21:13:47 localhost kernel: [642538.208355] sd 0:0:0:0: [sdb] tag#15 CDB: opcode=0x88 88 00 00 00 00 00 1e 22 bf 40 00 00 02 00 00 00
Oct 29 21:13:47 localhost kernel: [642538.208373] blk_update_request: I/O error, dev sdb, sector 505593664 op 0x0:(READ) flags 0x80700 phys_seg 64 prio class 0
Oct 29 21:13:47 localhost kernel: [642538.209577] sd 0:0:0:0: [sdb] tag#16 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Oct 29 21:13:47 localhost kernel: [642538.209595] sd 0:0:0:0: [sdb] tag#16 Sense Key : 0x5 [current]
Oct 29 21:13:47 localhost kernel: [642538.209610] sd 0:0:0:0: [sdb] tag#16 ASC=0x21 ASCQ=0x4
Oct 29 21:13:47 localhost kernel: [642538.209626] sd 0:0:0:0: [sdb] tag#16 CDB: opcode=0x88 88 00 00 00 00 00 1e 22 c1 40 00 00 02 00 00 00
Oct 29 21:13:47 localhost kernel: [642538.209642] blk_update_request: I/O error, dev sdb, sector 505594176 op 0x0:(READ) flags 0x80700 phys_seg 63 prio class 0
Oct 29 21:13:47 localhost kernel: [642538.210826] ata1: EH complete
After some searching around on the internet, it appeared that limiting the SATA link speed can prevent these errors. Checking other server deployments, I saw the same behavior in an 8-disk mdadm RAID setup, where [new] WD Blue disks also showed these READ FPDMA QUEUED errors, which disappeared once ATA error handling started to turn the SATA link speed down to 3Gbps.
To test this out, I added the following to /boot/armbianEnv.txt:
extraargs=libata.force=3.0
Upon rebooting the box, it appears that libata indeed limited the SATA link speed for all drives to 3Gbps:
Oct 29 22:01:59 localhost kernel: [ 3.143259] ata1: FORCE: PHY spd limit set to 3.0Gbps
Oct 29 22:01:59 localhost kernel: [ 3.143728] ata1: SATA max UDMA/133 abar m8192@0xfa010000 port 0xfa010100 irq 238
Oct 29 22:01:59 localhost kernel: [ 3.143736] ata2: FORCE: PHY spd limit set to 3.0Gbps
Oct 29 22:01:59 localhost kernel: [ 3.144192] ata2: SATA max UDMA/133 abar m8192@0xfa010000 port 0xfa010180 irq 239
Oct 29 22:01:59 localhost kernel: [ 3.144199] ata3: FORCE: PHY spd limit set to 3.0Gbps
Oct 29 22:01:59 localhost kernel: [ 3.144654] ata3: SATA max UDMA/133 abar m8192@0xfa010000 port 0xfa010200 irq 240
Oct 29 22:01:59 localhost kernel: [ 3.144661] ata4: FORCE: PHY spd limit set to 3.0Gbps
Oct 29 22:01:59 localhost kernel: [ 3.145115] ata4: SATA max UDMA/133 abar m8192@0xfa010000 port 0xfa010280 irq 241
Oct 29 22:01:59 localhost kernel: [ 3.145122] ata5: FORCE: PHY spd limit set to 3.0Gbps
Oct 29 22:01:59 localhost kernel: [ 3.145603] ata5: SATA max UDMA/133 abar m8192@0xfa010000 port 0xfa010300 irq 242
After redoing the SnapRAID scrub, the READ FPDMA QUEUED errors had indeed disappeared.
As the disks in the box are WD Red HDDs, there is not really a point in having a 6Gbps (~600MB/s) SATA link speed anyway, since disk performance is rated at less than 300MB/s throughput. (Occasionally it tops out at sustained sequential reads of around 130MiB/s for large files.)
Note that YMMV.
Groetjes,
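To confirm which speed each port actually negotiated, before and after setting libata.force, the kernel log can be filtered for the "SATA link up" messages. A minimal sketch follows; the function name is made up for illustration, and it assumes the message format shown in the log above.

```shell
# Read kernel log text on stdin and print "<port> <speed in Gbps>" for every
# "SATA link up" message found.
sata_link_speeds() {
    sed -n 's/.*\(ata[0-9][0-9]*\): SATA link up \([0-9.]*\) Gbps.*/\1 \2/p'
}

# Typical use on a running system:
#   dmesg | sata_link_speeds
```

With the limit in place, every listed port would be expected to report 3.0 rather than 6.0.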
fromport Posted October 31, 2020
16 hours ago, usual user said: Out of curiosity, with the mainline kernel and the ondemand governor in place, apply this:
echo 40000 > /sys/devices/system/cpu/cpufreq/policy0/ondemand/sampling_rate
echo 465000 > /sys/devices/system/cpu/cpufreq/policy4/ondemand/sampling_rate
Does it still crash in your use case? If it no longer crashes, I will explain what is going on.
I just switched to "ondemand" and your settings:
sh -x /usr/local/bin/switchcpuondemand
+ cpufreq-set -c 0 -g ondemand
+ cpufreq-set -c 4 -g ondemand
+ echo 40000
+ echo 465000
Now we wait.
*UPDATE* It crashed within _minutes_. On the serial console:
root@filer1:~# [70583.943650] BUG: spinlock lockup suspected on CPU#0, kswapd0/70
[70583.947652] lock: 0xffffffc0f7f3f200, .magic: dead4ead, .owner: swapper/5/0, .owner_cpu: 5
[70583.974491] BUG: spinlock lockup suspected on CPU#4, kworker/4:0/4392
[70583.977415] lock: 0xffffffc0f7f3f200, .magic: dead4ead, .owner: swapper/5/0, .owner_cpu: 5
[70583.978539] BUG: spinlock lockup suspected on CPU#2, kworker/2:1/12393
[70583.978545] lock: 0xffffffc0f7f3f200, .magic: dead4ead, .owner: swapper/5/0, .owner_cpu: 5
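The governor switch and the two sampling-rate writes can be wrapped in one small helper, roughly as sketched below. This is not the contents of the switchcpuondemand script from the post; the function name and the optional third argument (an alternative cpufreq directory, handy for dry runs) are assumptions. It writes the governor through sysfs instead of calling cpufreq-set. Note that in this thread the setting did not prevent the crash.

```shell
# Switch one cpufreq policy to the ondemand governor and set its sampling rate.
#   $1 = policy number (0 and 4 in the post above)
#   $2 = sampling rate in microseconds
#   $3 = cpufreq sysfs directory (optional; defaults to the real one)
set_ondemand_rate() {
    policy_dir="${3:-/sys/devices/system/cpu/cpufreq}/policy$1"
    echo ondemand > "$policy_dir/scaling_governor" || return 1
    echo "$2" > "$policy_dir/ondemand/sampling_rate" || return 1
    echo "policy$1: $(cat "$policy_dir/scaling_governor"), sampling_rate=$(cat "$policy_dir/ondemand/sampling_rate")"
}

# Typical use as root, with the values quoted in the post:
#   set_ondemand_rate 0 40000
#   set_ondemand_rate 4 465000
```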
gprovost (Author) Posted October 31, 2020
Hi all,
We finally figured out the issue with the latest build (20.08.13), and it actually wasn't related to the kernel. There was a silly mistake in the Armbian hardware optimization script that ended up skipping all the required performance and stability tweaks for Helios64. This has been fixed. Now we need to wait for the new images and Debian packages to be generated and propagated to the repos. The instructions we gave in our latest blog post were actually a bit pointless, since the problem wasn't bound to the kernel version. We will post something on our blog as soon as the new Armbian version is ready.
@djurny Thanks for the SnapRAID feedback and the suggested SATA speed tweak to avoid I/O errors. We will look at it closely to confirm it isn't rather an issue with your harness cable. The SATA 1 port is the one with the longest cable, so that could be worth a close look.
@fromport Yeah, I think in the end it wasn't really a DVFS issue but just the missing hardware optimization tweaks, as stated above. For now, stick to the performance governor. Once the new Armbian packages for Helios64 are ready and you have upgraded, you can try reverting to the ondemand governor (the default setting) and let us know how it works.
@eClapton Noted on the instructions and clarifications required. We are improving our wiki every week. The hot-swap procedure is a good point ;-) We didn't have that one on our TODO list. We also welcome any contribution to our wiki: https://github.com/kobol-io/wiki
@Dmitry Antipov Thanks for the pointer, we will look at it. It doesn't seem that anyone has tried to port the Rockchip RNG engine to mainline. I guess we could add it via a patch to Armbian. @piter75 On a similar topic, people have asked us how the crypto engine works on RK3399; it's something we will address soon.
@Jaques-Ludwig Yeah, as pointed out by @Werner, you can refer to the official install doc. You can also look for pointers in an old guide we wrote for our previous product. The steps are the same, just with older component versions: https://wiki.kobol.io/helios4/nextcloud/ Anyhow, it would be great if the install via armbian-config worked properly. Let's put that on the TODO list.
@dancgn You should create a dedicated Armbian thread on the Mayan install via armbian-config, since it's not really related to hardware. It would be easier to troubleshoot; otherwise it's going to get lost here. You can also raise an issue here with detailed logs. The armbian-config tool is in big need of contributors, so don't expect a fix very soon. Maybe you can try to figure it out yourself and fix it in armbian-config ;-)
fromport Posted October 31, 2020
@gprovost My Armbian is currently "stuck" on legacy since I went back to the 4.4 kernel. armbian-config has no option to switch from legacy to current or dev. Any suggestions on how I could switch so that I can install the new kernel when it becomes available?
Jaques-Ludwig Posted October 31, 2020
I have now installed Nextcloud 20 using Portainer, with MariaDB as the database. Everything works perfectly now, and the installation was easy.
Jaques-Ludwig
usual user Posted October 31, 2020
9 hours ago, fromport said: Crashed within _minutes_
It was only a shot in the dark, inspired by this post. An attempt to trigger the next frequency change while the previous one is still in progress would have been a good explanation, since the crash appears to occur during dynamic frequency scaling. But having this configuration right is a good idea in any case. It's not always about whether something works or not. Most of the time, it's about not working as well as it could, and all the small drawbacks add up in overall performance.
Igor Posted October 31, 2020
Updated images and packages via apt that affect Helios64: https://github.com/armbian/build/pull/2292
ebin-dev Posted October 31, 2020
5 hours ago, Igor said: Updated images and packages via apt that affect Helios64
The update to Armbian 20.08.21 Buster with Linux 5.8.17-rockchip64 went through without any issues. The only small glitch I could find in syslog relates to Armbian hardware optimization:
Oct 31 19:53:46 helios64 armbian-hardware-optimization[701]: /usr/lib/armbian/armbian-hardware-optimization: line 214: /proc/irq/$(awk -F":" "/eth0/ {print \$1}" </proc/interrupts | sed 's/\ //g')/smp_affinity: No such file or directory
Werner Posted October 31, 2020
19 minutes ago, ebin-dev said: The update to Armbian 20.08.21 Buster with Linux 5.8.17-rockchip64 went through without any issues. The only small glitch I could find in syslog relates to Armbian hardware optimization:
Oct 31 19:53:46 helios64 armbian-hardware-optimization[701]: /usr/lib/armbian/armbian-hardware-optimization: line 214: /proc/irq/$(awk -F":" "/eth0/ {print \$1}" </proc/interrupts | sed 's/\ //g')/smp_affinity: No such file or directory
Hm, shouldn't the optimizations for rockchip64 be used here? https://github.com/armbian/build/blob/5b43356d17d4e28d4f0453cc65529e7d73dffc7f/packages/bsp/common/usr/lib/armbian/armbian-hardware-optimization#L214
piter75 Posted October 31, 2020
1 hour ago, ebin-dev said: The only small glitch I could find in syslog relates to Armbian hardware optimization:
Oct 31 19:53:46 helios64 armbian-hardware-optimization[701]: /usr/lib/armbian/armbian-hardware-optimization: line 214: /proc/irq/$(awk -F":" "/eth0/ {print \$1}" </proc/interrupts | sed 's/\ //g')/smp_affinity: No such file or directory
There seems to be a race condition between eth0 device discovery and the armbian-hardware-optimize service. Restarting the service after logging in does not exhibit the issue:
systemctl restart armbian-hardware-optimize
49 minutes ago, Werner said: Hm, shouldn't the optimizations for rockchip64 be used here?
Nope, the board is based on RK3399 and uses separate optimisations. This is part of the ongoing rockchip64 / rk3399 mix-up saga ;-)
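The error message shows the path being built from a lookup in /proc/interrupts that returns nothing when eth0 has not been registered yet, which yields /proc/irq//smp_affinity. One way to make such a lookup tolerant of the race is to check for an empty result and retry for a short while. The sketch below only illustrates that idea; it is not the fix that went into Armbian, and both function names are hypothetical.

```shell
# Print the IRQ number of the first /proc/interrupts line that mentions the
# given interface; return non-zero if there is none.
#   $1 = interface name, $2 = interrupts file (optional, for testing)
irq_for_iface() {
    iface=$1
    interrupts=${2:-/proc/interrupts}
    irq=$(awk -F: -v dev="$iface" '$0 ~ dev { gsub(/ /, "", $1); print $1; exit }' "$interrupts")
    [ -n "$irq" ] || return 1
    printf '%s\n' "$irq"
}

# Retry once per second, up to 10 times, then print the smp_affinity path.
wait_for_irq() {
    tries=0
    until irq=$(irq_for_iface "$1" "${2:-/proc/interrupts}"); do
        tries=$((tries + 1))
        [ "$tries" -ge 10 ] && return 1
        sleep 1
    done
    echo "/proc/irq/$irq/smp_affinity"
}
```

A caller would write the affinity mask only when `wait_for_irq eth0` succeeds, instead of writing to a path with an empty IRQ number.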
wurmfood Posted October 31, 2020
I'm having a problem where ata5 on my device isn't responding. From dmesg I get the following:
[ 3.499599] ata1: SATA link down (SStatus 0 SControl 300)
[ 3.811439] ata2: SATA link down (SStatus 0 SControl 300)
[ 4.123422] ata3: SATA link down (SStatus 0 SControl 300)
[ 4.435415] ata4: SATA link down (SStatus 0 SControl 300)
[ 4.747416] ata5: SATA link down (SStatus 0 SControl 300)
[ 5.461077] ata5: SATA link down (SStatus 0 SControl 300)
[ 6.185068] ata5: SATA link down (SStatus 0 SControl 300)
[ 6.913043] ata5: SATA link down (SStatus 0 SControl 300)
[ 7.637054] ata5: SATA link down (SStatus 0 SControl 300)
[ 8.361129] ata5: SATA link down (SStatus 0 SControl 300)
[ 9.081132] ata5: SATA link down (SStatus 0 SControl 300)
[ 9.805046] ata5: SATA link down (SStatus 0 SControl 300)
[ 10.517028] ata5: SATA link down (SStatus 0 SControl 300)
[ 11.237033] ata5: SATA link down (SStatus 0 SControl 300)
[ 11.952977] ata5: SATA link down (SStatus 0 SControl 300)
[ 12.677074] ata5: SATA link down (SStatus 0 SControl 300)
That will keep going as long as I have a drive plugged into ata5. However, the same drive plugged into any other spot works just fine.
To help narrow down the problem, I disconnected the SATA cables from spots 4 and 5 and switched them. Now spot 5 (plugged into 4) works. It looks like it's not a problem with the connection or the drive, but rather with the ata5 connection on the board.
Any ideas on what I should do?
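When comparing ports after swapping cables, it can help to reduce the kernel log to a count of "SATA link down" messages per port. A small sketch, assuming the message format shown above; the function name is made up for illustration.

```shell
# Read kernel log text on stdin and print "<port> <count>" for each port that
# logged "SATA link down", sorted by port name.
count_link_down() {
    awk 'match($0, /ata[0-9]+: SATA link down/) {
             port = substr($0, RSTART, RLENGTH)   # e.g. "ata5: SATA link down"
             sub(/:.*/, "", port)                 # keep only "ata5"
             n[port]++
         }
         END { for (p in n) print p, n[p] }' | sort
}

# Typical use:
#   dmesg | count_link_down
```

A port that keeps retrying, like ata5 in the log above, stands out with a much higher count than ports that are simply empty.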
djurny Posted November 1, 2020
Hi,
Are there any plans to make a toddler-proof version of the front grille that covers the buttons? For now I have applied some lo-fi containment by simply flipping the front grille so that it covers the front panel. Perhaps some snap-in plexiglass for the panel cutout, with a little doorknob type of thing?
I have not yet checked whether the buttons can be disabled in software (https://wiki.kobol.io/helios64/button/). Perhaps the PMIC can be programmed from user space?
Groetjes,
357up Posted November 1, 2020
9 hours ago, wurmfood said: I'm having a problem where ata5 on my device isn't responding. From dmesg I get the following:
[ 3.499599] ata1: SATA link down (SStatus 0 SControl 300)
[ 3.811439] ata2: SATA link down (SStatus 0 SControl 300)
[ 4.123422] ata3: SATA link down (SStatus 0 SControl 300)
[ 4.435415] ata4: SATA link down (SStatus 0 SControl 300)
[ 4.747416] ata5: SATA link down (SStatus 0 SControl 300)
[ 5.461077] ata5: SATA link down (SStatus 0 SControl 300)
[ 6.185068] ata5: SATA link down (SStatus 0 SControl 300)
[ 6.913043] ata5: SATA link down (SStatus 0 SControl 300)
[ 7.637054] ata5: SATA link down (SStatus 0 SControl 300)
[ 8.361129] ata5: SATA link down (SStatus 0 SControl 300)
[ 9.081132] ata5: SATA link down (SStatus 0 SControl 300)
[ 9.805046] ata5: SATA link down (SStatus 0 SControl 300)
[ 10.517028] ata5: SATA link down (SStatus 0 SControl 300)
[ 11.237033] ata5: SATA link down (SStatus 0 SControl 300)
[ 11.952977] ata5: SATA link down (SStatus 0 SControl 300)
[ 12.677074] ata5: SATA link down (SStatus 0 SControl 300)
That will keep going as long as I have a drive plugged into ata5. However, the same drive plugged into any other spot works just fine. To help narrow down the problem, I disconnected the SATA cables from spots 4 and 5 and switched them. Now spot 5 (plugged into 4) works. It looks like it's not a problem with the connection or the drive, but rather with the ata5 connection on the board. Any ideas on what I should do?
I'm having the same issue with the SATA 2 port.
[ 3.995846] ata2: SATA link down (SStatus 0 SControl 300)
[ 610.677957] ata2: SATA link down (SStatus 0 SControl 300)
[ 616.008066] ata2: SATA link down (SStatus 0 SControl 300)
[ 621.383941] ata2: SATA link down (SStatus 0 SControl 300)
[ 8270.418638] ata2: SATA link down (SStatus 0 SControl 300)
[ 8275.824876] ata2: SATA link down (SStatus 0 SControl 300)
[ 8281.200684] ata2: SATA link down (SStatus 0 SControl 300)
[10315.187401] ata2: SATA link down (SStatus 0 SControl 300)
The disk is absolutely fine. Could the port be blown?
AurelianRQ Posted November 1, 2020
Apparently the Kobol team recommended staying off the new kernels and using their recommended ones. The problem is that every few days the red System Error LED starts flashing and the box becomes unreachable, so I have to force a restart to get access to it again. So far, this is what I see in dmesg:
[ 0.017590] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware
[ 2.656869] Serial: AMBA driver
[ 2.658516] cacheinfo: Unable to detect cache hierarchy for CPU 0
[ 2.670600] loop: module loaded
[ 3.034951] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
[ 3.040546] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 3.040753] pci 0000:01:00.0: [197b:0585] type 00 class 0x010601
[ 3.040874] pci 0000:01:00.0: reg 0x10: initial BAR value 0x00000000 invalid
[ 3.040882] pci 0000:01:00.0: reg 0x10: [io size 0x0080]
[ 3.040917] pci 0000:01:00.0: reg 0x14: initial BAR value 0x00000000 invalid
[ 3.040925] pci 0000:01:00.0: reg 0x14: [io size 0x0080]
[ 3.040959] pci 0000:01:00.0: reg 0x18: initial BAR value 0x00000000 invalid
[ 3.040966] pci 0000:01:00.0: reg 0x18: [io size 0x0080]
[ 3.041000] pci 0000:01:00.0: reg 0x1c: initial BAR value 0x00000000 invalid
[ 3.041007] pci 0000:01:00.0: reg 0x1c: [io size 0x0080]
[ 3.041041] pci 0000:01:00.0: reg 0x20: initial BAR value 0x00000000 invalid
[ 3.041048] pci 0000:01:00.0: reg 0x20: [io size 0x0080]
[ 3.041083] pci 0000:01:00.0: reg 0x24: [mem 0x00000000-0x00001fff]
[ 3.041118] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0000ffff pref]
[ 3.041156] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 512)
[ 3.047177] pci_bus 0000:01: busn_res: [bus 01-1f] end is updated to 01
[ 3.047208] pci 0000:00:00.0: BAR 14: assigned [mem 0xfa000000-0xfa0fffff]
[ 3.047232] pci 0000:01:00.0: BAR 6: assigned [mem 0xfa000000-0xfa00ffff pref]
[ 3.047242] pci 0000:01:00.0: BAR 5: assigned [mem 0xfa010000-0xfa011fff]
[ 3.047262] pci 0000:01:00.0: BAR 0: no space for [io size 0x0080]
[ 3.047270] pci 0000:01:00.0: BAR 0: failed to assign [io size 0x0080]
[ 3.047278] pci 0000:01:00.0: BAR 1: no space for [io size 0x0080]
[ 3.047285] pci 0000:01:00.0: BAR 1: failed to assign [io size 0x0080]
[ 3.047292] pci 0000:01:00.0: BAR 2: no space for [io size 0x0080]
[ 3.047299] pci 0000:01:00.0: BAR 2: failed to assign [io size 0x0080]
[ 3.047306] pci 0000:01:00.0: BAR 3: no space for [io size 0x0080]
[ 3.047313] pci 0000:01:00.0: BAR 3: failed to assign [io size 0x0080]
[ 3.047320] pci 0000:01:00.0: BAR 4: no space for [io size 0x0080]
[ 3.047327] pci 0000:01:00.0: BAR 4: failed to assign [io size 0x0080]
[ 3.047338] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 9.989921] input: adc-keys as /devices/platform/adc-keys/input/input1
[ 10.059999] rk_gmac-dwmac fe300000.ethernet: IRQ eth_wake_irq not found
[ 10.060011] rk_gmac-dwmac fe300000.ethernet: IRQ eth_lpi not found
[ 10.060164] rk_gmac-dwmac fe300000.ethernet: PTP uses main clock
[ 10.060327] rk_gmac-dwmac fe300000.ethernet: clock input or output? (input).
[ 10.060339] rk_gmac-dwmac fe300000.ethernet: TX delay(0x28).
[ 10.060349] rk_gmac-dwmac fe300000.ethernet: RX delay(0x20).
[ 10.060364] rk_gmac-dwmac fe300000.ethernet: integrated PHY? (no).
[ 10.060450] rk_gmac-dwmac fe300000.ethernet: cannot get clock clk_mac_speed
[ 10.060458] rk_gmac-dwmac fe300000.ethernet: clock input from PHY
[ 10.065534] rk_gmac-dwmac fe300000.ethernet: init for RGMII
[ 11.571344] dw-apb-uart ff1a0000.serial: forbid DMA for kernel console
[ 13.327166] usb 2-1.4: reset SuperSpeed Gen 1 USB device number 3 using xhci-hcd
[ 13.730681] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: user_xattr,acl
[ 13.875498] zram: Added device: zram0
[ 13.876283] zram: Added device: zram1
[ 13.876953] zram: Added device: zram2
[ 13.916911] zram1: detected capacity change from 0 to 1992331264
[ 14.049513] cdn-dp fec00000.dp: Direct firmware load for rockchip/dptx.bin failed with error -2
[ 14.049764] rockchip-drm display-subsystem: [drm] Cannot find any crtc or sizes
[ 14.977397] Adding 1945632k swap on /dev/zram1. Priority:5 extents:1 across:1945632k SSFS
[ 15.109777] zram0: detected capacity change from 0 to 52428800
[ 16.065460] cdn-dp fec00000.dp: Direct firmware load for rockchip/dptx.bin failed with error -2
[ 17.158086] systemd[1]: Started Armbian ZRAM config.
[ 17.163966] systemd[1]: Starting Armbian memory supported logging...
[ 17.224816] EXT4-fs (zram0): mounted filesystem without journal. Opts: discard
[ 17.224855] ext4 filesystem being mounted at /var/log supports timestamps until 2038 (0x7fffffff)
[ 19.275437] systemd[1]: Started Armbian memory supported logging.
[ 19.280814] systemd[1]: Starting Journal Service...
[ 19.430698] systemd[1]: Started Journal Service.
[ 19.478887] systemd-journald[727]: Received request to flush runtime journal from PID 1
[ 19.603930] random: crng init done
[ 19.603939] random: 7 urandom warning(s) missed due to ratelimiting
[ 19.852787] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[ 19.857488] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[ 19.858097] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[ 19.858111] cfg80211: failed to load regulatory.db
[ 19.928741] rk_gmac-dwmac fe300000.ethernet eth0: PHY [stmmac-0:00] driver [RTL8211F Gigabit Ethernet] (irq=POLL)
[ 19.934085] rk_gmac-dwmac fe300000.ethernet eth0: No Safety Features support found
[ 19.934109] rk_gmac-dwmac fe300000.ethernet eth0: PTP not supported by HW
[ 19.934122] rk_gmac-dwmac fe300000.ethernet eth0: configuring for phy/rgmii link mode
[ 20.112591] r8152 2-1.4:1.0 eth1: carrier on
[ 20.193411] cdn-dp fec00000.dp: Direct firmware load for rockchip/dptx.bin failed with error -2
[ 21.368123] NFSD: Using UMH upcall client tracking operations.
[ 21.368136] NFSD: starting 90-second grace period (net f00000a1)
[ 13.462038] r8152 2-1.4:1.0 (unnamed net_device) (uninitialized): netif_napi_add() called with weight 256
and many, many others.
I also don't see network activity on the front panel. Is that still pending as well?
It would be nice to have something functional that does not crash. Is this a hardware issue or just a software issue?
I am running your image: 5.8.14-rockchip64 #20.08.10 SMP PREEMPT Tue Oct 13 16:58:01 CEST 2020 aarch64 GNU/Linux
AurelianRQ Posted November 1, 2020
On 10/19/2020 at 9:47 AM, gprovost said: We will make an announcement soon, but we realized we made a design mistake on the Helios64 2.5GbE port (LAN2 | eth1) which makes it perform poorly in 1000Mb/s mode; there is no impact in 2500baseT mode. Regarding your iperf test, it seems a bit funny. Are you sure your routing table is correct, so that traffic really goes to the expected interface, given that you are on the same subnet? Can you run the same tests, but for each test disconnect the cable of the interface you are not testing, or at least use a different subnet? Also, can you check the interface speed with ethtool (ethtool eth0 and ethtool eth1)?
Could we please know what the design mistake is, and whether it means that we cannot use the second port? We have a 2.5G port, and I assume it fails if we want to use it at 1Gbps. Is there any fix for this or not? I'm currently using it as a 1Gbps connection, and it seems that every few days I get a kernel crash or something similar: the System Error LED starts blinking and I have to force a restart with the button to regain access. The funny part is that I have both connections up, but once it fails I cannot reach the box from either network, so it is completely dead.
AurelianRQ Posted November 1, 2020
14 minutes ago, AurelianRQ said: Could we please know what the design mistake is, and whether it means that we cannot use the second port? We have a 2.5G port, and I assume it fails if we want to use it at 1Gbps. Is there any fix for this or not? I'm currently using it as a 1Gbps connection, and it seems that every few days I get a kernel crash or something similar: the System Error LED starts blinking and I have to force a restart with the button to regain access. The funny part is that I have both connections up, but once it fails I cannot reach the box from either network, so it is completely dead.
I just did a test. On eth0 I get:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-30.00 sec 3.18 GBytes 912 Mbits/sec 876 sender
[ 5] 0.00-30.00 sec 3.18 GBytes 911 Mbits/sec receiver
which is normal. With the same test on eth1 I get:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-30.00 sec 403 MBytes 113 Mbits/sec 5610 sender
[ 5] 0.00-30.00 sec 402 MBytes 113 Mbits/sec receiver
And between the eth1 ports of two Helios64 units I get:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-30.00 sec 348 MBytes 97.3 Mbits/sec 5817 sender
[ 5] 0.00-30.00 sec 348 MBytes 97.2 Mbits/sec receiver
which is a total mess, and I guess I cannot use this connection.
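Since the eth1 problem was described as specific to 1000Mb/s mode, it is worth recording the negotiated speed of each interface next to the iperf numbers, as suggested with ethtool earlier in the thread. A minimal sketch; the helper names are hypothetical, and it assumes the usual "Speed: ..." line in ethtool output.

```shell
# Read `ethtool <iface>` output on stdin and print the negotiated speed.
link_speed() {
    awk -F': ' '/Speed:/ { print $2; exit }'
}

# Print "<iface>: <speed>" for one interface (needs ethtool installed).
report_link() {
    printf '%s: %s\n' "$1" "$(ethtool "$1" | link_speed)"
}

# Typical use:
#   report_link eth0
#   report_link eth1
```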
flower Posted November 1, 2020
48 minutes ago, AurelianRQ said: which is a total mess, and I guess I cannot use this connection.
You could get yourself a 2.5GbE switch (in case you need both connections, e.g. for a management network). They are not expensive anymore.
bverhagen Posted November 1, 2020
Hi all,
Last week I rendered my device unbootable by upgrading to 20.08.13 (it is stuck in U-boot rather than in starting the kernel though). This is 'known territory' though. So starting again from a vanilla install, 20.08.10 gets stuck in U-boot too, with the following error:
In
channel 0
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
change freq to 416MHz 0,1
Channel 0: LPDDR4,416MHz
Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB
Channel 1: LPDDR4,416MHz
Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB
256B stride
channel 0
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x18
MR4=0x1
MR5=0x1
MR8=0x10
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
channel 0, cs 0, advanced training done
channel 1, cs 0, advanced training done
change freq to 856MHz 1,0
ch 0 ddrconfig = 0x101, ddrsize = 0x40
ch 1 ddrconfig = 0x101, ddrsize = 0x40
pmugrf_os_reg[2] = 0x32C1F2C1, stride = 0xD
ddr_set_rate to 328MHZ
ddr_set_rate to 666MHZ
ddr_set_rate to 928MHZ
channel 0, cs 0, advanced training done
channel 1, cs 0, advanced training done
ddr_set_rate to 416MHZ, ctl_index 0
ddr_set_rate to 856MHZ, ctl_index 1
support 416 856 328 666 928 MHz, current 856MHz
OUT
Boot1: 2019-03-14, version: 1.19
CPUId = 0x0
ChipType = 0x10, 253
SdmmcInit=2 0
BootCapSize=100000
UserCapSize=14910MB
FwPartOffset=2000 , 100000
mmc0:cmd5,20
SdmmcInit=0 0
BootCapSize=0
UserCapSize=14792MB
FwPartOffset=2000 , 0
StorageInit ok = 66004
SecureMode = 0
SecureInit read PBA: 0x4
SecureInit read PBA: 0x404
SecureInit read PBA: 0x804
SecureInit read PBA: 0xc04
SecureInit read PBA: 0x1004
SecureInit read PBA: 0x1404
SecureInit read PBA: 0x1804
SecureInit read PBA: 0x1c04
SecureInit ret = 0, SecureMode = 0
atags_set_bootdev: ret:(0)
GPT 0x3380ec0 signature is wrong
recovery gpt...
GPT 0x3380ec0 signature is wrong
recovery gpt fail!
LoadTrust Addr:0x4000
No find bl30.bin
No find bl32.bin
Load uboot, ReadLba = 2000
Load OK, addr=0x200000, size=0xded88
RunBL31 0x40000
NOTICE: BL31: v1.3(debug):42583b6
NOTICE: BL31: Built : 07:55:13, Oct 15 2019
NOTICE: BL31: Rockchip release version: v1.1
INFO: GICv3 with legacy support detected. ARM GICV3 driver initialized in EL3
INFO: Using opteed sec cpu_context!
INFO: boot cpu mask: 0
INFO: plat_rockchip_pmu_init(1190): pd status 3e
INFO: BL31: Initializing runtime services
WARNING: No OPTEE provided by BL2 boot loader, Booting device without OPTEE initialization. SMC`s destined for OPTEE will return SMC_UNK
ERROR: Error initializing runtime service opteed_fast
INFO: BL31: Preparing for EL3 exit to normal world
INFO: Entry point address = 0x200000
INFO: SPSR = 0x3c9
U-Boot 2020.07-armbian (Sep 23 2020 - 17:44:09 +0200)
SoC: Rockchip rk3399
Reset cause: POR
DRAM: 3.9 GiB
PMIC: RK808
SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB
MMC: mmc@fe320000: 1, sdhci@fe330000: 0
Loading Environment from MMC... unable to select a mode
*** Warning - No block device, using default environment
In: serial
Out: serial
Err: serial
Model: Helios64
Revision: 1.2 - 4GB non ECC
Net: eth0: ethernet@fe300000
scanning bus for devices...
Hit any key to stop autoboot: 0
switch to partitions #0, OK
mmc1 is current device
Scanning mmc 1:1...
Found U-Boot script /boot/boot.scr
3185 bytes read in 5 ms (622.1 KiB/s)
## Executing script at 00500000
Boot script loaded from mmc 1
166 bytes read in 5 ms (32.2 KiB/s)
14089265 bytes read in 599 ms (22.4 MiB/s)
27275776 bytes read in 1156 ms (22.5 MiB/s)
libfdt fdt_check_header(): FDT_ERR_BADMAGIC
No FDT memory address configured. Please configure the FDT address via "fdt addr <address>" command.
Aborting!
2698 bytes read in 8 ms (329.1 KiB/s)
Applying kernel provided DT fixup script (rockchip-fixup.scr)
## Executing script at 09000000
## Loading init Ramdisk from Legacy Image at 06000000 ...
Image Name: uInitrd
Image Type: AArch64 Linux RAMDisk Image (gzip compressed)
Data Size: 14089201 Bytes = 13.4 MiB
Load Address: 00000000
Entry Point: 00000000
Verifying Checksum ... OK
ERROR: Did not find a cmdline Flattened Device Tree
Loading Ramdisk to f5174000, end f5ee3bf1 ... OK
FDT and ATAGS support not compiled in - hanging
### ERROR ### Please RESET the board ###
Installing to either SD or the eMMC rendered the same issue. I had to revert to 20.08.4 for my device to properly boot again (from SD, as eMMC does not work yet for this release).
Since the 5.8.14 kernel is considered stable, I followed https://blog.kobol.io/2020/10/27/helios64-software-issue/ to upgrade to that kernel from 20.08.4 (using armbian-config). Fearing I would pull in the bad 5.8.16 kernel, I did not update anything else from 20.08.4. Unfortunately, this once again rendered my device unbootable, with the same error message as above.
So for me, 20.08.10 does not seem to be stable either. The Helios64 wiki claims it is, though, and many users of this forum seem to have no issues with 20.08.10. After another fresh install of 20.08.4, with no further upgrades, the Helios64 has been running stable for over a week now (it had previously been running stable on this version for a month too).
So this finally gets me to my questions: Does anyone have any idea what goes wrong here? Does anyone have any suggestions on how to debug this? Why is my device unable to boot a fresh image of the 'stable' 20.08.10 release? At no point in the lifetime of the NAS did I tinker with U-Boot.
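Editor's note on debugging: in the log above, U-Boot reports FDT_ERR_BADMAGIC right after loading the initrd and kernel image, which suggests (as an assumption, not a confirmed diagnosis) that the device tree blob was missing or not a valid DTB. One possible first check is to mount the boot medium on another machine and confirm that the .dtb file the boot script is supposed to load starts with the flattened device tree magic number 0xd00dfeed. The sketch below is only an illustration: the function names are hypothetical, and which .dtb path applies to a given image is not stated in the post, so it has to be supplied by the reader.

```python
import struct

# Every flattened device tree blob starts with this 32-bit big-endian magic.
FDT_MAGIC = 0xD00DFEED


def has_fdt_magic(data: bytes) -> bool:
    """Return True if the first four bytes are the big-endian FDT magic."""
    if len(data) < 4:
        return False
    (magic,) = struct.unpack(">I", data[:4])
    return magic == FDT_MAGIC


def file_has_fdt_magic(path: str) -> bool:
    """Read the first four bytes of the file at `path` and check them.

    `path` is whatever .dtb file the reader wants to inspect; no specific
    location is assumed here.
    """
    with open(path, "rb") as fh:
        return has_fdt_magic(fh.read(4))
```

A file that is empty, truncated, or absent would fail this check, and would be consistent with the fdt_check_header() error, although other causes (for example a wrong load address in the boot script) are not ruled out by it.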
kirkusinnc Posted November 1, 2020
Subject: Question on disk usage
I completed the build of my Helios64 with five 12 TB WD drives in a RAID 5 array and got it up and running with only minor issues. (I had to install OpenMediaVault manually, as the box would appear to hang if I attempted to install it using armbian-config.) I've transferred about 16 TB of movies without issues and got Plex configured to use it. All is working well!
The one issue I've seen is that the hard drive lights show lots of blinking and activity for several hours when nobody is accessing the box at all. If I reboot, the activity starts up again as soon as it has rebooted. Most of the time, though, the lights are solidly lit when there is no activity, as I would expect.
Is there some kind of housekeeping being done periodically, or some other reason for the hard drives to be accessed?
Kirk
flower Posted November 1, 2020
12 minutes ago, kirkusinnc said: [...] Is there some kind of housekeeping being done periodically or other reason for the hard drives to be accessed?
Probably your array is still building. What does cat /proc/mdstat show you?
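Editor's note: when an md array is still building or resyncing, /proc/mdstat contains a progress line with an operation name and a percentage (for example "recovery = 12.6%"). As a rough sketch of how that could be checked programmatically, the helper below extracts those figures per array. The function name sync_progress is hypothetical, and the regular expressions only cover the common progress-line layout; they are an assumption rather than a complete parser of the file.

```python
import re

# Progress lines look like "[==>....]  recovery = 12.6% (...) finish=... speed=..."
_PROGRESS = re.compile(r"(resync|recovery|reshape|check)\s*=\s*([\d.]+)%")
# Array header lines look like "md0 : active raid5 sda[0] sdb[1] ..."
_ARRAY = re.compile(r"^(md\S*)\s*:")


def sync_progress(mdstat_text: str) -> dict:
    """Map each md array name to (operation, percent) while a sync is running.

    Arrays without a progress line (idle arrays) are left out of the result.
    """
    progress = {}
    current = None
    for line in mdstat_text.splitlines():
        header = _ARRAY.match(line)
        if header:
            current = header.group(1)
            continue
        found = _PROGRESS.search(line)
        if found and current is not None:
            progress[current] = (found.group(1), float(found.group(2)))
    return progress


# Typical use on the NAS itself (not executed here):
#     with open("/proc/mdstat") as fh:
#         print(sync_progress(fh.read()))
```

An empty result would mean no array reports a running resync, recovery, reshape, or check, in which case the disk activity has some other cause.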
RockBian Posted November 1, 2020
On my system (kernel 5.8.17) a thread kworker/fusb302_wq was running, permanently eating 18% of one core and causing the load average to always be at least 1.0. The module fusb30x could not be unloaded; rmmod just hung, although its use count was zero. After blacklisting the module and rebooting, the problem was solved.
Now the question is: what did I disable? Some googling gave me this document:
Quote
The FUSB301 is a fully autonomous Type-C controller optimized for <15 W applications. The FUSB301 offers CC logic detection for Source Mode, Sink Mode, Dual Role Port Mode, accessory detection support, and dead battery support. The FUSB301 features an external switch pin (SS_SW) to enable an external USB Super Speed Switch without interrupting the processor. The FUSB301 features ultra low power during operation, and an ultra thin, 10-Lead TMLP package.
So it only has to do with the USB-C port? The serial connection is still working fine.
TonyMac32 Posted November 1, 2020
1 hour ago, RockBian said: what did I disable?
Power delivery through USB-C. Oh, and mode switching. So if it is set to the mode you want by default, it should be OK, but the better question is why it was eating clock cycles.