grunlab Posted December 13, 2019

Hi Armbian Support Team,

I'm running a Kubernetes cluster on top of 8 Odroid HC1 boards (3 master nodes and 5 worker nodes).

- The Armbian version in use is:

    adrien@bilbon:~/git/k8s$ kc get node master-01 -o wide
    NAME        STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION       CONTAINER-RUNTIME
    master-01   Ready    master   44d   v1.17.0   192.168.0.101   <none>        Debian GNU/Linux 10 (buster)   4.14.150-odroidxu4   docker://19.3.5

- I'm currently observing the following events at the Kubernetes level:

    adrien@bilbon:~/git/k8s$ kc get events --all-namespaces
    NAMESPACE   LAST SEEN   TYPE      REASON                             OBJECT           MESSAGE
    default     4m55s       Warning   FailedNodeAllocatableEnforcement   node/master-01   Failed to update Node Allocatable Limits ["kubepods"]: failed to set supported cgroup subsystems for cgroup [kubepods]: failed to find subsystem mount for required subsystem: pids
    default     2m5s        Warning   FailedNodeAllocatableEnforcement   node/master-02   Failed to update Node Allocatable Limits ["kubepods"]: failed to set supported cgroup subsystems for cgroup [kubepods]: failed to find subsystem mount for required subsystem: pids
    default     4m35s       Warning   FailedNodeAllocatableEnforcement   node/master-03   Failed to update Node Allocatable Limits ["kubepods"]: failed to set supported cgroup subsystems for cgroup [kubepods]: failed to find subsystem mount for required subsystem: pids
    default     18s         Warning   FailedNodeAllocatableEnforcement   node/worker-01   Failed to update Node Allocatable Limits ["kubepods"]: failed to set supported cgroup subsystems for cgroup [kubepods]: failed to find subsystem mount for required subsystem: pids
    ...

- After some quick googling, it looks like the issue is caused by the pids cgroup not being enabled in the Armbian kernel. That is indeed the case, no pids cgroup is listed:

    adrien@master-01:~$ cat /proc/cgroups
    #subsys_name hierarchy num_cgroups enabled
    cpuset 5 29 1
    cpu 2 69 1
    cpuacct 2 69 1
    blkio 8 69 1
    memory 4 125 1
    devices 3 69 1
    freezer 7 29 1
    net_cls 6 29 1
    net_prio 6 29 1

- This issue has also been reported on Raspberry Pi in the same Kubernetes context. There it was solved by enabling the pids cgroup in the Raspbian kernel: https://github.com/raspberrypi/linux/pull/2968

So, could it be possible to enable PIDs cgroup at kernel level in the next release of Armbian ?

Thank you for your support,
Regards
Adrien
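The check on /proc/cgroups described in the post can be scripted. This is only a minimal sketch, assuming the four-column layout shown above (name, hierarchy, num_cgroups, enabled); the function name `cgroup_enabled` and its optional file argument are hypothetical and exist only so the check can also be run against a saved copy of the file.

```shell
#!/bin/sh
# cgroup_enabled NAME [FILE]
# Succeeds (exit 0) when controller NAME is listed with enabled=1.
# FILE defaults to /proc/cgroups; the second argument is an assumption
# of this sketch, used to test against a saved copy.
cgroup_enabled() {
    awk -v name="$1" '
        /^#/ { next }                        # skip the header line
        $1 == name && $4 == 1 { found = 1 }  # columns: name hierarchy num enabled
        END { exit(found ? 0 : 1) }
    ' "${2:-/proc/cgroups}"
}

if cgroup_enabled pids; then
    echo "pids cgroup is enabled"
else
    echo "pids cgroup is missing or disabled"
fi
```

On a node showing the FailedNodeAllocatableEnforcement events above, this would be expected to report the controller as missing.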
Igor Posted December 13, 2019

24 minutes ago, Adrien Gruneisen said:
    So, could it be possible to enable PIDs cgroup at kernel level in the next release of Armbian ?

https://github.com/armbian/build/commit/755388147d65a12d39b31898be29434f999700c8

It was the only config where this was not enabled:

    linux-imx6-current.config:CONFIG_CGROUP_PIDS=y
    linux-imx7d-legacy.config:CONFIG_CGROUP_PIDS=y
    linux-meson64-current.config:CONFIG_CGROUP_PIDS=y
    linux-meson64-dev.config:CONFIG_CGROUP_PIDS=y
    linux-meson64-legacy.config:CONFIG_CGROUP_PIDS=y
    linux-meson-current.config:CONFIG_CGROUP_PIDS=y
    linux-meson-dev.config:CONFIG_CGROUP_PIDS=y
    linux-mt7623-legacy.config:CONFIG_CGROUP_PIDS=y
    linux-mvebu64-current.config:CONFIG_CGROUP_PIDS=y
    linux-mvebu64-dev.config:CONFIG_CGROUP_PIDS=y
    linux-mvebu64-legacy.config:CONFIG_CGROUP_PIDS=y
    linux-mvebu-current.config:CONFIG_CGROUP_PIDS=y
    linux-mvebu-dev.config:CONFIG_CGROUP_PIDS=y
    linux-mvebu-legacy.config:CONFIG_CGROUP_PIDS=y
    linux-odroidn2-current.config:CONFIG_CGROUP_PIDS=y
    linux-odroidxu4-dev.config:CONFIG_CGROUP_PIDS=y
    linux-odroidxu4-legacy.config:# CONFIG_CGROUP_PIDS is not set
    linux-rk3399-legacy.config:CONFIG_CGROUP_PIDS=y
    linux-rockchip64-current.config:CONFIG_CGROUP_PIDS=y
    linux-rockchip64-dev.config:CONFIG_CGROUP_PIDS=y
    linux-rockchip64-legacy.config:CONFIG_CGROUP_PIDS=y
    linux-rockchip-current.config:CONFIG_CGROUP_PIDS=y
    linux-rockchip-legacy.config:CONFIG_CGROUP_PIDS=y
    linux-rockpis-legacy.config:CONFIG_CGROUP_PIDS=y
    linux-s5p6818-legacy.config:CONFIG_CGROUP_PIDS=y
    linux-sunxi64-current.config:CONFIG_CGROUP_PIDS=y
    linux-sunxi64-dev.config:CONFIG_CGROUP_PIDS=y
    linux-sunxi64-legacy.config:CONFIG_CGROUP_PIDS=y
    linux-sunxi-current.config:CONFIG_CGROUP_PIDS=y
    linux-sunxi-dev.config:CONFIG_CGROUP_PIDS=y
    linux-sunxi-legacy.config:CONFIG_CGROUP_PIDS=y
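A listing like the one above can be narrowed down to just the outliers. The following is a rough sketch, not the command Igor actually ran: the function name `configs_missing`, the `CONFIG_DIR` variable and the default path `config/kernel` are assumptions made for illustration.

```shell
#!/bin/sh
# configs_missing OPTION DIR
# Prints every *.config file in DIR that does not contain "OPTION=y".
# This catches both "# OPTION is not set" and files that never mention it.
configs_missing() {
    for f in "$2"/*.config; do
        [ -e "$f" ] || continue          # no match: the glob stays literal
        grep -q "^$1=y\$" "$f" || printf '%s\n' "$f"
    done
}

# Assumed location of the kernel configs; adjust to your checkout.
configs_missing CONFIG_CGROUP_PIDS "${CONFIG_DIR:-config/kernel}"
```

Run against the state shown in the post, the only file this should print is linux-odroidxu4-legacy.config.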
grunlab (Author) Posted December 13, 2019

Hi Igor,

Great :-) ... thank you so much!

Do you have an idea when the update will be available in the Armbian repo?

Regards,
Adrien
Igor Posted December 13, 2019

Nightly builds in a few hours from now if compiled successfully ...

Wrote on mobile
grunlab (Author) Posted December 16, 2019

Hi Igor,

The update is still not reflected in the repo ... is it normal?

    adrien@master-01:~$ uname -a
    Linux master-01 4.14.150-odroidxu4 #1 SMP PREEMPT Mon Oct 28 07:56:57 CET 2019 armv7l GNU/Linux
    adrien@master-01:~$ date
    Mon Dec 16 20:50:42 CET 2019
    adrien@master-01:~$ sudo apt update
    Hit:1 http://security.debian.org buster/updates InRelease
    Hit:3 http://repo.zabbix.com/zabbix/4.4/raspbian buster InRelease
    Hit:4 http://httpredir.debian.org/debian buster InRelease
    Hit:5 http://httpredir.debian.org/debian buster-updates InRelease
    Hit:6 http://httpredir.debian.org/debian buster-backports InRelease
    Hit:2 https://packages.cloud.google.com/apt kubernetes-xenial InRelease
    Hit:8 https://download.docker.com/linux/debian buster InRelease
    Hit:7 https://apt.armbian.com buster InRelease
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    All packages are up to date.

Thank you,
Regards,
Adrien
Igor Posted December 16, 2019

10 minutes ago, grunlab said:
    is it normal

Yes. I forgot to mention that only the beta repository gets daily updates.

apt.armbian.com -> beta.armbian.com
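For readers who do want the nightly packages, the host swap Igor describes could look roughly like this. It is a sketch under assumptions: the file path `/etc/apt/sources.list.d/armbian.list`, the `ARMBIAN_LIST` variable and the function name `to_beta_repo` are not taken from the thread, and the function only prints the rewritten content instead of editing anything in place.

```shell
#!/bin/sh
# to_beta_repo FILE
# Prints FILE with the stable host replaced by the beta host.
# Nothing is modified in place; redirect the output yourself.
to_beta_repo() {
    sed 's|apt\.armbian\.com|beta.armbian.com|g' "$1"
}

# Assumed location of the Armbian apt source; verify on your system.
list="${ARMBIAN_LIST:-/etc/apt/sources.list.d/armbian.list}"
if [ -r "$list" ]; then
    to_beta_repo "$list"
fi
```

Printing first and applying by hand keeps the stable entry recoverable, which matters on production nodes such as the cluster in this thread.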
grunlab (Author) Posted December 16, 2019

Ok, thank you for this clarification.

I prefer not to use the beta repo and to wait a bit longer ... any idea when the update will be pushed to the stable repo ?
Igor Posted December 16, 2019

15 minutes ago, grunlab said:
    any idea when the update will be pushed to the stable repo ?

Updates are going out more or less this way: https://github.com/armbian/documentation/blob/master/docs/Process_Release-Model.md

Without proper testing it's a gamble, and that can cost us a lot.
Igor Posted December 16, 2019

This particular kernel is being updated out of this order: https://github.com/armbian/upload/commits?author=belegdol so you can ask the guy, build and upload on your own ...
belegdol Posted December 17, 2019

My Ubuntu VM is no longer booting to X since the last apt upgrade which makes preparing updates slightly more annoying than it should be, but I can have a go at rebuilding the kernel this evening.
belegdol Posted December 17, 2019 (edited)

I have good news and bad news. The good news is that apt upgrade has fixed my VM and that I was able to rebuild the kernel using the latest config from armbian/build git. The bad news is that the Odroid did not boot back up. I need to break out my UART cable and have a look at what's up.

ETA: it must have been intermittent network issues. I attached the cable and everything worked fine, I could ssh to the machine too. PR created: https://github.com/armbian/upload/pull/13

Edited December 17, 2019 by belegdol: not booting was a false alarm it seems
Igor Posted December 17, 2019

3 hours ago, belegdol said:
    it must have been intermittent network issues.

Strangely, @tkaiser is saying it doesn't work for him. Perhaps we need to build a QA department?
belegdol Posted December 18, 2019

OK I would say roll the update back. I rebooted again to be sure and ssh is not responding again. Either plugging the UART in has fixed the problem, or the longer power cycle needed to take off the cover to plug the UART in did. In your experience, would you say it is possible the issue is caused by the cgroup itself? Or were there any other changes in the last two weeks which could be responsible?

ETA: power cycle (no UART this time) has sorted the issue out for now. I will try this with UART connected tomorrow to see what is up - it could be that the device actually never reboots when requested, just kills the ssh.
Igor Posted December 18, 2019

6 hours ago, belegdol said:
    OK I would say roll the update back.

done.
belegdol Posted December 18, 2019

So the device is booting, just not coming back up:

           Stopping User Manager for UID 0...
           Unmounting Mount shared folder yasy to /sharedfolders/aaa...
    [ OK ] Stopped target Graphical Interface.
    [ OK ] Stopped target RPC Port Mapper.
           Starting Beep before system shutdown...
           Unmounting Mount shared folder emu to /sharedfolders/bbb...
           Stopping watchdog daemon...
           Starting folder2ram systemd service...
           Stopping Authorization Manager...
           Stopping ACPI event daemon...
           Stopping Session 6 of user root.
           Unmounting Mount shared folder julian to /sharedfolders/ccc...
           Stopping pNFS block layout mapping daemon...
           Unmounting Mount shared folder Bild…19 to /sharedfolders/Bilder2019...
    [ OK ] Stopped target Timers.
    [ OK ] Stopped Clean PHP session files every 30 mins.
    [ OK ] Stopped Trigger anacron every hour.
    [ OK ] Stopped Daily Cleanup of Temporary Directories.
    [ OK ] Stopped Daily apt upgrade and clean activities.
    [ OK ] Stopped Daily apt download activities.
           Unmounting Mount shared folder aust…sch to /sharedfolders/austausch...
    [ OK ] Stopped pNFS block layout mapping daemon.
    [ OK ] Stopped ACPI event daemon.
    [ OK ] Stopped Authorization Manager.
    [ OK ] Stopped Session 6 of user root.
    [ OK ] Stopped User Manager for UID 0.
    [ OK ] Unmounted Mount shared folder aaa to /sharedfolders/aaa.
    [ OK ] Unmounted Mount shared folder bbb to /sharedfolders/bbb.
    [ OK ] Unmounted Mount shared folder ccc to /sharedfolders/ccc.
    [ OK ] Unmounted Mount shared folder Bilder2019 to /sharedfolders/Bilder2019.
    [ OK ] Unmounted Mount shared folder austausch to /sharedfolders/austausch.
    [ OK ] Stopped watchdog daemon.
    [ OK ] Stopped target Multi-User System.
           Stopping Initializes zram swaping...
           Stopping A high performance web server and a reverse proxy server...
           Stopping LSB: start or stop rrdcached...
    [ OK ] Stopped Generate the prelogin message.
           Stopping LSB: Start NTP daemon...
    [ OK ] Stopped The OpenMediaVault engine d…on that processes the RPC request.
           Stopping LSB: Starts ProFTPD daemon...
    [ OK ] Stopped target Login Prompts.
           Stopping fast remote file copy program daemon...
           Stopping LSB: Advanced IEEE 802.11 management daemon...
           Stopping Unattended Upgrades Shutdown...
           Stopping Self Monitoring and Reporting Technology (SMART) Daemon...
    [ OK ] Stopped Postfix Mail Transport Agent.
           Stopping Postfix Mail Transport Agent (instance -)...
           Stopping Regular background program processing daemon...
           Stopping The PHP 7.0 FastCGI Process Manager...
           Stopping OpenBSD Secure Shell server...
           Stopping LSB: Set sysfs variables from /etc/sysfs.conf...
           Unmounting /srv/dev-disk-by-label-omv...
    [ OK ] Removed slice User Slice of root.
           Stopping Login Service...
    [ OK ] Unmounted /var/lib/docker/btrfs.
    [ OK ] Unmounted /var/folder2ram/var/log.
    [ OK ] Deactivated swap /dev/zram7.
    [ OK ] Deactivated swap /dev/zram6.
    [ OK ] Deactivated swap /dev/zram5.
    [ OK ] Deactivated swap /dev/zram4.
    [ OK ] Deactivated swap /dev/zram3.
    [ OK ] Deactivated swap /dev/zram2.
    [ OK ] Deactivated swap /dev/zram1.
    [ OK ] Deactivated swap /dev/zram0.
    [ OK ] Unmounted /var/lib/openmediavault/rrd.
    [ OK ] Unmounted /var/tmp.
    [ OK ] Unmounted /var/folder2ram/var/tmp.
    [ OK ] Unmounted /var/folder2ram/var/lib/openmediavault/rrd.
    [ OK ] Stopped System Logging Service.
    [ OK ] Stopped fast remote file copy program daemon.
    [ OK ] Stopped Login Service.
    [ OK ] Stopped Self Monitoring and Reporting Technology (SMART) Daemon.
    [ OK ] Stopped Regular background program processing daemon.
    [ OK ] Stopped The PHP 7.0 FastCGI Process Manager.
    [ OK ] Stopped Netatalk AFP fileserver for Macintosh clients.
    [ OK ] Stopped OpenBSD Secure Shell server.
    [ OK ] Stopped Docker Application Container Engine.
    [ OK ] Stopped Serial Getty on ttySAC2.
    [ OK ] Stopped Getty on tty1.
    [ OK ] Stopped Samba SMB Daemon.
    [ OK ] Started Beep before system shutdown.
    [ OK ] Stopped Initializes zram swaping.
    [ OK ] Stopped A high performance web server and a reverse proxy server.
    [ OK ] Stopped LSB: Advanced IEEE 802.11 management daemon.
    [ OK ] Stopped Unattended Upgrades Shutdown.
    [ OK ] Stopped Postfix Mail Transport Agent (instance -).
    [ OK ] Unmounted /srv/dev-disk-by-label-omv.
    [ OK ] Unmounted /var/lib/rrdcached.
    [ OK ] Unmounted /var/folder2ram/var/lib/rrdcached.
    [ OK ] Unmounted /var/spool.
    [ OK ] Unmounted /var/folder2ram/var/spool.
    [ OK ] Unmounted /var/lib/monit.
    [ OK ] Unmounted /var/folder2ram/var/lib/monit.
    [ OK ] Stopped Web Services on Devices (WSD) daemon.
    [ OK ] Unmounted /var/lib/php.
    [ OK ] Stopped LSB: Start NTP daemon.
    [ OK ] Stopped LSB: Start/stop sysstat's sadc.
    [ OK ] Unmounted /var/folder2ram/var/lib/php.
    [ OK ] Stopped File System Check on /dev/disk/by-label/omv.
    [ OK ] Removed slice system-postfix.slice.
    [ OK ] Stopped Samba NMB Daemon.
    [ OK ] Removed slice system-getty.slice.
    [ OK ] Stopped /etc/rc.local Compatibility.
           Stopping Permit User Sessions...
    [ OK ] Removed slice system-serial\x2dgetty.slice.
           Stopping containerd container runtime...
           Stopping Avahi mDNS/DNS-SD Stack...
    [ OK ] Unmounted /var/cache/samba.
    [ OK ] Unmounted /var/lib/netatalk/CNID.
    [ OK ] Unmounted /var/folder2ram/var/lib/netatalk/CNID.
    [ OK ] Unmounted /var/folder2ram/var/cache/samba.
    [ OK ] Stopped Avahi mDNS/DNS-SD Stack.
    [ OK ] Stopped containerd container runtime.
    [ OK ] Started folder2ram systemd service.
    [ OK ] Stopped LSB: Starts ProFTPD daemon.
    [ OK ] Stopped LSB: start or stop rrdcached.
    [ OK ] Stopped LSB: service and resource monitoring daemon.
    [ OK ] Stopped LSB: Set sysfs variables from /etc/sysfs.conf.
    [ OK ] Stopped Permit User Sessions.
           Stopping LSB: set CPUFreq kernel parameters...
    [ OK ] Stopped target System Time Synchronized.
    [ OK ] Stopped target Network is Online.
    [ OK ] Stopped Network Manager Wait Online.
    [ OK ] Stopped LSB: set CPUFreq kernel parameters.
           Stopping LSB: Load kernel modules needed to enable cpufreq scaling...
    [ OK ] Stopped LSB: Load kernel modules needed to enable cpufreq scaling.
    [ OK ] Stopped target Remote File Systems.
    [ OK ] Stopped target Remote File Systems (Pre).
           Stopping NFS server and services...
    [ OK ] Stopped target NFS client services.
    [ OK ] Stopped NFS server and services.
           Stopping NFSv4 ID-name mapping service...
           Stopping NFS Mount Daemon...
    [ OK ] Stopped NFSv4 ID-name mapping service.
    [ OK ] Stopped NFS Mount Daemon.
    [ OK ] Stopped target Network.
           Stopping Network Manager...
           Stopping Raise network interfaces...
           Unmounting RPC Pipe File System...
    [ OK ] Stopped Network Manager.
    [ OK ] Unmounted RPC Pipe File System.
           Stopping D-Bus System Message Bus...
    [ OK ] Stopped D-Bus System Message Bus.
    [ OK ] Stopped target Basic System.
    [ OK ] Stopped target Slices.
    [ OK ] Removed slice User and Session Slice.
    [ OK ] Stopped target Sockets.
    [ OK ] Closed Avahi mDNS/DNS-SD Stack Activation Socket.
    [ OK ] Closed ACPID Listen Socket.
    [ OK ] Closed Docker Socket for the API.
    [ OK ] Closed Syslog Socket.
    [ OK ] Stopped target Paths.
    [ OK ] Stopped ACPI Events Check.
    [ OK ] Closed D-Bus System Message Bus Socket.
    [ OK ] Stopped target System Initialization.
    [ OK ] Stopped target Encrypted Volumes.
    [ OK ] Stopped Dispatch Password Requests to Console Directory Watch.
    [ OK ] Stopped Forward Password Requests to Wall Directory Watch.
           Stopping Restore / save the current clock...
           Stopping Armbian memory supported logging...
    [ OK ] Stopped target Swap.
           Stopping Entropy daemon using the HAVEGE algorithm...
           Stopping Update UTMP about System Boot/Shutdown...
    [ OK ] Stopped Entropy daemon using the HAVEGE algorithm.
    [ OK ] Stopped Raise network interfaces.
    [ OK ] Stopped Restore / save the current clock.
    [ OK ] Stopped Update UTMP about System Boot/Shutdown.
    [ OK ] Stopped Apply Kernel Variables.
    [ OK ] Stopped Load Kernel Modules.
           Stopping Load/Save Random Seed...
    [ OK ] Stopped Create Volatile Files and Directories.
    [ OK ] Stopped Load/Save Random Seed.
    [ OK ] Unmounted /var/log.
    [ OK ] Unmounted /var/log.hdd.
    [ OK ] Stopped Armbian memory supported logging.
           Stopping Armbian ZRAM config...
    [ OK ] Stopped Armbian ZRAM config.
    [ OK ] Stopped target Local File Systems.
           Unmounting /boot...
           Unmounting /tmp...
           Unmounting /run/user/0...
    [ OK ] Unmounted /boot.
    [ OK ] Unmounted /tmp.
    [ OK ] Unmounted /run/user/0.
    [ OK ] Reached target Unmount All Filesystems.
    [ OK ] Stopped File System Check on /dev/d…b0a55-56f1-4443-8cac-297e1181425c.
    [ OK ] Removed slice system-systemd\x2dfsck.slice.
    [ OK ] Stopped target Local File Systems (Pre).
    [ OK ] Stopped Remount Root and Kernel File Systems.
           Stopping Monitoring of LVM2 mirrors…ng dmeventd or progress polling...
    [ OK ] Stopped Create Static Device Nodes in /dev.
    [ OK ] Reached target Shutdown.
    [ 126.745633] reboot: Re

    U-Boot 2017.05-armbian (Sep 19 2018 - 12:42:38 +0200) for ODROID-XU4

    CPU: Exynos5422 @ 800 MHz
    Model: Odroid XU4 based on EXYNOS5422
    Board: Odroid XU4 based on EXYNOS5422
    Type: xu4
    DRAM: 2 GiB
    MMC: EXYNOS DWMMC: 0, EXYNOS DWMMC: 1
    MMC Device 0 ( SD ): 14.8 GiB
    Card did not respond to voltage select!
    mmc_init: -95, time 11
    *** Warning - bad CRC, using default environment

    In: serial
    Out: serial
    Err: serial
    Net: No ethernet found.
    Press quickly 'Enter' twice to stop autoboot: 0
    ** Unrecognized filesystem type **
    12489 bytes read in 19 ms (641.6 KiB/s)
    cfgload addr = 0x50000000, Loading boot.ini from ext4 0:1 /boot.ini
    cfgload: applying boot.ini...
    cfgload: setenv initrd_high "0xffffffff"
    cfgload: setenv fdt_high "0xffffffff"
    cfgload: setenv macaddr "00:1e:06:61:7a:55"
    cfgload: setenv rootdev "UUID=d0da7bbe-e3af-4588-8715-aa5c4478eb88"
    cfgload: setenv rootfstype "btrfs"
    cfgload: setenv console "both"
    cfgload: setenv verbosity "1"
    cfgload: if ext4load mmc 0:1 0x44000000 /boot/armbianEnv.txt || fatload mmc 0:1 0x44000000 armbianEnv.txt || ext4load mmc 0:1 0x440i
    ** File not found /boot/armbianEnv.txt **
    ** Unrecognized filesystem type **
    94 bytes read in 14 ms (5.9 KiB/s)
    cfgload: if test "${console}" = "display" || test "${console}" = "both"; then setenv consoleargs "console=tty1"; fi
    cfgload: if test "${console}" = "serial" || test "${console}" = "both"; then setenv consoleargs "${consoleargs} console=ttySAC2,115i
    cfgload: setenv bootrootfs "${consoleargs} consoleblank=0 loglevel=${verbosity} panic=10 root=${rootdev} rootfstype=${rootfstype} r"
    cfgload: setenv vout "hdmi"
    cfgload: setenv cecenable "false" # false or true
    cfgload: setenv governor "performance"
    cfgload: setenv ddr_freq 825
    cfgload: setenv HPD "true"
    cfgload: setenv hdmi_tx_amp_lvl "31"
    cfgload: setenv hdmi_tx_lvl_ch0 "3"
    cfgload: setenv hdmi_tx_lvl_ch1 "3"
    cfgload: setenv hdmi_tx_lvl_ch2 "3"
    cfgload: setenv hdmi_tx_emp_lvl "6"
    cfgload: setenv hdmi_clk_amp_lvl "31"
    cfgload: setenv hdmi_tx_res "0"
    cfgload: setenv hdmi_phy_control "hdmi_tx_amp_lvl=${hdmi_tx_amp_lvl} hdmi_tx_lvl_ch0=${hdmi_tx_lvl_ch0} hdmi_tx_lvl_ch1=${hdmi_tx_l"
    cfgload: ext4load mmc 0:1 0x40008000 /boot/zImage || fatload mmc 0:1 0x40008000 zImage || ext4load mmc 0:1 0x40008000 zImage
    ** File not found /boot/zImage **
    ** Unrecognized filesystem type **
    5703032 bytes read in 573 ms (9.5 MiB/s)
    cfgload: ext4load mmc 0:1 0x42000000 /boot/uInitrd || fatload mmc 0:1 0x42000000 uInitrd || ext4load mmc 0:1 0x42000000 uInitrd
    ** File not found /boot/uInitrd **
    ** Unrecognized filesystem type **
    6308138 bytes read in 620 ms (9.7 MiB/s)
    cfgload: if test "${board_name}" = "xu4"; then setenv fdtfile "exynos5422-odroidxu4.dtb"; fi
    cfgload: if test "${board_name}" = "xu3"; then setenv fdtfile "exynos5422-odroidxu3.dtb"; fi
    cfgload: if test "${board_name}" = "xu3l"; then setenv fdtfile "exynos5422-odroidxu3-lite.dtb"; fi
    cfgload: if test "${board_name}" = "hc1"; then setenv fdtfile "exynos5422-odroidhc1.dtb"; fi
    cfgload: if ext4load mmc 0:1 0x00000000 "/boot/.next" || fatload mmc 0:1 0x00000000 ".next" || ext4load mmc 0:1 0x00000000 ".next"i
    ** File not found /boot/.next **
    ** Unrecognized filesystem type **
    0 bytes read in 8 ms (0 Bytes/s)
    Found mainline kernel configuration
    cfgload: ext4load mmc 0:1 0x44000000 /boot/dtb/${fdtfile} || fatload mmc 0:1 0x44000000 dtb/${fdtfile} || ext4load mmc 0:1 0x440000}
    ** File not found /boot/dtb/exynos5422-odroidhc1.dtb **
    ** Unrecognized filesystem type **
    56387 bytes read in 67 ms (821.3 KiB/s)
    cfgload: fdt addr 0x44000000
    cfgload: if test "${cecenable}" = "false"; then fdt rm /cec@101B0000; fi
    libfdt fdt_path_offset() returned FDT_ERR_NOTFOUND
    cfgload: setenv bootargs "${bootrootfs} ${videoconfig} smsc95xx.macaddr=${macaddr} governor=${governor} ${hdmi_phy_control} usb-sto"
    cfgload: dmc ${ddr_freq}
    cfgload: bootz 0x40008000 0x42000000 0x44000000
    Kernel image @ 0x40008000 [ 0x000000 - 0x570578 ]
    ## Loading init Ramdisk from Legacy Image at 42000000 ...
       Image Name: uInitrd
       Image Type: ARM Linux RAMDisk Image (gzip compressed)
       Data Size: 6308074 Bytes = 6 MiB
       Load Address: 00000000
       Entry Point: 00000000
       Verifying Checksum ... OK
    ## Flattened Device Tree blob at 44000000
       Booting using the fdt blob at 0x44000000
       Using Device Tree in place at 44000000, end 44010c42

    Starting kernel ...
Igor Posted December 18, 2019

Just now, belegdol said:
    So the device is booting, just not coming back up

Yes. Maybe one day we will be able to set up something like this: https://lavasoftware.org/about.html
belegdol Posted December 18, 2019

This would be cool! I have now compared this with the minicom output from a cold boot, and unfortunately it is almost exactly the same. The only differences are in the lines that show the time and speed of reading bytes. Is there anything else to check? Never building another kernel update is not really an option...
grunlab (Author) Posted December 18, 2019

Hi Igor, Belegdol,

If i can help at some point (testing something or whatever), don't hesitate to ask me!

Regards,
Adrien
belegdol Posted December 18, 2019

I have tried investigating this further. The first step was to purge ccache and the other caches to rule out a compilation issue. It did not help. I am now trying to disable the pids cgroup to see if it is the culprit, but doing so causes the kernel build to fail. I am using the following command line:

    $ ./compile.sh KERNEL_ONLY=yes KERNEL_CONFIGURE=yes KERNEL_KEEP_CONFIG=no BOARD=odroidxu4 SUBREVISION=.1

Then, once the configuration menu is reached, I disable cgroups. The same happens if I git revert the commit in question and try building with KERNEL_CONFIGURE=no:

    Makefile:1051: recipe for target 'net' failed
    [ error ] ERROR in function compile_kernel [ compilation.sh:382 ]
    [ error ] Kernel was not built [ @host ]
    [ o.k. ] Process terminated

How can I make the errors more verbose?
Igor Posted December 18, 2019

3 minutes ago, belegdol said:
    Makefile:1051: recipe for target 'net' failed

Purge sources. Do you perhaps compile on some network drive?
Igor Posted December 18, 2019

56 minutes ago, grunlab said:
    If i can help at some point

2-3 full-time engineers would need several months to set up such a system from where we are now (no automated testing at all). And someone has to acquire the knowledge and lead this project. Booting some image when you have some time and reporting that it doesn't work adds little value. We need an automated testing facility, which is most critical and demanding at major upgrades. Those are the times when I don't sleep well for a week or more.
belegdol Posted December 18, 2019

13 minutes ago, Igor said:
    Purge sources. Do you perhaps compile on some network drive?

I compile on my VM's main drive. Still no success with purging the sources:

    $ ./compile.sh KERNEL_ONLY=yes KERNEL_CONFIGURE=no KERNEL_KEEP_CONFIG=no BOARD=odroidxu4 SUBREVISION=.1 CLEAN_LEVEL=make,alldebs,images,cache,sources,extras

This time the error is somewhere else though:

    Makefile:1051: recipe for target 'drivers' failed
    [ error ] ERROR in function compile_kernel [ compilation.sh:382 ]
    [ error ] Kernel was not built [ @host ]
    [ o.k. ] Process terminated
Igor Posted December 18, 2019

14 minutes ago, belegdol said:
    Still no success with purging the sources:

What if purging doesn't work properly and your sources are corrupted somehow? Remove cache manually. I can't recreate this problem. For me everything works on a cleanly installed Ubuntu Bionic server.

16 minutes ago, belegdol said:
    This time the error is somewhere else though

Check errors in logs. They are in output/debug/compilation.log
belegdol Posted December 19, 2019

I have now tried deleting cache/sources manually, still no dice. The actual error was:

    kernel/sched/fair.c:6215:12: warning: ‘cpu_util_wake’ defined but not used [-Wunused-function]
     static int cpu_util_wake(int cpu, struct task_struct *p)
     ^~~~~~~~~~~~~
    drivers/hardkernel/ina231-i2c.c: In function ‘ina231_work’:
    drivers/hardkernel/ina231-i2c.c:106:60: warning: self-comparison always evaluates to false [-Wtautological-compare]
     if ((sensor->cur_uV > sensor->max_uV) || (sensor->cur_uA > sensor->cur_uA)) {
     ^
    drivers/gpu/drm/exynos/exynos_hdmi.c:731:22: warning: unsigned conversion from ‘int’ to ‘unsigned char’ changes value from ‘5656’ to ‘24’ [-Woverflow]
     0x01, 0xD1, 0x29, 0x1618, 0x418, 0x190, 0xF5, 0xCF,
     ^~~~~~
    drivers/gpu/drm/exynos/exynos_hdmi.c:731:30: warning: unsigned conversion from ‘int’ to ‘unsigned char’ changes value from ‘1048’ to ‘24’ [-Woverflow]
     0x01, 0xD1, 0x29, 0x1618, 0x418, 0x190, 0xF5, 0xCF,
     ^~~~~
    drivers/gpu/drm/exynos/exynos_hdmi.c:731:37: warning: unsigned conversion from ‘int’ to ‘unsigned char’ changes value from ‘400’ to ‘144’ [-Woverflow]
     0x01, 0xD1, 0x29, 0x1618, 0x418, 0x190, 0xF5, 0xCF,
     ^~~~~
    drivers/gpu/drm/exynos/exynos_hdmi.c:732:10: warning: unsigned conversion from ‘int’ to ‘unsigned char’ changes value from ‘360’ to ‘104’ [-Woverflow]
     0x8D, 0x168, 0xF5, 0xD8, 0x45, 0xA0, 0xAC, 0x80,
     ^~~~~
    fs/proc/task_mmu.c: In function ‘show_smap’:
    fs/proc/task_mmu.c:764:7: warning: ‘last_vma’ may be used uninitialized in this function [-Wmaybe-uninitialized]
     bool last_vma;
     ^~~~~~~~
    Aborted (core dumped)
    make[2]: *** [fs/reiserfs/fix_node.o] Error 134
    make[2]: *** Deleting file 'fs/reiserfs/fix_node.o'
    make[1]: *** [fs/reiserfs] Error 2
    make: *** [fs] Error 2
    make: *** Waiting for unfinished jobs....
    Aborted (core dumped)
    make[4]: *** [drivers/media/platform/exynos-gsc/gsc-core.o] Error 134
    make[4]: *** Deleting file 'drivers/media/platform/exynos-gsc/gsc-core.o'
    make[3]: *** [drivers/media/platform/exynos-gsc] Error 2
    make[2]: *** [drivers/media/platform] Error 2
    make[2]: *** Waiting for unfinished jobs....
    Aborted (core dumped)
    make[2]: *** [drivers/pinctrl/devicetree.o] Error 134
    make[2]: *** Deleting file 'drivers/pinctrl/devicetree.o'
    make[1]: *** [drivers/pinctrl] Error 2
    make[1]: *** Waiting for unfinished jobs....
    Aborted (core dumped)
    make[3]: *** [drivers/phy/samsung/phy-samsung-usb2.o] Error 134
    make[3]: *** Deleting file 'drivers/phy/samsung/phy-samsung-usb2.o'
    make[2]: *** [drivers/phy/samsung] Error 2
    make[1]: *** [drivers/phy] Error 2
    Aborted (core dumped)
    make[3]: *** [drivers/net/team/team_mode_roundrobin.o] Error 134
    make[3]: *** Deleting file 'drivers/net/team/team_mode_roundrobin.o'
    make[2]: *** [drivers/net/team] Error 2
    make[2]: *** Waiting for unfinished jobs....
    Aborted (core dumped)
    make[3]: *** [drivers/net/usb/r8152.o] Error 134
    make[3]: *** Deleting file 'drivers/net/usb/r8152.o'
    make[3]: *** Waiting for unfinished jobs....
    Aborted (core dumped)
    make[3]: *** [drivers/net/usb/usbnet.o] Error 134
    make[3]: *** Deleting file 'drivers/net/usb/usbnet.o'
    make[2]: *** [drivers/net/usb] Error 2
    make[1]: *** [drivers/net] Error 2
    Aborted (core dumped)
    make[2]: *** [net/netfilter/nft_range.o] Error 134
    make[2]: *** Deleting file 'net/netfilter/nft_range.o'
    make[2]: *** Waiting for unfinished jobs....
    make[1]: *** [net/netfilter] Error 2
    make[1]: *** Waiting for unfinished jobs....
    make[1]: *** [drivers/media] Error 2
    make: *** [drivers] Error 2
    make: *** [net] Error 2

After I started getting similar errors when trying to build unmodified git master, I thought I was going crazy. Luckily it turned out that ccache was the culprit. With USE_CCACHE=no I was able to build both the pristine source and one with the cgroups change reverted; no CLEAN_LEVEL increase or manual deletion of anything was needed. I have to go to the office now, so testing the kernels will have to wait until later, but at least I have something to test now.
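A side note on reading the log above: an exit status above 128 conventionally means "128 plus the number of the signal that killed the process", so "Error 134" alongside "Aborted (core dumped)" is consistent with the compiler process itself being aborted (signal 6, SIGABRT) rather than with an error in the kernel source. The sketch below decodes such a status; the function name `signal_from_status` is made up for this example.

```shell
#!/bin/sh
# signal_from_status STATUS
# Prints the signal name for an exit status above 128, nothing otherwise.
signal_from_status() {
    if [ "$1" -gt 128 ]; then
        # 128 + N is the shell convention for "terminated by signal N".
        kill -l "$(( $1 - 128 ))"
    fi
}

signal_from_status 134
```

That reading fits the eventual finding that the toolchain wrapper (ccache) was at fault and not the cgroup change.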
belegdol Posted December 20, 2019

It unfortunately appears that reverting the pids change is not enough to fix reboot problems. I will try bisecting, hopefully the issue is in armbian git and not in one of the upstream ones...
belegdol Posted December 20, 2019

I have done the full bisect run:

    git bisect start
    # bad: [6ec526eaf0dbd333349c8f1b517f090931ee0c6c] To run 32bit rustc you need enabled cp15 barrier emulation. (#1680)
    git bisect bad 6ec526eaf0dbd333349c8f1b517f090931ee0c6c
    # good: [7ebc310c9679ae3ebe22aac480a42f29b0a0281d] Merge branch 'master' of https://github.com/armbian/build
    git bisect good 7ebc310c9679ae3ebe22aac480a42f29b0a0281d
    # good: [da8cfe78c04b786c0ae967231891c86a0543248d] Disabled hs400 mode of roc-rk3399-pc's emmc (#1666)
    git bisect good da8cfe78c04b786c0ae967231891c86a0543248d
    # bad: [2e69b173bf957e1e54bce9c849d27dd796d7425d] Merge pull request #1673 from armbian/focal
    git bisect bad 2e69b173bf957e1e54bce9c849d27dd796d7425d
    # bad: [755388147d65a12d39b31898be29434f999700c8] Enable Kubernetes dependency
    git bisect bad 755388147d65a12d39b31898be29434f999700c8
    # good: [efb5f68cc3064b1cf09678db95b2d9ff9484c39e] fix typo in fix-a64-timejump patch
    git bisect good efb5f68cc3064b1cf09678db95b2d9ff9484c39e
    # good: [9241c849b5e3fb19ae8e4814c9a46badb9a870a5] [Maintanace] Rootfs cache has been rebuild with new version.
    git bisect good 9241c849b5e3fb19ae8e4814c9a46badb9a870a5
    # good: [1351bdfe3fdf30d7603cedfa1feccaf802718732] Espressobin: add missing/corrected RAM topology
    git bisect good 1351bdfe3fdf30d7603cedfa1feccaf802718732
    # first bad commit: [755388147d65a12d39b31898be29434f999700c8] Enable Kubernetes dependency

I am not entirely sure if it worked as intended. How can it say that 7553881 is the first bad commit if there are 130c23c and 8df3e98 between it and 1351bdf?

In any case, I reverted the change once again and this time the machine does reboot properly. Not sure why this did not work earlier, maybe I got my kernels confused.
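One possible explanation for the bisect question, offered as a guess and not confirmed anywhere in this thread: git bisect walks the commit graph, not a date-ordered list, so commits that show up between two hashes in a log view are not necessarily ancestors of the commit reported as first bad (for instance when they sit on a branch that was merged later). Assuming that is what happened here, the ancestry can be checked with `git merge-base --is-ancestor`. The wrapper name `is_ancestor` is hypothetical.

```shell
#!/bin/sh
# is_ancestor REPO OLDER NEWER
# Succeeds when commit OLDER is reachable from commit NEWER in REPO.
is_ancestor() {
    git -C "$1" merge-base --is-ancestor "$2" "$3"
}

# Usage sketch, run inside a checkout of armbian/build (hashes from the post):
#   is_ancestor . 130c23c 7553881 && echo "130c23c is an ancestor of 7553881"
#   is_ancestor . 8df3e98 7553881 && echo "8df3e98 is an ancestor of 7553881"
```

If neither command prints anything, those two commits were never candidates for the bisect between 1351bdf and 7553881, which would account for the result.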
Igor Posted December 21, 2019

On 12/20/2019 at 11:03 PM, belegdol said:
    is not enough to fix reboot problems

The problem can also come from here: https://github.com/hardkernel/linux/commits/odroidxu4-4.14.y
belegdol Posted December 21, 2019

1 hour ago, Igor said:
    Problem can also come from here as well:https://github.com/hardkernel/linux/commits/odroidxu4-4.14.y

I think it is unlikely, as there were no commits to that tree between 5th December and now. I also did the entire bisect run within a few hours, so changes to upstream trees, if any, would have been minimal. I think what most likely happened is that when I initially managed to compile the kernel with cgroups disabled, I must have somehow mixed up which kernel was installed. Unfortunately /var/log/apt/history.log has already rotated, so the record of what exactly happened is gone.

ETA: hardkernel and memeka are currently working on getting the 5.4 kernel to work, so probably the best course of action would be to back out the cgroups change and revisit the issue once 5.4 is released by hardkernel.
belegdol Posted December 21, 2019

CONFIG_CGROUP_PIDS is not enabled in hardkernel 4.14.y so maybe there is a reason for it: https://github.com/hardkernel/linux/blob/odroidxu4-4.14.y/arch/arm/configs/odroidxu4_defconfig