grunlab

Enable PIDs cgroup


Hi Armbian Support Team,

 

I'm running a Kubernetes cluster on top of 8 ODROID-HC1 boards (3 master nodes and 5 worker nodes).

 

- Armbian version in use:

 

adrien@bilbon:~/git/k8s$ kc get node master-01 -o wide
NAME        STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION       CONTAINER-RUNTIME
master-01   Ready    master   44d   v1.17.0   192.168.0.101   <none>        Debian GNU/Linux 10 (buster)   4.14.150-odroidxu4   docker://19.3.5

 

- I'm currently observing the following events at the Kubernetes level:

 

adrien@bilbon:~/git/k8s$ kc get events --all-namespaces
NAMESPACE   LAST SEEN   TYPE      REASON                             OBJECT           MESSAGE
default     4m55s       Warning   FailedNodeAllocatableEnforcement   node/master-01   Failed to update Node Allocatable Limits ["kubepods"]: failed to set supported cgroup subsystems for cgroup [kubepods]: failed to find subsystem mount for required subsystem: pids
default     2m5s        Warning   FailedNodeAllocatableEnforcement   node/master-02   Failed to update Node Allocatable Limits ["kubepods"]: failed to set supported cgroup subsystems for cgroup [kubepods]: failed to find subsystem mount for required subsystem: pids
default     4m35s       Warning   FailedNodeAllocatableEnforcement   node/master-03   Failed to update Node Allocatable Limits ["kubepods"]: failed to set supported cgroup subsystems for cgroup [kubepods]: failed to find subsystem mount for required subsystem: pids
default     18s         Warning   FailedNodeAllocatableEnforcement   node/worker-01   Failed to update Node Allocatable Limits ["kubepods"]: failed to set supported cgroup subsystems for cgroup [kubepods]: failed to find subsystem mount for required subsystem: pids

...
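To see at a glance which nodes are affected, the event list above can be filtered on its REASON column. A minimal sketch, assuming the default column layout shown above and that kc is a local alias for kubectl; affected_nodes is a hypothetical helper name, not part of kubectl:

```shell
#!/usr/bin/env bash
# affected_nodes: read `kubectl get events --all-namespaces` output on stdin
# and print each node that reported FailedNodeAllocatableEnforcement, once.
# Assumes whitespace-separated columns: NAMESPACE LASTSEEN TYPE REASON OBJECT ...
affected_nodes() {
  awk '$4 == "FailedNodeAllocatableEnforcement" && $5 ~ /^node\// {
         sub(/^node\//, "", $5)   # strip the "node/" prefix from OBJECT
         print $5
       }' | sort -u
}
```

Usage: kubectl get events --all-namespaces | affected_nodes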

 

- After a quick search, it looks like the issue is caused by the PIDs cgroup not being enabled in the Armbian kernel. That is indeed the case; no pids cgroup is listed:

 

adrien@master-01:~$ cat /proc/cgroups
#subsys_name    hierarchy    num_cgroups    enabled
cpuset    5    29    1
cpu    2    69    1
cpuacct    2    69    1
blkio    8    69    1
memory    4    125    1
devices    3    69    1
freezer    7    29    1
net_cls    6    29    1
net_prio    6    29    1
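The check above can be scripted, which is handy across eight nodes. A sketch, assuming the four-column /proc/cgroups layout shown above; the helper name cgroup_enabled is hypothetical, and the file argument only exists so it can be tested against a saved copy:

```shell
#!/usr/bin/env bash
# cgroup_enabled: succeed if the named controller is listed as enabled in a
# /proc/cgroups-style table (columns: subsys_name hierarchy num_cgroups enabled).
cgroup_enabled() {
  local controller=$1 table=${2:-/proc/cgroups}
  awk -v c="$controller" '
    /^#/ { next }                      # skip the header line
    $1 == c && $4 == 1 { found = 1 }   # controller present and enabled
    END { exit (found ? 0 : 1) }
  ' "$table"
}
```

Usage: cgroup_enabled pids && echo "pids controller available" || echo "pids controller missing"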

 

- The same issue has also been reported on the Raspberry Pi in the same Kubernetes context, where it was solved by enabling the PIDs cgroup in the Raspbian kernel:

https://github.com/raspberrypi/linux/pull/2968

 

So, would it be possible to enable the PIDs cgroup in the kernel for the next Armbian release?

 

Thank you for your support,

Regards

Adrien

 
24 minutes ago, Adrien Gruneisen said:

So, would it be possible to enable the PIDs cgroup in the kernel for the next Armbian release?


https://github.com/armbian/build/commit/755388147d65a12d39b31898be29434f999700c8
It was the only config where this was not enabled :P
 


linux-imx6-current.config:CONFIG_CGROUP_PIDS=y
linux-imx7d-legacy.config:CONFIG_CGROUP_PIDS=y
linux-meson64-current.config:CONFIG_CGROUP_PIDS=y
linux-meson64-dev.config:CONFIG_CGROUP_PIDS=y
linux-meson64-legacy.config:CONFIG_CGROUP_PIDS=y
linux-meson-current.config:CONFIG_CGROUP_PIDS=y
linux-meson-dev.config:CONFIG_CGROUP_PIDS=y
linux-mt7623-legacy.config:CONFIG_CGROUP_PIDS=y
linux-mvebu64-current.config:CONFIG_CGROUP_PIDS=y
linux-mvebu64-dev.config:CONFIG_CGROUP_PIDS=y
linux-mvebu64-legacy.config:CONFIG_CGROUP_PIDS=y
linux-mvebu-current.config:CONFIG_CGROUP_PIDS=y
linux-mvebu-dev.config:CONFIG_CGROUP_PIDS=y
linux-mvebu-legacy.config:CONFIG_CGROUP_PIDS=y
linux-odroidn2-current.config:CONFIG_CGROUP_PIDS=y
linux-odroidxu4-dev.config:CONFIG_CGROUP_PIDS=y
linux-odroidxu4-legacy.config:# CONFIG_CGROUP_PIDS is not set
linux-rk3399-legacy.config:CONFIG_CGROUP_PIDS=y
linux-rockchip64-current.config:CONFIG_CGROUP_PIDS=y
linux-rockchip64-dev.config:CONFIG_CGROUP_PIDS=y
linux-rockchip64-legacy.config:CONFIG_CGROUP_PIDS=y
linux-rockchip-current.config:CONFIG_CGROUP_PIDS=y
linux-rockchip-legacy.config:CONFIG_CGROUP_PIDS=y
linux-rockpis-legacy.config:CONFIG_CGROUP_PIDS=y
linux-s5p6818-legacy.config:CONFIG_CGROUP_PIDS=y
linux-sunxi64-current.config:CONFIG_CGROUP_PIDS=y
linux-sunxi64-dev.config:CONFIG_CGROUP_PIDS=y
linux-sunxi64-legacy.config:CONFIG_CGROUP_PIDS=y
linux-sunxi-current.config:CONFIG_CGROUP_PIDS=y
linux-sunxi-dev.config:CONFIG_CGROUP_PIDS=y
linux-sunxi-legacy.config:CONFIG_CGROUP_PIDS=y
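The list above reads like grep output over the per-family kernel configs. A sketch that prints only the configs where the option is not built in, assuming the files sit in one directory under their linux-*.config names (in the armbian/build tree this is presumably config/kernel/, not verified here); configs_without_pids is a hypothetical helper:

```shell
#!/usr/bin/env bash
# configs_without_pids: print every linux-*.config in DIR that does not set
# CONFIG_CGROUP_PIDS=y (either "# ... is not set" or the line is absent).
configs_without_pids() {
  local dir=$1 f
  for f in "$dir"/linux-*.config; do
    [ -e "$f" ] || continue            # the glob matched nothing
    grep -qx 'CONFIG_CGROUP_PIDS=y' "$f" || basename "$f"
  done
}
```

Usage: configs_without_pids config/kernel. Of the files listed above, only linux-odroidxu4-legacy.config would be reported.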

 

Hi Igor,

 

Great :-) Thank you so much!

Do you have an idea when the update will be available in the Armbian repo?

 

Regards,

Adrien


Hi Igor,

 

The update is still not reflected in the repo. Is that normal?

 

adrien@master-01:~$ uname -a
Linux master-01 4.14.150-odroidxu4 #1 SMP PREEMPT Mon Oct 28 07:56:57 CET 2019 armv7l GNU/Linux
adrien@master-01:~$ date
Mon Dec 16 20:50:42 CET 2019
adrien@master-01:~$ sudo apt update
Hit:1 http://security.debian.org buster/updates InRelease
Hit:3 http://repo.zabbix.com/zabbix/4.4/raspbian buster InRelease
Hit:4 http://httpredir.debian.org/debian buster InRelease
Hit:5 http://httpredir.debian.org/debian buster-updates InRelease
Hit:6 http://httpredir.debian.org/debian buster-backports InRelease
Hit:2 https://packages.cloud.google.com/apt kubernetes-xenial InRelease
Hit:8 https://download.docker.com/linux/debian buster InRelease
Hit:7 https://apt.armbian.com buster InRelease
Reading package lists... Done
Building dependency tree       
Reading state information... Done
All packages are up to date.

 

Thank you,

Regards,

Adrien
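apt update only confirms that the package lists are current; whether the running kernel has the option is a separate question. A sketch, assuming the kernel package ships its build config as /boot/config-<release> (usual on Debian-based images, but check yours); kernel_has_option is a hypothetical helper:

```shell
#!/usr/bin/env bash
# kernel_has_option: succeed if OPTION is built in (=y) or a module (=m)
# in the given kernel config file (default: the running kernel's config).
kernel_has_option() {
  local option=$1 config=${2:-/boot/config-$(uname -r)}
  grep -Eq "^${option}=(y|m)$" "$config"
}
```

Usage: kernel_has_option CONFIG_CGROUP_PIDS || echo "running kernel lacks pids cgroup support". If the kernel was built with CONFIG_IKCONFIG_PROC, the same information is also available in /proc/config.gz (readable with zcat).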


OK, thank you for this clarification.

I'd prefer not to use the beta repo and will wait a bit longer. Any idea when the update will be pushed to the stable repo?

My Ubuntu VM no longer boots to X since the last apt upgrade, which makes preparing updates slightly more annoying than it should be, but I can have a go at rebuilding the kernel this evening.


I have good news and bad news. The good news is that apt upgrade has fixed my VM and that I was able to rebuild the kernel using the latest config from the armbian/build git repository. The bad news is that the ODROID did not boot back up. I need to break out my UART cable and have a look at what's up.

 

ETA: it must have been an intermittent network issue. I attached the cable and everything worked fine; I could ssh to the machine too. PR created:

https://github.com/armbian/upload/pull/13

Edited by belegdol
not booting was a false alarm it seems


OK, I would say roll the update back. I rebooted again to be sure, and ssh is not responding again. Either plugging in the UART fixed the problem, or the longer power cycle needed to take off the cover and plug the UART in did.

In your experience, would you say the issue could be caused by the cgroup change itself? Or were there any other changes in the last two weeks that could be responsible?

ETA: a power cycle (no UART this time) has sorted the issue out for now. I will try this with the UART connected tomorrow to see what is up; it could be that the device never actually reboots when requested and just kills the ssh session.
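One way to tell whether the board really rebooted or merely dropped the SSH session is to compare the kernel's per-boot random ID before and after. A minimal sketch, assuming a Linux kernel exposing /proc/sys/kernel/random/boot_id; rebooted_since is a hypothetical helper, and the ID file is a parameter only so it can be tested offline:

```shell
#!/usr/bin/env bash
# rebooted_since: succeed if the current boot ID differs from OLD_ID,
# i.e. the kernel has been restarted since OLD_ID was recorded.
rebooted_since() {
  local old_id=$1 id_file=${2:-/proc/sys/kernel/random/boot_id}
  [ "$(cat "$id_file")" != "$old_id" ]
}
```

Record the ID over ssh before issuing reboot, then run the check once ssh answers again. If ssh never answers, the UART log is the next place to look.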


So the device does reboot; it just never comes back up after U-Boot hands over to the kernel:

Stopping User Manager for UID 0...
         Unmounting Mount shared folder yasy to /sharedfolders/aaa...
[  OK  ] Stopped target Graphical Interface.
[  OK  ] Stopped target RPC Port Mapper.
         Starting Beep before system shutdown...
         Unmounting Mount shared folder emu to /sharedfolders/bbb...
         Stopping watchdog daemon...
         Starting folder2ram systemd service...
         Stopping Authorization Manager...
         Stopping ACPI event daemon...
         Stopping Session 6 of user root.
         Unmounting Mount shared folder julian to /sharedfolders/ccc...
         Stopping pNFS block layout mapping daemon...
         Unmounting Mount shared folder Bild…19 to /sharedfolders/Bilder2019...
[  OK  ] Stopped target Timers.
[  OK  ] Stopped Clean PHP session files every 30 mins.
[  OK  ] Stopped Trigger anacron every hour.
[  OK  ] Stopped Daily Cleanup of Temporary Directories.
[  OK  ] Stopped Daily apt upgrade and clean activities.
[  OK  ] Stopped Daily apt download activities.
         Unmounting Mount shared folder aust…sch to /sharedfolders/austausch...
[  OK  ] Stopped pNFS block layout mapping daemon.
[  OK  ] Stopped ACPI event daemon.
[  OK  ] Stopped Authorization Manager.
[  OK  ] Stopped Session 6 of user root.
[  OK  ] Stopped User Manager for UID 0.
[  OK  ] Unmounted Mount shared folder aaa to /sharedfolders/aaa.
[  OK  ] Unmounted Mount shared folder bbb to /sharedfolders/bbb.
[  OK  ] Unmounted Mount shared folder ccc to /sharedfolders/ccc.
[  OK  ] Unmounted Mount shared folder Bilder2019 to /sharedfolders/Bilder2019.
[  OK  ] Unmounted Mount shared folder austausch to /sharedfolders/austausch.
[  OK  ] Stopped watchdog daemon.
[  OK  ] Stopped target Multi-User System.
         Stopping Initializes zram swaping...
         Stopping A high performance web server and a reverse proxy server...
         Stopping LSB: start or stop rrdcached...
[  OK  ] Stopped Generate the prelogin message.
         Stopping LSB: Start NTP daemon...
[  OK  ] Stopped The OpenMediaVault engine d…on that processes the RPC request.
         Stopping LSB: Starts ProFTPD daemon...
[  OK  ] Stopped target Login Prompts.
         Stopping fast remote file copy program daemon...
         Stopping LSB: Advanced IEEE 802.11 management daemon...
         Stopping Unattended Upgrades Shutdown...
         Stopping Self Monitoring and Reporting Technology (SMART) Daemon...
[  OK  ] Stopped Postfix Mail Transport Agent.
         Stopping Postfix Mail Transport Agent (instance -)...
         Stopping Regular background program processing daemon...
         Stopping The PHP 7.0 FastCGI Process Manager...
         Stopping OpenBSD Secure Shell server...
         Stopping LSB: Set sysfs variables from /etc/sysfs.conf...
         Unmounting /srv/dev-disk-by-label-omv...
[  OK  ] Removed slice User Slice of root.
         Stopping Login Service...
[  OK  ] Unmounted /var/lib/docker/btrfs.
[  OK  ] Unmounted /var/folder2ram/var/log.
[  OK  ] Deactivated swap /dev/zram7.
[  OK  ] Deactivated swap /dev/zram6.
[  OK  ] Deactivated swap /dev/zram5.
[  OK  ] Deactivated swap /dev/zram4.
[  OK  ] Deactivated swap /dev/zram3.
[  OK  ] Deactivated swap /dev/zram2.
[  OK  ] Deactivated swap /dev/zram1.
[  OK  ] Deactivated swap /dev/zram0.
[  OK  ] Unmounted /var/lib/openmediavault/rrd.
[  OK  ] Unmounted /var/tmp.
[  OK  ] Unmounted /var/folder2ram/var/tmp.
[  OK  ] Unmounted /var/folder2ram/var/lib/openmediavault/rrd.
[  OK  ] Stopped System Logging Service.
[  OK  ] Stopped fast remote file copy program daemon.
[  OK  ] Stopped Login Service.
[  OK  ] Stopped Self Monitoring and Reporting Technology (SMART) Daemon.
[  OK  ] Stopped Regular background program processing daemon.
[  OK  ] Stopped The PHP 7.0 FastCGI Process Manager.
[  OK  ] Stopped Netatalk AFP fileserver for Macintosh clients.
[  OK  ] Stopped OpenBSD Secure Shell server.
[  OK  ] Stopped Docker Application Container Engine.
[  OK  ] Stopped Serial Getty on ttySAC2.
[  OK  ] Stopped Getty on tty1.
[  OK  ] Stopped Samba SMB Daemon.
[  OK  ] Started Beep before system shutdown.
[  OK  ] Stopped Initializes zram swaping.
[  OK  ] Stopped A high performance web server and a reverse proxy server.
[  OK  ] Stopped LSB: Advanced IEEE 802.11 management daemon.
[  OK  ] Stopped Unattended Upgrades Shutdown.
[  OK  ] Stopped Postfix Mail Transport Agent (instance -).
[  OK  ] Unmounted /srv/dev-disk-by-label-omv.
[  OK  ] Unmounted /var/lib/rrdcached.
[  OK  ] Unmounted /var/folder2ram/var/lib/rrdcached.
[  OK  ] Unmounted /var/spool.
[  OK  ] Unmounted /var/folder2ram/var/spool.
[  OK  ] Unmounted /var/lib/monit.
[  OK  ] Unmounted /var/folder2ram/var/lib/monit.
[  OK  ] Stopped Web Services on Devices (WSD) daemon.
[  OK  ] Unmounted /var/lib/php.
[  OK  ] Stopped LSB: Start NTP daemon.
[  OK  ] Stopped LSB: Start/stop sysstat's sadc.
[  OK  ] Unmounted /var/folder2ram/var/lib/php.
[  OK  ] Stopped File System Check on /dev/disk/by-label/omv.
[  OK  ] Removed slice system-postfix.slice.
[  OK  ] Stopped Samba NMB Daemon.
[  OK  ] Removed slice system-getty.slice.
[  OK  ] Stopped /etc/rc.local Compatibility.
         Stopping Permit User Sessions...
[  OK  ] Removed slice system-serial\x2dgetty.slice.
         Stopping containerd container runtime...
         Stopping Avahi mDNS/DNS-SD Stack...
[  OK  ] Unmounted /var/cache/samba.
[  OK  ] Unmounted /var/lib/netatalk/CNID.
[  OK  ] Unmounted /var/folder2ram/var/lib/netatalk/CNID.
[  OK  ] Unmounted /var/folder2ram/var/cache/samba.
[  OK  ] Stopped Avahi mDNS/DNS-SD Stack.
[  OK  ] Stopped containerd container runtime.
[  OK  ] Started folder2ram systemd service.
[  OK  ] Stopped LSB: Starts ProFTPD daemon.
[  OK  ] Stopped LSB: start or stop rrdcached.
[  OK  ] Stopped LSB: service and resource monitoring daemon.
[  OK  ] Stopped LSB: Set sysfs variables from /etc/sysfs.conf.
[  OK  ] Stopped Permit User Sessions.
         Stopping LSB: set CPUFreq kernel parameters...
[  OK  ] Stopped target System Time Synchronized.
[  OK  ] Stopped target Network is Online.
[  OK  ] Stopped Network Manager Wait Online.
[  OK  ] Stopped LSB: set CPUFreq kernel parameters.
         Stopping LSB: Load kernel modules needed to enable cpufreq scaling...
[  OK  ] Stopped LSB: Load kernel modules needed to enable cpufreq scaling.
[  OK  ] Stopped target Remote File Systems.
[  OK  ] Stopped target Remote File Systems (Pre).
         Stopping NFS server and services...
[  OK  ] Stopped target NFS client services.
[  OK  ] Stopped NFS server and services.
         Stopping NFSv4 ID-name mapping service...
         Stopping NFS Mount Daemon...
[  OK  ] Stopped NFSv4 ID-name mapping service.
[  OK  ] Stopped NFS Mount Daemon.
[  OK  ] Stopped target Network.
         Stopping Network Manager...
         Stopping Raise network interfaces...
         Unmounting RPC Pipe File System...
[  OK  ] Stopped Network Manager.
[  OK  ] Unmounted RPC Pipe File System.
         Stopping D-Bus System Message Bus...
[  OK  ] Stopped D-Bus System Message Bus.
[  OK  ] Stopped target Basic System.
[  OK  ] Stopped target Slices.
[  OK  ] Removed slice User and Session Slice.
[  OK  ] Stopped target Sockets.
[  OK  ] Closed Avahi mDNS/DNS-SD Stack Activation Socket.
[  OK  ] Closed ACPID Listen Socket.
[  OK  ] Closed Docker Socket for the API.
[  OK  ] Closed Syslog Socket.
[  OK  ] Stopped target Paths.
[  OK  ] Stopped ACPI Events Check.
[  OK  ] Closed D-Bus System Message Bus Socket.
[  OK  ] Stopped target System Initialization.
[  OK  ] Stopped target Encrypted Volumes.
[  OK  ] Stopped Dispatch Password Requests to Console Directory Watch.
[  OK  ] Stopped Forward Password Requests to Wall Directory Watch.
         Stopping Restore / save the current clock...
         Stopping Armbian memory supported logging...
[  OK  ] Stopped target Swap.
         Stopping Entropy daemon using the HAVEGE algorithm...
         Stopping Update UTMP about System Boot/Shutdown...
[  OK  ] Stopped Entropy daemon using the HAVEGE algorithm.
[  OK  ] Stopped Raise network interfaces.
[  OK  ] Stopped Restore / save the current clock.
[  OK  ] Stopped Update UTMP about System Boot/Shutdown.
[  OK  ] Stopped Apply Kernel Variables.
[  OK  ] Stopped Load Kernel Modules.
         Stopping Load/Save Random Seed...
[  OK  ] Stopped Create Volatile Files and Directories.
[  OK  ] Stopped Load/Save Random Seed.
[  OK  ] Unmounted /var/log.
[  OK  ] Unmounted /var/log.hdd.
[  OK  ] Stopped Armbian memory supported logging.
         Stopping Armbian ZRAM config...
[  OK  ] Stopped Armbian ZRAM config.
[  OK  ] Stopped target Local File Systems.
         Unmounting /boot...
         Unmounting /tmp...
         Unmounting /run/user/0...
[  OK  ] Unmounted /boot.
[  OK  ] Unmounted /tmp.
[  OK  ] Unmounted /run/user/0.
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Stopped File System Check on /dev/d…b0a55-56f1-4443-8cac-297e1181425c.
[  OK  ] Removed slice system-systemd\x2dfsck.slice.
[  OK  ] Stopped target Local File Systems (Pre).
[  OK  ] Stopped Remount Root and Kernel File Systems.
         Stopping Monitoring of LVM2 mirrors…ng dmeventd or progress polling...
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Reached target Shutdown.
[  126.745633] reboot: Re

U-Boot 2017.05-armbian (Sep 19 2018 - 12:42:38 +0200) for ODROID-XU4

CPU:   Exynos5422 @ 800 MHz
Model: Odroid XU4 based on EXYNOS5422
Board: Odroid XU4 based on EXYNOS5422
Type:  xu4
DRAM:  2 GiB
MMC:   EXYNOS DWMMC: 0, EXYNOS DWMMC: 1
MMC Device 0 ( SD ): 14.8 GiB
Card did not respond to voltage select!
mmc_init: -95, time 11
*** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
Net:   No ethernet found.
Press quickly 'Enter' twice to stop autoboot:  0 
** Unrecognized filesystem type **
12489 bytes read in 19 ms (641.6 KiB/s)
cfgload addr = 0x50000000, Loading boot.ini from ext4 0:1 /boot.ini
cfgload: applying boot.ini...
cfgload: setenv initrd_high "0xffffffff"
cfgload: setenv fdt_high "0xffffffff"
cfgload: setenv macaddr "00:1e:06:61:7a:55"
cfgload: setenv rootdev "UUID=d0da7bbe-e3af-4588-8715-aa5c4478eb88"
cfgload: setenv rootfstype "btrfs"
cfgload: setenv console "both"
cfgload: setenv verbosity "1"
cfgload: if ext4load mmc 0:1 0x44000000 /boot/armbianEnv.txt || fatload mmc 0:1 0x44000000 armbianEnv.txt || ext4load mmc 0:1 0x440i
** File not found /boot/armbianEnv.txt **
** Unrecognized filesystem type **
94 bytes read in 14 ms (5.9 KiB/s)
cfgload: if test "${console}" = "display" || test "${console}" = "both"; then setenv consoleargs "console=tty1"; fi
cfgload: if test "${console}" = "serial" || test "${console}" = "both"; then setenv consoleargs "${consoleargs} console=ttySAC2,115i
cfgload: setenv bootrootfs "${consoleargs} consoleblank=0 loglevel=${verbosity} panic=10 root=${rootdev} rootfstype=${rootfstype} r"
cfgload: setenv vout "hdmi"
cfgload: setenv cecenable "false" # false or true
cfgload: setenv governor "performance"
cfgload: setenv ddr_freq 825
cfgload: setenv HPD "true"
cfgload: setenv hdmi_tx_amp_lvl  "31"
cfgload: setenv hdmi_tx_lvl_ch0      "3"
cfgload: setenv hdmi_tx_lvl_ch1      "3"
cfgload: setenv hdmi_tx_lvl_ch2      "3"
cfgload: setenv hdmi_tx_emp_lvl      "6"
cfgload: setenv hdmi_clk_amp_lvl     "31"
cfgload: setenv hdmi_tx_res      "0"
cfgload: setenv hdmi_phy_control "hdmi_tx_amp_lvl=${hdmi_tx_amp_lvl} hdmi_tx_lvl_ch0=${hdmi_tx_lvl_ch0} hdmi_tx_lvl_ch1=${hdmi_tx_l"
cfgload: ext4load mmc 0:1 0x40008000 /boot/zImage || fatload mmc 0:1 0x40008000 zImage || ext4load mmc 0:1 0x40008000 zImage
** File not found /boot/zImage **
** Unrecognized filesystem type **
5703032 bytes read in 573 ms (9.5 MiB/s)
cfgload: ext4load mmc 0:1 0x42000000 /boot/uInitrd || fatload mmc 0:1 0x42000000 uInitrd || ext4load mmc 0:1 0x42000000 uInitrd
** File not found /boot/uInitrd **
** Unrecognized filesystem type **
6308138 bytes read in 620 ms (9.7 MiB/s)
cfgload: if test "${board_name}" = "xu4"; then setenv fdtfile "exynos5422-odroidxu4.dtb"; fi
cfgload: if test "${board_name}" = "xu3"; then setenv fdtfile "exynos5422-odroidxu3.dtb"; fi
cfgload: if test "${board_name}" = "xu3l"; then setenv fdtfile "exynos5422-odroidxu3-lite.dtb"; fi
cfgload: if test "${board_name}" = "hc1"; then setenv fdtfile "exynos5422-odroidhc1.dtb"; fi
cfgload: if ext4load mmc 0:1 0x00000000 "/boot/.next" || fatload mmc 0:1 0x00000000 ".next"  || ext4load mmc 0:1 0x00000000 ".next"i
** File not found /boot/.next **
** Unrecognized filesystem type **
0 bytes read in 8 ms (0 Bytes/s)
Found mainline kernel configuration
cfgload: ext4load mmc 0:1 0x44000000 /boot/dtb/${fdtfile} || fatload mmc 0:1 0x44000000 dtb/${fdtfile} || ext4load mmc 0:1 0x440000}
** File not found /boot/dtb/exynos5422-odroidhc1.dtb **
** Unrecognized filesystem type **
56387 bytes read in 67 ms (821.3 KiB/s)
cfgload: fdt addr 0x44000000
cfgload: if test "${cecenable}" = "false"; then fdt rm /cec@101B0000; fi
libfdt fdt_path_offset() returned FDT_ERR_NOTFOUND
cfgload: setenv bootargs "${bootrootfs} ${videoconfig} smsc95xx.macaddr=${macaddr} governor=${governor} ${hdmi_phy_control} usb-sto"
cfgload: dmc ${ddr_freq}
cfgload: bootz 0x40008000 0x42000000 0x44000000
Kernel image @ 0x40008000 [ 0x000000 - 0x570578 ]
## Loading init Ramdisk from Legacy Image at 42000000 ...
   Image Name:   uInitrd
   Image Type:   ARM Linux RAMDisk Image (gzip compressed)
   Data Size:    6308074 Bytes = 6 MiB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 44000000
   Booting using the fdt blob at 0x44000000
   Using Device Tree in place at 44000000, end 44010c42

Starting kernel ...

This would be cool!

I have now compared the minicom output from a cold boot; unfortunately it is almost exactly the same. The only differences are the timings and read speeds shown. Is there anything else to check? Never building another kernel update is not really an option...
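Comparing two serial captures is easier once the fields that change on every boot are masked. A sketch, assuming the capture format shown above (kernel timestamps in square brackets and U-Boot's "N bytes read in X ms (Y/s)" lines); normalize_log is a hypothetical helper and the file names in the usage line are placeholders:

```shell
#!/usr/bin/env bash
# normalize_log: mask per-boot noise on stdin so two console captures diff cleanly:
#   "[  126.745633] ..."          -> "[T] ..."
#   "read in 573 ms (9.5 MiB/s)"  -> "read in N ms (R)"
normalize_log() {
  sed -E \
    -e 's/^\[ *[0-9]+\.[0-9]+]/[T]/' \
    -e 's/read in [0-9]+ ms \([^)]*\)/read in N ms (R)/'
}
```

Usage: diff <(normalize_log < warm-reboot.txt) <(normalize_log < cold-boot.txt)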


Hi Igor, Belegdol,

 

If I can help at some point (testing something or whatever), don't hesitate to ask me!

 

Regards,

Adrien


I have tried investigating this further. The first step was to purge ccache and other caches to rule out a compilation issue. It did not help.

I am now trying to disable the pids cgroup to see if it is the culprit, but doing so causes the kernel build to fail. I am using the following command line:

$ ./compile.sh KERNEL_ONLY=yes KERNEL_CONFIGURE=yes KERNEL_KEEP_CONFIG=no BOARD=odroidxu4 SUBREVISION=.1

Then, once the configuration menu is reached, I disable the pids cgroup. The same happens if I git revert the commit in question and build with KERNEL_CONFIGURE=no:

Makefile:1051: recipe for target 'net' failed
[ error ] ERROR in function compile_kernel [ compilation.sh:382 ]
[ error ] Kernel was not built [ @host ]
[ o.k. ] Process terminated 

How can I make the errors more verbose?
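For reference, the menuconfig step can also be done non-interactively by editing the family config directly, which is roughly what reverting the commit does. A sketch, assuming GNU sed and the two standard .config spellings of the option; set_cgroup_pids is a hypothetical helper, and which file to point it at (something like linux-odroidxu4-legacy.config in the build tree) is an assumption:

```shell
#!/usr/bin/env bash
# set_cgroup_pids: rewrite CONFIG_CGROUP_PIDS in a kernel .config file in place.
#   set_cgroup_pids FILE y   -> "CONFIG_CGROUP_PIDS=y"
#   set_cgroup_pids FILE n   -> "# CONFIG_CGROUP_PIDS is not set"
# Only an existing line (in either form) is replaced; nothing is appended.
set_cgroup_pids() {
  local file=$1 value=$2 line
  case $value in
    y) line='CONFIG_CGROUP_PIDS=y' ;;
    n) line='# CONFIG_CGROUP_PIDS is not set' ;;
    *) echo "usage: set_cgroup_pids FILE y|n" >&2; return 2 ;;
  esac
  sed -i -E "s/^(CONFIG_CGROUP_PIDS=.*|# CONFIG_CGROUP_PIDS is not set)\$/${line}/" "$file"
}
```

Inside a kernel source tree, scripts/config --file .config --disable CGROUP_PIDS does the same job.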

56 minutes ago, grunlab said:

If i can help at some point


Two or three full-time engineers would need several months to set up such a system from where we are now (no automated testing at all). And someone has to acquire the knowledge and lead this project. Booting some image when you have some time and reporting that it doesn't work adds little value. We need an automated testing facility, which is critical and demanding at major upgrades. Those are the times when I don't sleep well for a week or more.

13 minutes ago, Igor said:


Purge sources. Do you perhaps compile on some network drive?

I compile on my VM's main drive. Still no success with purging the sources:

$ ./compile.sh KERNEL_ONLY=yes KERNEL_CONFIGURE=no KERNEL_KEEP_CONFIG=no BOARD=odroidxu4 SUBREVISION=.1 CLEAN_LEVEL=make,alldebs,images,cache,sources,extras

This time the error is somewhere else though:

Makefile:1051: recipe for target 'drivers' failed
[ error ] ERROR in function compile_kernel [ compilation.sh:382 ]
[ error ] Kernel was not built [ @host ]
[ o.k. ] Process terminated 

 
14 minutes ago, belegdol said:

Still no success with purging the sources:


What if purging doesn't work properly and your sources are corrupted somehow? Remove cache manually.

 

I can't recreate this problem. For me everything works on a cleanly installed Ubuntu Bionic server.

 

16 minutes ago, belegdol said:

This time the error is somewhere else though

 

Check the errors in the logs. They are in output/debug/compilation.log
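Since the log interleaves hundreds of warnings with the real failure, it helps to pull out only the failure lines. A sketch using the log path given above; build_errors is a hypothetical helper and the pattern list is a guess at what matters, not an exhaustive one:

```shell
#!/usr/bin/env bash
# build_errors: print (with line numbers) the first lines of a build log that
# indicate a real failure: compiler "error:" lines, make "*** [...] Error N"
# lines and crashed tools, skipping the far more common "warning:" noise.
build_errors() {
  local log=${1:-output/debug/compilation.log} max=${2:-20}
  grep -nE '(^|[^[:alpha:]])error:|\*\*\* \[.*] Error [0-9]+|Aborted \(core dumped\)|Segmentation fault' "$log" \
    | head -n "$max"
}
```

Usage: build_errors (or build_errors path/to/compilation.log 50 for more lines).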


I have now tried deleting cache/sources manually; still no dice. The actual error was:

kernel/sched/fair.c:6215:12: warning: ‘cpu_util_wake’ defined but not used [-Wunused-function]
 static int cpu_util_wake(int cpu, struct task_struct *p)
            ^~~~~~~~~~~~~
drivers/hardkernel/ina231-i2c.c: In function ‘ina231_work’:
drivers/hardkernel/ina231-i2c.c:106:60: warning: self-comparison always evaluates to false [-Wtautological-compare]
   if ((sensor->cur_uV > sensor->max_uV) || (sensor->cur_uA > sensor->cur_uA)) {
                                                            ^
drivers/gpu/drm/exynos/exynos_hdmi.c:731:22: warning: unsigned conversion from ‘int’ to ‘unsigned char’ changes value from ‘5656’ to ‘24’ [-Woverflow]
    0x01, 0xD1, 0x29, 0x1618, 0x418, 0x190, 0xF5, 0xCF,
                      ^~~~~~
drivers/gpu/drm/exynos/exynos_hdmi.c:731:30: warning: unsigned conversion from ‘int’ to ‘unsigned char’ changes value from ‘1048’ to ‘24’ [-Woverflow]
    0x01, 0xD1, 0x29, 0x1618, 0x418, 0x190, 0xF5, 0xCF,
                              ^~~~~
drivers/gpu/drm/exynos/exynos_hdmi.c:731:37: warning: unsigned conversion from ‘int’ to ‘unsigned char’ changes value from ‘400’ to ‘144’ [-Woverflow]
    0x01, 0xD1, 0x29, 0x1618, 0x418, 0x190, 0xF5, 0xCF,
                                     ^~~~~
drivers/gpu/drm/exynos/exynos_hdmi.c:732:10: warning: unsigned conversion from ‘int’ to ‘unsigned char’ changes value from ‘360’ to ‘104’ [-Woverflow]
    0x8D, 0x168, 0xF5, 0xD8, 0x45, 0xA0, 0xAC, 0x80,
          ^~~~~
fs/proc/task_mmu.c: In function ‘show_smap’:
fs/proc/task_mmu.c:764:7: warning: ‘last_vma’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  bool last_vma;
       ^~~~~~~~
Aborted (core dumped)
make[2]: *** [fs/reiserfs/fix_node.o] Error 134
make[2]: *** Deleting file 'fs/reiserfs/fix_node.o'
make[1]: *** [fs/reiserfs] Error 2
make: *** [fs] Error 2
make: *** Waiting for unfinished jobs....
Aborted (core dumped)
make[4]: *** [drivers/media/platform/exynos-gsc/gsc-core.o] Error 134
make[4]: *** Deleting file 'drivers/media/platform/exynos-gsc/gsc-core.o'
make[3]: *** [drivers/media/platform/exynos-gsc] Error 2
make[2]: *** [drivers/media/platform] Error 2
make[2]: *** Waiting for unfinished jobs....
Aborted (core dumped)
make[2]: *** [drivers/pinctrl/devicetree.o] Error 134
make[2]: *** Deleting file 'drivers/pinctrl/devicetree.o'
make[1]: *** [drivers/pinctrl] Error 2
make[1]: *** Waiting for unfinished jobs....
Aborted (core dumped)
make[3]: *** [drivers/phy/samsung/phy-samsung-usb2.o] Error 134
make[3]: *** Deleting file 'drivers/phy/samsung/phy-samsung-usb2.o'
make[2]: *** [drivers/phy/samsung] Error 2
make[1]: *** [drivers/phy] Error 2
Aborted (core dumped)
make[3]: *** [drivers/net/team/team_mode_roundrobin.o] Error 134
make[3]: *** Deleting file 'drivers/net/team/team_mode_roundrobin.o'
make[2]: *** [drivers/net/team] Error 2
make[2]: *** Waiting for unfinished jobs....
Aborted (core dumped)
make[3]: *** [drivers/net/usb/r8152.o] Error 134
make[3]: *** Deleting file 'drivers/net/usb/r8152.o'
make[3]: *** Waiting for unfinished jobs....
Aborted (core dumped)
make[3]: *** [drivers/net/usb/usbnet.o] Error 134
make[3]: *** Deleting file 'drivers/net/usb/usbnet.o'
make[2]: *** [drivers/net/usb] Error 2
make[1]: *** [drivers/net] Error 2
Aborted (core dumped)
make[2]: *** [net/netfilter/nft_range.o] Error 134
make[2]: *** Deleting file 'net/netfilter/nft_range.o'
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [net/netfilter] Error 2
make[1]: *** Waiting for unfinished jobs....
make[1]: *** [drivers/media] Error 2
make: *** [drivers] Error 2
make: *** [net] Error 2

After I started getting similar errors when trying to build unmodified git master, I thought I was going crazy. Luckily it turned out that ccache was the culprit. With USE_CCACHE=no I was able to build both the pristine source and one with the cgroups change reverted; no CLEAN_LEVEL increase or manual deletion of anything was needed.

I have to go to the office now so testing of the kernels will have to wait until later but at least I have something to test now.
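The "Error 134" lines fit the ccache diagnosis: a shell reports a child killed by signal N as exit status 128 + N, and 134 - 128 = 6 is SIGABRT, which is what "Aborted (core dumped)" means. In other words, the compiler (or the ccache wrapper around it) crashed rather than rejecting the source. A small bash helper to decode such statuses; decode_exit_status is a hypothetical name:

```shell
#!/usr/bin/env bash
# decode_exit_status: explain a status as reported by make ("Error N") or $?.
# By shell convention, values above 128 mean "killed by signal (N - 128)".
decode_exit_status() {
  local status=$1
  if (( status > 128 )); then
    local sig=$(( status - 128 ))
    echo "killed by signal $sig ($(kill -l "$sig"))"   # bash prints e.g. ABRT
  else
    echo "exit code $status"
  fi
}
```

Usage: decode_exit_status 134 reports signal 6 (ABRT); a status of 139 would point to SIGSEGV instead.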


It unfortunately appears that reverting the pids change is not enough to fix reboot problems. I will try bisecting, hopefully the issue is in armbian git and not in one of the upstream ones...


I have done the full bisect run:

git bisect start
# bad: [6ec526eaf0dbd333349c8f1b517f090931ee0c6c] To run 32bit rustc you need enabled cp15 barrier emulation. (#1680)
git bisect bad 6ec526eaf0dbd333349c8f1b517f090931ee0c6c
# good: [7ebc310c9679ae3ebe22aac480a42f29b0a0281d] Merge branch 'master' of https://github.com/armbian/build
git bisect good 7ebc310c9679ae3ebe22aac480a42f29b0a0281d
# good: [da8cfe78c04b786c0ae967231891c86a0543248d] Disabled hs400 mode of roc-rk3399-pc's emmc (#1666)
git bisect good da8cfe78c04b786c0ae967231891c86a0543248d
# bad: [2e69b173bf957e1e54bce9c849d27dd796d7425d] Merge pull request #1673 from armbian/focal
git bisect bad 2e69b173bf957e1e54bce9c849d27dd796d7425d
# bad: [755388147d65a12d39b31898be29434f999700c8] Enable Kubernetes dependency
git bisect bad 755388147d65a12d39b31898be29434f999700c8
# good: [efb5f68cc3064b1cf09678db95b2d9ff9484c39e] fix typo in fix-a64-timejump patch
git bisect good efb5f68cc3064b1cf09678db95b2d9ff9484c39e
# good: [9241c849b5e3fb19ae8e4814c9a46badb9a870a5] [Maintanace] Rootfs cache has been rebuild with new version.
git bisect good 9241c849b5e3fb19ae8e4814c9a46badb9a870a5
# good: [1351bdfe3fdf30d7603cedfa1feccaf802718732] Espressobin: add missing/corrected RAM topology
git bisect good 1351bdfe3fdf30d7603cedfa1feccaf802718732
# first bad commit: [755388147d65a12d39b31898be29434f999700c8] Enable Kubernetes dependency

I am not entirely sure if it worked as intended - How can it say that 7553881 is the first bad commit if there are 130c23c and 8df3e98 between it and 1351bdf? In any case, I reverted the change once again and this time the machine does reboot properly. Not sure why this did not work earlier, maybe I got my kernels confused.
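On the question about 130c23c and 8df3e98: git bisect only considers commits that are ancestors of the bad commit and not ancestors of any good one, which is not the same as lying between them in git log order. A commit from a side branch merged later can appear between two commits in the log without being reachable from the bad one, so that is a plausible explanation here, though not verified. It can be checked per commit; a sketch, with in_bisect_range as a hypothetical helper:

```shell
#!/usr/bin/env bash
# in_bisect_range BAD COMMIT GOOD...: succeed if COMMIT is an ancestor of BAD
# (or BAD itself) and not an ancestor of any GOOD, i.e. bisect could test it.
in_bisect_range() {
  local bad=$1 commit=$2 good
  shift 2
  git merge-base --is-ancestor "$commit" "$bad" || return 1
  for good in "$@"; do
    git merge-base --is-ancestor "$commit" "$good" && return 1
  done
  return 0
}
```

Usage, from inside the build repo: in_bisect_range 7553881 130c23c 1351bdf efb5f68 9241c84 da8cfe7 7ebc310 && echo "in range"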

1 hour ago, Igor said:


Problem can also come from here as well:
https://github.com/hardkernel/linux/commits/odroidxu4-4.14.y

I think it is unlikely, as there were no commits to that tree between 5th December and now. I also did the entire bisect run within a few hours, so changes to upstream trees, if any, would have been minimal.

I think what most likely happened is that when I initially managed to compile the kernel with cgroups disabled, I must have somehow mixed up which kernel is installed. Unfortunately /var/log/apt/history.log has already rotated so the info what exactly happened is gone.

 

ETA: hardkernel and memeka are currently working on getting the 5.4 kernel to work, so probably the best course of action would be to back out the cgroups change and revisit the issue once 5.4 is released by hardkernel.
