MGLRU patches to bring down kswapd cpu usage


jock

This looks very interesting: kswapd is a CPU-hungry process in most common tasks that move memory around, so I'm glad to hear there is work under way to bring down its CPU usage and make page swapping better.

 

I would like to bring the patches into some kernel and see whether things improve for light/medium desktop workloads.

Does anyone have experience with them?


i have had this on my radar and my todo list for a while, but did not yet get to testing it - it looks very promising for lower end systems like the ones we are often dealing with here; especially i assume it will give a big push in usability for 4gb systems (i.e. enough ram to really make something out of it)


Well, from what I see, even CPU-intensive threads that create a lot of memory pressure will keep the kswapd kernel thread very busy, no matter how much memory the board has.

My first thought goes to heavily taxed network routing jobs with gigabit ethernet interfaces, for example.

 

In fact I often see kswapd very busy on regular x86 machines too.


Well, some initial tests didn't provide an acceptable result.

Applying the MGLRU v9 patch to the rk322x family, which is the lowest end of what Rockchip offers and is very common in TV boxes/boards with 1GB of RAM, results in an unusable system because of frequent crashes caused by memory and filesystem mismanagement.

 

I guess the patch has never been really tested on armhf targets...


@jock - after running the patches successfully for a while on aarch64 i gave it a try on 32bit arm too and failed in a similar way as you did: those patches seem to simply not work on 32bit arm - i sent an email to the patch series author and will let you know once i'll get a response ...


@hexdump Thanks a lot! I later checked out the patch series, and its description does not mention the armhf architecture at all, so it is probably untested and not expected to work. Anyway, I don't know how it could enter mainline if it is allowed to compile on armhf but causes heavy issues there: the Kconfig file should at least be patched to allow compilation only on amd64 and arm64 architectures.

 

I hope they will provide a fix and tests for armhf too, and hopefully for 32-bit x86 as well. Those 32-bit architectures are going to benefit a lot from this patch, since most memory-constrained devices are 32-bit only.


@jock - i already sent you a pm about this, but meanwhile i tested it to be working for me, so maybe its better to post it here as well in case others are interested too: i got a response from the mglru patch series author and a patch which seems to fix the problem on 32bit arm for me (tested on rk3288 with 2gb ram so far) - the patch can be found here: https://github.com/hexdump0815/kernel-extra-patches/blob/main/multi-gen-lru/v11/v11-15-extra-patch-from-author-with-armv7l-fix.diff

 

good luck and best wishes - hexdump


46 minutes ago, hexdump said:

@jock - just fyi: there is now an updated version of the extra patch from the patch series author against mglru v11: https://github.com/hexdump0815/kernel-extra-patches/commit/87ba91b3503f95d625ab5eed403e47e65986fd89 - i still got some warnings with the old version

Thanks a lot. As it turned out, yesterday I tested kernel v5.18.0 on rk322x with the old version of the extra patch by compiling the whole Debian mesa package ecosystem, with success. The box had just 1GB of RAM, 512MB of zram swap space and a 2GB swap file on an extra USB HDD.

The conditions were absolutely heavy and unhealthy, but the rebuild of all the packages from source finally completed without errors, even after extreme swapping and hours of compilation time. The system always stayed responsive to SSH shells, which is a great achievement by itself!

 

 


8 hours ago, hexdump said:

little update: v12 of the patches is out

Thank you for tracking the progress. Now I can pick up MGLRU again for the next 5.19.0-rc3 build.

I have been using the MGLRU patches since @jock's first post. I have a habit of keeping many pages open in browser tabs, which I use as a reminder list for pages I want to follow for a short time. Open tabs tend to request more and more memory, even when they are not actively viewed, so sooner or later memory swapping begins. Once swapping starts, all swap space is used up very quickly and the system becomes very laggy. My current workaround is to close and reopen the browser to free the used memory. I then reactivate only the tabs of most interest, but over time I reactivate more tabs and the game starts again.

With the MGLRU feature in place, the situation seems to improve: swapping seems to kick in much later, and once it does, the swap space seems to fill up much more slowly.

 

8 hours ago, jock said:

I would like to let the rk3318 people test it on aarch64 too very soon

If you want, you can use my kernel build for a quick test. Because it is a generic build, it should work on your devices. You will probably need a suitable DTB, but this should not be a problem as long as it obeys the mainline bindings. You can install it alongside your currently running kernel and decide at boot time which one to run. Even if the experiment with my kernel fails in the end, you will at least have learned how to keep as many kernels installed side by side as your persistent storage allows. See this thread for more details. For a first test you can use the 5.18.0-0.rc3 build offered there and then switch to my upcoming 5.19.0-rc3 build if it suits you. Btw, media-subsystem-wise my kernel is at mainline 5.18.0+.


@usual user thanks for the offer. Applying the whole bundled patchset to the rockchip and rk322x families was so straightforward that I guess it is just a matter of copying the patch from one directory to another and recompiling to get a fresh kernel with the working feature for the rockchip64 family too.

In case of issues, I will surely ask for help 😉


@hexdump pointed me to this discussion -- thank you for all the testing, much appreciated!

 

If you have MGLRU related questions, please feel free to shoot me emails.

 

The following option occasionally causes problems, so please set it to zero. There are analyses from Ubuntu, Debian and a few others that I'm too lazy to quote :)

$ cat /proc/sys/vm/watermark_boost_factor
0
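
For anyone who wants to check this on their own board, here is a minimal shell sketch. The function name check_boost and its optional path argument are made up for this example, and the sysctl.d file name in the comments is only a suggestion.

```shell
# Sketch only: report whether vm.watermark_boost_factor is already zero.
# check_boost is a hypothetical helper name, not an existing tool.
check_boost() {
    # Optional $1 overrides the path; it exists only to make testing easy.
    f="${1:-/proc/sys/vm/watermark_boost_factor}"
    if [ ! -r "$f" ]; then
        echo "unavailable"
        return 0
    fi
    v=$(cat "$f")
    if [ "$v" = "0" ]; then
        echo "ok"
    else
        echo "needs-change:$v"
    fi
}

check_boost

# To change it at runtime (as root):
#   sysctl -w vm.watermark_boost_factor=0
# To keep it across reboots, a drop-in such as
# /etc/sysctl.d/99-watermark.conf (file name is an assumption) containing:
#   vm.watermark_boost_factor = 0
```

The function only reads the value; the actual change is left as commented commands because it needs root.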

 

I'll submit a fix later today and hopefully it'll be in v5.20.


@yuzhaogoogle I managed to build some v5.18 kernels and gave the v11 patches a shot, but v5.18 itself seems to have some glitchy behaviour on rockchip devices.

I tried both on rk3399 (without MGLRU patches) and rk3318 (with MGLRU patches) and have had severe mmc controller issues.

 

Anyway, after many trials and freezes, I got some stack traces that seem to be related to MGLRU.

I attach the full dmesg.log with a stack trace that later turned into a kernel crash (also attached), along with the kernel map for debugging.

 

Since v5.18 looks faulty by itself on these devices, I don't know if it is worth checking the logs until I get something stable without MGLRU.

 

update: I double checked and found that a suspect patch was actually making my system very unstable, so these logs and errors are, with high probability, caused by the other patch and not related to MGLRU at all.

armbian-kernel-crash.txz


Hi @jock

 

According to the following warning from dmesg.log, it seems the first version of the fix @hexdump posted was used -- unfortunately it's also buggy (sorry)...

[ 1235.795803] ------------[ cut here ]------------
[ 1235.795827] WARNING: CPU: 2 PID: 55 at mm/vmscan.c:4464 lru_gen_look_around+0x3fc/0x728

 

I do have the latest MGLRU backported to v5.18, and you can apply it by

git fetch https://linux-mm.googlesource.com/page-reclaim refs/changes/17/1617/2 && git cherry-pick FETCH_HEAD~14..FETCH_HEAD

 

I'm also trying to attach the patch file here but it seems I'm too new to be allowed to attach files. Please feel free to send me an email if you need the patch file or anything else.

 

From kernel_panic.log, it seems the bad thing already happened before

root@rk3318-box:~# [ 9596.639183] BUG: Bad page state in process kswapd0  pfn:1d8b6
[ 9596.640943] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000a60

 

Usually the Bad page state warning appears when there is memory corruption from a double free or a use-after-free. It'd be helpful if there were a way to grab the log preceding this error.
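
One possible way to get at those preceding lines, assuming the kernel log was being saved to a file while the box was running (for example with `dmesg --follow > dmesg.log` over SSH or a serial console). The function name log_before_bug and the default of 200 lines are made up for this sketch.

```shell
# Sketch only: print the N lines leading up to the first
# "BUG: Bad page state" entry in a saved kernel log.
# log_before_bug is a hypothetical helper name.
log_before_bug() {
    log="$1"           # path to the saved log file
    n="${2:-200}"      # how many preceding lines to show
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # -m 1 stops at the first match, -B prints the context before it
    grep -m 1 -B "$n" 'BUG: Bad page state' "$log"
}

# Example: log_before_bug dmesg.log 200
```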

 

Alternatively, if we can't fix the v5.18 baseline kernel, I can further backport MGLRU to v5.17.

Edited by yuzhaogoogle

@yuzhaogoogle thanks a lot for the answer; I finally managed to recompile kernel v5.18 with the v11 patch as-is (without the additional fix mentioned by @hexdump).

 

I removed an offending patch that was in the armbian patch set, and the system is now just as stable as I expected it to be. Doing the usual desktop business for a while (browsing in half a dozen firefox tabs, streaming music via bluetooth, moving files via samba, a couple of terminals, ...) and keeping the device busy for a whole night has resulted in perfect stability so far, a clean dmesg and 662 megabytes of swap usage (zram) out of 1GB of swap available.
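
For those who want to report comparable swap figures, this is a rough sketch that sums up /proc/swaps (sizes there are in KiB). The function name swap_usage is made up for this example, and the numbers are truncated to whole MiB.

```shell
# Sketch only: total swap used and available across all swap devices.
# swap_usage is a hypothetical helper name.
swap_usage() {
    # Optional $1 overrides the path; it exists only to make testing easy.
    f="${1:-/proc/swaps}"
    if [ ! -r "$f" ]; then
        echo "no swap info"
        return 0
    fi
    # Skip the header line; column 3 is Size and column 4 is Used (KiB).
    awk 'NR > 1 { size += $3; used += $4 }
         END { printf "%d MiB used of %d MiB\n", used / 1024, size / 1024 }' "$f"
}

swap_usage
```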

 

I look forward to contacting you about the new patchset backported to v5.18 with the latest fixes; I think the forum community will be happy to test and give feedback 😉


On 6/22/2022 at 9:17 AM, yuzhaogoogle said:

Alternatively, if we can't fix the v5.18 baseline kernel, I can further backport MGLRU to v5.17.

Hi Yu,

 

If you could set up a git repository and backport the latest MGLRU to at least:

 

- Last stable kernel (5.18.10 at the moment)

- Last LTS kernel (5.15.53 at the moment)

 

it would be really beneficial for people who want to test MGLRU.

 

Regards.


@hartraft - there are two kernel options for mglru: CONFIG_LRU_GEN=y and CONFIG_LRU_GEN_ENABLED=y - the first is to have mglru built into the kernel and the second is to have it enabled by default - if they are not in your kernel config it might be required to rebuild the kernel with them (or at least the first) enabled
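
On a running system, the state can also be read from sysfs. This is a minimal sketch assuming the /sys/kernel/mm/lru_gen/enabled file of mainline kernels (6.1 and later); older out-of-tree patch versions may expose something different. The function name lru_gen_state is made up for this example.

```shell
# Sketch only: report whether MGLRU is present and switched on.
# lru_gen_state is a hypothetical helper name.
lru_gen_state() {
    # Optional $1 overrides the path; it exists only to make testing easy.
    f="${1:-/sys/kernel/mm/lru_gen/enabled}"
    if [ ! -e "$f" ]; then
        echo "not-built"      # kernel most likely lacks CONFIG_LRU_GEN
        return 0
    fi
    case "$(cat "$f")" in
        0|0x0000) echo "disabled" ;;
        *)        echo "enabled" ;;
    esac
}

lru_gen_state

# If it is built in but disabled, it can be turned on at runtime (as root):
#   echo y > /sys/kernel/mm/lru_gen/enabled
```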


@hartraft Currently, I can assure you that the tinkerboard with armbian edge kernel (6.1) receives MGLRU compiled and enabled by default because I maintain the rk3288 (rockchip 32 bit) family.

The other two boards are not under my maintenance, so I can't say anything about them, but you could check the sample config file in your /boot directory to see whether the kernel is compiled with the options pointed out by @hexdump
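
A quick way to do that check, as a sketch: it assumes the config file is named /boot/config-<kernel release>, which is the usual Debian-style layout but may differ on your image. The function name mglru_config is made up for this example.

```shell
# Sketch only: show the MGLRU-related options from a kernel config file.
# mglru_config is a hypothetical helper name.
mglru_config() {
    # Optional $1 overrides the path (assumed default: /boot/config-<release>).
    cfg="${1:-/boot/config-$(uname -r)}"
    if [ ! -r "$cfg" ]; then
        echo "no-config"
        return 0
    fi
    # Prints lines such as CONFIG_LRU_GEN=y; falls back to "not-set".
    grep -E '^CONFIG_LRU_GEN(_ENABLED)?=' "$cfg" || echo "not-set"
}

mglru_config

# If the kernel exposes /proc/config.gz instead, the same pattern
# should work with zgrep on that file.
```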

