blue-v Posted May 19, 2023 (edited)

Dear all,

I have a strange problem with handling large files, and I have run out of ideas.

System: Odroid C4 (running from SD card)
OS: 5.7.15-meson64 #20.08 SMP PREEMPT, PRETTY_NAME="Debian GNU/Linux 10 (buster)"

Attached to the system are a few USB disks: 3x 8 TB, 1x 4 TB, 1x 3 TB, 1x 2 TB
All filesystems: ext4 (except the remote NAS filesystem, which is ZFS, see below)
Three of these disks are encrypted with TrueCrypt 7.1.

On the 3 TB disk (ext4, unencrypted) there is a zipped dd image of a hard disk. The file is about 760 GB in size. I tried to copy this file to a different location:

1st try: rsync to a NAS filer (ZFS, OmniOS, NFSv4) -> the rsync process goes into "D" state after a couple of GB
2nd try: scp to the same NAS -> same result; copies a couple of gigabytes and then freezes
3rd to 5th try: mounted the NAS via NFSv4 and used rsync, scp and cp to the mountpoint -> same result
6th and 7th try: copied to a local (TrueCrypt-encrypted) 8 TB disk using rsync and cp -> same result

All these attempts stopped after transferring a number of gigabytes, and the involved processes then showed "defunct" state. The transferred size is different every time and lies between 40 and 200 GB.

What I tried in order to pinpoint the problem:

Read the big file:
- md5sum <big_imagefile> -> works!
- dd if=<file> of=/dev/null bs=100M -> works!

Write a big file:
- dd if=/dev/zero of=<file_on_8TB_truecrypted_disk> bs=100M count=10000 -> works!

Copy using dd:
- dd if=<the_image_file> of=<file_on_truecrypted_disk> bs=100M -> works! Checksum is ok.

So I finally managed to copy the file to a different location using dd. But why are cp, scp and rsync failing?

Additional information:
During the process I did several experiments with the swap space (disabled, swapfile on the SD card, etc.); nothing helped. The logfiles do not contain the slightest hint of a problem. Observing the memory with the "free" command during the copy did not show any unusual state either.

System limits:

root@odroidc4:/var# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 11363
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 11363
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

No overclocking or anything like that. Just a plain Armbian installation. No funny "tuning".

I also have the same problem with some other applications. For example rmlint, for eliminating duplicates, runs for a while (30 minutes?) and then freezes. After that I connected the USB disk to a different Linux system, and rmlint worked fine there.

Fun fact: I'm pretty sure I originally copied the large file from a "full size" Linux system to the Odroid via rsync a couple of months ago.

Any ideas what the problem could be?

Thank you!
Lothar

Edited May 19, 2023 by blue-v: Typo
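Editor's note: the dd workaround described above (copy with a fixed 100M block size, then compare checksums) can be sketched in Python roughly as follows. This is only an illustration under assumptions, not what the poster ran: the function name `copy_and_verify` and its parameters are made up for this sketch, and MD5 is used only because the post mentions md5sum.

```python
import hashlib
import os


def copy_and_verify(src, dst, chunk_size=100 * 1024 * 1024):
    """Copy src to dst in fixed-size chunks, then re-read dst and compare MD5 digests.

    Hypothetical helper for illustration; chunk_size mirrors dd's bs=100M.
    """
    src_hash = hashlib.md5()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            src_hash.update(chunk)  # hash the source data while copying
            fout.write(chunk)
        fout.flush()
        os.fsync(fout.fileno())  # ask the kernel to push the data to the device

    # Second pass: read the destination back and hash it independently.
    dst_hash = hashlib.md5()
    with open(dst, "rb") as fcheck:
        for chunk in iter(lambda: fcheck.read(chunk_size), b""):
            dst_hash.update(chunk)

    return src_hash.hexdigest() == dst_hash.hexdigest()
```

A call would look like `copy_and_verify("/path/to/image.gz", "/path/to/target/image.gz")`, where both paths are placeholders. Note that the read-back may be served partly from the page cache, so a matching digest is weaker evidence than re-reading after a remount or reboot.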
Igor Posted May 19, 2023

27 minutes ago, blue-v said:
Just a plain Armbian installation. No funny "tuning".

27 minutes ago, blue-v said:
OS: 5.7.15-meson64 #20.08 SMP PREEMPT

An almost three-year-old build is way out of any "warranty". Did you try the same with the latest image?
blue-v (Author) Posted May 19, 2023

No, I did not (yet). But even for an "old" system, copying large files should not be a problem, I think.

Or, asked the other way round: is there a known fix in the newer versions that addresses this problem?

Lothar
SteeMan Posted May 19, 2023

1 hour ago, blue-v said:
Or asked the other way round: Is there a known fix in the newer versions that addresses this problem?

This question is irrelevant. No one really knows the answer.

1 hour ago, Igor said:
Did you try the same with latest image?

What Igor is really saying is that if you want anyone to spend their time helping with this, you need to reproduce it on current code. If this were a common problem, it would already be fixed, because many others would have reported something like it. So it is either something due to your older code or something specific to your environment. Those are difficult to diagnose, and when you are asking people to volunteer their time to help you, they are only going to do that if it is reasonable to do and you have done everything to narrow the scope of the problem to something reproducible by someone else.

My first thought is based on the lack of information provided, but it is a common cause of mysterious errors: power issues. How is all of this hardware powered? SBCs are notorious for having poor power supplies, and under load they have voltage drops that cause mysterious problems (especially with USB devices drawing some of that power).
blue-v (Author) Posted May 19, 2023

Dear SteeMan,

thank you for your suggestion. I'm aware of the difficulty of reproducing the problem with an old OS, and I will upgrade and check as soon as possible. My hope was that someone had seen a similar effect and could provide some hints.

The power topic is a nice idea! Just what I was looking for: a new idea. I'll check. The system is not within reach for me at the moment, but it will be in a few hours. As soon as there are new conclusions, I'll report.

Thank you!
Lothar
Igor Posted May 19, 2023

42 minutes ago, blue-v said:
My hope was that someone has had a similar effect and could provide some hints.

That is a legitimate hope, but even with the latest and greatest OS it's hard enough. Probably nobody on the planet is running the kernel you run.

Next. Armbian is installed on a certain % of Odroid devices. A small % of their users understand what you are talking about, a very small % of those have time to listen, a few might answer (with a general hint such as this one), and a terribly small number can actually give a usable hint (in decent time) or, which is close to impossible, drop everything and dedicate a whole week to resolving this problem. For a kernel that is completely outdated, not 99 but 100% of people will look the other way.

I hope the problem you are describing was fixed a long time ago. If not, remember there are a lot of bugs, wishes and ideas, and very few people.

Good luck!
blue-v (Author) Posted May 19, 2023

Igor, you lost that bet: I know about at least one other installation. Don't ask me how I know 🙂

Anyway. I have now upgraded to bullseye 6.1.11-meson64 #23.02.2 SMP PREEMPT. And the problem...

... seems to be gone!

The C4 copied the large file to the NAS using rsync via NFS on the first attempt. I will do some more testing over the next few days, including a run of rmlint, which never worked before on big directories.

Thank you for your help!
Lothar
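Editor's note: for follow-up testing of this kind, one possible way to notice a recurrence early is to look for processes stuck in uninterruptible sleep, the "D" state from the original report. The sketch below reads the process state from `/proc/<pid>/stat` on Linux. The function names and the `proc_root` parameter are hypothetical and exist only for this illustration; nothing like it was posted in the thread.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) parsed from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, right = line.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def find_d_state_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError):
            continue  # process exited or the file was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

Calling `find_d_state_processes()` periodically during a long copy gives a snapshot of blocked processes. A process that is briefly in "D" during heavy I/O is normal; only one that stays there across many samples suggests a hang. "Defunct" processes are a different state ("Z") and are not reported by this sketch.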
Igor Posted May 20, 2023

11 hours ago, blue-v said:
I now upgraded to bullseye 6.1.11-meson64 #23.02.2 SMP PREEMPT. And the problem... ... seems to be gone!

Yes, this is exactly what we hope you will do first, before even thinking of complaining. Over several years many problems get fixed, so going down this path once again, just for you, because you don't run the latest kernel ... is plain stupid. I know you agree.

We are also pre-programmed: if you don't run the latest kernel, we ignore that user / report, as we are already way too small to address issues on the latest kernel, and there is nothing we can do about that. We have to at least try to prevent users from wasting time in such a grotesque manner. That's the point of "update first!"