Igor Posted September 2, 2016

The current main Armbian build hardware is a desktop-based "server" running Ubuntu Xenial: an i7-4790, 16 GB RAM and an SSD. We have already made lots of optimisations within the build process, but for 140 images I still need up to 14 hours even when the kernels are already cached. How could this time be cut down significantly, to a few hours, perhaps with server-grade hardware and further code optimisation? I currently don't have access to any much better hardware to just run "build all" and see.

I am looking for specific ideas for a build server upgrade, and for a sponsored rental / exchange for advertisement. A friend who knows something about server hardware already advised me: "Just get 2 x E5-2667 v4 and an ASUS board with onboard M.2 SSD" ... but this is (far) out of reach, and it seems rather stupid to own such a costly device for those rare events.
tkaiser Posted September 2, 2016

Let's have a look at the non-hardware approach first:

macbookpro-tk:boards tk$ cat * | awk -F"=" '/^LINUXFAMILY/ {print $2}' | sort | uniq -c | sort -n -r
  14 sun7i
  12 sun8i
   3 sun4i
   2 udoo
   2 s500
   1 toradex
   1 sun9i
   1 sun6i
   1 sun5i
   1 pine64
   1 odroidxu4
   1 odroidc2
   1 odroidc1
   1 marvell
   1 cubox

By building one image per $LINUXFAMILY and customising it with all the board-specific stuff just prior to closing the image (we already talked about that back in April), we reduce 43 builds to 15 (roughly a third). With Xenial as the recommended build host we're also on kernel 4.4 and can safely use btrfs, so with btrfs dedup capabilities in mind, the build-all script could for example create all 14 sun7i OS images from a single sun7i base image: each 2 GB in size, but with storage requirements of just 2 GB plus a few MB that contain the real differences (using snapshots and/or dedup).

I can't find the issue/thread now, but Zador had a few objections against this that have to be considered. Still, if we can tweak the build process at this point by relying on modern fs features, a reduction to 40% of the build time should be possible on the same hardware.
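A minimal sketch of the snapshot idea, assuming the base rootfs lives in a btrfs subvolume (the mount point and the two helper functions are hypothetical, just to illustrate the flow):

# build the family rootfs once, as a btrfs subvolume
btrfs subvolume create /mnt/build/sun7i-base
create_family_rootfs sun7i /mnt/build/sun7i-base    # hypothetical helper

# per board: a writable snapshot shares all blocks with the base,
# so only the board-specific changes consume additional space
for board in bananapi cubieboard2 lamobo-r1; do
    btrfs subvolume snapshot /mnt/build/sun7i-base /mnt/build/$board
    apply_board_tweaks $board /mnt/build/$board     # hypothetical helper
done

This is exactly the "2 GB plus a few MB" situation described above: 14 snapshots of a 2 GB base cost almost nothing until a file diverges.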
Tido Posted September 2, 2016

Btrfs RAID 5/6 Code Found To Be Very Unsafe & Will Likely Require A Rewrite

"It turns out the RAID5 and RAID6 code for the Btrfs file-system's built-in RAID support is faulty and users should not be making use of it if they care about their data."

Just in case you have RAID on the server..
Igor Posted September 2, 2016

I am not sure we would benefit much from changing the filesystem - most time is lost in packing / unpacking and in compiling headers, and I am just about to optimise / speed that up. But by doing this differently, yes, we can save some time. Some images differ only in the bootloader, which takes no time to change, but the images still need to be packed.
tkaiser Posted September 2, 2016

> Just in case you have RAID on the server..

Thank you for poisoning/hijacking this thread with FUD. Even mdraid's raid6 code caused data loss (undetected for over 5 years), this is not about RAID in general (btrfs on a hardware RAID is not affected), I'm happily using raid-0 with btrfs on my machine to speed things up, and guess what: we're neither as dumb as you think nor do we risk 'data loss' at this point, since if you had read what you linked to, you would have realised that the bug is contained in btrfs' scrub routines. We're talking here about speeding up image creation, which means using a btrfs feature to temporarily store a huge amount of data without needing that much physical storage. No one will be dumb enough to then run a scrub! So please stop posting links to clickbait sites. Thank you.

> I am not sure we would benefit much from changing the filesystem - most time is lost in packing / unpacking and in compiling headers. I am just about to optimise / speed that up.

But wouldn't that change if this were done just once instead of 14 times for sun7i, for example?
Igor Posted September 2, 2016

> But wouldn't that change if this were done just once instead of 14 times for sun7i, for example?

That might require (some) refactoring ... what if we just remove compression from the middle of the process and use a light one for the final image?
tkaiser Posted September 2, 2016

> That might require (some) refactoring ... what if we just remove compression from the middle of the process and use a light one for the final image?

Sure, refactoring would be necessary, and as Zador pointed out back then, our limited resources are better spent on other stuff. But knowing Zador a bit by now, maybe the many, many tweaks he has made to the build system in the meantime already ease such a 'per LINUXFAMILY' switch? I haven't looked into it.

I think I now understand what you're referring to with 'packing / unpacking' (compression): rootfs cache and final image? Regarding the first, btrfs with transparent file compression might also help (let btrfs do the job with light LZO compression, or change the script to skip compression entirely?); regarding the latter, a switch to a lighter compression (reverting to ZIP, which increases archive size by 30%?) would help, of course. But if this is the bottleneck, is it possible to make compression part of the image upload process to mirror.igorpecovnik.com? We use ZIP streaming with two web apps, which greatly improved download speeds and CPU utilisation: instead of creating one large archive first and then letting the browser download it, the compression is part of the download process, which starts immediately and also limits CPU usage. AFAIK 7zip supports stdout too, so this approach might work in the upload direction as well?
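A rough sketch of what streaming compression during upload could look like, assuming the mirror is reachable over SSH (host name and target path are made up; 7z's -so switch streams the archive to stdout, which works for stream formats like gzip/xz but not for the .7z container itself):

# compress on the fly while uploading - no large intermediate archive on disk
7z a dummy -txz -so "${version}.raw" | \
    ssh mirror.example.com "cat > /var/www/images/${version}.raw.xz"

The trade-off: the upload then runs at whichever is slower, compression or the uplink, but no time is spent writing and re-reading a local archive.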
Igor Posted September 2, 2016

I gained about two hours by reducing the compression ratio of the final images, and the images are only slightly bigger. Now I am trying the same on the rootfs cache.

This direct upload might be o.k., but it's not practical. I finish things here, check, and sometimes things break ... I'd rather double-check and then run an upload script. The upload time was not included in the build numbers above ... I have a 10 Mbit uplink, so do the math
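To put that uplink into perspective (assuming roughly 250 MB per compressed image, a made-up figure): 140 × 250 MB ≈ 35 GB, and 10 Mbit/s ≈ 1.25 MB/s, so a full upload would take about 35 000 MB / 1.25 MB/s = 28 000 s, nearly 8 hours of pure upload time on top of the build.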
tkaiser Posted September 2, 2016

Well, sure, maximum compression might take 4 times longer to gain 4 percent smaller archives, so please go with the improved settings.

Regarding the upload script: we have many workflows running at customers' sites where print servers have to feed hotfolders on other servers. Since various hotfolder implementations suck (they start processing files too early), we use scripts everywhere that stream the file into a directory next to the hotfolder and move it inside as part of a post-processing script. If you have to deal with a 10 Mbit uplink, maybe an automatic upload, with compression applied at this stage, to a hidden area (http://mirror.igorpecovnik.com/not_ready/ containing an index file hiding the contents), moving files one directory below manually once you've finished testing, is an option - if disk space on mirror.igorpecovnik.com allows it?
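A minimal sketch of that staging pattern, assuming SSH access to the mirror (the /not_ready/ path comes from the post above; everything else here is illustrative):

# upload into the hidden staging area as part of the build
scp "${version}.7z" mirror.igorpecovnik.com:/var/www/not_ready/

# later, once manual tests have passed, publish the file:
# mv within one filesystem is atomic, so downloaders never see a partial file
ssh mirror.igorpecovnik.com "mv /var/www/not_ready/${version}.7z /var/www/"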
Tido Posted September 2, 2016

> this is not about RAID in general

You are totally right, Thomas. As it is written, black on white: "Btrfs RAID 5/6 Code Found To Be Very.." So your comment to my comment was useless bullshit. One more time.
rodolfo Posted September 2, 2016

Keep it simple. Rented virtual servers are cheap and efficient for peak loads and offer excellent bandwidth, at a fraction of the cost of a homebrew setup.
zador.blood.stained Posted September 2, 2016

Maybe we should find the bottlenecks first? On my build host, for example, it's definitely single-threaded CPU power (possibly combined with relatively low DRAM throughput), which results in noticeable time spent on single-threaded CPU-heavy tasks like compiling headers in the chroot or packing .deb files with tons of files (linux-image, linux-headers). I'm not even talking about compiling in extras-buildpkgs: that takes ~5 hours per platform for me (first-time compilation with an empty ccache), because qemu-based emulation is terribly inefficient.

@Tido We are not talking about RAID yet, especially not about RAID 5/6. 1 or 2 good SSDs should have enough throughput not to be a bottleneck for the Armbian build process IMO.
Igor Posted September 2, 2016

@rodolfo I made a brief investigation of rentals, but if we go for many dedicated cores this also costs a fortune - perhaps I wasn't looking at the right deals? Our use case is 97% idle, and for the remaining 3% we need full horsepower. And it must be way faster than the current setup, otherwise this makes no sense.

@zador Kernel compilation is o.k., while packaging is single-threaded. Headers - we could leave installation to the user if we can't optimise it; just this part eats roughly one minute. What can be done about qemu?
zador.blood.stained Posted September 2, 2016

> @zador Kernel compilation is o.k., while packaging is single-threaded.

Yes, I was talking about the packaging part.

> Headers - we could leave installation to the user if we can't optimise it; just this part eats roughly one minute.

You mean not installing the headers package by default? This would help with build time, but the forum may get flooded with posts from users who can't compile their drivers, and since we use separate headers packages for each $LINUXFAMILY and $BRANCH, generic Debian/Ubuntu instructions don't apply, which further confuses people.

> What can be done about qemu?

Not much, unfortunately. Trying to bypass the chroot and use cross-compilation (for kernel headers or for extra packages) may create additional problems and would require extensive testing.

> Sure, refactoring would be necessary, and as Zador pointed out back then, our limited resources are better spent on other stuff. ...

I don't see too many advantages in "moving the cache boundary" and archiving a rootfs with preinstalled packages; using btrfs snapshots instead of the rootfs cache may improve build time, but it creates additional dependency problems for users who still use a Trusty build host and have an ext4 filesystem by default.

> I think I now understand what you're referring to with 'packing / unpacking' (compression): rootfs cache and final image? ...

7-zip on normal / fast settings may still show better results than zip; this needs to be tested on our specific data (OS images).
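Something like the following would answer that question empirically (standard 7z/zip flags and GNU time; the image file name is a placeholder, and timings will obviously differ per machine):

# compare compression levels and resulting sizes on one sample OS image
for mx in 1 3 5 9; do
    /usr/bin/time -f "7z -mx=$mx: %e s elapsed, %M kB peak RSS" \
        7z a -mx=$mx "test-mx$mx.7z" "${version}.raw"
done
/usr/bin/time -f "zip -9: %e s elapsed" zip -9 test.zip "${version}.raw"
ls -l test-mx*.7z test.zip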
zador.blood.stained Posted September 2, 2016

> Current main Armbian build hardware is a desktop-based "server" running Ubuntu Xenial: an i7-4790, 16 GB RAM and an SSD. ... for 140 images I still need up to 14 hours even when the kernels are already cached.

Hm, I think a CPU with 4 cores may be able to run 2 containers in parallel with fixed CPU affinity (2 cores per container) and a proper memory limit. This could cut several hours of single-threaded tasks from the total build time.
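A rough illustration of the affinity part (board names and the script invocation are made up; memory limits would need cgroups on top of this):

# two build instances pinned to disjoint core pairs
taskset -c 0,1 ./compile.sh BOARD=bananapi &
taskset -c 2,3 ./compile.sh BOARD=odroidc2 &
wait    # block until both builds have finished

While one instance sits in a single-threaded packaging step, the other can still keep its two cores busy.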
tkaiser Posted September 2, 2016

> Hm, I think a CPU with 4 cores may be able to run 2 containers in parallel with fixed CPU affinity (2 cores per container) and a proper memory limit. This could cut several hours of single-threaded tasks from the total build time.

That's a great idea! And maybe, instead of fixed CPU affinity, even overcommitting CPU resources combined with 'timed' build starts would improve things further. But that would of course require some sort of master node and inter-machine communication...

Back in the days when the graphic arts industry used PostScript, we implemented conversion to PDF the same way. Adobe's cheap Distiller version for Windows ran single-threaded, and the multi-threaded 'Distiller Server' was very expensive. 12 Distiller VMs on an ESXi host with 8 CPU cores performed best.
Igor Posted September 2, 2016

> Hm, I think a CPU with 4 cores may be able to run 2 containers in parallel with fixed CPU affinity (2 cores per container) and a proper memory limit.

Good idea! You mean running two (why not four or more, if we upgrade the hardware at some point?) instances of the script in parallel, with some simple job scheduler?
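The simplest possible such scheduler might not even need containers - xargs can already throttle parallel jobs (a sketch only; the board list and the -P value are arbitrary):

# run at most 2 builds at a time, starting the next as soon as one finishes
printf '%s\n' bananapi cubietruck odroidc2 pine64 | \
    xargs -P 2 -I {} ./compile.sh BOARD={}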
zador.blood.stained Posted September 2, 2016

> You mean running two (why not four or more, if we upgrade the hardware at some point?) instances of the script in parallel, with some simple job scheduler?

My initial thought was about something like systemd-nspawn in template or ephemeral mode with bind and tmpfs mounts, like here, but if we properly split tasks between instances and modify debootstrap to use unique names for each cache file, we may not need containers at all. Right now there are some tasks that use non-unique file names (like debootstrap) and some tasks that need read/write locks (e.g. one task could try to install the armbian-firmware .deb file while it is still being packaged by another task).
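For the locking part, flock is probably sufficient - a sketch, with made-up lock and cache file names and a hypothetical helper:

# serialise access to one shared rootfs cache file; a second instance
# hitting the same lock blocks here until the first one is done
(
    flock -x 200
    [[ -f "$CACHEDIR/rootfs-jessie.tgz" ]] || create_rootfs_cache jessie   # hypothetical helper
) 200>/var/lock/armbian-rootfs-jessie.lock

A shared (-s) lock could be used for read-only consumers of the same file, so parallel image builds would only wait on each other while the cache is actually being (re)built.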
Igor Posted September 2, 2016

Skipping the source checks would also save some time. Anyway, these optimisations can wait until after the release.
Igor Posted September 4, 2016

Lowering the compression ratio saved almost 3 hours in total.
Igor Posted September 4, 2016

If we make these unique:

$CACHEDIR/sdcard
$CACHEDIR/armada_default_jessie_no
$tmpraw -> "armada_default_jessie_no.raw", which gets its final name the same way

anything else - without detailed code inspection?
https://github.com/igorpecovnik/lib/commit/194b730d5c3dd6a8b52462df33ee788ac2a5a182

Probably a better way could be found?
tkaiser Posted September 4, 2016

Hmm... if we're able to send the compression process to the background, then maybe firing up multiple instances of the script, each dealing with a subset of all boards, isn't even necessary? So: make an own compress_image function from these lines and call nice compress_image & when the whole process is started from build-all.sh (in this case output to stdout could also be suppressed, and while the finished image is being compressed the script already walks through the single-threaded steps for the next one)?

Edit: After comparing 7-zip compression times with -mx=3 vs. -mx=9, and given that Igor only saves 3 hours by changing compression parameters, sending compression to the background would bring just another saved hour. But it would be easy to implement a new function that simply uses 'mktemp -d' to create a temporary directory, moves $filename, ${version}.raw and armbian.txt inside, does a cd inside, and then performs signing and compression there. And build-all.sh would just have to create /tmp/.build_all to switch behaviour.
zador.blood.stained Posted September 4, 2016

@Igor If you want to touch this now (before the release), maybe we should start by splitting build-all.sh into 3 stages:

1. Build all kernels, u-boots and board support packages. If we properly define CLEAN_LEVEL, this stage can easily run in parallel.
2. Build all external packages (EXTERNAL_NEW=compile). This stage can run 2 containers in parallel, and I found one possible way to speed up compilation that needs to be investigated and tested later.
3. Build all images.

> If we make these unique: $CACHEDIR/sdcard, $CACHEDIR/armada_default_jessie_no, $tmpraw -> "armada_default_jessie_no.raw" ... anything else - without detailed code inspection?

1. We need to make the whole $CACHEDIR unique to avoid conflicts, and optionally split the rootfs cache directory from the other directories.
2. We need read/write locks on the rootfs cache files, so two debootstraps for one $LINUXFAMILY won't conflict.
3. We need to check whether bind mounts work well in this situation (we use them to install .deb files and unpack desktop overlays inside the chroot).

... to be continued...
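Point 1 could be as small as this (a sketch; the variable names follow the thread, the path is arbitrary):

# give every parallel instance a private cache tree so sdcard/, mount/
# and the temporary .raw image can never collide between instances
export CACHEDIR="$(mktemp -d /tmp/armbian-cache.XXXXXX)"
mkdir -p "$CACHEDIR/sdcard" "$CACHEDIR/mount"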
zador.blood.stained Posted September 5, 2016

@Igor You should probably delay the release to include u-boot 2016.09 (scheduled release date: 12 September): http://forum.armbian.com/index.php/topic/1945-orange-pi-plus-2e/#entry15002
Igor Posted September 13, 2016

For me it would be best to put out an update within one week (19 September) or the week after (26 September).

sunxi - libvdpau, any luck?

Running the image creation script in parallel looks like some work, but it's doable. Let's slowly move in that direction.

One thought regarding hardware: what about upgrading to 64 GB RAM and doubling the CPU cores? If we ran everything in RAM and kept only the output images on a solid-state drive, how much extra time could we save?
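For the RAM idea, no hardware change is needed to get a first estimate - a tmpfs mount over the cache directory shows the effect immediately (a sketch; the size is arbitrary and must fit into available RAM):

# keep the whole working set in RAM; only finished images get copied to the SSD
mount -t tmpfs -o size=12G tmpfs "$CACHEDIR"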
zador.blood.stained Posted September 13, 2016

> sunxi - libvdpau, any luck?

I did a fresh recompilation of everything after the latest changes, and it all seems to work for me (at least for the Jessie target). In addition, compilation time for me dropped from >10 hours to <6 hours, which is a noticeable improvement. In case you still get errors, please post or PM the console logs.

> Running the image creation script in parallel looks like some work, but it's doable. Let's slowly move in that direction.

Yes, but preferably after the release, not before. To reduce build time you could disable auto-building of the H3 mainline images, since IMO there is no reason to update them yet.

> One thought regarding hardware: what about upgrading to 64 GB RAM and doubling the CPU cores?

While some tasks like compression or compilation will benefit from more cores, other tasks are still not parallelised.

> If we ran everything in RAM and kept only the output images on a solid-state drive, how much extra time could we save?

I think it's better to check first whether storage is the problem at all, at least by analysing the %iowait statistic or looking at htop's CPU usage bars with the "Detailed CPU Time" option enabled.
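For example, run these alongside a build (both tools ship with the sysstat package; 5 is just the sampling interval in seconds):

# per-device view: a saturated SSD shows high %util and await values
iostat -x 5

# overall CPU breakdown: high %iowait with low %user means the CPU is
# mostly waiting on disk rather than doing work
sar -u 5

If %iowait stays near zero, more RAM or a tmpfs won't buy much, and the money is better spent on cores and clock speed.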
Igor Posted September 13, 2016

- o.k., I will start a clean recompilation right away.
- yes, not within the release. I should have said "within a few months" ... this is my pain only.

Hardware: yes, I will do more research on this.
tkaiser Posted September 13, 2016

> - o.k., I will start a clean recompilation right away.

In case you have not started yet... sending the last task to the background by using a new sign_and_compress function would also speed up execution (while 7z is busy, the next single-threaded tasks can already start; at this stage the image is already closed, so it's safe). All that's needed would be an 'export BUILD_ALL=TRUE' in build-all.sh: http://pastebin.com/d2AScqxk

diff --git a/debootstrap-ng.sh b/debootstrap-ng.sh
index 256e273..056d7fb 100644
--- a/debootstrap-ng.sh
+++ b/debootstrap-ng.sh
@@ -456,20 +456,35 @@ create_image()
 	# stage: write u-boot
 	write_uboot $LOOP
 
-	cp $CACHEDIR/sdcard/etc/armbian.txt $CACHEDIR/
-
 	# unmount /boot first, rootfs second, image file last
 	sync
 	[[ $BOOTSIZE != 0 ]] && umount -l $CACHEDIR/mount/boot
 	[[ $ROOTFS_TYPE != nfs ]] && umount -l $CACHEDIR/mount
 	losetup -d $LOOP
-	mv $CACHEDIR/tmprootfs.raw $CACHEDIR/${version}.raw
-	cd $CACHEDIR/
+	if [[ ${BUILD_ALL} == TRUE ]]; then
+		TEMP_DIR="$(mktemp -d $CACHEDIR/${version}.XXXXXX)"
+		cp $CACHEDIR/sdcard/etc/armbian.txt "${TEMP_DIR}/"
+		mv "$CACHEDIR/tmprootfs.raw" "${TEMP_DIR}/${version}.raw"
+		cd "${TEMP_DIR}/"
+		nice -n 19 sign_and_compress &
+	else
+		cp $CACHEDIR/sdcard/etc/armbian.txt $CACHEDIR/
+		mv $CACHEDIR/tmprootfs.raw $CACHEDIR/${version}.raw
+		cd $CACHEDIR/
+		sign_and_compress
+	fi
+}
 #############################################################################
 
+# sign_and_compress
+#
+# signs and compresses the image
+#
+sign_and_compress()
+{
 	# stage: compressing or copying image file
 	if [[ $COMPRESS_OUTPUTIMAGE != yes ]]; then
-		mv -f $CACHEDIR/${version}.raw $DEST/images/${version}.raw
+		mv -f ${version}.raw $DEST/images/${version}.raw
 		display_alert "Done building" "$DEST/images/${version}.raw" "info"
 	else
 		display_alert "Signing and compressing" "Please wait!" "info"
@@ -488,8 +503,12 @@
 		zip -FSq $filename ${version}.raw armbian.txt *.asc sha256sum
 	fi
 	rm -f ${version}.raw *.asc armbian.txt sha256sum
-	local filesize=$(ls -l --b=M $filename | cut -d " " -f5)
-	display_alert "Done building" "$filename [$filesize]" "info"
+	if [[ ${BUILD_ALL} == TRUE ]]; then
+		cd .. && rmdir "${TEMP_DIR}"
+	else
+		local filesize=$(ls -l --b=M $filename | cut -d " " -f5)
+		display_alert "Done building" "$filename [$filesize]" "info"
+	fi
 	fi
 }
 #############################################################################

Edit: Using 'nice -n 19 sign_and_compress &' in the BUILD_ALL=TRUE case would be even better, of course.
Igor Posted September 13, 2016

Thanks. I am recompiling the debs first ... I'll include this when building all.
zador.blood.stained Posted September 13, 2016

> In case you have not started yet... sending the last task to the background by using a new sign_and_compress function would also speed up execution ... http://pastebin.com/yLRzugcF ...
> Edit: Using 'nice -n 19 sign_and_compress &' in the BUILD_ALL=TRUE case would be even better, of course.

Sending compression to the background is a good idea, but I would prefer to refactor this first. In addition, I think you can't nice bash functions: you need to nice either the zip/7za process itself or do some magic with "declare -f".