Nick Posted March 21, 2016

Are clusters able to do anything useful yet? I'm vaguely aware of OpenStack, I know that Google's server farms are full of consumer PCs all networked together, and I've even seen people making RasPi clusters of varying sizes, but are they any good for normal uses? For example, if I bought 10 OPi PCs and stuck them in a case with an Ethernet switch and a hard drive, could I build Armbian really, really quickly (typing make -j40 would be nice ;-) or would it ultimately just get bogged down and do nothing?
zador.blood.stained Posted March 22, 2016

> For example if I bought 10 OPi PC's and stuck them in a case with an Ethernet switch and a hard drive could I build Armbian really really quickly (typing make -j40 would be nice ;-) or would it ultimately just get bogged down and do nothing?

For kernel/u-boot compilation you could use distcc. The debootstrap/image creation process doesn't look cluster-friendly, so having more than one device won't help there. And keep in mind that while 10 OPi PCs should build the kernel faster than one OPi PC if you set things up properly, it probably won't be faster than a more or less modern x86/x64 PC, especially one with an SSD and ccache. Also keep in mind that you probably want to leave one core unused to process network and storage I/O interrupts.
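For reference, a minimal distcc setup for distributing kernel compilation across such nodes might look roughly like this (the subnet, host addresses and core counts are hypothetical; adjust for your own cluster):

```shell
# On each worker node: run the distcc daemon, allowing
# connections from the (hypothetical) build network.
distccd --daemon --allow 192.168.1.0/24 --jobs 4

# On the machine driving the build: list the workers (host/slots),
# then hand compile jobs off to them. -j should roughly match the
# total number of remote cores.
export DISTCC_HOSTS="192.168.1.11/4 192.168.1.12/4 192.168.1.13/4"
make -j12 CC="distcc gcc" zImage modules
```

Note that preprocessing and linking still happen locally, which is one reason the master node's CPU and I/O remain bottlenecks even with many workers.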
tkaiser Posted March 22, 2016

> Are clusters able to do anything useful yet?

Sure, with powerful nodes and when the task is 'cluster friendly'. As Zador already said: to build Armbian faster, get a decent x86 box, use the right tools (ccache) and identify bottlenecks (I/O for some tasks --> SSD). Depending on what you do (kernel compilations or trying out a variety of new images) the bottlenecks might differ, and so might the areas to invest in (more CPU cores vs. faster SSD, for example).

ARM clusters might be nice to play with, to learn basic stuff using inexpensive nodes, or to learn 'active benchmarking' (identifying/understanding bottlenecks and improving the setup/software instead of throwing additional cluster nodes in to compensate for a wrong setup). For real work they're too slow for most use cases. That might differ if you can make use of OpenCL and similar, and use the powerful GPUs that are present even in some low-end SoCs (i.MX6, for example) -- then you get a cluster with an excellent performance-per-watt ratio. But trying this with slow A7/A53 cores is just a waste of time/resources, or fooling yourself -- the latter being reason number one why people set up RPi clusters and the like.
Nick (Author) Posted March 22, 2016

That's why I like posting here: within a few hours I've discovered two great new projects (new to me), distcc and OpenCL. It's unlikely that I would go down the Pi cluster route, certainly not the x10 route anyway; the post was in part inspired by another article I was reading.

I'm using debootstrap-ng, so I am making use of ccache, and to be honest the u-boot and kernel compilations are pretty quick on my desktop PC anyway, despite being in a virtual machine. At the moment I'm musing over how I might replace my Armbian VM (the other inspiration for the post); the main problem I have right now isn't really compilation as such, it's disk I/O. For space reasons the VM image is hosted on the NAS drive via a Gigabit Ethernet connection.

My options are: simply invest in a larger SSD for my desktop, which isn't a problem except that only Armbian is driving the requirement; replace the NAS drive; or build a dedicated compilation box out of an old laptop I have kicking around. At some point in the future I'm going to replace the NAS drive anyway, as it's old and underpowered in lots of ways; the question is whether I just buy something generic off the shelf that has decent performance, or try to build something. My existing NAS drive already performs extra duties -- for example it's hosting apt-cacher to speed up Armbian building (and target development) -- so I wondered about having it build Armbian as well. If I'm honest, I'm thinking of just buying a bigger SSD for my desktop and being done with it, but any thoughts would be appreciated.
zador.blood.stained Posted March 22, 2016

> I'm using debootstrap-ng so I am making use of ccache [...] the main problem I have right now isn't really compilation as such it's disk I/O. For space reasons the VM image is hosted on the NAS drive via a Gigabit Ethernet connection.

ccache (compiler cache) does not depend on the debootstrap version, while apt-cacher (package cache) is currently implemented only for -ng.

> My options are simply invest in a larger SSD for my desktop [...] If I'm honest, I'm thinking of just buying a bigger SSD for my desktop and being done with it, but any thoughts would be appreciated.

Buying an SSD and installing Ubuntu on it natively (for dual-boot) may give the best performance you can achieve without a separate build host. As for the laptop: it would probably lack raw CPU power and may simply overheat -- effective cooling and laptops are rarely combined in a proper way.
tkaiser Posted March 22, 2016

Regarding stuff on a NAS: if it's a somewhat more intelligent NAS (I hope you use FreeNAS or something and not proprietary bullsh*t?) then switching to SAN mode might speed up things like an Armbian build a lot.

NAS = SMB/NFS, for example
SAN = iSCSI, for example (the build host just uses the remote share/LUN as a block device with a local filesystem on top, which makes a difference regarding the use of FS caches/buffers that can speed up a few operations 100 times or even more)

Switching from NAS to SAN not only increases random I/O (or better: frequent write/read operations on a large bunch of small files) a lot, but also helps if the network connection is the bottleneck, by using transparent filesystem compression. If you put an uncompressed desktop Armbian image (~2GB) on a NAS and you only have a GbE connection, then sequential read/write transfer speed won't exceed ~100MB/s. If you do the same with an iSCSI LUN and btrfs with compression=zlib on top, sequential transfer speeds will be 2.5 - 3 times faster. And in case both NAS and build host have 2 NICs, you might be able to double the speed again by using I/O multipathing (not possible with NAS and link aggregation).

So as usual: 'work smarter not harder' is the most important factor (but using a fast SSD will do even better)
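On a Linux-based filer (FreeNAS has its own UI for this, and a proprietary NAS would do it through its web interface), exporting an iSCSI LUN can be sketched with targetcli; all names, paths, sizes and IQNs below are examples, not anything from this thread:

```shell
# Create a 100 GB file-backed backstore on the filer's data volume
targetcli backstores/fileio create name=armbian file_or_dev=/data/armbian.img size=100G

# Create an iSCSI target and attach the backstore as a LUN
targetcli iscsi/ create iqn.2016-03.local.nas:armbian
targetcli iscsi/iqn.2016-03.local.nas:armbian/tpg1/luns create /backstores/fileio/armbian

# Allow the build host's initiator to log in
targetcli iscsi/iqn.2016-03.local.nas:armbian/tpg1/acls create iqn.2016-03.local.build:initiator
```

The filer only ever sees opaque block reads/writes against that LUN; the filesystem, caching and locking all live on the build host, which is exactly the point of the SAN approach described above.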
Nick (Author) Posted March 22, 2016

> ccache (compiler cache) does not depend on debootstrap version, while apt-cacher (packages cache) currently implemented only for ng.

Good to know, though personally I haven't had a need to go back to debootstrap; the -ng version works perfectly for me.

> Buying an SSD and installing Ubuntu on it natively (for dual-boot) may give the best performance you can achieve without separate build host. As for laptop - it would probably lack raw CPU power and may simply overheat.

Quite possibly. I did run Armbian natively on my desktop at first, but then there were issues such as /tmp permissions, and simply the fact that I was running a lot of code as root made me nervous, so I set up the VM. As for the laptop, I would strip it down to just the motherboard and stick it in an old 19" rack mount case so cooling could be addressed, but then there is still the problem of yet another box running 24/7, or having to come up with some sort of power switch / wake-on-LAN etc. Sadly my motherboard has issues with 2 hard drives; it insists on booting from the wrong one regardless of settings. I think I'll just get another SSD, or maybe a whole new PC.
zador.blood.stained Posted March 22, 2016

> Regarding stuff on a NAS: If it's a bit more intelligent NAS [...] then switching to SAN mode might speed up things like an Armbian build a lot.

I believe he has some kind of Synology or Netgear NAS; it should support iSCSI, but it probably has only one LAN port.
Nick (Author) Posted March 22, 2016

@tkaiser Yes, it is proprietary; it's a Netgear ReadyNAS Duo (the ARM v7 version). It was second hand from eBay when I bought it. Though as a simple NAS box, in the strictest meaning of the words, it's not bad at all. It is also very open, with full root SSH access etc. TBH, if someone asked me to recommend a cheap NAS box for home or office, I would be happy to recommend it for general day-to-day use. Now onto more unusual uses, such as compiling Armbian...

> NAS = SMB/NFS for example SAN = iSCSI for example (the build host just uses the remote share/LUN as a block device with a local filesystem on top which makes a difference regarding use of FS caches/buffers that can speed up a few operations 100 times or even more)

That I didn't know about... FreeNAS looks interesting; I'm guessing it needs a little more than an H3 though. I'll certainly do some more research into iSCSI. It looks like the ReadyNAS supports it as well, so I might have a play with that first to get my head around it.

Edit: @zador.blood.stained yes, only 1 LAN port, but then so does my PC at the moment...
tkaiser Posted March 22, 2016

> I believe he has some kind of Synology Netgear NAS, it should support iSCSI but it probably has only one LAN port.

Anyway, as long as it supports iSCSI reliably, a superior way to combine a VM (hosted on Windows, Linux or OS X) with iSCSI storage is:

- define a local datastore to be used for /boot and /
- assign the VM as many CPU cores as possible
- install Ubuntu Xenial there
- set up iSCSI from within the Ubuntu VM
- create a btrfs FS on the iSCSI LUN with compression=zlib
- move the Armbian build environment to this place

This might speed things up a lot: in situations where I/O was the bottleneck before, local filesystem semantics now improve random I/O, and sequential transfer speeds explode since CPU cores jump in by compressing the data on the fly. Since the different steps in an Armbian build are either CPU or I/O intensive, you won't lose performance when CPU cycles are 'wasted' improving sequential transfers between SAN and VM.
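From inside the Ubuntu VM, the iSCSI and btrfs steps above might be sketched like this; the portal IP, target IQN, device name and mount point are all hypothetical, and note that the actual btrfs mount option is spelled compress=zlib:

```shell
# Discover the target on the filer and log in (open-iscsi initiator)
iscsiadm -m discovery -t sendtargets -p 192.168.1.2
iscsiadm -m node -T iqn.2016-03.local.nas:armbian -p 192.168.1.2 --login

# The LUN now appears as a local block device (e.g. /dev/sdb);
# format it with btrfs and mount with transparent zlib compression
mkfs.btrfs /dev/sdb
mkdir -p /mnt/armbian-build
mount -o compress=zlib /dev/sdb /mnt/armbian-build

# Move the Armbian build environment onto the compressed filesystem
mv ~/armbian /mnt/armbian-build/
```

Everything written to that mount is compressed on the fly by the VM's CPU cores before it ever crosses the wire, which is how the GbE limit gets sidestepped.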
Nick (Author) Posted March 22, 2016

Thank you very much, I'll try that with the ReadyNAS tonight and let you know how I get on.
tkaiser Posted March 22, 2016

> Thank you very much, I'll try that with the ReadyNAS tonight and let you know how I get on

Two remarks:

The 'use Xenial as build host' recommendation is necessary to be able to use btrfs (Xenial uses kernel 4.4.x, and since btrfs code lives inside the kernel you don't want to use btrfs on anything with a kernel version less than 4.x!), but it is a bit problematic, since Xenial is still labeled experimental and you get GCC 5.3 there, which might break some builds -- though not the ones you're interested in (H3 -- all my builds in the last weeks were done on Xenial with GCC 5.2/5.3).

The main difference between NAS and SAN is the use case: you use NAS if you want to share files between a couple of different hosts/users. That's what NAS protocols like SMB, NFS, AFP and so on are designed for (including locking semantics, on-the-fly conversion between different encodings and other stuff). All of this is unnecessary when one single 'share' is used by one host, which is where even the most primitive SAN implementations can shine. With iSCSI, an amount of storage you defined as a LUN on your filer will be used exclusively by your build host (be it physical or virtual) as a block device, so the filesystem in question (and all the local caching/locking) lives solely on your host, and in many cases the filer isn't involved at all, since Linux is quite good at caching and avoiding actual writes to disk (--> in your case, a slow network transfer). This alone is more effective than sharing the VM image via NAS protocols. But the most interesting part is being able to use transparent filesystem compression, since with this you overcome the GbE limitation easily (trading slower transfer speeds for higher CPU utilization). In most cases it's more of a 'cheap NAS box' limitation, since these devices normally aren't able to saturate the link anyway.
If your NAS box is able to reach 60 MB/s, then by using btrfs with compression=zlib on top of the iSCSI LUN you're able to improve that to 150 MB/s or above (depending on the data sent over the wire, which in the case of Armbian builds is highly compressible).
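The claim that build data is highly compressible is easy to sanity-check: a stream of repetitive, build-log-like text (the hypothetical sample below) shrinks dramatically under zlib/gzip, which is exactly why compress=zlib can beat a saturated GbE link.

```shell
# Generate 1 MB of repetitive text and compare raw vs gzip size;
# ratios well above 10x are typical for this kind of data.
yes "Reading package lists... Done" | head -c 1048576 > sample.txt
gzip -k sample.txt
ls -l sample.txt sample.txt.gz
```

Real build trees (source code, logs, package lists) won't compress quite as extremely as a pathological sample like this, but they are still far more compressible than, say, already-compressed media files.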
Nick (Author) Posted March 22, 2016

> The 'Use Xenial as build host' recommendation is necessary

Noted, Xenial is downloading now (2 Mbps ADSL means it takes a while).

> The main difference between NAS and SAN is the use case: You use NAS if you want to share files between a couple of different hosts/users. [...] All this is not needed when it's about using one single 'share' by one host. This is where even the most primitive SAN implementations can shine.

Sharing with one host is fine, for the moment at least, especially if I can get sub-10 sec on the Armbian build timer.
Nick (Author) Posted March 22, 2016

> Anyway, as long as it supports iSCSI reliably a superior way to combine a VM (hosted on Windows, Linux or OS X) with iSCSI storage is: define a local datastore to be used for /boot and /

Out of interest, how much does Xenial need for boot and root? Is 1 or 2 GB enough?
tkaiser Posted March 22, 2016

> Out of interest, how much does Xenial need for boot and root, is 1 or 2 GB enough?

No idea, I chose the desktop image to be able to compare 'desktop usage' between a fast PC and the SBCs we're dealing with. And there it's more like 'use 10 GB at least':

    root@armbian:/# df -h | grep "/$"
    /dev/mapper/ubuntu--vg-root   94G   27G   63G  30% /
    root@armbian:/# du -sh /var/git
    17G     /var/git
    root@armbian:/# du -sh /usr/local
    2,7G    /usr/local
tkaiser Posted March 22, 2016

On topic: the just-announced NanoPi M1 (H3 based) for $11 would be a perfect candidate for clustering. Since the PCB is rather small at 50x64mm, and since shipping for a single device costs $10, only bulk orders would make some sense. But more importantly, the PCB layout would allow designing a cluster tube with an internal diameter of approx. 80 mm and 7 NanoPis in each row, overlapping so that Ethernet and the 2 USB ports are accessible from the outside, and you could design a cooling solution with one strong but silent 80 mm fan at the tube's bottom. Each row of 7 NanoPis has a height of 68mm, due to the Micro USB and the single USB receptacles blocking each other, so you would end up with a cluster consisting of 28 nodes in 4 rows at a height of 30cm. With heatsinks and optimised airflow all nodes would stay cool and the cluster would be almost silent (the combination of these 2 features would separate this cluster design from more common ones, which all show wasted space and a broken thermal design, e.g. Picocluster).

Now add a 150W PSU, a 32-port switch, a huge amount of cabling, 28 small SD cards for bootloader/kernel and one GbE-capable Banana Pi as cluster master node (rootfs on NFS), put all this in a nice tube enclosure with a handle on top ("cluster to go [tm]")... just to realize that this design will easily be outperformed by a multi-core i7 solution.
Nick (Author) Posted March 22, 2016

They look nice; sadly they are more expensive to get to the UK than Orange Pi PCs, and I couldn't confirm whether that was the 512 MB or 1 GB version either. That said, I've run out of OPi PCs at the moment; I'm waiting for more to come from China. I do like your cluster-to-go idea though, especially the pipe with a big fan at one end. I think I'll test out iSCSI first though ;-)
tkaiser Posted March 22, 2016

I also like these cluster ideas and start to scribble a design from time to time (just to throw it away later, since the whole idea is a bit... useless). You could build nice boxes 30x30x30cm in size using this 'thermal tube' design, but you would waste so much space on cabling and a contained desktop switch that it's just moronic. If you want to learn cluster stuff, then 3 nodes are already enough. And if you really want to pack a huge number of ARM cores together, then don't use SBCs (cabling problem); use either SoMs on suitable base boards (containing Ethernet switch ICs and only 1, 2 or 3 GBit connections to the outside) or even better interconnects that don't skimp on fast network/storage either. Then you get 384 fast Cortex-A57 ARMv8 cores in 2 rack units, for example. This low-end clustering approach would only be useful for specific use cases IMO (e.g. using batteries, solar cells and passive cooling).
Nick (Author) Posted March 22, 2016

I think you are right about throwing them away as useless unless you have a particular application in mind. It's a nice idea, but like most things, reality gets in the way ;-)

I have iSCSI running: 42 minutes for the first compile (not including downloading of source code), 6 minutes for a subsequent build. I'm pretty sure cached builds were taking around 19 minutes over NFS, so a huge improvement, thank you.

Not really for here, but it came up during the clean build of Armbian: u-boot fails to build unless you change line 238 of configuration.sh from BOOTCONFIG="orangepi_h3_defconfig" to BOOTCONFIG="orangepi_pc_defconfig".
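For anyone hitting the same u-boot failure, that one-line change can be scripted; this assumes the file still contains the exact H3 defconfig string quoted above:

```shell
# Swap the generic H3 u-boot defconfig for the Orange Pi PC one
# in the Armbian build script's configuration.sh
sed -i 's/BOOTCONFIG="orangepi_h3_defconfig"/BOOTCONFIG="orangepi_pc_defconfig"/' configuration.sh
```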
tkaiser Posted March 22, 2016

> 6 minutes for a subsequent build. I'm pretty sure cached builds were taking around 19 minutes over NFS [...] uboot fails to build unless you change line 238 of configuration.sh

Hmm, did this patch fail? Otherwise it's a mystery to me. And 6 min vs. 19 isn't that amazing -- just 3 times faster. When we start to optimise server/storage stuff at customers, I'm only satisfied if we exceed 500%.
Nick (Author) Posted March 22, 2016

> Hmm did this patch fail? Since otherwise it's a mystery to me. And 6 min. vs. 19 isn't that amazing. Just 3 times faster.

That patch isn't being included; sorry, I should have said that I'm building dev, not default. It always helps when you have all of the information. 500% is great, but I'll take anything right now. Of course, feel free to come over and tweak it for me, I'll PM you my address.