
squid4t2

Members
  • Posts

    2
  • Joined

  • Last visited

Reputation Activity

  1. Like
    squid4t2 reacted to tkaiser in Some storage benchmarks on SBCs   
    Now a real world example showing where you end up with the usual 'benchmarking gone wrong' approach. Imagine you need to set up a web server for static contents only. 100 GB of pure text files (not that realistic but just to show you how important it is to look closer). The server will be behind a leased line limited to 100 Mbits/sec. Which SBC to choose?
     
    The usual benchmark approaches tell you to measure sequential transfer speeds and nothing else (which is OK for streaming use cases where DVD images several GB in size each are provided, but which is absolutely useless when we're talking about accessing 100 GB of small files in a random fashion -- the 'web server use case'). Then the usual benchmark tells you to measure throughput with iperf (web serving of small files is about latency, quite the opposite) and some silly stuff measuring how fast your web server is on the loopback interface, with server and test tool on the same machine, not testing the network at all (how does that translate to any real-world web server usage? Exactly: not at all).
     
    If we rely on the passive benchmarking numbers and have in mind that we have to serve 100 GB at a reasonable cost we end up thinking about an externally connected HDD and a board with GbE (since iperf numbers look many times faster than Fast Ethernet) and a board that shows the highest page request numbers testing on the local machine (when the whole 'benchmark' turns into a multi-threaded CPU test and has nothing to do with web serving at all). Please don't laugh, but that's how usual SBC comparisons deal with this.
     
    So from the list above you choose an external 500 GB HDD as the storage implementation, since USB performance looks ok-ish with all boards (+30 MB/s), and a NanoPi M3, since iperf numbers look nice (GbE) and, most importantly, it will perform best on the loopback interface since it has the most and the fastest CPU cores.
     
    This way you end up with a really slow implementation since accessing files is more or less only random IO. The usual 2.5" notebook HDD you use on the USB port achieves less than 100 IOPS (see above result for USB HDD on Banana Pro with UASP incapable enclosure). By looking at iperf performance on the GbE interface you also overlooked that your web server is bottlenecked by the leased line to 100 Mbits/sec anyway.
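    The arithmetic behind that bottleneck can be sketched in Python. The average file size is an assumption for illustration; the IOPS figures are the ones quoted in this thread (USB HDD without UASP, Samsung EVO at 4K):

```python
# Back-of-envelope: throughput when serving many small files is bound
# by IOPS, not sequential speed. One seek per file, filesystem overhead
# ignored; AVG_FILE_KB is an assumed value, not from measurements.
AVG_FILE_KB = 20

def small_file_mbit_per_s(iops, file_kb=AVG_FILE_KB):
    """Rough best-case random-read throughput in Mbit/s."""
    return iops * file_kb * 8 / 1000  # KB/s -> Mbit/s

hdd = small_file_mbit_per_s(90)    # USB HDD, UASP-incapable enclosure
sd = small_file_mbit_per_s(875)    # Samsung EVO, 4K random reads

print(f"HDD: {hdd:.1f} Mbit/s, SD card: {sd:.1f} Mbit/s")
```

    Under these assumptions the HDD cannot even fill the 100 Mbit/sec leased line on a cold cache, while the cheap SD card can.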
     
    What to do? Use HTTP transport stream compression, since text documents show a compression ratio of more than 1:3, many even 1:10 (every modern web server and most browsers support this). With this activated the NanoPi now reads the text documents from disk and compresses them on the fly, and based on a 1:3 compression ratio we can stream 300 Mbits/sec through our 100 Mbits/sec line. Initially accessing files is still slow as hell (the lowest random IO performance possible by choosing a USB HDD), but at least once a file has been read from disk it can saturate the leased line.
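    To see why transport compression pays off on text, here is a minimal Python sketch. The sample string is artificial and highly repetitive, so it compresses far better than the >1:3 assumed above for real documents:

```python
import gzip

# Artificial, repetitive sample text -- real-world documents compress
# less well, typically somewhere beyond 1:3 as assumed in the post.
text = ("The usual benchmark numbers tell you nothing here. " * 200).encode()
compressed = gzip.compress(text)

ratio = len(text) / len(compressed)
print(f"{len(text)} -> {len(compressed)} bytes (ratio {ratio:.0f}:1)")
```

    The browser advertises support via the Accept-Encoding request header; the server only sends the compressed representation to clients that can handle it.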
     
    So by relying on passive benchmarking we chose a combination of devices (NanoPi M3 + 500 GB HDD) that costs more than $100 including shipping/taxes and is slow as hell for the use case in question.
     
    If we stop relying on passive benchmarking, really look at our use case and switch on our brain, we can not only save a lot of money but also improve performance by magnitudes. With an active benchmarking approach we identify the bottlenecks first:
      • Leased line with 100 Mbits/sec only: we need to use HTTP content-stream compression to overcome this limitation
      • Random access to many files: we need to take care of random IO more than sequential transfer speeds
      • We need to tune our network settings to make the most out of the situation. Being able to use the most recent kernel version is important!
      • We're on an SBC and have to take care of CPU resources: so we use a web server with minimal resource usage and should find a way to avoid reading uncompressed contents from disk just to compress them on the fly, since this wastes CPU resources

    So let's take an approach that would look horribly slow in the usual benchmarks but improves performance a lot: an Orange Pi One together with a Samsung EVO 64 GB as hardware, and mainline kernel + btrfs + nginx + gzip_static as configuration. Why and how does this work?
      • The Orange Pi One has only Fast Ethernet and not GbE. Does this matter? Nope, since our leased line is limited to 100 Mbits/sec anyway
      • We know that the cheap EVO/EVO+ with 32/64 GB perform excellently when it's about random reads. At 4K we get 875 IOPS (3500 KB/s, see comparison of results); that's 8 times faster than using an external USB HDD
      • We use pre-compressed contents: a cron job compresses each and every one of our static files and creates a compressed version with a .gz suffix. If nginx talks to browsers capable of that, it delivers the already compressed contents directly (no CPU cycles wasted; if we configure nginx with the sendfile option, not even time in userspace is wasted, since the kernel shoves the file directly to the network interface!). Combine the sequential read limitation of SD cards on most boards (~23 MB/s) with a 1:3 compression ratio and you end up at ~70 MB/s with this trick. Twice as fast as uncompressed contents on a USB disk
      • Unfortunately we would also need the uncompressed data on disk, since some browsers (behind proxies) do not support content compression. How to deal with that? Using a mainline kernel, btrfs and btrfs' own transparent file compression. So the 'uncompressed' files are also compressed, just at a lower layer, and while we now have each and every file twice on disk (SD card in fact) we only need 50 GB of storage capacity for 100 GB of original contents based on a 1:3 compression ratio. The gain in sequential read performance is still a factor of two since decompression happens on the fly
      • Not directly related to the filesystem, but by tweaking network settings for low latency and many concurrent connections we might improve requests per second when many clients access in parallel, also by a factor of 2 compared to the old smelly Android 3.x kernel we still have to use on many SBCs (relationship with storage: if we tune network settings this way we need storage with high IOPS even more)

    An Orange Pi One together with an EVO 64 GB costs a fraction of a NanoPi M3 + USB HDD, consumes nearly nothing and is magnitudes faster for the 'static files web server' use case if set up correctly. While the usual moronic benchmarks testing CPU horsepower, GbE throughput and sequential speeds would show exactly the opposite.
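    A minimal nginx sketch of the gzip_static + sendfile setup described above; the paths and values are illustrative, not from the original post, and gzip_static requires nginx built with the ngx_http_gzip_static_module:

```nginx
server {
    listen 80;
    root /var/www/static;      # hypothetical document root

    sendfile    on;            # kernel pushes file data straight to the NIC
    gzip_static on;            # serve a pre-built foo.gz next to foo when the
                               # client sends Accept-Encoding: gzip
    gzip        on;            # fall back to on-the-fly compression otherwise
    gzip_types  text/plain text/css application/javascript;
}
```

    The .gz siblings would be pre-built by the cron job mentioned above, e.g. something along the lines of `find /var/www/static -type f ! -name '*.gz' -exec gzip -kf {} +` (gzip's `-k` keeps the original file; exact invocation is an assumption).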
     
    And you get this reduction in costs and this increase in performance just by no longer believing in all these 'benchmarking gone wrong' numbers spread everywhere and by switching to active benchmarking: testing the stuff that really matters, checking how that correlates with reality (your use case and the average workload) and then setting things up the right way.
     
    Final note: of course an Orange Pi One is not the perfect web server due to its low amount of DRAM. The best way to overcome slow storage is to avoid accessing it at all. As soon as files are in Linux' filesystem cache the speed of the storage implementation doesn't matter any more.
     
    So with our web server use case in mind: if we do further active benchmarking and identify a set of files that are accessed most frequently, we could add another Orange Pi One and a Pine64+ with 2 GB. The new OPi One acts as load balancer and SSL accelerator for the second OPi One; the Pine64+ does SSL encryption on its own and holds the most frequently accessed 1.7 GB in RAM ('grep -r foobar /var/www' at startup in the background -- please keep in mind that it's still 5+ GB in reality if we're talking about a 1:3 compression ratio. Simply by switching on our brain we get 5 GB of contents cached in memory on a device that features only 2 GB of physical RAM!). And best of all: neither new board even needs local storage, since both can be FEL-booted from our first OPi One.
  2. Like
    squid4t2 got a reaction from Marcelm2000 in Need help for GPIO with BananaPro Armbian Mainline   
    Hi Marcelm, glad to be of help.
     
    I found sorting GPIO pins / inputs / outputs quite a challenge to start with as Odroid, Pine and Raspi etc all had / have different values.
     
    All very confusing ....
  3. Like
    squid4t2 got a reaction from Marcelm2000 in Need help for GPIO with BananaPro Armbian Mainline   
    Hi Marcelm,  try your calculation again ..  I think you may find it works out to 275.
     
    I hope this sorts the problem for you.
     
    8 x 32 plus 19
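    That arithmetic follows the sunxi GPIO numbering used on mainline kernels (number = bank index × 32 + pin within the bank, with bank A = 0, B = 1, and so on; 8 × 32 + 19 is bank I, pin 19). A tiny Python sketch, with a hypothetical helper name:

```python
# sunxi GPIO numbering: (bank index) * 32 + pin, bank A = 0.
# Helper name is illustrative, not from any library.
def sunxi_gpio_number(bank: str, pin: int) -> int:
    return (ord(bank.upper()) - ord("A")) * 32 + pin

# Bank I has index 8, so PI19 -> 8 * 32 + 19
print(sunxi_gpio_number("I", 19))  # -> 275
```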