
James Kingdon

Members
  • Posts

    142
Everything posted by James Kingdon

  1. Heh, I love looking at other people's workbenches, seeing all the cool toys and the minor oddities we all keep. Look, a pile of floppies and a toroidal transformer. My eye is drawn to the line of partly glimpsed objects at the top of the frame. My inner 5-year-old would never have been able to resist the temptation to see what was in every box... My own workbenches are just wildly embarrassing. I need to tidy. No, I need to excavate.
  2. Yes, you'd need the serial port hooked up. I've never managed to get uboot into interactive mode so I don't know how it works in practice after that.
  3. Can you get into uboot and mount the eMMC as usb storage? I would expect that to happen before the dtb is involved, so if you can see the filesystem where the dtb is stored via uboot then you should have a means to replace it if the new one doesn't boot. (But you're way ahead of me in practical terms, so don't take my word for it!)
  4. pc+ 2e is nice, but the processor is slow. NanoPi M3 is fast, but only has 1G ram and no eMMC. Geekbox is the only board I currently have running a 64 bit kernel, but the SOC is not as fast as the M3 and the geekbox kernel is barely functional. XU4 is a really solid board but relatively expensive for the performance and runs hot (I need to have another go at the heatsink/fan combination for that board). I like them all, I guess if I had to pick a favorite right now it would be the NanoPi M3, especially if we can get a 64bit kernel & userspace up and running, but none of the boards I have currently tick all my boxes for fast processor, min 256MB ram per core, gigabit ethernet and eMMC.
  5. No, I don't think so, it's not taking floating point arguments for one thing. I think it's a helper function for something not readily expressible in the arm 32 bit instruction set, in this case 64 bit integer divide with remainder.
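As a rough illustration of what such a helper has to do, here is a shift-and-subtract sketch of an unsigned 64 bit divide with remainder. This is not the actual runtime library implementation; the function name `uldivmod_sketch` and the choice of algorithm are assumptions made purely for illustration.

```python
MASK64 = (1 << 64) - 1

def uldivmod_sketch(numerator, denominator):
    """Unsigned 64-bit divide with remainder via binary long division.

    Hypothetical stand-in for a runtime helper; returns (quotient, remainder).
    """
    numerator &= MASK64
    denominator &= MASK64
    if denominator == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    # Walk the numerator from its most significant bit down to bit 0.
    for bit in range(63, -1, -1):
        # Bring down the next numerator bit into the running remainder.
        remainder = (remainder << 1) | ((numerator >> bit) & 1)
        if remainder >= denominator:
            remainder -= denominator
            quotient |= 1 << bit
    return quotient, remainder
```

The point is only that one `%` in C turns into a loop like this (or something smarter) behind a function call when the instruction set has no 64 bit divide.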
  6. That's great news, I'll have to give that a go. Let's hope we can figure out how to get the fan working.
  7. Yes, the 32bit version is armhf. I don't have any boards set up to run soft float.
  8. I was curious about the large difference in performance on this test, so I took a look. The code is using 64 bit longs and double precision floating point, and the majority of the time is in a small loop, so it's a candidate for a higher than average delta, but I would still have expected something closer to 2x than 10x or more. Here's the key part:

         unsigned long long c;
         unsigned long long l;
         double t;
         unsigned long long n=0;

         for(c=3; c < max_prime; c++)
         {
           t = sqrt((double)c);
           for(l = 2; l <= t; l++)
             if (c % l == 0)
               break;
           if (l > t )
             n++;
         }

     The 64 bit code looks fairly clean, here's the inner loop:

          f4:  91000442  add   x2, x2, #0x1                     // l++  <--- top of l loop
          f8:  9e630042  ucvtf d2, x2                           // convert l to double
          fc:  1e622010  fcmpe d0, d2                           // compare with t
         100:  54fffe4b  b.lt  c8 <cpu_execute_event+0x28>      // jump to top of c loop ^^^
         104:  9ac20a61  udiv  x1, x19, x2                      // c / l -> x1
         108:  9b02cc23  msub  x3, x1, x2, x19                  // x19 - x1 * x2, i.e. c - (c/l)*l
         10c:  b5ffff43  cbnz  x3, f4 <cpu_execute_event+0x54>  // branch to top of l loop ^^^

     That's nice and straightforward, so what's going on with the 32 bit version? (Unnecessary detail: the following was compiled for arm instructions, whereas the default is usually thumb. It doesn't make much difference to the performance or the analysis, I'm just more used to reading 32 bit arm than thumb.)

         118:  e2944001  adds  r4, r4, #1                       // l++  <-------- top of l loop
         11c:  e2a55000  adc   r5, r5, #0                       // with carry
         120:  e1a00004  mov   r0, r4                           // copy l into r0/1
         124:  e1a01005  mov   r1, r5
         128:  ebfffffe  bl    0 <__aeabi_ul2d>                 // and call helper to convert to double (ouch again)
         12c:  e1a02004  mov   r2, r4                           // copy l into r2/3
         130:  e1a03005  mov   r3, r5
         134:  ec410b17  vmov  d7, r0, r1                       // double(l) -> d7
         138:  e1a00006  mov   r0, r6                           // copy c into r0/1
         13c:  e1a01007  mov   r1, r7
         140:  eeb48bc7  vcmpe.f64 d8, d7                       // l <= t
         144:  eef1fa10  vmrs  APSR_nzcv, fpscr
         148:  baffffdc  blt   c0 <cpu_execute_event+0x2c>      // branch to top of c loop ^^^
         14c:  ebfffffe  bl    0 <__aeabi_uldivmod>             // call library for c % l
         150:  e1923003  orrs  r3, r2, r3                       // look for any non-zero bit in the r2/3 remainder
         154:  1affffef  bne   118 <cpu_execute_event+0x84>     // branch to top of l loop ^^^

     Obviously, not so nice. The 64 bit longs have to be carried in pairs of registers, e.g. l is r4 and r5, so a simple l++ takes two instructions to cover both the low and high words. That's where the factor of 2x I was expecting comes from. But the pain comes in when we have to convert a long to a double. No ucvtf instruction available here, so we have to make a helper call to __aeabi_ul2d. And that means we have to copy l (in r4 and r5) into the parameter registers r0 and r1, make the call and then move the result from r0/r1 into d7 where we want it. The same problem arises when we want to compute c % l: without an instruction available we have to make a helper call, which this time means copying 4 registers into the param regs before making the call. It's having to make these calls to helper functions that explains the large performance delta.

     This is rather a neat example of why 64 bit on arm is about more than addressing a large memory space. As well as twice as many registers each of twice the size, you also get the newer instruction set which has been optimized for more modern workloads. It's still going to be rare to see this big a delta, but 10 to 40% probably isn't unusual on cpu intensive applications.

     Just for fun, results from my (sadly still 32bit) m3. I grabbed sysbench from github, so I had to change the command line a bit, but hopefully this is comparable to previous results:

         ./sysbench cpu run --cpu-max-prime=20000 --threads=8 --events=10000 --time=100

         CPU speed:
             events per second:   197.90

         General statistics:
             total time:                 50.5295s
             total number of events:     10000

     Temperature seems to be stable at 52C during max-prime=200000 (it's probably about 18 or 19C in the basement today):

         1.60 GHz ???V 52.0C fan n/a
         1.60 GHz ???V 52.0C fan n/a
         1.60 GHz ???V 52.0C fan n/a
         1.60 GHz ???V 52.0C fan n/a

     (readings at 10s intervals, with a couple of minutes worth of the same value before that)
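To make the register-pair bookkeeping concrete, here is a small Python sketch that models a 64 bit value held in two 32 bit words, the way l lives in r4/r5 in the 32 bit listing. The helper names (`split64`, `adds_adc_increment`, `remainder_is_nonzero`) are made up for this illustration; they mimic the `adds`/`adc` pair and the `orrs` test rather than any real API.

```python
MASK32 = 0xFFFFFFFF

def split64(value):
    """Split a 64-bit value into (low, high) 32-bit words."""
    return value & MASK32, (value >> 32) & MASK32

def join64(low, high):
    """Recombine (low, high) 32-bit words into one 64-bit value."""
    return (high << 32) | low

def adds_adc_increment(low, high):
    """Model 'adds low, low, #1' followed by 'adc high, high, #0'."""
    total = low + 1
    carry = total >> 32          # 1 only when the low word wraps around
    return total & MASK32, (high + carry) & MASK32

def remainder_is_nonzero(rem_low, rem_high):
    """Model 'orrs r3, r2, r3': any set bit in either word means non-zero."""
    return (rem_low | rem_high) != 0
```

For example, `adds_adc_increment(0xFFFFFFFF, 0)` carries into the high word and returns `(0, 1)`, which is exactly the case the second instruction exists for.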
  9. I adjusted the memory reservation down to 16M, and things are looking good so far.

         KiB Mem:   1011008 total,   142224 used,   868784 free,    14104 buffers
         KiB Swap:   524284 total,        0 used,   524284 free.    73724 cached Mem
  10. I did a little more with the board last night. The biggest problem for my use case is the 1G ram, so I was a bit disturbed to find that 150M is reserved for the GPU. I tried disabling ION and CMA but couldn't get it to compile, so as a fallback I cut the size of the reserved block. The value is in the config as CONFIG_ION_NXP_CONTIGHEAP_SIZE. I've reduced it to 64M, which at least boots and runs, but without any hdmi output. X11 crashes if you try and start it (and for some reason lightdm goes into a busy loop for a couple of minutes until the whole graphics stack shuts down), so I configured the board to start up in text mode. Since I'm not getting anything out of the hdmi port now I plan on reducing the reserved space further until I find the minimum needed to start up, but this is already looking a lot better than default:

         KiB Mem:    961856 total,   205952 used,   755904 free,    16952 buffers

      I didn't like the look of the factory HSF, so I added my normal set of heat sinks and a 40mm 12v fan. That brought the idle temperature down from about 50C to 31C and kept the full (non-simd, non-gpu) load temperature in the low 50s. Feeling pleased with that, I thought I'd try enabling the higher frequencies by editing the dfs_freq_table in arch/arm/plat-s5p6818/nanopi3/device.c to match the values in s5p6818-cpufreq.h. That gave me a nice stable 1.6GHz with only a small increase in load temperature (again, this may not apply to all boards or if you are loading the gpu/neon extensions). This is from the end of a short (2 minute) run:

         1.60 GHz ???V 50.0C fan n/a
         1.60 GHz ???V 51.0C fan n/a
         1.60 GHz ???V 51.0C fan n/a
         1.60 GHz ???V 51.0C fan n/a
         400 MHz ???V 44.0C fan n/a
         400 MHz ???V 41.0C fan n/a
         400 MHz ???V 40.0C fan n/a

      I haven't figured out where to read the voltage from on this board yet - none of the regulators have an obvious name for vdd_arm, and the fan is running continuously, so not hooked up to a gpio at the moment. That might be this weekend's job, depending on the weather.
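A quick sanity check on those numbers, as a sketch: if the only thing that changes between two kernels is the reserved block, the total memory reported should move by exactly the difference in reservation. The 1011008 KiB figure is the total quoted in the later post after dropping the reservation to 16M, and it is an assumption here that both readings come from the same board; the function name is hypothetical.

```python
def reclaimed_kib(old_reserved_mib, new_reserved_mib):
    """KiB of RAM handed back to Linux when the reserved block shrinks."""
    return (old_reserved_mib - new_reserved_mib) * 1024

total_at_64m = 961856    # KiB, 'KiB Mem' total with a 64M reservation
total_at_16m = 1011008   # KiB, total from the later post with a 16M reservation

print(total_at_16m - total_at_64m)   # 49152
print(reclaimed_kib(64, 16))         # 49152
```

The two values agree (48 MiB), which suggests the reservation really is the only thing eating into the total.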
  11. That's probably the artik 710 firmware mentioned in https://github.com/friendlyarm/linux-3.4.y/issues/3
  12. I got one of these running last night, and it just edges out the XU4 for absolute throughput on my workload, and takes a fairly clear lead on perf/$. The kernel comes without cifs support, so I had to compile up cifs and md4 as kernel modules before I could mount my filespace. The board I received was marked 1610, had the fan header populated (and labelled), had the socket for an external wifi antenna, and came with one supplied in the box, so it looks like FA are listening to feedback. I really hope that a 64 bit kernel is possible on this board, as that would add to its usefulness as a test mule. The board is so small it's tempting to do a little farm box with somewhere between 8 and 16 of them stacked together just for the sheer cuteness factor.
  13. Yes, you can extract the dtb from there with imgRePackerRK_106 and then convert it to dts with dtc. I had a quick look at it, it has the expected config for the fan, but the dvfs settings didn't seem to match what I'd expect, topping out at 1GHz and with constant voltage. Maybe I was misreading the format.
  14. It comes apart very easily, just work your way around the top which is a simple clip-in lid, no screws to worry about. Use a plastic spudger if you're worried about marking the surface. I'll likely ditch the case so I just used a small screwdriver. It would be awesome if we could get a mainline kernel running.
  15. One more observation, after a reboot, performance is restored. Reported temperatures are higher and seem consistent with previous measurements at the heatsink (given a reasonable allowance for the likely temperature gradient between the on-die sensor and the external temp). Output from the second of two back to back runs (hence the high starting temp):

         216 MHz  0.900V 60.0C
         1.80 GHz 1.350V 62.0C
         1.80 GHz 1.350V 61.0C
         1.80 GHz 1.350V 65.0C
         1.80 GHz 1.350V 65.0C
         1.80 GHz 1.350V 65.0C
         1.80 GHz 1.350V 66.0C
         1.80 GHz 1.350V 68.0C
         1.80 GHz 1.350V 67.0C
         1.80 GHz 1.350V 70.0C
         1.80 GHz 1.350V 68.0C
         1.80 GHz 1.350V 69.0C
         1.80 GHz 1.350V 69.0C
         1.80 GHz 1.350V 65.0C
         216 MHz  0.900V 62.0C
         126 MHz  0.900V 60.0C

      Script for monitoring:

         #!/usr/bin/python
         # show cpu freq and temperature at specified interval
         # cpumon <interval seconds>
         # default interval 30s
         # (Python 2 syntax)
         import subprocess
         import sys
         import time

         delay=30
         if (len(sys.argv)>1):
           delay=float(sys.argv[1])

         while 1:
           freq=subprocess.check_output(['cpufreq-info', '-mf', '-c0'])
           temp=subprocess.check_output(['cat', '/sys/devices/fff16000.ug_fan/cpu_temp'])
           temp=float(temp)/1
           volts=subprocess.check_output(['cat', '/sys/class/regulator/regulator.13/microvolts'])
           volts=float(volts)/1000000
           print "{:8} {:.3f}V {:.1f}C".format(freq.rstrip(), volts, temp)
           time.sleep(delay)
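For anyone on a Python 3 system, here is a minimal sketch of the same monitor. It keeps the two sysfs paths from the script in this post, which are specific to this box and kernel, so treat them as assumptions anywhere else. The formatting is split out into `format_reading` (a name introduced here) so it can be checked without the hardware.

```python
#!/usr/bin/env python3
"""Show CPU frequency, voltage and temperature at a fixed interval."""
import subprocess
import sys
import time

# Paths taken from the original script; they are board- and kernel-specific.
TEMP_PATH = "/sys/devices/fff16000.ug_fan/cpu_temp"
VOLT_PATH = "/sys/class/regulator/regulator.13/microvolts"

def format_reading(freq, microvolts, temp):
    """Format raw text readings as e.g. '1.80 GHz 1.350V 65.0C'."""
    volts = float(microvolts) / 1000000
    return "{:8} {:.3f}V {:.1f}C".format(freq.strip(), volts, float(temp))

def read_text(path):
    """Read a sysfs attribute as text instead of shelling out to cat."""
    with open(path) as handle:
        return handle.read()

def main(argv):
    """Print one reading every `delay` seconds (default 30) until interrupted."""
    delay = float(argv[1]) if len(argv) > 1 else 30.0
    while True:
        freq = subprocess.check_output(
            ["cpufreq-info", "-mf", "-c0"], universal_newlines=True)
        print(format_reading(freq, read_text(VOLT_PATH), read_text(TEMP_PATH)))
        time.sleep(delay)

# To run as a script, call main(sys.argv) at the bottom of the file.
```

Reading the sysfs files directly avoids spawning `cat` twice per sample, which matters little at 30s intervals but keeps the monitor from perturbing short runs.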
  16. My UT3S arrived yesterday too, also without the A-A usb cable. I was impressed that Linux worked out of the box, although you only get 4G of the eMMC to work with. Another 9G can be mounted, but it's formatted FAT which isn't ideal. The hardware seems good - there's a generous heatsink and the low-profile fan is both quiet and effective. There's a battery fitted for the RTC and three holes that look suspiciously like a serial header. From the limited testing so far, performance of the RK3288 at 1.8GHz is excellent, about 20% faster than the 4 A15 cores of the XU4 running at 2GHz (although the XU4 does tend to throttle slightly, so probably averages about 1.95GHz over the run). At the sale price it takes the current lead in the performance per $ ranking. Sure wish it ran Armbian though.

      Follow-up: There's something odd going on with the performance. Twice now I've come back to the machine and found it to be running at roughly 1/3 the peak throughput. 'top' still shows good cpu utilization across the cores, there are no other obvious processes competing with the test, cpufreq-info shows 1.8GHz and I'm using the performance governor during testing, so something is lying to me. I must admit I don't have a great deal of confidence in this kernel. (The missing factor in the above is temperature. I don't have a lot of confidence in the temperature being reported yet - I need to try and correlate it with what I can measure at the heatsink for both a good and a bad run. But on every other processor I've tested you can see when you hit thermal limiting by the reduction in frequency reported by cpufreq-info, and I'm not seeing that here. Also, there's the fan on this processor, so it shouldn't hit thermal limit, and a factor of three would be an unusually aggressive down-clocking!)
  17. I found this bug report where the poster describes having had a running 4.4.52 kernel on the Q8 (with a board version that matches the silk screen on my recently purchased Q8). He also states that the kernel at https://github.com/Miouyouyou/MyyQi works on the board, so it seems there is still cause for optimism. I hope to give this a try when I get time, but that may not be for a while, so I thought I'd pass the info on rather than just sit on it.
  18. Hi @Igor I'm curious where the info/photos came from - did you get a pre-release board? I'd love to get my hands on one (or more!) of these for testing.
  19. I work on a compiler development team and I thought it would be fun to have my own system at home. We have a large suite of regression tests that take quite a while to run on ARM, so the farm will do distributed build and test.
  20. There is no "too far". FWIW, I keep looking at a pile of peltier coolers I bought for another project...
  21. It is, and so far the XU4 is the workhorse in my (little) farm, so I'd rate it as pretty good value. Its weakness is the heat it puts out. I have one of those large 40mm northbridge heatsink and fan units (https://www.aliexpress.com/item/1pcs-40mm-x-10mm-Cooling-Fan-Heatsink-DIY-Northbridge-Cooler-South-North-Bridge-Radiator-for-PC/32432181804.html) on mine, and I still can't quite stop it from throttling at full load. (Admittedly the fan I put on it isn't moving a whole lot of air, so there's room for improvement.) The other issue with the XU4 is that the eMMC is extra (and worth it), and shipping to Canada isn't cheap, so the total cost is a bit higher than I'd want to spend for the rest of the compute nodes. If I can hit $50 CDN/node I'll be pretty happy, whereas the XU4 came out to about $140 all in (with a 16G emmc, not 32G like I first said - lousy memory).
  22. That's awesome, thank you! I've looked at the Ugoos units several times but they're normally well over $100, which puts them into competition with 8 core boxes like the XU4 and geekbox. I've ordered a UT3S, it will be interesting to compare with the Q8. Where did you find out about the coupon?
  23. Many thanks for the additional info, that was enough to get me in the right place. With 1.54GHz enabled my workload runs in the low fifties centigrade without the fan fitted. I'd expect to take another 5 to 10C off that when the fan is in place. So far no signs of instability but if I run into any problems I'll back the changes off. I'd have to rate the board as very impressive for thermal management, certainly compared to the odroid XU4 which pays for the speed with a whole lot of heat.
  24. Thanks for the reply. Sorry if I didn't make my question clear enough, I read the thread and attempted to follow what I could see but was unsuccessful. Hence the question asking for further guidance. And yes, I plan on using active cooling and monitoring temperatures during runs. At the moment my particular workloads only raise the cpu temp to about 45C and I'm comfortable with running the board hotter than that. I checked that /boot/script.bin was linked to bin/orangepipcplus.bin and ran bin2fex on it, but the dvfs_table in there doesn't seem to match the frequencies listed by cpufreq-info, so I'm a bit confused as to how this works. It looks like the fex is being modified or over-ridden by the patch you referenced a few posts up. How does that work?
  25. Hi, I have a slightly unusual use case, building a compile and test farm with a variety of different socs so that I can easily check for regressions. All the boards will have heatsinks and active cooling so I have more thermal budget than may be typical, and I'd like to run at the higher clockspeeds to push the tests through more quickly. I've just added an Orange pi pc+ and installed Armbian_5.25_Orangepipcplus_Ubuntu_xenial_default_3.4.113 onto the emmc (which went very smoothly - many thanks as always for the great work you do). cpufreq-info shows steps up to 1.54 GHz, but hardware limits between 480 MHz and 1.30 GHz.

         analyzing CPU 0:
           driver: cpufreq-sunxi
           CPUs which run at the same hardware frequency: 0 1 2 3
           CPUs which need to have their frequency coordinated by software: 0 1 2 3
           maximum transition latency: 2.00 ms.
           hardware limits: 480 MHz - 1.30 GHz
           available frequency steps: 60.0 MHz, 120 MHz, 240 MHz, 312 MHz, 408 MHz, 480 MHz, 504 MHz, 528 MHz, 576 MHz, 600 MHz, 624 MHz, 648 MHz, 672 MHz, 720 MHz, 768 MHz, 816 MHz, 864 MHz, 912 MHz, 960 MHz, 1.01 GHz, 1.06 GHz, 1.10 GHz, 1.15 GHz, 1.20 GHz, 1.25 GHz, 1.30 GHz, 1.34 GHz, 1.44 GHz, 1.54 GHz
           available cpufreq governors: interactive, conservative, ondemand, powersave, userspace, performance
           current policy: frequency should be within 624 MHz and 1.30 GHz.
                           The governor "interactive" may decide which speed to use
                           within this range.
           current CPU frequency is 624 MHz.
           cpufreq stats: 60.0 MHz:0.00%, 120 MHz:0.00%, 240 MHz:0.00%, 312 MHz:0.00%, 408 MHz:0.00%, 480 MHz:0.00%, 504 MHz:0.00%, 528 MHz:0.00%, 576 MHz:0.00%, 600 MHz:0.00%, 624 MHz:50.79%, 648 MHz:0.00%, 672 MHz:0.04%, 720 MHz:0.00%, 768 MHz:0.00%, 816 MHz:0.00%, 864 MHz:0.00%, 912 MHz:0.00%, 960 MHz:0.00%, 1.01 GHz:44.60%, 1.06 GHz:0.21%, 1.10 GHz:0.21%, 1.15 GHz:1.77%, 1.20 GHz:0.42%, 1.25 GHz:0.38%, 1.30 GHz:1.58%, 1.34 GHz:0.00%, 1.44 GHz:0.00%, 1.54 GHz:0.00% (49)

      During runs the maximum freq used is 1.30 GHz. I tried editing /etc/default/cpufrequtils and increasing the MAX_SPEED value, but it does not appear to have had any effect even after reboot. How do I enable the higher frequency steps?

      Regards, James
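One way to see the mismatch at a glance is to compare the advertised steps against the hardware limit. The sketch below parses text in the format that cpufreq-info printed in this post; the function names and the regular expressions are assumptions for illustration and are not part of cpufrequtils.

```python
import re

UNIT_MHZ = {"MHz": 1.0, "GHz": 1000.0}
FREQ = r"[\d.]+ [MG]Hz"

def to_mhz(text):
    """Convert a string such as '1.30 GHz' or '480 MHz' to a number of MHz."""
    value, unit = text.split()
    return float(value) * UNIT_MHZ[unit]

def steps_above_limit(info):
    """Return the listed frequency steps that exceed the hardware maximum."""
    limits = re.search(r"hardware limits:\s*" + FREQ + r"\s*-\s*(" + FREQ + ")", info)
    steps = re.search(
        r"available frequency steps:(.*?)(?:available cpufreq governors|$)",
        info, re.S)
    if limits is None or steps is None:
        raise ValueError("input does not look like cpufreq-info output")
    hw_max = to_mhz(limits.group(1))
    # Keep the original strings so the result reads like the tool's own output.
    return [s for s in re.findall(FREQ, steps.group(1)) if to_mhz(s) > hw_max]
```

Fed the output quoted in this post, it should pick out the three steps above 1.30 GHz (1.34, 1.44 and 1.54 GHz), which are the ones that show 0.00% in the stats line.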