wtarreau

Posts posted by wtarreau

  1. 5 hours ago, balbes150 said:

    > The link was removed to avoid creating ads for a specific store. Everyone can find similar products with a similar price.

     

    It would have been nice to at least keep the mention that the device Balbes was talking about was the X96 Max. That's not specific to any store, it's just a board with a supported SoC (S905X3). Otherwise his post becomes confusing or misleading, since it's different hardware from the one posted above.

  2. 4 hours ago, chwe said:

    Don't get me wrong, willy's buildfarm (https://www.cnx-software.com/2019/01/07/nanopi-neo4-build-farm-rk3399-overclocking/) is great, but not everyone wants to build such a farm just to avoid cross-compiling.

    Please note, my build farm is used for cross-compiling only. I used to do native builds 20 years ago, and after having been hit several times by accidental dependencies on the build host, I stopped and am always cross-compiling nowadays, even when doing x86 on x86. That's why I can use whatever host is available for the build farm. My build farm at home is heterogeneous, it's made of the armv8 boards above, one armv7 board (odroid xu4) and sometimes some x86 hosts when the devices are up. So yes, I'm a huge proponent of cross-compiling.

  3. 4 hours ago, JMCC said:

    I'm not sure whether they correspond exactly with the number you set, but I can tell they mean a real higher clock speed. I tried with HK's image, and increasing the clock speed through that method causes an increase in hashes/sec with a CPU miner (like ~14.50 hash/s at 1512MHz, vs ~16.50 at 1752MHz).

    I also understood that HK got their hands on the blob part and were able to figure out the highest stable frequency they could use. I don't know how this blob is used, but without it you definitely won't go above 1.5 GHz.

  4. 1 hour ago, datsuns said:

     

    What avenue of contact did you use when you got in touch with them? I tried techsupport@friendlyarm.com but no response so far.

    Use sales@, I've used it several times and it works. Oh, and put a link to the discussion in this forum so that they know you're not just doing random stuff but are actually using their boards properly. I wouldn't be surprised if they're used to receiving the occasional complaint from people who just accidentally erased their micro-SD and cry for help.

  5. 3 hours ago, datsuns said:

    If all fails I will try to reset the connections with the oven method but I will treat that as plan C. I will post a followup later with whatever ends up happening.

    OK, good luck! Note, you may want to contact FriendlyElec before putting the board into the oven, to tell them that one of your boards seems to be dead, what you've tried, and that you're planning on trying the oven. Maybe they'll simply want to send you a replacement and will ask for this one to be sent back for inspection. They've already sent me some replacement hardware for early defects, they're really nice.

  6. 2 hours ago, datsuns said:

    I am leaning toward a dead board but is there anything else I should try before I throw in the towel on the board?

     

    You should try to connect a serial adapter to its console port to see if it emits anything at boot. If it doesn't, it's probably dead. If it emits random errors which differ on every boot, it could be that the RAM has become defective. This happened to me on a few boards in the past, and on one of the MiQis in my build farm. It is also possible that a solder joint has gone bad under a BGA chip. That's the most common cause of failure in modern hardware (especially smartphones). It's even worse with RoHS because lead-free solder is less elastic and breaks more easily. I've repaired a few such dead boards using a hot air gun and an infrared thermometer, but each time you know that you're probably killing it even more, so you have to be prepared for this.

    Note that it can also work in an oven if you want to try. Most of the time it doesn't work and can even make things worse, but if the board is dead, there's nothing to lose by trying. Just pre-heat your oven to 180°C without the board and make sure the temperature is stable (otherwise pre-heat it to 200°C and switch it off). Then place the board inside for 3-5 minutes. Don't touch it to take it out: open the oven to let it cool down, and pick up the board once the temperature has dropped enough for the solder to be solid. Some plastic parts (e.g. the reset button) may get slightly damaged if the oven is too hot.

     

    2 hours ago, datsuns said:

    Also, I am now concerned about the health of my other boards and think I might be running them too hard/hot. I think my average temperature has been about 63 degrees Celsius. I have always been curious what an acceptable temperature is for these.

     

    The datasheet recommends -25 to 85°C operating temperature (both ambient and die), and -40 to 150°C for storage. In short, it means that it must work flawlessly at 85°C, that beyond this it will only work if you're lucky, and that at 150°C you're certain to fry it. Also, running a chip past its maximum operating temperature risks hanging it, and a hung chip may quickly get damaged. But below 85°C you have zero risk for the device itself. If you're overclocking, the margin may be lower.
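
The range quoted above can be turned into a quick check. What follows is only a minimal sketch, not an official tool: it assumes a Linux kernel that exposes the sensor as `/sys/class/thermal/thermal_zone0/temp` in millidegrees Celsius (which zone maps to the CPU depends on the board and kernel), and the function names are made up for illustration.

```python
from pathlib import Path

# Limits taken from the datasheet figures quoted in the post above.
OPERATING_MIN_C = -25.0
OPERATING_MAX_C = 85.0
STORAGE_MAX_C = 150.0


def classify_temperature(temp_c):
    """Place a die temperature relative to the quoted datasheet limits."""
    if temp_c >= STORAGE_MAX_C:
        return "beyond the absolute maximum"
    if temp_c > OPERATING_MAX_C or temp_c < OPERATING_MIN_C:
        return "outside the operating range"
    return "within the operating range"


def headroom_c(temp_c):
    """Degrees left before reaching the maximum operating temperature."""
    return OPERATING_MAX_C - temp_c


def read_zone_temp_c(zone_file="/sys/class/thermal/thermal_zone0/temp"):
    """Read one thermal zone (assumed path, the zone index is board specific)."""
    return int(Path(zone_file).read_text().strip()) / 1000.0
```

With the 63°C average mentioned in the question, `classify_temperature(63)` reports a value within the operating range and `headroom_c(63)` gives 22 degrees of margin at stock settings.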

     

    2 hours ago, datsuns said:

    I run my boards pretty hard but I will throttle them back a bit if I am risking their longevity.

     

    My personal definition of "running pretty hard" is when the chip's temperature never has the opportunity to fall back into the recommended range ;-)

     

    2 hours ago, datsuns said:

    I'm also probably going to do the copper spacer mod that tkaiser posted about now to see if I can get my temps down.

     

    Yep I agree. I wanted to do it as well but my copper plates are too thick. However, since my board is inside a very tight cardboard "enclosure", I've added an aluminum plate on the other side on top of two efficient thermal pads. It collects some of the heat from the other side and spreads it through the cardboard. It was enough to make the temperature drop by 5-10°C here. But in general, since aluminum has a moderate thermal resistance, you do want to improve its contact surface with the heat source. A copper plate (which is a much better thermal conductor) does this by spreading the heat all over the base of the heat sink and avoiding hot spots below it. Ideally you need a soft thermal pad covering the whole surface of the copper plate, except the small silicon die, so that heat is extracted from everything below. I've long been wondering whether it would be possible to achieve this using a good chunk of silicone glue to stick the copper plate on top of everything. But that would definitely destroy the board.

  7. 2 hours ago, shaun27 said:

    Is there a way out of this without doing a complete rewrite of the SD card again?

    I never had such an issue, but I never plugged HDMI into it either, so it's hard to say if anything is related. If it fails to boot, I'd suggest it's not related to the software/distro so you'd better ask FriendlyElec directly in case they'd be aware of any stability issue or a workaround for what you're seeing. Note, be careful about the power supply, especially if it's shared between all boards. It could be possible that when they're all rebooting, the PSU doesn't cope nicely with the sudden current rush and provides too low a voltage to the boards.

  8. Hi,

     

    On 9/27/2018 at 8:57 PM, cmoski said:

    @wtarreau -- If you have a chance I'd love to take a look at your thermal throttling point configuration. Standard disclaimer of course noted :-)

     

    So I finally found some time to pick the file from the board. Sorry for the delay, I was busy with other stuff. I verified the temperature thresholds: the first alert is at 113°C, the second at 115°C and critical is at 120°C. The file includes one extra DVFS entry for 1600 MHz at 1.25V. I didn't find any other difference compared to the factory dtb. For those landing here from a search engine without prior context: this works for my board and is very likely to fry yours, so don't randomly try this file, and don't complain if you do!

    s5p6818-nanopi3-rev05.1g6-1v25-113deg.dtb
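
To make the three thresholds concrete, here is a small sketch that maps a temperature to the highest threshold it has crossed. It only illustrates the numbers from this post: the function name is hypothetical, and what the kernel actually does at each point (throttling steps, shutdown) is not described here, so nothing about that is modelled.

```python
# Thresholds reported in the post above, in degrees Celsius.
TRIP_POINTS_C = (
    (113, "first alert"),
    (115, "second alert"),
    (120, "critical"),
)


def trip_state(temp_c):
    """Return the name of the highest threshold reached, or "normal"."""
    state = "normal"
    for threshold, name in TRIP_POINTS_C:
        if temp_c >= threshold:
            state = name
    return state
```

For example `trip_state(112)` is "normal" while `trip_state(116)` is "second alert". Again, these values suit one specific board and are not a recommendation.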

  9. 1 hour ago, tkaiser said:

     

    @wtarreau my first 96boards thing so far (just like you I felt the standard being directed towards nowhere given that there's no Ethernet). And guess what: 2 x Ethernet here!

     

    Hehe, that doesn't change my opinion of this standard which is only useful to build development boards and nothing looking even remotely like a usable prototype. The small form factor only exposes useless stuff and the large one requires access to both sides, thus imposing an enclosure size. But for development I'm totally fine with the EE form factor as it provides a rich set of connectors and standards in a reasonable size.

  10. Confirmed, from memory I just added a line with "1600000 1250000". Mine is ultra-stable even with 8 cores at full speed. It happens to throttle a little bit from time to time because the heat cannot escape easily from the cardboard enclosure, but that's all. From what I remember from the datasheet, the chip is designed to run at 1.6 GHz, that's why I picked this frequency.

    I also modified the thermal trip points because I don't want the machine to throttle too early. I searched for the limits on mine and set the points slightly below them. I can upload the file later if anyone wants it. It's just that my temperature limits might cause issues for others, so we need to be cautious: I don't want people to randomly download the file without reading the context, then complain about this board's stability when using Armbian...
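
For readers wondering how to read the "1600000 1250000" pair, the sketch below assumes it is a frequency in kHz followed by a voltage in microvolts. That interpretation is an assumption, chosen because it matches the 1600 MHz at 1.25V figure discussed in this thread; the helper name is made up.

```python
def parse_operating_point(entry):
    """Split an entry like "1600000 1250000" into (MHz, volts).

    Assumes "<frequency in kHz> <voltage in microvolts>".
    """
    freq_khz, volt_uv = (int(field) for field in entry.split())
    return freq_khz / 1000.0, volt_uv / 1_000_000.0


mhz, volts = parse_operating_point("1600000 1250000")
print(f"{mhz:.0f} MHz at {volts:.2f} V")
```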

     

  11. 5 minutes ago, hjc said:

    @wtarreau There's currently only a wiki page for M4 but not NEO4.

    I thought they were two different names for the same upcoming board, with various intermediary designs. But you're right, the NEO4 is even smaller than the M4! So yes that makes sense, it's a single channel RAM. Then I think I'm more interested in the M4. However if they made a complete aluminum enclosure like they recently did with the NEO/NEO2 with the buttons and OLED, it could be very tempting to get one as well for about everything you can do with a machine lying in your computer bag!

  12. @tkaiser I totally agree with you. I'm checking every morning on their site while drinking my first coffee if it's available or not! While I'm not *that* much impressed by RK3399, it's still a pretty good SoC, and this combined with FE's documentation and thermal design should bring something really nice. I'm just wary of the 32-bit memory, we'll have to see once it's available. I can understand their choice given the small size of the board though.

  13. 49 minutes ago, tkaiser said:

    Well, sbc-bench is using @wtarreau's nice mhz tool to calculate real clockspeeds and I hope I use it correctly.

    What you can do is increase the 2nd argument, it's the number of loops you want to run. At 1000 you can lose some precision. I tend to use 100000 on medium-power boards like NanoPis. On the clearfog at 2 GHz, "mhz 3 100000" takes 150ms, which may be too much for your use case. It reports 1999 MHz. With 1000 it has a slightly larger variation (1996 to 2000). Well, it's probably OK at 10000. I picked up bad habits on x86 with intel_pstate taking a while to start.

     

    Maybe you should always take a small and a large count in your tests. This would more easily show whether there's some automatic frequency adjustment: the larger count would report a significantly higher frequency in this case, because a larger part of the loop would run at the higher frequency. Just an idea.

     

    Or perhaps you should have two distinct tools: "sbc-bench" and "sbc-diag". The former would report values measured over short periods, and the latter would be used for deeper tests to try to figure out what's wrong when the first values look suspicious.
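
The small-count versus large-count idea could look roughly like the sketch below. It does not run the mhz tool itself: it just compares two frequencies you measured, and the 2% tolerance as well as the function name are arbitrary assumptions to be tuned per board.

```python
def scaling_suspected(mhz_small_count, mhz_large_count, tolerance=0.02):
    """True when the long run reports a clearly higher frequency than the short one.

    A large gap suggests the CPU was still ramping up during the short run.
    """
    if mhz_small_count <= 0:
        raise ValueError("frequency must be positive")
    gap = (mhz_large_count - mhz_small_count) / mhz_small_count
    return gap > tolerance
```

With the clearfog numbers above (1996 MHz in the worst short run, 1999 MHz in the long one) the gap is far below 2%, so nothing would be flagged. A short run reporting 1000 MHz against a long run at 1999 MHz would be.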

     

  14. 2 hours ago, tkaiser said:

     

    
    root@clearfogpro:~/mhz# cpufreq-set -g performance
    root@clearfogpro:~/mhz# ./mhz 
    count=645643 us50=20196 us250=100998 diff=80802 cpu_MHz=1598.087
    root@clearfogpro:~/mhz# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq 
    1332000

    So for whatever reasons we have a nice mismatch between clockspeeds reported via sysfs and real clockspeeds with Armada 38x :)

    Please note that the operating points are usually fed via the DT while the operating frequency is defined by the jumpers on the board. It's very possible that the DT doesn't reference the correct frequencies here. From what I've seen so far, the Armada 38x has limited ability to do frequency scaling, something like full speed or half speed possibly. When I was running mine at 1.6 GHz, I remember seeing only 1600 or 800 being effectively used. I haven't checked since I upgraded to 2 GHz (well, 1.992 to be precise) but I suspect I'm now doing either 2000 or 1000 and nothing else. Thus if you have a smaller number of operating points, it's possible that they are incorrectly mapped. Just my two cents :-)
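
The mismatch in the quoted session can be quantified with a few lines. This is just a sketch that parses the two values exactly as they appear in the transcript: the `cpu_MHz=` field of the mhz output and the cpufreq value, which sysfs expresses in kHz. The function names are made up.

```python
import re


def measured_mhz(mhz_output):
    """Extract the cpu_MHz field from a line printed by the mhz tool."""
    match = re.search(r"cpu_MHz=([0-9.]+)", mhz_output)
    if match is None:
        raise ValueError("no cpu_MHz field found")
    return float(match.group(1))


def sysfs_mhz(scaling_cur_freq):
    """Convert the content of scaling_cur_freq (kHz) to MHz."""
    return int(scaling_cur_freq.strip()) / 1000.0


def mismatch_ratio(mhz_output, scaling_cur_freq):
    """Ratio between the measured and the reported clock speed."""
    return measured_mhz(mhz_output) / sysfs_mhz(scaling_cur_freq)


line = "count=645643 us50=20196 us250=100998 diff=80802 cpu_MHz=1598.087"
ratio = mismatch_ratio(line, "1332000\n")
print(f"measured/reported = {ratio:.3f}")
```

Here the measured clock is about 20% above what sysfs reports, which is consistent with a table of operating points that doesn't match the real frequency.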

  15. On 6/24/2018 at 8:31 AM, tkaiser said:

     

    No need for a full aluminium enclosure (to rip-off clueless users as today with FLIRC case and others). Just a thin thermal pad combined with an aluminium bottom plate combined with an enclosure top made out of plastic, wood or whatever is all that's needed: https://forum.armbian.com/topic/6794-pi-factor-cases/?do=findComment&comment=51529

     

    Oh I definitely agree and that's what I was thinking as well in the case an RPi enclosure was used : cover all the bottom with a 1mm thick aluminum plate that will radiate the heat through the plastic over all this surface. After all my cardboard-made npi-fire3 enclosure is not far from this :-) BTW I wasn't aware of the FLIRC case at all.

  16. 14 minutes ago, Da Xue said:

    Longevity is not the only concern. At higher frequencies, signal integrity drops. The higher the clock rate, the more likely of errors due to propagation delays, leaky transistors, and threshold voltages. The difference between 0% overclock and 25% overclock may mean the difference between 30 9's to 20 9's which is statistically significant for error propagation.

    Absolutely, and this factor is impacted by current (derived from voltage) and temperature. That's why what matters for stability is to find the proper operating conditions. Overclocking in a place where the ambient temperature can vary by 10 degrees can cause big problems. The same goes for those who undervolt to reduce heat, because signals rise more slowly and degrade as well, or who use too small a heatsink. That said, with today's software quality you'll hit a software bug far more often than a hardware bit flip :-)

  17. On 6/18/2018 at 11:33 PM, chwe said:

    I really hope that's a USB-C in the left corner and not a micro-USB.

    It definitely is one, the size and shape leave no doubt about it.

     

    I'm also seeing some symmetric lines routed to the GPIO2 connector, maybe it's USB2 that's brought there. Maybe even PCIe (though I doubt PCIe works fine on such large connectors since it requires very low capacitance).

     

    I also really appreciate having the CPU on the correct side. Those who complain about the inability to use a heatsink in an RPi enclosure are also the ones who wouldn't plan on using one anyway if it were on the other side :)  Also, very likely the other side will feature the DDR4 chips, and it will still be possible to use a heatsink there to spread most of the heat into the enclosure. But having an aluminum enclosure for such a design would be really great.

  18. 3 hours ago, subhuman said:

    How do you tell when a salesman is lying?

    This is exactly why there are people like us who dissect products and push them to their limits, so that end users don't have to rely on marketing or salesmen but on real numbers reliably measured by third parties who don't have any interest in cheating.

     

    Regarding your point about Tj and lifetime, you're totally right, and in general it's not a problem for people who overclock because if they want to get a bit more speed they won't keep the device for too long once it's obsolete.

    Look at my build farm made out of MiQi boards (RK3288). The Rockchip kernel by default limits the frequency to 1.6 GHz (thus the Tinker board is likely limited to this as well). But the CPU is designed for 1.8. In the end, with a large enough heat sink and with throttling limits pushed further, it runs perfectly well at 2.0 GHz. For a build farm, this 25% frequency increase translates into roughly 20% lower build time for CPU-bound builds (1.6/2.0 = 0.8). Do I care about the shortened lifetime? Not at all. Even if a board dies, the remaining ones are faster together than all of them at stock frequency. And these boards will be rendered obsolete long before they die.
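
The arithmetic behind this can be sketched as follows. It assumes builds are purely CPU-bound, so that build time scales as the inverse of the clock frequency, which is an idealisation. The function names are hypothetical, and only the 1.6 and 2.0 GHz figures come from the post.

```python
STOCK_GHZ = 1.6
OVERCLOCKED_GHZ = 2.0


def build_time_ratio(stock_ghz=STOCK_GHZ, fast_ghz=OVERCLOCKED_GHZ):
    """Build time at the higher clock, relative to the stock clock."""
    return stock_ghz / fast_ghz


def still_ahead_after_one_failure(boards):
    """Do (boards - 1) overclocked boards beat all boards at stock speed?"""
    return (boards - 1) * OVERCLOCKED_GHZ > boards * STOCK_GHZ


print(f"build time ratio: {build_time_ratio():.2f}")
for count in (4, 5, 6, 8):
    print(count, still_ahead_after_one_failure(count))
```

Under this model a 25% higher clock gives a build time ratio of 0.8, and losing one overclocked board still leaves the farm ahead of stock as soon as it has more than five boards (with exactly five it breaks even).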

     

    I remember a friend, when I was a kid, telling me about an Intel article explaining that their CPUs' lifetime is halved for every 10 degrees Celsius above 80 or so, and that based on this I shouldn't overclock my Cyrix 133 MHz CPU to 150. I replied "because you imagine that I expect to keep using that power-hungry Cyrix for 50 years? For me it's more important that my audio processing experiments can run in real time than to protect my CPU and have to run them offline".
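
As a rough illustration of that rule of thumb (lifetime halved for every 10 degrees above about 80), here is a tiny sketch. The rule is only recalled from memory in this post and is not a verified datasheet figure, so the 80°C reference and the 10°C step are parameters to be treated as assumptions.

```python
def relative_lifetime(temp_c, reference_c=80.0, halving_step_c=10.0):
    """Lifetime relative to running at or below the reference temperature."""
    if temp_c <= reference_c:
        return 1.0
    return 0.5 ** ((temp_c - reference_c) / halving_step_c)
```

Under this rule a chip kept at 100°C would last a quarter as long as one kept at 80°C, which still matters little if the hardware is obsolete well before then.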

     

    However it's important that we are very clear about the fact that this has to be a choice. You definitely don't want companies to use these unsafe limits to build products that will end up in sensitive domains and are expected to run for over a decade. On the other hand, the experiments run by a few of us help figure out the available margins and optimal heat dissipation, both of which play an important role in designing for long-term stability.

  19. 7 hours ago, gounthar said:

    Thanks a lot @wtarreau, I feel more confident about buying this board now.

    By the way, if enough of us start buying the board, it may finally become an incentive for someone to design a 3D printed enclosure. I'd prefer a metal one with a thermal pad, serving as a heat sink at the same time, but I'd be happy with anything better than cardboard+duct tape...

  20. 44 minutes ago, gounthar said:

    The review is really interesting, and I was about to buy one of these boards.
    I asked a few questions at the TechSales, and got one interesting answer:

    I hope it will be enough if the power source comes from a Raspberry Pi power supply... If this doesn't work, what kind of power supply could I buy to plug in the "DuPont line" (I guess it means the GPIO)?

    Any "correct" USB power supply delivering more than 1.5A at 5V will work, though you'll have to make your own cable or solder the wires. But with good quality USB cables, it will also work via the micro-USB port, because the current drawn by this board is not *that* high. I even power mine from a USB3 connector on my laptop which delivers about 1.6A (it's over spec and that's great for this use case). You really need to test. Some reported 1.2A at 5V. That's only 33% higher than the regular USB3 limit (900mA) and may actually work fine with most PCs or chargers thanks to large enough margins in their design.

  21. On 6/11/2018 at 11:11 PM, datsuns said:

    You seem knowledgeable about the board so what do you feel like is a good stable operating temperature for these units?

    I'm pretty sure it depends on a number of parameters. Mine starts to throttle at 113 degrees C because I found that it works fine till 120 and I don't want it to throttle for no reason. In your case for a cluster it will be difficult to test all boards and check that they're running fine over time. But it can also be valuable. I seem to remember reading 90 degrees max in the datasheet so that could be a good start but it's very close to the existing limits. I don't know if the stability of your workloads is critical or if you can take the risk to see one board hang once in a while to find the limits. One other important factor to keep in mind is whether you're using the GPU or not. I am not, which is why I can trust the ability to throttle to cool it down. If you are not using it either, you could possibly decide to start with a limit at 105.

     

    Quote

    I am running active cooling and no overclock (1.4GHz) on all of my fires and I am seeing something between 66-70 degrees C on average when I periodically check on temps. This can of course vary depending on the ambient temperature in the room as well. If I tried out a mild overclock what temperature range would you start to get concerned at?

    I'm only concerned by temperatures getting close to the ones causing instability. For most of my hardware, when I focus on performance I don't care if it shortens its life since it will be obsolete before it dies.  That's why I searched the limits for my board. You need to keep a bit of margin because it takes some time for the temperature to be reported, then when the board starts to throttle it continues to heat a bit. However at very high temperatures it cools down very quickly. Mine throttles at 113 and it rarely reaches 115.

     

    Quote

    It sounds like you are running yours at 1.6GHz - what kind of cooling are you running?

    I'm using the stock heat sink, and worse, the whole thing is packed into a cardboard "enclosure" so that it can safely lie in my computer bag. Basically there is no air flow around it: the cardboard only adds latency to the temperature rise and spreads the heat all around. It's totally horrible, and when I leave it for too long on my desk, the desk gets hot under it :-)

     

    For my use cases (mostly network endpoint for development) it doesn't throttle at all. I've run some build tests, and I have enough time to compile for a few minutes before it starts to throttle, but even when it does, it doesn't for too long (it oscillates between 113 and 115 degrees).

     

    Quote

    I'm currently using small fans running on 3.3v. My setup is close to my desk and I have found running them at 5v is on the annoying side so I'd like to keep them at 3.3v.

    Oh I know what you're talking about, I also happen to hate fans for the same reason. I've installed a 12cm fan behind my MiQi build farm at work, which is powered by the central board's GPIO when the temperature gets too high. It's a 12V fan running on 5V so it probably rotates at less than 1000 RPM and I almost can't hear it. The one at home has much larger heat sinks and no fan. Small fans are noisy and inefficient, you should really pick a large and slow one for your whole cluster. That's what I'd do if I built one (I'd love to just for fun, it's just that I figured that I have no use case for a NanoPi cluster at the moment!).
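
For a fan switched by a GPIO when the temperature gets too high, the decision logic could be as simple as the sketch below. This is not the script running on the farm: the hysteresis and both thresholds (70°C on, 60°C off) are assumptions picked for illustration, and driving the actual GPIO is left out since it depends on the board.

```python
def next_fan_state(temp_c, fan_on, on_at_c=70.0, off_at_c=60.0):
    """Decide whether the fan should run, with hysteresis.

    The gap between the two thresholds avoids toggling the fan
    continuously when the temperature hovers around a single limit.
    """
    if temp_c >= on_at_c:
        return True
    if temp_c <= off_at_c:
        return False
    return fan_on  # between thresholds: keep the current state
```

Calling this periodically with the current temperature and the previous state is enough to get a fan that starts late and stops only once the boards have cooled down noticeably.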

  22. 41 minutes ago, datsuns said:

    What do you think a safe minimum amps is for a high cpu load and not much else?

    Well, all these multi-port chargers never deliver the amount they claim. You can safely expect 50 to 66% though, which is not bad overall. I removed the current limit detection in mine to stabilize the output for the MiQi farm. That said, I never managed to pull more than 1.6A peak from my Fire3 at 1.6 GHz and 1.25V, so you have some headroom I guess. You need to consider that when the board is hot, the efficiency of its DC-DC regulators starts to drop and more of the current is turned into heat. Thus it's more important to measure the current when the board is already hot if you want to be pessimistic (or realistic). That was the case for me when I measured 1.6A. Quite frankly, you're worrying too much: if it still works when all the boards are loaded, that's fine. If you want to buy more boards, then buy them and plug them into your charger until you find the limit. The charger will either cut one port or completely shut down. Then you'll know how many more chargers you need to buy depending on the number of boards :-)
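
If you prefer a pessimistic estimate before buying, the figures above can be combined as in this sketch. The 50 to 66% usable share and the 1.6A peak per board come from this post; the 12A charger in the example is purely hypothetical, as is the function name. Integer milliamps are used to avoid rounding surprises.

```python
def boards_per_charger(rated_ma, usable_percent=50, peak_ma_per_board=1600):
    """How many boards a charger can feed if it only delivers part of its rating."""
    usable_ma = rated_ma * usable_percent // 100
    return usable_ma // peak_ma_per_board


# Hypothetical charger advertised at 12A in total.
print(boards_per_charger(12000, usable_percent=50))
print(boards_per_charger(12000, usable_percent=66))
```

For that imaginary 12A charger this gives 3 boards in the pessimistic case and 4 in the optimistic one. Real measurements on hot boards remain the only reliable answer.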

  23. 6 hours ago, datsuns said:

    This might be more of a general Armbian question but if I wanted to verify that my boards aren't experiencing any undervolting, is there a log file that I can look at?

    No. In my experience, the board will either hang, switch off, or reset when undervolted. Usually you need a voltmeter to check the board voltage under load. If you don't have one, you'll need to verify that they're all working fine (i.e. ping them). The best you can do is to run cpuburn-a53 on all of them at the same time. If nothing fails, you should be fine.
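
Checking that every board still answers while the load is running can be scripted, for instance as below. This is a minimal sketch assuming a Linux host with the usual `ping` command (`-c` for the packet count, `-W` for the timeout in seconds); the host names in the usage comment are placeholders.

```python
import subprocess


def ping_once(host, timeout_s=2):
    """Send a single echo request with the system ping (Linux flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unresponsive(hosts, probe=ping_once):
    """Return the hosts that did not answer the probe."""
    return [host for host in hosts if not probe(host)]


# Example with placeholder names:
# print(unresponsive(["fire3-01", "fire3-02", "fire3-03"]))
```

Running it in a loop during the stress test gives a crude but sufficient signal: a board that hangs or resets because of undervolting will show up in the list.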