Neko May Posted February 9, 2020
I have noticed that the NanoPi M4V2 has an OPP table which causes the device to overclock. The table is suitable for an RK3399K SoC, but the M4V2 has a standard RK3399 SoC with maximum speeds of 1.42GHz for the A53 cores and 1.80GHz for the A72 cores (FriendlyArm themselves cite the K speeds in their marketing material, which is incorrect). I believe that this may also be causing stability issues with my unit, but I will be performing additional testing. I cannot find where it is getting the "extended" OPP tables from, so for now I am limiting the clocks via cpufreq-set and testing for stability. Does anyone have any insight or ideas?
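The cap described in the post above (limiting the clocks with cpufreq-set) can also be expressed against the cpufreq sysfs files. The following is only a rough sketch under assumptions: the kHz constants are the datasheet limits quoted in the post (1.42GHz taken as 1416000 kHz), the helper names `pick_cap` and `apply_cap` are hypothetical, and the policy directories mentioned afterwards are what an RK3399 image typically exposes, not something confirmed in this thread.

```python
from pathlib import Path

# Datasheet limits from the post above, in kHz as cpufreq reports them.
# Assumption: "1.42GHz" corresponds to the 1416000 kHz operating point.
LITTLE_MAX_KHZ = 1_416_000  # A53 cluster
BIG_MAX_KHZ = 1_800_000     # A72 cluster


def pick_cap(available: str, limit_khz: int) -> int:
    """Return the highest listed frequency that does not exceed limit_khz.

    `available` is the whitespace-separated content of a
    scaling_available_frequencies file.
    """
    freqs = [int(f) for f in available.split()]
    allowed = [f for f in freqs if f <= limit_khz]
    if not allowed:
        raise ValueError("no available frequency at or below the limit")
    return max(allowed)


def apply_cap(policy_dir: Path, limit_khz: int) -> int:
    """Write the capped value to scaling_max_freq (hypothetical helper, needs root)."""
    available = (policy_dir / "scaling_available_frequencies").read_text()
    cap = pick_cap(available, limit_khz)
    (policy_dir / "scaling_max_freq").write_text(str(cap))
    return cap
```

On a board where the little and big clusters show up as `/sys/devices/system/cpu/cpufreq/policy0` and `policy4` (an assumption), one would call `apply_cap` once per policy directory as root. The cap does not survive a reboot, so it only helps for testing, which matches how it is used in this thread.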
piter75 Posted February 9, 2020
You are probably using a 5.x kernel... All rk3399 boards in Armbian "current" and "dev" targets are overclocked to 1.5/2.0GHz. There are quite a few reports about M4V2 instabilities with 5.x kernels and AFAIK lowering the max. CPU frequencies does not help. The solution to run it stable for now is to use legacy images. I am more and more leaning towards suspecting the RAM configuration - the M4V2 has some exotic memory chips from Rayson. One thing that I find supports this claim is that when built with u-boot v2020.01 and RAM training in u-boot's TPL phase it fails to boot the kernel, while the other LPDDR4 rk3399 boards I have (roc-rk3399-pc, rockpi 4, rockpro64) boot fine in this scenario.
Neko May (Author) Posted February 10, 2020
3 hours ago, piter75 said: You are probably using a 5.x kernel... All rk3399 boards in Armbian "current" and "dev" targets are overclocked to 1.5/2.0GHz. There are quite a few reports about M4V2 instabilities with 5.x kernels and AFAIK lowering the max. CPU frequencies does not help. The solution to run it stable for now is to use legacy images. I am more and more leaning towards suspecting the RAM configuration - the M4V2 has some exotic memory chips from Rayson. One thing that I find supports this claim is that when built with u-boot v2020.01 and RAM training in u-boot's TPL phase it fails to boot the kernel, while the other LPDDR4 rk3399 boards I have (roc-rk3399-pc, rockpi 4, rockpro64) boot fine in this scenario.
I've used a 5.x kernel on my Rock Pi 4 and did not experience any stability issues. That said, overclocking these chips is not recommended anyway (overclocking never is) and Armbian should not be doing it by default. The OPP table in the mainline kernel tree has the correct values for boards with a standard RK3399 or RK3399Pro; I do not know of any common boards with an RK3399K, and certainly none that are supported in any way by Armbian.
piter75 Posted February 10, 2020
5 hours ago, Neko May said: Armbian should not be doing it by default
I don't know the reason for the decision to do it, as it predates the beginning of my usage of Armbian, but I tend to agree that we should play safe by default. I did not have any issues with it on my boards so I did not pursue changing it, but... When I was suspecting CPU issues with the M4V2 I created changes that disable the overclocking but also make it easy to overclock with an overlay if one wants to experiment: https://github.com/armbian/build/compare/rk3399-disable-overclocking It did not fix the M4V2 mainline issues, but I think we can reconsider the default behaviour anyway.
Neko May (Author) Posted February 10, 2020
7 hours ago, piter75 said: I don't know the reason for the decision to do it, as it predates the beginning of my usage of Armbian, but I tend to agree that we should play safe by default. I did not have any issues with it on my boards so I did not pursue changing it, but... When I was suspecting CPU issues with the M4V2 I created changes that disable the overclocking but also make it easy to overclock with an overlay if one wants to experiment: https://github.com/armbian/build/compare/rk3399-disable-overclocking It did not fix the M4V2 mainline issues, but I think we can reconsider the default behaviour anyway.
Very smart changes for sure! In the meantime, I have mine running; I'll give it a little push and let it go for a bit to see if there are still stability issues with the 5.5.2 kernel.
TRS-80 Posted February 10, 2020
16 hours ago, Neko May said: Armbian should not be doing it by default
10 hours ago, piter75 said: I don't know the reason for the decision to do it, as it predates the beginning of my usage of Armbian, but I tend to agree that we should play safe by default.
In the "normal" world (x86, etc.) I would tend to agree. And personally, I have never overclocked any of my hardware. However in Armbian, I know that a lot of testing has been done on lots of boards. If you read back through all the posts of @tkaiser (and others), it has been pretty clear to me for years that lots of scientific (testing / fact based) research has been put into optimizations on various boards and chipsets, and the results of all that were then baked into Armbian. Those were some of the reasons that attracted me initially. Now, I am only a low to middle level wizard at best, but OTOH I have been reading here for a number of years already. I like to read and study. And since becoming Moderator, I have been reading even more (into other areas I did not necessarily go into before). The point I am trying to make is, unless you have done some actual experiments (and I would suggest publishing results for peer review, as has been done with all past optimizations), I would hesitate strongly to make offhand comments based on generalities which may not apply at all to Armbian and/or the hardware in question. At a minimum, I would search the forums for the reasons why certain things are already the way they are. There is a wealth of technical information here going back years. But my default assumption would be that, as @piter75 alluded to, there is in fact "some reason" for those decisions, even if that goes against your (and my) normal gut feeling.
Neko May (Author) Posted February 17, 2020
I've been running mine now for 7 days with no reboots. Using datasheet limits, uboot-nanopim4v2 20.05.0-trunk038 and kernel 5.5.2-rockchip64 #trunk.038, it seems to be stable. I do get "Decoding ERROR" at random when running the full 7z benchmark, however. Nothing else seems affected, but if anyone has suggestions of other things to try (that are not sbc-bench, which changes the CPU max speed/governor) let me know and I'll give them a spin (unless they require 600 dependencies). I should install XFCE and xscreensaver and see how that runs and whether X11 crashes; perhaps give it another week with that (or less, if it fails). Update: X11 fails to start, consistently segfaulting at OsLookupColor+0x188.
Edited February 17, 2020 by Neko May
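As one dependency-free thing to try besides the 7z benchmark mentioned above, a compress/decompress round trip can be checked against a hash. This is only a sketch and not something anyone in the thread ran: it assumes Python 3 is available on the image, the function name `roundtrip_failures` is hypothetical, and it uses the standard library `lzma` module as a stand-in for 7z (both use LZMA-family compression), so a pass here says nothing definitive about the "Decoding ERROR" seen with 7z.

```python
import hashlib
import lzma
import random


def roundtrip_failures(iterations: int = 10, size: int = 1 << 20, seed: int = 0) -> int:
    """Compress and decompress pseudo-random buffers, counting corrupted round trips."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(iterations):
        # A random chunk repeated four times gives the compressor real work to do.
        chunk = rng.getrandbits(8 * (size // 4)).to_bytes(size // 4, "little")
        data = chunk * 4
        expected = hashlib.sha256(data).digest()
        try:
            restored = lzma.decompress(lzma.compress(data))
        except lzma.LZMAError:
            # The compressed stream itself was damaged in memory.
            failures += 1
            continue
        if hashlib.sha256(restored).digest() != expected:
            failures += 1
    return failures
```

On healthy hardware the function should always return 0. Larger `size` and `iterations` values touch more memory and run longer, which is the point when hunting for intermittent errors.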
chwe Posted February 17, 2020
On 2/10/2020 at 7:12 PM, TRS-80 said: However in Armbian, I know that a lot of testing has been done on lots of boards. If you read back through all the posts of @tkaiser (and others), it has been pretty clear to me for years that lots of scientific (testing / fact based) research has been put into optimizations on various boards and chipsets, and the results of all that were then baked into Armbian. Those were some of the reasons that attracted me initially.
And sometimes it's more or less the 'works for me' approach, where it worked in the first place for 'developer x' and, as long as nobody could show the opposite, it was assumed to be 'good'. Mostly with one or maybe two boards of the SoC in question. So our sample size for optimizing parameters might not be large enough to call this scientific: our settings are based on observations, but likely not on a scientifically relevant sample size.
On 2/10/2020 at 7:12 PM, TRS-80 said: In the "normal" world (x86, etc.) I would tend to agree. And personally, I have never overclocked any of my hardware.
Oh, most of the few OC-ers I know spend an unhealthy amount of time ensuring their OC setup is perfectly stable at the highest possible settings for their CPU/GPU/RAM, probably way more time than we invest in our settings. (My last AMD64 machine, apart from my notebook where thermals don't allow thinking about OC at all, is now 7 years old and has probably 2-3k hours of stable running at slightly overclocked settings. I don't see a difference in overclocking between architectures.)
Now back to the topic and back to my observations. I probably packed the Linux kernel with 7z a few hundred times when I played around with zram, trying to find a difference in performance between lzo and lzo-rle (which is claimed to be faster on ARM, and 7z is great at soaking up a lot of memory). The board itself ran for roughly 2 weeks with the image (5.3 kernel back then) and 7z ran for hours overnight packing and unpacking kernel sources. On my board I didn't notice instabilities except when the OOM killer kicked in (it mostly happened when trying to compile large libraries, but also sometimes with 7z when reducing available RAM with kernel bootargs to below 2GB iirc - could be less, I barely keep notes of such stuff when I don't see promising results. I sometimes should; turns out my brain runs OOM too and I forget stuff over time).
On my NanoPi M4V2 our default 2GHz/1.5GHz settings worked just fine. It could be that I just got 'a better board than yours' which allows higher defaults (the same happens in the AMD64 world - some CPUs of the same spec just perform better than others), or that something else in your setup isn't in great shape, and if this is the case I would bet on your power supply. We see first indications that we are running into the same powering nightmare we had with microUSB back then, now on boards using USB-C in 'dumb mode' that is not PD compliant (USB-C is better than microUSB, but the boards it's mostly used on are also more power hungry, so being better doesn't mean being good). My setup was headless with an RPi4 PSU, which to my knowledge is rated at 5.1V/3A. What PSU did you use for your board?
@piter75 (maybe adding the other usual suspects too, so @TonyMac32 and @martinayotte): if this turns out to be an issue for the M4V2 whereas the other boards do fine, we could simply solve it with a DT overlay, having it as default for those boards that do well and disabling it for the M4V2. Similar to https://github.com/armbian/build/blob/e78f6db215c9444ce83b8e80c85af88158ce4c2e/config/boards/orangepizero.conf#L6 to bring up USB2 on the pin header by default.
piter75 Posted February 17, 2020
15 minutes ago, chwe said: solve it with a DT overlay, having it as default for those boards that do well and disabling it for the M4V2
We could do that, but I am pretty sure (although of course I may be wrong ;p) that it's not a CPU frequency issue. What I have observed so far is that most of the time when I cold boot the board I see the following lines at the end of DDR training in Rockchip's binary:
change freq to 856MHz 1,0
ch 0 ddrconfig = 0x101, ddrsize = 0x2020
ch 1 ddrconfig = 0x101, ddrsize = 0x2020
pmugrf_os_reg[2] = 0x3AA1FAA1, stride = 0xD
ddr_set_rate to 328MHZ
ddr_set_rate to 666MHZ
ddr_set_rate to 928MHZ
channel 0, cs 0, advanced training done
channel 0, cs 1, advanced training done
channel 1, cs 0, advanced training done
channel 1, cs 1, advanced training done
channel 1, cs 0, dq 31 RISK!!! TdiVW_total violate spec
channel 1, cs 1, dq 31 RISK!!! TdiVW_total violate spec
ddr_set_rate to 416MHZ, ctl_index 0
ddr_set_rate to 856MHZ, ctl_index 1
Those "dq 31 RISK!!! TdiVW_total violate spec" lines do not look promising. OTOH, most of the time when I reboot the board after some time I do not see those lines, which may mean that the memory chips used in the M4V2 require slightly different params. As a piece of additional info, I also tried to use u-boot's TPL to train LPDDR4 in v2020.01 and, although it works well for RockPi 4, RockPro64 and ROC-RK3399-PC, it ends up with a kernel crash on 100% of boots with the M4V2.
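To compare cold boots against warm reboots as described in the post above, the "RISK!!!" lines can be pulled out of a captured serial log automatically. A minimal sketch, assuming the boot output has been saved as text: the regular expression is modelled only on the two warning lines shown in the post, and the function name `find_risk_lines` is hypothetical.

```python
import re

# Pattern modelled on the warning lines quoted above, e.g.
# "channel 1, cs 0, dq 31 RISK!!! TdiVW_total violate spec".
RISK_RE = re.compile(
    r"channel (\d+), cs (\d+), dq (\d+) RISK!!! (\S+) violate spec"
)


def find_risk_lines(log: str) -> list:
    """Return (channel, cs, dq, parameter) for every training warning in the log."""
    return [
        (int(channel), int(cs), int(dq), parameter)
        for channel, cs, dq, parameter in RISK_RE.findall(log)
    ]
```

Counting how often the list is non-empty across, say, twenty cold boots and twenty warm reboots would put a number on the "most of the time" observation. It works the same whether the log still has its line breaks or has been flattened into one line.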
Neko May (Author) Posted February 18, 2020
Last I checked, Rayson's site didn't have a datasheet or even list the part number for the DRAM chips used on the M4V2. That was a while ago, so it might be worth checking again (which I can do in a bit), and if all else fails I'll try asking FriendlyARM for a copy of the datasheet or whatever specs they may have for the chips. In addition, my partner's M4V2 also has 7z fail randomly with "Decoding ERROR", but it was stable for a few days at least. Update: the chips are marked RS512M32LZ4D2ANP-75BT; no luck finding them on Rayson's site.
Neko May (Author) Posted February 22, 2020
I got X11 running; it just worked after a reboot. (And it crashes when trying to use it, in the exact same place.)
Edited February 22, 2020 by Neko May