AxelFoley

Members · Content Count: 58 · Rank: Advanced Member


  1. I have been monitoring the voltage and current draw of an individual RockPro64 on boot, with the PCIe NVMe attached, while watching the console. I then launched the desktop as both root (Armbian desktop) and the pico user (looks like XFCE) and looked for power spikes. There were none! Both tests caused a lockup and HW freeze.

     On boot the peak current draw was 0.7 A with 12 V steady, and the current draw then dropped to a steady 0.3 A. Then I launched Xorg as root (in the past this tended to be more stable) and as the pico user (always locked up immediately). I did notice a difference: when I ran startx as the pico user and it launched XFCE, the current draw peaked at 0.7 A and the voltage dropped to 11.96 V; the desktop immediately locked the board on launch. When I ran startx as root and it launched the Armbian desktop, it only peaked at 0.6 A and the voltage never dipped below 12 V. The desktop was responsive until I loaded the Armbian forum in Chromium and triggered the board lockup ... there was no current spike or voltage drop! The board locked up while drawing only 0.29 A.

     I checked the Pine forum and other people are reporting exactly the same issue as me with the PCIe/NVMe setup. One person in 2018 reported that he fixed the issue by booting a different kernel. I have logged the problem on the Pine forum and got this response: "Currently working on the issue. It seems - as odd as it sounds - that the problem is somehow linked to pulseaudio. If you uninstall pulseaudio and use alsa instead, the issue will just vanish. We have tried blacklisting PCIe for pulse in udev, and it prevents the issue from happening, but it also returns a segmentation error (SATA card / other adapter not accessible). It's very, very strange." (A sketch of that workaround follows below.)

     Should I stop posting here and move the discussion to Pine?
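     A minimal sketch of the suggested workaround, assuming a Debian/Ubuntu-based Armbian image (package names may differ on other releases):

         # Remove pulseaudio and fall back to bare ALSA, per the Pine forum suggestion
         sudo apt purge pulseaudio pulseaudio-utils
         sudo apt install alsa-utils
         # Confirm ALSA still enumerates the sound devices
         aplay -l

     If the lockups stop after this, that would support the pulseaudio link they describe; pulseaudio can always be reinstalled afterwards.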
  2. @TonyMac32 Thankfully I have the RockPro64 v2.1 board.
  3. @pfry Looking at the power specifications for PCIe x4, they suggest it can be powered from 3.3 V (9.9 W) and 12 V (25 W). However, from the RockPro64 power schema it looks like they neglected to feed the 12 V rail from the supply voltage to the PCIe slot; instead, Pine feeds the PCIe interface on the board only from the 3.3 V (3 A) rail (9.9 W). From the power schema it also looks like the board designers feed PCIe from the 5.1 V rail, converted by the RK808 PMU to 1.8 V on vcc1v8_pcie (not sure how this is intended to be used on PCIe).

     I am using the Pine PCIe ver 3.0 NVMe card with a Samsung 970 EVO 500GB and a RockPro64 v2.1 board. The PSU is a 102 W 12 V LRS-100-12. The max power draw of the 970 EVO NVMe is 5.8 W (1.76 A), which should be within spec for that 3.3 V rail. But this bit worries me:

         vcc_sys: could not add device link regulator.6 err -2

     vcc_sys is the 3.3 V rail that feeds the vcc3v3_pcie feed into the PCIe socket, although dmesg later says it can enable a regulator for the 3.3 V vcc3v3_pcie rail hanging off vcc3v3_sys. I also see this warning:

         Apr 8 19:09:24 localhost kernel: [ 2.010352] pci_bus 0000:01: busn_res: can not insert [bus 01-ff] under [bus 00-1f] (conflicts with (null) [bus 00-1f])

     I may have to get out a JTAG debugger to work this one out :-( Not sure if this is a kernel driver issue or a HW power design issue; the regulator check sketched below might help narrow it down. I'll see if I can get any Gerber files so I can scope out the voltage and current spikes.
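     A quick way to sanity-check those regulator warnings (a sketch, assuming a kernel built with debugfs, which these Armbian 4.4 images normally have):

         # Mount debugfs if it is not already mounted
         sudo mount -t debugfs none /sys/kernel/debug 2>/dev/null
         # Dump the regulator tree: each rail, its parent supply,
         # its enabled state and its voltage
         sudo cat /sys/kernel/debug/regulator/regulator_summary
         # Narrow it down to the PCIe-related rails
         sudo grep -i pcie /sys/kernel/debug/regulator/regulator_summary

     That should confirm whether vcc3v3_pcie really hangs off vcc3v3_sys and is enabled, before reaching for the JTAG debugger.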
  4. I think I have found the issue!!!!!!! It's the PCIe Express NVMe card. I remove it and the desktop does not seem to hang ... I add it back in ... and the desktop hangs. I wonder if this has something to do with power spikes when there is graphics activity. The errors in dmesg indicate that:

         vpcie1v8 = 5.1 V rail, shared with the GPU
         vpcie0v9 = 3 V rail

     Both of these rails hang off the same core buck converter (SY8113B); the other SY8113B manages the USB peripherals separately. @AndrewDB ... looks like you may be correct, it was power all along, but it looks like it's a kernel issue with the PCIe power management? (One way to test that is sketched below.)
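     If PCIe power management is the suspect, one cheap experiment (a sketch: pcie_aspm=off is a standard kernel parameter, and /boot/armbianEnv.txt is where Armbian picks up extra boot arguments) is to disable ASPM and see whether the hangs persist:

         # If an extraargs= line already exists in /boot/armbianEnv.txt,
         # append pcie_aspm=off to it instead of adding a second line
         echo 'extraargs=pcie_aspm=off' | sudo tee -a /boot/armbianEnv.txt
         sudo reboot
         # After the reboot, confirm the parameter took effect
         grep pcie_aspm /proc/cmdline

     If the board stays up with ASPM off, that points at PCIe link power-state transitions rather than the raw supply rails.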
  5. Interesting!!!! Fresh build on the spare EMMC64: Armbian_5.75_Rockpro64_Ubuntu_bionic_default_4.4.174_desktop.img. I created a new user pico after changing the root password. The device initiated nodm, which initiated Xorg, loaded the Armbian desktop ... and hung immediately! HW freeze and locked screen. Subsequent boots only reach a command-line login prompt, not the desktop. /etc/default/nodm still has root as the default login user, not pico. Manually running startx results in a black screen of death as the pico user (or, on the second try, a HW lockup, see attached), but the root user loads the desktop OK when running startx. I have done no apt update && apt upgrade. armbianmonitor -u results: http://ix.io/1FGs

     mmc driver issues:

         [Mon Apr 8 19:23:29 2019] rockchip_mmc_get_phase: invalid clk rate
         (repeated four times)

     PMU issues that may affect the efficiency of CPU idle when not under load:

         [Mon Apr 8 19:23:29 2019] rockchip_clk_register_frac_branch: could not find dclk_vop0_frac as parent of dclk_vop0, rate changes may not work
         [Mon Apr 8 19:23:29 2019] rockchip_clk_register_frac_branch: could not find dclk_vop1_frac as parent of dclk_vop1, rate changes may not work

     some possible PCIe/NVMe issues:

         [Mon Apr 8 19:23:30 2019] rockchip-pcie f8000000.pcie: Looking up vpcie1v8-supply property in node /pcie@f8000000 failed
         [Mon Apr 8 19:23:30 2019] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
         [Mon Apr 8 19:23:30 2019] rockchip-pcie f8000000.pcie: Looking up vpcie0v9-supply from device tree
         [Mon Apr 8 19:23:30 2019] rockchip-pcie f8000000.pcie: Looking up vpcie0v9-supply property in node /pcie@f8000000 failed
         [Mon Apr 8 19:23:31 2019] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
         [Mon Apr 8 19:23:31 2019] pci_bus 0000:01: busn_res: can not insert [bus 01-ff] under [bus 00-1f] (conflicts with (null) [bus 00-1f])

     some PWM issues:

         [Mon Apr 8 19:23:31 2019] pwm-regulator: supplied by vcc_sys
         [Mon Apr 8 19:23:31 2019] vcc_sys: could not add device link regulator.8 err -2
         [Mon Apr 8 19:23:31 2019] vcc_sys: could not add device link regulator.8 err -2
         [Mon Apr 8 19:23:31 2019] vcc_sys: could not add device link regulator.11 err -2
         ... etc, a load of these

     some sound driver issues:

         [Mon Apr 8 19:23:32 2019] of_get_named_gpiod_flags: can't parse 'simple-audio-card,hp-det-gpio' property of node '/spdif-sound[0]'
         [Mon Apr 8 19:23:32 2019] of_get_named_gpiod_flags: can't parse 'simple-audio-card,mic-det-gpio' property of node '/spdif-sound[0]'
         [Mon Apr 8 19:23:32 2019] rockchip-spdif ff870000.spdif: Missing dma channel for stream: 0
         [Mon Apr 8 19:23:32 2019] rockchip-spdif ff870000.spdif: ASoC: pcm constructor failed: -22
         [Mon Apr 8 19:23:32 2019] asoc-simple-card spdif-sound: ASoC: can't create pcm ff870000.spdif-dit-hifi :-22
         [Mon Apr 8 19:23:32 2019] asoc-simple-card spdif-sound: ASoC: failed to instantiate card -22
         [Mon Apr 8 19:23:32 2019] asoc-simple-card: probe of spdif-sound failed with error -22

     some DisplayPort (cdn-dp) firmware issues:

         [Mon Apr 8 19:23:46 2019] cdn-dp fec00000.dp: Direct firmware load for rockchip/dptx.bin failed with error -2
         [Mon Apr 8 19:24:02 2019] cdn-dp fec00000.dp: Direct firmware load for rockchip/dptx.bin failed with error -2
         [Mon Apr 8 19:24:35 2019] cdn-dp fec00000.dp: Direct firmware load for rockchip/dptx.bin failed with error -2
         [Mon Apr 8 19:25:39 2019] cdn-dp fec00000.dp: [drm:cdn_dp_request_firmware] *ERROR* Timed out trying to load firmware
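     Since nothing survives in the on-disk logs across these hard freezes, a netconsole sketch may help capture the last kernel messages over the network (the IP addresses, ports, and eth0 below are placeholders for your LAN):

         # On the RockPro64: stream kernel messages over UDP to another machine
         # Parameter format: netconsole=src-port@src-ip/dev,tgt-port@tgt-ip/
         sudo modprobe netconsole netconsole=6665@192.168.1.50/eth0,6666@192.168.1.10/
         # On the receiving machine (192.168.1.10); netcat flags vary by variant:
         nc -u -l 6666

     Anything the kernel prints right before the lockup then lands on the second machine instead of dying with the board.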
  6. @balbes150 It's for prototyping, education and engineering ... essentially enabling a quick-and-dirty evaluation of SOA (Service Oriented Architecture) concepts for myself and some other devs, e.g. NoSQL databases, and in particular how to integrate service discovery and RDMA paradigms such as the OFED stack (RoCE), and building RESTful interfaces and API abstraction while understanding principles such as standardized data and message models. In essence, to evangelize open source software and frameworks as a solution to proprietary software integration and interoperation inertia.
  7. @AndrewDB apologies, I missed the suggestion ... I disabled the HW acceleration and restarted Chromium from the terminal. Disabling the HW acceleration stopped the error messages being printed to stdout. However, the rockpro64 still locked up with a HW freeze loading the Armbian forum. I think the graphics drivers on some of my cluster nodes have gone foo-bar for some reason. I may need to "apt install --reinstall [graphics subsystem packages]" to be sure that this is a bug and not a library left missing by an interrupted apt update && apt upgrade.
  8. Right, I have now caught up with everybody's comments and questions and hopefully answered them. I now need some guidance on what to do next ...

     ### Hypothesis 1 ###: The root cause of the instability is a fundamental issue with the current Armbian code base (kernel/driver) on the rockpro64 4GB v2.1 + NVMe + 64GB eMMC, triggered by Chromium loading the Armbian forum (100% reproducible).

     *** Action *** Continue to troubleshoot the unstable cluster master when using Chromium and figure out how to trap the HW lockup when Chromium loads the Armbian forum (nothing is captured in the system log files). The only hint I get is from launching Chromium from a terminal (thanks @NicoD, I had no leads until you suggested this, as it was a complete HW freeze/lockup):

         root@rockpro64_0:~# chromium-browser
         libGL error: unable to load driver: rockchip_dri.so
         libGL error: driver pointer missing
         libGL error: failed to load driver: rockchip
         [17517:17517:0406/144119.625593:ERROR:sandbox_linux.cc(364)] InitializeSandbox() called with multiple threads in process gpu-process.
         [17647:17672:0406/144121.185188:ERROR:command_buffer_proxy_impl.cc(124)] ContextResult::kTransientFailure: Failed to send GpuChannelMsg_CreateCommandBuffer.

     ### Hypothesis 2 ###: The root cause of the issue is a corrupted Armbian installation, caused by an interruption to the salt-distributed "apt-get update && apt-get upgrade" (indicated by the need to force-reinstall libegl1 due to missing libraries).

     *** Action *** Reformat the entire cluster using a desktop edition for the master and a CLI version for the nodes, then retest to see if we can recreate the issues with Chromium, and also see whether 20% of the nodes are still rebooting constantly. I have a spare eMMC 64 chip, so I can save the current boot image on the cluster master if I need to go back to it.

     Q) Should I restart from scratch and reformat the cluster with an image recommended by this forum, so I start testing from a known good, stable place? (It seems Armbian does not have anybody offering to test the RockPro64 4GB v2.1 + PCIe NVMe + Bluetooth/WiFi module; seeing as I have 10 of them, I may be a good volunteer.) Or should I validate the current master's Armbian packages and kernel driver installs, to make sure there was not a corrupted installation / driver configuration issue, which may lead to unearthing a genuine bug report? (A package-integrity check is sketched below.)
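     A sketch of that validation path, using debsums to verify installed package files against their recorded checksums (this assumes the packages' md5sums lists are intact; it catches on-disk corruption, not a bad package build):

         sudo apt install debsums
         # -s: stay silent except for files whose checksums do not match
         sudo debsums -s
         # Force-reinstall anything it flags, e.g.:
         # sudo apt install --reinstall <package>

     If debsums comes back clean across the cluster, that weakens Hypothesis 2 and points back at the kernel/driver in Hypothesis 1.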
  9. @chwe My bad ... to be fair, every time I loaded the forum to post logs from the unit itself, Chromium crashed the board :-) I did not realise that armbianmonitor posted to a URL upload site ... smart! http://ix.io/1Fs4 See attached for the results from armbianmonitor -u.
  10. @chwe Yes, I have been working with the Pico Cluster guys to revise the power supply unit they ship with their cluster, and we have upgraded it to a 12 V unit instead of a 5 V unit. I have also completely rewired the power cable loom and installed a buck converter for the 5 V switches and fans. I have not gone to the extreme of checking the individual output voltages and currents to the boards from the PSU, but I have been monitoring the cluster's total power consumption. The cluster is not loaded at all, because I have not been able to do any work on it due to its instability; at peak it has never pulled more than 64 W (5.3 A). Most of the load is Cassandra recovering after the node reboots; as it is multi-threaded, it can load all 6 cores.

     The 12 V DC in goes via two buck converters (SY8113B) to create 2 x 5.1 V rails. One of those rails goes into an RK808 buck converter, which I think is embedded into the RK3399 chip. That RK808 feeds the GPIO pins, which I have measured at 5.190 V. There is a 4-pin SATA JST connector that the schema says is wired to the raw DC in (12 V) (but SATA also has 12 V, 5 V and 3.3 V specifications, so I am not sure whether one of those pins is additionally fed from the 5 V rail directly after the SY8113B).

     @chwe Do you want me to monitor the 5 V rail after the SY8113B with an oscilloscope prior to lockup? (A software-side check is sketched below.) Have you some concerns with the board power design?
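     For the software side, a sketch that logs what the kernel's regulator framework reports once per second while reproducing the lockup (the regulator names, and whether the microvolts attribute is exported, depend on the board's device tree, so some entries may be blank):

         # Log each regulator's name and reported voltage once per second
         while true; do
             for r in /sys/class/regulator/regulator.*; do
                 printf '%s %s %s uV\n' "$(date +%T)" \
                     "$(cat "$r/name")" "$(cat "$r/microvolts" 2>/dev/null)"
             done
             sleep 1
         done | tee regulator.log

     This will not catch microsecond spikes the way a scope will, but it is a quick cross-check on what the PMIC thinks the rails are doing.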
  11. @chwe A quick heads up ... people may not be aware, but the Pine guys have their own image installer based on Etcher that automatically pulls the Armbian image down without any indication of the WIP or stable status of the project (see attached). This is where Pine falls down a bit ... they should focus on one main desktop and one command-line release to recommend to users if they are going to obscure project status from their installer. I only found out about the WIP status when I reported the issues. I have been testing several desktop images, and to be fair to Armbian, they are all far away from where Armbian is on the RockPro64; it is by far the best performing I have found (the mrfixit build is broken atm if you have an eMMC boot).

     Maybe this is just me, but I only need a desktop when developing, to access a web browser ... to save me having to kill trees ... I have had to print this lot all out so I could continue working, because of the Chromium / Mali driver issue. However, I think I have no choice but to reformat the whole cluster. With all these lockups I have seen evidence of packages installed through apt that are actually missing libraries, and I had to do forced reinstalls:

         apt install --reinstall lightdm
         apt install --reinstall libegl1

     I cannot tell whether it was an issue with the Armbian package release at the time, or whether the 10 RockPro64s I have were HW locking up and resetting during a package install. Sometimes it's better to go back to square one. What's people's opinion? I am happy to act as a tester for this board and Armbian and find out where the real issues are.
  12. See attached: before crash, after crash, and the Chromium error on the command line when launched from a terminal. I really don't care about Chromium ... when working on these boards I try to keep it as simple as possible and use stock packages, in the hope that I will have more bugs squashed. I am now using Firefox so I can at least start working on the GPIO code again. Thanks for the suggestions. It could be that I need to run a debug kernel, as I at least expected to see some issues there ... but the nature of the lockup could mean it's a HW issue in the Mali graphics chip, triggered only by Chromium.
  13. @Da Alchemist that is a good point ... I made a rookie mistake thinking I could put up with a few glitches and still code. I then compounded it by using my cluster master as my development IDE, git master, salt master, Prometheus master, and Grafana server ... it's my own fault. I think it's time to restart from scratch, move my code to GitHub, and reformat everything. I have clearly managed to bugger up the Mali graphics drivers on the cluster master/dev box through continual upgrades and a link on the wiki to install HW-accelerated Mali drivers that are probably not feature complete. The decision was in part due to the fact that the Pico Cluster case only allows for one easy HDMI output, and that was my cluster master. I don't believe there is Thunderbolt support over USB-C on the RockPro64, so a USB-C to HDMI dongle is not an option. I can reformat the cluster without a desktop, and I have a KVM, so I can access a web browser and the cluster more easily, and I can use vim (mostly I am writing C and Python).
  14. @NicoD Brilliant idea ... I don't know why I did not think of that ... see attached ... I got those errors when first launching the Chromium browser. It works for a while as I navigate to the Armbian forum ... then locks up with no more command-line output. Interesting that this goes back to my hunch that a graphics driver issue is causing the instability. It's just strange that Chromium triggers the issue but not Firefox!
  15. armbian-config does not do a good job of installing lightdm; it failed to launch lightdm on boot, with errors about trying to restart the service too quickly after it initially failed to start (see the sketch below for how I would debug that). I don't have time to troubleshoot somebody else's sloppy mess, so I have reverted back to nodm so I can get back to writing some code. The display lockup when nodm boots and logs in as any user other than root is not going to be nodm's problem, I suspect; more likely an Xorg config issue / graphics driver.

     Good to know I can avoid 90% of my stability issues just by not using the default Armbian Chromium browser. I am soon going to have to face facts and accept that I need to reformat my entire cluster with a different distro; I just cannot get Armbian stable on the RockPro64, and I have been trying for 4 months. I have these lockup issues with my master node, and I have 20% of my nodes constantly restarting. I have eliminated power issues. I have ordered a load of heatsink fans to eliminate the last thing I can think of: heat (despite them all having heatsinks and a Pico Cluster case). The fans are the last thing I will try before giving up on Armbian, starting from scratch, and losing all my salt/Cassandra/Prometheus/Grafana cluster configuration.
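     For the record, the "restarted too quickly" failure sounds like systemd's start-rate limiting kicking in after lightdm crashed; a sketch of how I would dig into it (the override values are illustrative, not a recommended config):

         # See why lightdm failed in the first place
         systemctl status lightdm
         journalctl -u lightdm -b --no-pager
         # Optionally relax the start-rate limit via a drop-in while debugging
         sudo mkdir -p /etc/systemd/system/lightdm.service.d
         printf '[Unit]\nStartLimitIntervalSec=0\n' | \
             sudo tee /etc/systemd/system/lightdm.service.d/override.conf
         sudo systemctl daemon-reload

     The rate limit is only the symptom, though; the journal output should show the underlying Xorg/driver error.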