Daneasch Posted June 30, 2017

Hi guys, I'm playing around with two Orange Pis, a One and a Lite, and thanks to all the good work here on the forum, with good success. Both OPis are running Python programs: one reads and counts buttons and flashes some LEDs, and the other reads my smart meter and puts the data on a web page. Both programs are still under construction but working.

The problem I need some ideas for is that both OPis crash after running for anywhere between 1 day and 6 weeks. So now I'm looking for ways to figure out why they crash, so I can start working on a fix. Both OPis run on separate power adapters. I've been measuring the current and voltage, but there is nothing strange.

Ideas, anyone?
Igor Posted June 30, 2017

We are not clairvoyants or mind readers. Please describe your setup as best as possible so we know what your operating environment is like.

1. Logs, when you can boot the board: armbianmonitor -u (paste the URL into your forum post)
2. If your board does not boot, provide a log from the serial console, or at least take and attach a picture of where it stops.
3. Describe the problem as best you can and provide all the information we need to reproduce it.
4. What is your hardware setup? PSU and cables. Do you have anything plugged into USB?
StuxNet Posted June 30, 2017

Do what Igor mentions. I'm pretty sure you can boot the board, so here's my 2 cents.

1 day to 6 weeks is a helluva difference! If something hardware- or software-related were happening, I would assume it would occur at a much more regular interval than 1 day to 6 weeks, but absolutely don't rule it out. Just my gut, but still a possibility. The computer technician in me says it's a local problem or user error. Both OPis have the same defect? Unlikely.

So here's what I mean, and the ideas you asked for. Are you plugging or unplugging devices on, in or around the device? What kind of power adapters are you using, and are they both the same? Where are they powered? The same outlet? Try moving them to different locations, or at least do different things to different boards, and see whether the issues remain or change. To quote Igor: "Please describe your setup as best as possible so we know what your operating environment is like". This includes the adapters, the buttons and LEDs, the web server you're using, etc.

I'm curious about this 1 day to 6 weeks thing. You mention you're running Python scripts, so it wouldn't be hard for you to make a separate script that logs the uptime every second. When the device abruptly turns off, you'll know how long it was actually on. Then boot and run it again. Maybe this will give you a better idea of whether it's really 1 day to 6 weeks or whether there's more of a pattern, and possibly help determine what was being done around that time.

Python sample script. It's written to run as is.

    import subprocess
    import time

    def uptimeLog(uptime):
        # 'w+' truncates the file, so uptime.txt always holds only the latest reading
        with open('uptime.txt', 'w+') as log:
            log.write(uptime)

    while True:
        # universal_newlines=True returns text rather than bytes on Python 3
        uptime = subprocess.check_output('uptime', shell=True, universal_newlines=True)
        uptimeLog(uptime)
        time.sleep(1)

To any haters: I realize it could probably be done in pure bash, but I'm familiar enough with Python to write simple working code without debugging. Also, a .py script might be overkill for such a simple task, but once learned, good coding practices should always be used. ;D
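The uptime-logging idea above can also be sketched as an append-only heartbeat file, so that the last line written before a crash survives the power cycle. This is only a minimal sketch under a few assumptions: the names append_heartbeat, last_heartbeat and heartbeat.log are made up for illustration, the file is assumed to live on persistent storage (not a RAM-backed log directory), and the 60-second interval is an arbitrary choice to limit SD card writes.

```python
import os
import time


def append_heartbeat(path, now=None):
    """Append one timestamp line and force it out to the storage device."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    with open(path, "a") as log:
        log.write(stamp + "\n")
        log.flush()
        # fsync asks the kernel to write the data out before a possible power loss
        os.fsync(log.fileno())
    return stamp


def last_heartbeat(path):
    """Return the last non-empty line of the heartbeat file, or None."""
    try:
        with open(path) as log:
            lines = [line.strip() for line in log if line.strip()]
    except IOError:
        # File does not exist yet (or cannot be read)
        return None
    return lines[-1] if lines else None


def run_forever(path="heartbeat.log", interval=60):
    """Hypothetical main loop: write one heartbeat every `interval` seconds."""
    while True:
        append_heartbeat(path)
        time.sleep(interval)
```

After a crash and reboot, calling last_heartbeat("heartbeat.log") before restarting run_forever() gives the approximate time the board stopped, which can then be compared against the system logs.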
Daneasch (author) Posted July 1, 2017

Hi guys,

Sorry if I haven't been giving enough info. I'll try to be more precise.

My project goal is to build a device which reads a smart meter and 5 standard meters and combines all the data into some nice graphics on a web page on the intranet.

The OPi Lite:
- Runs Armbian Jessie 3.4.113-sun8i
- Micro SD card: Samsung Evo 16 GB
- Adapter: 2 A, from eBay
- Attached to GPIO: a transistor, some resistors and an LED, with a wire to connect to the R11 port on the smart meter
- Runs a bash script in a terminal, which calls a Python script to read the meter and put the data in a text file. The bash script then reads the text file and transfers the data to an RRD database, which turns the data into diagrams that are displayed on a web page through a PHP script. The web page is served from the same Orange Pi with Apache.

Here is a screenshot of part of the page.

So I know the Pi stopped on a Saturday two weeks ago. It would be possible to get a more precise timestamp out of the database if needed.

Because the above was all scraped together from the net and looks a bit overly complicated, I decided to start over and do as much as possible in Python. That is the OPi One setup:
- Runs Armbian 5.25 Jessie 3.4.113-sun8i
- Micro SD card: Samsung Evo 16 GB
- Adapter: 2.5 A, from Motorola
- Attached to GPIO: a breadboard with four LEDs and five buttons. No other hardware attached.

This runs a Python script in a terminal which, for now, runs a loop of exactly 20 seconds, then blinks some LEDs and prints a timestamp in the terminal. (The piece of code which blinks the LEDs will later be replaced with the code to read the smart meter. That is why I need the 20-second loop: the meter gives data every 20 seconds.) In the meantime the code checks whether the buttons are pushed and counts the presses. (This will be used to attach some other meters with S0 ports, so I can have a database of all my power and water use.)

I was running the new Python code to see whether the 20 seconds were precise enough and whether I could read the pulses reliably. The rotation is now set to 10 seconds and the buttons are read 390 times, giving me a window of 2.5 microseconds. This should be fine, since the pulses from the meters are going to be 100 microseconds, so I shouldn't miss any.

The problem I'm now facing is that after a while the Pi stops responding and I can't log in any more. The Lite stops writing to the web page and the One stops blinking its LED, so I assumed they have the same problem. The only way I know so far to get back in is to unplug the power cord and plug it in again. Then they boot and everything works again.

I pulled the logs as Igor suggested, but that could only be done after a reboot. The log for the One is at http://sprunge.us/jWXI and the log for the Lite is at http://sprunge.us/SEhB. I hope you can make something of them. Are these logs deleted on restart? It looks like it, since today's date is at the top. Can I set up logging so that it writes to the SD card?

As for the questions from StuxNet: the Lite has its own power outlet and adapter and lies undisturbed on a shelf next to the smart meter. The One lies on my desk on the other side of the house, on a different power group and outlet, also with an adapter of its own.

I don't have a drawing of the LEDs and buttons, but since that's all working I think it is very unlikely to cause the problems. If you really want it, I could draw it out for you. I included the Python code so you can see which pins I used. The buttons are all connected to ground through a 20 ohm resistor and use the internal pull-up resistors. The LEDs are protected with a 220 ohm resistor.

I've included my Python code so far: blinkg.py
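The polling figures in the post above can be checked with a little arithmetic. The following is a minimal sketch, assuming "rotation" means the length of one loop and that the 390 reads are spread evenly across it; the function names are illustrative and are not taken from blinkg.py. Under those assumptions a 10-second loop with 390 reads gives roughly 25.6 ms between reads, and a pulse is only certain to be caught if it lasts longer than that spacing. The second function shows one common way to count presses from sampled pin levels, by counting released-to-pressed transitions.

```python
def polling_window_ms(loop_seconds, reads_per_loop):
    """Average time between two consecutive button reads, in milliseconds."""
    return loop_seconds * 1000.0 / reads_per_loop


def count_presses(samples, pressed_level=0):
    """Count released-to-pressed transitions in a sequence of sampled pin levels.

    With an internal pull-up the pin reads 1 when idle and 0 when the button
    pulls it to ground, so pressed_level defaults to 0.
    """
    presses = 0
    previous = None
    for level in samples:
        # Only count the moment the level changes from released to pressed,
        # so a button held across several reads counts once.
        if level == pressed_level and previous is not None and previous != pressed_level:
            presses += 1
        previous = level
    return presses
```

For example, polling_window_ms(10, 390) gives the spacing between reads for the loop described above, and count_presses can be fed a list of levels collected during one loop.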
Igor Posted July 1, 2017

One thing that was noticed lately is this strange issue related to (presumably) Network Manager. Are you using NM, and can you rule this out by using ifup or a wired USB adapter?
Daneasch (author) Posted July 1, 2017

Hi Igor, I've now connected the OPi One through a USB-to-Ethernet adapter. I presume Network Manager is part of the Armbian installation? I'm not sure whether it's used or how to check.
Igor Posted July 1, 2017

34 minutes ago, Daneasch said: I'm not sure if it's used or how to see this.

How did you connect to the wireless network in the first place? This way? Then you are using Network Manager. Or check this way: systemctl status network-manager.service
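Since both boards already run Python, the same check could be scripted, for instance to record the service state alongside other diagnostics. This is a rough sketch only: it uses `systemctl is-active`, which prints a one-word state, instead of the fuller `systemctl status` output mentioned above, and the function name service_state and its runner parameter are invented for illustration.

```python
import subprocess


def service_state(unit, runner=subprocess.Popen):
    """Return the one-word state reported by `systemctl is-active`, or "unknown".

    `runner` exists only so the function can be exercised without systemd;
    by default it is subprocess.Popen.
    """
    try:
        proc = runner(
            ["systemctl", "is-active", unit],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # text output on both Python 2 and 3
        )
        out, _ = proc.communicate()
    except OSError:
        # systemctl is not installed or cannot be executed
        return "unknown"
    return out.strip() or "unknown"
```

Calling service_state("network-manager.service") on the board should return "active" if Network Manager is running, and something like "inactive" otherwise.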
Daneasch (author) Posted July 1, 2017

When setting up the Wi-Fi on the OPi Lite, I connected through the USB-to-network cable and then, on the desktop, clicked the Wi-Fi icon and filled in the fields. After that I had Wi-Fi. On the OPi One I only used the network plug to hook it up, which worked out of the box.
Igor Posted July 1, 2017

33 minutes ago, Daneasch said: then on the desktop clicked the Icon for wifi and filled in the fields

That is Network Manager. The issues are most likely related and need to be investigated when possible. There is no quick fix; only workarounds are possible at this stage.
Igor Posted July 2, 2017

1 hour ago, Daneasch said: OK, how can I help?

1. Study the background: https://askubuntu.com/questions/1786/what-is-the-difference-between-network-manager-and-ifconfig-ifup-etc
2. Google for more details to see whether others are having this problem and whether it's already solved in a recent NM. Our Ubuntu packaged version is not the latest, and the one in Jessie is even older.
3. Make sure that the classic way of making a network connection works (with Network Manager disabled). This means some extensive testing.
4. Identify the problem and apply a fix or workaround for Network Manager (or whatever causes this problem) to the build system, so that we all benefit from it.

Something like that is what I would do. Anything that helps in understanding the roots of the problem is valuable.
jjrojo Posted July 2, 2017

I'd like to ask a question: in server images, why not remove network-manager and its dependencies? I do it without any problem. I set up /etc/network/interfaces with my network settings; I think it's easier, less trouble and less resource-hungry. Thank you, this is only my opinion.
Igor Posted July 2, 2017

19 minutes ago, jjrojo said: I'd like to ask a question: in server images, why not remove network-manager and its dependencies?

It's an option, but in that case armbian-config will need some rework, since it depends on network-manager ...

21 minutes ago, jjrojo said: I set up /etc/network/interfaces with my network settings; I think it's easier, less trouble and less resource-hungry. Thank you, this is only my opinion.

From the perspective of "Windows users" and newbies, those who don't want to learn, Network Manager is better, since all they (need to) know is their wireless SSID and password, which are entered in an interactive process. Hard to beat that experience. In desktop mode we had problems with the alternative (wicd) too, so sticking to one and fixing the problems is probably the way to go.

I would not rush to change everything before identifying this particular problem. After that, think again about what kind of networking config is best overall, and then start moving in one direction or the other.
jjrojo Posted July 2, 2017

Well, for me, after trying to set up Wi-Fi through the Armbian first-run configuration file, I gave up and ended up setting it up the old way. I never managed to connect using the Armbian first-run file (maybe that's my problem). Network Manager was a nightmare for me, and nmtui, ufff, is that really easier? Maybe the armbian_first_run.txt file could just copy the network configuration to /etc/network/interfaces. But I don't know; I'm no expert, and I'm not coming from Windows either. I came here after the Raspberry Pi, searching for a better SBC. I bought an Orange Pi Plus 2E, and after a week of fighting with the board it is running more or less as I expected.

I'll post my setup, pieced together from searching here and there.

Armbian Jessie ran very well at first, but after some days SSH refused my connection. I set it up again from scratch several times, and nothing. I'm currently running Ubuntu server with the mainline kernel.

- Plex running in RAM with anything-sync-daemon, for watching recorded 5-minute camera videos
- log2ram
- Network over Wi-Fi (wlan0), with just over 0.2% dropped packets with the current setup
- Recording 6 IP cameras over Wi-Fi (7 fps, 1.5mb rate): 3 over proftpd and 3 stored with an ffmpeg bash script
- mmcblk0p1 commit=60
- Attached sda: Seagate Expansion drive 5400 with btrfs compress=lzo, commit=60 (+- 30mb in a dd 1gb test, over 22mb in an rsync 1g test)
- sda queue set to deadline

    sysctl -w net.core.rmem_max=26214400
    sysctl -w net.core.wmem_max=26214400
    sysctl -w net.core.rmem_default=514400
    sysctl -w net.core.wmem_default=514400
    sysctl -w net.ipv4.tcp_rmem='10240 87380 26214400'
    sysctl -w net.ipv4.tcp_wmem='10240 87380 26214400'
    sysctl -w net.ipv4.udp_rmem_min=131072
    sysctl -w net.ipv4.udp_wmem_min=131072
    sysctl -w net.ipv4.tcp_timestamps=0
    sysctl -w net.ipv4.tcp_window_scaling=1
    sysctl -w net.ipv4.tcp_sack=1
    sysctl -w net.core.optmem_max=65535
    sudo sysctl -w vm.swappiness=0
    sudo sysctl -w vm.vfs_cache_pressure=50
    sudo sysctl -w vm.dirty_background_bytes=16777216
    sudo sysctl -w vm.dirty_bytes=50331648

    # Increase size of file handles and inode cache
    fs.file-max = 2097152
    # Do not cache metrics on closing connections
    net.ipv4.tcp_no_metrics_save = 1
    # recommended default congestion control is htcp
    net.ipv4.tcp_congestion_control=htcp
    # net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.tcp_slow_start_after_idle = 0

I'll move this post into a new thread when time allows. Thank you.
Daneasch (author) Posted July 2, 2017

I've been searching the web and found out that I'm indeed using a very old Network Manager, 0.9.10.0, while 1.8.0 is already out. A lot of bugs have been fixed, but most of the time the fixes are not documented, so it's hard to figure out whether my problem is one of the fixed ones.

I've also been looking around on the OPi Lite and found a log file in /var/log/ that does indeed reference Network Manager on the day of the last crash. The errors themselves don't mean much to me, but maybe someone can shed some light. The log file is included below.

Since it seems I can't install a different version of Network Manager in this distro, and I think it's a bit overkill to build a totally new image just for this problem, I'm now going to remove Network Manager completely and set up the network manually. If it then stays stable, I will know for sure that the problems come from Network Manager.

user.log.1
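Picking the relevant entries out of a log such as user.log.1 can be scripted. The following is only a sketch, assuming a traditional syslog layout in which every line starts with a date such as "Jul  1"; the function name matching_lines and its parameters are made up for illustration.

```python
def matching_lines(lines, keywords, day_prefix=None):
    """Return log lines that mention any keyword (case-insensitive).

    If day_prefix is given, only lines starting with that exact text
    (for example a syslog-style date) are considered.
    """
    wanted = [keyword.lower() for keyword in keywords]
    hits = []
    for line in lines:
        if day_prefix is not None and not line.startswith(day_prefix):
            continue
        lowered = line.lower()
        if any(keyword in lowered for keyword in wanted):
            hits.append(line.rstrip("\n"))
    return hits
```

An open file object can be passed as lines, for example the result of open("/var/log/user.log.1"), with keywords such as ["networkmanager"] and the crash date as day_prefix. Comparing the hits from several crash days would show whether the same messages precede each crash.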
Daneasch (author) Posted July 23, 2017

I've done some more digging around in the log files, a bit slowly because my cash provider has been rather demanding lately, and it seems StuxNet was right: the problems are not the same.

In the logs on the OPi One there are messages from Blueman before the crash. Can Blueman be disabled? I have no use for Bluetooth. I've been messing with the power source, plugging my phone in and out on the same power supply, and the crash that resulted also left references to Blueman in the logs. So the problem here seems to be the power source. I've now swapped it for a Logitech one, and it has been running stable again for three days. We'll see how this turns out.

Since the crashes on the OPi Lite are months apart, I think the way to go there is to reinstall everything from the beginning, to rule out any problems that have accumulated during crashes and the occasional improper shutdown. I'll give the Ubuntu version a try; let's hope it has a more stable Network Manager.

I'll report back here if I have anything to add.