luiz_siam Posted December 22, 2015

Hello guys,

This is not a question, but feedback I think may be useful for some. I have a Cubieboard 2 server application using Igor's Armbian, running for 8 months in several places (~40 installations). In my application:

* The OS is installed on NAND: Armbian 3.8 + wheezy + kernel 3.4.107
* There is a PostgreSQL instance running, writing approx. 25 kB/s on average (not constant)
* Ordered data mode journaling is on

From time to time I've had problems on clients with system corruption (not only the database, but the whole filesystem - sometimes I couldn't even boot the Cubieboard anymore, and I had several kernel panics and other weird stuff, like thousands of filesystem errors at once on fsck). Problems usually (but not always) appear after unclean shutdowns due to power loss.

Recently I decided to give it a try and test the system running on the SD card. To my surprise, it is working smoothly without corruption! I've now created a read-only partition for the system and isolated the database, logs and other stuff (tomcat, /var/*, among others) on a "data" partition. I've conducted a test here with pseudo-random automatic power cuts during regular system operation - I could turn it off uncleanly more than 200 times at different moments without problems, even on the database.

I'm not sure if moving to the SD card is the key point here (or the read-only partition + Armbian 4.7 + jessie + kernel 3.4.110), but as I've seen other threads in other forums questioning the NAND drivers and the physical bus routing on the Cubieboard 2, I think it is worth sharing my experience and recommending that users with heavier read-write requirements not use the Cubieboard 2 NAND!

If any of you have had similar (or completely different) experiences, please share - I'm curious to get feedback and maybe reach a definitive answer about the problems I've been experiencing here (which cost us a lot of money in tech support calls).
Now, on the SD card, it is rock solid - I couldn't break the system, even trying hard!

Thanks,
Luiz
zador.blood.stained Posted December 22, 2015

I don't know what mount options were used for rootfs in Armbian 3.8, but if they were data=writeback,commit=600, that may be the main cause of the issues - these are performance optimizations that can cause FS corruption on unclean shutdowns. From the ext4 kernel doc:

When tuning ext3 for best benchmark numbers, it is often worthwhile to try changing the data journaling mode; '-o data=writeback' can be faster for some workloads. (Note however that running mounted with data=writeback can potentially leave stale data exposed in recently written files in case of an unclean shutdown, which could be a security exposure in some situations.)

* writeback mode
In data=writeback mode, ext4 does not journal data at all. This mode provides a similar level of journaling as that of XFS, JFS, and ReiserFS in its default mode - metadata journaling. A crash+recovery can cause incorrect data to appear in files which were written shortly before the crash. This mode will typically provide the best ext4 performance.

commit=nrsec (*)
Ext4 can be told to sync all its data and metadata every 'nrsec' seconds. The default value is 5 seconds. This means that if you lose your power, you will lose as much as the latest 5 seconds of work (your filesystem will not be damaged though, thanks to the journaling). This default value (or any low value) will hurt performance, but it's good for data-safety. Setting it to 0 will have the same effect as leaving it at the default (5 seconds). Setting it to very large values will improve performance.
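A minimal sketch, assuming Python 3.9+, of how one might flag the two options discussed above in an ext4 option string (the fourth field of an /etc/fstab or /proc/mounts line). The helper name check_ext4_options is hypothetical and not part of Armbian; the 5-second default and the commit=0 behaviour come from the doc quoted above, and the reported loss window is only a rough estimate, not a guarantee.

```python
def check_ext4_options(options: str) -> list[str]:
    """Return warnings for crash-safety-relevant ext4 mount options.

    `options` is a comma-separated string such as
    "defaults,noatime,data=writeback,commit=600" (hypothetical helper).
    """
    opts = {}
    for item in options.split(","):
        key, _, value = item.strip().partition("=")
        opts[key] = value

    warnings = []
    if opts.get("data") == "writeback":
        warnings.append("data=writeback: file data is not journaled; recently "
                        "written files may hold stale data after a crash")

    if opts.get("commit"):
        seconds = int(opts["commit"])
        if seconds == 0:
            seconds = 5  # per the ext4 doc, commit=0 means the 5 s default
        if seconds > 5:
            warnings.append(f"commit={seconds}: roughly the last {seconds} s "
                            "of writes may be lost on power failure")
    return warnings


if __name__ == "__main__":
    for warning in check_ext4_options("defaults,noatime,data=writeback,commit=600"):
        print(warning)
```

With the sample string it prints two warnings; with data=ordered and the default commit interval it returns an empty list.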
luiz_siam Posted December 22, 2015

Thanks zador.blood.stained! The fstab had the writeback option set on the root partition - however, the partition was being mounted with ordered data mode (see this topic I created myself: http://forum.armbian.com/index.php/topic/400-filesystem-journal-dataordered-being-used-on-nand-instead-of-writeback/). I read the kernel page you mentioned about three times and tried many tweaks without success at that time.

Important observation: I'm now using writeback data mode on the SD card (the setup I mentioned) and, as far as I can tell, no problems are happening. I'm not completely sure about this, but I think writeback may result in some minor file corruption, not the whole filesystem going bad like I had (and this is the reason I think the problem is the NAND device).

Another factor that contributes to my suspicion is that some devices never show problems (this statistic is biased, because most of them are backed by UPS units) while others crash even during normal operation (this happened three or four times). In some cases, the device was running with weird behavior (input/output errors or kernel panics, most of the time) and, after a reboot, it never came up again. In these cases I suspect the filesystem got corrupted due to some NAND problem, or even some bits flipped due to bad routing on the board, and the problem only showed up after the reboot, when system files were read again.

Regards

Well - a lot of information! =) I hope I'm not saying anything stupid - I'm basing my conclusions on the facts I had here and, of course, there are some things I'm not as sure about as I'd like.
zador.blood.stained Posted December 22, 2015

Another thing - in Igor's message in the topic you linked I noticed:

[ 18.399862] EXT4-fs (nand2): Remounting file system with no journal so ignoring journalled data option

I checked: the current NAND install script disables the ext4 journal completely on rootfs, so tweaking /etc/fstab probably wasn't enough. You can check and compare your SD and NAND filesystem options by using tune2fs -l <partition>
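tune2fs -l prints one "Field: value" line per setting, and an enabled journal shows up as has_journal on the "Filesystem features" line. A small sketch, assuming Python 3.9+ and tune2fs on the PATH, that runs the command and checks for that flag; the script name check_journal.py and both helper names are hypothetical.

```python
import subprocess
import sys


def parse_tune2fs(output: str) -> dict[str, str]:
    """Turn `tune2fs -l` text into a {field name: value} dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, e.g. the version banner
            fields[key.strip()] = value.strip()
    return fields


def has_journal(output: str) -> bool:
    """True if the 'Filesystem features' line lists has_journal."""
    features = parse_tune2fs(output).get("Filesystem features", "")
    return "has_journal" in features.split()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usually needs root, e.g.: sudo python3 check_journal.py /dev/nand2
    device = sys.argv[1]
    result = subprocess.run(["tune2fs", "-l", device],
                            capture_output=True, text=True, check=True)
    state = "enabled" if has_journal(result.stdout) else "DISABLED"
    print(f"{device}: ext4 journal {state}")
```

Running it once against the NAND partition and once against the SD partition gives a quick side-by-side answer to the journal question.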
zador.blood.stained Posted December 22, 2015

I'm not saying that there are no problems in the NAND kernel driver, but when I used rootfs on NAND on a Cubietruck, I don't remember any major filesystem problems, even after unclean shutdowns.
luiz_siam Posted December 22, 2015

Hi! Thanks again for your comments, zador. My tune2fs output on the NAND system (Armbian 3.8 on NAND) is:

root@siamserver:~# tune2fs -l /dev/nand2
tune2fs 1.42.5 (29-Jul-2012)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          31cdf62b-f9d8-4571-8df8-5d7be293b5e7
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         unsigned_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              241440
Block count:              964608
Reserved block count:     48230
Free blocks:              218388
Free inodes:              159688
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      235
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8048
Inode blocks per group:   503
Flex block group size:    16
Filesystem created:       Fri Oct 30 11:50:19 2015
Last mount time:          Thu Dec 31 22:00:06 2009
Last write time:          Thu Dec 31 22:00:06 2009
Mount count:              13
Maximum mount count:      -1
Last checked:             Fri Oct 30 11:50:19 2015
Check interval:           0 (<none>)
Lifetime writes:          5808 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      90bab9a4-3bb6-4584-a0b9-3c041c01265c
Journal backup:           inode blocks

On the SD card system (data partition on Armbian 4.7, legacy kernel):

root@siamserver(ro):~# tune2fs -l /dev/mmcblk0p2
tune2fs 1.42.12 (29-Aug-2014)
Filesystem volume name:   siamdb
Last mounted on:          /data
Filesystem UUID:          d1d3e3fd-3861-45ab-a3f5-a746df6b1681
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              256512
Block count:              1048576
Reserved block count:     52428
Free blocks:              683870
Free inodes:              247434
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      281
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8016
Inode blocks per group:   501
Flex block group size:    16
Filesystem created:       Tue Dec 8 09:24:13 2015
Last mount time:          Fri Dec 18 18:29:27 2015
Last write time:          Fri Dec 18 18:29:27 2015
Mount count:              839
Maximum mount count:      -1
Last checked:             Fri Dec 11 17:10:32 2015
Check interval:           0 (<none>)
Lifetime writes:          1554 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      4efda314-91f3-4745-b9ee-1893be445276
Journal backup:           inode blocks

I think both have the journal enabled, so this doesn't explain the problem either - or am I missing something?

Regards
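To compare the two outputs programmatically rather than by eye, a sketch like the following (hypothetical helper names, Python 3.9+) extracts the "Filesystem features" set from each tune2fs -l dump and reports what appears on only one side. Fed the two feature lines above, both differences come out empty. Note that this compares on-disk features only; it says nothing about the mount options actually in effect (data=, commit=), which are listed in /proc/mounts.

```python
def feature_set(tune2fs_output: str) -> set[str]:
    """Return the words on the 'Filesystem features' line of `tune2fs -l` output."""
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return set(value.split())
    return set()


def compare_features(a: str, b: str) -> tuple[set[str], set[str]]:
    """Return (features only in a, features only in b)."""
    fa, fb = feature_set(a), feature_set(b)
    return fa - fb, fb - fa


# Feature lines copied from the two tune2fs dumps above.
NAND = ("Filesystem features:      has_journal ext_attr resize_inode dir_index "
        "filetype needs_recovery extent flex_bg sparse_super large_file "
        "huge_file uninit_bg dir_nlink extra_isize")
SD = ("Filesystem features:      has_journal ext_attr resize_inode dir_index "
      "filetype needs_recovery extent flex_bg sparse_super large_file "
      "huge_file uninit_bg dir_nlink extra_isize")

if __name__ == "__main__":
    only_nand, only_sd = compare_features(NAND, SD)
    print("only on NAND:", sorted(only_nand))
    print("only on SD:", sorted(only_sd))
```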
zador.blood.stained Posted December 22, 2015

There are no differences in the fs options, so I guess it could be a problem in the NAND driver implementation for use cases with constant reads and writes.
eincube Posted December 23, 2015

My experience with the Cubietruck: after 6 months of full-time use with the OS installed on NAND, I had files with wrong md5 checksums, continuous database corruption and filesystem corruption (continuous = every day), without any crash or cold reset that could explain those errors... After the 4th complete OS install from scratch, for me the NAND is a DEAD END. Since that choice, my CT has been running perfectly using only the SD card for a year.
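One way to catch the kind of silent corruption described above (files whose md5 no longer matches) before it turns into a failed boot is to record checksums of files that should not change and re-check them periodically. A minimal sketch, assuming Python 3.9+; the helper names are hypothetical, and in practice the manifest should be stored somewhere other than the filesystem being checked (for example serialized with json to another medium). MD5 is used only because it is what the post mentions; it is adequate for spotting accidental corruption, not tampering.

```python
import hashlib
import os
import tempfile


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so large files don't need to fit in RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> dict[str, str]:
    """Map each regular file under root (relative path) to its MD5."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                manifest[os.path.relpath(path, root)] = file_md5(path)
    return manifest


def verify_manifest(root: str, manifest: dict[str, str]) -> list[str]:
    """Return a sorted list of files that are missing or whose MD5 changed."""
    problems = []
    for rel, expected in sorted(manifest.items()):
        path = os.path.join(root, rel)
        if not os.path.isfile(path):
            problems.append(f"missing: {rel}")
        elif file_md5(path) != expected:
            problems.append(f"changed: {rel}")
    return problems


def demo() -> tuple[list[str], list[str]]:
    """Build a manifest in a throwaway directory, then simulate corruption."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "a.txt")
        with open(target, "w") as f:
            f.write("original contents")
        manifest = build_manifest(root)
        before = verify_manifest(root, manifest)
        with open(target, "w") as f:
            f.write("corrupted contents")
        after = verify_manifest(root, manifest)
    return before, after


if __name__ == "__main__":
    print(demo())
```

On a real system the manifest would be built once over read-only locations such as /usr or /boot, and verify_manifest run after each boot.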
berturion Posted January 11, 2016

Hello, I agree with you folks. My NAND, also on a Cubieboard 2, is not reliable. I had so many file corruptions and reinstalled Debian on it so many times without ever getting a stable OS. I will never install any OS on NAND again. An SD card or SATA is a much better choice.
luiz_siam Posted January 11, 2016

Thanks guys for your feedback. I've sent an e-mail to Cubietech to find out whether they have any information regarding this issue, and this is what I got from them:

Hi, Luiz
Please tell us which model are you working on? Cubieboard2 or Cubieboard3?
The Nand Flash driver is closed source driver from Allwinner, and we have no way to solve this problem.
Please also refer to this post: http://cubieboard.org/2014/08/12/how-to-choose-the-storage-media-in-cubieboard/
If you want to apply Cubieboards to commercial applications, I advise you select the TSD version or EMMC version. Now we have TSD version in stock.

I can't check my NAND version right now and haven't replied to him yet, but as soon as I have news I'll let you know.

Thanks
brubetinha Posted February 2, 2016

Hello Luiz! Thank you for sharing. We were facing the same problems here and recently decided to switch to SD cards. Our idea is to create 3 partitions: 1 read-only, 1 "data" partition and 1 backup partition, but we are concerned about the maximum number of write cycles. So we thought about using a different filesystem, like JFFS2 or UBIFS, rather than EXT2/3/4, even knowing that they aren't meant for block devices. Did you try something like this? What do you think about it?

Thanks,
Bruna
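A back-of-the-envelope way to reason about the write-cycle concern is to divide the total bytes the flash can absorb by the bytes written to it per year. The sketch below assumes ideal wear leveling; the program/erase (P/E) cycle count and the write amplification factor are placeholders, not specifications of any real card, and the result is very sensitive to both. The 25 kB/s rate comes from the first post (taking 1 kB = 1000 bytes), and 4 GiB matches the data partition in the tune2fs output earlier in the thread (1048576 blocks of 4096 bytes), used here as a conservative stand-in for the whole card.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def estimated_lifetime_years(capacity_bytes: float,
                             pe_cycles: float,
                             write_rate_bytes_per_s: float,
                             write_amplification: float = 1.0) -> float:
    """Years until the rated P/E cycles are used up, assuming perfect wear leveling.

    total writable bytes  = capacity * P/E cycles
    flash bytes per year  = host write rate * write amplification * seconds per year
    """
    total_writable = capacity_bytes * pe_cycles
    flash_bytes_per_year = write_rate_bytes_per_s * write_amplification * SECONDS_PER_YEAR
    return total_writable / flash_bytes_per_year


if __name__ == "__main__":
    capacity = 1048576 * 4096  # 4 GiB data partition from the tune2fs output
    rate = 25_000              # ~25 kB/s average database writes (first post)
    # PLACEHOLDER values: real P/E ratings and write amplification vary widely by card.
    for pe, wa in [(3000, 1.0), (3000, 10.0), (1000, 10.0)]:
        years = estimated_lifetime_years(capacity, pe, rate, wa)
        print(f"P/E={pe}, WA={wa}: ~{years:.1f} years")
```

The estimate scales linearly with the P/E rating and inversely with write amplification, so reducing small scattered writes can matter as much as the card's rating.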
luiz_siam Posted February 2, 2016

Hi Bruna, I'm using EXT4. According to my research (I'm not an expert, so I can't be completely sure), good-quality cards already do wear leveling internally, so it is fairly hard to wear out the memory. See this topic, for example: http://electronics.stackexchange.com/questions/27619/is-it-true-that-a-sd-mmc-card-does-wear-levelling-with-its-own-controller

The hard part is making sure you're buying authentic SD cards... here in Brazil I'm having some difficulties. The last approach we took was to contact SanDisk directly to get supplier references and increase our chances of getting original SanDisk or Kingston cards. I checked a few we bought on a Brazilian eBay-like site and they were not original (I could corrupt them on the first power-loss try, or even before!). This page helped me a lot in testing SD cards: http://www.bunniestudios.com/blog/?page_id=1022 (but there is no 100% reliable method).

Regards,
Luiz
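The basic idea behind testing for fake or failing cards, in the spirit of tools such as f3 or H2testw, is to fill the card with data that can be regenerated from its position, then read it back later and compare. A minimal sketch under those assumptions, Python 3.9+ (for random.Random.randbytes); file names and helper names are hypothetical. To exercise the card rather than the page cache, run the verify step after unmounting and remounting the card, or after a power cycle, which also doubles as a power-loss test.

```python
import os
import random
import tempfile

BLOCK_SIZE = 1 << 20  # 1 MiB per block (arbitrary choice)


def block_data(file_index: int, block_index: int, size: int = BLOCK_SIZE) -> bytes:
    """Deterministic pseudo-random bytes derived only from the block's position."""
    return random.Random(f"{file_index}:{block_index}").randbytes(size)


def data_path(directory: str, file_index: int) -> str:
    return os.path.join(directory, f"{file_index}.testdata")


def fill_files(directory: str, n_files: int, blocks_per_file: int,
               size: int = BLOCK_SIZE) -> None:
    """Write n_files files of blocks_per_file blocks each, fsync'ing each file."""
    for i in range(n_files):
        with open(data_path(directory, i), "wb") as f:
            for b in range(blocks_per_file):
                f.write(block_data(i, b, size))
            f.flush()
            os.fsync(f.fileno())


def verify_files(directory: str, n_files: int, blocks_per_file: int,
                 size: int = BLOCK_SIZE) -> list[tuple[int, int]]:
    """Return (file_index, block_index) pairs whose content does not match."""
    bad = []
    for i in range(n_files):
        try:
            with open(data_path(directory, i), "rb") as f:
                for b in range(blocks_per_file):
                    if f.read(size) != block_data(i, b, size):
                        bad.append((i, b))
        except FileNotFoundError:
            bad.extend((i, b) for b in range(blocks_per_file))
    return bad


def demo() -> tuple[list[tuple[int, int]], list[tuple[int, int]]]:
    """Run on a throwaway directory, flipping one byte to show detection."""
    with tempfile.TemporaryDirectory() as d:
        fill_files(d, n_files=2, blocks_per_file=2, size=4096)
        clean = verify_files(d, 2, 2, size=4096)
        with open(data_path(d, 0), "r+b") as f:
            f.seek(4096)  # first byte of block 1 in file 0
            byte = f.read(1)
            f.seek(4096)
            f.write(bytes([byte[0] ^ 0xFF]))
        damaged = verify_files(d, 2, 2, size=4096)
    return clean, damaged


if __name__ == "__main__":
    print(demo())
```

To detect a card that reports more capacity than it really has, choose n_files and blocks_per_file so the files fill nearly all of the free space.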
brubetinha Posted February 29, 2016

Hi Luiz! We are testing an Armbian image with ext4 using a good-quality card (Panasonic), but we are still facing a lot of problems. Our write tests (mainly a local database) don't last 48 h before the SD cards start to show problems, and after a few days they stop working completely. Have you found a solution to your problems? Do you have any advice for us?

Regards,
Bruna
pbg Posted March 18, 2016

I had similar problems with the Cubieboard2, but it turned out they were not NAND problems but memory corruption problems. It seems that the memory configuration used when booting from NAND is too aggressive for some boards. There are two ways to check:

* boot from NAND, run memtester for a few days, and check for memory errors
* when booting from the SD card, it is possible to edit script.fex to tune the memory frequency, so a good test is to create an SD card with a modified script.bin but with rootfs on NAND, and run the usual load to see if it is more stable

I am not sure whether it is possible to tune memory parameters when booting from NAND. Does anyone know if that is possible?
luiz_siam Posted March 31, 2016

Hi, I'm now running Armbian on SD cards on approx. 40 Cubieboards installed at customers, running a Tomcat application that frequently writes to and reads from a PostgreSQL database, without further problems. Some of the systems are based on the legacy kernel (approx. 80%) and the most recent ones on the vanilla kernel. I'm not having any further filesystem problems in either case.

I'm not investigating the issue any further, as it was solved by moving the system to the microSD card, so sorry that I don't have more information on how to solve the NAND problems! I encourage anyone with the same problems described in this topic to try running the system on microSD cards.

brubetinha, I think you may be having other issues causing the memory corruption problems!

Thank you all for the information shared!