Cubieboard2 NAND not reliable


luiz_siam


Hello guys,

 

This is not a question, but feedback that I think may be useful for some.

 

I have had a cubieboard 2 server application running on Igor's armbian for 8 months in several places (~40 installations). In my application:

 

  1. The OS is installed on NAND
  2. Armbian 3.8 + wheezy + kernel 3.4.107
  3. There is a postgresql instance running, writing approx. 25kB/s average (not constant)
  4. Ordered data mode journaling on

From time to time I have problems at client sites with system corruption (not only the database, but the whole filesystem - sometimes I couldn't even boot the cubieboard anymore, and I had several kernel panics and weirder stuff like thousands of filesystem errors at once in fsck). Problems usually (but not always) appear after unclean shutdowns due to power loss.

 

Recently I decided to test the system running from the SD card. To my surprise, it is working smoothly without corruption! I've now created a read-only partition for the system and isolated the database, logs and other stuff (tomcat, /var/*, among others) on a "data" partition. I ran a test here with pseudo-random automatic power cuts during regular system operation - I could power it off (uncleanly) more than 200 times at different moments without problems, even in the database.
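
A power-cut test like the one described above needs some way to tell, after each reboot, whether the data written before the cut survived intact. A minimal sketch in Python, assuming a hypothetical append-only log in which every record carries a sequence number and a CRC32 and the payload contains no newline. The file layout and the names append_record and verify_log are illustrative only, not what was actually used in this test:

```python
import os
import zlib


def append_record(path, seq, payload):
    """Append one 'seq:payload:crc' line and force it to stable storage."""
    body = f"{seq}:{payload}"
    crc = zlib.crc32(body.encode("utf-8"))
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{body}:{crc:08x}\n")
        f.flush()
        os.fsync(f.fileno())  # otherwise the record may only exist in the page cache


def verify_log(path):
    """Return (number_of_valid_records, list_of_problems)."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        lines = f.read().split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # file ended with a newline, so there is no partial last line
    valid, expected_seq, problems = 0, 0, []
    for number, line in enumerate(lines, start=1):
        body, sep, crc_hex = line.rpartition(":")
        try:
            intact = bool(sep) and zlib.crc32(body.encode("utf-8")) == int(crc_hex, 16)
            seq = int(body.partition(":")[0]) if intact else None
        except ValueError:
            intact, seq = False, None
        if not intact:
            # A damaged final record is what a power cut is expected to leave behind;
            # a damaged record anywhere else points to real corruption.
            if number != len(lines):
                problems.append(f"line {number}: damaged record")
            continue
        if seq != expected_seq:
            problems.append(f"line {number}: expected seq {expected_seq}, got {seq}")
        expected_seq = seq + 1
        valid += 1
    return valid, problems
```

The writer would call append_record in a loop while the power is being cut at random, and verify_log would run at every boot. Anything other than an empty problem list means that records which had already been fsync'ed were lost or damaged.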

 

I'm not sure whether moving to the SD card is the key point here (as opposed to the read-only partition + armbian 4.7 + jessie + kernel 3.4.110), but since I've seen threads in other forums questioning the NAND drivers and the physical bus routing on the cubieboard2, I thought it would be good to share my experience and recommend that users with heavier read-write requirements not use the cubieboard 2 NAND!

 

If any of you have had similar (or completely different) experiences, please share - I'm curious to get feedback and maybe reach a definitive answer about the problems I've been experiencing here (which cost us a lot of money in tech support calls). Now, on the SD card, it is rock solid - I couldn't break the system, even trying hard!

 

Thanks

Luiz

 

 

 

 


I don't know what mounting options for rootfs were used in Armbian 3.8, but if they were data=writeback,commit=600, then it may be the main cause for the issues - these are performance optimizations that can cause FS corruption on unclean shutdowns.

From ext4 kernel doc

When tuning ext3 for best benchmark numbers, it is often worthwhile to try changing the data journaling mode; '-o data=writeback' can be faster for some workloads. (Note however that running mounted with data=writeback can potentially leave stale data exposed in recently written files in case of an unclean shutdown, which could be a security exposure in some situations.)

* writeback mode

In data=writeback mode, ext4 does not journal data at all. This mode provides a similar level of journaling as that of XFS, JFS, and ReiserFS in its default mode - metadata journaling. A crash+recovery can cause incorrect data to appear in files which were written shortly before the crash. This mode will typically provide the best ext4 performance.

 

commit=nrsec    (*)    Ext4 can be told to sync all its data and metadata every 'nrsec' seconds. The default value is 5 seconds. This means that if you lose your power, you will lose as much as the latest 5 seconds of work (your filesystem will not be damaged though, thanks to the journaling). This default value (or any low value) will hurt performance, but it's good for data-safety. Setting it to 0 will have the same effect as leaving it at the default (5 seconds). Setting it to very large values will improve performance.
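
To see which of these options a filesystem is really mounted with (as opposed to what /etc/fstab asks for), the kernel's view in /proc/mounts can be inspected. A rough sketch in Python, assuming the standard /proc/mounts field layout (device, mount point, type, options). The helper name risky_options and the 30-second commit threshold are arbitrary choices for illustration:

```python
def risky_options(mounts_text, commit_threshold=30):
    """Map mount point -> ext* mount options that trade crash safety for speed."""
    findings = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, comma-separated options, ...
        if len(fields) < 4 or not fields[2].startswith("ext"):
            continue
        mount_point, options = fields[1], fields[3].split(",")
        flagged = []
        for opt in options:
            key, _, value = opt.partition("=")
            if key == "data" and value == "writeback":
                flagged.append(opt)
            elif key == "commit" and value.isdigit() and int(value) > commit_threshold:
                flagged.append(opt)
        if flagged:
            findings[mount_point] = flagged
    return findings


# Typical use on the board itself:
#   with open("/proc/mounts") as f:
#       print(risky_options(f.read()))
```

The function only reports options that are explicitly listed. An option that does not appear in /proc/mounts may simply be at its default, so an empty result is not proof of a safe configuration.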

Thanks zador.blood.stained!

 

The fstab had writeback option set on root partition - however the partition was being mounted with ordered data mode (see this topic I've created myself: http://forum.armbian.com/index.php/topic/400-filesystem-journal-dataordered-being-used-on-nand-instead-of-writeback/).

 

I've read the kernel page you mentioned like three times and tried many tweaks without success at that time.

 

Important observation: I'm now using writeback data mode on the SD card (the setup I mentioned) and no problems are happening :) I think, though I'm not sure, that writeback may cause some light file corruption, but not the whole filesystem going bad like I had, which is why I think the problem is the NAND device. Another factor that feeds my suspicion is that some of my devices never show problems (this statistic is biased, because most of them are backed by a UPS), while others crash even during normal operation (this happened three or four times). In some cases the device was running with weird behavior (input/output errors or kernel panics, most of the time) and never came up again after a reboot. In these cases I suspect the filesystem got corrupted by some NAND problem, or even by bits flipped due to bad routing on the board, and the problem only showed up after the reboot, when the system files were read again.

 

Regards

 

Well - a lot of information! =) I hope I'm not saying anything stupid - I'm basing my conclusions on the facts I had here and, of course, there are some things I'm not as sure about as I'd like to be.


Another thing - in Igor's message in the topic you linked, I noticed:

[   18.399862] EXT4-fs (nand2): Remounting file system with no journal so ignoring journalled data option

I checked: the current NAND install script disables the ext4 journal completely on the rootfs, so tweaking /etc/fstab probably wasn't enough. You can check and compare your SD and NAND filesystem options with:

tune2fs -l <partition>
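
The line to look at in that output is "Filesystem features:"; a journal exists only if has_journal is listed there. A small sketch in Python that pulls out that field, assuming tune2fs is on the PATH and the script runs as root. The function names are illustrative:

```python
import subprocess


def parse_tune2fs(output):
    """Turn the 'Key:   value' lines printed by `tune2fs -l` into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def has_journal(output):
    """True if the feature list in the given tune2fs output contains has_journal."""
    features = parse_tune2fs(output).get("Filesystem features", "").split()
    return "has_journal" in features


def check_device(device):
    """Run tune2fs -l on a device such as /dev/nand2 (needs root)."""
    result = subprocess.run(["tune2fs", "-l", device],
                            capture_output=True, text=True, check=True)
    return has_journal(result.stdout)
```

Calling check_device on both the NAND and the SD partition gives a quick yes/no comparison; parse_tune2fs can be used the same way to compare any other field, such as "Default mount options".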

Hi!

 

Thanks again for your comments zador.

 

My tune2fs output on the nand system (armbian 3.8 on NAND) is:

root@siamserver:~# tune2fs -l /dev/nand2
tune2fs 1.42.5 (29-Jul-2012)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          31cdf62b-f9d8-4571-8df8-5d7be293b5e7
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         unsigned_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              241440
Block count:              964608
Reserved block count:     48230
Free blocks:              218388
Free inodes:              159688
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      235
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8048
Inode blocks per group:   503
Flex block group size:    16
Filesystem created:       Fri Oct 30 11:50:19 2015
Last mount time:          Thu Dec 31 22:00:06 2009
Last write time:          Thu Dec 31 22:00:06 2009
Mount count:              13
Maximum mount count:      -1
Last checked:             Fri Oct 30 11:50:19 2015
Check interval:           0 (<none>)
Lifetime writes:          5808 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      90bab9a4-3bb6-4584-a0b9-3c041c01265c
Journal backup:           inode blocks

On the SD card system (data partition on armbian 4.7 legacy kernel):

root@siamserver(ro):~# tune2fs -l /dev/mmcblk0p2
tune2fs 1.42.12 (29-Aug-2014)
Filesystem volume name:   siamdb
Last mounted on:          /data
Filesystem UUID:          d1d3e3fd-3861-45ab-a3f5-a746df6b1681
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              256512
Block count:              1048576
Reserved block count:     52428
Free blocks:              683870
Free inodes:              247434
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      281
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8016
Inode blocks per group:   501
Flex block group size:    16
Filesystem created:       Tue Dec  8 09:24:13 2015
Last mount time:          Fri Dec 18 18:29:27 2015
Last write time:          Fri Dec 18 18:29:27 2015
Mount count:              839
Maximum mount count:      -1
Last checked:             Fri Dec 11 17:10:32 2015
Check interval:           0 (<none>)
Lifetime writes:          1554 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      4efda314-91f3-4745-b9ee-1893be445276
Journal backup:           inode blocks

I think both have the journal enabled, so this doesn't explain the problem either - or am I missing something?

 

regards


My experience with the cubietruck: after 6 months of full-time use with the OS installed on NAND, I had files with wrong md5 sums, continuous (i.e. daily) database corruption and filesystem corruption, without any crash or cold reset that could explain those errors. After the 4th complete OS install from scratch, the NAND is a DEAD END for me.
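
Silent corruption of this kind (files whose md5 changes although nobody wrote to them) can be caught early by keeping a checksum manifest of files that are not supposed to change and re-checking it periodically. A minimal sketch in Python; the function names are made up for illustration, and which directory to scan is left to the reader:

```python
import hashlib
import os


def file_md5(path, chunk_size=65536):
    """MD5 of a file, read in chunks so large files do not fill the RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map relative path -> md5 for every regular file below root."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = file_md5(full)
    return manifest


def changed_files(old, new):
    """Paths present in both manifests whose checksum differs."""
    return sorted(p for p in old if p in new and old[p] != new[p])
```

A manifest built right after installation can be stored off the board and compared with a fresh one later; any path returned by changed_files that was never legitimately updated is a candidate for storage or memory corruption.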

Since making that choice, my CT has been running perfectly from the SD card alone for a year.


Hello, I agree with you folks. The NAND on my cubieboard2 is not reliable either. I had so many file corruptions, and reinstalled Debian on it so many times, without ever getting a stable OS. I will never install an OS on NAND again.

SD Card or SATA is a much better choice.


Thanks guys for your feedback. I've sent an e-mail to cubietech in order to know if they have any information regarding this issue, and this is what I got from them:

 

Hi, Luiz
Please tell us which model are you working on? Cubieboard2 or Cubieboard3?
The Nand Flash driver is closed source driver from Allwinner, and we have no way to
solve this problem.
Please also refer to this post:
http://cubieboard.org/2014/08/12/how-to-choose-the-storage-media-in-cubieboard/
If you want to apply Cubieboards to commercial applications, I advise you select the
TSD version or EMMC version. Now we have TSD version in stock. 

 

I can't check my NAND version right now and haven't replied to him yet, but as soon as I have news I'll let you know.

 

Thanks



 

Hello Luiz!

 

Thank you for sharing. We were facing the same problems here and recently decided to change to SD cards. Our idea is to create 3 partitions: 1 read-only, 1 "data" partition and 1 backup partition, but we are concerned about the maximum number of write cycles. So we thought about using a different filesystem, like JFFS2 or UBIFS rather than EXT2/3/4, even though we know they aren't meant for block devices. Did you try something like this? What do you think about it?

 

Thanks

 

Bruna


Hi Bruna,

 

I'm using EXT4. According to my research (I'm not an expert, so I can't be completely sure), good-quality cards already do wear leveling internally, so it is fairly hard to wear out the memory.

 

See this topic, for example: http://electronics.stackexchange.com/questions/27619/is-it-true-that-a-sd-mmc-card-does-wear-levelling-with-its-own-controller

 

The hard part is making sure you're buying authentic SD cards... here in Brazil I'm having some difficulties - our latest approach was to contact Sandisk directly to get supplier references and increase our chances of getting original Sandisk or Kingston cards. I checked a few that we bought on a Brazilian eBay-like site and they were not original (I could corrupt them on the first power-loss attempt or even before!) :( This page helped me a lot with testing SD cards: http://www.bunniestudios.com/blog/?page_id=1022 (but there is no 100% reliable method).

 

Regards

Luiz


Hi Luiz!

 

We are testing an Armbian image with ext4 on a good-quality card (Panasonic), but we are still facing a lot of problems. In our write tests (mainly a local database), the SD cards don't last 48 hours before they start showing problems, and after some days they stop working completely.

Have you found a solution to your problems? Do you have any advice for us?

 

Regards

 

Bruna


I had similar problems with cubieboard2, but it turned out they were not nand problems, but memory corruption problems instead.

 

It seems that the memory configuration used when booting from nand is too aggressive for some boards. There are two ways to check:

 * boot from nand and run memtester for a few days, and check for memory errors

 * when booting from an SD card, it is possible to edit script.fex to tune the memory frequency, so a nice test is to generate an SD card with a modified script.bin but with the rootfs on nand, and run the usual load to see if it is more stable
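
For the second test, the usual workflow on sunxi boards is to convert script.bin to text with bin2fex, edit the text, and convert it back with fex2bin (both from sunxi-tools). The edit itself can be sketched in Python as below. It assumes the DRAM frequency is stored in a dram_clk key under the [dram_para] section, which is the usual layout of sunxi fex files but should be verified against your own file, and it assumes (as the post suggests) that the boot chain actually honours this value. The function name and the 384 MHz figure are only examples:

```python
import re


def set_dram_clk(fex_text, new_mhz):
    """Return the fex text with dram_clk inside [dram_para] set to new_mhz."""
    out = []
    section = None
    changed = False
    for line in fex_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped[1:-1]  # remember which section we are in
        elif section == "dram_para" and re.match(r"dram_clk\s*=", stripped):
            line = f"dram_clk = {new_mhz}"
            changed = True
        out.append(line)
    if not changed:
        raise ValueError("no dram_clk entry found in [dram_para]")
    return "\n".join(out) + "\n"


# Typical use (file names are examples):
#   with open("script.fex") as f:
#       text = f.read()
#   with open("script-slow.fex", "w") as f:
#       f.write(set_dram_clk(text, 384))
```

Only the key inside [dram_para] is touched; an identically named key in another section is left alone, and a missing key raises an error rather than silently producing an unchanged file.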

 

I am not sure whether it is possible to tune the memory parameters when booting from nand. Does anyone know if that is possible?


Hi,

 

I'm now running armbian from SD cards on approx. 40 cubieboards installed at customer sites, running a tomcat application that frequently writes to and reads from a postgresql database, without further problems.

 

Some of the systems are based on the legacy kernel (approx. 80%) and the most recent ones are based on the vanilla kernel. I'm not having any further filesystem problems in either case.

 

I'm not investigating the issue any further as it was solved by moving the system to the microSD card, so sorry if I don't have any further information on how to solve NAND problems! I encourage anyone with the same problems as described in this topic to try running the system on microSD cards.

 

brubetinha, I think maybe you are having other issues causing the memory corruption problems!

 

Thank you all for the information shared!

 


This topic is now closed to further replies.