Kosmatik

SMART Issues with 4.9.y XU4 and OMV


Hello Everyone,

 

As @tkaiser is probably aware from the hardkernel forum, I'm having issues with USB/SMART and OMV.

 

Basically, USB resets whenever SMART data is pulled through OMV's web interface, but it works fine if I run the query manually in a shell.

 

Here is some info.

 

http://sprunge.us/HdbK

 

It was suggested that I update smartmontools to 6.5, but some of its library dependencies are missing.

 

If any of you fine folks have any ideas as to what could be causing it, I would be grateful.

 

Thank you


So to start dealing with your problem in a somewhat logical way:

  • We're talking about JMS561 and not ASM1153E here, right?
  • Since calling smartctl -d sat manually for this Cloudshell 2 gimmick 1) provides the data and 2) doesn't trigger USB resets, it seems the way OMV does SMART queries could be responsible (in which case this is clearly the wrong forum, since the OMV people could tell without doing research first)
  • Hardkernel guys unfortunately don't give a sh*t about upstream support for their products (no reports of problematic UAS bridge chips to linux USB kernel maintainers, same with those shitty Seagate enclosures and evaluating whether Exynos 5422 USB3 host controller needs quirks). This means: no support in smartmontools for JMS561.

What you could do:

  • Add JMS561 to /var/lib/smartmontools/drivedb/drivedb.h as outlined in http://forum.openmediavault.org/index.php/Thread/17855-Building-OMV-automatically-for-a-bunch-of-different-ARM-dev-boards/?postID=145278#post145278 (possible result: OMV trying 'smartctl' first without any of the '-d' modes avoiding the USB resets here)
  • Try to figure out how OMV does SMART queries (mv smartctl binary to smartctl.orig, replace smartctl with a script that logs calling parameters)
  • Check the smartctl manual page for possible -d values and try them out one after another (my guess is that OMV does exactly that when dealing with devices that lack support, as is obviously the case with JMS561 here, so maybe the USB resets happen because OMV tries 'smartctl -d jmicron' or whatever)
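To make the second suggestion concrete, here is a minimal sketch of such a logging stand-in, written in Python. Both paths are assumptions (where smartctl lives on a given OMV install, and where the log should go), and `format_call` is just a helper name picked for illustration. The script records its arguments and then hands over to the renamed binary, so OMV should not notice any difference:

```python
#!/usr/bin/env python3
"""Logging stand-in for smartctl (sketch; both paths below are assumptions)."""
import os
import shlex
import sys
import time

REAL_BINARY = "/usr/sbin/smartctl.orig"   # assumed location of the renamed binary
LOG_FILE = "/var/log/smartctl-calls.log"  # hypothetical log path


def format_call(argv, now=None):
    """Return one log line: local timestamp plus the shell-quoted arguments."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return stamp + " smartctl " + " ".join(shlex.quote(arg) for arg in argv)


def main(argv):
    """Append the call to the log, then replace this process with smartctl."""
    with open(LOG_FILE, "a") as log:
        log.write(format_call(argv) + "\n")
    os.execv(REAL_BINARY, [REAL_BINARY] + list(argv))


# Only hand over when the renamed binary is really there.
if __name__ == "__main__" and os.path.exists(REAL_BINARY):
    main(sys.argv[1:])
```

Saved as an executable file in place of smartctl, this should leave one line per call in the log after clicking through the web interface. Move smartctl.orig back afterwards.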

BTW: I can't provide more help here since I have already dropped the idea of using ODROID-XU4 as a NAS (especially when combined with the Cloudshell gimmicks). For me a NAS has to be reliable, and the list of potential problems with ODROID-XU4 has become too long in the meantime (and to be honest: Hardkernel's role here has been a bit too underwhelming)

 


Ok, I figured out what's causing it.

 

The bridge is JMS561.

OMV pulls this for all drives when the Devices tab is clicked.

smartctl -x /dev/sda -d sat

 

Running this manually I can also crash it.

 

Specifically this:

 

smartctl -l devstat /dev/sda

crashes here:

root@openmediavault:~# smartctl -l devstat /dev/sda -d sat,12
smartctl 6.4 2014-10-07 r4002 [armv7l-linux-4.9.30-odroidxu4] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

ATA_READ_LOG_EXT (addr=0x00:0x00, page=0, n=1) failed: 48-bit ATA commands require SAT ATA PASS-THROUGH (16)
Read GP Log Directory failed

Device Statistics (SMART Log 0x04)
Page Offset Size         Value  Description
ATA_SMART_READ_LOG failed: Connection timed out
Read Device Statistics pages 0-7 failed

It still appends -d sat even if the bridge is defined in the .h file.

 

What would be the (potential) fix? Update of smartmontools?

 

EDIT: Hard drive issues probably.

 

Running the same command on /dev/sdb does not cause the crash.
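For anyone who wants to repeat this comparison across both disks and several -d values, a rough Python sketch follows. The list of types is my selection from the smartctl man page (the JMicron-specific one is spelled `usbjmicron` there). The function names and the 30-second timeout are arbitrary choices. Be aware that any of these probes may trigger the USB reset described above.

```python
import subprocess

# -d values from the smartctl man page that look plausible for a USB bridge.
CANDIDATE_TYPES = ["sat", "sat,12", "sat,16",
                   "usbjmicron", "usbcypress", "usbsunplus"]


def build_command(device, dtype, log="devstat"):
    """Return the smartctl argument list for one device/type combination."""
    return ["smartctl", "-l", log, device, "-d", dtype]


def failure_lines(output):
    """Return the lines of smartctl output that report a failed command."""
    return [line.strip() for line in output.splitlines() if "failed" in line]


def probe(device, dtype, timeout=30):
    """Run one query; return (exit code or None on timeout, failure lines)."""
    try:
        result = subprocess.run(
            build_command(device, dtype),
            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
            universal_newlines=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None, ["no answer within %d s" % timeout]
    return result.returncode, failure_lines(result.stdout)


def probe_all(devices=("/dev/sda", "/dev/sdb")):
    """Try every candidate type on every device, one summary line each."""
    for device in devices:
        for dtype in CANDIDATE_TYPES:
            code, problems = probe(device, dtype)
            print(device, dtype, code, problems or "no failure reported")
```

Calling `probe_all()` as root prints one line per combination. Watching dmesg in a second terminal shows which combination, if any, coincides with a reset.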

3 hours ago, Kosmatik said:

What would be the (potential) fix?

 

Avoiding broken hardware? In which of the 4 possible modes do you operate your Cloudshell 2? Is a firmware update available?


Cloudshell2 is in JBOD mode.  I've posted in the hardkernel forums as this looks like a hardware issue as this happens when the hard drives are swapped and also in Ubuntu.

26 minutes ago, Kosmatik said:

I've posted in the hardkernel forums as this looks like a hardware issue as this happens when the hard drives are swapped

 

Yes, I added this already as problem N° 12 wrt 'ODROID XU4 and Cloudshell' yesterday: https://forum.armbian.com/index.php?/topic/3953-preview-generate-omv-images-for-sbc-with-armbian/&do=findComment&comment=32340

 

Problem N° 11 contains a link to an explanation: http://forum.openmediavault.org/index.php/Thread/17855-Building-OMV-automatically-for-a-bunch-of-different-ARM-dev-boards/?postID=144752#post144752 (JMS561 is a pretty stupid choice for accessing disks anyway and obviously contains various SMART related bugs, this one here being related to broken SAT support). I really hope Hardkernel stops selling/advertising this strange Cloudshell 2 thingie and starts to address all the USB related problems soon.

 

 

 


Since I'm currently cleaning up a bit, sorting disks and enclosures, and came across the SD card hosting OMV for XU4, a final try:

 

This is XU4 with a JMS567 enclosure and a Samsung PM851 SSD (no USB resets according to dmesg, it's simply the SSD not supporting the requested feature):

root@odroidxu4:~# smartctl -l devstat /dev/sda -d sat
smartctl 6.4 2014-10-07 r4002 [armv7l-linux-4.9.28-odroidxu4] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

Device Statistics (GP/SMART Log 0x04) not supported

This is the same enclosure now with an Intel 540 inside:

root@odroidxu4:~# smartctl -l devstat /dev/sda -d sat
smartctl 6.4 2014-10-07 r4002 [armv7l-linux-4.9.28-odroidxu4] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

Device Statistics (GP Log 0x04)
Page Offset Size         Value  Description
  1  =====  =                =  == General Statistics (rev 2) ==
  1  0x008  4               52  Lifetime Power-On Resets
  1  0x010  4               19  Power-on Hours
  1  0x018  6       1631780991  Logical Sectors Written
  1  0x020  6         20156628  Number of Write Commands
  1  0x028  6       1956034441  Logical Sectors Read
  1  0x030  6         24617479  Number of Read Commands
  2  =====  =                =  == Free-Fall Statistics (empty) ==
  3  =====  =                =  == Rotating Media Statistics (empty) ==
  4  =====  =                =  == General Errors Statistics (rev 1) ==
  4  0x008  4                0  Number of Reported Uncorrectable Errors
  4  0x010  4               21  Resets Between Cmd Acceptance and Completion
  5  =====  =                =  == Temperature Statistics (rev 1) ==
  5  0x008  1               28  Current Temperature
  5  0x010  1                -  Average Short Term Temperature
  5  0x018  1                -  Average Long Term Temperature
  5  0x020  1               48  Highest Temperature
  5  0x028  1               36  Lowest Temperature
  5  0x030  1                -  Highest Average Short Term Temperature
  5  0x038  1                -  Lowest Average Short Term Temperature
  5  0x040  1                -  Highest Average Long Term Temperature
  5  0x048  1                -  Lowest Average Long Term Temperature
  5  0x050  4                0  Time in Over-Temperature
  5  0x058  1               85  Specified Maximum Operating Temperature
  5  0x060  4                0  Time in Under-Temperature
  5  0x068  1                0  Specified Minimum Operating Temperature
  6  =====  =                =  == Transport Statistics (rev 1) ==
  6  0x008  4             5796  Number of Hardware Resets
  6  0x018  4                9  Number of Interface CRC Errors
  7  =====  =                =  == Solid State Device Statistics (rev 1) ==
  7  0x008  1                0  Percentage Used Endurance Indicator
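If you want to compare such tables between disks or over time, the rows are easy to pick apart. The following Python sketch assumes the column layout printed by smartctl 6.4 as shown above (other versions may add columns). `parse_devstat` is just a name chosen for this example.

```python
import re

# Columns: page, offset, size, value (a number or "-"), description.
ROW = re.compile(r"^\s*(\d+)\s+(0x[0-9a-f]{3})\s+(\d+)\s+(-|\d+)\s+(.+?)\s*$")


def parse_devstat(output):
    """Turn the 'Page Offset Size Value Description' rows into dictionaries."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # column header, section titles ("=====") or blank line
        page, offset, size, value, description = match.groups()
        rows.append({
            "page": int(page),
            "offset": int(offset, 16),
            "size": int(size),
            "value": None if value == "-" else int(value),  # "-" = not reported
            "description": description,
        })
    return rows
```

For the Intel 540 above, this makes it simple to pull out 'Number of Hardware Resets' (5796) and 'Number of Interface CRC Errors' (9) and keep an eye on them.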

Corresponding dmesg output (no USB resets -- full debug output):

[  318.107058] usb 4-1.1: new SuperSpeed USB device number 4 using xhci-hcd
[  318.128705] usb 4-1.1: New USB device found, idVendor=152d, idProduct=3562
[  318.128720] usb 4-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  318.128731] usb 4-1.1: Product: AD TO BE II
[  318.128741] usb 4-1.1: Manufacturer: ADMKIV
[  318.128752] usb 4-1.1: SerialNumber: DB123456789699
[  318.134291] scsi host0: uas
[  318.136477] scsi 0:0:0:0: Direct-Access     ADplus   SuperVer         6302 PQ: 0 ANSI: 6
[  318.198461] sd 0:0:0:0: [sda] 234441648 512-byte logical blocks: (120 GB/112 GiB)
[  318.198478] sd 0:0:0:0: [sda] 4096-byte physical blocks
[  318.198779] sd 0:0:0:0: Attached scsi generic sg0 type 0
[  318.199475] sd 0:0:0:0: [sda] Write Protect is off
[  318.199496] sd 0:0:0:0: [sda] Mode Sense: 53 00 00 08
[  318.199986] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  318.208031]  sda: sda1
[  318.212282] sd 0:0:0:0: [sda] Attached SCSI disk
[  318.461046] EXT2-fs (sda1): (no)user_xattr optionsnot supported
[  318.461056] EXT2-fs (sda1): (no)acl options not supported
[  318.461062] EXT2-fs (sda1): error: couldn't mount because of unsupported optional features (240)
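Checking a log like this for resets can be scripted. The Python sketch below relies on one assumption not visible in the excerpt: that the kernel reports a reset with the same message shape as the 'new SuperSpeed USB device' line, only with 'reset' in place of 'new'. The function names are made up for this example.

```python
import re

# "usb 4-1.1: new SuperSpeed USB device number 4 using xhci-hcd"
# A reset line is assumed to look the same with "reset" instead of "new".
NEW_OR_RESET = re.compile(
    r"usb (\S+): (new|reset) (.+?) USB device number (\d+) using (\S+)")
USB_ID = re.compile(
    r"usb (\S+): New USB device found, "
    r"idVendor=([0-9a-f]{4}), idProduct=([0-9a-f]{4})")


def usb_events(dmesg_text):
    """Return (port, kind, speed) for every 'new'/'reset' USB device line."""
    return [(m.group(1), m.group(2), m.group(3))
            for m in NEW_OR_RESET.finditer(dmesg_text)]


def usb_ids(dmesg_text):
    """Map each USB port to its 'vendor:product' ID string."""
    return {m.group(1): m.group(2) + ":" + m.group(3)
            for m in USB_ID.finditer(dmesg_text)}


def reset_count(dmesg_text):
    """Count how many USB device resets the log contains."""
    return sum(1 for _, kind, _ in usb_events(dmesg_text) if kind == "reset")
```

Fed with the dmesg excerpt above, `reset_count` gives 0 and `usb_ids` reports 152d:3562 for port 4-1.1.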

@Kosmatik please keep in mind that the JMS561 in the Cloudshell 2 presents the 2 disks as 2 different USB LUNs and that it f*cks up SMART readouts in each of the 4 modes. Even in JBOD/PM mode the output for LUN 0 (/dev/sda) is broken: the ATA version is wrongly reported as 'ATA/ATAPI-7 (minor revision not indicated)', and the SATA version is missing completely, as are other details.

 

Even if JMicron comes up with a firmware fix for this, flashing might be an adventure (downloading stuff from 'somewhere on the Internet' as Hardkernel already suggests -- WTF?), and the combination of XU4 with this external JMS561 thingie is still too fragile to rely on (at least I try to avoid complexity when it's about storing data that has a value)


@Kosmatik please keep in mind wrt https://forum.openmediavault.org/index.php/Thread/17855-Building-OMV-automatically-for-a-bunch-of-different-ARM-dev-boards/?postID=144749#post144749

 

1) @ryecoaaron uses different disks than you (see above for my 2 SSDs, one of which simply doesn't support 'Device Statistics (GP Log 0x04)')

 

2) He reports in PM mode for USB LUN0 (/dev/sda):

SCT Commands not supported

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11) not supported

3) For the same disk model as LUN1 (/dev/sdb):

SCT Error Recovery Control command not supported

Device Statistics (GP/SMART Log 0x04) not supported

 [ata pass-through(16): 85 09 0e 00 00 00 01 00 11 00 00 00 00 00 2f 00 ]
  scsi_status=0x0, host_status=0x0, driver_status=0x0
  info=0x0  duration=0 milliseconds  resid=0
  Incoming data, len=512 [only first 256 bytes shown]:
 00     00 00 00 00 0a 10 0b 00  01 10 00 00 03 10 00 00                        
 10     04 10 00 00 06 10 00 00  07 10 00 00 00 10 00 00                        
 20     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                        
 30     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                        
 40     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                        
 50     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                        
 60     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                        
 70     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                        
 80     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                        
 90     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                        
 a0     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                        
 b0     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                        
 c0     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                        
 d0     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                        
 e0     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                        
 f0     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00                        
SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2           11  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
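The table can be reproduced by hand from the hex dump. As far as I understand the layout of this log page, the first four bytes are reserved and each entry is a 16-bit little-endian identifier (bits 14:12 give the value length in 16-bit words) followed by the value, with an identifier of 0 ending the list. Treat the following Python sketch as an illustration under that assumption, not as smartctl's actual code:

```python
def decode_phy_event_counters(page):
    """Decode SATA Phy Event Counters from a raw log page (GP Log 0x11)."""
    counters = []
    pos = 4  # the first four bytes are reserved
    while pos + 2 <= len(page):
        ident = int.from_bytes(page[pos:pos + 2], "little")
        size = ((ident >> 12) & 0x7) * 2  # bits 14:12 = length in 16-bit words
        ident &= 0x8FFF                   # drop size bits, keep vendor bit 15
        if ident == 0 or size == 0 or pos + 2 + size > len(page):
            break                         # end of table or malformed entry
        value = int.from_bytes(page[pos + 2:pos + 2 + size], "little")
        counters.append((ident, size, value))
        pos += 2 + size
    return counters


# First 32 bytes of the dump above; the rest of the page is all zero.
DUMP = bytes.fromhex(
    "00000000 0a10 0b00 0110 0000 0310 0000"
    "0410 0000 0610 0000 0710 0000 0010 0000"
)

for ident, size, value in decode_phy_event_counters(DUMP):
    print("0x%04x  %d  %12d" % (ident, size, value))
```

The first entry decodes to ID 0x000a, size 2, value 11, which matches the 'Device-to-host register FISes sent due to a COMRESET' row. So for LUN1 the data arriving over the bridge is at least consistent with what smartctl prints.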

 

In other words: his two identical Seagate Barracudas (according to SMART output both are ST1000DM003-1CH162 with firmware CC49 -- maybe they're different and JMS561 simply fakes one disk's information?) do NOT support 'Device Statistics' (that's '-l devstat'), unlike your Toshiba, but it's pretty obvious that SMART output is f*cked up anyway depending on disk position inside the Cloudshell 2 gimmick.


@tkaiser I did actually read your response. I think if the JMS561 things get sorted out, then it'll be plenty enough for what I use it for. It's used as a media storage and streamer, my important data is backed up in multiple places.

 

What attracted me to the Cloudshell2 was that it's a complete package. I was looking at the ClearFog Pro/Base but ended up not getting it due to not wanting to design and 3d print an enclosure that holds it and the HDDs.

 

However, this all might be moot if Helios4 gets funded.

36 minutes ago, Kosmatik said:

if the JMS561 things get sorted out

 

Well, based on Hardkernel feedback over in ODROID forum I doubt a little bit that they care enough about details ('smartctl -a' is something completely different than '-l devstat', and they still completely ignore that they should look into potentially necessary USB3 host controller quirks). Anyway, their problem. I can't do much more here than warn against this combination in its current state. :(

21 hours ago, tkaiser said:

based on Hardkernel feedback over in ODROID forum I doubt a little bit that they care enough about details

 

Or in other words: They don't know what they're doing at all: https://forum.odroid.com/viewtopic.php?f=147&t=27246#p192741 "The HDDScan has the same problem as smartctl. so I think that the smartctl have some bugs...."

 

HDDScan implements the ability to read log pages from disks (a really old SCSI feature that is/was not directly related to SMART). Smartmontools implements this using '-l devstat', and for disks behind a USB-to-SATA bridge (the JMS561) this requires the device to properly implement SCSI / ATA Translation (SAT). As demonstrated multiple times, this is where the JMS561 fails in various modes (and where the USB resets happen). But of course 'smartctl have some bugs' and 'HDDScan has [...] problem'

 

Smartmontools relied only on GP log (ATA_LOG_EXT command) in older versions but now also checks for smartlog 0x04 page if the former is not available. The 'problem' here is located at JMS561 failing with SAT and for whatever reasons then triggering USB resets. But as usual ODROID micro community is trapped in their micro reality too small to take notice :( 
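Which of the two sources smartctl ended up with can be read from the header line of its output: the transcripts in this thread show 'GP Log 0x04', 'SMART Log 0x04' and 'GP/SMART Log 0x04' followed by 'not supported'. A small Python sketch that extracts this follows (`devstat_source` is a made-up name, and the regex only covers the three header variants seen here):

```python
import re

HEADER = re.compile(
    r"^Device Statistics \((GP|SMART|GP/SMART) Log 0x04\)( not supported)?\s*$",
    re.MULTILINE)


def devstat_source(output):
    """Return (log named in the header, available?) or None without header.

    'available' is False when the header ends in 'not supported'. True only
    means smartctl went on to read that log, not that the read succeeded.
    """
    match = HEADER.search(output)
    if match is None:
        return None
    return match.group(1), match.group(2) is None
```

For the failing Toshiba output earlier in the thread this yields ('SMART', True): the GP log directory could not be read, smartctl fell back to the SMART log, and only then did the timeout happen.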

 

Edit: The CrystalDiskInfo screenshots confirm JMS561 SMART readout crappiness. That's LUN0/sda:

crystaldiskinfo_sda.jpg

And that's LUN1/sdb:

crystaldiskinfo_sdb.jpg

That's two times the same disk but SMART information differs (just check 'Transfer Mode', 'Standard' and 'Features' fields). Cloudshell 2 SMART readouts are obviously broken but vendor doesn't even notice. Wonderful.
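The same cross-check works without screenshots by comparing the identity sections that `smartctl -i` prints for both LUNs. A minimal Python sketch, assuming the usual 'Key: value' layout of that section (both function names are invented for this example):

```python
def parse_info(output):
    """Parse 'Key: value' lines as printed by `smartctl -i` into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def differing_fields(first, second):
    """Return {key: (value in first, value in second)} for every mismatch."""
    keys = sorted(set(first) | set(second))
    return {key: (first.get(key), second.get(key))
            for key in keys if first.get(key) != second.get(key)}
```

With two identical disks, anything beyond serial number and local time showing up in `differing_fields` (for example a missing 'SATA Version is' entry on one LUN) points at the bridge rather than at the disks.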
