Kosmatik Posted June 2, 2017

Hello everyone,

As @tkaiser is probably aware from the Hardkernel forum, I'm having issues with USB/SMART and OMV. Basically, USB resets whenever SMART data is pulled from OMV's web interface; it works fine if I do it manually in a shell. Here is some info: http://sprunge.us/HdbK

It was suggested that I update smartmontools to 6.5, but there are library dependencies missing. If any of you fine folks have any idea what could be causing this, I would be grateful.

Thank you
tkaiser Posted June 2, 2017

So, to start dealing with your problem in a somewhat logical way:

We're talking about JMS561 and not ASM1153E here, right? Since calling smartctl -d sat manually for this Cloudshell 2 gimmick 1) provides the data and 2) doesn't trigger USB resets, it seems the way OMV does SMART queries could be responsible (in which case this is clearly the wrong forum, since OMV people could tell without doing research first).

Hardkernel guys unfortunately don't give a sh*t about upstream support for their products (no reports of problematic UAS bridge chips to the Linux USB kernel maintainers, same with those shitty Seagate enclosures, and no evaluation of whether the Exynos 5422 USB3 host controller needs quirks). This means: no support in smartmontools for JMS561.

What you could do:

- Add JMS561 to /var/lib/smartmontools/drivedb/drivedb.h as outlined in http://forum.openmediavault.org/index.php/Thread/17855-Building-OMV-automatically-for-a-bunch-of-different-ARM-dev-boards/?postID=145278#post145278 (possible result: OMV trying 'smartctl' first without any of the '-d' modes, avoiding the USB resets here)
- Try to figure out how OMV does SMART queries (mv the smartctl binary to smartctl.orig, replace smartctl with a script that logs the calling parameters)
- Check the smartctl manual page for possible -d values and try them out one after another (my guess is that OMV will do exactly that when dealing with devices that lack support, as is obviously the case with JMS561 here, so maybe the USB resets happen because OMV tries 'smartctl -d jmicron' or whatever)

BTW: I can't provide more help here since I already dropped the idea of using ODROID-XU4 as a NAS (especially when combined with the Cloudshell gimmicks). For me a NAS has to be reliable, and the list of potential problems with ODROID-XU4 has become too long in the meantime (and to be honest: Hardkernel's role here has been a bit too underwhelming).
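The second suggestion above (moving the real binary aside and putting a logging script in its place) might look roughly like the following. This is only a sketch: the function name log_and_run, the variables SMARTCTL_REAL and SMARTCTL_LOG, and both default paths are assumptions for illustration, not anything OMV or smartmontools ships.

```shell
#!/bin/sh
# Hypothetical logging wrapper for smartctl.
# Assumed setup (adjust to wherever smartctl really lives on your system):
#   mv /usr/sbin/smartctl /usr/sbin/smartctl.orig
#   install this script as /usr/sbin/smartctl

# Both paths are assumptions and can be overridden from the environment.
SMARTCTL_REAL="${SMARTCTL_REAL:-/usr/sbin/smartctl.orig}"
SMARTCTL_LOG="${SMARTCTL_LOG:-/var/log/smartctl-calls.log}"

log_and_run() {
    # Append a timestamp and the exact arguments the caller used.
    printf '%s smartctl %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$SMARTCTL_LOG"
    # Hand over to the real binary so the caller still gets its data.
    "$SMARTCTL_REAL" "$@"
}

# When installed as the replacement script, the last line would be:
#   log_and_run "$@"
```

After clicking through the OMV web interface, the log file should then show which '-d' modes and options were actually requested.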
Kosmatik (Author) Posted June 3, 2017

OK, I figured out what's causing it. The bridge is JMS561. OMV pulls this for all drives when the Devices tab is clicked:

    smartctl -x /dev/sda -d sat

Running this manually I can also crash it. Specifically, this part (smartctl -l devstat /dev/sda) crashes here:

    root@openmediavault:~# smartctl -l devstat /dev/sda -d sat,12
    smartctl 6.4 2014-10-07 r4002 [armv7l-linux-4.9.30-odroidxu4] (local build)
    Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

    ATA_READ_LOG_EXT (addr=0x00:0x00, page=0, n=1) failed: 48-bit ATA commands require SAT ATA PASS-THROUGH (16)
    Read GP Log Directory failed

    Device Statistics (SMART Log 0x04)
    Page Offset Size Value Description
    ATA_SMART_READ_LOG failed: Connection timed out

    Read Device Statistics pages 0-7 failed

It still appends -d sat even if the bridge is defined in the .h file. What would be the (potential) fix? An update of smartmontools?

EDIT: Hard drive issues, probably. Running the same command on /dev/sdb does not cause the crash.
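To narrow down which part of 'smartctl -x' triggers the problem, the individual '-l' log types can be requested one at a time. The sketch below is an assumption-laden helper, not part of OMV or smartmontools: the name probe_logs, the DRY_RUN switch and the 30-second timeout are made up for illustration, and the list of log types follows my reading of the smartctl manual page, so it may not match every smartmontools version. By default it only prints the commands.

```shell
#!/bin/sh
# Log types that 'smartctl -x' is documented to include for ATA devices
# (illustrative list; check your local manual page).
LOG_TYPES="error xerror selftest xselftest selective directory scttemp scterc devstat sataphy"

probe_logs() {
    dev="$1"              # e.g. /dev/sda
    dtype="${2:-sat}"     # value passed to -d, e.g. sat or sat,12
    for log in $LOG_TYPES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only show what would be executed.
            echo "smartctl -l $log $dev -d $dtype"
        else
            echo "== -l $log =="
            # timeout (coreutils) keeps a hanging bridge from blocking the loop.
            timeout 30 smartctl -l "$log" "$dev" -d "$dtype" || echo "   exit status: $?"
            # Short pause so dmesg entries can be matched to a single query.
            sleep 2
        fi
    done
}
```

Calling 'probe_logs /dev/sda sat,12' lists the commands; 'DRY_RUN=0 probe_logs /dev/sda sat,12' runs them, which on an affected bridge may of course provoke the very reset being investigated.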
tkaiser Posted June 3, 2017

3 hours ago, Kosmatik said: "What would be the (potential) fix?"

Avoiding broken hardware? In which of the 4 possible modes do you operate your Cloudshell 2? Is a firmware update available?
Kosmatik (Author) Posted June 3, 2017

Cloudshell2 is in JBOD mode. I've posted in the Hardkernel forums, as this looks like a hardware issue: it also happens when the hard drives are swapped, and also in Ubuntu.
tkaiser Posted June 3, 2017

26 minutes ago, Kosmatik said: "I've posted in the hardkernel forums as this looks like a hardware issue as this happens when the hard drives are swapped"

Yes, I already added this yesterday as problem N° 12 wrt 'ODROID XU4 and Cloudshell': https://forum.armbian.com/index.php?/topic/3953-preview-generate-omv-images-for-sbc-with-armbian/&do=findComment&comment=32340

Problem N° 11 contains a link to an explanation: http://forum.openmediavault.org/index.php/Thread/17855-Building-OMV-automatically-for-a-bunch-of-different-ARM-dev-boards/?postID=144752#post144752 (JMS561 is a pretty stupid choice for accessing disks anyway and obviously contains various SMART related bugs, this one here being related to broken SAT support).

I really hope Hardkernel stops selling/advertising this strange Cloudshell 2 thingie and starts to address all the USB related problems soon.
tkaiser Posted June 5, 2017

Since I'm currently cleaning up a bit, sorting disks and enclosures, and since I came across the SD card hosting OMV for XU4, a final try:

This is XU4 with a JMS567 enclosure with a Samsung PM851 SSD (no USB resets according to dmesg, it's simply the SSD not supporting the requested feature):

    root@odroidxu4:~# smartctl -l devstat /dev/sda -d sat
    smartctl 6.4 2014-10-07 r4002 [armv7l-linux-4.9.28-odroidxu4] (local build)
    Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

    Device Statistics (GP/SMART Log 0x04) not supported

This is the same enclosure, now with an Intel 540 inside:

    root@odroidxu4:~# smartctl -l devstat /dev/sda -d sat
    smartctl 6.4 2014-10-07 r4002 [armv7l-linux-4.9.28-odroidxu4] (local build)
    Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

    Device Statistics (GP Log 0x04)
    Page Offset Size         Value  Description
      1  =====  =                =  == General Statistics (rev 2) ==
      1  0x008  4               52  Lifetime Power-On Resets
      1  0x010  4               19  Power-on Hours
      1  0x018  6       1631780991  Logical Sectors Written
      1  0x020  6         20156628  Number of Write Commands
      1  0x028  6       1956034441  Logical Sectors Read
      1  0x030  6         24617479  Number of Read Commands
      2  =====  =                =  == Free-Fall Statistics (empty) ==
      3  =====  =                =  == Rotating Media Statistics (empty) ==
      4  =====  =                =  == General Errors Statistics (rev 1) ==
      4  0x008  4                0  Number of Reported Uncorrectable Errors
      4  0x010  4               21  Resets Between Cmd Acceptance and Completion
      5  =====  =                =  == Temperature Statistics (rev 1) ==
      5  0x008  1               28  Current Temperature
      5  0x010  1                -  Average Short Term Temperature
      5  0x018  1                -  Average Long Term Temperature
      5  0x020  1               48  Highest Temperature
      5  0x028  1               36  Lowest Temperature
      5  0x030  1                -  Highest Average Short Term Temperature
      5  0x038  1                -  Lowest Average Short Term Temperature
      5  0x040  1                -  Highest Average Long Term Temperature
      5  0x048  1                -  Lowest Average Long Term Temperature
      5  0x050  4                0  Time in Over-Temperature
      5  0x058  1               85  Specified Maximum Operating Temperature
      5  0x060  4                0  Time in Under-Temperature
      5  0x068  1                0  Specified Minimum Operating Temperature
      6  =====  =                =  == Transport Statistics (rev 1) ==
      6  0x008  4             5796  Number of Hardware Resets
      6  0x018  4                9  Number of Interface CRC Errors
      7  =====  =                =  == Solid State Device Statistics (rev 1) ==
      7  0x008  1                0  Percentage Used Endurance Indicator

Corresponding dmesg output (no USB resets -- full debug output):

    [ 318.107058] usb 4-1.1: new SuperSpeed USB device number 4 using xhci-hcd
    [ 318.128705] usb 4-1.1: New USB device found, idVendor=152d, idProduct=3562
    [ 318.128720] usb 4-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
    [ 318.128731] usb 4-1.1: Product: AD TO BE II
    [ 318.128741] usb 4-1.1: Manufacturer: ADMKIV
    [ 318.128752] usb 4-1.1: SerialNumber: DB123456789699
    [ 318.134291] scsi host0: uas
    [ 318.136477] scsi 0:0:0:0: Direct-Access ADplus SuperVer 6302 PQ: 0 ANSI: 6
    [ 318.198461] sd 0:0:0:0: [sda] 234441648 512-byte logical blocks: (120 GB/112 GiB)
    [ 318.198478] sd 0:0:0:0: [sda] 4096-byte physical blocks
    [ 318.198779] sd 0:0:0:0: Attached scsi generic sg0 type 0
    [ 318.199475] sd 0:0:0:0: [sda] Write Protect is off
    [ 318.199496] sd 0:0:0:0: [sda] Mode Sense: 53 00 00 08
    [ 318.199986] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    [ 318.208031] sda: sda1
    [ 318.212282] sd 0:0:0:0: [sda] Attached SCSI disk
    [ 318.461046] EXT2-fs (sda1): (no)user_xattr optionsnot supported
    [ 318.461056] EXT2-fs (sda1): (no)acl options not supported
    [ 318.461062] EXT2-fs (sda1): error: couldn't mount because of unsupported optional features (240)

@Kosmatik please keep in mind that the JMS561 in the Cloudshell 2 presents the 2 disks as 2 different USB LUNs and that it f*cks up SMART readouts in each of the 4 modes. Even in JBOD/PM mode the output for LUN 0 (/dev/sda) is broken: the ATA version is wrongly reported as 'ATA/ATAPI-7 (minor revision not indicated)', and the SATA version is completely missing, as are other details.
Even if JMicron comes up with a firmware fix for this, flashing might be an adventure (downloading stuff from 'somewhere on the Internet' as Hardkernel already suggests -- WTF?), and the combination of XU4 with this external JMS561 thingie is still too fragile to rely on (at least I try to avoid complexity when it's about storing data that has value).
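Checking dmesg for USB resets after a SMART query, as done above, can be scripted. The following is a minimal sketch; the function name count_usb_resets is made up, and the pattern assumes the kernel words a reset as "usb <port>: reset ... USB device ...", which may differ between kernel versions.

```shell
#!/bin/sh
# Count USB reset messages in kernel log text read from stdin.
# Assumed message shape: "usb 4-1.1: reset SuperSpeed USB device number 4 using xhci-hcd"
count_usb_resets() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # '|| true' keeps callers running under 'set -e' alive.
    grep -c -E 'usb [0-9.-]+: reset .*USB device' || true
}
```

Usage would be 'dmesg | count_usb_resets' before and after a query such as 'smartctl -l devstat /dev/sda -d sat'; a higher number afterwards suggests the query triggered a reset.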
tkaiser Posted June 6, 2017

@Kosmatik please keep in mind wrt https://forum.openmediavault.org/index.php/Thread/17855-Building-OMV-automatically-for-a-bunch-of-different-ARM-dev-boards/?postID=144749#post144749

1) @ryecoaaron uses different disks than you (see above for my 2 SSDs, one of which simply doesn't support 'Device Statistics (GP Log 0x04)')

2) He reports in PM mode for USB LUN0 (/dev/sda):

    SCT Commands not supported
    Device Statistics (GP/SMART Log 0x04) not supported
    SATA Phy Event Counters (GP Log 0x11) not supported

3) For the same disk model as LUN1 (/dev/sdb):

    SCT Error Recovery Control command not supported
    Device Statistics (GP/SMART Log 0x04) not supported
     [ata pass-through(16): 85 09 0e 00 00 00 01 00 11 00 00 00 00 00 2f 00 ]
    scsi_status=0x0, host_status=0x0, driver_status=0x0
    info=0x0 duration=0 milliseconds resid=0
    Incoming data, len=512 [only first 256 bytes shown]:
     00     00 00 00 00 0a 10 0b 00  01 10 00 00 03 10 00 00
     10     04 10 00 00 06 10 00 00  07 10 00 00 00 10 00 00
     20     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     30     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     40     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     50     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     60     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     70     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     80     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     90     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     a0     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     b0     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     c0     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     d0     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     e0     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     f0     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
    SATA Phy Event Counters (GP Log 0x11)
    ID      Size     Value  Description
    0x000a  2           11  Device-to-host register FISes sent due to a COMRESET
    0x0001  2            0  Command failed due to ICRC error
    0x0003  2            0  R_ERR response for device-to-host data FIS
    0x0004  2            0  R_ERR response for host-to-device data FIS
    0x0006  2            0  R_ERR response for device-to-host non-data FIS
    0x0007  2            0  R_ERR response for host-to-device non-data FIS

In other words: his two identical Seagate Barracudas (according to SMART output both are ST1000DM003-1CH162 with firmware CC49 -- maybe they're different and the JMS561 simply fakes one disk's information?) do NOT support 'Device Statistics' (that's '-l devstat'), unlike your Toshiba, but it's pretty obvious that SMART output is f*cked up anyway, depending on disk position inside the Cloudshell 2 gimmick.
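The hex dump and the 'SATA Phy Event Counters' table above show the same data twice, which can be checked by decoding the bytes by hand. The sketch below does that under stated assumptions: the first hex token of each dump row is taken to be the row offset (so it is not passed in), the page is assumed to start with a 4-byte header, and each record is assumed to be a little-endian 16-bit identifier whose bits 14:12 give the value length in 16-bit words, followed by the little-endian value. The function name decode_phy_counters is made up for illustration.

```shell
#!/bin/sh
# Decode SATA Phy Event Counter records from a list of hex bytes
# (data bytes only, without the per-row offset column).
decode_phy_counters() {
    shift 4                                   # skip the assumed 4-byte header
    while [ "$#" -ge 2 ]; do
        raw=$(( 0x$1 | (0x$2 << 8) ))         # little-endian 16-bit identifier
        shift 2
        size=$(( ((raw >> 12) & 7) * 2 ))     # bits 14:12: length in 16-bit words
        id=$(( raw & 0x8fff ))                # strip the length bits
        [ "$id" -eq 0 ] && break              # identifier 0 ends the table
        [ "$#" -ge "$size" ] || break         # truncated record: stop
        value=0
        i=0
        while [ "$i" -lt "$size" ]; do        # little-endian value of 'size' bytes
            value=$(( value | (0x$1 << (8 * i)) ))
            shift
            i=$(( i + 1 ))
        done
        printf '0x%04x %d %d\n' "$id" "$size" "$value"
    done
}
```

Feeding it the data bytes of rows 00 and 10 should reproduce the ID, Size and Value columns that smartctl printed for LUN1.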
Kosmatik (Author) Posted June 8, 2017

@tkaiser I did actually read your response. I think if the JMS561 things get sorted out, then it'll be plenty for what I use it for. It's used for media storage and streaming; my important data is backed up in multiple places.

What attracted me to the Cloudshell2 was that it's a complete package. I was looking at the ClearFog Pro/Base but ended up not getting it because I didn't want to design and 3D-print an enclosure that holds it and the HDDs. However, this all might be moot if Helios4 gets funded.
tkaiser Posted June 8, 2017

36 minutes ago, Kosmatik said: "if the JMS561 things get sorted out"

Well, based on Hardkernel feedback over in the ODROID forum, I doubt a little bit that they care enough about details ('smartctl -a' is something completely different from '-l devstat', and they still completely ignore that they should look into potentially necessary USB3 host controller quirks). Anyway, their problem. I can't do much more here than warn against this combination in its current state.
tkaiser Posted June 9, 2017

21 hours ago, tkaiser said: "based on Hardkernel feedback over in ODROID forum I doubt a little bit that they care enough about details"

Or in other words: they don't know what they're doing at all: https://forum.odroid.com/viewtopic.php?f=147&t=27246#p192741

"The HDDScan has the same problem as smartctl. so I think that the smartctl have some bugs...."

HDDScan implements the ability to read log pages from disks (a really old SCSI feature that is/was not directly related to SMART). Smartmontools implements this using '-l devstat', and for disks behind a USB-to-SATA bridge (the JMS561) this requires the device to properly implement SCSI / ATA Translation (SAT). As demonstrated multiple times, this is where the JMS561 fails in various modes (and where the USB resets happen). But of course 'smartctl have some bugs' and 'HDDScan has [...] problem'.

Smartmontools relied only on the GP log (ATA_LOG_EXT command) in older versions but now also checks for the smartlog 0x04 page if the former is not available. The 'problem' here is located at the JMS561 failing with SAT and, for whatever reason, then triggering USB resets. But as usual the ODROID micro community is trapped in their micro reality, too small to take notice.

Edit: The CrystalDiskInfo screenshots confirm JMS561 SMART readout crappiness. That's LUN0/sda: [screenshot not preserved] And that's LUN1/sdb: [screenshot not preserved]

That's two times the same disk, but the SMART information differs (just check the 'Transfer Mode', 'Standard' and 'Features' fields). Cloudshell 2 SMART readouts are obviously broken, but the vendor doesn't even notice. Wonderful.
Stuart Naylor Posted August 18, 2017

Hasn't smartmon fixed the wrong call? https://www.smartmontools.org/ticket/552