HC4 doesn't see second SSD


ChrisO

Some time ago I was desperate to have ZFS on an Odroid HC4 and, after some trying, found

Armbian 21.05.8 Buster with Linux 5.10.57-meson64

on which I could compile the ZFS module. Then I moved everything from the SD card to filesystems in a ZFS pool.

It boots from SD card, though. And has been working fine for several months.

Recently I bought another HC4 and took the current Armbian image for it:

Armbian 22.08.8 Jammy with Linux 5.19.17-meson64

Everything works fine.

I tried this image on the older HC4 and it doesn't see the second SSD.

In the syslog I found this message:

 

ata1: SATA link down (SStatus 0 SControl 300)

 

Is it a firmware problem?

 

Googling hasn't helped.

Any ideas, anyone?

 

Thanks for your time.

Chris

 

BTW

Does anybody know how to disable the "Clear Screen" after the "Starting kernel" message?

I would like to see what u-boot has to tell me, especially if I have some problems.

Link to comment
Share on other sites

I have just spent the whole evening trying to sort a similar problem out.

 

I have an HC4 with two Samsung EVO 250GB SSDs.

 

I performed an update (apt update equivalent) through the Openmediavault web interface I have installed, and my Samsung drives disappeared. The only sign of them in the system logs was:

 

[    2.625851] ata1: SATA max UDMA/133 abar m512@0xfc700000 port 0xfc700100 irq 40
[    2.625860] ata2: SATA max UDMA/133 abar m512@0xfc700000 port 0xfc700180 irq 40

 

I too got the "SATA link down" message.

 

I tried rolling back kernel versions but it did not cure the problem.

 

The drives are in a mirrored raid array. I took my life in my hands and pulled out one of the drives and replaced it with a different one. The new drive was recognised! I then shut down the system and put the original drive back in, rebooted and was back to no drives being recognised. I then pulled out one of the original drives and plugged it back in and it was recognised. I pulled out the second original drive and plugged it back in and it too was recognised and the raid array reported 'clean'.

 

This came up in the logs as I unplugged and plugged in the drives:

 

[   73.769467] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   73.769705] ata2.00: FORCE: horkage modified (noncq)
[   73.769763] ata2.00: supports DRM functions and may not be fully accessible
[   73.769772] ata2.00: ATA-11: Samsung SSD 870 EVO 250GB, SVT02B6Q, max UDMA/133
[   73.769779] ata2.00: 488397168 sectors, multi 1: LBA48 NCQ (not used)
[   73.771662] ata2.00: supports DRM functions and may not be fully accessible
[   73.773228] ata2.00: configured for UDMA/133
[   73.773654] scsi 1:0:0:0: Direct-Access     ATA      Samsung SSD 870  2B6Q PQ: 0 ANSI: 5
[   73.774693] ata2.00: Enabling discard_zeroes_data
[   73.774974] sd 1:0:0:0: [sda] 488397168 512-byte logical blocks: (250 GB/233 GiB)
[   73.775026] sd 1:0:0:0: [sda] Write Protect is off
[   73.775036] sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   73.775118] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   73.791957] sd 1:0:0:0: Attached scsi generic sg0 type 0
[   73.794071] ata2.00: Enabling discard_zeroes_data
[   73.821636] ata2.00: Enabling discard_zeroes_data
[   73.822490] sd 1:0:0:0: [sda] Attached SCSI disk
[   76.717317] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   76.717552] ata1.00: FORCE: horkage modified (noncq)
[   76.717610] ata1.00: supports DRM functions and may not be fully accessible
[   76.717619] ata1.00: ATA-11: Samsung SSD 870 EVO 250GB, SVT02B6Q, max UDMA/133
[   76.717626] ata1.00: 488397168 sectors, multi 1: LBA48 NCQ (not used)
[   76.719585] ata1.00: supports DRM functions and may not be fully accessible
[   76.721244] ata1.00: configured for UDMA/133
[   76.721693] scsi 0:0:0:0: Direct-Access     ATA      Samsung SSD 870  2B6Q PQ: 0 ANSI: 5
[   76.722575] sd 0:0:0:0: Attached scsi generic sg1 type 0
[   76.722584] ata1.00: Enabling discard_zeroes_data
[   76.722893] sd 0:0:0:0: [sdb] 488397168 512-byte logical blocks: (250 GB/233 GiB)
[   76.722946] sd 0:0:0:0: [sdb] Write Protect is off
[   76.722957] sd 0:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[   76.723040] sd 0:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   76.741822] ata1.00: Enabling discard_zeroes_data
[   76.769518] ata1.00: Enabling discard_zeroes_data
[   76.770398] sd 0:0:0:0: [sdb] Attached SCSI disk
[   76.800642] md/raid1:md0: active with 2 out of 2 mirrors
[   76.806255] md0: detected capacity change from 0 to 249924026368
[   77.072668] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: user_xattr,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,acl

 

 

 

I *do not* recommend doing what I did, as you could easily lose data, but it does seem to show that the HC4 has problems detecting the drives during boot. Once it's up and running it can recognise them. I really hope that there is a software fix for this, but I'm not sure if it's an Armbian or a Hardkernel problem. The system update itself doesn't seem to be the cause, as rolling back the kernel versions made no difference.
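To compare a cold boot against the state after re-plugging without reading the whole log, the link messages can be reduced to one line per port. This is only a minimal sketch: the function name `sata_link_summary` is hypothetical, and it assumes the kernel log uses the "ataN: SATA link up/down" wording shown in the excerpts above. It keeps the most recent state seen for each port.

```shell
#!/bin/sh
# Summarise SATA link state per port from kernel log text on stdin.
# Assumes messages of the form "ataN: SATA link up ..." or
# "ataN: SATA link down ...", as in the log excerpts in this thread.
# For each port, the last matching message wins.
sata_link_summary() {
    awk '
        /ata[0-9]+: SATA link (up|down)/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^ata[0-9]+:$/) {
                    port = $i
                    sub(/:$/, "", port)      # "ata1:" -> "ata1"
                    state[port] = $(i + 3)   # word after "SATA link"
                }
            }
        }
        END {
            for (p in state) print p, state[p]
        }
    ' | sort
}

# Typical use on a live system (reading the kernel log may need root):
#   dmesg | sata_link_summary
```

Running it once straight after boot and again after re-seating a drive should show whether a port went from "down" to "up" without a reboot, which is what the logs above suggest.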

 

Link to comment
Share on other sites

I am also suddenly having issues where the second drive isn't recognized and "SATA link down" appears in dmesg.

 

22.08.4 was working; I started seeing the issue after updating to 22.11.1.

 

I rolled back the following packages and the drive is recognized again.

 

sudo apt install linux-dtb-current-meson64:arm64=22.08.4 linux-image-current-meson64:arm64=22.08.4 linux-libc-dev:arm64=22.08.6 linux-u-boot-odroidhc4-current:arm64=22.08.6
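After a rollback like this, the next apt upgrade would normally pull the newer versions back in. A minimal sketch of holding the packages with `apt-mark hold` follows. The helper name `print_hold_commands` is hypothetical, the default package list is simply the four packages from the command above, and the function only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Print (do not run) the commands that would hold the rolled-back packages.
# With no arguments, the package names from the rollback command above are
# used; pass your own names if your board or kernel branch differs.
print_hold_commands() {
    if [ "$#" -eq 0 ]; then
        set -- linux-dtb-current-meson64 linux-image-current-meson64 \
               linux-libc-dev linux-u-boot-odroidhc4-current
    fi
    # Held packages are skipped by later "apt upgrade" runs.
    echo "sudo apt-mark hold $*"
    # Lists everything currently on hold, to confirm it worked.
    echo "apt-mark showhold"
}

print_hold_commands
```

The hold can be lifted later with `sudo apt-mark unhold` on the same package names once a fixed kernel is available.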

Link to comment
Share on other sites

Unfortunately that did not work for me.

 

The replacement Crucial MX500 250GB SSDs arrived, and I have replaced the two Samsung 870 EVO 250GB SSDs with them.

 

Everything is back to normal.

 

I have tried the Samsungs in another HC4 with the latest Armbian Jammy CLI OS and they don't come up. I might try them on an earlier build to see if they are recognised.

 

 

Link to comment
Share on other sites

Just for completeness:

 

Downloaded Armbian_21.02.1_Odroidhc4_buster_current_5.10.12.img.xz and set up an HC4 with one Samsung drive, OS on the SD card.

 

uname -a gives:

 

Linux odroidhc4 5.10.12-meson64 #21.02.1 SMP PREEMPT Wed Feb 3 21:06:36 CET 2021 aarch64 GNU/Linux

 

Samsung drive recognised without problem.

Performed apt update, then apt upgrade, and rebooted.

 

uname -a gives:

 

Linux odroidhc4 5.19.17-meson64 #22.11.1 SMP PREEMPT Wed Nov 30 11:05:42 UTC 2022 aarch64 GNU/Linux

 

Samsung drive not recognised. 

 

Does anyone know if I should report this somewhere, or just wait to see if future kernel (or whatever) updates cure the problem? My system is back up and running with the Crucial drives, but it would be nice to get the Samsungs working again so that I could use 'mixed manufacturer' RAID 1. Looking back, it was silly not to do this in the first place, but hindsight is a wonderful thing :)

 

I'm not technical enough to know what to do but happy to test if needed.

Link to comment
Share on other sites

1 hour ago, Jermany said:

Anyone know if I should report this somewhere

 

This forum exists for this.

The graph at https://armbian.atlassian.net/jira/dashboards/10103 shows how long it takes before tickets are closed when developers are sponsoring you at close to 100%, and not the other way around.

 

1 hour ago, Jermany said:

I'm not technical enough to know what to do but happy to test if needed.


The biggest challenge and loss of personal time is organising and coordinating activities (what you are expecting can easily be a full-time job for several people), so we rely on automated testing instead. It's significantly cheaper, more reliable and can be executed at any moment. We know about most of the issues, but since a day has only 24 hours and not 2400, this is the best that can be done. What would you do? The project needs more help with other, non-technical tasks.

Link to comment
Share on other sites

Whoa! Sorry if I came across as demanding, that certainly was not my intention - it was more a question of reporting a problem so that other people can see the work-around.

 

If this is the correct place to report my observations then that is at least done.

 

As for contributing to solving the problem I will go and research the matter further and see what I come up with. There is a link to a similar problem here: https://bugzilla.kernel.org/show_bug.cgi?id=216592#c0 so maybe it's a Samsung thing.

 

I'll also take a look at the links you have provided as I do appreciate that keeping something like Armbian going requires a joint effort, actually a big joint effort.

 

Link to comment
Share on other sites

38 minutes ago, Jermany said:

so maybe it's a Samsung thing.

 

This is not related.
 

55 minutes ago, Jermany said:

I'll also take a look at the links you have provided

 

But it seems you don't understand.

 

1 hour ago, Jermany said:

I do appreciate that keeping something like Armbian going requires a joint effort, actually a big joint effort.

 

There is a group that is doing something and you are from the group that is consuming that work and pressing on the first group. There is no joint effort whatsoever. This is an illusion.

Link to comment
Share on other sites

I think there is a misunderstanding here. Either that or I've stumbled into a Monty Python sketch about arguments :D

 

I took a brief look at the links you provided and they are quite clear - please pay or please help or even better, both. And I totally agree. This is why I said I understand that keeping something like Armbian going requires a big joint effort. I was not saying this was the reality, just that in order to succeed it should be. So I think we are saying the same thing.

 

I will look at the vacancies you have in your links in more detail and see if I can help. Many moons ago I worked in Software Configuration Management and Release Management using ancient tools like PVCS, Continuus and Clearcase but the principles are the same. I will PM you if I find something that I could contribute to as this thread is probably not the best place to continue discussing this.

 

As a light hearted conclusion to the drive not recognised problem: It's Crucial, you don't buy Samsung! - well at least for the edge case of a 250GB ssd :) 

 

 

 

Edited by Jermany
Link to comment
Share on other sites

1 hour ago, josh said:

bug reports take time/effort

 

I know this very well, but I also know that resolving a bug can easily take 1000x more. Let's trade: your week for my 5 minutes. Do you feel insulted?

Link to comment
Share on other sites

@josh Thanks for taking the time to report that you had problems with Crucial as well - I will keep a closer eye on my little installation as a result. Useful information.

@Igor You are one of the leading experts here and we appreciate any help you can give if and when you have time. I also recognise the enormous amount of effort this takes and hope us flagging up an observed problem is helpful. Are there any ways I can improve on how I documented my observations? 

Link to comment
Share on other sites

38 minutes ago, Jermany said:

Are there any ways I can improve on how I documented my observations? 

 

Nobody will read your observations, so why would that be important? Hire a developer and explain to him the troubles you have. Then send a patch to Armbian or upstream. Good luck!

Link to comment
Share on other sites

OK, thanks for the clarification. The thread has attracted a fair number of views, so somebody is reading it. The irony here is that you keep repeating how overworked you are and how you and the project need help, and yet you use a manner that is abrasive and condescending, making people less likely to help!

 

Do you think your manner in dealing with this thread has brought you nearer to your goal of attracting help or funding or pushed you further away? 

 

 
 


 

 

Link to comment
Share on other sites

17 hours ago, Jermany said:

making people less likely to help!


Do I need a working device, or do you?

 

17 hours ago, Jermany said:

help or funding or pushed you further away? 


Donations never covered more than 0.5% of our costs. What here are you unable to process?

Link to comment
Share on other sites

1 minute ago, Jermany said:

you have chosen not to answer the question.

 

My answer for your level of service is: it's probably a week of work for a developer and a few hours for other professional staff. This costs X. If it is unlikely that our time will be covered, I have no choice but to send you to the market. If we need to raise the money in some alternative way, that adds expense too. If I need to ask the vendor to cover it (we have asked them, but they don't care about you), that adds to the costs. If I need to fight to convince users, including you, that this problem is 100% your expense and has to be paid for (in advance) in order to be fixed, that goes on the bill too. We have several parasite projects that take our value in real time and forget to explain to users where the stability comes from. That fight adds to the bill too.

 

The support we receive by asking you to "donate" doesn't cover the expense of communicating with you. In total it usually doesn't cover the expense of a "small" problem like the one in this topic. This is why I asked for a trade: your week of work for my five minutes. The common perception is that fixing a problem is 5 minutes of work, while in reality it is a full-blown week. A professional developer's work week costs Y. When you cover it, we will move on. Until you do, we will have a conflict each time you ask for something ... And you will. If not you, then someone else.

 

We are far from actual R&D, debugging, development ... On the market, this service is going to be a lot more expensive for all of you, but at least we will save a lot of time, money and stress. If we do it for free, we are stealing from professionals who have to feed their families, and advancing our competitors ... And in fact, Armbian is run by professionals, as there are many dirty jobs no volunteer wants to pick up. They look for fancy and cool jobs and I don't blame them.

 

Most of the people who read the topic are here to report and complain. Just a small % could help, and just a tiny % of those could spend a week on it. Almost nobody goes further and asks what the project needs in order to help us better. Perhaps this? Why don't I do it? Because I am unable to cover that role as well. Why won't you do it? Because it's a serious commitment. Why don't I hire a person for that? You don't pay for services; you put pressure on this community to get them for free. And you are not alone.

 

A working device is what you need, but the project that helps you with that has different primary needs. We need money to cover what we do, to expand, to hire help to be able to help you ... while all you see are your small technical problems, which are a big expense that we have no way to finance for you. You have to do that!

Link to comment
Share on other sites

This topic is now closed to further replies.