
Kernel Oops in btrtl module



When I reboot my Orange Pi PC 2 with my Realtek USB dongle plugged in (so the firmware is already loaded onto the dongle), I see this:

 

[   31.659072] Bluetooth: hci0: RTL: examining hci_ver=0a hci_rev=097b lmp_ver=0a lmp_subver=ec43
[   31.659815] Bluetooth: hci0: RTL: unknown IC info, lmp subver ec43, hci rev 097b, hci ver 000a
[   31.659830] Bluetooth: hci0: RTL: assuming no firmware upload needed
[   31.659852] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000002
[   31.668671] Mem abort info:
[   31.671488]   ESR = 0x96000004
[   31.674561]   EC = 0x25: DABT (current EL), IL = 32 bits
[   31.679880]   SET = 0, FnV = 0
[   31.682937]   EA = 0, S1PTW = 0
[   31.686077] Data abort info:
[   31.688951]   ISV = 0, ISS = 0x00000004
[   31.692783]   CM = 0, WnR = 0
[   31.695752] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043305000
[   31.702186] [0000000000000002] pgd=0000000000000000, p4d=0000000000000000
[   31.708978] Internal error: Oops: 96000004 [#1] SMP
[   31.713848] Modules linked in: btusb btrtl btbcm btintel sun4i_i2s bluetooth sun4i_gpadc_iio industrialio ecdh_generic rfkill ecc snd_rawmidi uvcvideo videobuf2_vmalloc snd_seq_device sunxi_cedrus(C) videobuf2_dma_contig v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc snd_soc_simple_card snd_soc_simple_card_utils display_connector cpufreq_dt zram sch_fq_codel nfsv3 nfs fscache realtek sy8106a_regulator i2c_mv64xxx dwmac_sun8i mdio_mux
[   31.754604] CPU: 0 PID: 313 Comm: kworker/u9:2 Tainted: G         C        5.10.34-sunxi64 #21.05.1
[   31.763633] Hardware name: Xunlong Orange Pi PC 2 (DT)
[   31.768813] Workqueue: hci0 hci_power_on [bluetooth]
[   31.773775] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[   31.779776] pc : btrtl_setup_realtek+0x64/0xe10 [btrtl]
[   31.784993] lr : btrtl_setup_realtek+0x54/0xe10 [btrtl]
[   31.790207] sp : ffff800011843c90
[   31.793515] x29: ffff800011843c90 x28: 0000000000000000
[   31.798820] x27: ffff000005782548 x26: ffff000005664100
[   31.804124] x25: 0000000000000000 x24: ffff0000054d4698
[   31.809429] x23: ffff0000054d4a38 x22: ffff0000054d4038
[   31.814732] x21: 0000000000000000 x20: ffff0000054d4000
[   31.820037] x19: ffff0000025ea880 x18: ffff8000114d17d0
[   31.825341] x17: 0000000000000000 x16: 0000000000000000
[   31.830646] x15: 0000000000000223 x14: ffff800011843870
[   31.835950] x13: 00000000ffffffea x12: ffff000037d707a8
[   31.841254] x11: 0000000000000003 x10: ffff000037d6a768
[   31.846557] x9 : ffff000037d6a7c0 x8 : 0000000000005fe8
[   31.851861] x7 : c0000000fffffbff x6 : 0000000000000001
[   31.857165] x5 : 0000000000015fa8 x4 : 0000000000000000
[   31.862468] x3 : 0000000000000000 x2 : 58ab9a179ad83900
[   31.867772] x1 : 0000000000000000 x0 : 0000000000008703
[   31.873078] Call trace:
[   31.875525]  btrtl_setup_realtek+0x64/0xe10 [btrtl]
[   31.880425]  hci_dev_do_open+0x298/0x700 [bluetooth]
[   31.885400]  hci_power_on+0x54/0x2c0 [bluetooth]
[   31.890016]  process_one_work+0x1f0/0x3c0
[   31.894019]  worker_thread+0x140/0x520
[   31.897762]  kthread+0x120/0x128
[   31.900986]  ret_from_fork+0x10/0x30
[   31.904563] Code: 2a0003f5 350000c0 f9400261 5290e060 (79400421)
[   31.910650] ---[ end trace a2a6ba9ede4ddc6e ]--- 

 

The reason becomes obvious when looking at the relevant assembly and the cause seems to be this patch: 0143-Bluetooth-btrtl-add-support-for-the-RTL8723CS.patch
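
For anyone following along, the faulting instruction word in the "Code:" line (the one in parentheses, 79400421) can be decoded by hand. Below is a minimal sketch in C, assuming the standard AArch64 encoding of LDRH (immediate, unsigned offset). The struct and function names are made up for illustration and do not come from any kernel header.

```c
#include <stdint.h>

/* Illustrative helper type, not taken from any kernel header. */
struct ldrh_fields {
    int is_ldrh;      /* nonzero if the word is LDRH (immediate, unsigned offset) */
    unsigned rt;      /* destination register number (Wt) */
    unsigned rn;      /* base register number (Xn) */
    unsigned offset;  /* byte offset: imm12 scaled by the 2-byte access size */
};

/* Split a 32-bit AArch64 instruction word into the LDRH fields. */
static struct ldrh_fields decode_ldrh(uint32_t insn)
{
    struct ldrh_fields f;

    f.is_ldrh = (insn >> 22) == 0x1E5u;        /* bits 31..22 == 0111100101 */
    f.offset  = ((insn >> 10) & 0xFFFu) * 2u;  /* bits 21..10, scaled by 2 */
    f.rn      = (insn >> 5) & 0x1Fu;           /* bits 9..5 */
    f.rt      = insn & 0x1Fu;                  /* bits 4..0 */
    return f;
}
```

Calling decode_ldrh(0x79400421) describes a halfword load into w1 from [x1, #2]. With x1 = 0 in the register dump, that is a read from address 0x2, which matches the faulting address in the oops and is consistent with reading a 16-bit field at offset 2 through a NULL pointer.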

 

It adds an additional btrtl_apply_quirks(hdev, btrtl_dev) call without checking whether btrtl_dev->ic_info is NULL (which it is in my case; that's why "unknown IC info, lmp subver ec43, hci rev 097b, hci ver 000a" is printed). So a NULL check needs to be added, either inside btrtl_apply_quirks or before it is called.
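
The kind of guard meant here could look roughly like the following. This is only a sketch on simplified stand-in types, not the actual btrtl code or the actual patch: the struct layouts, the function name apply_quirks_checked and the use of 0x8703 (the value visible in x0 in the register dump) are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the real btrtl definitions. */
struct ic_info_model {
    uint16_t match_flags;
    uint16_t lmp_subver;                  /* 16-bit field at offset 2 */
};

struct rtl_dev_model {
    const struct ic_info_model *ic_info;  /* NULL when the IC is unknown */
};

/* Value seen in x0 in the oops; what the patch really compares is assumed. */
#define QUIRK_LMP_SUBVER 0x8703

/* Returns true if the quirk would apply; never dereferences a NULL ic_info. */
static bool apply_quirks_checked(const struct rtl_dev_model *dev)
{
    if (!dev || !dev->ic_info)
        return false;  /* unknown IC: nothing to apply, nothing to read */

    return dev->ic_info->lmp_subver == QUIRK_LMP_SUBVER;
}
```

With a guard like this, a device whose ic_info is NULL (the "unknown IC info" case) simply skips the quirk instead of faulting on the field read.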

 

What's the best and fastest way to get this fixed? As I've already found the cause, it should be pretty straightforward?

Link to comment
Share on other sites

5 hours ago, Flole said:

What's the best and fastest way to get this fixed?

 

By whom? Why would that be our problem? It's on you to research the topic (or hire someone to do that for you) and provide a merge request, which needs to be tested on hardware that is equipped with the same chip: https://docs.armbian.com/Process_Contribute/

 

Since you don't support our project seriously, such support requests sound more like a joke. It's not personal; all similar requests sound the same. We can't afford to hire help (your donations don't even pay for the servers you download from), while volunteers are booked for several years in advance. So tell me how? Linux and the surrounding tooling need serious maintenance, development is constantly moving on, and we find way more bugs than it is possible to handle, in common public code of which we are not the authors. We also developed a tool for finding them faster, which also added to the maintenance expenses ... but solving what is found: with what? When? The hardware we are dealing with (3rd party USB stuff was never a part of this) is far from functioning perfectly.

 

For me, and for people on the project, time is totally critical. I / we also have a family, two kids and a job that pays the bills. Why should we fix bugs for you and all Linux distributions on request and 100% at our private expense? Also, 1000 people before you have reported that "something is wrong with their system". We are donating 50-80h every day; more we can't. Even though you asked politely and did part of the needed work. Most people just demand ...

Link to comment
Share on other sites

Not entirely sure why you responded (or even are on this forum) if all you want to say is "find the issue and solution yourself" without providing any real help. Especially if your time is so valuable to you. Instead of writing your post you could've easily fixed this issue in the same time.

The bug is not there upstream and gets added by a patch that can be found in the Armbian repository. That's why I asked here and not on an upstream mailing list, hoping to get it fixed easily. But if you want to keep buggy patches in your repository and ruin working software in doing so, that's up to you and fine with me. I'll simply patch the binary then; it's an easy fix to kick that comparison out again in the assembly. I just thought that it might be a good idea to have others benefit from this as well, but apparently that's just too complicated.

Link to comment
Share on other sites

2 hours ago, Flole said:

Not entirely sure why you responded (or even are on this forum) if all you want to say is "find the issue and solution yourself" without providing any real help.


Telling you that asking us to work on your problems is a waste of your time is real help. We don't have this issue; you do. I will help you, but I will not solve this problem for you. If you solve it for yourself, you solve it for others. This is what we do every day. We would accept your problem and solve it for everyone, but we have no resources.
 

2 hours ago, Flole said:

Instead of writing your post you could've easily fixed this issue in the same time.


First, I don't work for you. Second, answering with brainless facts consumes way less time, money and energy than spending days on debugging. In your interest. Third, 1000 other people who also have some issue are in line before you. Will that prevent me from communicating on the forums, and will I move to work on (your) bugs instead? That would be a very nice joke. Isn't it clear that our project is 1/1000 too small and we are totally overloaded? After my response to you? Or is this a fact you ignore since you only care about solving your problem? But since we are not jumping for joy, that should be clear? Or not?

 

What about the things that work well? How much do you support that part? We actually have the problem that we contribute way, way, way too much to public code, not the other way around.
 

2 hours ago, Flole said:

The bug is not there upstream and gets added by a patch that can be found in the Armbian repository, that's why I asked here and not on an upstream mailing list, hoping to get it fixed easily.

 

I wouldn't be so sure that this problem is as trivial as you think it is, but it's on you to find out. I am highly concerned that the hardware we (try to) support (onboard wireless) might have issues. This is Linux and you are saying what, Realtek? OMG

 

2 hours ago, Flole said:

I just thought that it might be a good idea to have others benefit from this as well, but apparently that's just too complicated.

Waiting for your merge requests. If you can't test on devices that are using this driver, ask on the forum for help. Use this infrastructure.

Link to comment
Share on other sites

Patched the binary in the meantime, issue is solved for me. Thanks.

 

As I still don't know where that patch is coming from and where a PR has to go, that's it for me. I just know who pulled it in (first as .disabled) and then enabled it later on. Don't tell me things about testing and so on when you are mass pulling in patches from somewhere, obviously without testing them or even looking at what they do, and then introducing bugs due to that.... You even wrote "It's a mess which will be eventually sorted out or merged up." Yeah, good, you merged a mess, so don't expect me to clean up your mess. Just throwing in some patches, breaking things for me and then expecting me to fix that stuff is not going to work; maybe you'll have more success with others though.

 

By the way: upstream this has been done properly for a few months now, so the mess you pulled in will be thrown out again eventually (hopefully).

Link to comment
Share on other sites

48 minutes ago, Flole said:

Upstream this has been done properly for a few months now, so the mess you pulled in will be thrown out again eventually

 

Yes, this is our regular MO (if we want to provide features months ahead or fix bugs in an ad-hoc manner), and we need to adjust code constantly. Sometimes the pulled code is well known / documented, clean, or consists of patches we authored. Sometimes patches are pulled from a 3rd party project such as LibreELEC or similar, mainly for multimedia, since we don't deal with this area much on our own. Sometimes our code works well but gets broken by an upstream code change, or the upstream implementation is just done a little bit differently. We are seeing that too.

 

For this specific problem, I can't tell without moving away from what I am doing and researching it. Which is not going to happen just like that.
 

48 minutes ago, Flole said:

obviously without testing them or even looking at what they do and then introduce bugs due to that.

 

- you are saying Greg Kroah-Hartman's team checks and tests every patch that lands in the Linux codebase?

- our project is obviously not a professional venture; the Linux kernel is

 

48 minutes ago, Flole said:

Just throwing in some patches, breaking things for me


Impossible to know for all chip variants. We can't test hardware we don't have. And even if we had the hardware, our test system is too primitive. It lacks a few million to get to a professional level. Testing on all possible wireless devices that are on the market on every code change? Nice to have, but we are far away from that. Manual testing? Bad joke.

Link to comment
Share on other sites

14 hours ago, Flole said:

The reason becomes obvious when looking at the relevant assembly and the cause seems to be this patch: 0143-Bluetooth-btrtl-add-support-for-the-RTL8723CS.patch


But our code base doesn't have this patch ?! Where did you find it?

Link to comment
Share on other sites

It is in build/patch/kernel/archive/sunxi-5.4/0143-Bluetooth-btrtl-add-support-for-the-RTL8723CS.patch, don't ask me why it's applied to 5.10 though (or maybe that's how this build system works, no clue). It's not there upstream in 5.10, but the disassembler clearly tells me that it's applied to the module in the 5.10 module directory (and as it is signed, or at least was before I patched it, it's obviously built by something that has the signing keys).

Link to comment
Share on other sites

9 minutes ago, Flole said:

It is in build/patch/kernel/archive/sunxi-5.4/0143-Bluetooth-btrtl-add-support-for-the-RTL8723CS.patch, don't ask me why it's applied to 5.10 though (or maybe that's how this build system works, no clue).


I would claim that it's almost impossible ... but well, bugs are always possible, even where it's "impossible" ;)  What would be possible is that the patch was somehow present at the time the kernel was built ... but the history doesn't reveal that. Strange. We were also changing the CI a lot in the past six months (it does indeed have signing keys and was not very well tested yet), so strange things could happen. If you move to the latest nightly kernel (made with the CI tooling) or build the kernel manually, the bug from this topic should not be present.

Link to comment
Share on other sites

As I've patched it for now, I can wait until the current dev (5.13) becomes the new stable. Any future update of the kernel (and its modules) should also not have this patch, so I should not need my patch ever again. Let's wait and see then.

Link to comment
Share on other sites
