Hello. For the past month I have been working on fixing, updating, and rewriting the hardware crypto patch for RK3588. I've done enough work on it that I'm considering submitting it for review for inclusion in the mainline kernel.

I needed hardware cryptographic acceleration for a small project running on my Kubernetes ARM cluster. I noticed that Armbian still ships rk35xx-montjoie-crypto-v2-rk35xx.patch, so I decided to give it a try, but unfortunately the results were quite buggy. I took a closer look at rk35xx-montjoie-crypto-v2-rk35xx.patch and discovered a fairly large number of bugs and potential improvements, so I spent the past month rewriting and updating the original Montjoie code. I probably would have finished earlier, but I was determined to make Multi-Channel Scatter-Gather (Multi-SG) Linked List Item (LLI) Direct Memory Access (DMA) work, and it now looks like that simply cannot be done.

A bit of explanation: a cryptographic hash like SHA-256 takes a file of any size and boils it down to a fixed-length digest. Because files can be big, the math is done in chunks, and the very last chunk of the file must be formatted in a specific way. It requires a special "padding" attached to the end, which includes a "1" bit, a run of "0" bits, and the exact total length of the original file.

When a file is loaded into RAM, the kernel memory manager and DMA mapping layer may place its data across multiple non-contiguous memory regions instead of one large continuous block. When the CPU asks the Rockchip crypto hardware to hash this data, it provides a Scatter-Gather (SG) list: essentially a map that says "read 4KB here, then read 8KB over there, then 2KB over here". The hardware reads the list and jumps around memory automatically. The Rockchip V2 hardware has a built-in feature to automatically add that cryptographic "padding" to the end of a message.
However, I suspect the hardware has a design flaw in how it tracks its progress when jumping between scattered memory chunks. When the engine crosses the boundary from one chunk to the next in a Multi-SG list, its internal byte/block counter loses track of the exact byte count across the gaps in memory, and the result is a completely wrong digest. Because I couldn't figure out the hardware's internal math (and believe me, I spent around three weeks trying everything under the sun), the only logical mainline fix was to inspect the Scatter-Gather list: if it contains a single chunk, send it to the Rockchip hardware engine; if it contains more than one chunk (Multi-SG), intercept it, bypass the Rockchip hardware, and route it to the CPU's standard software (fallback) crypto driver, which knows how to deal with it correctly.

The (most likely) hardware bug is a descriptor-chain state continuation failure, which could involve, for instance: byte counter resets, partial-block FIFO resets, incorrect hash state reloads, or the descriptor parser wrongly treating entries as independent jobs, therefore HW_PAD cannot correctly track block boundaries across chained LLI descriptors. The Rockchip documentation is also a bit suspicious here: the RK3588 TRM advertises "LLI DMA" and "HW padding", but never explicitly states that hash operations may span multiple linked descriptors.

Below is the list of fixes, updates and changes:

1. Documentation/devicetree/bindings/crypto/rockchip,rk3588-crypto.yaml

- Clock names corrected. The original had "a" and "h" as clock-names items. v3 corrects them to "core", "aclk", "hclk", matching the actual clock signal names in the hardware and passing make dt_binding_check.
- YAML example reg size fixed. The original example said 0x4000 (16KB). v3 corrects it to 0x2000 (8KB) to match both the DTS nodes and the actual hardware register map.

2. drivers/crypto/Makefile

- Root Makefile hook added.
v3 adds obj-$(CONFIG_CRYPTO_DEV_ROCKCHIP2) += rockchip/ to the root crypto Makefile. Without it, if CONFIG_CRYPTO_DEV_ROCKCHIP (the V1 driver) is disabled, the build system never descends into drivers/crypto/rockchip/ and the V2 driver silently fails to build regardless of the Kconfig selection.

3. rk2_crypto.h

- dbgfs_info field added to struct rockchip_ip. The original was missing it, causing register_debugfs to overwrite dbgfs_stats with the info file pointer. v3 adds struct dentry *dbgfs_info as the third debugfs member.
- #include <linux/reset.h> added. Moved from rk2_crypto.c so that rk2_crypto_ahash.c and rk2_crypto_skcipher.c can call reset_control_assert/deassert directly in their DMA error recovery paths.
- fallback_req ordering documented. Added a // keep at the end comment on struct skcipher_request fallback_req in rk2_cipher_rctx. The same ordering is maintained in rk2_ahash_rctx, with struct ahash_request fallback_req as the last member, which is required for the flexible-array __ctx placement to work correctly.

4. rk2_crypto.c

- pm_init return value fixed. The original returned err at the end, which after pm_runtime_enable would always be 0 but was misleading. v3 explicitly returns 0.
- IRQ handler fires complete() only on LISTDONE or a hard error. The original called complete() on any non-zero DMA_INT_ST, so intermediate per-LLI-entry SRC_INT interrupts would prematurely wake the hash path before all data was processed, producing wrong digests. v3 only signals completion for RK2_CRYPTO_DMA_INT_LISTDONE (BIT(0)) or the error bits (0xF8).
- Consistent spin_lock_bh throughout. get_rk2_crypto(), probe, and remove all use spin_lock_bh/spin_unlock_bh. The original used plain spin_lock in get_rk2_crypto() and probe, risking deadlock if a crypto engine callback ran on the same CPU.
- Debugfs overwrite bug fixed. The original register_debugfs wrote to rocklist.dbgfs_stats twice; the second write (the info file) overwrote the stats file pointer. v3 correctly uses rocklist.dbgfs_info for the second file.
- pm_runtime_resume_and_get return value checked in probe. The original ignored it; v3 checks it and jumps to err_pm on failure.
- PM reference count balanced on probe success. The original left the usage count at 1 after successful registration, pinning the device on forever. v3 calls pm_runtime_mark_last_busy + pm_runtime_put_autosuspend after successful registration.
- Separate err_register_alg label with list_del + pm_runtime_put_sync. The original jumped to err_pm on registration failure, which skipped both the list cleanup and the PM put. v3 has a dedicated label that removes the device from the list and decrements the PM count before falling through to rk2_crypto_pm_exit.
- crypto_engine_exit called before rk2_crypto_pm_exit in remove. The original order was reversed: clocks were disabled before the engine was drained. v3 exits the engine first to flush all in-flight requests, then disables PM.
- algt->dev handoff on multi-device remove. When the registering device is removed while a second device survives, v3 iterates over all algorithm templates and updates algt->dev to the surviving device, preventing a use-after-free in tfm_init/tfm_exit logging.
- dma_free_coherent added to rk2_crypto_remove. The LLI table is a non-managed allocation; the original never freed it on normal remove, leaking it on every rmmod. v3 frees it at the end of remove.
- pm_resume does a full assert/udelay/deassert cycle. This ensures the hardware is cleanly initialised with clocks running on every PM wakeup, not just a deassert.

5. rk2_crypto_ahash.c

- Multi-SG hardware fallback. Added an if (sg_nents(areq->src) > 1) check at the top of rk2_ahash_need_fallback. The RK3568/3588 hardware padding engine (HW_PAD) loses block-count state across LLI descriptor boundaries, producing wrong digests for any multi-SG request. Single-SG requests continue to use the hardware; multi-SG requests are routed to the software fallback and increment stat_fb_sgdiff.
- Explicit payload length for hardware padding.
Added writel(areq->nbytes, rkc->reg + RK2_CRYPTO_CH0_PC_LEN_0) before starting DMA. The HW_PAD bit requires the total message length to be programmed in advance so the hardware knows where to apply the Merkle–Damgård padding. Without this, single-SG hardware hashes would also produce wrong results.
- INT_EN write-enable mask fixed. The original wrote RK2_CRYPTO_DMA_INT_LISTDONE | 0x7F with no upper bits set, which the HIWORD_UPDATE register pattern silently discards. v3 first clears stale pending interrupts (writel(0x7F, DMA_INT_ST)), then writes 0x007F007F so the enable bits and their write-enable mask are both set.
- LIST_INT added to the last LLI descriptor. LISTDONE (BIT(0)) is only set in DMA_INT_ST when the last LLI entry has RK2_LLI_DMA_CTRL_LIST_INT (BIT(8)) set in dma_ctrl. Without it, the DMA completes silently, no interrupt fires, and every request hits the 2-second timeout. v3 adds RK2_LLI_DMA_CTRL_LIST_INT | RK2_LLI_DMA_CTRL_SRC_INT to the last descriptor.
- Uninterruptible completion wait. Changed wait_for_completion_interruptible_timeout to wait_for_completion_timeout. The interruptible variant can return early on SIGTERM/SIGKILL while the DMA is still writing to memory, causing data corruption or a use-after-free.
- Timeout return value captured and checked. The original discarded the return value, making a timeout undetectable. v3 stores the return in unsigned long timeout and checks !timeout for ETIMEDOUT, then separately checks !rkc->status for EIO.
- rctx->nrsgs = 0 initialised before dma_map_sg. Ensures rk2_hash_unprepare never calls dma_unmap_sg with an uninitialised count.
- Safe dma_unmap_sg in unprepare. Added an if (rctx->nrsgs) guard before dma_unmap_sg, preventing a kernel panic if dma_map_sg failed in the prepare step.
- MAX_LLI overflow check. Added if (ddi >= MAX_LLI) at the top of the LLI build loop, with dev_err and -EINVAL. This prevents overflowing the DMA-coherent LLI table with a scatter-list longer than 20 entries.
- readl_poll_timeout_atomic return value checked.
The original discarded the return value, allowing wrong digest output if HASH_VALID is never set. v3 assigns it to err and jumps to theend on timeout.
- be32_to_cpu in digest readback. The hardware outputs all digest words (SHA1, SHA256, SHA384, SHA512, SM3, MD5) in big-endian register format. readl() on little-endian ARM64 returns a byte-swapped value; be32_to_cpu corrects it before put_unaligned_le32.
- pm_runtime_mark_last_busy before put_autosuspend. Refreshes the autosuspend timer on each request completion so the device doesn't suspend between back-to-back requests.
- DMA error hardware reset. On !rkc->status (DMA error interrupt), v3 performs reset_control_assert / udelay(10) / reset_control_deassert to recover the DMA engine from a stuck state before returning -EIO.
- crypto_ahash_set_statesize promotion in rk2_hash_init_tfm. After allocating the fallback, v3 queries its statesize and promotes the template's statesize if the fallback needs more space. This fixes -EOVERFLOW on export()/import() when ARM Crypto Extensions drivers (sha1-ce, sha256-ce) advertise a larger statesize than the raw hash state.

6. rk2_crypto_skcipher.c

- rk2_print gated with #ifdef CONFIG_CRYPTO_DEV_ROCKCHIP2_DEBUG. Register dumps on every DMA error flood production logs; v3 wraps the entire function and its call site in the debug config guard.
- OTP KEY VALID bit corrected. The original checked BIT(2) for both HASH BUSY and OTP KEY VALID (a copy-paste error). v3 uses BIT(3) for OTP KEY VALID.
- Multi-bit switch statements corrected. All field switches now use (v >> N) & mask before comparing case values, fixing cases that could never match after masking.
- dd->dma_ctrl assigned with = rather than |=. The LLI table persists across requests, so |= accumulates stale flag bits from previous requests. v3 uses = for the initial value.
- RK2_CRYPTO_DMA_CTL_START << 16 instead of a literal 1 << 16. Ensures the write-enable mask is always correct regardless of the constant's value.
- get_rk2_crypto() with a null check for cipher.
The original used algt->dev (always the first device) for all cipher requests, bypassing load balancing. v3 uses get_rk2_crypto(), matching the hash path, with if (!rkc) return -ENODEV.
- DMA unmap moved before the timeout/error checks. The original goto theend on timeout bypassed dma_unmap_sg, permanently leaking the mapping. v3 unmaps unconditionally before checking results.
- wait_for_completion_timeout (uninterruptible). The same fix as the hash path: prevents signal interruption while DMA is active.
- Separate timeout/error errnos. ETIMEDOUT for a timeout, EIO for a DMA error with an rk2_print debug dump.
- INT_EN write-enable mask + stale clear. The same fix as the hash path: writel(0x7F, DMA_INT_ST) then writel(0x007F007F, DMA_INT_EN).
- LIST_INT added to the cipher descriptor. RK2_LLI_DMA_CTRL_LIST_INT added to dd->dma_ctrl so LISTDONE fires in DMA_INT_ST.
- pm_runtime_mark_last_busy before put_autosuspend.
- DMA error hardware reset. The same reset_control_assert/deassert recovery as the hash path.
- Fallback log changed to dev_dbg. The original dev_info in rk2_cipher_tfm_init flooded the logs during PM stress testing (hundreds of TFM allocations per second). v3 uses dev_dbg.
- Stricter XTS fallback check. Changed sg_nents(areq->src) > 1 to != 1 for the XTS-specific SG check. XTS requires exactly one contiguous buffer: more than one SG entry is wrong, but so is zero.

7. include/dt-bindings/reset/rockchip,rk3588-cru.h

- The patch correctly uses mainline SCMI for RK3588. Mainline Linux kernel commit 849d9db from Feb 22, 2025 ("dt-bindings: reset: Add SCMI reset IDs for RK3588") means there is no reason to modify this file, as the entries already exist. Note that the patch still removes the entire RK3588_SECURECRU_RESET_OFFSET macro and all its entries, since keeping them in rst-rk3588.c alongside SCMI would be redundant and potentially dangerous.

rk35xx-crypto-v3.patch
