lucass Posted Thursday at 12:01 PM

Hey everyone, I'm eyeing the Orange Pi 6 Plus for some edge AI projects, given its 12-core CIX SoC, up to 28.8 TOPS NPU, and massive RAM options (16/32/64 GB LPDDR5). Has anyone gotten Ollama running on it with NPU acceleration? Does it support it out of the box, or do you need to convert models (e.g., to INT4/INT8 formats) using custom tools like rkllama or similar? Additionally, with the higher RAM, can it handle bigger LLMs (like 13B+ models) more smoothly than lower-spec SBCs? Any benchmarks or tips on setups (e.g., Ubuntu/Debian installs, frameworks like MLC-LLM)? I'd love to hear real-world experiences, thanks!
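For what it's worth, my back-of-envelope on the RAM question: at roughly 4-5 bits per weight (typical Q4 quants), a 13B model's weights alone come to about 7-8 GB, which is why I'm hopeful about the 32/64 GB options. A quick sketch of that estimate (the 4.5 bits/weight figure is an assumption for a Q4_K_M-style quant, and it ignores KV cache and runtime overhead):

```python
# Rough estimate of RAM needed just to hold quantized LLM weights.
# Assumes ~4.5 bits per weight (a Q4_K_M-style quant is in that ballpark)
# and ignores KV cache and runtime overhead, so treat it as a lower bound.
def approx_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Return approximate weight size in GB for a given parameter count."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for size in (7, 13, 32, 70):
    print(f"{size}B @ ~4.5 bpw: ~{approx_weight_gb(size):.1f} GB of weights")
```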
arkeon Posted 2 hours ago

Hello, I'm also looking for a solution to test the NPU. For now I ran these tests with Ollama built with Vulkan support on an Orange Pi 6 Plus with 32 GB of RAM.

# Ollama Benchmark Results

- Date: Mon Nov 10 10:19:17 PM CET 2025
- System: Linux 6.6.89-cix aarch64
- Benchmark Script: ./obench.sh
- Runs per model: 3
- Total models: 18

## Results

### llama3.2:latest

```
Running benchmark 3 times using model: llama3.2:latest

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 11.78 tokens/s            |
| 2   | 11.66 tokens/s            |
| 3   | 11.79 tokens/s            |
| **Average Eval Rate** | 11.74 tokens/second |
```

### huihui_ai/deepseek-r1-abliterated:7b

```
Running benchmark 3 times using model: huihui_ai/deepseek-r1-abliterated:7b

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 5.93 tokens/s             |
| 2   | 6.07 tokens/s             |
| 3   | 6.09 tokens/s             |
| **Average Eval Rate** | 6.03 tokens/second |
```

### huihui_ai/huihui-moe-abliterated:1.5b

```
Running benchmark 3 times using model: huihui_ai/huihui-moe-abliterated:1.5b

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 21.13 tokens/s            |
| 2   | 21.77 tokens/s            |
| 3   | 21.00 tokens/s            |
| **Average Eval Rate** | 21.30 tokens/second |
```

### granite4:1b

```
Running benchmark 3 times using model: granite4:1b

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 8.63 tokens/s             |
| 2   | 8.59 tokens/s             |
| 3   | 8.59 tokens/s             |
| **Average Eval Rate** | 8.60 tokens/second |
```

### huihui_ai/qwen3-abliterated:4b

```
Running benchmark 3 times using model: huihui_ai/qwen3-abliterated:4b

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 8.09 tokens/s             |
| 2   | 7.67 tokens/s             |
| 3   | 7.91 tokens/s             |
| **Average Eval Rate** | 7.89 tokens/second |
```

### huihui_ai/qwen3-abliterated:1.7b

```
Running benchmark 3 times using model: huihui_ai/qwen3-abliterated:1.7b

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 13.81 tokens/s            |
| 2   | 14.30 tokens/s            |
| 3   | 13.54 tokens/s            |
| **Average Eval Rate** | 13.88 tokens/second |
```

### mistral-small:22b

```
Running benchmark 3 times using model: mistral-small:22b

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 1.27 tokens/s             |
| 2   | 1.27 tokens/s             |
| 3   | 1.26 tokens/s             |
| **Average Eval Rate** | 1.26 tokens/second |
```

### granite4:3b

```
Running benchmark 3 times using model: granite4:3b

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 12.85 tokens/s            |
| 2   | 12.73 tokens/s            |
| 3   | 12.85 tokens/s            |
| **Average Eval Rate** | 12.81 tokens/second |
```

### granite4:350m

```
Running benchmark 3 times using model: granite4:350m

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 32.37 tokens/s            |
| 2   | 32.57 tokens/s            |
| 3   | 32.14 tokens/s            |
| **Average Eval Rate** | 32.36 tokens/second |
```

### qwen2.5:0.5b

```
Running benchmark 3 times using model: qwen2.5:0.5b

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 46.67 tokens/s            |
| 2   | 45.30 tokens/s            |
| 3   | 46.07 tokens/s            |
| **Average Eval Rate** | 46.01 tokens/second |
```

### qwen3-embedding:0.6b

```
Running benchmark 3 times using model: qwen3-embedding:0.6b

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
BENCHMARK FAILED
```

### granite3.2-vision:latest

```
Running benchmark 3 times using model: granite3.2-vision:latest

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 15.79 tokens/s            |
| 2   | 15.83 tokens/s            |
| 3   | 15.81 tokens/s            |
| **Average Eval Rate** | 15.81 tokens/second |
```

### llava-llama3:latest

```
Running benchmark 3 times using model: llava-llama3:latest

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 6.27 tokens/s             |
| 2   | 6.29 tokens/s             |
| 3   | 6.26 tokens/s             |
| **Average Eval Rate** | 6.27 tokens/second |
```

### huihui_ai/gpt-oss-abliterated:20b

```
Running benchmark 3 times using model: huihui_ai/gpt-oss-abliterated:20b

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 4.07 tokens/s             |
| 2   | 4.27 tokens/s             |
| 3   | 4.31 tokens/s             |
| **Average Eval Rate** | 4.21 tokens/second |
```

### huihui_ai/llama3.2-abliterate:1b

```
Running benchmark 3 times using model: huihui_ai/llama3.2-abliterate:1b

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 31.15 tokens/s            |
| 2   | 31.24 tokens/s            |
| 3   | 31.32 tokens/s            |
| **Average Eval Rate** | 31.23 tokens/second |
```

### huihui_ai/gemma3-abliterated:1b

```
Running benchmark 3 times using model: huihui_ai/gemma3-abliterated:1b

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 28.00 tokens/s            |
| 2   | 28.53 tokens/s            |
| 3   | 28.22 tokens/s            |
| **Average Eval Rate** | 28.25 tokens/second |
```

### huihui_ai/qwen3-vl-abliterated:4b

```
Running benchmark 3 times using model: huihui_ai/qwen3-vl-abliterated:4b

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 7.59 tokens/s             |
| 2   | 7.08 tokens/s             |
| 3   | 6.99 tokens/s             |
| **Average Eval Rate** | 7.22 tokens/second |
```

### huihui_ai/qwen3-abliterated:8b

```
Running benchmark 3 times using model: huihui_ai/qwen3-abliterated:8b

| Run | Eval Rate (Tokens/Second) |
|-----|---------------------------|
| 1   | 5.55 tokens/s             |
| 2   | 5.41 tokens/s             |
| 3   | 5.30 tokens/s             |
| **Average Eval Rate** | 5.42 tokens/second |
```

## Summary

| Model | Status | Notes |
|-------|--------|-------|
| llama3.2:latest | ✅ Completed | - |
| huihui_ai/deepseek-r1-abliterated:7b | ✅ Completed | - |
| huihui_ai/huihui-moe-abliterated:1.5b | ✅ Completed | - |
| granite4:1b | ✅ Completed | - |
| huihui_ai/qwen3-abliterated:4b | ✅ Completed | - |
| huihui_ai/qwen3-abliterated:1.7b | ✅ Completed | - |
| mistral-small:22b | ✅ Completed | - |
| granite4:3b | ✅ Completed | - |
| granite4:350m | ✅ Completed | - |
| qwen2.5:0.5b | ✅ Completed | - |
| qwen3-embedding:0.6b | ❌ Failed | benchmark failed (see above) |
| granite3.2-vision:latest | ✅ Completed | - |
| llava-llama3:latest | ✅ Completed | - |
| huihui_ai/gpt-oss-abliterated:20b | ✅ Completed | - |
| huihui_ai/llama3.2-abliterate:1b | ✅ Completed | - |
| huihui_ai/gemma3-abliterated:1b | ✅ Completed | - |
| huihui_ai/qwen3-vl-abliterated:4b | ✅ Completed | - |
| huihui_ai/qwen3-abliterated:8b | ✅ Completed | - |
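If anyone wants to reproduce this without the shell script, below is a minimal sketch of the same measurement using Ollama's REST API. It assumes the default endpoint on localhost:11434, and the model list is only an example, not the exact set above. The tokens/s figure comes from the eval_count and eval_duration fields the API returns, which should match the eval rate that `ollama run --verbose` prints.

```python
# Minimal sketch of a 3-run eval-rate benchmark against the Ollama REST API.
# Assumes Ollama is listening on the default port 11434; MODELS is only an
# example list, not the exact set benchmarked above.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODELS = ["llama3.2:latest", "qwen2.5:0.5b"]  # example models
PROMPT = "Why is the sky blue?"
RUNS = 3

def eval_rate(model: str, prompt: str) -> float:
    """Run one non-streaming generation and return the eval rate in tokens/s."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        stats = json.load(resp)
    # eval_duration is reported in nanoseconds
    return stats["eval_count"] / stats["eval_duration"] * 1e9

for model in MODELS:
    rates = [eval_rate(model, PROMPT) for _ in range(RUNS)]
    avg = sum(rates) / len(rates)
    print(f"{model}: " + ", ".join(f"{r:.2f}" for r in rates) + f" -> avg {avg:.2f} tokens/s")
```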
Werner Posted 1 hour ago

Armbian does not have a config for this board, nor is it supported in any way.