
SPI issue with Banana Pro/A20



Hi all,

there is an issue with SPI on a Banana Pro (A20). I am on mainline kernel 4.5.5 now, but the problem already occurred on 3.4.109.

SPI transfers are way too slow for my application, regardless of the SPI clock. Investigating the SPI signals, it turned out that there is an enormous delay between CS going active (low) and the start of the transmission. The delay is on the order of 10...20 usec, which is a large multiple of the pure transmission time, so increasing the SPI clock does not solve the problem.

The connected device is an SPI TFT display, so I have to send a large number of SPI words of 2 bytes each, where CS must toggle after each word (required by the target device). Consequently, transmitting a large amount of data takes several hundred times longer than necessary.
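A rough back-of-the-envelope sketch of why a faster clock does not help. It assumes 16-bit words and a fixed CS-to-clock delay per word (10 usec, the lower end of the range measured above); the function names are made up for illustration and are not part of any driver.

```c
#include <assert.h>
#include <stdint.h>

/* Time one word spends on the wire, in nanoseconds, at the given SPI clock. */
uint64_t shift_time_ns(uint32_t bits, uint32_t spi_hz)
{
    assert(spi_hz > 0);
    return (uint64_t)bits * 1000000000ULL / spi_hz;
}

/* Total time per word: fixed CS-to-clock delay plus the shifting time. */
uint64_t word_time_ns(uint32_t bits, uint32_t spi_hz, uint64_t cs_delay_ns)
{
    return cs_delay_ns + shift_time_ns(bits, spi_hz);
}

/* Share of the per-word time that is pure delay, in percent (rounded down). */
unsigned overhead_percent(uint32_t bits, uint32_t spi_hz, uint64_t cs_delay_ns)
{
    uint64_t total = word_time_ns(bits, spi_hz, cs_delay_ns);
    assert(total > 0);
    return (unsigned)(cs_delay_ns * 100 / total);
}
```

Under these assumptions a 16-bit word at 10 MHz needs 1.6 usec on the wire but 11.6 usec in total, so about 86% of the time is the fixed delay, and raising the clock further can only shrink the remaining 14%.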

I am not a driver expert, but I have looked into the source code of spi-sun4i.c, which is the low level driver used for the A20 as far as I know. It seems that the CS signal is under software control there, i.e. it is handled like a normal GPIO. CS must be set/cleared by an explicit function call. I wonder if this is the reason for the large delay.

I found a quick-and-dirty workaround where somebody configured the SPI controller directly via /dev/mem (i.e. bypassing the kernel driver) and kept CS under hardware control, so the delay between CS and the start of transmission is only a few hundred nanoseconds. Now I am thinking of modifying the SPI low level driver to keep CS under hardware control as well.
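For readers who have not seen the /dev/mem approach, here is a minimal sketch of how a register block can be mapped into user space. The SPI0 base address below is an assumption and should be checked against the A20 user manual; the helper names are hypothetical, and the sketch deliberately does not touch any register, since the register layout is not covered in this thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* ASSUMPTION: base address of the SPI0 register block on the A20.
 * Verify against the A20 user manual before use. */
#define A20_SPI0_BASE 0x01C05000UL

/* mmap() needs a page-aligned offset: split the address accordingly. */
off_t page_base(unsigned long phys, unsigned long page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (off_t)(phys & ~(page_size - 1));
}

unsigned long page_offset(unsigned long phys, unsigned long page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return phys & (page_size - 1);
}

/* Map `len` bytes of physical address space starting at `phys`.
 * Returns a pointer to the first register, or NULL on failure.
 * Needs root; the mapping is never released in this sketch. */
volatile uint32_t *map_registers(unsigned long phys, size_t len)
{
    unsigned long page = (unsigned long)sysconf(_SC_PAGESIZE);
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len + page_offset(phys, page),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                   page_base(phys, page));
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    return (volatile uint32_t *)((uint8_t *)p + page_offset(phys, page));
}
```

A caller would do something like `volatile uint32_t *spi = map_registers(A20_SPI0_BASE, 0x1000);` and then read or write `spi[n]`. Note that this races with any kernel driver bound to the same controller, which is why it is only a workaround.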

However, I am not sure if this is a solution to my problem. Does anybody have a similar experience or a better idea than kernel modification? Any comments are welcome :-)


Thanks


Hi Vancouver. I'm having the exact same problem on the H3 (Orangepi pc plus). I'm trying to drive a small TFT display (2.2" ILI9225) from the SPI pins on the header and found that the SPI transfers are so slow due to the delays that it's almost unusable: it takes seconds to draw a text string. I optimized my code for a couple of days without improvement, then got out the logic analyzer to look at the SPI transactions and saw exactly the same thing you did, which explains everything.

 

The official armbian for opi pc plus is just using the legacy kernel. That generally works just fine, I've put it on the nand emmc. 

So I thought an updated (mainline) kernel might help, but when I build the latest kernel with the armbian build system (which is a very nice piece of work by the way) the orange pi system won't boot; it just runs a boot loop trying to start up. There is something wrong and that's probably why Igor only provides the legacy kernel.

 

So there are two paths to go: 1) patch the SPI driver in the legacy kernel to make it work properly, using hardware CS, the FIFO and hopefully DMA; or 2) figure out why the mainline kernel won't boot and fix it. The newer SPI code is probably optimized.

 

Perhaps Igor has some thoughts on what would be the best approach?


I think @martinayotte had some success with SPI on H3 (check his posts), but AFAIK only with a loop-back test. On A20 I am using one display with the legacy kernel via SPI and the speed, huh, it's bad, but good enough for (my) certain scenarios. I have no idea if it's better with the new kernel. I haven't got that far yet.


In fact, I've also used a small ST7735 display on one of my OPiPC boards. It was working fine in 4.4.4 and 4.4.5, but when I upgraded to 4.5.x and later 4.6.x, it was horribly slow (a displayed clock even skips some seconds sometimes).

Unfortunately, I didn't get a chance to narrow down the reasons and didn't spend much time on the case, since it wasn't my priority and "time is still the missing ingredient".

I presume it is a bug introduced somehow in clock management of Mainline.


I've taken some time to dig into the issue and I've found something by diffing my old 4.4.5 and my 4.6.2:

diff sources/linux-vanilla/v4.4.5/drivers/spi/spi-sun6i.c sources/linux-sun8i-mainline/orange-pi-4.6/drivers/spi/spi-sun6i.c
220,221c220,221
< 	if (mclk_rate < (2 * spi->max_speed_hz)) {
< 		clk_set_rate(sspi->mclk, 2 * spi->max_speed_hz);
---
> 	if (mclk_rate < (2 * tfr->speed_hz)) {
> 		clk_set_rate(sspi->mclk, 2 * tfr->speed_hz);
239c239
< 	div = mclk_rate / (2 * spi->max_speed_hz);
---
> 	div = mclk_rate / (2 * tfr->speed_hz);
246c246
< 		div = ilog2(mclk_rate) - ilog2(spi->max_speed_hz);
---
> 		div = ilog2(mclk_rate) - ilog2(tfr->speed_hz);

The mainline guys introduced the usage of the "transfer speed" instead of the "spi max speed".

So far I haven't found where the "transfer speed" is initialized, and probably it is never initialized.

 

So the workaround for now is to revert to the "spi max speed" ...


OK! I dug a bit more ...

In fact, the commit above, made 9 months ago in mainline, is legitimate; it only caused a bug/mistake in orangepi_PC_gpio_pyH3 to come to the surface and be discovered:

diff orangepi_PC_gpio_pyH3-master/pyA20/spi/spi.c.bak orangepi_PC_gpio_pyH3-master/pyA20/spi/spi.c
239c239
<     config.speed = 100000;
---
>     config.speed = 10000000;

As you can see, the author was setting "transfer speed" default to 100kHz instead of 10MHz ... ;)
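To see why that default hurts so much, here is a rough sketch of the pure shifting time for one full frame at both speeds. The 128x160 resolution at 16 bits per pixel is an assumption (a common ST7735 configuration, not stated in this thread), and frame_time_us is a made-up helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Pure shifting time for one full frame, in microseconds.
 * Ignores command bytes, CS handling and any driver overhead. */
uint64_t frame_time_us(uint32_t width, uint32_t height,
                       uint32_t bits_per_pixel, uint32_t speed_hz)
{
    assert(speed_hz > 0);
    uint64_t bits = (uint64_t)width * height * bits_per_pixel;
    return bits * 1000000ULL / speed_hz;
}
```

Under those assumptions, `frame_time_us(128, 160, 16, 100000)` comes to about 3.3 seconds per frame, while `frame_time_us(128, 160, 16, 10000000)` comes to about 33 milliseconds, which would be consistent with a clock display that skips seconds.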

 

I won't blame duxingkei33, who simply ported the original pyA20 from Olimex to the H3, but rather Stefan Mavrodiev, the author of the Olimex one.

 

EDIT: I've opened an issue on both GitHub repositories:

https://github.com/OLIMEX/OLINUXINO/issues/39

https://github.com/duxingkei33/orangepi_PC_gpio_pyH3/issues/4


Hi all,

thanks a lot for your replies. Good to know I'm not alone with this problem. Meanwhile, I have made a little progress on the problem of slow SPI transfers on the A20.

I upgraded to kernel 4.7.2 (which did not solve the problem), and I modified the SPI low level driver. In drivers/spi/spi-sun4i.c there is a function sun4i_spi_set_cs(), that takes control over the CS handling. Here, the A20 is configured for software controlled CS and the CS state is set every time this function is called. (I don't know if the H3 uses the same driver as the A20, but the SPI subsystem may be similar)

I just commented out the lines that configure CS for manual control and set the CS state, so this function simply does nothing now (CS is under hardware control by default). After compiling and installing this modified driver, the delay between CS becoming active and the start of SPI clocking dropped to 2 usec at 1 MHz SPI clock, and even 150 nsec at 10 MHz. Without the modification, it was about 20 usec minimum regardless of the clock speed (opened a bottle of wine here...)

However, the sun4i_spi_set_cs() function is still called from somewhere, and this seems still to cause a delay. So, sending a number of SPI words still takes approximately the same time as without the kernel modification, but the CS pulses are significantly shorter while the pause between the transmissions is still much too long.

So there is still some work to do. I have to find out where sun4i_spi_set_cs() is called from (possibly the spidev driver?). I wonder if there is a better way of switching between manual and hardware control for CS than changing driver code. How is this handled on the Raspberry Pi? Moreover, the CS polarity is wrong and I could not set it correctly so far.

For an unknown reason, the low level driver cannot be compiled as a module, so the complete kernel must always be compiled and installed, which is time consuming.

I will keep you informed. However, any help is welcome.


 


Hi everyone,

I spent a lot of time understanding the SPI driver architecture, and I came to the conclusion that I cannot get the performance required for my application while using this Linux SPI driver. Maybe I am wrong about that, but each SPI transfer requires traversing a driver architecture of three levels (spi-sun4i.c, spi.c, spidev.c) with many function calls in between. Even if the time for a pure SPI data transfer is determined by the SPI clock frequency only, the setup time before starting the transfer is significantly longer.
This does not matter as long as we want to read out a temperature or inertial sensor a few dozen times per second, but for controlling a QVGA TFT display it is definitely too slow.
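For reference, this is roughly what the per-word path through spidev looks like from user space. The sketch below queues several 2-byte words into a single SPI_IOC_MESSAGE call and asks for CS to be released between them via cs_change, so the three driver layers are entered once per batch rather than once per word. BATCH_WORDS and both function names are hypothetical, and whether the low level driver's own CS latency per transfer remains is an open question that this sketch does not answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

/* Hypothetical batch size, kept well below the ioctl payload limit. */
#define BATCH_WORDS 64

/* Describe BATCH_WORDS two-byte words as individual transfers.
 * `bytes` must hold 2 * BATCH_WORDS bytes in the order they go on the wire. */
void build_word_batch(const uint8_t *bytes, uint32_t speed_hz,
                      struct spi_ioc_transfer xfers[BATCH_WORDS])
{
    assert(bytes != NULL && xfers != NULL);
    memset(xfers, 0, BATCH_WORDS * sizeof xfers[0]);

    for (size_t i = 0; i < BATCH_WORDS; i++) {
        xfers[i].tx_buf = (uintptr_t)(bytes + 2 * i);
        xfers[i].len = 2;
        xfers[i].speed_hz = speed_hz;   /* per-transfer speed */
        xfers[i].bits_per_word = 8;
        /* Release CS after every word except the last one, which is
         * deselected anyway when the message ends. */
        xfers[i].cs_change = (i + 1 < BATCH_WORDS);
    }
}

/* Send one batch on an already opened spidev file descriptor.
 * Returns a negative value on error. */
int send_word_batch(int fd, const uint8_t *bytes, uint32_t speed_hz)
{
    struct spi_ioc_transfer xfers[BATCH_WORDS];

    build_word_batch(bytes, speed_hz, xfers);
    return ioctl(fd, SPI_IOC_MESSAGE(BATCH_WORDS), xfers);
}
```

The speed_hz field here is the user-space counterpart of the per-transfer speed that the low level driver uses to program the clock divider.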

The display in question (see http://admatec.de/pdfs/C-Berry_0.pdf) is shipped with demo software for the Raspberry Pi, and on a Raspberry (an old model B+) I interestingly do not have any performance issues. The display runs as expected at about 2-3 fps. So I looked into their source code and found that they bypass the kernel's driver architecture. The software is based on the BCM2835 library, which seems to write directly into the peripheral registers via /dev/mem. The BCM library documentation says:

"In order for bcm2835 library SPI to work, you may need to disable the SPI kernel module [...]"

There is a port of the bcm2835 library available for the Banana which claims to be fully compatible, but in fact the Banana version takes the long way through the Linux driver, which is about 10 times slower.

In order to come to a solution, I went the same way as the Raspberry software and handled all the SPI and GPIO stuff via the /dev/mem interface. The display performance is comparable to the Raspberry version now. However, accessing peripheral registers directly from user space is clearly a nightmare from the kernel's point of view. So I decided to go the clean way and turned it into a separate kernel driver. Somebody had already written a framebuffer driver for this display, again for the Raspberry. I modified this driver and replaced the BCM specific parts with the A20 register interface. Then I built a kernel version without any SPI driver (except mine). There is still some potential for enhancement; for example, I take control over some GPIO pins even if the GPIO register space is occupied by another driver. Here I have to take care not to change any GPIOs used by other devices. However... it works.

It was a very long way to come here, but I think this is the best way. I tested some small demo applications using pygame. The speed is as I would expect from an SPI connected display.



 


This topic is now closed to further replies.