Patch for quick interrupt handling on the H3 (Fast GPIO!)

ChrisK · August 25, 2016

Hi,

my current project is about using a NanoPi M1 as a controller for a 3D printer. That requires very tight timing on the GPIO outputs, with as little jitter as possible, to get smooth motion of the stepper motors.

However, it is a common belief that fast timing with little jitter is virtually impossible under Linux without a dedicated chip just for the I/O. While that is true in general on stock kernels, it is possible to improve things vastly with a little effort.

One way on ARM chips is to use the FIQ, an interrupt with the highest priority that can interrupt all other, regular interrupts. However, modern chips use a GIC for interrupt control instead of a VIC, so adding an FIQ is not that easy. There are patches out there to do just that, but they are for newer kernels, not the old 3.4.xxx, which is somewhat of the standard kernel for the time being.

So, what to do? Well, dive into the kernel sources and do some nasty hacking, of course! To give a quick overview of how an interrupt is usually handled on our ARM platform:

1: IRQ happens, the IRQ controller notifies the CPU about that

2: A small assembly stub is invoked, which then jumps into a small routine, asm_do_IRQ()

3: That small routine calls another routine, handle_IRQ()

4: That in turn calls generic_handle_irq()

5: Which then calls handle_fasteoi_irq(), by using a pointer to that in the IRQ descriptor

6: Subsequently, handle_irq_event() is called

7: And since we are on an SMP machine, that in turn calls handle_irq_event_percpu()

8: And finally, that calls action->handler(), the _actual_ interrupt routine!

Now, not only is that a long way to go, but in many of those functions a lot of other functions are called, many of which are by themselves interruptible/preemptible. No surprise then that there is little chance to get a tight timing...

Now, this is where my nasty hack comes in. Step 2 of the above list is preserved as it is, but in step 3, that is, in handle_IRQ(), I added some code. That code checks whether the requested IRQ is one to handle quickly, and if it is, basically calls the real IRQ handler directly. This leads to _great_ improvements, as you can see in the following images. All scope images were taken with 10 seconds of persistence turned on, so that glitches become visible.

First, this is how it would normally look, without any patches at all. It is supposed to be a ~40kHz square wave:

Basically just a blur. OK, most of that is because the interrupt runs on core 0, which is used by other stuff as well. I made a patch to have an IRQ attached to a specific core or cores (see this thread: http://forum.armbian.com/index.php/topic/1885-rt-patches-for-sun8i-kernel/). Once the kernel has that patch applied and the boot argument isolcpus=3 is given, the scheduler uses only the first three cores (CPUs 0 to 2) and leaves the fourth one (CPU 3) free. Attaching the timer IRQ in use to that core gives this result:

Much better, but still a lot of jitter. Now, this is where the quickirq patch comes in. Applying that patch results in this:

Now, that is already a _lot_ better. Is there still room for improvement? Yes, by applying the RT patch. Which gives the final result:

In all the scope shots the board was running at 624MHz RAM, 1200MHz core, while "stress -c 4 -i 4" was active in the background, resulting in a CPU load of about 9 according to top.

Now, let's be clear: Using the quickirq patch is not for the faint of heart. You have to know exactly what you are doing, otherwise the kernel will very likely lock up. There are no checks, no nothing: the patch assumes that the handler itself is set up and registered correctly, and that no other interrupt handler wants to attach itself (or is already attached) to that interrupt number.
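To make the idea of the shortcut in handle_IRQ() a bit more concrete, here is a toy user-space model in Python. It is only a sketch of the dispatch decision described above, not code from the patch: every name in it (QUICK_IRQS, register_quick_handler, generic_path, handle_irq) is made up for illustration, and the real thing is C inside the kernel. IRQ number 51 is the TMR1 interrupt used by the sample driver.

```python
# Toy model of the quickirq dispatch decision. NOT kernel code.
# All names are hypothetical and only mirror the description in the post.

QUICK_IRQS = {51}      # IRQ numbers to handle quickly (51 = TMR1 on the H3)
quick_handlers = {}    # irq number -> handler that is called directly
call_log = []          # records which path each interrupt took


def register_quick_handler(irq, handler):
    """Remember the handler that the fast path will call for this IRQ."""
    quick_handlers[irq] = handler


def generic_path(irq):
    """Stand-in for the long chain generic_handle_irq() -> ... -> action->handler()."""
    call_log.append(("generic", irq))


def handle_irq(irq):
    """Model of the patched handle_IRQ(): quick IRQs bypass the generic chain."""
    if irq in QUICK_IRQS:
        # No checks at all, as in the post: if nothing is registered for
        # this IRQ, the lookup fails (here a KeyError, in the kernel a lock-up).
        quick_handlers[irq](irq)
        return
    generic_path(irq)


def toggle_pin(irq):
    """Example handler; a real one would toggle PA0."""
    call_log.append(("quick", irq))
```

Registering toggle_pin for IRQ 51 and then calling handle_irq(51) followed by handle_irq(32) leaves call_log with one "quick" entry and one "generic" entry. Adding an IRQ number to QUICK_IRQS without registering a handler makes handle_irq() fail, which is the model's version of the warning above.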

Also, it is not hard realtime. While the outcome is vastly improved, there still is the occasional jitter. However, it is quite good enough to control stepper motors within reasonable limits. Since my aim is to use the NanoPi M1 to directly control stepper motors, it should be noted that a frequency of 40kHz would, if we assume 100 steps/mm, result in a speed of 400mm/s, or 40cm/s! At more reasonable speeds of 200 or 300mm/s the pulses are longer, so any jitter that happens is less pronounced relative to the pulse width itself.
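The arithmetic behind those numbers can be checked with a few lines of Python. This is just a sketch: the function names are made up, and the 1 microsecond jitter figure in the usage note below is an arbitrary illustration value, not a measurement from the scope shots.

```python
def feed_rate_mm_per_s(step_frequency_hz, steps_per_mm):
    """Linear speed that results from a given step pulse frequency."""
    return step_frequency_hz / steps_per_mm


def step_frequency_hz(feed_rate_mm_s, steps_per_mm):
    """Step pulse frequency needed for a given linear speed."""
    return feed_rate_mm_s * steps_per_mm


def relative_jitter(jitter_us, frequency_hz):
    """Jitter as a fraction of one step period (illustrative model only)."""
    period_us = 1_000_000 / frequency_hz
    return jitter_us / period_us
```

feed_rate_mm_per_s(40_000, 100) gives 400.0 mm/s, and step_frequency_hz(200, 100) gives 20000 Hz. With an assumed 1 microsecond of jitter, relative_jitter() returns 0.04 at 40kHz and 0.02 at 20kHz, so the same absolute jitter weighs half as much at half the speed.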

The patch also includes a sample driver which will output a square wave on PA0. In the kernel menuconfig, under Device Drivers -> quickirq, you can enable/disable the quickirq handling, define up to 3 interrupt numbers to handle through that patch, and enable the sample driver.

The sample driver uses TMR1 of the H3, which is otherwise completely unused. TMR1 has interrupt number 51. The driver uses PA0, but it accesses the port registers directly. So if you have any other stuff that toggles GPIO pins on PORTA, that will interfere with the output of the sample driver. So, if you want to test it and look at the output on a scope, it's best to disable anything else on PORTA (like the heartbeat LED, for example).

If the sample driver is loaded (either compiled directly into the kernel, or built as a module and loaded with "modprobe -i quickirq"), it creates a device node at /dev/quickirq. You can echo characters into that to control it:

echo 0 > /dev/quickirq -> disable the timer

echo 1 > /dev/quickirq -> enable the timer (squarewave appears on PA0)

Sending it the numbers 2, 3, 4, 5, 6 and 7 changes the output frequency to (about) 10Hz, 100Hz, 1kHz, 10kHz, 20kHz and 40kHz respectively. Sending it + or - adjusts the raw timer reload value in steps of 1, q and w in steps of 10, e and r in steps of 100. As the interval gets shorter (the frequency higher), you will notice a base overhead that can't be avoided: the timer reload value for the 40kHz setting is half of that for the 20kHz setting, but the output frequency is slightly less than double.

It's just a crude example, after all.
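If you would rather script the device node than type echo commands, something like the following Python sketch should do. The helper names are made up, and the frequency table is copied from the list above. Note one assumption: echo also sends a trailing newline, while this helper writes only the command character, and I have not checked how the sample driver treats either case.

```python
# Single-character commands for the preset output frequencies (in Hz),
# as listed in the post.
FREQUENCY_COMMANDS = {
    10: "2",
    100: "3",
    1_000: "4",
    10_000: "5",
    20_000: "6",
    40_000: "7",
}


def command_for_frequency(hz):
    """Return the command character for one of the preset frequencies."""
    if hz not in FREQUENCY_COMMANDS:
        raise ValueError(f"no preset for {hz} Hz")
    return FREQUENCY_COMMANDS[hz]


def send_command(command, device_path="/dev/quickirq"):
    """Write one command character to the device node."""
    with open(device_path, "w") as dev:
        dev.write(command)
```

For example, send_command("1") enables the timer, send_command(command_for_frequency(40_000)) selects the 40kHz preset, and send_command("0") turns the output off again.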

All that said, here is the patch:

0000-add-quickirq.patch.gz

Have fun with it, hopefully it is useful for others as well. But keep in mind that this is a rather rude brute-force method. You _really_ have to know what you are doing!

Greetings,

Chris

EDIT: Re-uploaded the patch

fatboyatdesk · October 12, 2016

Great post Chris, thanks for sharing.

There are multiple things this will educate me on, including fast timer interrupt handling and direct port/pin access. Both are of high interest to me, coming from the PIC world where everything is on the metal whether that is needed or not (I'm trying to find the balance with Armbian on NanoPi Neos for the time being).

If the goal is to control steppers in a 3D printer why not run Machinekit on the M1 with the RT Patch?

Or if you have $50 lying around buy a BBB which leverages the PRUs for GPIO and has images, configurations, capes and source already in place.

On my BBB the jitter test shows ~1000045000 or 45 microseconds in the worst case. That is better than any of the old PCs/LPs I am running LinuxCNC on, and I would expect it to work just fine for a 3D printer.

The BBB has configurations that you can likely use to test without a cape. The BeBoPr++ is set up for 3D printers.

Perhaps think of this as an excuse to buy a BBB :-).

Allan

dedeschnee · August 25, 2018

Not knowing much about the inner workings of Linux, I cannot say I _really_ know what this hack does exactly, but I do understand the concept. That said: does this depend strongly on the hardware, or does it work above a hardware abstraction layer?

What I am asking is: will this patch work on other processors as well, e.g. on an Orange Pi Zero (Allwinner H2+)?