ChrisK

Members

Posts: 21
Joined: August 21, 2016
Last visited: February 12, 2022

Reputation Activity

ChrisK got a reaction from markbirss in NanoPI NEO / AIR October 10, 2018

It should be noted, however, that such thick SILPADs are quite bad when it comes to thermal resistance. While in some cases using such a thick pad and some metal may be better than nothing at all, it is better to avoid them and find a better solution. Thermal pads, paste or glue are mainly there to bridge the air gaps left by imperfect surfaces. That stuff isn't really all that thermally conductive; it just happens to be better than air. The general rule is that such layers should be as thin as possible.

Using a small 14x14x10mm heatsink, attached with thermal glue, is very likely to be far better than a 4mm SILPAD and a big piece of metal. Nice, suitable heatsinks for the H3 would be these: http://www.ebay.de/itm/172290970476. Add a very small dab of heat-conductive glue, for example http://www.ebay.de/itm/181026615992, then put on the heatsink, align it and finally press it down _really_ firmly (most of the glue should squeeze out at the sides). Let it sit for a while until it has cured, and you are done.

Greetings,

Chris
ChrisK got a reaction from fatboyatdesk in Patch for quick interrupt handling on the H3 (Fast GPIO!) October 12, 2016

Hi,

my current project is about using a NanoPI M1 as a controller for a 3D printer. As such, one wants to have very tight timing on the GPIO output, with as little jitter as possible, to get smooth motion of the stepper motors.

However, it is a common belief that getting fast timing with little jitter is virtually impossible under Linux without the help of a dedicated chip just for the IO. While that is true in general on stock kernels, it's possible to vastly improve on that front by spending a little effort.

One way on ARM chips is to use the FIQ. That is an interrupt that has the highest priority and can interrupt all other, regular interrupts. However, modern chips use a GIC for interrupt control instead of a VIC, so adding a FIQ is not that easy. There are patches out there to do just that, but those are for newer kernels, and not the old 3.4.xxx, which is somewhat of the standard kernel for the time being.

So, what to do? Well, dive into the kernel sources and do some nasty hacking, of course! To give a quick overview of how an interrupt is usually handled on our ARM platform:

1: IRQ happens, the IRQ controller notifies the CPU about that
2: A small assembly stub is invoked, which then jumps into a small routine, asm_do_IRQ()
3: That small routine calls another routine, handle_IRQ()
4: That in turn calls generic_handle_irq()
5: Which then calls handle_fasteoi_irq(), by using a pointer to that in the IRQ descriptor
6: Subsequently, handle_irq_event() is called
7: And since we are on a SMP machine, that in turn calls handle_irq_event_percpu()
8: And finally, that calls action->handler(), the _actual_ interrupt routine!

Now, not only is that a long way to go, but in many of those functions a lot of other functions are called, many of which are by themselves interruptible/preemptible. No surprise then that there is little chance to get a tight timing...

Now, this is where my nasty hack comes in. Step 2 of the above list is preserved as it is, but in step 3, that is, in handle_IRQ(), I added some code. That code checks whether the requested IRQ is one to handle quickly, and if it is, basically calls the real IRQ handler directly. This leads to _great_ improvements, as you can see in the following images. All scope images were taken with 10 seconds of persistence turned on, so that glitches become visible.
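To make the shortcut easier to picture, here is a toy model in plain Python. It is not kernel code and not taken from the patch: the strings only mirror the function names from the list above, and the quick_handlers table, the simulate() helper and IRQ number 32 are made up for illustration.

```python
# Toy userspace model of the dispatch shortcut. Nothing here is a real
# kernel symbol; the strings only mirror the call chain described above.

trace = []  # layers a simulated interrupt passes through


def timer_handler(irq):
    # Stands in for action->handler(), the actual interrupt routine.
    trace.append("action->handler")


# Hypothetical table of IRQ numbers that take the shortcut.
quick_handlers = {51: timer_handler}
# Handlers registered the normal way (IRQ 32 is an arbitrary example).
regular_handlers = {32: timer_handler, 51: timer_handler}


def generic_path(irq):
    # Steps 4 to 7 of the list, followed by the handler in step 8.
    for layer in ("generic_handle_irq", "handle_fasteoi_irq",
                  "handle_irq_event", "handle_irq_event_percpu"):
        trace.append(layer)
    regular_handlers[irq](irq)


def handle_irq(irq):
    # Step 3: this is where the extra check sits in the model.
    trace.append("handle_IRQ")
    handler = quick_handlers.get(irq)
    if handler is not None:
        handler(irq)  # shortcut: call the handler directly
        return
    generic_path(irq)  # everything else takes the long way


def simulate(irq):
    trace.clear()
    handle_irq(irq)
    return list(trace)
```

In this model simulate(51) passes through two layers, while simulate(32) passes through six. The real gain comes from skipping the work done inside those layers, which the model does not try to represent.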

First, this is how it would normally look, without any patches at all. It is supposed to be a ~40kHz square wave:

Basically just a blur. OK, most of that is because the interrupt runs on core 0, which is used by other stuff as well. I made a patch to have an IRQ attached to a specific core or cores (see this thread: http://forum.armbian.com/index.php/topic/1885-rt-patches-for-sun8i-kernel/). Once the kernel has that patch applied, and the boot-arg isolcpus=3 is given, only the first 3 cores are used by the kernel, leaving the fourth one free. Attaching the used timer-IRQ to that core gives this result:

Much better, but still a lot of jitter. Now, this is where the quickirq patch comes in. Applying that patch results in this:

Now, that is already a _lot_ better. Is there still room for improvement? Yes, by applying the RT patch. Which gives the final result:

In all the scope shots the board was running at 624MHz RAM, 1200MHz core, while in the background "stress -c 4 -i 4" was active, resulting in a CPU load of roughly 9 according to top.

Now, let's be clear: Using the quickirq patch is not for the faint of heart. One has to know exactly what one is doing, otherwise the kernel will very likely lock up. There are no checks, nothing at all; it assumes that the handler itself is set up and registered correctly, and that no other interrupt handler wants to attach itself (or is already attached) to that interrupt number.

Also, it is not hard real-time. While the outcome is vastly improved, there still is the occasional jitter. However, it is quite good enough to control stepper motors within reasonable limits. Since my aim is to use the NanoPI M1 to directly control stepper motors, it should be noted that a frequency of 40kHz would, if we assume 100 steps/mm, result in a speed of 400mm/s, or 40cm/s! More reasonable speeds of 200 or 300mm/s mean that any jitter that happens is less pronounced, relative to the pulse width itself.
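The arithmetic behind those numbers can be written down as a small sketch. The function names are made up here, and the 2 µs jitter figure in the usage note is purely a placeholder, not a measurement from the scope shots.

```python
def feed_rate_mm_s(step_hz, steps_per_mm):
    """Linear speed when one step pulse is issued per timer period."""
    return step_hz / steps_per_mm


def step_rate_hz(speed_mm_s, steps_per_mm):
    """Step frequency needed to reach a given linear speed."""
    return speed_mm_s * steps_per_mm


def jitter_fraction(jitter_us, step_hz):
    """Jitter expressed as a fraction of one step period."""
    period_us = 1_000_000 / step_hz
    return jitter_us / period_us
```

With 100 steps/mm, feed_rate_mm_s(40_000, 100) gives 400 mm/s, and 200 mm/s needs only step_rate_hz(200, 100) = 20000 Hz. If one assumes a jitter of 2 µs, that is 8% of the 25 µs period at 40kHz but only 4% of the 50 µs period at 20kHz.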

The patch also includes a sample driver which will output a square wave on PA0. In the kernel menuconfig, under Device Drivers -> quickirq you can enable/disable the quickirq handling, define up to 3 interrupt numbers to handle through that patch, as well as enable the sample driver.

The sample driver uses TMR1 of the H3, which is otherwise completely unused. TMR1 has interrupt number 51. The driver uses PA0, but it accesses the port registers directly. So if you have any other stuff that toggles GPIO pins on PORTA, that will interfere with the output of the sample driver. So, if you want to test it and look at the output on a scope, it's best to disable anything else on PORTA (like the heartbeat LED, for example).

If the sample driver is loaded (either compiled directly into the kernel, or as a module loaded with "modprobe -i quickirq"), it creates a device node at /dev/quickirq. You can echo characters into that to control it:

echo 0 > /dev/quickirq -> disable the timer
echo 1 > /dev/quickirq -> enable the timer (squarewave appears on PA0)

Sending it the numbers 2, 3, 4, 5, 6 and 7 changes the output frequency to (about) 10Hz, 100Hz, 1kHz, 10kHz, 20kHz and 40kHz respectively. Sending it + or - will adjust the raw timer reload value in 1-increments, q and w in 10-increments, e and r in 100-increments. As the interval gets shorter (that is, as the frequency gets higher), you will notice that there is a base overhead that can't be avoided. For example, the timer reload value for the 40kHz setting is half of that for the 20kHz setting, but the output frequency is slightly less than double.
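For scripting the sample driver, the same characters can be written from Python instead of the shell. This is only a sketch: the helper name send_command and the PRESETS_HZ table are made up here, and the trailing newline is added just to mimic what echo sends.

```python
# Approximate output frequency for each preset character, as listed above.
PRESETS_HZ = {"2": 10, "3": 100, "4": 1_000,
              "5": 10_000, "6": 20_000, "7": 40_000}

# 0/1 switch the timer off/on; + - q w e r adjust the raw reload value.
VALID_COMMANDS = set("01") | set(PRESETS_HZ) | set("+-qwer")


def send_command(char, device="/dev/quickirq"):
    """Write one control character to the device node, like echo does."""
    if char not in VALID_COMMANDS:
        raise ValueError(f"unknown quickirq command: {char!r}")
    with open(device, "w") as dev:
        dev.write(char + "\n")
```

Calling send_command("7") followed by send_command("1") would select the roughly 40kHz preset and then enable the timer, assuming the node exists and is writable by the calling user.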

It's just a crude example, after all.

All that said, here is the patch:

0000-add-quickirq.patch.gz

Have fun with it, hopefully it is useful for others as well. But keep in mind that this is a rather rude brute-force method. You _really_ have to know what you are doing!

Greetings,

Chris

EDIT: Re-uploaded the patch
ChrisK got a reaction from Allfifthstuning in Orange Pi PC (plus) + Armbian + Guitarix= Guitar effectprocessor? September 21, 2016

You can try to reserve one of the cores for the audio process(es). First you isolate a core by using the kernel parameter (boot argument) "isolcpus=...", and then you can use taskset to place processes/threads onto that core. In addition you should then also set a higher priority for those processes/threads. That way you have a complete core only for the audio processing, which should help a bit.
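As a rough sketch of the second and third step, the helper below builds a command line that pins a program to one core with taskset and gives it a realtime FIFO priority with chrt. The helper name, the core number 3, the priority 70 and the program name are only example values; realtime priorities usually need root or a suitable rtprio limit.

```python
def pinned_command(program_argv, cpu, rt_priority):
    """Wrap a command so it runs on one CPU with SCHED_FIFO priority."""
    return ["taskset", "-c", str(cpu),
            "chrt", "-f", str(rt_priority),
            *program_argv]
```

With isolcpus=3 on the kernel command line, pinned_command(["guitarix"], cpu=3, rt_priority=70) yields the equivalent of "taskset -c 3 chrt -f 70 guitarix", which can be passed to subprocess.run().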

Greetings,

Chris
ChrisK reacted to Allfifthstuning in Orange Pi PC (plus) + Armbian + Guitarix= Guitar effectprocessor? September 21, 2016

Thanks that did the trick.

Thanks, screen is now setup fine.

I've got Guitarix (built from source) up and running with a latency under 10ms (which is pretty decent), although it is a bit unstable (getting xruns) when using certain effects.
Next step is to streamline the whole setup to minimize the xruns.

Hans
ChrisK got a reaction from SKayser in U-Boot patch to use uEnv.txt instead of boot.scr September 7, 2016

Hi,

attached is a patch for U-Boot (should work on 2016.09-rc1 and 2016-09-rc2-dirty, will probably work on older ones as well) that makes it use the plain text file uEnv.txt instead of boot.scr, which has to be translated to a binary file first. This way changing the environment means only editing a text file, instead of editing a file, converting it to binary, and copying it over to the /boot partition on the SD. This becomes active if the U-Boot config has the option OLD_SUNXI_COMPAT disabled (in the menuconfig this is under ARM architecture -> Enable workarounds for booting old kernels).

It also adds a small patch to board/sunxi/board.c to skip the setting of the MAC address if the option CONFIG_CMD_NET is disabled. That way it is possible to leave out the networking stuff and save some time at boot, because it no longer gets initialised. Personally I prefer a boot that is as quick as possible, because I just hate to wait for no useful reason. On my board it basically just falls through to immediately boot the kernel after power-up, because I have the wait time for user input disabled as well in the config.

As an example, this is what my uEnv.txt looks like. I have reserved a core and increased the loglevel:

machid=1029
bootm_boot_mode=sec
bootargs=console=ttyS0,115200 noinitrd root=/dev/mmcblk0p2 rootfstype=ext4 rootwait loglevel=7 isolcpus=3
bootcmd=load mmc 0 0x43000000 script.bin; load mmc 0 0x40008000 uImage; bootm 0x40008000

Greetings,

Chris

0000-use-uEnv.txt-by-default-on-sunxi.patch.gz
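Because uEnv.txt is plain key=value text, it is also easy to inspect from a script. The sketch below parses the example environment from this post; the function names are made up, and skipping blank lines and lines starting with # is an assumption about the file format rather than something the patch guarantees.

```python
SAMPLE_UENV = """\
machid=1029
bootm_boot_mode=sec
bootargs=console=ttyS0,115200 noinitrd root=/dev/mmcblk0p2 rootfstype=ext4 rootwait loglevel=7 isolcpus=3
bootcmd=load mmc 0 0x43000000 script.bin; load mmc 0 0x40008000 uImage; bootm 0x40008000
"""


def parse_uenv(text):
    """Split each line at the first '=' into a variable name and value."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        key, sep, value = line.partition("=")
        if sep:
            env[key] = value
    return env


def kernel_args(env):
    """Break the bootargs value into individual kernel parameters."""
    args = {}
    for token in env.get("bootargs", "").split():
        name, _, value = token.partition("=")
        args[name] = value  # flags such as rootwait map to ""
    return args
```

Splitting at the first = only is what keeps values like bootargs intact, since they contain further = signs. For the sample above, kernel_args(parse_uenv(SAMPLE_UENV))["isolcpus"] is "3".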
ChrisK got a reaction from tkaiser in Different GPIO pin mappings on BPi M2+ compared to Orange Pis August 27, 2016

That's not possible. The OWA interface (which is what they call SPDIF in the Allwinner datasheet) can only use PA17, there is no other routing/multiplexing option.

The same holds true for any other hardware peripheral of the chip. Check chapter 3.7 of the datasheet, "GPIO Multiplexing Functions". And if you do direct register access to set up pin functions, check chapter 4.22/4.23 to see what register settings are required to choose which pin is selected for which function/peripheral.

The case of 1-Wire sensors etc. is different, since that is implemented in software (the H3 does not have a 1-Wire peripheral), and thus can be placed on any generic GPIO pin.

Greetings,

Chris
ChrisK got a reaction from zador.blood.stained in Different GPIO pin mappings on BPi M2+ compared to Orange Pis August 27, 2016
ChrisK got a reaction from tkaiser in Patch for quick interrupt handling on the H3 (Fast GPIO!) August 25, 2016
ChrisK got a reaction from tkaiser in RT patches for sun8i kernel? August 23, 2016

That reminds me, I forgot to include another patch. That one makes irq_set_affinity_hint() work properly. The interrupt controller allows IRQs to be tied to specific cores, by setting the CPU affinity of the IRQ. While irq_set_affinity_hint() is already there and exported, it fails to actually put the IRQ on the selected CPU(s).

It can be quite useful to be able to tie an IRQ to a specific CPU/core, especially in conjunction with the RT patches. You can, for example, simply exclude one core from being used by the kernel, using the isolcpus= boot arg for the kernel. For example, isolcpus=3 will exclude the fourth core from being used by the kernel. If you then install your interrupt and call irq_set_affinity_hint(IRQNUM, cpumask_of(3)) (with IRQNUM being the number of the IRQ in question), that IRQ will then be handled exclusively on the fourth core. This greatly reduces latency and jitter.
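For readers less familiar with CPU masks: cpumask_of(3) selects bit 3, i.e. the fourth core. The sketch below computes the same kind of mask in userspace and formats it as the hex string that the generic /proc/irq/<number>/smp_affinity interface expects. That interface is a different path from the in-kernel call above, whether it behaves correctly on this 3.4 kernel is not something covered here, and the function names are made up.

```python
def cpu_mask(cpus):
    """Bitmask with one bit set per CPU number (CPU 0 is bit 0)."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must not be negative")
        mask |= 1 << cpu
    return mask


def smp_affinity_value(cpus):
    """Hex string form of the mask, without a 0x prefix."""
    return format(cpu_mask(cpus), "x")
```

For the fourth core alone, smp_affinity_value([3]) gives "8"; the three cores left to the kernel by isolcpus=3 correspond to smp_affinity_value([0, 1, 2]), which is "7".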

Greetings,

Chris

0005-make-irq-set-affinity-hint-work-properly.patch.gz
ChrisK got a reaction from tkaiser in NanoPI NEO / AIR August 23, 2016
