Float support on M4F - context switch efficiency

Posted by dnadler on June 25, 2017

Hi Guys,

Sorry if this is a dumb question, but I'm wondering about float support, and particularly context-switch efficiency. If I read the code correctly, FreeRTOS on the M4F saves and restores the floating point registers on each context switch. I thought I understood from the ST example that float is even supported in ISRs, though I couldn't see how that is implemented from reading the FreeRTOS code.

First, my understanding of M4F float support (please correct me if I've got this wrong):

* the processor will interrupt on an attempt to use float if a magic bit is set
* the processor will interrupt on leaving interrupt context if a different magic bit is set

I naively expected no overhead if float is only used in one task. Is the time to save/restore the FP registers inconsequential, and I'm worrying about nothing? In an OS I wrote donkey's years ago, the save/restore was so expensive that I prohibited more than one task from using FP, and trapped any offender to HCF. BTW, a facility to enforce FP in only one task would be great in FreeRTOS...

Anyway, what I naively expected was:

* the OS tracks which tasks use float
* on a context switch from a task using float to a task not using float, the magic bit is set to disable float and cause an interrupt if it's used, and a note is made of the last task to use float
* on said interrupt, the FP context is saved to the noted task's TCB, the current task is marked as a float user, a new float context is initialized and float is enabled; a note is also made that more than one task is using FP, so for FP users a context save/restore is required on context switch
* if the float-used interrupt is raised from an ISR, save the FP registers as required above, and restore them on leaving the ISR

It sounds a bit complicated, but it isn't much code, and it makes the average context switch less expensive.

Can you educate me? What am I missing?

Thanks!
Best Regards, Dave
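For readers following along, the scheme proposed above can be sketched as a small host-side simulation in C. This is only an illustration under stated assumptions: `task_t`, `context_switch`, `fp_trap` and the other names are hypothetical and are not FreeRTOS APIs, the FPU is modelled as a plain array, and the ISR case is left out. It also simplifies one step: it keeps handing ownership over through the trap even after a second task starts using FP, rather than switching to save/restore on every context switch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FP_REG_COUNT 32  /* S0-S31; FPSCR is omitted to keep the sketch short */

/* Hypothetical per-task record, not a FreeRTOS TCB. */
typedef struct {
    bool  uses_fp;                 /* set on the task's first FP trap */
    float fp_saved[FP_REG_COUNT];  /* context parked here while another task owns the FPU */
} task_t;

/* Simulated hardware and kernel state. */
static float    fpu_regs[FP_REG_COUNT];
static bool     fpu_enabled;  /* the "magic bit": false means FP use traps */
static task_t  *fpu_owner;    /* task whose values are live in fpu_regs */
static task_t  *current;      /* task that is running now */
static unsigned fp_traps;     /* number of times the trap handler has run */

void context_switch(task_t *next)
{
    current = next;
    /* Leave the FPU on only if the incoming task already owns the registers. */
    fpu_enabled = (fpu_owner == next);
}

/* Runs when a task touches the FPU while it is disabled. */
void fp_trap(void)
{
    fp_traps++;
    if (fpu_owner != current) {
        if (fpu_owner != NULL) {
            /* Park the previous owner's registers in its own record. */
            memcpy(fpu_owner->fp_saved, fpu_regs, sizeof fpu_regs);
        }
        if (current->uses_fp) {
            /* Returning FP user: bring its parked context back. */
            memcpy(fpu_regs, current->fp_saved, sizeof fpu_regs);
        } else {
            /* First FP use by this task: start from a clean context. */
            memset(fpu_regs, 0, sizeof fpu_regs);
            current->uses_fp = true;
        }
        fpu_owner = current;
    }
    fpu_enabled = true;
}

/* Stand-ins for a task executing FP instructions that write or read S0. */
void task_write_s0(float value)
{
    if (!fpu_enabled) {
        fp_trap();
    }
    fpu_regs[0] = value;
}

float task_read_s0(void)
{
    if (!fpu_enabled) {
        fp_trap();
    }
    return fpu_regs[0];
}
```

In this model a lone FP task pays for exactly one trap, however many non-FP tasks run in between. Once two tasks use FP, every handoff between them costs a trap plus a full save and restore.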

Posted by rtel on June 25, 2017

The overhead for saving and restoring the floating point registers on the Cortex-M4F is not high. The scheme you describe sounds like a manual lazy save of the floating point registers, whereas the Cortex-M4F has automatic (hardware level) lazy saving.

The problem is that the automatic lazy save only really works in non-multithreaded applications, because the CPU reserves stack space for the floating point registers but doesn't actually save them unless it needs to. For example, if an interrupt uses floating point registers too, then the reserved stack space actually gets used to save the registers before the interrupt corrupts them. In a multithreaded environment, though, the stack pointer is manipulated by the scheduler, so the hardware might reserve space on the stack of one task and then, if the running task changes, save the registers to the wrong task's stack.

In FreeRTOS the floating point registers are only saved for tasks that are actually using floating point instructions. Again, this happens automatically: the hardware tracks when the floating point registers are in use and when they are not.

Doing this manually, the way you describe, takes more instructions and requires the execution of more interrupts. For example, during a context save the floating point unit has to be turned off; then, if it is used, the floating point interrupt has to execute so that more code can run to change the floating point owner and manually save and then restore the floating point registers. It is much simpler to have just a few instructions and let the intelligent hardware do the heavy lifting, without needing additional assembly instructions or floating point specific interrupts.
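As a rough illustration of "the hardware tracks when the floating point registers are in use": on exception entry the Cortex-M4F puts an EXC_RETURN value in the link register, and bit 4 of that value is clear when an extended (floating point) frame was stacked for the interrupted code. My understanding is that the FreeRTOS Cortex-M4F port tests this bit in its context switch handler to decide whether to also save S16-S31, but treat that as an assumption and check the port source. The helper names below are hypothetical, and the code is a host-side model, not the port's assembly.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN bit 4: 0 = extended (FP) frame was stacked, 1 = basic frame. */
#define EXC_RETURN_BASIC_FRAME_BIT (1UL << 4)

/* Words the context switch code saves itself, assuming the usual M4F layout. */
#define SW_SAVED_CORE_WORDS 9u   /* r4-r11 plus the EXC_RETURN value in r14 */
#define SW_SAVED_FP_WORDS   16u  /* s16-s31, needed only for tasks using FP */

/* True when the interrupted task had an active floating point context. */
bool task_has_fp_context(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_BASIC_FRAME_BIT) == 0u;
}

/* Number of words software must push for this task on a context switch. */
unsigned software_saved_words(uint32_t exc_return)
{
    unsigned words = SW_SAVED_CORE_WORDS;
    if (task_has_fp_context(exc_return)) {
        words += SW_SAVED_FP_WORDS;
    }
    return words;
}
```

With the standard encodings, 0xFFFFFFFD (return to thread mode on the process stack, basic frame) gives 9 words, and 0xFFFFFFED (the same return with an extended frame) gives 25. A task that never touches the FPU therefore pays nothing extra.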