Locating Source of Stack Growth

I’m having an issue where a task is overflowing its stack (or at least appears to be) at some unpredictable point in time. I have vApplicationStackOverflowHook enabled, and found that one of my tasks was triggering it. I added a call to uxTaskGetStackHighWaterMark() in that task and monitor the result. Upon creation, the HWM reports back as 34. The task itself is quite simple: read an ADC, convert the result to a properly formatted string, send it out the debug serial port. There’s nothing in there that should cause the stack to grow or shrink after the initial pass through the while(1) body of the task, but I also haven’t fully investigated all of the calls to some vendor-supplied driver functions. However, if I wait for the code to crash (by getting trapped in vApplicationStackOverflowHook()), then go back and look at the debug port output, I see that this task’s HWM keeps decreasing sporadically, with the last reported value before the crash being 10. The time it takes for this to happen is variable and unpredictable; I’ve kept an eye on it over the last 20 minutes and it’s only down to 30 now. What are potential causes of this? My first thought was interrupts, but it’s my understanding that the FreeRTOS port I’m using (for a Cortex-M4) uses a separate stack for exceptions/interrupts. How can I confirm this is true? Is there another component of the RTOS that can contribute to stack growth like this? Thanks
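The monitoring I added boils down to the pattern below. This is a sketch only – uxTaskGetStackHighWaterMark() normally comes from FreeRTOS’s task.h, but it is stubbed here with made-up values so the snippet compiles off-target:

```c
#include <stdio.h>

/* Stand-in for FreeRTOS's uxTaskGetStackHighWaterMark() (normally from
 * task.h) so this compiles off-target; the canned values are made up to
 * mimic a watermark creeping downward over time. */
typedef unsigned long UBaseType_t;

static UBaseType_t uxStubValues[] = { 34, 34, 30, 22, 10 };
static int xStubIdx = 0;

static UBaseType_t uxTaskGetStackHighWaterMark( void *xTask )
{
    ( void ) xTask;
    return uxStubValues[ xStubIdx++ ];
}

/* Lowest free-stack value ever seen for the calling task, in words. */
static UBaseType_t uxMinSeen = ( UBaseType_t ) -1;

/* Report only when the watermark drops - this is what produces the
 * breadcrumb trail of decreasing values on the debug port. */
static void vCheckStack( void )
{
    UBaseType_t uxHwm = uxTaskGetStackHighWaterMark( NULL ); /* NULL = this task */

    if( uxHwm < uxMinSeen )
    {
        uxMinSeen = uxHwm;
        printf( "stack HWM dropped to %lu words\n", ( unsigned long ) uxHwm );
    }
}
```

Calling vCheckStack() once per pass through the task loop is what gives me the sporadic “dropped to N” trail described above.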

Locating Source of Stack Growth

Something you said there:
The task itself is quite simple: read an ADC, convert the result to a properly formatted string, send it out the debug serial port.
Question: any chance you’re using something like sprintf() to format the string? Traditionally the “printf” family of functions are very hard on the stack… If you’re using sprintf(), try commenting that out and just send a byte like ‘A’ out the port, and see if the stack overflow goes away…
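If you want to keep readable output while testing, a hand-rolled fixed-width formatter sidesteps the printf machinery entirely. A minimal sketch – the 4-digit width assumes a 12-bit ADC (0..4095), which is a guess on my part:

```c
#include <stdint.h>

/* Fixed-width decimal formatter as a lighter-weight stand-in for
 * sprintf().  The 4-digit width suits a 12-bit ADC (0..4095) - an
 * assumption, adjust to your converter.  Stack cost is a few bytes,
 * versus the sizeable scratch buffers many printf implementations
 * push onto the caller's stack. */
static void vAdcToString( uint16_t usAdc, char pcOut[ 5 ] )
{
    for( int i = 3; i >= 0; i-- )
    {
        pcOut[ i ] = ( char ) ( '0' + ( usAdc % 10 ) );
        usAdc /= 10;
    }

    pcOut[ 4 ] = '\0';
}
```

The output format (and character count) stays fixed, so your debug log layout doesn’t change while the experiment runs.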

Locating Source of Stack Growth

I am using sprintf, but the format of what I’m printing never changes, including the character count. What in that function could cause the stack usage to increase suddenly after being the same for the last X-hundred calls? I will try your suggestion, though the unpredictable nature of this issue makes it hard to tell whether a change has actually removed the problem.

Locating Source of Stack Growth

Hi Adam, Just another suggestion for your melting pot… Are you calling sprintf() from other threads also? If so you should probably check if the libc implementation you are using supports reentrancy for the sprintf() function. Kind Regards, Pete
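For example, if your toolchain’s C library happens to be newlib (an assumption on my part – the IDE may be hiding which libc you have), FreeRTOS has a config option that gives each task its own reentrancy structure:

```c
/* FreeRTOSConfig.h - only relevant if the C library is newlib (an
 * assumption here; check your toolchain).  When set, FreeRTOS allocates
 * a per-task struct _reent so newlib's internal state is not shared
 * between tasks. */
#define configUSE_NEWLIB_REENTRANT    1
```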

Locating Source of Stack Growth

Yes, you ask a very good question… I could speculate on possible causes, but there are many. In Handler mode (exceptions), the CM4 always uses the main stack, while in Thread mode, depending on configuration, the application could also be using the main stack. (I use many RTOSes and can’t recall how FreeRTOS is configured by default – I suspect it might be that everything uses the main stack.) Also, if you are using the FPU in one or more tasks, the amount of context stacked depends on FPU usage. (You only need to save/stack FPU regs for the task(s) that use the FPU.) There are other possible explanations for why “doing the same thing sometimes behaves differently” due to the asynchronous nature of interrupts (for example, nested interrupts), but I don’t even know if you’re using the FPU and/or nested interrupts. Often something like this happens when things line up in time “in just the wrong way”. So let’s see about printf() and then go from there. Fun problem – but not when you’re the one on the short end trying to ship a product…
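To answer the “how can I confirm” part: read the CONTROL register from inside a task. Bit 1 (SPSEL) is set when Thread mode runs on the process stack (PSP); Handler mode always uses the main stack (MSP). The bit position is architectural; on target you would fetch the value with CMSIS’s __get_CONTROL(). The decode itself is just a bit test:

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode the Cortex-M CONTROL register.  Bit 1 (SPSEL) selects the
 * stack used in Thread mode: 0 = main stack (MSP), 1 = process stack
 * (PSP).  Handler mode always runs on the MSP regardless of this bit.
 * On hardware, obtain the value from inside a task with CMSIS's
 * __get_CONTROL(). */
static bool xThreadModeUsesPsp( uint32_t ulControl )
{
    return ( ulControl & ( 1u << 1 ) ) != 0u;
}
```

If this returns true for a value read from inside one of your tasks, your tasks are on the PSP and interrupts are stacking elsewhere, on the MSP.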

Locating Source of Stack Growth

Can’t recall if the HWM is in bytes or words – would need to look at the docs or source to see – but in any case, 30-odd does not sound like very much. It is conceivable that you call sprintf() lots of times with no problem because there didn’t happen to be a context switch to another task within the call; then at some point there is a context switch – at which point the task’s context is also saved to the stack and boom – it overflows. In answer to another post in this thread – on Cortex-M parts FreeRTOS uses a different stack for interrupts.
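For a back-of-the-envelope check – the context sizes below are approximate and the exact layout is port-specific, so treat them as order-of-magnitude only:

```c
#include <stdbool.h>
#include <stdint.h>

/* uxTaskGetStackHighWaterMark() reports free stack in words, and on
 * Cortex-M a stack word (StackType_t) is 4 bytes. */
static uint32_t ulHwmWordsToBytes( uint32_t ulWords )
{
    return ulWords * 4u;
}

/* Rough size, in words, of the context saved onto a task's stack at a
 * switch on a Cortex-M4: 8 hardware-stacked plus 8 software-stacked
 * core registers, and roughly 34 more when the task has used the FPU
 * (S0-S31, FPSCR and padding).  Approximate - the exact figures are
 * port-specific. */
static uint32_t ulContextWords( bool xTaskUsesFpu )
{
    return xTaskUsesFpu ? ( 16u + 34u ) : 16u;
}
```

With only 30-odd words of headroom, even a plain context switch eats roughly half of it, and an FPU-using task’s context alone would blow straight through it.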

Locating Source of Stack Growth

Thanks Pete, I think all the printf family of functions I’m using are thread safe and reentrant, but I will check to make sure once I dig them out of the IDE, which obfuscates things somewhat. Richard – I believe it is in words. So you’re saying that when my issue happens I may just be timing things right, hitting maximum stack usage (due to the extra overhead of sprintf) at the moment of a context switch? That’s plausible. I’m going to simply increase the stack for now and see if usage continues to grow beyond where it would normally overflow.

Locating Source of Stack Growth

Oh, I am well aware of “in just the wrong way”. I spent the better part of a week debugging an intermittent I2C issue that turned out to be triggered by a context switch perfectly timed between a write and a read operation. You did bring up a point that prompts a new question from me – my processor has a built-in FPU, and I do make use of it in some tasks. Do I need to manually do something to stack the FPU regs, or is that handled automatically? I haven’t seen any FPU-related issues yet, but I may have just been lucky. Richard’s post below brought up something I hadn’t thought of which ties sprintf and interrupts together, and I think it’s likely the root cause. I have a bit of investigating to do though.

Locating Source of Stack Growth

As this is a Cortex-M, the FPU is handled automatically (for anybody else finding this post – that is specific to Cortex-M!).

Locating Source of Stack Growth

Adam – any updates on the root cause? Just wondering if the usual suspect (printf) was chewing up the stack… Problems like this are fun to debug – when you’re not under schedule pressure – because you always learn something. Then again, who isn’t constantly under schedule pressure?

Locating Source of Stack Growth

It was most likely printf’s (or a relative’s) doing. I think what Richard suggested was the root cause – a context switch at just the right (wrong) time, when the stack was nearly full, pushed it over. I increased that task’s stack by a modest 20 words, and after running for quite some time I think the lowest the HWM got to was around 13, so previously it was overflowing by about 7 words. The long-term HWM also changes depending on other code changes I’ve made, which further points to that issue. I’ll let you know if I discover anything else, though!