
How to catch code that caused the hard fault

Posted by alsaleem on July 10, 2017

Hi, I have an application that runs for several hours and then stops in the default handler. The same code runs repeatedly, so no new code is introduced at the time of the fault. I am using an STM32F411 with FreeRTOS V8.2.1.

I am using the code presented on the FreeRTOS site, repeated below:

~~~
    .section .text.Default_Handler,"ax",%progbits
Default_Handler:
    /* Load the address of the interrupt control register into r3. */
    /* NVIC_INT_CTRL_CONST */
    /* ldr r3, #0xE000ED04 */
    ldr r3, =SCB_ICSR
    /* Load the value of the interrupt control register into r2 from the address held in r3. */
    ldr r2, [r3, #0]
    /* The interrupt number is in the least significant byte - clear all other bits. */
    uxtb r2, r2
Infinite_Loop:
    b Infinite_Loop    /* (<====) */
    .size Default_Handler, .-Default_Handler
~~~

R2 indeed has the value 3, and PC is pointing to (b Infinite_Loop).
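For reference, the value left in R2 is the active exception number from the interrupt control register. The lookup below is a minimal sketch of how that number maps to the Cortex-M system exceptions; the function name `pcExceptionName` is hypothetical and not part of FreeRTOS or the startup file.

```c
#include <stdint.h>

/* Hypothetical helper: map a Cortex-M exception number (the active vector
   number read from the interrupt control register) to a readable name.
   Numbers 16 and above are device interrupts (IRQ0 is exception 16). */
const char *pcExceptionName( uint32_t ulExceptionNumber )
{
    switch( ulExceptionNumber )
    {
        case 0:  return "Thread mode (no exception)";
        case 1:  return "Reset";
        case 2:  return "NMI";
        case 3:  return "HardFault";
        case 4:  return "MemManage";
        case 5:  return "BusFault";
        case 6:  return "UsageFault";
        case 11: return "SVCall";
        case 12: return "DebugMonitor";
        case 14: return "PendSV";
        case 15: return "SysTick";
        default: return ( ulExceptionNumber >= 16u ) ? "External interrupt (IRQ)" : "Reserved";
    }
}
```

With the value reported here, `pcExceptionName( 3 )` gives "HardFault", which is why a 3 in R2 points at the hard fault vector rather than at a peripheral interrupt.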

I also implemented the other hard fault handler, which calls a C function (hard_fault_handler_c) to print the variables, but nothing was printed.

~~~
    .section .text.HardFault_Handler
    .weak HardFault_Handler
    .type HardFault_Handler, %function
HardFault_Handler:
    TST LR, #4
    ITE EQ
    MRSEQ R0, MSP
    MRSNE R0, PSP
    B hard_fault_handler_c
    .size HardFault_Handler, .-HardFault_Handler
~~~

The vector table is arranged as:

~~~
g_pfnVectors:
    .word _estack
    .word Reset_Handler
    .word NMI_Handler
    .word HardFault_Handler
    .word MemManage_Handler
    .word BusFault_Handler
    ....
    ....
~~~

The hard fault handler is defined as:

~~~
    .weak HardFault_Handler
    .thumbset HardFault_Handler,Default_Handler
~~~

I've also implemented the malloc-failed and stack overflow hook functions, and nothing was printed by them either.

~~~
void vApplicationMallocFailedHook( void )
{
    printf("malloc failed -----------------------------------------------\n");
}

void vApplicationStackOverflowHook( TaskHandle_t xTask, signed char *pcTaskName )
{
    printf("stack overflow in task id %lu, name: %s -------------------------------------------\n", (uint32_t)xTask, pcTaskName);
}
~~~

So, could someone point out how to find the code that caused the interrupt, and why the hard fault handler is not being executed?

Thanks.



Posted by rtel on July 10, 2017

So, if I understand correctly, you have determined that it is the hard fault that caused the exception, but the hard fault handler is not executing.

Are you sure the hard fault handler is installed in the vector table?

Did you take note of the Handling Imprecise Faults section on the page you linked to?



Posted by heinbali01 on July 10, 2017

You can test whether your HardFault_Handler gets called by putting a break-point in it and executing the following code:

~~~~
uint32_t ulAddress = 0xF0937531;
printf( "Divide by zero = %u\n", *( ( unsigned * )ulAddress ) );
~~~~

0xF0937531 is just an unaligned, non-implemented memory address. The printf() makes sure that the dereference actually takes place.

But if your Default_Handler is being called, isn't there some interrupt that you haven't set up correctly yet?

Maybe you have misspelled the name of some interrupt handler, for example by using the wrong case: ETH_IRQhandler instead of ETH_IRQHandler?

~~~~
    .weak HardFault_Handler
    .thumbset HardFault_Handler,Default_Handler
~~~~

The above means that the Default_Handler will be called instead of the HardFault_Handler. The weak attribute means that the user can override this definition without changing anything in the library.

If you want to override the above weak definition, I would not use weak here:

~~~~
    .section .text.HardFault_Handler
    .weak HardFault_Handler
    .type HardFault_Handler, %function
HardFault_Handler:
~~~~

Why don't you use the C example from the FreeRTOS page that you refer to ?

Slightly modified, with less code:

~~~~
struct xREGISTER_STACK
{
    uint32_t spare0[ 8 ];
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;  /* Link register. */
    uint32_t pc;  /* Program counter. */
    uint32_t psr; /* Program status register. */
    uint32_t spare1[ 8 ];
};

volatile struct xREGISTER_STACK *pxRegisterStack = NULL;

void prvGetRegistersFromStack( uint32_t *pulFaultStackAddress )
{
    /* 'pxRegisterStack' can be inspected in a break-point. */
    pxRegisterStack = ( struct xREGISTER_STACK * ) ( pulFaultStackAddress - ARRAY_SIZE( pxRegisterStack->spare0 ) );

    /* When the following line is hit, the variables contain the register values. */
    for( ;; );
}
/*-----------------------------------------------------------*/

/* A non-static declaration, not using naked: */
void HardFault_Handler( void )
{
    __asm volatile
    (
        " tst lr, #4                    \n"
        " ite eq                        \n"
        " mrseq r0, msp                 \n"
        " mrsne r0, psp                 \n"
        " ldr r1, [r0, #24]             \n"
        " bl prvGetRegistersFromStack   \n"
    );
}
/*-----------------------------------------------------------*/
~~~~

Your ISR declarations should not be weak.



Posted by heinbali01 on July 10, 2017

My sample code uses a macro that is often used in /labs, defined as:

~~~
#define ARRAY_SIZE( x ) ( int )( sizeof( x ) / sizeof( x )[ 0 ] )
~~~

pulFaultStackAddress will point to the location where register r0 is stored. The 8 extra words (32 bytes) in each of spare0 / spare1 sometimes give a bit more information about the process: bytes that were stored before the crash (spare1) and after it (spare0).
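The pointer arithmetic can be checked on a host machine. The sketch below is an illustration only: it computes the same base address with `offsetof` instead of subtracting `ARRAY_SIZE( spare0 )` words, and the function name `pxViewFromFrame` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct xREGISTER_STACK
{
    uint32_t spare0[ 8 ];
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
    uint32_t spare1[ 8 ];
};

/* Given the address where r0 was stacked, step back over spare0 so that the
   returned view's r0 member lands exactly on that address. */
const struct xREGISTER_STACK *pxViewFromFrame( const uint32_t *pulFaultStackAddress )
{
    const unsigned char *pucBase = ( const unsigned char * ) pulFaultStackAddress
                                   - offsetof( struct xREGISTER_STACK, r0 );

    return ( const struct xREGISTER_STACK * ) pucBase;
}
```

Because `offsetof( struct xREGISTER_STACK, r0 )` is 8 words, the result is the same address as `pulFaultStackAddress - 8`, so `->pc` reads the seventh word of the exception frame.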



Posted by alsaleem on July 10, 2017

Hi Hein,

Sorry for the late reply. The problem occurs after 5 hours, so I have to wait until the fault is caught before I can check anything.

As you can see above, it is not a misspelling, at least as far as I can tell, and it is indeed a hard fault since R2 = 3. Nevertheless, I went on to analyze the fault manually with the default handler.

@Real Time Engineers ltd, the fault is precise: BFSR @ 0xE000ED29 = 0

I checked the content of memory pointed by MSP (my case 0x2001ff18) and found:

Address 0 - 3 4 - 7 8 - B C - F
000000002001FF10 04600240 6CE60020 686014A0 A4130020
000000002001FF20 686014A0 686014A0 00000000 F1D10008
000000002001FF30 BCC30008 0F000001 A4130020 686014A0
000000002001FF40 C9DD0108 B40A0020 58FF0120 50FF0120
000000002001FF50 67450108 BDAA2000 A0130020 64170020
000000002001FF60 BDAA2000 00000000 70FF0120 CBDF0008
000000002001FF70 AFD20008 50000000 00000000 02000000
000000002001FF80 88FF0120 4FC30008 90FF0120 71860108

R0 = A0146068 R1 = 200013A4 R2 = A0146068 R3 = A0146068 R12 = 0 LR = 0800D1F1 PC = 0800C3BC PSR = 0100000F BFAR = 200013A4 CFSR = A0146068 HFSR = 0801DDC9 DFSR = 20000AB4 AFSR = 20000AB4 SCB_SHCSR = 2001FF58
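The memory view lists bytes in address order, so each 32-bit word has to be byte-reversed before it can be compared with the map file. A minimal sketch of that conversion follows; the helper name `ulWordFromDump` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert one 8-digit group from a byte-ordered memory
   dump (for example "F1D10008") into the 32-bit value that a little-endian
   Cortex-M core reads from that address. */
uint32_t ulWordFromDump( const char *pcHexBytes )
{
    uint32_t ulRaw = ( uint32_t ) strtoul( pcHexBytes, NULL, 16 );

    return ( ( ulRaw & 0x000000FFu ) << 24 ) |
           ( ( ulRaw & 0x0000FF00u ) << 8 )  |
           ( ( ulRaw & 0x00FF0000u ) >> 8 )  |
           ( ( ulRaw & 0xFF000000u ) >> 24 );
}
```

Applied to the sixth and seventh words after 0x2001FF18, "F1D10008" becomes 0x0800D1F1 (LR) and "BCC30008" becomes 0x0800C3BC (PC).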

From my .map file, here is what I found (near PC = 0800C3BC):

~~~
 .text.osSystickHandler
                0x0800c33c       0x18 Middlewares/Third_Party/FreeRTOS/Source/CMSIS_RTOS/cmsis_os.o
                0x0800c33c                osSystickHandler
 .text.vListInitialise
                0x0800c354       0x40 Middlewares/Third_Party/FreeRTOS/Source/list.o
                0x0800c354                vListInitialise
 .text.vListInitialiseItem
                0x0800c394       0x1c Middlewares/Third_Party/FreeRTOS/Source/list.o
                0x0800c394                vListInitialiseItem
 .text.vListInsertEnd
                0x0800c3b0       0x48 Middlewares/Third_Party/FreeRTOS/Source/list.o
                0x0800c3b0                vListInsertEnd
 .text.vListInsert
                0x0800c3f8       0x74 Middlewares/Third_Party/FreeRTOS/Source/list.o
                0x0800c3f8                vListInsert
 .text.uxListRemove
                0x0800c46c       0x54 Middlewares/Third_Party/FreeRTOS/Source/list.o
                0x0800c46c                uxListRemove
~~~

And for LR:

~~~
 .text.xTaskGetTickCount
                0x0800d0d4       0x20 Middlewares/Third_Party/FreeRTOS/Source/tasks.o
                0x0800d0d4                xTaskGetTickCount
 .text.xTaskIncrementTick
                0x0800d0f4      0x17c Middlewares/Third_Party/FreeRTOS/Source/tasks.o
                0x0800d0f4                xTaskIncrementTick
 .text.vTaskSwitchContext
                0x0800d270       0xd4 Middlewares/Third_Party/FreeRTOS/Source/tasks.o
                0x0800d270                vTaskSwitchContext
~~~

So it looks as if the hard fault was caught in FreeRTOS code (vListInsertEnd). Am I right?

FYI, these are my interrupts:

- RTC_WKUP with priority 10U, every second. It reads the RTC registers and computes the epoch using mktime().
- EXTI4/EXTI0 with priority 10.
- UART1 (debug): no ISR.
- UART2: ISR with priority 10.
- I2C/SPI: no ISR.
- I2S DMA: ISR with priority 10.

There are 5 tasks, all with the same priority (1).

Do I have to upgrade?
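The manual lookup in the map file can be written down as a small range search. This is only a sketch using the start addresses and sizes quoted from the .map excerpt; the type `MapEntry_t` and the function `pcSymbolForAddress` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint32_t ulStart;   /* Start address from the map file. */
    uint32_t ulSize;    /* Section size from the map file. */
    const char *pcName;
} MapEntry_t;

/* Entries copied from the .map excerpt in this post. */
static const MapEntry_t xMap[] =
{
    { 0x0800c33cu, 0x18u,  "osSystickHandler" },
    { 0x0800c354u, 0x40u,  "vListInitialise" },
    { 0x0800c394u, 0x1cu,  "vListInitialiseItem" },
    { 0x0800c3b0u, 0x48u,  "vListInsertEnd" },
    { 0x0800c3f8u, 0x74u,  "vListInsert" },
    { 0x0800c46cu, 0x54u,  "uxListRemove" },
    { 0x0800d0d4u, 0x20u,  "xTaskGetTickCount" },
    { 0x0800d0f4u, 0x17cu, "xTaskIncrementTick" },
    { 0x0800d270u, 0xd4u,  "vTaskSwitchContext" },
};

/* Returns the name of the function whose range contains ulAddress, or NULL. */
const char *pcSymbolForAddress( uint32_t ulAddress )
{
    size_t i;

    /* A stacked LR has bit 0 set to indicate Thumb state, so clear it. */
    ulAddress &= ~( uint32_t ) 1u;

    for( i = 0; i < sizeof( xMap ) / sizeof( xMap[ 0 ] ); i++ )
    {
        if( ( ulAddress >= xMap[ i ].ulStart ) &&
            ( ( ulAddress - xMap[ i ].ulStart ) < xMap[ i ].ulSize ) )
        {
            return xMap[ i ].pcName;
        }
    }

    return NULL;
}
```

With the stacked values from this post, PC = 0x0800C3BC falls inside vListInsertEnd and LR = 0x0800D1F1 falls inside xTaskIncrementTick.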

Thanks.



Posted by alsaleem on July 10, 2017

Also, HFSR = 0801DDC9 (<== printf code)



Posted by rtel on July 11, 2017

There is some good debug information here.

As for your question - do I have to upgrade? No, you should not have to. Newer versions have more assert() statements to help catch problems, but you should not have issues like this in any version.

What you are describing doesn't make sense so far - which just means there is some information missing.

You appear to be entering an ISR that is not defined. If the interrupt entry is genuine, then it is nothing to do with FreeRTOS as such, as interrupts are generated by hardware.

However, reading the registers indicates you are in a hard fault, but the hard fault handler is not being called. Potentially the actual interrupt handler is itself faulting, but I would still expect the fault handler to be entered even if the fault occurred inside another interrupt.

There are several fault handlers at the base of the interrupt vector table. Do they all have their own fault handlers, or do some of them just go to the default handler? If some go to the default handler then try adding a unique handler for each to see if you end up in one of those.

When you say:

> LR = 0800D1F1

where is that value coming from? When an interrupt is taken the PC address is pushed onto the task stack before the ISR is entered. Did you pull the value from the task stack (I can't see it in the memory dump)? Inside a non-nested ISR itself the LR should contain an EXC_RETURN code, not an address.

You could try unwinding the task stack from inside the default handler, like you are inside the hardfault handler, to find the address of the instruction that was executing when the interrupt was taken. That would only be helpful if it was a fault that caused the interrupt entry though - if it is a genuine interrupt then it will be asynchronous to the code execution.
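The difference between an EXC_RETURN code and a code address in LR can be tested mechanically. The sketch below assumes the ARMv7-M encoding (top four bits all ones, bit 2 selecting the stack that holds the exception frame, bit 3 selecting a return to thread mode); the function names are hypothetical.

```c
#include <stdint.h>

/* Non-zero if the value looks like an EXC_RETURN code rather than an address. */
int xIsExcReturn( uint32_t ulLR )
{
    return ( ulLR >> 28 ) == 0xFu;
}

/* Bit 2 set: the exception frame was pushed to the process stack (PSP).
   Bit 2 clear: it was pushed to the main stack (MSP). This is the bit the
   "tst lr, #4" instruction in the fault handler examines. */
int xFrameIsOnPSP( uint32_t ulExcReturn )
{
    return ( ulExcReturn & 0x4u ) != 0u;
}

/* Bit 3 set: the exception returns to thread mode. Bit 3 clear: it returns
   to handler mode, meaning another exception was preempted. */
int xReturnsToThreadMode( uint32_t ulExcReturn )
{
    return ( ulExcReturn & 0x8u ) != 0u;
}
```

For example, 0xFFFFFFFD is an EXC_RETURN that returns to a task using PSP, 0xFFFFFFF1 returns to a preempted handler using MSP, and 0x0800D1F1 is not an EXC_RETURN at all, which suggests it was read from the stacked frame rather than from the live LR.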



Posted by alsaleem on July 11, 2017

LR is the sixth word on the stack pointed to by MSP (hardfault_args[5]) and PC is the seventh; both are in the dump, but shown in little-endian byte order. Here is the code that prints the stacked values; I borrowed the names from it. There is also a correction to the BFSR and other status register values:

~~~
stacked_r0 = ((unsigned long) hardfault_args[0]);
stacked_r1 = ((unsigned long) hardfault_args[1]);
stacked_r2 = ((unsigned long) hardfault_args[2]);
stacked_r3 = ((unsigned long) hardfault_args[3]);

stacked_r12 = ((unsigned long) hardfault_args[4]);
stacked_lr = ((unsigned long) hardfault_args[5]);
stacked_pc = ((unsigned long) hardfault_args[6]);
stacked_psr = ((unsigned long) hardfault_args[7]);

printf ("\n\n[Hard fault handler - all numbers in hex]\n");
printf ("R0 = %x\n", stacked_r0);
printf ("R1 = %x\n", stacked_r1);
printf ("R2 = %x\n", stacked_r2);
printf ("R3 = %x\n", stacked_r3);
printf ("R12 = %x\n", stacked_r12);
printf ("LR [R14] = %x  subroutine call return address\n", stacked_lr);
printf ("PC [R15] = %x  program counter\n", stacked_pc);
printf ("PSR = %x\n", stacked_psr);
printf ("BFAR = %lx\n", (*((volatile unsigned long *)(0xE000ED38))));
printf ("CFSR = %lx\n", (*((volatile unsigned long *)(0xE000ED28))));
printf ("HFSR = %lx\n", (*((volatile unsigned long *)(0xE000ED2C))));
printf ("DFSR = %lx\n", (*((volatile unsigned long *)(0xE000ED30))));
printf ("AFSR = %lx\n", (*((volatile unsigned long *)(0xE000ED3C))));
printf ("SCB_SHCSR = %lx\n", SCB->SHCSR);
~~~

BFAR (0xE000ED38) = A014606C, CFSR (0xE000ED28) = 00820000, HFSR (0xE000ED2C) = 00000040, DFSR (0xE000ED30) = 0B000000, AFSR (0xE000ED3C) = 00000000

Yes, I had the same idea: give each interrupt a unique handler and not use the default handler at all, to see which one is causing the interrupt. I am actually using only a few of them. They all look like this, for example:

~~~
    .weak EXTI0_IRQHandler
    .thumbset EXTI0_IRQHandler,Default_Handler
~~~

I will make separate isr for each interrupt.

Regards,



Posted by heinbali01 on July 11, 2017

> I will make separate isr for each interrupt.

I have also done that sometimes. It takes a lot of careful typing, but it can reveal information about the problem.



Posted by alsaleem on July 11, 2017

Now the hard fault handler runs, after making a separate ISR for each interrupt and removing the default handler. Unfortunately the result is the same as indicated above.

~~~
R0 = a0146068
R1 = 200013a4
R2 = a0146068
R3 = a0146068
R12 = 0
LR [R14] = 800d1f1 subroutine call return address
PC [R15] = 800c3bc program counter
PSR = 100000f
BFAR = a014606c
CFSR = 8200
HFSR = 40000000
DFSR = a
AFSR = 0
SCB_SHCSR = 800
~~~

PC and LR point to addresses inside the FreeRTOS code; please see my earlier notes from the .map file.
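The printed status registers can be decoded bit by bit. The sketch below assumes the Cortex-M3/M4 layout of CFSR (bus fault status in bits 8 to 15) and HFSR; the macro and function names are hypothetical and are not CMSIS definitions.

```c
#include <stdint.h>

/* Bus fault bits within CFSR (assumed Cortex-M3/M4 layout). */
#define CFSR_IBUSERR        ( 1u << 8 )
#define CFSR_PRECISERR      ( 1u << 9 )
#define CFSR_IMPRECISERR    ( 1u << 10 )
#define CFSR_UNSTKERR       ( 1u << 11 )
#define CFSR_STKERR         ( 1u << 12 )
#define CFSR_BFARVALID      ( 1u << 15 )

/* HFSR bits. */
#define HFSR_VECTTBL        ( 1u << 1 )
#define HFSR_FORCED         ( 1u << 30 )

/* Non-zero if the bus fault was precise, so the stacked PC points at the
   faulting instruction. */
int xIsPreciseBusFault( uint32_t ulCFSR )
{
    return ( ulCFSR & CFSR_PRECISERR ) != 0u;
}

/* Non-zero if BFAR holds the address of the faulting access. */
int xIsBfarValid( uint32_t ulCFSR )
{
    return ( ulCFSR & CFSR_BFARVALID ) != 0u;
}

/* Non-zero if the hard fault is an escalated configurable fault. */
int xIsForcedHardFault( uint32_t ulHFSR )
{
    return ( ulHFSR & HFSR_FORCED ) != 0u;
}
```

Under these assumptions, CFSR = 0x8200 means a precise bus fault with a valid BFAR, and HFSR = 0x40000000 means the bus fault was escalated to a hard fault. BFAR = a014606c is R0 + 4, which is consistent with a load or store through a corrupted pointer at the stacked PC.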

Any idea?

Thanks



Posted by rtel on July 11, 2017

If you are 100% sure all your interrupt priorities are as per the FreeRTOS requirements (nothing with a logical priority above configMAX_SYSCALL_INTERRUPT_PRIORITY is calling any FreeRTOS API functions from an ISR), and you have checked everything in the "my application does not run, what could be wrong?" FAQ, then I suspect some form of data corruption. That is, something is writing over one of the RTOS data structures resulting in a hard fault when the structure is accessed.



Posted by alsaleem on July 11, 2017

Per FreeRTOSConfig.h:

~~~
#define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY 5
#define configMAX_SYSCALL_INTERRUPT_PRIORITY ( configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )
~~~

All of my interrupts (5 interrupts) have a priority of 10, which is lower than configMAX_SYSCALL_INTERRUPT_PRIORITY. I am not calling any FreeRTOS functions inside them.
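On Cortex-M a numerically higher priority value means a logically lower priority, which is easy to get backwards. The sketch below works through the arithmetic, assuming `configPRIO_BITS` is 4 (the usual value for STM32F4 parts, but not stated in this thread); the function name `xMayCallFromISRApi` is hypothetical.

```c
#include <stdint.h>

/* Assumption: the device implements 4 priority bits. */
#define configPRIO_BITS                                 4
#define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY    5
#define configMAX_SYSCALL_INTERRUPT_PRIORITY \
    ( configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY << ( 8 - configPRIO_BITS ) )

/* Non-zero if an ISR at this library-level priority (0 to 15 with 4 bits) is
   at or below the maximum syscall priority, that is, numerically greater
   than or equal to it once shifted into the top bits of the priority byte. */
int xMayCallFromISRApi( uint32_t ulLibraryPriority )
{
    uint32_t ulRaw = ( ulLibraryPriority << ( 8 - configPRIO_BITS ) ) & 0xFFu;

    return ulRaw >= ( uint32_t ) configMAX_SYSCALL_INTERRUPT_PRIORITY;
}
```

With these values the threshold is 0x50, and priority 10 becomes 0xA0, so an ISR at priority 10 would be allowed to use the FromISR API functions, while one at priority 4 (0x40) would not.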

All my variables are global. I do not use malloc. I am using the FreeRTOS heap_4.c.

Are any of the FreeRTOS variables (data structures) dependent on the stack allocated by FreeRTOS?

Is there a way (a code snippet) to find out, from within the hard fault handler, which task was running when the fault occurred? I do not mind digging into memory, but this would give me a clue about where it happens.

Thanks.



Posted by rtel on July 11, 2017

The pxCurrentTCB variable points to the TCB of the currently executing task. Depending on the debugger, you may have to cast it to a tskTCB type in the debugger watch window to see its internals, which includes its name:

(tskTCB*)pxCurrentTCB



Posted by alsaleem on July 11, 2017

Out of curiosity, while waiting for the hard fault exception:

(1) It is mentioned in this (quoted again):

> Also, some processors could generate a fault or exception in response to a stack corruption before the RTOS kernel overflow check can occur.

Can you suggest a method to detect this situation?

(2) The code below is my implementation of the stack overflow hook:

~~~
void vApplicationStackOverflowHook( TaskHandle_t xTask, signed char *pcTaskName )
{
    printf("stack overflow in task id %lu, name: %s \n", (uint32_t)xTask, pcTaskName);
}
~~~

This function is used to report a stack overflow, and at the same time it uses the stack itself! Could you suggest an implementation that does not use the stack to print or report the error message? Note: in my previous message reporting the hard fault, LR is showing the address of the printf. I think this may lead to locating the cause.

(3) As mentioned:

> Stack overflow is by far the most common source of support requests. The size of the stack available to a task is set using the usStackDepth parameter of the xTaskCreate() or xTaskCreateStatic() API function.

Suggestion: why not reserve a safe space/threshold so that a stack overflow is reported before it gets into this dilemma? The size could be a #define.

Thanks.



Posted by rtel on July 11, 2017

> Can you suggest a method to detect this situation ?

Not easily.

> (2) The below code is my implementation of the stack overflow check:
>
> void vApplicationStackOverflowHook( TaskHandle_t xTask, signed char *pcTaskName ) { printf("stack overflow in task id %lu, name: %s \n", (uint32_t)xTask, pcTaskName); }

You should not try to return from a stack overflow - it is a fatal error (unless you are using the MPU version, in which case the overflow is trapped before it occurs). You can implement the stack overflow hook simply as:

    void vApplicationStackOverflowHook( TaskHandle_t xTask, signed char *pcTaskName )
    {
        // To ensure nothing else executes.
        DisableInterrupts(); // Pseudocode only.

        // To make sure this function never exits.
        for( ;; );
    }

Then place a break point on the infinite loop.

> (3) As mentioned :
>
> Stack overflow is by far the most common source of support requests.
> The size of the stack available to a task is set using the
> usStackDepth parameter of the xTaskCreate() or xTaskCreateStatic()
> API function.
>
> Suggestion : why do not make a safe space/threshold to report stack overflow before it gets into this delimma. Size may be a #define.

There already is - it's eating into the space/threshold that triggers the overflow hook (if the stack overflow configuration parameter is set to 2).
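The idea behind the threshold can be simulated on a host: the stack is pre-filled with a known byte, and a check verifies that the bytes at the far end of the stack still hold it. The sketch below is a simulation only and not the kernel's code; 0xA5 matches tskSTACK_FILL_BYTE in tasks.c, while the 16-byte guard size and all function names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE    0xA5u   /* Same value as tskSTACK_FILL_BYTE. */
#define GUARD_BYTES        16u     /* Assumed guard size, illustration only. */

/* Fill a simulated task stack with the known pattern at creation time. */
void vFillStack( uint8_t *pucStack, size_t uxSize )
{
    memset( pucStack, STACK_FILL_BYTE, uxSize );
}

/* The stack grows downward, so the lowest addresses are the last to be used.
   Returns 1 while the guard region there still holds the fill pattern. */
int xGuardRegionIntact( const uint8_t *pucStackLowAddress )
{
    size_t i;

    for( i = 0; i < GUARD_BYTES; i++ )
    {
        if( pucStackLowAddress[ i ] != STACK_FILL_BYTE )
        {
            return 0;
        }
    }

    return 1;
}

/* Count the bytes at the low end that were never written, which is the same
   idea as a stack high water mark. */
size_t uxUnusedBytes( const uint8_t *pucStackLowAddress, size_t uxSize )
{
    size_t uxCount = 0;

    while( ( uxCount < uxSize ) && ( pucStackLowAddress[ uxCount ] == STACK_FILL_BYTE ) )
    {
        uxCount++;
    }

    return uxCount;
}
```

A task that has used all but 20 bytes of a 64-byte simulated stack still passes the guard check; once a write lands inside the lowest 16 bytes the check fails, which corresponds to the point where the overflow hook would be called.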



Posted by alsaleem on July 11, 2017

I got the hard fault again. I had put a break point in vApplicationStackOverflowHook( TaskHandle_t xTask, signed char *pcTaskName ): execution stopped there, and then the hard fault followed.

pxCurrentTCB points to a very simple task that I created to show health on the debug port, as below.

~~~
void tskDum( void *pvParameters )
{
    TickType_t tickCnt;
    int i=0;

    printf("dum: dum start ...\n");

    for( ;; )
    {
        tickCnt = xTaskGetTickCount();

        printf("dum: dum run %d, %d, %u\n", i, (int)tickCnt, rtcEpoc);
        i++;
        HAL_Delay(5000);
    }
}
~~~

~~~
tRet = xTaskCreate( tskDum, "dum", 200, NULL, 1, &hTaskDum);
~~~

HAL_Delay is an STM32F4 HAL function. rtcEpoc is updated by the RTC WKUP ISR (priority = 10).

~~~
[Hard fault handler - all numbers in hex]
R0 = a0146068
R1 = 200013a4
R2 = a0146068
R3 = a0146068
R12 = 0
LR [R14] = 800d1f1 subroutine call return address
PC [R15] = 800c3bc program counter (<==== RTOS code)
PSR = 101000f
BFAR = a014606c
CFSR = 8200
HFSR = 40000000
DFSR = b
AFSR = 0
SCB_SHCSR = 800
~~~

Is the pxCurrentTCB really showing the current task? because this is a simple task.

Regards,



Posted by rtel on July 11, 2017

I'm not following. Are you saying you went into the stack overflow hook, and then (while inside the hook) a hard fault is generated? If so then it sounds like the hard fault is generated either by the code in the overflow hook function or when attempting to return from the overflow hook function (see my previous reply).

It may be a simple function, but it is using printf() - and printf() can, depending on the implementation of the library, use masses of stack space. That is why embedded systems often have cut down versions of printf/sprintf - you will find such cut down versions in the FreeRTOS download.



Posted by alsaleem on July 11, 2017

Yes, I am returning from the stack overflow hook. And, true, the hard fault may be caused by the printf code in it. I am using the stdio printf(). But does it hold on to the space it allocates without freeing it (i.e. a memory leak)? I ask because it runs for 5 hours before failing.

Regards,



Posted by rtel on July 11, 2017

> Yes, I am returning from stack overflow hook.

I have said twice not to do that.

> And, true, the hard fault maybe be caused by printf code in it. I am using the stdio printf(). But does it keep the allocated space (i.e memory leak) (not freed)? because it runs for 5 hours.

printf() will use stack, and may use the heap. If it uses the stack then the stack space will be returned when the function exits. printf() often also calls malloc(), perhaps unexpectedly for some developers, in which case the memory it allocates should be freed again - assuming there are no bugs in the printf() implementation.

However, you are missing the point - the overflow hook is only called AFTER the stack has already overflowed. Calling printf() when you know there is no stack space cannot be recommended. In fact, calling printf() in a small embedded system is rarely recommended at all unless you know how it is implemented. For example, it is very unlikely to be thread safe.



Posted by alsaleem on July 12, 2017

Thanks. I have disabled the printf and run again to verify printf is the problem.

I see you recommend (printf-stdarg.c) for printf. I will use it to see if it is a printf problem or other, I am using sprintf in other tasks too

I do not know whether the one I have (gcc) is thread-safe, free of memory leaks, or bug-free.

Regards,



Posted by heinbali01 on July 12, 2017

On this forum, a lot has been written about the pros and cons of the standard [v][s][n]printf() family. For instance here, where I also attached the latest version of printf-stdarg.c.

That implementation is using stack only. And although it is unusual to do, the string-formatting functions can be used from within an ISR.
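To illustrate what "stack only" formatting means, here is a minimal sketch of a hex formatter that writes into a caller-supplied buffer and touches neither the heap nor any shared state. It is not taken from printf-stdarg.c; the function name `vFormatHex32` is hypothetical.

```c
#include <stdint.h>

/* Write "0x" followed by eight hex digits and a terminating NUL into
   pcBuffer, which must hold at least 11 bytes. No heap, no locks and no
   static writable data are used, so the function is reentrant. */
void vFormatHex32( uint32_t ulValue, char *pcBuffer )
{
    static const char pcDigits[] = "0123456789abcdef";
    int i;

    pcBuffer[ 0 ] = '0';
    pcBuffer[ 1 ] = 'x';

    for( i = 0; i < 8; i++ )
    {
        /* Most significant nibble first. */
        pcBuffer[ 2 + i ] = pcDigits[ ( ulValue >> ( 28 - ( 4 * i ) ) ) & 0xFu ];
    }

    pcBuffer[ 10 ] = '\0';
}
```

A fault handler could format a stacked PC such as 0x0800C3BC into an 11-byte local buffer this way and hand the result to whatever low-level output routine the board provides.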



Posted by alsaleem on July 12, 2017

Thanks. It's been running for over 12 hours now. I have included printf-stdarg.c file in my project. I am trying to remove reference to stdio.h from my project.

Thanks and appreciate your help.

