CortexM3/PSoC-5LP : Trouble identifying the task which caused an MPU exception

Hello, I’m implementing a “blue screen of death” for a PSoC-5LP (ARM Cortex-M3 core) based product, and I’ve encountered a problem identifying which task caused the stack overflow or other memory protection exception that I’m reporting in my BSoD. I’m using FreeRTOS v7.2.0.

Right now, I’m catching stack overflows and other memory errors with a two-tiered approach. First, I have a vApplicationStackOverflowHook() function implemented (with configCHECK_FOR_STACK_OVERFLOW set to 2 in FreeRTOSConfig.h) so that the RTOS can do its stack size and stack canary checks when each task is swapped out. Second, I have an interrupt handler set up for the MPU exception interrupt (position 3 in the interrupt vector) to catch MPU exceptions (“hard faults”).

The MPU exception interrupt handler triggers reliably, but I can’t get it to identify the offending task properly. I’m using xTaskGetCurrentTaskHandle() and pcTaskGetTaskName() to write the task name to the display, and I’m manually triggering faults for testing purposes by reducing the stack of a selected task to something tiny (e.g. 10 words for a task whose stack usage is around 320 words), which is sure to cause a fault. The problem is that the task name I get never matches the task that I intentionally made overflow its stack, and sometimes it is gibberish (which I assume means either that xTaskGetCurrentTaskHandle() returned the handle for the idle task, which doesn’t have a valid task name string pointer in its TCB, or that the TCB for the task which overflowed its stack has been corrupted).

N.B. the vApplicationStackOverflowHook() function never seems to be called at all, even when I purposefully trigger a much smaller stack overflow (e.g. 4 bytes) which “shouldn’t” make the MPU angry. It’s always the interrupt handler triggered by an MPU exception that gets called.

I’ve got a few speculative ideas as to what might be going on (along the lines of “the mis-sized stack for TaskA is really causing TaskB to access memory in a region it can’t”), but I haven’t dug into the guts of the RTOS to the point where I’m confident that my mental model is right in this case, and the debugger that Cypress ships with the IDE is nearly worthless. Does anyone have an idea as to what else might be causing my problem? Thanks.
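
For readers following along, the kind of canary check described above can be illustrated on a host machine. This is a minimal sketch and not FreeRTOS code: the names sim_stack_init, sim_stack_guard_intact, SIM_FILL_BYTE and SIM_GUARD_BYTES are hypothetical, the 0xA5 value mirrors the fill byte FreeRTOS uses, and the guard length is an assumption made for the sketch. It shows why a check of this kind can only notice an overflow after the fact, when the stack is next inspected, whereas a hardware fault fires at the offending access itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_FILL_BYTE   0xA5u  /* fill pattern; FreeRTOS uses 0xA5 for this */
#define SIM_GUARD_BYTES 16u    /* assumed guard length for this sketch */

/* Fill the whole stack buffer with the known pattern, as done at task creation. */
void sim_stack_init(uint8_t *stack, size_t size)
{
    memset(stack, SIM_FILL_BYTE, size);
}

/* Return 1 if the lowest SIM_GUARD_BYTES still hold the pattern, else 0.
 * The stack is assumed to grow downward, so the low end is the stack limit. */
int sim_stack_guard_intact(const uint8_t *stack, size_t size)
{
    size_t i;

    if (size < SIM_GUARD_BYTES) {
        return 0;
    }
    for (i = 0; i < SIM_GUARD_BYTES; i++) {
        if (stack[i] != SIM_FILL_BYTE) {
            return 0;
        }
    }
    return 1;
}
```

Writes that stay above the guard zone leave the check passing; a single write into the guard zone makes it fail, but only the next time the check runs.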

CortexM3/PSoC-5LP : Trouble identifying the task which caused an MPU exception

Hi, I’m working with the MPU on an M4 (the same applies to the M3) and I wrote some functions to help me. I use GDB, so the debugging is probably better than with the Cypress tools, though I don’t know. My functions were inspired by other people’s code and by my reading on hard faults, to get an idea of what is going on when we hit the fault.

//// code here /////

#include <stdint.h>
/* Also needs the CMSIS device header (for SCB and MPU), the USART
 * definitions (COM2_PERIPHERAL, USART_SR_TXE) and the portMPU_REGION_*
 * constants; all of these depend on your environment.
 * sprintk() is a sprintf-style formatter; substitute sprintf() if needed. */

void printErrorMsg(const char * errMsg);
void printUsageErrorMsg(uint32_t CFSRValue);
void printBusFaultErrorMsg(uint32_t CFSRValue);
void printMemoryManagementErrorMsg(uint32_t CFSRValue);
void stackDump(uint32_t stack[]);

void HardFault_mpu_dump(void)
{
    static char msg[200];
    int i = 0;
    volatile uint32_t *mpu_rnr  = (volatile uint32_t *) 0xE000ED98; /* MPU_RNR  */
    volatile uint32_t *mpu_base = (volatile uint32_t *) 0xE000ED9C; /* MPU_RBAR */
    volatile uint32_t *mpu_attr = (volatile uint32_t *) 0xE000EDA0; /* MPU_RASR */

    //First disable the UART interrupt


    printErrorMsg("-------MPU------\r\n");
    printErrorMsg("Privileged default memory map is ");
    if ((MPU->CTRL & (1UL << 2)) != 0)   /* PRIVDEFENA is bit 2 of MPU_CTRL */
        printErrorMsg("Enabled\r\n");
    else
        printErrorMsg("Disabled\r\n");

    for (i = 0; i < 8; i++)
    {
        uint32_t base, attr, size_log2, size;

        *mpu_rnr = i;                    /* select region i */
        attr = *mpu_attr;
        if (attr & 0x1)                  /* If region is activated */
        {
            base      = *mpu_base & 0xFFFFFFE0;
            size_log2 = ((attr & 0x3E) >> 1) + 1;   /* region size is 2**(SIZE+1) */
            size      = (size_log2 < 32) ? (1UL << size_log2) : 0; /* 4 GB wraps to 0 */

            sprintk(msg, "b:0x%08x - 0x%08x, sz:2**%d (%u), attr",
                    (unsigned int) base,
                    (unsigned int) (base + size),
                    (int) size_log2,
                    (unsigned int) size);
            printErrorMsg(msg);

            if ((attr & 0x07000000) == portMPU_REGION_READ_WRITE)
            {
                printErrorMsg(": P-RW, U-RW");
            }
            else if ((attr & 0x07000000) == portMPU_REGION_PRIVILEGED_READ_WRITE_USER_READ_ONLY)
            {
                printErrorMsg(": P-RW, U-RO");
            }
            else if ((attr & 0x07000000) == portMPU_REGION_PRIVILEGED_READ_ONLY)
            {
                printErrorMsg(": P-RO");
            }
            else if ((attr & 0x07000000) == portMPU_REGION_READ_ONLY)
            {
                printErrorMsg(": P-RO, U-RO");
            }
            else if ((attr & 0x07000000) == portMPU_REGION_PRIVILEGED_READ_WRITE)
            {
                printErrorMsg(": P-RW");
            }
            printErrorMsg("\r\n");
        }
    }
}

void Hard_Fault_Handler(uint32_t stack[])
{
    static char msg[80];

    printErrorMsg("\x1B[2J");            /* ANSI escape: clear the terminal */
    printErrorMsg("In Hard Fault Handler\r\n");
    sprintk(msg, "SCB->HFSR = 0x%08x\r\n", SCB->HFSR);
    printErrorMsg(msg);
    if ((SCB->HFSR & (1 << 30)) != 0)    /* FORCED: escalated from another fault */
    {
        printErrorMsg("Forced Hard Fault\r\n");
    }
    sprintk(msg, "SCB->CFSR = 0x%08x\r\n", SCB->CFSR);
    printErrorMsg(msg);
    if ((SCB->CFSR & 0xFFFF0000) != 0) { /* UFSR: CFSR bits 31..16 */
        printUsageErrorMsg(SCB->CFSR);
    }
    if ((SCB->CFSR & 0xFF00) != 0) {     /* BFSR: CFSR bits 15..8 */
        printBusFaultErrorMsg(SCB->CFSR);
    }
    if ((SCB->CFSR & 0xFF) != 0) {       /* MMFSR: CFSR bits 7..0 */
        printMemoryManagementErrorMsg(SCB->CFSR);
    }

    stackDump(stack);
    HardFault_mpu_dump();
    __asm("BKPT #0\n");                  // Break into the debugger

    while (1);
}

void printErrorMsg(const char * errMsg)
{
    while (*errMsg != '\0')
    {
        while (!(COM2_PERIPHERAL->SR & USART_SR_TXE));  /* wait for TX empty */
        COM2_PERIPHERAL->DR = (*errMsg & 0x1FF);
        ++errMsg;
    }
}

void printUsageErrorMsg(uint32_t CFSRValue)
{
    printErrorMsg("Usage fault: ");
    CFSRValue >>= 16;                    // right shift to lsb
    if ((CFSRValue & (1 << 9)) != 0)     // DIVBYZERO
    {
        printErrorMsg("Divide by zero\r\n");
    }
}

void printBusFaultErrorMsg(uint32_t CFSRValue)
{
    static char buf[200];

    /* CFSRValue is the full CFSR; the bus fault bits are bits 8..15. */
    printErrorMsg("Bus fault: \r\n");
    if ((CFSRValue & (1 << 8)) == (1 << 8))     //IBUSERR
        printErrorMsg("-->Instruction bus error\r\n");
    if ((CFSRValue & (1 << 9)) == (1 << 9))     //PRECISERR
        printErrorMsg("-->Precise data bus error\r\n");
    if ((CFSRValue & (1 << 10)) == (1 << 10))   //IMPRECISERR
        printErrorMsg("-->Imprecise data bus error\r\n");
    if ((CFSRValue & (1 << 11)) == (1 << 11))   //UNSTKERR
        printErrorMsg("-->Bus fault on unstacking for a return from exception\r\n");
    if ((CFSRValue & (1 << 12)) == (1 << 12))   //STKERR
        printErrorMsg("-->Bus fault on stacking for exception entry\r\n");
    if ((CFSRValue & (1 << 13)) == (1 << 13))   //LSPERR
        printErrorMsg("-->Bus fault on floating-point lazy state preservation\r\n");
    if ((CFSRValue & (1 << 15)) == (1 << 15))   //BFARVALID
    {
        printErrorMsg("-->Bus fault address register valid\r\n");
        sprintk(buf, "----> 0x%08X <------ Fault address\r\n", SCB->BFAR);
        printErrorMsg(buf);
    }
    printErrorMsg(
        "Bus faults occur when an error response is received on the AHB bus.\r\n"
        "The common causes are as follows:\r\n"
        "Attempts to access an invalid memory region (for example, a memory location with no memory attached)\r\n"
        "The device is not ready to accept a transfer (for example, trying to access SDRAM without initializing the\r\n"
        "SDRAM controller)\r\n"
        "Attempts to carry out a transfer with a transfer size not supported by the target device (for example, doing a\r\n"
        "byte access to a peripheral register that must be accessed as a word)\r\n"
        "The device does not accept the transfer for various reasons (for example, a peripheral that can only be\r\n"
        "programmed at the privileged access level)\r\n");
}

void printMemoryManagementErrorMsg(uint32_t CFSRValue)
{
    static char buf[200];

    printErrorMsg("Memory Management fault: \r\n");
    CFSRValue &= 0x000000FF;             // mask just mem faults
    if ((CFSRValue & 0x01) == 0x01)      //IACCVIOL
        printErrorMsg("-->Instruction access violation\r\n");
    if ((CFSRValue & 0x02) == 0x02)      //DACCVIOL
        printErrorMsg("-->Data access violation\r\n");
    if ((CFSRValue & 0x08) == 0x08)      //MUNSTKERR
        printErrorMsg("-->Memory manager fault on unstacking for a return from exception\r\n");
    if ((CFSRValue & 0x10) == 0x10)      //MSTKERR
        printErrorMsg("-->Memory manager fault on stacking for exception entry\r\n");
    if ((CFSRValue & 0x20) == 0x20)      //MLSPERR
        printErrorMsg("-->Memory manager fault on floating point lazy state preservation\r\n");
    if ((CFSRValue & 0x80) == 0x80)      //MMARVALID
    {
        printErrorMsg("-->Memory manager fault address register valid\r\n");
        sprintk(buf, "----> 0x%08X <------ Fault address\r\n", SCB->MMFAR);
        printErrorMsg(buf);
    }
}

/* Order of the registers in the exception stack frame pushed by the core. */
enum { r0, r1, r2, r3, r12, lr, pc, psr };

void stackDump(uint32_t stack[])
{
    static char msg[200];

    sprintk(msg, "r0  = 0x%08x\r\n", stack[r0]);
    printErrorMsg(msg);
    sprintk(msg, "r1  = 0x%08x\r\n", stack[r1]);
    printErrorMsg(msg);
    sprintk(msg, "r2  = 0x%08x\r\n", stack[r2]);
    printErrorMsg(msg);
    sprintk(msg, "r3  = 0x%08x\r\n", stack[r3]);
    printErrorMsg(msg);
    sprintk(msg, "r12 = 0x%08x\r\n", stack[r12]);
    printErrorMsg(msg);
    sprintk(msg, "lr  = 0x%08x <-- In gdb \"list *0x%08x\" to get the source code \r\n", stack[lr], stack[lr]);
    printErrorMsg(msg);
    sprintk(msg, "pc  = 0x%08x <-- In gdb \"list *0x%08x\" to get the source code \r\n", stack[pc], stack[pc]);
    printErrorMsg(msg);
    sprintk(msg, "psr = 0x%08x\r\n", stack[psr]);
    printErrorMsg(msg);
}

// Use the 'naked' attribute so that C stacking is not used.
__attribute__((naked)) void HardFault_Handler(void)
{
    /*
     * Get the appropriate stack pointer, depending on our mode,
     * and use it as the parameter to the C handler. This function
     * will never return
     */
    __asm(
        "TST lr, #4 \n"          /* EXC_RETURN bit 2: 0 = MSP, 1 = PSP */
        "ITE EQ \n"
        "MRSEQ r0, MSP \n"
        "MRSNE r0, PSP \n"
        "B Hard_Fault_Handler \n");
}

__attribute__((naked)) void MemManage_Handler(void)
{
    /* Same as HardFault_Handler: pass the active stack pointer to the C handler. */
    __asm(
        "TST lr, #4 \n"
        "ITE EQ \n"
        "MRSEQ r0, MSP \n"
        "MRSNE r0, PSP \n"
        "B Hard_Fault_Handler \n");
}

__attribute__((naked)) void BusFault_Handler(void)
{
    /* Same as HardFault_Handler: pass the active stack pointer to the C handler. */
    __asm(
        "TST lr, #4 \n"
        "ITE EQ \n"
        "MRSEQ r0, MSP \n"
        "MRSNE r0, PSP \n"
        "B Hard_Fault_Handler \n");
}

//// end of code //////

This should help you a little bit. You will need to adapt it to your environment, though. Hope this helps!

Jonathan

In reply to Mike Heise’s message of 2014-01-09 10:23.
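
As a companion to the handlers above, the way the three status fields share the CFSR can be shown as a small pure function that runs on a host as well as on the target. This is a sketch: the type cfsr_fields_t and the helper names are hypothetical, and it covers only the field split and the three bits that the code above already tests (MMARVALID, BFARVALID and DIVBYZERO).

```c
#include <stdint.h>

/* Sub-register views of the Configurable Fault Status Register (CFSR). */
typedef struct {
    uint8_t  mmfsr;  /* CFSR[7:0]   MemManage fault status */
    uint8_t  bfsr;   /* CFSR[15:8]  BusFault status        */
    uint16_t ufsr;   /* CFSR[31:16] UsageFault status      */
} cfsr_fields_t;

/* Split a raw CFSR value into its three fields. */
cfsr_fields_t cfsr_split(uint32_t cfsr)
{
    cfsr_fields_t f;

    f.mmfsr = (uint8_t)(cfsr & 0xFFu);
    f.bfsr  = (uint8_t)((cfsr >> 8) & 0xFFu);
    f.ufsr  = (uint16_t)((cfsr >> 16) & 0xFFFFu);
    return f;
}

/* MMFAR and BFAR hold a meaningful address only when their valid bit is set. */
int cfsr_mmfar_valid(uint32_t cfsr) { return (cfsr & (1u << 7)) != 0; }
int cfsr_bfar_valid(uint32_t cfsr)  { return (cfsr & (1u << 15)) != 0; }

/* DIVBYZERO is bit 9 of the UsageFault field, i.e. bit 25 of the CFSR. */
int cfsr_div_by_zero(uint32_t cfsr) { return (cfsr & (1u << 25)) != 0; }
```

Keeping the decoding separate from the printing makes it possible to check the bit positions on a PC, and on a small LCD only the fields that are non-zero need to be shown.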

CortexM3/PSoC-5LP : Trouble identifying the task which caused an MPU exception

Thanks for the reply, Jonathan. I didn’t wind up doing as comprehensive an info dump in my MPU exception handler as you did (the output destination for the BSoD on the device in question is a rather small LCD), but seeing your implementation was helpful. I’m still curious why the task handle that the API function returned when called from the ISR doesn’t match the task that I expect caused the fault, but I’ve kind of punted on the issue under time pressure from $BOSS. Now I’m working on adding stack high water marks to vTaskGetRunTimeStats() so that I can allocate task stacks a little more intelligently and (hopefully) avoid BSoDs “in the wild” altogether.
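
For anyone wanting to do something similar, the idea behind a stack high water mark can be sketched on a host machine. This is not the FreeRTOS implementation (FreeRTOS offers uxTaskGetStackHighWaterMark() for this purpose); the name sim_stack_unused_bytes and the constant SIM_FILL_BYTE are hypothetical, and the sketch assumes a downward-growing stack that was filled with a known byte when the task was created.

```c
#include <stddef.h>
#include <stdint.h>

#define SIM_FILL_BYTE 0xA5u  /* pattern written over the stack at creation */

/* Count how many bytes at the low end of a downward-growing stack still
 * hold the fill pattern, i.e. have never been written by the task.
 * The smaller the result, the closer the task has come to overflowing. */
size_t sim_stack_unused_bytes(const uint8_t *stack, size_t size)
{
    size_t n = 0;

    while (n < size && stack[n] == SIM_FILL_BYTE) {
        n++;
    }
    return n;
}
```

The count is only an estimate: a task that happens to store the fill value itself near its deepest point makes the stack look slightly less used than it was.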

CortexM3/PSoC-5LP : Trouble identifying the task which caused an MPU exception

Ref why the task name can be corrupted: Depending on the direction of stack growth, that can occur when the stack overflow hits the task control block (in which the task name is stored). It is a good point: ideally the order in which the stack and the task control block are allocated should depend on the direction of stack growth, to ensure that never happens. You can always get the handle of the offending task by inspecting the pxCurrentTCB variable, which can be externed as a void * outside of the tasks.c file.

Ref why a different task handle would be returned: No idea, I’m afraid.

Ref why the memory fault exception occurs before stack overflow detection: The fault is a hardware trap that occurs before the stack overflow occurs and, because of that, can technically be recovered from. The software stack overflow check runs after the stack has already overflowed (or at least come extremely close to overflowing), so it can’t really be recovered from, as you may not know what was corrupted by the overflow.

Regards.
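
To make the first point concrete, here is a small host-side simulation; it is not FreeRTOS code. It assumes a downward-growing stack with the control block (reduced here to just a name field) placed immediately below it in memory. The names sim_task_t, sim_task_init, sim_push and sim_name_intact are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SIM_NAME_LEN = 8, SIM_STACK_BYTES = 16 };

/* One contiguous arena: the name sits at the low addresses and the
 * stack occupies the bytes directly above it. */
typedef struct {
    uint8_t arena[SIM_NAME_LEN + SIM_STACK_BYTES];
    size_t  sp;                  /* offset of the current top of stack */
} sim_task_t;

void sim_task_init(sim_task_t *t, const char *name)
{
    memset(t->arena, 0, sizeof t->arena);
    strncpy((char *)t->arena, name, SIM_NAME_LEN - 1);
    t->sp = sizeof t->arena;     /* empty stack: sp is one past the top */
}

/* Push one byte with no stack-limit check, as an unprotected task would.
 * Returns 0 only when the whole arena is exhausted. */
int sim_push(sim_task_t *t, uint8_t value)
{
    if (t->sp == 0) {
        return 0;
    }
    t->arena[--t->sp] = value;
    return 1;
}

/* Return 1 if the name field still holds exactly the expected bytes. */
int sim_name_intact(const sim_task_t *t, const char *name)
{
    uint8_t expected[SIM_NAME_LEN] = {0};

    strncpy((char *)expected, name, SIM_NAME_LEN - 1);
    return memcmp(t->arena, expected, SIM_NAME_LEN) == 0;
}
```

Filling the stack exactly leaves the name untouched, while the first push beyond SIM_STACK_BYTES lands in the name field. In this layout, that is how a name read in a fault handler could come out as gibberish.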