Cortex-M0 LPC1114 port fails its testsuite

It appears that the FreeRTOS 7.3.0 CORTEX_M0_LPC1114 port for the LPCXpresso board fails the built-in test suite during endurance tests. In the cases I have seen so far it fails after a varying time, ranging from a couple of hours to about 36 hours. I have used the LPCXpresso tool chain v5.0.12_1083 from CodeRed and an unmodified LPCXpresso LPC1114 devkit from Embedded Artists.

My procedure: I extract the relevant demo from the FreeRTOS 7.3.0 distribution, run CreateProjectDirStruct.bat and import the files into a fresh workspace, remove the two instances of #error warnings, then build and debug. I do not change any compiler options, i.e. I stay as close to out-of-the-box as possible. The LED blinks as expected.

The endurance tests are performed with this programmed image but without an attached debugger: the board simply receives power via a USB charger and is checked visually now and then. After a varying time, as described, the LED blinks faster, indicating that one of the tasks has failed. Which one, I don't know for the moment; further tests will have to be done, either with the debugger attached or with changes to the code.

It would be great if someone else could repeat this test. I have only one board, so I cannot exclude hardware-related issues, although I think a software issue is most likely, because a port of the same test suite to an LPC11U35 board (also an M0, an Embedded Artists QuickStart board) shows similar failures, albeit after only minutes to a couple of hours of operation before failure.
Thanks for any advice and test results you may have,
Henning

Cortex-M0 LPC1114 port fails its testsuite

If the LEDs are still blinking then the kernel is still running, but (as you say) it indicates that one of the tasks has reported an error. I will need to know which task is reporting the error before drawing any conclusions. If it is one of the reg test tasks then that is likely to be a genuine problem; otherwise it is more likely to be a self-checking margin being breached in one of the test tasks.

The demo applications tend to load the CPU heavily. Several of the tasks measure block times to ensure they fall within expected margins. If a block time is deemed to be too long then the task reports an error. However, the execution pattern depends on several things. For example, self-checking tasks of this type have to run at a high priority relative to other tasks to ensure they have a chance of running within their expected margin of error.

If you are able to isolate which task is reporting a problem I can suggest ways of diagnosing the cause (be it a genuine issue or otherwise). One way of doing that is to have each error condition in the check task (or timer) set a unique bit in a variable when it is triggered. Then put a break point on the line of code at the end of the check task (or timer) that increases the LED toggle rate when an error is detected. When the break point is hit, inspect the variable used to log error bits to see which test was the cause of the reported error.

Regards.

Cortex-M0 LPC1114 port fails its testsuite

Thanks for a rapid reply.
In the test suite for the LPC1114 LPCXpresso port of FreeRTOS, the InterruptQueueTasks, BlockTimeTestTasks and RecursiveMutexTasks all fail after a few hours to days.
CountingSemaphoreTasks, Reg1TestTask and Reg2TestTask do not fail.
I cannot tell in which order they fail, as I am just latching the state into IO ports and measuring with a voltmeter. One failure occurred after about 8 hours, and the most recent test failed after just 2 hours.
These are exactly the same faulting tasks as in the LPC11U35 port I am working on, which however fails much more frequently. With the LPC11U35 I found that excluding the InterruptQueueTasks eliminates further failures. InterruptQueueTasks alone fails just as it does when all the other tests are included. I never saw the RegXTest tasks failing.
Both the LPC11U35 and LPC1114 ports run with exactly the same clock, RAM and test suites.
I did do a bit of debugging on the LPC11U35, and there the faults are recorded as happening in the InterruptQueueTasks test at:
if( xQueueSend( xNormallyFullQueue, &uxValueToTx, intqSHORT_DELAY ) != pdPASS )
{
	/* intqHIGH_PRIORITY_TASK2 is never suspended so we would not
	expect it to ever time out. */
	prvQueueAccessLogError( __LINE__ );
}

alternating with failures in this code section:

/* Start at 1 as we expect position 0 to be unused. */
for( ux = 1; ux < intqNUM_VALUES_TO_LOG; ux++ )
{
	if( ucNormallyFullReceivedValues[ ux ] == 0 )
	{
		/* A value was missing. */
		prvQueueAccessLogError( __LINE__ );
	}
	else if( ucNormallyFullReceivedValues[ ux ] == intqSECOND_INTERRUPT )
	{
		uxInterrupts++;
	}
}

Any suggestions as to what it might be?

best regards
PS: I can of course try to debug on the LPC1114 target, but it takes a lot of time to reach the error condition.

Cortex-M0 LPC1114 port fails its testsuite

In the test suite for the LPC1114 LPCXpresso port of FreeRTOS, the InterruptQueueTasks, BlockTimeTestTasks and RecursiveMutexTasks all fail after a few hours to days.
Ok – more on that below.
CountingSemaphoreTasks, Reg1TestTask and Reg2TestTask do not fail.
That is good news. While the standard demo tasks predominantly test the scheduler, rather than the scheduler's port to particular hardware, the RegxTestTasks very much test the hardware port itself. If these are not failing then it shows that the context switched in when a task starts to run exactly matches that switched out when it last ran.

The InterruptQueueTasks and the BlockTimeTasks are always the most likely to fail, and I think these are false positives. The failure shows that the hardware is not keeping up with the tight demands of the test, rather than the port not working (hopefully; nothing is ever guaranteed).

The BlockTimeTasks are one of the task sets that measure a time before blocking on various events under different conditions, then measure the time again when the task next runs (unblocks), before comparing the time after to the time before to see if it is within a reasonable bound. You can see that if the bounds are set to be marginal then things like higher priority tasks running, or additional interrupt load, can cause the bounds to be breached momentarily, and the error communicated to the check task. This is a standard test, so the bounds set are common to all hardware, and are therefore more marginal on smaller CPUs like the M0s than on larger CPUs like the M3s.

The Int Queue tasks and interrupts inject a heavy(ish) interrupt load but, more importantly, a bit of pseudo randomness into the execution pattern. The two errors you are seeing are in fact related; one causes the other. The tests themselves are actually very convoluted and not easy to follow, but they do provide excellent test coverage. They too are really designed to run on ports with a full nesting model, whereas the M0 has (if I recall correctly) a slightly simpler nesting model due to limitations in the hardware itself (when compared to its larger M3 and M4 cousins).

Regards.

Cortex-M0 LPC1114 port fails its testsuite

Ok, again thanks for a rapid and comprehensive reply. I am relieved that you think it is only the test suites that fail and not the OS port(s). Rather than trying by trial and error to figure out how to tweak the bounds in order to get a working test suite, can you give me a hint as to which values to try to change? These tests are pretty complex to follow and testing time can be long, so a good starting point is essential.

Thanks, Henning

In intQueue.c I see these constants (copy/paste from the code):

/* The number of values to send/receive before checking that all values were
processed as expected. */
#define intqNUM_VALUES_TO_LOG	( 200 )
#define intqSHORT_DELAY	( 75 )

/* The value by which the value being sent to or received from a queue should
increment past intqNUM_VALUES_TO_LOG before we check that all values have been
sent/received correctly.  This is done to ensure that all tasks and interrupts
accessing the queue have completed their accesses within the
intqNUM_VALUES_TO_LOG range. */
#define intqVALUE_OVERRUN	( 50 )

/* The delay used by the polling task.  A short delay for code coverage. */
#define intqONE_TICK_DELAY	( 1 )

/* At least intqMIN_ACCEPTABLE_TASK_COUNT values should be sent to/received
from each queue by each task, otherwise an error is detected. */
#define intqMIN_ACCEPTABLE_TASK_COUNT	( 5 )

For RecMutex.c I see these constants:

#define recmuSHORT_DELAY	( 20 / portTICK_RATE_MS )
#define recmuNO_DELAY	( ( portTickType ) 0 )
#define recmuTWO_TICK_DELAY	( ( portTickType ) 2 )

For BlockTim.c:

/* Task behaviour. */
#define bktQUEUE_LENGTH	( 5 )
#define bktSHORT_WAIT	( ( ( portTickType ) 20 ) / portTICK_RATE_MS )
#define bktPRIMARY_BLOCK_TIME	( 10 )
#define bktALLOWABLE_MARGIN	( 15 )
#define bktTIME_TO_BLOCK	( 175 )
#define bktDONT_BLOCK	( ( portTickType ) 0 )
#define bktRUN_INDICATOR	( ( unsigned portBASE_TYPE ) 0x55 )

Cortex-M0 LPC1114 port fails its testsuite

I am relieved that you think it is only the test suites which fails and not the OS port(s).
Yes, that is my belief, and it comes from a lot of experience, but nothing can be guaranteed. It would take a while to work out the others, but for the block time test the relevant constant is bktALLOWABLE_MARGIN; its value should be increased. Regards.
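For example, the change in BlockTim.c would look like the following. The value 25 is purely illustrative, a starting point to experiment with, not a figure recommended anywhere in the source:

```c
/* Increased from the stock 15 to give the M0 more headroom (illustrative
   value only; tune empirically). */
#define bktALLOWABLE_MARGIN		( 25 )
```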

Cortex-M0 LPC1114 port fails its testsuite

After having done some tests with different timing margins and timeout constants, I can only conclude that this is very time consuming, as an error may appear only after several hours. So a better thought-out strategy than let's-see-what-happens is needed, and I rely on input from people with better insight into the test suites. As a comment on Richard's note:

The Int Queue tasks and interrupts inject a heavy(ish) interrupt load... They too are really designed to run on ports with a full nesting model, whereas the M0 has (if I recall correctly) a slightly simpler nesting model due to the limitations in the hardware itself (when compared to its larger M3 and M4 cousins).

The LPC1114 and LPC11Ux have programmable interrupt priorities with 4 levels, i.e. 2 bits. ("For the brave", as Richard says in a recent thread, the following provides better insight: http://www.freertos.org/RTOS-Cortex-M3-M4.html.) So nesting is supported on the LPC11x; however, the current FreeRTOS 7.3.0 port for the LPC1114 does not exploit this: priorities are left at the default value 0, which is the highest priority.

In vPortEnterCritical()/vPortExitCritical(), uxCriticalNesting is never seen to be more than 1 (after the scheduler has been started). This confirms the lack of IRQ nesting, apart from Reset > NMI > HardFault nesting of course.

Maybe this fact should be borne in mind when evaluating the test suite. In time, a port which supports nesting would be great. My guess is that much from the ports for LPCs based on the Cortex-M3 can be reused.
Regards
Henning