Task stops running without reason ?

Using CC3200 with v8.2.3 with 7 user tasks. all running fine for a long time. Changed NTP from being a single function call at start to become a very simple independant task. Task interval starts at 1min, incrementing every time it is run with an extra minute until the interval is 16 minutes. What I find is that the task stops running after a random period or number of runs, sometimes after 1, 5 or 10 events, the last time ran 95 times over about 22 hours before stopping. All other tasks keep running with intervals ranging between 20mS and 60sec, I have looked at the task status (2=Blocked) all looks fine Every NTP event shows in syslog at the correct incrementing interval, right until when it stops. No errors are recorded by the very simple task, anywhere along the execution path. No sense in a breakpoint on the return from vTaskDelayUntil() sine it does not return Any suggestions where to start looking?

Task stops running without reason ?

There is always a reason, its just that sometimes we don’t know (at first!) what it is :o) The only part of the software you are running that is provided by ourselves is the RTOS – so I’m afraid we cannot even begin to consider what might be going on in the NTP software. Comments will therefore necessarily be limited to trying to determine what the task is doing, rather than what the NTP is doing. Have you read through the first page of the FAQ? http://www.freertos.org/FAQHelp.html In particular the points about setting interrupt priorities? Interrupt priorities are the normal source of such issues. Do you have configASSERT() defined, as that will catch most interrupt misconfiguration issues. http://www.freertos.org/a00110.html#configASSERT If its not the above, then read on: You imply that the task enters a call to vTaskDelay() but never leaves it again. If that is the case then we need to look deep into the internal structures so we can work out why that might be. It could be that another task has corrupted something, or that the task in question does not have the priority of any interrupts it is relying on set correctly. Let me know if this is the case and I can assist further. Are you sure it is inside the vTaskDelay() function? If not, then read on again: Some of the demos we provide ourselves using our own stack (http://www.freertos.org/tcp) run multiple tasks that make use of the TCP/IP stack. We have in the past seen situations where one task will apparently stop, while all the other tasks that are using the TCP/IP stack continue functioning correctly. When debugging this we have found that the faulty task has got stuck in driver code (not written by us I hasten to add ;o) that is performing a polling operation without a proper time out. Even when the poll operation is using a blocking call, so does not consume all the CPU time, it just effectively deadlocks. Is your task using a driver (maybe a clock, or something similar) that is not used by any other task?

Task stops running without reason ?

configASSERT is enabled and working, have tested by changing some priorities to trigger. definately not returning from vTaskDelay(), set breakpoints to check. The task runs perfectly, updates time and returns with no error. All socketes are closed at the end of the update, the task run counter is update,an NTP update message message is sent, then the task goes to sleep again. If any error occurs, the task error counter is incremented and the error time, code, module and line is logged to syslog. The example when it ran 95 times the sleep periods (after the 1st run) were: 1->2->3->4->5->6->7->8->9->10->11->12->13->14->15 then 79x 16sec over a ~22hr period. When I checked the task run counter was 95, the task error counter 0 and the syslog for this device showed no errors after the last successful NTP entry. Problem is it happens very randomly, so the question is when it happens, where do I start looking…

Task stops running without reason ?

You said in your first post that the task in question was reported as being in the Blocked state, so the first thing we need to do is determine what time the task is expecting to unblock. This is a little fiddly. 1) When you create the task save the task’s handle in a variable that can be viewed in the debugger. Lets assume the variable is called MyTask. 2) Run the code for a short while, then pause the debugger. 3) Inspect the MyTask variable in whichever window your debugger provides for this (watch/symbols/expressions/whatever). You want to see the variable as a structure – if the debugger instead shows it as a void pointer with a single address, then cast the variable to a tskTCB pointer, as shown in the attached image. 4) Expand the structure in the debugger, you are looking for the xGenericListItem->xItemValue member, again as shown in the image attached (the red number). Between calling vTaskDelayUntil and exiting the vTaskDelayUntil function the value highlighted is the tick time at which the task should leave the blocked state. 5) Once you are confident you can inspect the structure successfully, start the debugger running again, and wait for the error to occur. Once the error has occurred again pause the debugger and report back both the following values:
  • MyTask->xGenericListItem.xItemValue
  • xTickCount
  • MyTask->xGenericListItem.pvContainer
  • pxDelayedTaskList
  • pxOverflowDelayedTaskList

Task stops running without reason ?

Apologies, but cannot find the attached image… What I have done is below… ~~~~ void vNtpTask(void *pvParameters) { _i32 iRetVal ; TaskHandle_t * pMyTask ; pMyTask = TaskInfo[TASKIDX_NTP].TaskHandle ; f64MinMaxAvgInit(&mmaNtpRatio) ; TaskInfo[TASKIDX_NTP].LastWakeTime = xTaskGetTickCount(); // Get the current time as a reference to start our delays. TaskInfo[TASKIDX_NTP].Interval = 0 ; while (true) { // first wait till IP is up and running vIpStackWaitTillRunning() ; if ((iRetVal = xNtpGetTime(&CurSecs)) >= 0) { xReportEvent(EVENT_TIME_UPDATE_NTP) ; TaskInfo[TASKIDX_NTP].TaskCount++ ; if (TaskInfo[TASKIDX_NTP].Interval < NTP_TASK_PERIOD_MS_MAX) { TaskInfo[TASKIDX_NTP].Interval += NTP_TASK_PERIOD_MS_MIN ; } else { TaskInfo[TASKIDX_NTP].Interval = NTP_TASK_PERIOD_MS_MAX ; } } else { LOG_ERROR(iRetVal) ; xErrorHandlerAlert() ; TaskInfo[TASKIDX_NTP].Interval = NTP_TASK_PERIOD_MS_MIN ; } vTaskDelayUntil(&TaskInfo[TASKIDX_NTP].LastWakeTime, TaskInfo[TASKIDX_NTP].Interval) ; } } ~~~~ Using CCS 6.1.2.14 but cannot do the cast to (tskTCB *)
Tried with pMyTask as both local and global variable.
Get "Error: cannot load from non-primitive location"

Task stops running without reason ?

That is because I forgot to attach it :o) It is attached to this post.

Task stops running without reason ?

Got it.. results attached below

Task stops running without reason ?

Looks good – now we just need the number after the issue has occurred. The idea is to see when the task thinks it should be waking up, compared to the current time.

Task stops running without reason ?

Sorry hold a bit, need to get the xTickCount and the 2 task lists as well. How shall I capture the 2 tasks lists, what level of detail..

Task stops running without reason ?

Don’t forget to expand the xGenericTaskList structure in the debugger, so I can see the values of its members. The pxDelayedTaskList and pxOverflowDelayedTaskList values do not need to be expanded. I just want to know their values – that is – what they are pointing to. Some explanation: When the task blocks the xItemValue member is set to the time at which the task needs to unblock, and the task is placed into a delayed task list. There are two delayed task lists, one for wake times that have not wrapped around, and one for wake times that have wrapped around. The information you are going to provide will tell me the time at which the task should be unblocked, and the current time. To complete the picture I then need to know if the time at which the task should unblock has wrapped from the current time, or not – that information comes from MyTask->xGenericListItem.pvContainer value – which shows which delayed list the task is in.

Task stops running without reason ?

Here we go…

Task stops running without reason ?

Unfortunately you have expanded the xEventListItem rather than the xGenericListItem. Could you take another snapshot please.

Task stops running without reason ?

Apologies…

Task stops running without reason ?

What this tells us:
  • The task should unblock when the tick count is 180838.
  • The current tick count is 1099648.
  • The that is in the overflow delayed list.
The fact that the unblock time is less than the current tick count means the task should be in the overflow delayed list – which indeed it is. As it is, the task will unblock after the tick count has reached 0xffffffff (assuming a 32-bit tick count), wrapped back to 0, and then counted up to 180838 – which will be a very long time into the future (assuming a reasonable tick frequency). So at the moment it appears as if, when the task entered the blocked state, it asked for an extraordinarily long block time. Could this be an arithmetic overflow problem in your code – by which I mean, are you holding times in signed variables instead of unsigned variables? The TickTypet type should be used (which in your case is presumably equivalent to uint32t). I presume you are confident the tick rate is running at the expected value, because you have been observing the time at which the NTP actions are occurring, but what is the configTICKRATEHZ value?

Task stops running without reason ?

Just for info, in this instance the task only ran twice, once at startup and another time after 1 minute. The task was supposed to wake up and run after another 2 minutes, ie after a total of 3 minutes or ~ 180k ticks

Task stops running without reason ?

Tick rate is running at 1000Hz, I use the tick time to update both UpTime (sec and msec from boot) and CurSecs (Unix epoch sec and msec) values. Both Uptime and CurSecs run VERY close to actual NTP time, checked regularly. Both times are used in a rules engine that performs cloud controlled tasks that execute at 10mS accuracy. The 2 macros used to control the (escalating) time delay are below: ~~~~

define NTPTASKPERIODMSMIN (1 * 60 * 1000)

define NTPTASKPERIODMSMAX (16 * 60 * 1000)

~~~~ Both are stored in a 24 bit wide _u32 value, structure below ~~~~ typedef struct taskinfo { TaskHandlet TaskHandle ; QueueHandlet QueueHandle ; TimerHandlet TimerHandle ; TickTypet LastWakeTime ; volatile _u32 TaskCount ; volatile _u32 QueueCount ; volatile _u32 TimerCount ; volatile _u32 Interval : 24 ; volatile _u16 TaskError : 12 ; volatile _u16 QueueError : 12 ; volatile _u16 TimerError : 12 ; volatile _u8 TaskState : 4 ; } TaskInfot ; ~~~~ The value in my structure is very clearly still 2min. ~~~~ 33m19s318: –Name–|F#|Pc/Pb| S |LoStk|%Util|TimerTicks|TaCount|QuCount|TiCount|TaErr|QuErr|TiErr|Ph|Intvl| 33m19s319: MAC| 1| 7/ 7| 2 | 260| 0.00| 262| 211| 211| 0| 0| 0| 0| 0| 0 33m19s320: Con| 2| 1/ 1| 1 | 176| 0.01| 2740| 9993| 5| 0| 0| 0| 0| 0| 0s000 33m19s321: IP4| 3| 5/ 5| 2 | 440| 0.03| 7837| 2000| 0| 0| 0| 0| 0| 5| 1 33m19s321: CRx| 4| 3/ 3| 2 | 216| 0.00| 297| 9| 662| 0| 0| 0| 0| 0| 1 33m19s322: CTx| 5| 2/ 2| 2 | 432| 0.00| 681| 662| 662| 1997| 0| 0| 0| 0| 1 33m19s323: Eng| 6| 6/ 6| 2 | 936| 0.00| 1175| 2001| 0| 199931| 0| 0| 0| 0| 1 33m19s324: Sen| 7| 7/ 7| 2 | 472| 0.12| 24385| 195608| 0| 0| 0| 0| 0| 0| 0s000 33m19s325: NTP| 8| 1/ 1| 2 | 272| 0.00| 89| 2| 0| 0| 0| 0| 0| 0| 2m00 33m19s326: IDLE| 9| 0/ 0| 1 | 396|99.35| 19829905| 33m19s326: Tmr Svc|10| 2/ 2| 2 | 544| 0.45| 90524| ~~~~ Will check the delay calculations but it is very simple and has in some cases run for almost 24 hours before skipping a slot.

Task stops running without reason ?

I should have asked at the beginning – what do you have configCHECKFORSTACK_OVERFLOW set to? Also, how big is the interrupt stack? [configCHECKFORSTACK_OVERFLOW only checks the stack used by tasks, but the stack in use switches to the interrupt stack each time an interrupt is entered. The stack used by interrupts is the same stack as used by main() before the scheduler is started, so it is set up by your tools somewhere)]
From your last post I can infer the tick rate is 1KHz. Interesting. So we are expecting xItemValue to be approximately 180K, and indeed it is, so appears to have been calculated correctly. Why then, is it in the overflow list? Hmm…thinking. Something is very odd – the code that determines which list to put the task into is very simple: ~~~~ if( xTimeToWake < xTickCount ) { /* Wake time has overflowed. Place this item in the overflow list. / vListInsert( pxOverflowDelayedTaskList, &( pxCurrentTCB->xGenericListItem ) ); } else { / The wake time has not overflowed, so the current block list is used. */ vListInsert( pxDelayedTaskList, &( pxCurrentTCB->xGenericListItem ) ); ~~~~ …and we can already see that xTimeToWake was calculated correctly, as it is shown in the screen shot. Normally at this point I would ask for a cut down project, with just enough code to demonstrate the issue, so I could run it here – but in this case I don’t think that is going to be possible as you are using the NTP code. How about this – could you update your task so the task does not call any of the NTP code, but instead just increments a volatile variable every time it runs. So you keep the structure of the task the same, the bit that calls vTaskDelayUntil() and the bit that calculates the next wake time should remain unchanged, but the bit that calls the NTP code is commented out and replaced with a simple variable increment. Then run the code again. If the problem still exists then we would have something I could try running here.

Task stops running without reason ?

Our two last posts crossed over each other. Now replying to your post that states the NTPTASKPERIODMSMIN value. Your NTPTASKPERIODMSMIN and NTPTASKPERIODMSMAX values are defined as signed integers – a long shot – but could cause a problem. Can you add ‘ul’ to the end literal, as (1UL * 60UL * 1000UL) The 24-bits is also to be of suspicion.

Task stops running without reason ?

Have already added UL to all 3 components, made no difference. Falied after 2 executions. ~~~~

define NTPTASKPERIODMSMIN (1UL * 60UL * 1000UL)

define NTPTASKPERIODMSMAX (16UL * 60UL * 1000UL)

~~~~ 24 bits should handle ~280min, so we should be very safe…

Task stops running without reason ?

thanks for the help, no bug anywhere but rather an obscure NTP issue. Some “holier than thou” servers reserve the right to simply ignore a request, no warning. When using blocking socket option the cc3200 IP stack block with the rtos waiting for a response. Just removing these specific hosts from the list removed the problem but I will change the code to use nonblocking socket and handle the EAGAIN and EWOULDBLOCK returns.