TCP stack: task starvation

Hello all! I’d like to report an issue I’ve found in recent weeks. The setup: Kinetis K64F (Cortex-M4F), FreeRTOS 8.2.0, FreeRTOS+TCP 160112. The K64F is the server. There is a persistent connection between the K64F and a PC. The PC regularly sends 100-200 byte requests, and the K64F replies with packets smaller than 1500 bytes. An HTTP server is also running and can serve 3 parallel requests. The PC executes many parallel wgets to fetch the homepage from the K64F at the highest pace possible.

When everything is up and running (1 persistent TCP connection + 3 continuously connecting, downloading and disconnecting HTTP clients), the machine gets rebooted by the HW watchdog after a while (usually within 1-5 minutes). The watchdog has a dedicated reset task, with a priority lower than the IP task’s priority.

I’ve figured out that the prvCalculateSleepTime(…) function sometimes calculates a sleep time of 0 ticks, so it doesn’t let lower-priority tasks get any CPU time. Even worse, it appears that once prvCalculateSleepTime(…) calculates a zero sleep time, it never recovers and keeps calculating this bad value for eternity (or until the watchdog resets ;)). I’ve applied a simple fix at the end of the function:

~~~
if( xMaximumSleepTime == 0 )
{
    xMaximumSleepTime = pdMS_TO_TICKS( 10 );
}
~~~

And in my case this does the trick; the system no longer gets starved. I didn’t have time to dig into the code to find the real reason for this, but the fix above has been running for more than 10 days without a stop, so I hope it can’t be that bad… 😉
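For readers unfamiliar with the +TCP internals, here is a minimal sketch of why a zero sleep time starves lower-priority tasks (assumed structure, not the actual +TCP source): the IP-task uses the calculated value as the block time on its event queue.

~~~
/* Illustrative IP-task loop; the names mirror the report above but the
   loop itself is a sketch, not the shipped code. */
for( ;; )
{
    TickType_t xMaximumSleepTime = prvCalculateSleepTime();
    IPStackEvent_t xReceivedEvent;

    /* With xMaximumSleepTime == 0 this returns immediately whether or
    not an event is pending, so the IP-task spins at its own priority
    and the lower-priority watchdog reset task never gets to run. */
    xQueueReceive( xNetworkEventQueue, &xReceivedEvent, xMaximumSleepTime );

    /* ...process the received event... */
}
~~~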

TCP stack: task starvation

Hi Tamas, Thanks for taking the time to report this – we can try to replicate the situation. I presume a calculated sleep time of 0 can be legitimate in some circumstances, so that in itself does not sound like a problem (I would have to check), but there should not be a persistently zero time after that. Without being able to duplicate the problem first (and then step through the code), however, I cannot be sure.

TCP stack: task starvation

Hi Tamás, There is a very recent post about the same subject. This will be solved in the upcoming Labs release. Your patch will also work; I opted for a pdMS_TO_MIN_TICKS() macro that returns a minimum of 1 tick.
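For illustration, such a macro might look as follows (a minimal sketch of the idea; the exact name and definition in the Labs release may differ):

~~~
/* Convert milliseconds to ticks, but never return fewer than 1 tick,
   so a short timeout cannot round down to a zero (non-blocking) wait. */
#define pdMS_TO_MIN_TICKS( xTimeInMs )                                  \
    ( ( pdMS_TO_TICKS( xTimeInMs ) > ( TickType_t ) 1 ) ?               \
        pdMS_TO_TICKS( xTimeInMs ) : ( TickType_t ) 1 )
~~~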

TCP stack: task starvation

Hello Hein! No, my issue has nothing to do with the tick frequency of FreeRTOS. Since we figured out that my earlier TCP problems occurred due to the 200 Hz setting, I’ve been running FreeRTOS at 1000 Hz; the problem I described above is really independent of that.

TCP stack: task starvation

An updated Labs release was uploaded yesterday. This new release is a maintenance release for existing code, rather than a release of the main development branch.

TCP stack: task starvation

I don’t know if this is the case here, but this sounds very much like the problem that happens with a delay-until loop when it falls behind: it wants to do the extra work right now to catch up, and that can starve lower-priority tasks.
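To make that concrete, here is a minimal sketch (a hypothetical task, not from the +TCP code) of how a delay-until loop starves lower-priority tasks once it falls behind:

~~~
#include "FreeRTOS.h"
#include "task.h"

/* Hypothetical periodic task, used only to illustrate the failure mode. */
void vPeriodicTask( void *pvParameters )
{
    TickType_t xLastWakeTime = xTaskGetTickCount();
    const TickType_t xPeriod = pdMS_TO_TICKS( 10 );

    for( ;; )
    {
        /* If the work in this loop ever takes longer than xPeriod, the
        next wake time is already in the past, so vTaskDelayUntil()
        returns without blocking.  The task then runs back-to-back
        trying to catch up, and lower-priority tasks (such as a
        watchdog reset task) get no CPU time. */
        vTaskDelayUntil( &xLastWakeTime, xPeriod );

        /* ...do the periodic work here... */
    }
}
~~~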

TCP stack: task starvation

Tamás, are you maybe using FreeRTOS_select()? If you let select() wake up on a WRITE condition (i.e. you may write to a socket), then select() will keep returning without blocking until the output buffer is full. The same goes for the other conditions: if it unblocks because a socket has an exception (a connection closure) and you omit to handle that exception, the next select() will keep returning without sleeping.
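As an illustration of that pitfall, here is a minimal sketch (hypothetical application code) where eSELECT_WRITE keeps FreeRTOS_select() from ever blocking:

~~~
/* Hypothetical application code: xSocket is an already connected socket.
   While the TX buffer has room, the WRITE condition stays signalled, so
   this loop spins without blocking, starving lower-priority tasks. */
SocketSet_t xSocketSet = FreeRTOS_CreateSocketSet();

FreeRTOS_FD_SET( xSocket, xSocketSet, eSELECT_READ | eSELECT_WRITE );

for( ;; )
{
    /* Returns immediately as long as the socket is writable, even with
    a long timeout.  The task must either write until the buffer is
    full or clear the condition with
    FreeRTOS_FD_CLR( xSocket, xSocketSet, eSELECT_WRITE ). */
    FreeRTOS_select( xSocketSet, pdMS_TO_TICKS( 1000 ) );

    /* ...handle READ / WRITE / EXCEPT conditions here... */
}
~~~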

TCP stack: task starvation

Another question: when prvCalculateSleepTime() returns zero, can you say which timer had expired? Was it ARP, DHCP, TCP, or DNS? If it was TCP, it looks very much like the tick-frequency problem. Have you tried that solution already?

TCP stack: task starvation

No, I’m not using FreeRTOS_select(…) at all – only simple reads and writes on the socket, which has a ~100-200 ms timeout set. I can’t tell you whether it was the ARP, DHCP, etc. timer, since I haven’t begun figuring that out. My machine is running with a 1000 Hz tick; according to our earlier conversation by email, I thought I didn’t have to fix the pdMS_TO_TICKS(…) issue (it may return 0) unless the tick rate is lower than 1000 Hz. So do you still recommend applying this small fix even though FreeRTOS is running at 1000 Hz on my machine? Anyway, since my primitive fix has been applied, the system appears to be stable in my current usage scenario…

TCP stack: task starvation

If you use a tick rate of 1000 Hz, it is indeed unlikely that applying the pdMS_TO_MIN_TICKS macro will change anything about the problem. It would still be interesting to know for which protocol (ARP, DHCP, TCP, or DNS) xMaximumSleepTime was set to zero.

TCP stack: task starvation

After some further checking, we found a solution to the problem that Tamás reports here above: the starvation of lower-priority tasks. Please change the following static-inline function in include/FreeRTOS_IP_Private.h:

~~~~
static portINLINE UBaseType_t uxGetRxEventCount( void )
{
-   extern volatile UBaseType_t uxRxEventCount;
-   return uxRxEventCount;
+   return 0u;
}
~~~~

In other words, let this function always return zero.

Rationale: when +TCP was first developed, we thought it would be advantageous to constantly keep track of how many RX packets are queued up for the IP-task. It would allow the IP-task to give priority to RX processing over sending TCP packets. The internal variable uxRxEventCount would keep track of the number of RX packets queued in xNetworkEventQueue.

The starvation: as long as uxRxEventCount was non-zero, the IP-task wouldn’t block, because xTCPTimerCheck() in FreeRTOS_Socket.c would return 0 ticks:
~~~~
if( uxGetRxEventCount() != 0u )
{
    /* This was interrupted, but wants to be called as soon as
    possible to finish checking the other sockets. */
    xShortest = ( TickType_t ) 0;
    break;
}
~~~~
The above code is not needed at all: as long as xNetworkEventQueue is non-empty, the IP-task won’t block anyway (unless another, higher-priority task has work to do). In the latest release of +TCP you will see in FreeRTOS_TCP_IP.c that uxGetRxEventCount() is no longer checked. That change was thanks to Andrzej Burski, who noted that under some circumstances the IP-task would just stop sending TCP packets. Tamás, thanks for reporting the above. In the next release, I think uxRxEventCount will have disappeared altogether 🙂

TCP stack: task starvation

A 160831 release is now available with all use of the uxRxEventCount variable removed.

TCP stack: task starvation

I can confirm that this little modification has solved my issues 🙂