Quality RTOS & Embedded Software

 Real time embedded FreeRTOS RSS feed 
Real time embedded FreeRTOS mailing list 
Quick Start Supported MCUs PDF Books Trace Tools Ecosystem TCP & FAT Training




Loading

FreeRTOS+UDP end-of-buffer padding to support store-and-forward hardware checksumming

Posted by zbrozek on April 23, 2017

I'm writing a from-scratch (not using stm32f7xxhaletc.{c,h}) STM32F7 zero-copy Ethernet driver for FreeRTOS+UDP 9.0.0. It mostly works (and the bugs in my driver aren't what I'm hoping to address in this post), but I have some concern about a hack I seem to have to do.

In particular, the reference manual in the "Transmit checksum offload" section states: > You must make sure the Transmit FIFO is deep enough to store a complete frame before that frame is transferred to the MAC Core transmitter.

and it also says:

Note that: for ICMP-over-IPv4 packets, the checksum field in the ICMP packet must always be 0x0000 in both modes, because pseudo-headers are not defined for such packets. If it does not equal 0x0000, an incorrect checksum may be inserted into the packet.

But it doesn't appear like FreeRTOS+UDP provides four unused bytes at the end of an Ethernet buffer. In order to get the checksum to be automatically inserted, I must tell the DMA peripheral that the frame is four bytes bigger than what FreeRTOS+UDP suggests, and I also have to set the last four bytes to zeros. This of course is extremely dangerous and I shouldn't do it.

Have I missed something about the buffer allocation size that makes this OK? Or is there a way to specify end-of-buffer padding somewhere? I've noticed that this page is missing some details.


FreeRTOS+UDP end-of-buffer padding to support store-and-forward hardware checksumming

Posted by heinbali01 on April 23, 2017

Hi Sasha,

I'm writing a from-scratch STM32F7 zero-copy Ethernet driver for FreeRTOS+UDP 9.0.0.

Congratulations !

Do you really mean +UDP, or +TCP? FreeRTOS+TCP started in the /Labs section. It is the successor of +UDP. It is more complete, and more often used. If you want to use it UDP-only, that is still possible:

~~~ #define ipconfigUSE_TCP 0 ~~~

Moving from +UDP to +TCP shouldn't be hard. The principles are all the same.

The drivers for both releases are 99% the same. FreeRTOS+TCP has an extra parameter when calling xNetworkInterfaceOutput():

~~~ BaseTypet xNetworkInterfaceOutput( NetworkBufferDescriptort * const pxDescriptor, BaseType_t bReleaseAfterSend ); ~~~

When bReleaseAfterSend is true, the driver ( NetworkInterface ) is responsible for releasing pxDescriptor. When you have zero-copy transmissions:

~~~ #define ipconfigZEROCOPYTX_DRIVER 1 ~~~

... then bReleaseAfterSend will always be true, so it is safe to pass the memory to a DMA descriptor. Once the transfer is ready, the original Network Buffer must be released.

I like to work with a counting semaphore that keeps track of the number of available DMA descriptors for transmission (TX).

In the current STM32F4 driver, it looks like this:

~~~ /* As soon as the EMAC has sent a frame, the original Network Buffer must be released. */ ucPayLoad = ( uint8_t * )DMATxDescToClear->Buffer1Addr;

/* Look-up the Network buffer that owns a certain payload. */
pxNetworkBuffer = pxPacketBuffer_to_NetworkBuffer( ucPayLoad );

/* Release the Network buffer. */
vReleaseNetworkBufferAndDescriptor( pxNetworkBuffer ) ;

/* Mark that it has been released already, just to be sure. */
DMATxDescToClear->Buffer1Addr = ( uint32_t )0u;

/* Tell the counting semaphore that one more TX descriptor is available. */
xSemaphoreGive( xTXDescriptorSemaphore );

~~~

for ICMP-over-IPv4 packets, the checksum field in the ICMP packet must always be 0x0000 in both modes

I also encountered this 'feature' on STM32F4's EMAC :

~~~ if( pxPacket->xICMPPacket.xIPHeader.ucProtocol == ipPROTOCOLICMP ) { pxPacket->xICMPPacket.xICMPHeader.usChecksum = ( uint16t )0u; } ~~~

Just to summarise, for IPv4, the following check-sums exist:

1) IP-header has a checksum over a so-called pseudo header (16-bits) 2) Protocol-checksum (16-bits) 3) A 4-byte Frame Check Sequence (FCS) that we do not get to see in the library.

Checksums 1) and 2) can either be set and checked by the EMAC, or by the library. The last checksum can only be added by the EMAC.

Now when a packet has 100 bytes, we send 100 bytes to the EMAC, and the EMAC will secretly add 4-byte frame-check.

But there is a minimum frame length of 60 + 4 bytes. That is needed for a proper collision detection on the LAN.

In FreeRTOS+TCP, you can define the following:

~~~ #define ipconfigETHERNETMINIMUMPACKET_BYTES ( 60 ) ~~~

ARP and ICMP messages can be as short as 42 bytes. The 18 extra bytes will be set to zero. So 60 bytes will be sent to the EMAC.

But it doesn't appear like FreeRTOS+UDP provides four unused bytes at the end of an Ethernet buffer.

True because the EMAC adds 4 bytes on its own.

In order to get the checksum to be automatically inserted, I must tell the DMA peripheral that the frame is four bytes bigger than what FreeRTOS+UDP suggests

That is new for me, I haven't encountered that yet. But I haven't used an STM32F7 either.

Have I missed something about the buffer allocation size that makes this OK?

FreeRTOS+TCP will guarantee that all network buffers have a minimum size of ipconfigETHERNET_MINIMUM_PACKET_BYTES. That is particularly important when using BufferAllocation_2.c, which allocates just enough bytes for pucEthernetBuffer.

~~~ /* Adding pad bytes. */

if defined( ipconfigETHERNETMINIMUMPACKET_BYTES )

{ if( pxNetworkBuffer->xDataLength < ( sizet ) ipconfigETHERNETMINIMUMPACKETBYTES ) { BaseType_t xIndex;

    for( xIndex = ( BaseType_t ) pxNetworkBuffer->xDataLength;
         xIndex < ( BaseType_t ) ipconfigETHERNET_MINIMUM_PACKET_BYTES;
         xIndex++ )
    {
        /* FreeRTOS+TCP guarantees that pucEthernetBuffer is long enough. */
        pxNetworkBuffer->pucEthernetBuffer[ xIndex ] = 0u;
    }
    pxNetworkBuffer->xDataLength = ( size_t ) ipconfigETHERNET_MINIMUM_PACKET_BYTES;
}

}

endif

~~~

Or is there a way to specify end-of-buffer padding somewhere? I've noticed that this page is missing some details.

An EMAC will understand the trailing zero's at the end. They will be transmitted, followed by an FCS.

Here you find the latest NetworkInterface.c for the STM32F4.

This and other drivers will (hopefully) soon appear in the official FreeRTOS+TCP release. Regards.


FreeRTOS+UDP end-of-buffer padding to support store-and-forward hardware checksumming

Posted by zbrozek on April 24, 2017

I had written a much longer response, but sourceforge ate it. In short:

Yes, I'm using +UDP because it seemed like the intended IP stack (mainlined, as opposed in a Lab). As I've been doing research it's been getting more and more clear that I really should be using +TCP. Hopefully it plays nice with the 9.0.0 release. Right now I'm having some difficulty getting it to compile since IAR 7.80.1 seems to be ignoring the semicolon in packstructend.h. I'll fuss with it and see where I get.

Thank you so very much for your long and detailed response. The pointer to your F4 driver makes it clear that I was never getting hardware checksumming at all, and in fact my size error was due to the checksumming not adding the bytes at all. One of the bugs in the driver was that the FCS was always wrong. Now I understand that it was garbage data.


FreeRTOS+UDP end-of-buffer padding to support store-and-forward hardware checksumming

Posted by heinbali01 on April 24, 2017

Hi Sasha,

it [+UDP] seemed like the intended IP stack (mainlined, as opposed in a Lab).

FreeRTOS+TCP should have moved to mainline already. +TCP has been out and stable for a couple of years. The last stable release was 160919. A new version with some new features is due to be released soon.

As I've been doing research it's been getting more and more clear that I really should be using +TCP.

I also think so.

Hopefully it plays nice with the 9.0.0 release.

It works well with any kernel from 8.1.1 up to the latest. I use 9.0.0 for all platforms that I have. It needs event_groups.c, it does not make use of timers.c.

Right now I'm having some difficulty getting it to compile since IAR 7.80.1 seems to be ignoring the semicolon in packstructend.h. I'll fuss with it and see where I get.

That is a known issue. Have you found this and this post already?

+UDP didn't have portable\Compiler\IAR directory yet, how did you compile it?

Would it help if we add a macro ipconfigPACK_END, which will be placed as follows:

~~~

include "packstructstart.h"

/* contains '__packed' */

struct xETHHEADER { MACAddresst xDestinationAddress; /* 0 + 6 = 6 / MACAddress_t xSourceAddress; / 6 + 6 = 12 / uint16_t usFrameType; / 12 + 2 = 14 */ }

include "packstructend.h"

/* for IAR, an empty header file. */

ipconfigPACK_END

/* for IAR, defined as a semicolon. */ ~~~

Could you try that? Regards.


FreeRTOS+UDP end-of-buffer padding to support store-and-forward hardware checksumming

Posted by heinbali01 on April 24, 2017

There are 21 occurrence of

~~~

include "packstructend.h"

~~~

which should all be replaced by :

~~~

include "packstructend.h"

ipconfigPACK_END ~~~

And add a few lines in FreeRTOSIPConfigDefaults.h:

~~~

ifndef ipconfigPACK_END
#define ipconfigPACK_END
endif

~~~


FreeRTOS+UDP end-of-buffer padding to support store-and-forward hardware checksumming

Posted by zbrozek on April 24, 2017

I tried doing what +UDP does and added __attribute__( (packed) ); into the packstructend.h file. It works. As far as I can tell, IAR ignored the GCC-style attribute, but the additional text was enough to get around the preprocessor bug.

I also tried your workaround. No dice, including when trying to actually define ipconfigPACK_END as a semicolon. Preprocessor output looked the same as before - no semicolons at the end of struct definitions.

FreeRTOSIP.h needed to be #include in FreeRTOSTCPIP.h and FreeRTOSStream_Buffer.h.

The compiler emits a warning about FreeRTOSSockets.c having a implicit call to FreeRTOSminBaseType unless ipconfigHASINLINE_FUNCTIONS set to 1. I set it to 1 and all seems well, but it seems like that shouldn't happen when using options.


FreeRTOS+UDP end-of-buffer padding to support store-and-forward hardware checksumming

Posted by heinbali01 on April 25, 2017

Glad to hear that it is OK to write it as __attribute__( (packed) );, and thanks for reporting back.

I must admit that I can not test the code with IAR at this moment ( haven't installed it yet on my new laptop ).

The trick with ipconfigPACK_END did work properly with GCC, but that doesn't help the IAR projects.

Thanks for pointing-out that defining ipconfigHAS_INLINE_FUNCTIONS = 0 leads to compiler warnings. In fact some defines were missing. Here they are:

~~~ #define FreeRTOSmaxBaseType(a, b) ( ( ( BaseTypet ) ( a ) ) >= ( ( BaseTypet ) ( b ) ) ? ( ( BaseTypet ) ( a ) ) : ( ( BaseTypet ) ( b ) ) ) #define FreeRTOSmaxUBaseType(a, b) ( ( ( UBaseTypet ) ( a ) ) >= ( ( UBaseTypet ) ( b ) ) ? ( ( UBaseTypet ) ( a ) ) : ( ( UBaseTypet ) ( b ) ) ) #define FreeRTOSminBaseType(a, b) ( ( ( BaseTypet ) ( a ) ) <= ( ( BaseTypet ) ( b ) ) ? ( ( BaseTypet ) ( a ) ) : ( ( BaseTypet ) ( b ) ) ) #define FreeRTOSminUBaseType(a, b) ( ( ( UBaseTypet ) ( a ) ) <= ( ( UBaseTypet ) ( b ) ) ? ( ( UBaseTypet ) ( a ) ) : ( ( UBaseTypet ) ( b ) ) ) ~~~

( the casts are necessary just in case these macro's are being called with constant values ).

Something about "static inline":

The FreeRTOS kernel has never used static __inline functions. The reason is that some compilers wouldn't understand it.

When developing FreeRTOS+TCP, the need for static inline functions increased, The have several advantages compared to macro's:

● they allow proper type-checking of the parameters ● the compiler will decide whether the functions will either be inlined or linked, depending on the generated code-size ● they avoid a possible multiple evaluation of it's parameters ● The use of function-like macro's can lead to hard-to-detect errors

An example of the last :

~~~ /* What if 'ptr' is a number, in stead of a pointer? */

define STOREBYTE( ptr, value ) ( ( ( uint8t *) ptr ) = ( value ) )

~~~

Note that an older compiler for TI parts did generate erroneous ( crashing ) code in case static inline functions were used in FreeRTOS+FAT, by defining ffconfigINLINE_MEMORY_ACCESS = 1. TI solved that in a newer release of the compiler.

Regards.


FreeRTOS+UDP end-of-buffer padding to support store-and-forward hardware checksumming

Posted by zbrozek on April 25, 2017

Added those in and it's clean both with and without the definition - thanks!

I'm rewriting my driver for BufferAllocation_2.c and the deferred processing task it requires. I've got a couple of questions:

  • Why is it recommended not to release buffers to the stack on confirmation of transmission, but rather on the attempt to transmit again? It seems like that'll leave a lot of orphaned buffers, up to the high water mark during a transmit burst.
  • How does FreeRTOS work on processors with a data cache and DMA? With BufferAllocation1.c I can configure the MPU to force write-through in the appropriate directions, or to manually invalidate the cache as necessary. There doesn't seem like quite as elegant a way to do that with BufferAllocation2.c. Until I've figured out how I want to deal with it, I'm just not enabling the data cache.

Thanks again! -Sasha


FreeRTOS+UDP end-of-buffer padding to support store-and-forward hardware checksumming

Posted by heinbali01 on April 25, 2017

Hi Sasha,

Added those in and it's clean both with and without the definition - thanks!

Now added to mainline, thanks to you.

( some users encounter imperfections, correct them, but do not bother about reporting them on a forum )

I'm rewriting my driver for BufferAllocation_2.c and the deferred processing task it requires.

Rewriting? Does it need a different approach? I think that for a driver, the only difference is that you do not need vNetworkInterfaceAllocateRAMToBuffers() any more. Or do you have more changes?

There has been a driver using BufferAllocation_2.c, that would pass the pucEthernetBuffer field of a Network Buffer to DMA, assign NULL to it, and release the Network Buffer. That is OK, as long as BufferAllocation_2.c is used, and therefore I would not recommend that method. I prefer to hand over the Network Buffer to DMA (only for a few µS or 1 mS), and when transmission is ready, call pxPacketBuffer_to_NetworkBuffer() to get the pointer to the owning Network buffer. That method works for both BufferAllocation_1 and BufferAllocation_2.

BufferAllocation_2.c and the deferred processing task it requires

Why wouldn't BufferAllocation_1.c need a deferred processing task? I think it's all the same? Except for the trick described here above?

Why is it recommended not to release buffers to the stack on confirmation, of transmission but rather on the attempt to transmit again? It seems like that'll leave a lot of orphaned buffers, up to the high water mark during a transmit burst.

( it is not recommended, it was caused by laziness... or lack of time to write it better )

I think that you found this in an older driver ( indeed maybe from the latest release ). What you see there was not intentional.

In e.g. this recent driver:

https://sourceforge.net/p/freertos/discussion/382005/thread/39bcfcfb/3ee0/attachment/STM32FxxnetworkInterfaceand_iperf.zip

( I did add a link here above ) you will see that the Network Buffer is being released immediately after the EMAC confirms the (success or failure of a) transmission.

How does FreeRTOS work on processors with a data cache and DMA?

Good question !

With BufferAllocation1.c I can configure the MPU to force write-through in the appropriate directions, or to manually invalidate the cache as necessary. There doesn't seem like quite as elegant a way to do that with BufferAllocation2.c. Until I've figured out how I want to deal with it, I'm just not enabling the data cache.

( the following text is based on my experience with Xilinx Zynq, which has a zero-copy driver, and a lot of cached DRAM )

With BufferAllocation_2.c, memory will be obtained by calling pvPortMalloc(), which is normally cached memory. When using BufferAllocation_1.c, indeed you can choose where to allocate the memory, cached or uncached.

When using cached memory, before passing it to DMA, it must be flushed. And after receiving data, the memory area must be refreshed before it is up-to-date.

The mistake that I made ( I've had headaches about this ), was to flush or refresh parts of memory which had an overlap with other objects!

Lesson: you have to make sure that the starting offset and the length of the memory blocks occupied by pucEthernet are properly aligned with boundaries of the caching.

Using a length of 1536 ( 0x600 ) bytes worked wonderfully.

Later I decided to dedicate 1 MB of uncached DRAM to DMA, both for the MMC driver as well as ETH. Since then, the driver doesn't worry about flushing and refreshing any more. Regards.


FreeRTOS+UDP end-of-buffer padding to support store-and-forward hardware checksumming

Posted by glenenglish on April 25, 2017

For others reading this required reading "Cortex M7 programming manual ST doc PM0253 Sections 2.3 , 4.8, 4.9

http://www.st.com/content/ccc/resource/technical/document/programming_manual/group0/78/47/33/dd/30/37/4c/66/DM00237416/files/DM00237416.pdf/jcr:content/translations/en.DM00237416.pdf

It's updated quite regularly.


FreeRTOS+UDP end-of-buffer padding to support store-and-forward hardware checksumming

Posted by zbrozek on April 27, 2017

Rewriting? Does it need a different approach? I think that for a driver, the only difference is that you do not need vNetworkInterfaceAllocateRAMToBuffers() any more. Or do you have more changes?

Well, I had originally used BufferAllocation_1.c because it would allow me to do all of the buffer swapping in the ISR, thus not needing the deferred processing task at all. So beyond removing that call, I also have to reduce the scope of the ISR and move that functionality into a task. No big deal, just something that needs doing. I'm most-active on weekends (this is a hobby project), so I'm a bit behind on implementing everything.

Right now I'm still wrapping my head around the padding prior to the Ethernet buffers and when I want to make calls to pucGetNetworkBuffer() versus pxGetNetworkBufferWithDescriptor(). This section suggests the former, but at the bottom of this section the example uses the latter. I assume that's because when I populated the RX DMA descriptors, that I used pucGetNetworkBuffer() and the IP stack needs a populated descriptor to be able to parse the packet.

I'm especially confused by this: ~~~ *( ( NetworkBufferDescriptort ** ) ( pxDMARxDescriptor->pucEthernetBuffer - ipBUFFERPADDING ) ) = pxDMARxDescriptor; ~~~

But I haven't spent much time on it; it'll probably become clear when I get to studying it more closely.

I started looking in to how to forcibly clean and forcibly invalidate ranges of addresses from the D-cache. Looks like core_cm7.h defines SCB_InvalidateDCache_by_Addr and SCB_CleanDCache_by_Addr which should do just what I need. Yay!


FreeRTOS+UDP end-of-buffer padding to support store-and-forward hardware checksumming

Posted by zbrozek on April 29, 2017

OK; I'm now mostly operational with +TCP. My demo board (STM32F746 Discovery) is requesting, receiving, and ACKing a DHCP lease. I implemented your ICMP checksum workaround in my driver. It doesn't keel over and die when I ping flood the board. Success?!


[ Back to the top ]    [ About FreeRTOS ]    [ Sitemap ]    [ ]




Copyright (C) 2004-2010 Richard Barry. Copyright (C) 2010-2016 Real Time Engineers Ltd.
Any and all data, files, source code, html content and documentation included in the FreeRTOSTM distribution or available on this site are the exclusive property of Real Time Engineers Ltd.. See the files license.txt (included in the distribution) and this copyright notice for more information. FreeRTOSTM and FreeRTOS.orgTM are trade marks of Real Time Engineers Ltd.

Latest News:

FreeRTOS V9.0.0 is now available for download.


Free TCP/IP and file system demos for the RTOS


Sponsored Links

⇓ Now With No Code Size Limit! ⇓
⇑ Free Download Without Registering ⇑


FreeRTOS Partners

ARM Connected RTOS partner for all ARM microcontroller cores

Renesas Electronics Gold Alliance RTOS Partner.jpg

Microchip Premier RTOS Partner

RTOS partner of NXP for all NXP ARM microcontrollers

Atmel RTOS partner supporting ARM Cortex-M3 and AVR32 microcontrollers

STMicro RTOS partner supporting ARM7, ARM Cortex-M3, ARM Cortex-M4 and ARM Cortex-M0

Xilinx Microblaze and Zynq partner

Silicon Labs low power RTOS partner

Altera RTOS partner for Nios II and Cortex-A9 SoC

Freescale Alliance RTOS Member supporting ARM and ColdFire microcontrollers

Infineon ARM Cortex-M microcontrollers

Texas Instruments MCU Developer Network RTOS partner for ARM and MSP430 microcontrollers

Cypress RTOS partner supporting ARM Cortex-M3

Fujitsu RTOS partner supporting ARM Cortex-M3 and FM3

Microsemi (previously Actel) RTOS partner supporting ARM Cortex-M3

Atollic Partner

IAR Partner

Keil ARM Partner

Embedded Artists