
Zynq - TCP: Improve speed

Posted by hannes23 on June 20, 2018

Hello,

herewith I'd like to share my experience with improving TCP throughput. In my case I got a gain of more than 20 % in TCP speed on a 1000 Mbps link.

The following things are necessary:
1st: Re-map the OCM (on-chip memory) from the bottom to the top of the address space.
2nd: Force the linker to place the ucNetworkPackets buffers into the OCM space.

The remapping is done by a macro that runs a short piece of assembler code directly after entering main().

My code is :

int main( void )
{
	xil_printf( "Hello from FreeRTOS main\r\n" );
	configASSERT( configUSE_TASK_FPU_SUPPORT == 2 );
	xil_printf( "configUSE_TASK_FPU_SUPPORT (FreeRTOS.h) is set to %d\r\n", configUSE_TASK_FPU_SUPPORT );

	// Remap all 4 64KB blocks of OCM to top of memory and enable DDR address filtering
	MY_REMAP();

...
...

The configUSE_TASK_FPU_SUPPORT part can of course be omitted if it is not used.

The code (found somewhere in the Xilinx forum) for the MY_REMAP() define is:

#define MY_REMAP() asm volatile(\
"mov  r5, #0x03                                           \n"\
"mov  r6, #0                                              \n"\
"LDR  r7, =0xF8000000  /* SLCR base address    */         \n"\
"LDR  r8, =0xF8F00000  /* MPCORE base address  */         \n"\
"LDR  r9, =0x0000767B  /* SLCR lock key        */         \n"\
"mov  r10,#0x1F                                           \n"\
"LDR  r11,=0x0000DF0D  /* SLCR unlock key      */         \n"\
"dsb                                                      \n"\
"isb                   /* make sure it completes */       \n"\
"pli  do_remap     /* preload the instruction cache */    \n"\
"pli  do_remap+32                                         \n"\
"pli  do_remap+64                                         \n"\
"pli  do_remap+96                                         \n"\
"pli  do_remap+128                                        \n"\
"pli  do_remap+160                                        \n"\
"pli  do_remap+192                                        \n"\
"isb                   /* make sure it completes */       \n"\
"b    do_remap                                            \n"\
".align 5, 0xFF         /* forces the next block to a cache line alignment */ \n"\
"do_remap:                                                        \n"\
"str  r11, [r7, #0x8]   /* Unlock SLCR                         */ \n"\
"str  r10, [r7, #0x910] /* Configuring OCM remap value         */ \n"\
"str  r9,  [r7, #0x4]   /* Lock SLCR                           */ \n"\
"str  r6,  [r8, #0x0]   /* Disable SCU & address filtering     */ \n"\
"str  r6,  [r8, #0x40]  /* Set filter start addr to 0x00000000 */ \n"\
"str  r5,  [r8, #0x0]   /* Enable SCU & address filtering      */ \n"\
"dmb                                                              \n"\
)
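To make the value 0x1F that the macro writes to offset 0x910 a bit easier to follow, here is a minimal C sketch. It assumes (from my reading of the Zynq-7000 TRM, please verify against your copy) that bits [3:0] of that SLCR register each select the low or high mapping of one 64 KB OCM block. The names OCM_LOW_BASE, OCM_HIGH_BASE and ocm_block_base() are made up for illustration and are not part of any Xilinx header.

```c
#include <stdint.h>

/* Assumed layout: four 64 KB OCM blocks, each mapped low or high
 * depending on one bit in bits [3:0] of the OCM configuration value. */
#define OCM_BLOCK_SIZE  0x10000u
#define OCM_LOW_BASE    0x00000000u
#define OCM_HIGH_BASE   0xFFFC0000u

/* Hypothetical helper: base address of OCM block 'n' (0..3) for a given
 * configuration value, under the assumption stated above. */
static uint32_t ocm_block_base( uint32_t cfg, unsigned n )
{
    uint32_t base = ( ( cfg >> n ) & 1u ) ? OCM_HIGH_BASE : OCM_LOW_BASE;

    return base + n * OCM_BLOCK_SIZE;
}
```

Under that assumption, a value with all four low bits set places block 0 at 0xFFFC0000 and block 3 at 0xFFFF0000, which matches the "top of memory" comment in main().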

The next step is to create a memory section ".ocm" by changing the linker description file.

The following changes are needed.

In the memory section add:

ps7_ocm : ORIGIN = 0xfffc0000, LENGTH = 0x3fe00

In the section description add:

.ocm (NOLOAD) : {
   _ocmstart = .;
   *(.ocm)
   _ocmend = .;
} > ps7_ocm

The final step is to tell the buffer definition that the buffers should be placed into the OCM.

In the file NetworkInterface.c add the .ocm section attribute. The definition should then be:

static uint8_t ucNetworkPackets[ ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS * niBUFFER_1_PACKET_SIZE ] __attribute__ ( ( aligned( 32 ) ) ) __attribute__ ( ( section( ".ocm" ) ) );

After compiling and linking, one can inspect the map file to see whether the .ocm section was successfully generated and populated. It looks like this:

.ocm            0x00000000fffc0000    0x30000
                0x00000000fffc0000                _ocmstart = .
 *(.ocm)
 .ocm           0x00000000fffc0000    0x30000 ./src/Ethernet/FreeRTOS-Plus-TCP/portable/NetworkInterface/Zynq/NetworkInterface.o
                0x00000000ffff0000                _ocmend = .
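The same check can be done in code. The following is only a sketch: it takes the ORIGIN and LENGTH values from the ps7_ocm line of the linker script above and tests whether a buffer of a given start address and size fits into that region. fits_in_ocm() is a hypothetical helper, not something from FreeRTOS+TCP or the Xilinx BSP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the ps7_ocm line of the linker script. */
#define OCM_ORIGIN  0xfffc0000u
#define OCM_LENGTH  0x0003fe00u

/* Hypothetical helper: true if [start, start + size) lies inside ps7_ocm. */
static bool fits_in_ocm( uint32_t start, uint32_t size )
{
    if( ( start < OCM_ORIGIN ) || ( size > OCM_LENGTH ) )
    {
        return false;
    }

    /* Compare offsets instead of end addresses, so nothing overflows
     * near the top of the 32-bit address space. */
    return ( start - OCM_ORIGIN ) <= ( OCM_LENGTH - size );
}
```

With the numbers from the map file, fits_in_ocm( 0xfffc0000u, 0x30000u ) is true. As a side note, 0x30000 bytes would correspond to, for example, 128 buffers of 1536 bytes each; that split is my assumption and is not stated in the map file.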

All hints and changes are of course given without any responsibility or warranty on my part. If anybody has other or additional changes or hints to improve the speed of TCP communication, please let me know. In particular, I would welcome comments on whether it makes sense to move other buffers or variables into the OCM.

Greetings to all.


Zynq - TCP: Improve speed

Posted by rtel on June 20, 2018

Really appreciate you taking the time to write this up.


Zynq - TCP: Improve speed

Posted by heinbali01 on June 20, 2018

Hi Johannes, thanks a lot for sharing this. I'm afraid I can not comment on it as I don't know enough about the Zynq memory handling. But I would be curious to see the results of a test with iperf3. I'll attach the latest version ( v3.0d ) of the iperf server to this message.

To activate the server wait for +TCP to be ready and call:

~~~
void vIPerfInstall( void );
~~~

You can start a test on the host with this command:

~~~
iperf3 -c 192.168.2.114 --port 5001 --bytes 100M [ -R ]
~~~

The reverse flag ( -R ) causes the Zynq to send data instead of receiving it.

Attachments

iperf_task_v3_0d.c (26804 bytes)

Zynq - TCP: Improve speed

Posted by heinbali01 on June 20, 2018

I wrote:

I would be curious to see the results of a test with iperf3.

It would be great if you can start two sessions simultaneously in two directions:

~~~
iperf3 -c 192.168.2.114 --port 5001 --bytes 1G
iperf3 -c 192.168.2.114 --port 5001 --bytes 1G -R
~~~

We recently saw a problem with Zynq: incoming packets were dropped under heavy traffic, causing very slow transfer speed for the first session ( the one without -R ). I am curious to see if a faster memory access will prevent these problems.


Zynq - TCP: Improve speed

Posted by hannes23 on June 21, 2018

Hello Hein,

here are the results of the iperf3 tests I've done on my Zynq7000, running at 666 MHz.

First I had to change the IP address and port number to suit our firewall. Then I got a stack fault. Maybe some other tasks which are running on my system caused this. I believe these other tasks don't use much CPU time, so I didn't change the software for the iperf tests.

After setting the stack-size to 1000 everything was fine.

Next I changed the window- and buffer settings to those values I used at my HTTP-server work. The code after change is:

#ifndef ipconfigIPERF_TX_BUFSIZE
#define mySETTINGS
#ifdef mySETTINGS
#define ipconfigIPERF_TX_BUFSIZE				( 128 * 1024 )
#define ipconfigIPERF_TX_WINSIZE             	( 48 )
#define ipconfigIPERF_RX_BUFSIZE		        ( ( 80 * 1024 ) - 1 )
#define ipconfigIPERF_RX_WINSIZE				( 24 )
#else
#define ipconfigIPERF_TX_BUFSIZE				( 65 * 1024 )	/* Units of bytes. */
#define ipconfigIPERF_TX_WINSIZE				( 4 )			/* Size in units of MSS */
#define ipconfigIPERF_RX_BUFSIZE				( ( 65 * 1024 ) - 1 )	/* Units of bytes. */
#define ipconfigIPERF_RX_WINSIZE				( 8 )			/* Size in units of MSS */
#endif
#endif
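Since the window sizes are given in units of MSS while the buffer sizes are in bytes, it can help to convert one into the other when comparing the two sets of settings. A small sketch, assuming an MSS of 1460 bytes (typical for a 1500-byte Ethernet MTU; the real value depends on the ipconfigTCP_MSS setting of the project). ASSUMED_MSS_BYTES and window_bytes() are names made up for this example.

```c
#include <stdint.h>

/* Assumption: a 1460-byte MSS, typical for a 1500-byte Ethernet MTU. */
#define ASSUMED_MSS_BYTES  1460u

/* Hypothetical helper: convert a window size given in MSS units to bytes. */
static uint32_t window_bytes( uint32_t win_size_in_mss )
{
    return win_size_in_mss * ASSUMED_MSS_BYTES;
}
```

Under that assumption the TX window of 48 MSS is 70080 bytes and the RX window of 24 MSS is 35040 bytes, both well inside the 128 KB and 80 KB buffers. The default windows of 4 and 8 MSS come to 5840 and 11680 bytes.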

By the way: why should or must the RX_BUFSIZE be an odd number?

I've done tests with original bufsizes with and without -R, and also tests with my settings, also both variants.

Finally I did the concurrent test as you suggested and saw that the performance of the session without -R really did drop. On my debug UART I got lots of messages like:

SACK[4503,34508]: optlen 12 sending 14583407 - 14584867

The test results are:

Original bufsize:

[ 4] 0.00-19.59 sec 1.00 GBytes 438 Mbits/sec sender
[ 4] 0.00-19.59 sec 1024 MBytes 438 Mbits/sec receiver

and for the reverse mode:

[ 4] 0.00-27.12 sec 37.0 Bytes 10.9 bits/sec 4294967295 sender
[ 4] 0.00-27.12 sec 1.00 GBytes 317 Mbits/sec receiver

mySettings bufsize:

[ 4] 0.00-16.43 sec 1.00 GBytes 523 Mbits/sec sender
[ 4] 0.00-16.43 sec 1024 MBytes 523 Mbits/sec receiver

and for the reverse mode:

[ 4] 0.00-13.81 sec 37.0 Bytes 21.4 bits/sec 4294967295 sender
[ 4] 0.00-13.81 sec 1.00 GBytes 622 Mbits/sec receiver
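For anyone who wants to cross-check these figures, the rates follow directly from the transferred bytes and the elapsed time. A minimal sketch, assuming that "1.00 GBytes" means 1024 * 1024 * 1024 bytes and that "Mbits" means 10^6 bits; both helper names are made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper: throughput in Mbit/s (10^6 bits per second). */
static double mbits_per_sec( uint64_t bytes, double seconds )
{
    return ( ( double ) bytes * 8.0 ) / seconds / 1e6;
}

/* Hypothetical helper: relative change from 'before' to 'after' in percent. */
static double gain_percent( double before, double after )
{
    return ( after - before ) / before * 100.0;
}
```

With 1073741824 bytes in 19.59 s this gives about 438.5 Mbit/s, which agrees with the first result line. Comparing the two buffer settings, 438 to 523 Mbit/s is a gain of roughly 19 %, and 317 to 622 Mbit/s in reverse mode is roughly 96 %.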

For completeness I added the results in a file.

Greetings



Zynq - TCP: Improve speed

Posted by heinbali01 on June 23, 2018

Thanks Johannes, for these detailed and systematic measurements. It looks like using your memory settings makes the Ethernet communication faster, by at least 20%. But unfortunately, it does not help against the packet loss in this case:

~~~
My settings bufsize parallel:
Connecting to host 169.254.79.19, port 4503
[ 4] local 169.254.214.213 port 35510 connected to 169.254.79.19 port 4503
[ ID] Interval Transfer Bandwidth
[ 4] 3.00-4.00 sec 512 KBytes 4.19 Mbits/sec
[ 4] 4.00-5.00 sec 896 KBytes 7.34 Mbits/sec
~~~

In this case with heavy two-way traffic, incoming packets are being dropped. Earlier I found that it helps to decrease the packet size of TCP packets ( MSS ).

You wrote:

and saw that indeed the performance of the one without -R really dropped. On my debug uart I got lots of messages like: SACK[4503,34508]: optlen 12 sending 14583407 - 14584867

That is indeed a sign of packets being dropped.


Zynq - TCP: Improve speed

Posted by heinbali01 on March 2, 2019

Hi Johannes, I finally solved the problem of the lost packets during concurrent transmissions.

See this post

See x_emacpsif_dma.c, in the function emacps_send_message(): it is essential to read back the register that was just set:

~~~
  XEmacPs_WriteReg( ulBaseAddress, XEMACPS_NWCTRL_OFFSET, xxx );
+ /* Reading it back is important compiler is optimised. */
+ XEmacPs_ReadReg( ulBaseAddress, XEMACPS_NWCTRL_OFFSET );
~~~
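The general pattern, a write to a device register followed by a read of the same register, can be sketched in plain C as below. This is not the driver code: write_reg_with_readback() is a hypothetical helper, and ulFakeRegister is an ordinary variable standing in for a memory-mapped register so that the sketch also runs on a host PC. My understanding is that the volatile read cannot be removed by the compiler and only completes after the device has seen the write, but please treat that as an assumption.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped device register (assumption: on real
 * hardware this would be an address in the peripheral's register space). */
static volatile uint32_t ulFakeRegister = 0u;

/* Hypothetical helper: write a register, then read the same register back.
 * The pointer is volatile, so the compiler must perform both accesses. */
static uint32_t write_reg_with_readback( volatile uint32_t * pulReg, uint32_t ulValue )
{
    *pulReg = ulValue;

    return *pulReg;
}
```

A call such as write_reg_with_readback( &ulFakeRegister, 0x18u ) returns the value that was just written; in a driver the returned value would normally just be discarded.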

Now I started two concurrent sessions with iperf3, and both transported an equal amount of data.

This is the new Zynq driver

