memcpy in ISR

Posted by michaeln32 on August 22, 2017


Is it OK to use memcpy() in an ISR?

Is it OK to use strcpy() in an ISR?


Posted by hs2sf on August 22, 2017

Why not? Both are stateless (reentrant) functions.

Posted by michaeln32 on August 22, 2017

Hi HS2,

Thanks for your answer.

I think that the execution time of memcpy (or strcpy) is unknown, so it is better not to use it in an ISR.

Am I right or wrong?


Posted by davidbrown on August 22, 2017

The execution time might be unknown to you, but it is certainly clear and deterministic. A simple memcpy() implementation will copy the given number of characters, one by one. You have the call overhead, and you have the loop for each character - the loop count is known when you call memcpy(). With strcpy(), the loop count is the length of the string, which may or may not be known by you.
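As a rough illustration of the point about loop counts, here is a minimal sketch of the simplest possible implementations. The names `naive_memcpy` and `naive_strcpy` are made up for this example; real library versions are usually far more elaborate.

```c
#include <stddef.h>

/* Hypothetical name: a naive byte-by-byte copy, not the library memcpy().
   The loop runs exactly n times, so the cost is fixed by the caller. */
void *naive_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
    }
    return dest;
}

/* Hypothetical name: a naive string copy. The number of bytes written is
   the string length plus one for the terminator, so it depends on the data. */
char *naive_strcpy(char *dest, const char *src)
{
    size_t i = 0;

    while ((dest[i] = src[i]) != '\0') {
        i++;
    }
    return dest;
}
```

With the first function the caller chooses the worst-case time by choosing `n`; with the second, the worst case is only bounded if the caller already knows a limit on the string length.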

If you have a more sophisticated compiler and enable optimisations, then memcpy is often inlined. If the count is known at compile time, and alignments are known to the compiler, then it can use larger moves than characters, unroll the loop, and in some cases it can omit the memcpy altogether if it can see it is unnecessary.

Posted by richard_damon on August 22, 2017

memcpy/strcpy for small buffers shouldn't be a time issue; if you need to move the data, it is probably the fastest option.

The one issue is that on some processors and with some compilers, memcpy might be placed inline with code that uses floating point registers. There are a few corner cases where the ABI treats the FP registers as 'caller save' (called functions are normally allowed to trash these registers), but the standard interrupt prolog doesn't save them, because ISRs aren't expected to use floating point. (I believe this is the case with the Cortex-M4F.) There tends to be an option to prevent this optimization.

Posted by gezab on August 22, 2017

Check the implementation of your standard library; there are some variants that are not reentrant. However, even in that case you can still use the above functions through wrappers that run them inside critical sections.
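A minimal sketch of such a wrapper, under stated assumptions: `enter_critical()`, `exit_critical()` and `guarded_memcpy()` are hypothetical names, and the two critical-section helpers are host-side stand-ins that only count nesting. In a FreeRTOS ISR the equivalent would typically be the `taskENTER_CRITICAL_FROM_ISR()` / `taskEXIT_CRITICAL_FROM_ISR()` pair.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the port's interrupt masking. They only track
   nesting so that this sketch compiles and runs on a host machine. */
static unsigned int critical_nesting = 0;

static void enter_critical(void) { critical_nesting++; }
static void exit_critical(void)  { critical_nesting--; }

/* Hypothetical wrapper: run the copy with interrupts masked so that a
   non-reentrant library routine cannot be re-entered part-way through. */
void *guarded_memcpy(void *dest, const void *src, size_t n)
{
    enter_critical();
    void *result = memcpy(dest, src, n);
    exit_critical();
    return result;
}
```

Masking interrupts for the whole copy adds to interrupt latency, so this only seems reasonable for small buffers.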


Posted by hs2sf on August 22, 2017

I fully agree with David. Both functions are used very often, are implemented in an optimized way and get some special support from compilers. You won't do better by rolling your own copy routines. Runtime behaviour is fixed/deterministic and depends only on the input parameters, since there is no internal locking and there are no unknown code paths.

Posted by michaeln32 on August 22, 2017

Ok. Thank you very much !

Posted by rtel on August 22, 2017

This is an interesting one, and the answer is not as straightforward as some of the posts in this thread make out. It is something we have often considered ourselves, and we concluded that, as you don't know in advance how many bytes are to be copied, the best and most efficient approach in the general case is to call the standard C memcpy() function.

memcpy() needs to be efficient, so its implementations are normally intricate and optimised for moving large amounts of data. That means they are not necessarily optimised for moving small amounts of data, where a byte-by-byte copy would be the most efficient. The need for efficiency, and the resulting intricacy, means assembly language is used (even required), along with an excellent knowledge of the hardware architecture and its characteristics. That means that, unless you know in advance the maximum number of bytes to be copied, it is extremely unlikely you will come up with a more efficient general memcpy() algorithm.

memcpy will typically perform byte copies to get the to/from addresses word aligned, then word copies until the to/from addresses are aligned to the requirements of any other more efficient move implementation that might be available on the architecture in question. That might be moving data by using push multiple and pop multiple instructions, or, as Richard Damon noted in this thread already, using wide floating point registers.
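To make the phases concrete, here is a small sketch that only computes how a copy would be split up; it does not move any data. `copy_plan_t` and `plan_copy()` are hypothetical names, a 32-bit word is assumed, and the fallback to pure byte copies when the two addresses are misaligned by different amounts is a simplification (real implementations often handle that case with shifts or unaligned loads).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of how a copy of n bytes splits into phases. */
typedef struct {
    size_t head_bytes;  /* byte copies until the destination is word aligned */
    size_t words;       /* 32-bit word copies */
    size_t tail_bytes;  /* remaining byte copies */
} copy_plan_t;

copy_plan_t plan_copy(uintptr_t dest, uintptr_t src, size_t n)
{
    copy_plan_t plan = { 0, 0, 0 };
    const uintptr_t mask = sizeof(uint32_t) - 1u;

    /* If the two addresses are misaligned by different amounts they can
       never both be word aligned, so this sketch falls back to bytes. */
    if (((dest ^ src) & mask) != 0u) {
        plan.tail_bytes = n;
        return plan;
    }

    /* Bytes needed to reach the next word boundary (0 if already aligned). */
    size_t head = (size_t)((sizeof(uint32_t) - (dest & mask)) & mask);
    if (head > n) {
        head = n;
    }

    plan.head_bytes = head;
    plan.words = (n - head) / sizeof(uint32_t);
    plan.tail_bytes = (n - head) % sizeof(uint32_t);
    return plan;
}
```

For example, copying 10 bytes between addresses that both end in 1 gives three head bytes, one word and three tail bytes, which is why a hand-written byte loop can win for very small copies.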

Using floating point registers is where it gets interesting, and gets back to Michael's original question about if it is ok to use memcpy() in an ISR.

Some FreeRTOS ports require the application writer to specify which FreeRTOS tasks use floating point registers, and FreeRTOS will then only store a floating point context for those tasks (because floating point contexts can be very expensive in memory and time). However, if the standard C library uses floating point registers for memory operations then every task will need a floating point context, and if floating point registers are used in an ISR, then each ISR will need to save/restore the floating point registers too. Luckily I have only seen this be a problem once, and never on a Cortex-M.

Posted by michaeln32 on August 22, 2017

Now I am a little bit confused.

I understand that it is not 100% safe to use memcpy in an ISR.

Would it be better to copy memory in the ISR in the following way (instead of using memcpy)?

    void ISR(void)
    {
        for (int i = 0; i < 200; i++)
            buff1[i] = buff2[i];
    }

Posted by rtel on August 22, 2017

Which architecture and compiler are you using?

Posted by michaeln32 on August 22, 2017

STM micro controller - STM32L433.

Compiler - TrueSTUDIO - Atollic.

Posted by rtel on August 22, 2017

In which case I would say I am 99.9% sure using memcpy() will be fine.

Posted by michaeln32 on August 22, 2017

Thanks !

Posted by davidbrown on August 23, 2017

Yes, with gcc on a Cortex-M, memcpy will either be done inline or use a fully re-entrant library call. It will be safe to use in an interrupt.

As always when you are looking for efficient code (and you always want efficient code in interrupts), make sure optimisation is enabled, and give the compiler as much information as you can. memcpy will be more efficient if the size of the copy is known at compile time, and if your source and destination are nicely aligned then the compiler can use 16-bit or 32-bit transfers rather than doing everything byte by byte.
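As a sketch of what "giving the compiler information" can look like: the copy below has a size fixed at compile time and operates on typed objects, so their alignment is known too. The names `sample_t`, `latest_sample`, `sample_isr_copy()` and `get_latest_sample()` are invented for illustration, and whether the call is actually inlined depends on the compiler and optimisation level.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sample record; every name here is illustrative. */
typedef struct {
    uint32_t timestamp;
    uint16_t channel[4];
} sample_t;

static sample_t latest_sample;   /* written from the ISR */

/* Hypothetical ISR body. sizeof latest_sample is a compile-time constant
   and both objects have the natural alignment of sample_t, which gives an
   optimising compiler the chance to use a few wide moves instead of a
   library call. */
void sample_isr_copy(const sample_t *hw_buffer)
{
    memcpy(&latest_sample, hw_buffer, sizeof latest_sample);
}

const sample_t *get_latest_sample(void)
{
    return &latest_sample;
}
```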

Posted by heinbali01 on August 23, 2017

Hi Michael, lots of responses about a simple memcpy(); apparently it is an interesting subject.

Why were you asking the question: out of theoretical interest, or did you encounter a problem? Did you see instabilities or crashes?

In case you do encounter problems, you might want to try the attached module memcpy.c (see below). It is pretty well optimised and it is absolutely ISR-safe. The attached memcpy.c is part of the FreeRTOS/plus release.

There is also the "automatic inlining" of memcpy(), which happens when the actual length is small and known at compile time. Please be aware that compilers sometimes make erroneous assumptions about the alignment.

This memcpy():

~~~~
memcpy( target, source, 4 );
~~~~

may not always be replaced with:

~~~~
*( ( uint32_t * ) target ) = *( ( uint32_t * ) source );
~~~~

I have seen crashes ( exceptions ) because of this. A memcpy() function is smart about alignment. It will test the memory locations of both source and target. GCC has the -fno-builtin-memcpy option which will avoid automatic in-lining. I tend to use it ( and also -fno-builtin-memset ) in all of my projects.
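A minimal sketch of the difference between the two forms: going through memcpy() with untyped pointers makes no alignment promise, whereas the cast to a `uint32_t *` asserts 4-byte alignment. `load_u32()` and `store_u32()` are hypothetical helper names, not part of any library.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from an address that may not be
   4-byte aligned. Because p is an untyped pointer, the compiler cannot
   assume any alignment and has to generate code that is safe either way. */
uint32_t load_u32(const void *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Hypothetical helper: the matching store. */
void store_u32(void *p, uint32_t value)
{
    memcpy(p, &value, sizeof value);
}
```

On targets that tolerate unaligned access these helpers usually compile down to a single load or store when optimisation is enabled; on stricter targets the compiler is expected to fall back to byte accesses.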

And if you ask me: I would try to avoid massive memory copies from within an ISR :-) Good luck.


memcpy.c (9849 bytes)

Posted by davidbrown on August 24, 2017

The large memcpy implementation here is unlikely to be inlined automatically on many compilers, even when the actual length and alignments are known at compile time. More sophisticated compilers (like later versions of gcc) will do the constant propagation first, then see that the resulting function is short enough to inline. With less sophisticated automatic inlining, the compiler will see the size of the full memcpy() function and decide it is too big to inline. And inlining will not occur anyway if the compiler does not have the source of memcpy() on hand when it is used.

gcc's builtin memcpy will inline correctly and optimally when information is known at compile time. It will do a better job than you will get with this memcpy() implementation.

Additionally, gcc's builtin (and library) memcpy is correct. Your one here has a fundamental error. It is not defined behaviour to access data via pointers of incompatible types. If this memcpy is called with sources or destinations that are, say, 16-bit types, then you are not allowed to access the data as 32-bit types. A union like this does not give you that ability - the compiler knows that the pointers involved cannot alias, and it can assume that the 16-bit data is not affected by any 32-bit writes. Moreover, it can make the same assumption about incompatible types that are the same size - if "uint32_t" is a typedef for "unsigned long", then it is incompatible with "unsigned int" even if that also is 32 bits.

As long as the function is compiled and called as a separate function, this type-based aliasing information will be lost and thus the compiler will do the copying. But if it is inlined (or you have link-time optimisation enabled), then the compiler can use this aliasing information to skip memory accesses that it knows cannot legally happen.

So how do you reliably and correctly copy chunks of data in C? There are three ways. One is to use character pointers and do it byte for byte - such accesses are always allowed to alias. Another is to use implementation-specific techniques, such as gcc's "may_alias" type attribute. The standard method is to use "memcpy".

Remember, the memcpy that comes with the implementation is guaranteed to be correct for that implementation - it can use whatever tricks needed to avoid any aliasing issues even if it copies in larger lumps.

Note that when you use memcpy on gcc without the -fno-builtin-memcpy flag, gcc will generate inline code when appropriate. This inlining can be so efficient that it is removed entirely, or done as simple register-to-register movement. It means that memcpy() can be used to make code clear, correct and safe, without worrying about efficiency, rather than using unions or pointer casts that often are not fully safe (like in your code here).
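A small sketch of that use of memcpy() for type punning, assuming a 32-bit `float` (checked at compile time) and, for the value in the usage note, IEEE 754 single precision. `float_bits()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide on this target. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: return the object representation of a float without
   a pointer cast or a union, so no aliasing rule is broken. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On an IEEE 754 target, `float_bits(1.0f)` gives `0x3F800000`. With optimisation enabled, gcc would typically reduce the memcpy() here to a register move or nothing at all.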

Finally, remember that even if you implement your own memcpy(), and even if you use the -fno-builtin-memcpy flag, the compiler is free to assume that memcpy works exactly according to the standards. If you write this:

    static uint32_t memcpyBytes;

    void *memcpy( void *pvDest, const void *pvSource, size_t ulBytes )
    {
        memcpyBytes += ulBytes;
        // rest of implementation
    }

    static uint8_t d[100];
    static uint8_t s[100];

    uint32_t test(size_t n)
    {
        memcpyBytes = 123;
        memcpy(d, s, n);
        return memcpyBytes;
    }

then the compiler is free to return the fixed value 123 from this test() function. It knows memcpy cannot affect the value of memcpyBytes.

(As a side issue, the correct signature for memcpy() has "restrict" in the pointers.)

Posted by davidbrown on August 24, 2017

The summary here is: use the compiler's memcpy. It is safe and efficient.

Do not disable its useful and safe optimisations. Do not mess with defining your own versions of such standard library functions. They will not be better than the implementation's versions, and they may have subtle risks. It may be a different matter if you are using a poor quality or limited compiler, but that is not the case here.

Posted by hs2sf on August 24, 2017

Again - I couldn't agree more. Great posts David !!

Posted by davidbrown on August 24, 2017

C pedantry is a hobby of mine - I am glad it can sometimes be of interest or use to others.

Posted by rtel on August 24, 2017

David - love your post - I guess that makes me a geek ;o) If C pedantry is a hobby then I recommend trying LLVM, if you are not already a user.

You do however paint a picture of a perfect world. There was one particular platform I used that had two huge compiler bugs. I forget exactly which it was but it might have been a GCC IA32 COFF compiler (although maybe not as alignment was one of the issues).

One of the issues was that __attribute__((packed)) didn't work. It didn't generate any compiler errors or warnings; it just didn't work. It took a while to track down, as I was compiling code I trusted because it had run on many architectures.

Relevant to this thread, though, was that memcpy() was broken and caused faults due to alignment. That also took time to track down, as it was assumed to be a good implementation.

We had a really hard time fixing that, hence Hein's references to the compiler options that prevent GCC using its inlined version. We provided our own implementation, but the only way we could get GCC to use our own in all cases was to explicitly tell it not to use its own built-in equivalent. I also learned then that it was particularly clever in noticing the signature of the memcpy() function: even when we called ours something else, it would sometimes replace it with its own version. Grr.

Sometimes I wonder if, considering the laws of diminishing returns, extracting the last n'th of optimization by pedantic scrutiny of the C standard [by compiler vendors] is worth the hassle it causes.

Posted by heinbali01 on August 25, 2017

David, thanks for your comments and insights!

I also found incorrect in-lining behaviour for both memcpy() and memset() in the GCC AVR32 cross compiler. I must admit that it was a couple of years ago (time flies). My application called the functions with a constant length, a multiple of 4. The compiler inserted 32-bit memory moves. When I saw the bogus assembly code (sh...t!), I decided to use the mentioned compiler options.

I hope that this bug has been solved in all current releases of all compilers that we use to test the /Labs demos :-)

There was another case where the standard GCC implementation of memcpy() caused headaches: on the Cortex-A9 (Richard mentioned it briefly above). That implementation uses 64-bit floating point registers. But by default, the FPU registers are not stored on the stack as part of the task context (because storing FPU registers on the stack is very expensive). At that point we decided to supply a memcpy.c that only uses standard registers.

(Before writing memcpy.c, I studied the assembler sources of many GCC implementations.)

I was not suggesting that the generic /Labs implementation in memcpy.c is "better", or "to be preferred". I wrote that if the standard memcpy() is suspect, give it a try with this hand-made memcpy(), to see if the problem gets solved.

Posted by hs2sf on August 25, 2017

In fact, the GCC implementation of memcpy is also port specific. A (cross) GCC port has to make assumptions about the target runtime environment/EABI. So the GCC port used, e.g., for a bare-metal Cortex-A9 application was obviously built with the assumption (and GCC build config) that it is allowed to use FP registers, e.g. in memcpy, which might not be compatible with a specific runtime environment (e.g. FreeRTOS) that the GCC toolchain was not really built for. GCC bug or feature? As always, things get difficult when diggin' into details ;) Sometimes it's worth checking the compiler's built-in config before being trapped by those pretty nasty compatibility issues.

For sure there were and are bugs even in compilers. It's just software, but luckily very, very, very well tested :) In my experience (nowadays), compiler bugs reported by people are quite often user code bugs or misunderstandings of the C/C++ standard, and sometimes target compatibility issues.

Posted by davidbrown on August 25, 2017

Yes, I understand what you mean about compiler bugs and perfect worlds. Compiler bugs can be a real pain! It is particularly difficult for you folks writing something like FreeRTOS, where you need to have code that works on a wide range of platforms. It is a lot easier for the user who only needs it to work on their particular compiler version and target. I make a point of considering the exact toolchain version as part of the build for a project: once I have used a version of gcc + libraries (or whatever other compiler), that is archived along with the project. Clearly that is a luxury you don't have for the FreeRTOS source that works on dozens of targets and many more toolchains and versions.

Correctness always trumps speed. (Of course I am generalising here, and speed may also be part of making the device run correctly according to requirements.) It is better to have a simple but known safe solution, than a complex one that might be faster, but may have problems.

It is also not easy writing optimal code, and not easy writing portable code - and pretty much impossible to write optimal portable code! On gcc, memcpy() is usually very efficient, especially when it can use the builtin version. But messing with casted pointers to move data in larger chunks can easily fall foul of aliasing rules and end up with code that the compiler happily accepts but the results don't match expectations. On other platforms, such manual casting might be the fastest method while memcpy() gives slow library calls. There really is no easy answer.

When considering compiler options, you might like to recommend "-fno-strict-aliasing" in gcc. That makes it safe to use pointers of different sizes and types to access other data, at the cost of some optimisations.

(Regarding __attribute__((packed)) - I am not a fan of this, and prefer to avoid it, partly because compiler bugs have been known with packed structs. I'd rather add any padding manually with "dummy" fields, check none are missing with "-Wpadded", and check the size of types using static assertions. Again, this is easier when you don't need to try to be portable on a range of targets.)
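A sketch of that approach with an invented structure: `packet_header_t` and its fields are hypothetical, and the expected size and offsets assume a target where `uint16_t` is 2-byte aligned and `uint32_t` is 4-byte aligned. The assertions fail at compile time if the layout is not what was intended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire-format header with its padding written out by hand
   instead of relying on __attribute__((packed)). */
typedef struct {
    uint8_t  type;
    uint8_t  reserved0;   /* explicit "dummy" field where padding would go */
    uint16_t length;
    uint32_t sequence;
} packet_header_t;

/* Compile-time checks that no hidden padding crept in. */
static_assert(sizeof(packet_header_t) == 8, "unexpected padding in packet_header_t");
static_assert(offsetof(packet_header_t, length) == 2, "length at unexpected offset");
static_assert(offsetof(packet_header_t, sequence) == 4, "sequence at unexpected offset");
```

Building with "-Wpadded" on gcc would additionally warn if a dummy field were missing.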

Posted by davidbrown on August 25, 2017

I've only used the AVR32 briefly, but I think it did not allow unaligned accesses. So if memcpy() is doing that, it's a bug. And compiler bugs are a pain :-(

As for using things like floating point registers, that is a more tricky area. I can well appreciate that it can be a headache - but it is not a compiler bug as such. The compiler's job is to generate code for the cpu. If that cpu has floating point registers, then it can use them. You might be able to control things to some extent, such as with compiler options, but it is very difficult to try to say "I want to use floating point registers for these things but not those things." That should usually be a choice left to the compiler. Sometimes non-floating point code can be made more efficient using floating point registers. Similar issues can apply to other bits of hardware - the msp430's hardware multiplier peripheral springs to mind.

For the ARM devices, lazy context switching of the floating point registers can be an answer. (Yes, I know that's easy for me to say, and far from easy to implement - especially in a way that avoids too much wasted ram space.)
