Unsafe Functions In C And Their Safer Replacements: Strings Part I


Strings are a fundamental part of the programs all around us. Much data exchange happens as strings: user input, command-line arguments, web forms, text protocols and so on. Yet many programs written in C are plagued by security issues because they use unsafe string functions. A string is not a built-in data type in C; instead, it is defined as a contiguous sequence of characters terminated by a NUL character ('\0'). Many of the “standard” string manipulation functions written in the early days of C took this definition to heart, assumed that the programmer always knows what he is doing (though I agree that this MUST be true), and shipped code meant for an everyone-is-good world. The shortcomings were noticed later and stronger sibling functions were created, but the older ones are still supported because they are “standard”. This means that naive programmers continue to use them and put their programs’ security in jeopardy. This series will analyze such unsafe functions in depth, explain why they are unsafe, and show which alternatives are already available and which you can create yourself.

Our first candidate is the very famous “strcpy()”. Let’s see why it is unsafe.

char *strcpy(char *dest, const char *src);

Its basic implementation is:

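char *strcpy(char *dest, const char *src)
{
  char *start = dest;     /* strcpy returns the original destination pointer */
  while('\0' != *src)     /* compare against the NUL character, not the NULL pointer */
  {
    *dest++ = *src++;
  }
  *dest = '\0';           /* terminate the copy; this also handles an empty source */
  return start;
}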

Basically, this function copies characters one by one from the location src points to into the location dest points to, until it encounters the NUL terminator in the source. Now, I cannot stress this enough: Consider every source as tainted.

Here, as well, we immediately see the problem. What if there is no NUL terminator in the source string? Or what if the string is too long to fit in the destination? strcpy wouldn’t bat an eyelid before overwriting the memory that comes after the end of the destination. Hmm, then what do we do? Idea!! Let’s limit the number of characters that are copied, and thus “strncpy()” comes into the picture:

char *strncpy(char *dest, const char *src, size_t n);

Here, the parameter “n” is the maximum number of characters to copy. One would think that this makes the code safe because no one can put in an arbitrarily long string and get away with it. WRONG!! It is still flawed because:

  • If the source string is “n” characters or longer, this function copies only n bytes and does not null-terminate the destination. Hence, it may result in crashes / buffer overflows when you try to “use” the destination.
  • It is very easy to miscalculate “n” (strlen idiosyncrasies and integer overflows, but that’s for another post), so the number of bytes to be copied can often turn out to be larger than the actual size of the destination buffer.
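
Both pitfalls are easy to see in a small experiment. The following is a minimal sketch, not a definitive fix: the helper names is_terminated and copy_terminated are hypothetical (they are not part of the C library), and copy_terminated assumes that src is itself NUL-terminated and that dest_size is at least 1.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first n bytes of buf contain a
 * NUL terminator, 0 otherwise. */
static int is_terminated(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Hypothetical helper: copy src into dest (capacity dest_size, which must
 * be at least 1) with strncpy, then terminate explicitly.
 * Assumes src is NUL-terminated.
 * Returns 1 if src fit completely, 0 if it was truncated. */
static int copy_terminated(char *dest, size_t dest_size, const char *src)
{
    strncpy(dest, src, dest_size - 1);  /* leave room for the terminator */
    dest[dest_size - 1] = '\0';         /* strncpy will not do this for us */
    return strlen(src) < dest_size;
}
```

With a 4-byte buffer, strncpy(buf, "abcdef", 4) fills all four bytes and leaves no terminator, so is_terminated(buf, 4) returns 0. By contrast, copy_terminated(buf, 4, "abcdef") stores "abc" and returns 0 to signal the truncation.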

What should you use then? Answer: “strlcpy()”.

size_t strlcpy(char *dst, const char *src, size_t size);

The differences here from strncpy are:

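  1. It takes the “size of the destination” instead of the “number of bytes to be copied”.
  2. It returns the length of the source string, i.e. the length of the string it tried to create, so a return value greater than or equal to “size” tells you that the copy was truncated.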

The benefits this function gives us are:

  • It NEVER writes out of bounds of the destination buffer, provided “size” really is the size of the destination.
  • It properly null-terminates the string (as long as “size” is not zero).

So, this would pretty much save your skin because it will nullify the impact of “bad” source strings.
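
Since strlcpy() is not part of ISO C and is not available on every platform, here is a rough sketch of a stand-in with the same contract. The name my_strlcpy is hypothetical, and this is only an illustration of the semantics described above, not the BSD implementation. Like the real function, it assumes that src is NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical portable stand-in with strlcpy-like semantics.
 * Copies at most size - 1 bytes of src into dst and NUL-terminates the
 * result whenever size > 0. Returns strlen(src), so a return value
 * greater than or equal to size means the copy was truncated. */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);   /* src must be NUL-terminated */

    if (size > 0) {
        /* Copy whichever is smaller: the whole string or what fits. */
        size_t n = (src_len < size - 1) ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

A caller would typically write something like if (my_strlcpy(buf, input, sizeof buf) >= sizeof buf) and treat that branch as "input was truncated", instead of silently carrying on with a shortened string.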

But are you completely covered? No. There are still a few issues left that we will discuss in the coming weeks (amongst other things). One is how to “get” the string from the user safely in the first place. Another is what happens when the source and destination buffers overlap. Stay tuned to learn about these and much more, and don’t forget to write in if you want to know about something specific.

© Safer Code | Unsafe Functions In C And Their Safer Replacements: Strings Part I


7 Comments

  • Mark says:

    I know this is an aside from your point but the line:

    while(NULL != *src)

    is incorrect because NULL is a special pointer value indicating the pointer has no value. You are comparing a pointer type with a char type – although it will often compile correctly if NULL is defined as 0. It won’t compile if NULL is defined as ((void *)0). You mean to use '\0' (often hash defined as NUL), which is the character that terminates a C string.

  • Mark, you are correct. Thanks for pointing out the oversight on my part.

  • Dummy00001 says:

    strlcpy() is non-standard and non-portable. Thanks to erratic *BSD folks, there are already at least two variants in the wild.

    There is already a better, standard, safe function serving the purpose perfectly – snprintf().

    Instead of strcpy(a,b) – snprintf(a,sizeof(a),"%s",b). Added bonus – snprintf( a, sizeof(a), "%.*s", len_of_b, b ) – becomes critical when handling intermediate non-0 terminated strings.

    Trivial snprintf() does miracles when you code safe string operations. For those in know – it does that for like ages now.

  • norbert says:

    “Trivial snprintf() does miracles when you code safe string operations. For those in know – it does that for like ages now.”

    the printf family of functions is everything but trivial. It is surely convenient, but the proposed abuse falls into the category of ‘when all you have is a hammer, everything is a nail’.

    strncpy is usually enough

    char buffer[MAX_SIZE + 1];
    strncpy(buffer, input, MAX_SIZE);
    buffer[MAX_SIZE] = 0;

    but more careful is
    if((size = strnlen(input, max_size)) < max_size)
    {
        memcpy(buffer, input, size);
        buffer[size] = 0;
    }
    else
    {
        /* error processing */
    }

    For info:
    #include <stdio.h>   /* snprintf */
    #include <string.h>  /* strnlen, memcpy */

    int main(int argc, char**argv)
    {
        char* source = argv[1];
        char buffer[200];
        int i;
        int size;

        if(argc > 2 && *argv[2] == 's')
        {
            /* snprintf variant */
            for(i = 0; i < 100000000; i++)
            {
                snprintf(buffer, sizeof(buffer), "%s", source);
                source[10] |= i;
            }
        }
        else if(source)
        {
            /* strnlen + memcpy variant */
            for(i = 0; i < 100000000; i++)
            {
                if((size = strnlen(source, 15)) < 15)
                {
                    memcpy(buffer, source, size);
                    buffer[size] = 0;
                    source[10] |= i;
                }
            }
        }
        return 0;
    }

    $ time ./a.out "foo"

    real 0m2.214s
    user 0m2.210s
    sys 0m0.000s
    $ time ./a.out "foo" "s"

    real 0m12.675s
    user 0m12.593s
    sys 0m0.000s

    the printf version is 6 times!!! more expensive.

    If that kind of performance waste is irrelevant to your program (and it could very well be), maybe you should not write it in C….

  • Zombie No. 5 says:

    Dummy00001: You sure know how to reduce your credibility to zero and make people not listen to you.

    “*BSD folks” are no more erratic than GNU, Linux or Solaris folks. In fact, they tend to be much more conservative and emphasize clean design. After all, GNU and Linux are nothing but a glorified BSD clone. It’s easy to avoid mistakes by taking advantage of decades of experience. Not that this would be automatic. Actually the GNU and Linux folks repeated a lot of mistakes they could have avoided by learning from BSD history.

    Anyway, there are actually two versions of snprintf(). A newer POSIX variant and one that has been provided by Solaris as an extension long before. The latter returns the resulting – possibly truncated – string length, the same value you get by calling strlen() on the result. The POSIX variant returns the length the string would have had if the buffer had been sufficient. So on truncation you could use that value to allocate a sufficient buffer and then repeat the call with this buffer to get an untruncated string.

    Also note that “%.*s” takes an “int” and not a size_t as its precision argument, and snprintf() also returns an int. So it is limited to strings of a maximum length of INT_MAX. That is something many 32-bit developers tend to ignore, but it is an easily exploitable vulnerability nowadays on 64-bit systems with many gibibytes of RAM.

    Last but not least, the printf family is a rather high-level interface and has – considering the alternatives – a lot of overhead internally.

    In my opinion, a professional seasoned C developer will rarely rely on these silly C library functions except in small, trivial pieces of code. Rather they’ll use – or even roll their own – string library that takes care of memory allocation, truncation prevention and optimizations. Also strlcpy(), strncpy() etc. usually have subtle semantics – which you know if you bother reading their specifications – that are not desirable in many contexts when you actually use them. Examples:

    strncpy() pads the rest of the destination buffer with NULs but does not guarantee NUL-termination. strncat() DOES guarantee NUL-termination. The padding may seem stupid and inefficient, but it makes a lot of sense in the intended contexts. Nobody ever claimed these are general purpose functions. Hence, don’t be shy to roll your own functions which make more sense in your context. Of course, make sure you specify them cleanly, including corner cases, and test them. In C, strings are not something abstract, which is both an advantage and a disadvantage.

    strlcpy() can’t accept non-terminated data as the source string and will also traverse the whole source string even if only a tiny part is copied. The latter makes it inefficient and the former even insecure in many contexts.

  • Zombie No. 5 says:

    strcpy() is certainly not implemented as you claim because it actually works for empty strings (“”) unlike your code.
