Wednesday, October 19, 2016

Why the warning "implicit declaration of function" in C should really draw your attention

If a C program calls a function "foo" without including the header that declares it, the code may still compile, but the compiler assigns foo a default type:
int foo();
i.e., a function taking an unspecified list of arguments and returning an int.

If foo, as defined in its library, really does return an int, the program will usually work (e.g. printf, strcmp, ...). But here is an example of what happens when the function returns something else, such as the unsigned long long int returned by strtoull.

Sample code:
#include <stdio.h>
#include <inttypes.h>

// omitting the header declaring strtoull
// #include <stdlib.h>

int main (int argc, char *argv[]) {
    char *str_number = "3000111000";
    uint64_t an_uint64 = 0;
    an_uint64 = strtoull(str_number, NULL, 10);
    printf("value: %" PRIu64 "\n", an_uint64);
    return 0;
}

Let's compile this code on an x86_64 Linux machine. As expected, gcc displays a warning because it can't find any declaration for strtoull:
jma@antec:~/tmp$ gcc --std=c99 test.c -o test
test.c: In function ‘main’:
test.c:10:2: warning: implicit declaration of function ‘strtoull’ [-Wimplicit-function-declaration]
  an_uint64 = strtoull(str_number, NULL, 10);
  ^
Nevertheless, the program compiles and we can run the resulting executable:
jma@antec:~/tmp$ ./test
value: 18446744072414695320
The program does not work as expected. The output we wanted was:
value: 3000111000

So what happened?
gcc assumes the return value of strtoull sits in a 32-bit register (%eax), because its inferred type says that it returns an int (32 bits). But that value has to be stored in "an_uint64", which is a uint64_t. So gcc emits an instruction (cltq) that sign-extends the 32-bit %eax into the 64-bit %rax:
(gdb) disas
Dump of assembler code for function main:
   0x0000000000400556 <+0>: push   %rbp
   0x0000000000400557 <+1>: mov    %rsp,%rbp
   0x000000000040055a <+4>: sub    $0x20,%rsp
   0x000000000040055e <+8>: mov    %edi,-0x14(%rbp)
   0x0000000000400561 <+11>: mov    %rsi,-0x20(%rbp)
   0x0000000000400565 <+15>: movq   $0x400634,-0x8(%rbp)
   0x000000000040056d <+23>: mov    -0x8(%rbp),%rax
   0x0000000000400571 <+27>: mov    $0xa,%edx
   0x0000000000400576 <+32>: mov    $0x0,%esi
   0x000000000040057b <+37>: mov    %rax,%rdi
   0x000000000040057e <+40>: mov    $0x0,%eax
   0x0000000000400583 <+45>: callq  0x400440 <strtoull@plt>
=> 0x0000000000400588 <+50>: cltq   
   0x000000000040058a <+52>: mov    %rax,-0x10(%rbp)
   0x000000000040058e <+56>: mov    -0x10(%rbp),%rax
   0x0000000000400592 <+60>: mov    %rax,%rsi
   0x0000000000400595 <+63>: mov    $0x40063f,%edi
   0x000000000040059a <+68>: mov    $0x0,%eax
   0x000000000040059f <+73>: callq  0x400420 <printf@plt>
   0x00000000004005a4 <+78>: mov    $0x0,%eax
   0x00000000004005a9 <+83>: leaveq 
   0x00000000004005aa <+84>: retq
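The expected result was in %rax before cltq executed. Because bit 31 of 0xb2d20f98 is set, the sign extension performed by cltq then fills the upper 32 bits with ones: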
(gdb) info register rax
rax            0xb2d20f98 3000111000
(gdb) stepi
0x000000000040058a in main ()
(gdb) info register rax
rax            0xffffffffb2d20f98 -1294856296
(gdb)
When we include the header that declares strtoull, gcc generates assembly without the cltq instruction, because it knows the function returns a 64-bit value that already occupies all of %rax.

Conclusion:
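Even if it might look "convenient" that gcc declares a function automatically, relying on it is not robust and should be avoided. Treat this warning as an error (for example with gcc's -Werror=implicit-function-declaration option) and include the proper header.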