Ticket #39174

Invalid code generated for extern thread_local variables

Open Date: 2019-04-29 19:23 Last Update: 2019-04-29 22:25

Reporter:
Owner: (None)
Type:
Status: Closed
Component: (None)
MileStone: (None)
Priority: 5 - Medium
Severity: 5 - Medium
Resolution: None
File: 1

Details

The compiler produces strange code when using C++11 global thread-local variables. In some circumstances the resulting binary also crashes consistently. The problem is visible at all optimization levels above -O0; a binary compiled with -O0 looks fine.

I use the MSYS2 build environment with gcc 8.3.0 installed. Extract the attached minimal example and run the following commands:

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make

Then inspect the produced code with GDB:

gdb sample.exe

(gdb) disas main
Dump of assembler code for function main():
   0x0000000000402d20 <+0>:     sub    $0x28,%rsp
   0x0000000000402d24 <+4>:     callq  0x401610 <__main>
   0x0000000000402d29 <+9>:     lea    -0x402d30(%rip),%rax        # 0x0
   0x0000000000402d30 <+16>:    test   %rax,%rax
   0x0000000000402d33 <+19>:    je     0x402d3a <main()+26>
   0x0000000000402d35 <+21>:    callq  0x0
   0x0000000000402d3a <+26>:    mov    0x160f(%rip),%rcx        # 0x404350 <.refptr.__emutls_v.tl>
   0x0000000000402d41 <+33>:    callq  0x402af0 <__emutls_get_address>
   0x0000000000402d46 <+38>:    movl   $0x2a,(%rax)
   0x0000000000402d4c <+44>:    xor    %eax,%eax
   0x0000000000402d4e <+46>:    add    $0x28,%rsp
   0x0000000000402d52 <+50>:    retq

The strange part is the instruction at <+9>, "lea -0x402d30(%rip),%rax", which computes an address. The result is tested against zero at <+16>, and if it is non-zero, a call to that address is executed at <+21>. In this particular example the result is zero, so it does not crash. In my real project, however, the result was non-zero and still invalid, which causes a crash. Unfortunately that case is not easy to reproduce; it was probably triggered after I statically linked my DLL with a large library (ffmpeg), at which point the calculated address became non-zero. Even in the attached example, the purpose of this address manipulation is unclear, and it looks like a code optimization bug. This is the crashing code from my project:

   0x000000007771425d <+13>:    lea    -0x6f254264(%rip),%rax        # 0x84c0000 
   0x0000000077714264 <+20>:    mov    %rcx,%rbx
   0x0000000077714267 <+23>:    mov    %rdx,%rdi
   0x000000007771426a <+26>:    mov    %r8,%rbp
   0x000000007771426d <+29>:    mov    %r9,%r12
   0x0000000077714270 <+32>:    test   %rax,%rax
   0x0000000077714273 <+35>:    je     0x7771427a <Func()+42>
   0x0000000077714275 <+37>:    callq  0x84c0000 // <<<<<<<< Crash here!
   0x000000007771427a <+42>:    mov    0x813e3f(%rip),%rcx        # 0x77f280c0 <.refptr.__emutls_v._ZN4java6jniEnvE>
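
The addresses GDB prints after "#" follow from x86-64 RIP-relative addressing: the operand's target is the address of the next instruction plus the signed displacement. The sketch below checks that arithmetic against the two optimized listings above; the helper name rip_relative_target is made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: effective address of a RIP-relative operand.
// next_insn_addr is the address of the instruction that follows the one
// being decoded; disp is its signed displacement.
constexpr std::uint64_t rip_relative_target(std::uint64_t next_insn_addr,
                                            std::int64_t disp) {
    // Unsigned arithmetic wraps modulo 2^64, as address arithmetic does.
    return next_insn_addr + static_cast<std::uint64_t>(disp);
}

// Sample binary: lea at 0x402d29, next instruction at 0x402d30.
static_assert(rip_relative_target(0x402d30, -0x402d30) == 0x0,
              "sample: computed address is zero, so the call is skipped");

// Crashing module: lea at 0x7771425d, next instruction at 0x77714264.
static_assert(rip_relative_target(0x77714264, -0x6f254264) == 0x84c0000,
              "project: computed address is non-zero, so the call is taken");
```

Both instructions have the same form; only the resulting value differs, which is why the test/je pair skips the call in the sample but not in the crashing module.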

Compare this with the code compiled with -O0:

cmake -DCMAKE_BUILD_TYPE=Debug ..
make
   
(gdb) disas main
Dump of assembler code for function main():
   0x0000000000401560 <+0>:     push   %rbp
   0x0000000000401561 <+1>:     mov    %rsp,%rbp
   0x0000000000401564 <+4>:     sub    $0x20,%rsp
   0x0000000000401568 <+8>:     callq  0x401640 <__main>
   0x000000000040156d <+13>:    callq  0x402d50 <_ZTW2tl> // "TLS wrapper function for tl"
   0x0000000000401572 <+18>:    movl   $0x2a,(%rax)
   0x0000000000401578 <+24>:    mov    $0x0,%eax
   0x000000000040157d <+29>:    add    $0x20,%rsp
   0x0000000000401581 <+33>:    pop    %rbp
   0x0000000000401582 <+34>:    retq
   
(gdb) disas _ZTW2tl
Dump of assembler code for function _ZTW2tl:
   0x0000000000402d50 <+0>:     push   %rbp
   0x0000000000402d51 <+1>:     mov    %rsp,%rbp
   0x0000000000402d54 <+4>:     sub    $0x20,%rsp
   0x0000000000402d58 <+8>:     mov    0x15a1(%rip),%rax        # 0x404300 <.refptr._ZTH2tl> //"TLS init function for tl"
   0x0000000000402d5f <+15>:    test   %rax,%rax
   0x0000000000402d62 <+18>:    je     0x402d69 <_ZTW2tl+25>
   0x0000000000402d64 <+20>:    callq  0x0
   0x0000000000402d69 <+25>:    mov    0x15e0(%rip),%rcx        # 0x404350 <.refptr.__emutls_v.tl>
   0x0000000000402d70 <+32>:    callq  0x402b20 <__emutls_get_address>
   0x0000000000402d75 <+37>:    add    $0x20,%rsp
   0x0000000000402d79 <+41>:    pop    %rbp
   0x0000000000402d7a <+42>:    retq

This shows that the TLS wrapper is not inlined in this case, and instead of computing an address, the code performs an actual memory read at <+8>, which looks more correct. The call target at <+20> is probably patched with the correct address when the module is loaded into a process. In my project it looks like this:

(gdb) disas _ZTWN4java6jniEnvE
Dump of assembler code for function _ZTWN4java6jniEnvE:
   0x000000005c4aa790 <+0>:     push   %rbp
   0x000000005c4aa791 <+1>:     mov    %rsp,%rbp
   0x000000005c4aa794 <+4>:     sub    $0x20,%rsp
   0x000000005c4aa798 <+8>:     mov    0xa1cf1(%rip),%rax        # 0x5c54c490 <.refptr._ZTHN4java6jniEnvE>
   0x000000005c4aa79f <+15>:    test   %rax,%rax
   0x000000005c4aa7a2 <+18>:    je     0x5c4aa7a9 <_ZTWN4java6jniEnvE+25>
   0x000000005c4aa7a4 <+20>:    callq  0xffffffffec9a0000
   0x000000005c4aa7a9 <+25>:    mov    0xa1ff0(%rip),%rcx        # 0x5c54c7a0 <.refptr.__emutls_v._ZN4java6jniEnvE>
   0x000000005c4aa7b0 <+32>:    callq  0x5c1a0268 <__emutls_get_address>
   0x000000005c4aa7b5 <+37>:    add    $0x20,%rsp
   0x000000005c4aa7b9 <+41>:    pop    %rbp
   0x000000005c4aa7ba <+42>:    retq

The debug build does not crash and works as expected, whereas builds compiled with -O1, -O2 or -O3 crash. It would be nice to know the reason for computing an address in the optimized build rather than reading a variable as the debug build does. This significantly changes application logic between release and debug builds, which is very suspicious. For now, it looks like an optimization bug.
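
The shape of the -O0 wrapper shown above (load a pointer from a slot, test it, optionally call through it, then return the variable's address) can be modeled in plain C++. This is only a rough model of what the listing shows, not GCC's implementation: the names tl_init_slot, fake_init, tl_wrapper and demo are hypothetical, the slot stands in for .refptr._ZTH2tl, and an ordinary thread_local stands in for the __emutls_get_address lookup.

```cpp
#include <cassert>

thread_local int tl;                  // stand-in for the ticket's variable

using TlsInitFn = void (*)();
TlsInitFn tl_init_slot = nullptr;     // models the .refptr._ZTH2tl slot (assumption)

int init_calls = 0;
void fake_init() { ++init_calls; }    // hypothetical init function for the demo

// Models the "TLS wrapper function for tl" as seen in the -O0 listing.
int* tl_wrapper() {
    TlsInitFn init = tl_init_slot;    // <+8>:  read the pointer from memory
    if (init != nullptr) {            // <+15>/<+18>: test and conditional jump
        init();                       // <+20>: call the init function if present
    }
    return &tl;                       // <+25>/<+32>: obtain the variable's address
}

void demo() {
    *tl_wrapper() = 42;               // slot is null, so the init call is skipped
    assert(tl == 42 && init_calls == 0);
    tl_init_slot = fake_init;         // now an init function is "present"
    tl_wrapper();
    assert(init_calls == 1);
}
```

In this model the decision depends on a value read from memory at run time, which matches the debug listing; the optimized listings instead derive the tested value from a lea computation.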

gcc -v
Using built-in specs.
COLLECT_GCC=C:\msys\mingw64\bin\gcc.exe
COLLECT_LTO_WRAPPER=C:/msys/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../gcc-8.3.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=ada,c,lto,c++,objc,obj-c++,fortran --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-filesystem-ts=yes --enable-libstdcxx-time=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev2, Built by MSYS2 project' --with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
Thread model: posix
gcc version 8.3.0 (Rev2, Built by MSYS2 project)

uname -a
MSYS_NT-6.1 artyom-VM 2.11.2(0.329/5/3) 2018-11-26 09:22 x86_64 Msys

I also tried to compile the sample with clang, but it failed to link:

[100%] Linking CXX executable sample.exe
/usr/bin/cmake.exe -E cmake_link_script CMakeFiles/sample.dir/link.txt --verbose=1
/mingw64/bin/clang++.exe  -g -O3 -DNDEBUG  -Wl,--enable-auto-import CMakeFiles/sample.dir/main.cpp.o CMakeFiles/sample.dir/tls.cpp.o  -o sample.exe -Wl,--out-implib,libsample.dll.a -Wl,--major-image-version,0,--minor-image-version,0
CMakeFiles/sample.dir/tls.cpp.o:(.text+0x0): multiple definition of `TLS wrapper function for tl'
CMakeFiles/sample.dir/main.cpp.o:(.text+0x50): first defined here
clang++.exe: error: linker command failed with exit code 1 (use -v to see invocation)

$ clang++ -v
clang version 7.0.1 (tags/RELEASE_701/final)
Target: x86_64-w64-windows-gnu
Thread model: posix
InstalledDir: C:\msys\mingw64\bin

Ticket History (3/3 Histories)

2019-04-29 19:23 Updated by: vagran
  • New Ticket "Invalid code generated for extern thread_local variables" created
2019-04-29 22:25 Updated by: keith
  • Status Update from Open to Closed
Comment

I'm closing this, as invalid, for the following reasons:—

  • The code you illustrate is for a 64-bit host, so it most definitely was not generated by any compiler originating from this project.
  • GCC often uses lea instructions, as multi-byte nop filler code; that may, or may not be the intent here, but it's something to bear in mind.

In any case, since you are using tools originating from some other project, (which, BTW, is abusing our registered trademark, without authorization), we cannot help you.
