Valgrind:C/C++分析工具

January 13, 2017

Valgrind:C/C++分析工具

Valgrind是開源的測試框架，可以用來動態分析記憶體配置、快取使用、多執行序bug。

安裝

$ sudo pacman -Sy valgrind

基本用法

$ valgrind 程式名稱 args

預設會是用memcheck工具分析，在這個工具下她會匯出heap使用、memory leak、還有記憶體使用錯誤的部份backtrace。

進階用法

$ valgrind --tool=toolname 程式名稱 args

這是valgrind最有方便的地方，valgrind旗下有九個使用者端的工具，和幾個開發者工具。

1. Memcheck:記憶體錯誤分析工具。

2.Cachegrind:預測你的cache使用。

3.Callgrind:分析程式的function call次數，還有call graph，可以幫助快取分析。

4.Helgrind:多執行序錯誤分析工具，有race condition檢測功能。

5.DRD:另一個多執行序分析工具。

6.Massif:分析heap的使用，在一個程式執行中她會測量多次。

7.DHAT:另一種heap分析工具。

8.SGcheck:實驗性的全域變數與stack分析工具。

9.BBV:實驗性SimPoint相關工具。

這些工具比較常用的是前七個。接下來看一下個別的使用。

memcheck

$ valgrind --tool=memcheck 程式名稱 args

首先我先寫一個廢物code來做實驗。以下code new完之後沒有delete。

#include<iostream>  
using namespace std;  
int main()  
{  
    int *ptr;  
    ptr = new int;  
    *ptr = 2;  
    cout << *ptr << endl;  
}

然後我們來看一下分析結果。

==12313== Memcheck, a memory error detector  
==12313== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.  
==12313== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info  
==12313== Command: ./garbage  
==12313==  
2  
==12313==  
==12313== HEAP SUMMARY:  
==12313== in use at exit: 4 bytes in 1 blocks  
==12313== total heap usage: 3 allocs, 2 frees, 73,732 bytes allocated  
==12313==  
==12313== LEAK SUMMARY:  
==12313== definitely lost: 4 bytes in 1 blocks  
==12313== indirectly lost: 0 bytes in 0 blocks  
==12313== possibly lost: 0 bytes in 0 blocks  
==12313== still reachable: 0 bytes in 0 blocks  
==12313== suppressed: 0 bytes in 0 blocks  
==12313== Rerun with --leak-check=full to see details of leaked memory  
==12313==  
==12313== For counts of detected and suppressed errors, rerun with: -v  
==12313== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

她寫了definitely lost: 4 bytes in 1 blocks，int是4 byte，我new完之後沒有回收所以memory leak了4 byte，不過由於這程式很間單所以很好找到錯誤來源，當程式變得複雜時我們可以加上--leak-check=full來找到源頭，來看一加完的結果。

==14268== Memcheck, a memory error detector  
==14268== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.  
==14268== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info  
==14268== Command: ./garbage  
==14268==  
2  
==14268==  
==14268== HEAP SUMMARY:  
==14268== in use at exit: 4 bytes in 1 blocks  
==14268== total heap usage: 3 allocs, 2 frees, 73,732 bytes allocated  
==14268==  
==14268== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1  
==14268== at 0x4C2B1EC: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)  
==14268== by 0x4007C7: main (in /home/tommycc/garbage)  
==14268==  
==14268== LEAK SUMMARY:  
==14268== definitely lost: 4 bytes in 1 blocks  
==14268== indirectly lost: 0 bytes in 0 blocks  
==14268== possibly lost: 0 bytes in 0 blocks  
==14268== still reachable: 0 bytes in 0 blocks  
==14268== suppressed: 0 bytes in 0 blocks  
==14268==  
==14268== For counts of detected and suppressed errors, rerun with: -v  
==14268== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

可以發現這兩行，指出了錯誤點。

==14268== at 0x4C2B1EC: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)  
==14268== by 0x4007C7: main (in /home/tommycc/garbage)

Cachegrind

$valgrind --tool=cachegrind

它可以分析你程式的快取優化程度。它的結果總共有以下類別。以下擷取至valgrind manual

I cache reads (Ir, which equals the number of instructions executed), I1 cache read misses (I1mr) and LL cache instruction read misses (ILmr).
D cache reads (Dr, which equals the number of memory reads), D1 cache read misses (D1mr), and LL cache data read misses (DLmr).
D cache writes (Dw, which equals the number of memory writes), D1 cache write misses (D1mw), and LL cache data write misses (DLmw).
Conditional branches executed (Bc) and conditional branches mispredicted (Bcm).
Indirect branches executed (Bi) and indirect branches mispredicted (Bim).

一般來說I和D是首要分析的部份。以下的範例是二維陣列存取，我們都知道二維陣列在記憶體上其實是一維的，所以(row major)先row在column會比(column major)先column在row快。我們先執行row major的code如以下。

 for (int i = 0 ; i < 1000 ; i++ ){                                                                      
     for (int j = 0 ; j < 1000 ; j++ ){                                        
         a[i][j] = i+j;                                                        
     }                                                                         
 }

結果如下。

==11719== Cachegrind, a cache and branch-prediction profiler  
==11719== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote et al.  
==11719== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info  
==11719== Command: ./tt  
==11719==   
--11719-- warning: L3 cache found, using its data for the LL simulation.  
==11719==   
==11719== I   refs:      16,204,946  
==11719== I1  misses:         1,401  
==11719== LLi misses:         1,357  
==11719== I1  miss rate:       0.01%  
==11719== LLi miss rate:       0.01%  
==11719==   
==11719== D   refs:       7,729,575  (6,536,262 rd   + 1,193,313 wr)  
==11719== D1  misses:        78,407  (   13,698 rd   +    64,709 wr)  
==11719== LLd misses:        71,617  (    7,734 rd   +    63,883 wr)  
==11719== D1  miss rate:        1.0% (      0.2%     +       5.4%  )  
==11719== LLd miss rate:        0.9% (      0.1%     +       5.4%  )  
==11719==   
==11719== LL refs:           79,808  (   15,099 rd   +    64,709 wr)  
==11719== LL misses:         72,974  (    9,091 rd   +    63,883 wr)  
==11719== LL miss rate:         0.3% (      0.0%     +       5.4%  )

特別注意D1 miss rate是1.0%。接下來是column major的code，如下。

for (int i = 0 ; i < 1000 ; i++ ){                                                              
    for (int j = 0 ; j < 1000 ; j++ ){                                                                
        a[j][i] = i+j;                                                                          
    }                                                                                                 
}

結果如下。

==11522== Cachegrind, a cache and branch-prediction profiler  
==11522== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote et al.  
==11522== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info  
==11522== Command: ./tt  
==11522==   
--11522-- warning: L3 cache found, using its data for the LL simulation.  
==11522==   
==11522== I   refs:      16,204,946  
==11522== I1  misses:         1,401  
==11522== LLi misses:         1,357  
==11522== I1  miss rate:       0.01%  
==11522== LLi miss rate:       0.01%  
==11522==   
==11522== D   refs:       7,729,575  (6,536,262 rd   + 1,193,313 wr)  
==11522== D1  misses:     1,015,906  (   13,698 rd   + 1,002,208 wr)  
==11522== LLd misses:        71,617  (    7,734 rd   +    63,883 wr)  
==11522== D1  miss rate:       13.1% (      0.2%     +      84.0%  )  
==11522== LLd miss rate:        0.9% (      0.1%     +       5.4%  )  
==11522==   
==11522== LL refs:        1,017,307  (   15,099 rd   + 1,002,208 wr)  
==11522== LL misses:         72,974  (    9,091 rd   +    63,883 wr)  
==11522== LL miss rate:         0.3% (      0.0%     +       5.4%  )

可以發現D1 miss rate大幅上升至13.1%。另外cachegrind一樣會產生output file，可以透過cg_annotate分析，或是用Kcahcegrind分析。

Callgrind

用來分析整個程式的function call數目。

$valgrind --tool=callgrind 程式名稱

這會開始執行你的程式，通常會執行的比較慢。在執行過程中，你可以透過$callgrind_control -e -b或是$callgrind_control -b 來看程式當下執行的function call backtrace。以下是費氏數列遞迴版結果。

 PID 6989: ./fb  
sending command status internal to pid 6989  
  
  Totals:           Ir   
   Th 1  2,560,275,485   
  
  Frame:             Ir Backtrace for Thread 1  
   [ 0]  23,685,553,356 fb(long long) (58138276 x)  
   [ 1]  41,379,358,433 fb(long long) (58138297 x)  
   [ 2]  41,379,358,449 fb(long long) (58138297 x)  
   [ 3]  41,379,358,465 fb(long long) (58138297 x)  
   [ 4]  41,379,358,481 fb(long long) (58138297 x)  
   [ 5]  41,379,358,497 fb(long long) (58138297 x)  
   [ 6]  41,379,358,513 fb(long long) (58138297 x)  
   [ 7]  41,379,358,529 fb(long long) (58138297 x)  
   [ 8]  41,379,358,545 fb(long long) (58138297 x)  
   [ 9]  23,685,553,523 fb(long long) (58138276 x)  
   [10]  41,379,364,892 fb(long long) (58138297 x)  
   [11]  41,379,364,908 fb(long long) (58138297 x)  
   [12]  41,379,364,924 fb(long long) (58138297 x)  
   [13]  23,685,559,902 fb(long long) (58138276 x)  
   [14]  23,685,630,165 fb(long long) (58138276 x)  
   [15]  41,379,619,162 fb(long long) (58138297 x)  
   [16]  41,379,619,178 fb(long long) (58138297 x)  
   [17]  41,379,619,194 fb(long long) (58138297 x)  
   [18]  41,379,619,210 fb(long long) (58138297 x)  
   [19]  23,685,814,188 fb(long long) (58138276 x)  
   [20]  41,382,920,321 fb(long long) (58138297 x)  
   [21]  41,382,920,337 fb(long long) (58138297 x)  
   [22]  23,689,115,315 fb(long long) (58138276 x)  
   [23]  41,405,546,424 fb(long long) (58138297 x)  
   [24]  41,405,546,440 fb(long long) (58138297 x)  
   [25]  23,711,741,418 fb(long long) (58138276 x)  
   [26]  41,560,627,883 fb(long long) (58138297 x)  
   [27]  23,866,822,861 fb(long long) (58138276 x)  
   [28]  24,523,758,344 fb(long long) (58138276 x)  
   [29]  43,937,442,813 fb(long long) (58138297 x)  
   [30]   2,558,084,449 fb(long long) (1 x)  
   [31]   2,558,084,465 fb(long long) (1 x)  
   [32]   2,558,084,472 main (1 x)  
   [33]   2,558,187,428 (below main) (1 x)  
   [34]   2,558,187,439 _start (1 x)  
   [35]               . 0x0000000000000d70

可以看到執行當下的遞迴到哪裡，不過由於費氏數列浮動快，這用callgrind_control -e -b的深度會不同。要看到整個function的呼叫次數，可以透過callgrind執行完產生的calgrind.out.pid檔案，這時有兩種分析這個檔案的方式，一是透過callgrind_annotate callgrind.out.pid 來看結果，另一是用KCachegrind (KDE應用程式)。以下是KCachegrind結果，可以看到左下角的框框，fb被call了1+331160280次(第一層call+第二層以上call)。

helgrind

$valgrind --tool=helgrind

這次我們使用官方範例。

#include <pthread.h>  
  
int var = 0;  
  
void* child_fn ( void* arg ) {  
   var++; /* Unprotected relative to parent */ /* this is line 6 */  
   return NULL;  
}  
  
int main ( void ) {  
   pthread_t child;  
   pthread_create(&child, NULL, child_fn, NULL);  
   var++; /* Unprotected relative to child */ /* this is line 13 */  
   pthread_join(child, NULL);  
   return 0;  
}

很明顯的var沒有做mutex lock，會有race。結果如下。

==19156== Helgrind, a thread error detector  
==19156== Copyright (C) 2007-2015, and GNU GPL'd, by OpenWorks LLP et al.  
==19156== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info  
==19156== Command: ./race  
==19156==   
==19156== ---Thread-Announcement------------------------------------------  
==19156==   
==19156== Thread #1 is the program's root thread  
==19156==   
==19156== ---Thread-Announcement------------------------------------------  
==19156==   
==19156== Thread #2 was created  
==19156==    at 0x51427AE: clone (in /usr/lib/libc-2.24.so)  
==19156==    by 0x4E431A9: create_thread (in /usr/lib/libpthread-2.24.so)  
==19156==    by 0x4E44C12: pthread_create@@GLIBC_2.2.5 (in /usr/lib/libpthread-2.24.so)  
==19156==    by 0x4C31810: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)  
==19156==    by 0x4C328FD: pthread_create@* (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)  
==19156==    by 0x4005C6: main (in /home/tommycc/race)  
==19156==   
==19156== ----------------------------------------------------------------  
==19156==   
==19156== Possible data race during read of size 4 at 0x60103C by thread #1  
==19156== Locks held: none  
==19156==    at 0x4005C7: main (in /home/tommycc/race)  
==19156==   
==19156== This conflicts with a previous write of size 4 by thread #2  
==19156== Locks held: none  
==19156==    at 0x400597: child_fn (in /home/tommycc/race)  
==19156==    by 0x4C31A04: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)  
==19156==    by 0x4E44453: start_thread (in /usr/lib/libpthread-2.24.so)  
==19156==  Address 0x60103c is 0 bytes inside data symbol "var"  
==19156==   
==19156== ----------------------------------------------------------------  
==19156==   
==19156== Possible data race during write of size 4 at 0x60103C by thread #1  
==19156== Locks held: none  
==19156==    at 0x4005D0: main (in /home/tommycc/race)  
==19156==   
==19156== This conflicts with a previous write of size 4 by thread #2  
==19156== Locks held: none  
==19156==    at 0x400597: child_fn (in /home/tommycc/race)  
==19156==    by 0x4C31A04: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)  
==19156==    by 0x4E44453: start_thread (in /usr/lib/libpthread-2.24.so)  
==19156==  Address 0x60103c is 0 bytes inside data symbol "var"  
==19156==   
==19156==   
==19156== For counts of detected and suppressed errors, rerun with: -v  
==19156== Use --history-level=approx or =none to gain increased speed, at  
==19156== the cost of reduced accuracy of conflicting-access information  
==19156== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

她很明確得告訴我們有可能的race問題。而且名稱叫做var。結果十分直觀。

Massif

$ valgrind --tool=massif 程式名稱`

他是用來分析data segement 中heap使用。
heap在data sgment通常是由malloc等函式創造的，是適當的優化減少heap可以paging和盡量避開使用swap space。
在執行完massif後會有一個massif.out.pid檔，可以透過ms_print來印出。不過一般高速運行的程式，他的malloc和free執行很快，在輸出圖形的時候看起來是一條線，所以可以透過valgrind --tool=massif --time-unit=B 程式名稱 讓massif透過allocate與deallcoate記憶體大小作圖。範例程式如下。

#include<stdio.h>  
#include<stdlib.h>  
  
int main()  
{  
    malloc(1000);  
    int* a[10];  
    for (int i = 0 ; i < 10 ; i++ )  
        a[i] = malloc(1000);  
    for (int i = 0 ; i < 10 ; i++ )  
        free(a[i]);  
}

結果如下。

--------------------------------------------------------------------------------  
Command:            ./heap  
Massif arguments:   --time-unit=B  
ms_print arguments: massif.out.2213  
--------------------------------------------------------------------------------  
  
  
    KB  
10.91^                                     ####                                 
     |                                     #                                    
     |                                  :::#   :::                              
     |                                  :  #   :                                
     |                              @@@@:  #   :  ::::                          
     |                              @   :  #   :  :                             
     |                           :::@   :  #   :  :   :::                       
     |                           :  @   :  #   :  :   :                         
     |                        ::::  @   :  #   :  :   :  :::                    
     |                        :  :  @   :  #   :  :   :  :                      
     |                    :::::  :  @   :  #   :  :   :  :  ::::                
     |                 ::::   :  :  @   :  #   :  :   :  :  :   :::             
     |                 :  :   :  :  @   :  #   :  :   :  :  :   :               
     |             :::::  :   :  :  @   :  #   :  :   :  :  :   :  ::::         
     |             :   :  :   :  :  @   :  #   :  :   :  :  :   :  :            
     |          ::::   :  :   :  :  @   :  #   :  :   :  :  :   :  :   :::      
     |          :  :   :  :   :  :  @   :  #   :  :   :  :  :   :  :   :        
     |      :::::  :   :  :   :  :  @   :  #   :  :   :  :  :   :  :   :  :::   
     |      :   :  :   :  :   :  :  @   :  #   :  :   :  :  :   :  :   :  :     
     |   ::::   :  :   :  :   :  :  @   :  #   :  :   :  :  :   :  :   :  :  @  
   0 +----------------------------------------------------------------------->KB  
     0                                                                   20.84  
  
Number of snapshots: 23  
 Detailed snapshots: [9, 12 (peak), 22]  
  
--------------------------------------------------------------------------------  
  n        time(B)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)  
--------------------------------------------------------------------------------  
  0              0                0                0             0            0  
  1          1,016            1,016            1,000            16            0  
  2          2,032            2,032            2,000            32            0  
  3          3,048            3,048            3,000            48            0  
  4          4,064            4,064            4,000            64            0  
  5          5,080            5,080            5,000            80            0  
  6          6,096            6,096            6,000            96            0  
  7          7,112            7,112            7,000           112            0  
  8          8,128            8,128            8,000           128            0  
  9          9,144            9,144            9,000           144            0  
98.43% (9,000B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.  
->87.49% (8,000B) 0x400569: main (in /home/tommycc/heap)  
|   
->10.94% (1,000B) 0x400556: main (in /home/tommycc/heap)  
  
--------------------------------------------------------------------------------  
  n        time(B)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)  
--------------------------------------------------------------------------------  
 10         10,160           10,160           10,000           160            0  
 11         11,176           11,176           11,000           176            0  
 12         11,176           11,176           11,000           176            0  
98.43% (11,000B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.  
->89.48% (10,000B) 0x400569: main (in /home/tommycc/heap)  
|   
->08.95% (1,000B) 0x400556: main (in /home/tommycc/heap)  
  
--------------------------------------------------------------------------------  
  n        time(B)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)  
--------------------------------------------------------------------------------  
 13         12,192           10,160           10,000           160            0  
 14         13,208            9,144            9,000           144            0  
 15         14,224            8,128            8,000           128            0  
 16         15,240            7,112            7,000           112            0  
 17         16,256            6,096            6,000            96            0  
 18         17,272            5,080            5,000            80            0  
 19         18,288            4,064            4,000            64            0  
 20         19,304            3,048            3,000            48            0  
 21         20,320            2,032            2,000            32            0  
 22         21,336            1,016            1,000            16            0  
98.43% (1,000B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.  
->98.43% (1,000B) 0x400556: main (in /home/tommycc/heap)  
|   
->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%

可以看到malloc和free的作用。

其他工具

有些是valgrind的開發工具所以就跳過了。
ref:http://valgrind.org/docs/manual/manual.html
ㄧ
ㄧ
ㄧ

Written with StackEdit.