Valgrind:C/C++分析工具

Valgrind:C/C++分析工具

Valgrind是開源的測試框架,可以用來動態分析記憶體配置、快取使用、多執行序bug。

安裝

$ sudo pacman -Sy valgrind

基本用法

$ valgrind 程式名稱 args
預設會是用memcheck工具分析,在這個工具下她會匯出heap使用、memory leak、還有記憶體使用錯誤的部份backtrace。

進階用法

$ valgrind --tool=toolname 程式名稱 args
這是valgrind最有方便的地方,valgrind旗下有九個使用者端的工具,和幾個開發者工具。

1. Memcheck:記憶體錯誤分析工具。

2.Cachegrind:預測你的cache使用。

3.Callgrind:分析程式的function call次數,還有call graph,可以幫助快取分析。

4.Helgrind:多執行序錯誤分析工具,有race condition檢測功能。

5.DRD:另一個多執行序分析工具。

6.Massif:分析heap的使用,在一個程式執行中她會測量多次。

7.DHAT:另一種heap分析工具。

8.SGcheck:實驗性的全域變數與stack分析工具。

9.BBV:實驗性SimPoint相關工具。

這些工具比較常用的是前七個。接下來看一下個別的使用。

memcheck

$ valgrind --tool=memcheck 程式名稱 args

首先我先寫一個廢物code來做實驗。以下code new完之後沒有delete。

#include<iostream>  
using namespace std;  
int main()  
{  
    int *ptr;  
    ptr = new int;  
    *ptr = 2;  
    cout << *ptr << endl;  
}  

然後我們來看一下分析結果。

==12313== Memcheck, a memory error detector  
==12313== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.  
==12313== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info  
==12313== Command: ./garbage  
==12313==  
2  
==12313==  
==12313== HEAP SUMMARY:  
==12313== in use at exit: 4 bytes in 1 blocks  
==12313== total heap usage: 3 allocs, 2 frees, 73,732 bytes allocated  
==12313==  
==12313== LEAK SUMMARY:  
==12313== definitely lost: 4 bytes in 1 blocks  
==12313== indirectly lost: 0 bytes in 0 blocks  
==12313== possibly lost: 0 bytes in 0 blocks  
==12313== still reachable: 0 bytes in 0 blocks  
==12313== suppressed: 0 bytes in 0 blocks  
==12313== Rerun with --leak-check=full to see details of leaked memory  
==12313==  
==12313== For counts of detected and suppressed errors, rerun with: -v  
==12313== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

她寫了definitely lost: 4 bytes in 1 blocks,int是4 byte,我new完之後沒有回收所以memory leak了4 byte,不過由於這程式很間單所以很好找到錯誤來源,當程式變得複雜時我們可以加上--leak-check=full來找到源頭,來看一加完的結果。

==14268== Memcheck, a memory error detector  
==14268== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.  
==14268== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info  
==14268== Command: ./garbage  
==14268==  
2  
==14268==  
==14268== HEAP SUMMARY:  
==14268== in use at exit: 4 bytes in 1 blocks  
==14268== total heap usage: 3 allocs, 2 frees, 73,732 bytes allocated  
==14268==  
==14268== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1  
==14268== at 0x4C2B1EC: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)  
==14268== by 0x4007C7: main (in /home/tommycc/garbage)  
==14268==  
==14268== LEAK SUMMARY:  
==14268== definitely lost: 4 bytes in 1 blocks  
==14268== indirectly lost: 0 bytes in 0 blocks  
==14268== possibly lost: 0 bytes in 0 blocks  
==14268== still reachable: 0 bytes in 0 blocks  
==14268== suppressed: 0 bytes in 0 blocks  
==14268==  
==14268== For counts of detected and suppressed errors, rerun with: -v  
==14268== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)  

可以發現這兩行,指出了錯誤點。

==14268== at 0x4C2B1EC: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)  
==14268== by 0x4007C7: main (in /home/tommycc/garbage)  

Cachegrind

$valgrind --tool=cachegrind

它可以分析你程式的快取優化程度。它的結果總共有以下類別。以下擷取至valgrind manual

一般來說I和D是首要分析的部份。以下的範例是二維陣列存取,我們都知道二維陣列在記憶體上其實是一維的,所以(row major)先row在column會比(column major)先column在row快。我們先執行row major的code如以下。

 for (int i = 0 ; i < 1000 ; i++ ){                                                                      
     for (int j = 0 ; j < 1000 ; j++ ){                                        
         a[i][j] = i+j;                                                        
     }                                                                         
 } 

結果如下。

==11719== Cachegrind, a cache and branch-prediction profiler  
==11719== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote et al.  
==11719== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info  
==11719== Command: ./tt  
==11719==   
--11719-- warning: L3 cache found, using its data for the LL simulation.  
==11719==   
==11719== I   refs:      16,204,946  
==11719== I1  misses:         1,401  
==11719== LLi misses:         1,357  
==11719== I1  miss rate:       0.01%  
==11719== LLi miss rate:       0.01%  
==11719==   
==11719== D   refs:       7,729,575  (6,536,262 rd   + 1,193,313 wr)  
==11719== D1  misses:        78,407  (   13,698 rd   +    64,709 wr)  
==11719== LLd misses:        71,617  (    7,734 rd   +    63,883 wr)  
==11719== D1  miss rate:        1.0% (      0.2%     +       5.4%  )  
==11719== LLd miss rate:        0.9% (      0.1%     +       5.4%  )  
==11719==   
==11719== LL refs:           79,808  (   15,099 rd   +    64,709 wr)  
==11719== LL misses:         72,974  (    9,091 rd   +    63,883 wr)  
==11719== LL miss rate:         0.3% (      0.0%     +       5.4%  )  

特別注意D1 miss rate是1.0%。接下來是column major的code,如下。

for (int i = 0 ; i < 1000 ; i++ ){                                                              
    for (int j = 0 ; j < 1000 ; j++ ){                                                                
        a[j][i] = i+j;                                                                          
    }                                                                                                 
} 

結果如下。

==11522== Cachegrind, a cache and branch-prediction profiler  
==11522== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote et al.  
==11522== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info  
==11522== Command: ./tt  
==11522==   
--11522-- warning: L3 cache found, using its data for the LL simulation.  
==11522==   
==11522== I   refs:      16,204,946  
==11522== I1  misses:         1,401  
==11522== LLi misses:         1,357  
==11522== I1  miss rate:       0.01%  
==11522== LLi miss rate:       0.01%  
==11522==   
==11522== D   refs:       7,729,575  (6,536,262 rd   + 1,193,313 wr)  
==11522== D1  misses:     1,015,906  (   13,698 rd   + 1,002,208 wr)  
==11522== LLd misses:        71,617  (    7,734 rd   +    63,883 wr)  
==11522== D1  miss rate:       13.1% (      0.2%     +      84.0%  )  
==11522== LLd miss rate:        0.9% (      0.1%     +       5.4%  )  
==11522==   
==11522== LL refs:        1,017,307  (   15,099 rd   + 1,002,208 wr)  
==11522== LL misses:         72,974  (    9,091 rd   +    63,883 wr)  
==11522== LL miss rate:         0.3% (      0.0%     +       5.4%  )  

可以發現D1 miss rate大幅上升至13.1%。另外cachegrind一樣會產生output file,可以透過cg_annotate分析,或是用Kcahcegrind分析。

Callgrind

用來分析整個程式的function call數目。

$valgrind --tool=callgrind 程式名稱

這會開始執行你的程式,通常會執行的比較慢。在執行過程中,你可以透過$callgrind_control -e -b或是$callgrind_control -b 來看程式當下執行的function call backtrace。以下是費氏數列遞迴版結果。

 PID 6989: ./fb  
sending command status internal to pid 6989  
  
  Totals:           Ir   
   Th 1  2,560,275,485   
  
  Frame:             Ir Backtrace for Thread 1  
   [ 0]  23,685,553,356 fb(long long) (58138276 x)  
   [ 1]  41,379,358,433 fb(long long) (58138297 x)  
   [ 2]  41,379,358,449 fb(long long) (58138297 x)  
   [ 3]  41,379,358,465 fb(long long) (58138297 x)  
   [ 4]  41,379,358,481 fb(long long) (58138297 x)  
   [ 5]  41,379,358,497 fb(long long) (58138297 x)  
   [ 6]  41,379,358,513 fb(long long) (58138297 x)  
   [ 7]  41,379,358,529 fb(long long) (58138297 x)  
   [ 8]  41,379,358,545 fb(long long) (58138297 x)  
   [ 9]  23,685,553,523 fb(long long) (58138276 x)  
   [10]  41,379,364,892 fb(long long) (58138297 x)  
   [11]  41,379,364,908 fb(long long) (58138297 x)  
   [12]  41,379,364,924 fb(long long) (58138297 x)  
   [13]  23,685,559,902 fb(long long) (58138276 x)  
   [14]  23,685,630,165 fb(long long) (58138276 x)  
   [15]  41,379,619,162 fb(long long) (58138297 x)  
   [16]  41,379,619,178 fb(long long) (58138297 x)  
   [17]  41,379,619,194 fb(long long) (58138297 x)  
   [18]  41,379,619,210 fb(long long) (58138297 x)  
   [19]  23,685,814,188 fb(long long) (58138276 x)  
   [20]  41,382,920,321 fb(long long) (58138297 x)  
   [21]  41,382,920,337 fb(long long) (58138297 x)  
   [22]  23,689,115,315 fb(long long) (58138276 x)  
   [23]  41,405,546,424 fb(long long) (58138297 x)  
   [24]  41,405,546,440 fb(long long) (58138297 x)  
   [25]  23,711,741,418 fb(long long) (58138276 x)  
   [26]  41,560,627,883 fb(long long) (58138297 x)  
   [27]  23,866,822,861 fb(long long) (58138276 x)  
   [28]  24,523,758,344 fb(long long) (58138276 x)  
   [29]  43,937,442,813 fb(long long) (58138297 x)  
   [30]   2,558,084,449 fb(long long) (1 x)  
   [31]   2,558,084,465 fb(long long) (1 x)  
   [32]   2,558,084,472 main (1 x)  
   [33]   2,558,187,428 (below main) (1 x)  
   [34]   2,558,187,439 _start (1 x)  
   [35]               . 0x0000000000000d70 

可以看到執行當下的遞迴到哪裡,不過由於費氏數列浮動快,這用callgrind_control -e -b的深度會不同。要看到整個function的呼叫次數,可以透過callgrind執行完產生的calgrind.out.pid檔案,這時有兩種分析這個檔案的方式,一是透過callgrind_annotate callgrind.out.pid 來看結果,另一是用KCachegrind (KDE應用程式)。以下是KCachegrind結果,可以看到左下角的框框,fb被call了1+331160280次(第一層call+第二層以上call)。

helgrind

$valgrind --tool=helgrind

這次我們使用官方範例。

#include <pthread.h>  
  
int var = 0;  
  
void* child_fn ( void* arg ) {  
   var++; /* Unprotected relative to parent */ /* this is line 6 */  
   return NULL;  
}  
  
int main ( void ) {  
   pthread_t child;  
   pthread_create(&child, NULL, child_fn, NULL);  
   var++; /* Unprotected relative to child */ /* this is line 13 */  
   pthread_join(child, NULL);  
   return 0;  
}  

很明顯的var沒有做mutex lock,會有race。結果如下。

==19156== Helgrind, a thread error detector  
==19156== Copyright (C) 2007-2015, and GNU GPL'd, by OpenWorks LLP et al.  
==19156== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info  
==19156== Command: ./race  
==19156==   
==19156== ---Thread-Announcement------------------------------------------  
==19156==   
==19156== Thread #1 is the program's root thread  
==19156==   
==19156== ---Thread-Announcement------------------------------------------  
==19156==   
==19156== Thread #2 was created  
==19156==    at 0x51427AE: clone (in /usr/lib/libc-2.24.so)  
==19156==    by 0x4E431A9: create_thread (in /usr/lib/libpthread-2.24.so)  
==19156==    by 0x4E44C12: pthread_create@@GLIBC_2.2.5 (in /usr/lib/libpthread-2.24.so)  
==19156==    by 0x4C31810: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)  
==19156==    by 0x4C328FD: pthread_create@* (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)  
==19156==    by 0x4005C6: main (in /home/tommycc/race)  
==19156==   
==19156== ----------------------------------------------------------------  
==19156==   
==19156== Possible data race during read of size 4 at 0x60103C by thread #1  
==19156== Locks held: none  
==19156==    at 0x4005C7: main (in /home/tommycc/race)  
==19156==   
==19156== This conflicts with a previous write of size 4 by thread #2  
==19156== Locks held: none  
==19156==    at 0x400597: child_fn (in /home/tommycc/race)  
==19156==    by 0x4C31A04: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)  
==19156==    by 0x4E44453: start_thread (in /usr/lib/libpthread-2.24.so)  
==19156==  Address 0x60103c is 0 bytes inside data symbol "var"  
==19156==   
==19156== ----------------------------------------------------------------  
==19156==   
==19156== Possible data race during write of size 4 at 0x60103C by thread #1  
==19156== Locks held: none  
==19156==    at 0x4005D0: main (in /home/tommycc/race)  
==19156==   
==19156== This conflicts with a previous write of size 4 by thread #2  
==19156== Locks held: none  
==19156==    at 0x400597: child_fn (in /home/tommycc/race)  
==19156==    by 0x4C31A04: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)  
==19156==    by 0x4E44453: start_thread (in /usr/lib/libpthread-2.24.so)  
==19156==  Address 0x60103c is 0 bytes inside data symbol "var"  
==19156==   
==19156==   
==19156== For counts of detected and suppressed errors, rerun with: -v  
==19156== Use --history-level=approx or =none to gain increased speed, at  
==19156== the cost of reduced accuracy of conflicting-access information  
==19156== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)  

她很明確得告訴我們有可能的race問題。而且名稱叫做var。結果十分直觀。

Massif

$ valgrind --tool=massif 程式名稱`

他是用來分析data segement 中heap使用。
heap在data sgment通常是由malloc等函式創造的,是適當的優化減少heap可以paging和盡量避開使用swap space。
在執行完massif後會有一個massif.out.pid檔,可以透過ms_print來印出。不過一般高速運行的程式,他的malloc和free執行很快,在輸出圖形的時候看起來是一條線,所以可以透過valgrind --tool=massif --time-unit=B 程式名稱 讓massif透過allocate與deallcoate記憶體大小作圖。範例程式如下。

#include<stdio.h>  
#include<stdlib.h>  
  
int main()  
{  
    malloc(1000);  
    int* a[10];  
    for (int i = 0 ; i < 10 ; i++ )  
        a[i] = malloc(1000);  
    for (int i = 0 ; i < 10 ; i++ )  
        free(a[i]);  
}  

結果如下。

--------------------------------------------------------------------------------  
Command:            ./heap  
Massif arguments:   --time-unit=B  
ms_print arguments: massif.out.2213  
--------------------------------------------------------------------------------  
  
  
    KB  
10.91^                                     ####                                 
     |                                     #                                    
     |                                  :::#   :::                              
     |                                  :  #   :                                
     |                              @@@@:  #   :  ::::                          
     |                              @   :  #   :  :                             
     |                           :::@   :  #   :  :   :::                       
     |                           :  @   :  #   :  :   :                         
     |                        ::::  @   :  #   :  :   :  :::                    
     |                        :  :  @   :  #   :  :   :  :                      
     |                    :::::  :  @   :  #   :  :   :  :  ::::                
     |                 ::::   :  :  @   :  #   :  :   :  :  :   :::             
     |                 :  :   :  :  @   :  #   :  :   :  :  :   :               
     |             :::::  :   :  :  @   :  #   :  :   :  :  :   :  ::::         
     |             :   :  :   :  :  @   :  #   :  :   :  :  :   :  :            
     |          ::::   :  :   :  :  @   :  #   :  :   :  :  :   :  :   :::      
     |          :  :   :  :   :  :  @   :  #   :  :   :  :  :   :  :   :        
     |      :::::  :   :  :   :  :  @   :  #   :  :   :  :  :   :  :   :  :::   
     |      :   :  :   :  :   :  :  @   :  #   :  :   :  :  :   :  :   :  :     
     |   ::::   :  :   :  :   :  :  @   :  #   :  :   :  :  :   :  :   :  :  @  
   0 +----------------------------------------------------------------------->KB  
     0                                                                   20.84  
  
Number of snapshots: 23  
 Detailed snapshots: [9, 12 (peak), 22]  
  
--------------------------------------------------------------------------------  
  n        time(B)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)  
--------------------------------------------------------------------------------  
  0              0                0                0             0            0  
  1          1,016            1,016            1,000            16            0  
  2          2,032            2,032            2,000            32            0  
  3          3,048            3,048            3,000            48            0  
  4          4,064            4,064            4,000            64            0  
  5          5,080            5,080            5,000            80            0  
  6          6,096            6,096            6,000            96            0  
  7          7,112            7,112            7,000           112            0  
  8          8,128            8,128            8,000           128            0  
  9          9,144            9,144            9,000           144            0  
98.43% (9,000B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.  
->87.49% (8,000B) 0x400569: main (in /home/tommycc/heap)  
|   
->10.94% (1,000B) 0x400556: main (in /home/tommycc/heap)  
  
--------------------------------------------------------------------------------  
  n        time(B)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)  
--------------------------------------------------------------------------------  
 10         10,160           10,160           10,000           160            0  
 11         11,176           11,176           11,000           176            0  
 12         11,176           11,176           11,000           176            0  
98.43% (11,000B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.  
->89.48% (10,000B) 0x400569: main (in /home/tommycc/heap)  
|   
->08.95% (1,000B) 0x400556: main (in /home/tommycc/heap)  
  
--------------------------------------------------------------------------------  
  n        time(B)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)  
--------------------------------------------------------------------------------  
 13         12,192           10,160           10,000           160            0  
 14         13,208            9,144            9,000           144            0  
 15         14,224            8,128            8,000           128            0  
 16         15,240            7,112            7,000           112            0  
 17         16,256            6,096            6,000            96            0  
 18         17,272            5,080            5,000            80            0  
 19         18,288            4,064            4,000            64            0  
 20         19,304            3,048            3,000            48            0  
 21         20,320            2,032            2,000            32            0  
 22         21,336            1,016            1,000            16            0  
98.43% (1,000B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.  
->98.43% (1,000B) 0x400556: main (in /home/tommycc/heap)  
|   
->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%  

可以看到malloc和free的作用。

其他工具

有些是valgrind的開發工具所以就跳過了。
ref:http://valgrind.org/docs/manual/manual.html


Written with StackEdit.

comments powered by Disqus