Learning Profiling with pprof in Go

Introduction: Why Profiling Matters

Before diving into the mechanics of profiling, it helps to understand the context. Imagine you have built a web application that serves millions of users. It suddenly becomes slow, consumes excessive memory, or even crashes in production. Without profiling, you are like a doctor trying to diagnose an illness without medical instruments: all you can do is guess.

Profiling is the systematic process of measuring and analyzing your program's behavior at runtime. In the Go ecosystem, pprof is the standard tool for this. It was developed at Google and is integrated directly with the Go runtime, giving you deep visibility into how your program actually runs.

Part 1: Theoretical Foundations of Profiling

Core Concepts: What Can Be Profiled?

Go provides several kinds of profiles, each measuring a different aspect of your program. Let's look at each one in detail.

CPU profiling measures where your program spends CPU time. Picture a stopwatch that takes a snapshot of the running call stacks at a fixed interval (by default 100 times per second, that is, every 10 ms). From thousands of these snapshots, pprof counts which functions appear most often, which means those functions consume the most CPU. This differs from simple wall-clock timing because CPU profiling is not affected by I/O wait: it only measures time spent actively on the CPU.

Memory profiling tracks heap allocations in your program, together with the code location that made each recorded allocation. There is an important nuance: to keep overhead low, the runtime samples allocations rather than recording every one. By default it records on average one allocation per 512 KB allocated (runtime.MemProfileRate). pprof scales the samples back up to estimate totals, so frequent small allocations still show up statistically, but a call site that allocates very few bytes in total may be missing or imprecise in the profile.
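The sampling interval is exposed as the runtime.MemProfileRate variable. The sketch below only reads it; the helper name currentMemProfileRate is an assumption made for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentMemProfileRate reports the average number of allocated bytes
// between two heap profile samples. (Hypothetical helper for illustration.)
func currentMemProfileRate() int {
	return runtime.MemProfileRate
}

func main() {
	fmt.Println("heap sampling interval (bytes):", currentMemProfileRate())
	// For a short experiment you could sample every allocation by setting
	// runtime.MemProfileRate = 1 as early as possible in main, at the
	// price of noticeably higher overhead.
}
```

Changing the rate in the middle of a run makes the profile harder to interpret, so if you adjust it at all, do it once at program start.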

Goroutine profiling gives a snapshot of every goroutine that currently exists, along with its stack trace. This is very useful for debugging deadlocks or finding goroutine leaks: situations where goroutines are created but never finish, which in turn leaks memory.
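A quick way to notice a leak before opening a full profile is to watch the goroutine count. The following sketch deliberately parks a few goroutines and compares the count reported by the goroutine profile before and after; leakGoroutines is a hypothetical helper.

```go
package main

import (
	"fmt"
	"runtime/pprof"
)

// leakGoroutines starts n goroutines that block on a channel nobody writes
// to. It returns how much the goroutine profile count grew and a function
// that releases the goroutines again. (Hypothetical helper for illustration.)
func leakGoroutines(n int) (grew int, release func()) {
	before := pprof.Lookup("goroutine").Count()
	block := make(chan struct{})
	for i := 0; i < n; i++ {
		go func() { <-block }()
	}
	after := pprof.Lookup("goroutine").Count()
	return after - before, func() { close(block) }
}

func main() {
	grew, release := leakGoroutines(5)
	fmt.Println("goroutine count grew by", grew)
	release() // unblock the goroutines so they can exit
}
```

In a long-running service, a count that keeps climbing under steady load is a strong hint that the goroutine profile is worth a closer look.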

Block profiling tracks where goroutines spend time waiting on synchronization primitives such as channels, mutexes, or select statements. It helps identify contention and bottlenecks in concurrent programs.

Mutex profiling tracks mutex contention specifically. Unlike the more general block profile, it focuses on how long goroutines wait to acquire a mutex that is held by another goroutine.

Sampling vs. Tracing: Understanding the Methodology

It is important to understand that pprof relies on sampling rather than full tracing. Sampling means taking measurements at intervals instead of recording every event. This is a trade-off: low overhead, with enough detail to identify problems. Full tracing (as done by Go's separate trace tool) records every event of interest in detail, but at a higher overhead.
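For contrast with sampling, this is a minimal sketch of capturing an execution trace with the standard runtime/trace package. It writes into an in-memory buffer to stay self-contained; captureTrace is a hypothetical helper, and a real program would normally write to a file and open it with go tool trace.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/trace"
)

// captureTrace records an execution trace while work runs and returns the
// number of trace bytes written. (Hypothetical helper for illustration.)
func captureTrace(work func()) (int, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return 0, err
	}
	work()
	trace.Stop() // flushes the remaining trace data into buf
	return buf.Len(), nil
}

func main() {
	n, err := captureTrace(func() {
		sum := 0
		for i := 0; i < 1_000_000; i++ {
			sum += i
		}
		_ = sum
	})
	if err != nil {
		fmt.Println("trace failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of trace data\n", n)
}
```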

Part 2: Environment Setup and Basic Instrumentation

Project Setup

Let's start by creating a Go project to use for an in-depth exploration of profiling. The example is meant to reflect real-world problems.

// main.go - Data-processing simulation application
package main

import (
    "fmt"
    "math/rand"
    "time"
)

// DataProcessor simulates data processing with different characteristics
type DataProcessor struct {
    data []int
}

// ProcessData performs a CPU-intensive operation
func (dp *DataProcessor) ProcessData() int {
    sum := 0
    // Simulate a complex computation
    for i := 0; i < len(dp.data); i++ {
        for j := 0; j < 1000; j++ {
            sum += dp.data[i] * j
        }
    }
    return sum
}

// AllocateMemory simulates different memory allocation patterns
func (dp *DataProcessor) AllocateMemory() [][]byte {
    // Pattern 1: many small allocations
    smallAllocations := make([][]byte, 0)
    for i := 0; i < 10000; i++ {
        smallAllocations = append(smallAllocations, make([]byte, 100))
    }
    
    // Pattern 2: a few large allocations
    largeAllocations := make([][]byte, 0)
    for i := 0; i < 10; i++ {
        largeAllocations = append(largeAllocations, make([]byte, 1024*1024)) // 1MB each
    }
    
    return append(smallAllocations, largeAllocations...)
}

func main() {
    // Initialize with random data.
    // Note: rand.Seed is deprecated since Go 1.20, where the generator is seeded automatically.
    rand.Seed(time.Now().UnixNano())
    data := make([]int, 10000)
    for i := range data {
        data[i] = rand.Intn(100)
    }
    
    processor := &DataProcessor{data: data}
    
    // Simulate the workload
    for i := 0; i < 5; i++ {
        result := processor.ProcessData()
        _ = processor.AllocateMemory()
        fmt.Printf("Iteration %d completed with result: %d\n", i, result)
    }
}

This simple program already gives us a playground for exploring profiling. It has two interesting characteristics: CPU-intensive computation in ProcessData and differing memory allocation patterns in AllocateMemory.

Method 1: Profiling via Testing

The easiest way to start profiling is through Go's testing framework. Let's write benchmarks to profile.

// main_test.go
package main

import (
    "math/rand"
    "testing"
)

func BenchmarkProcessData(b *testing.B) {
    // Setup: create data for the benchmark
    data := make([]int, 10000)
    for i := range data {
        data[i] = rand.Intn(100)
    }
    processor := &DataProcessor{data: data}
    
    // Reset the timer so setup time is not measured
    b.ResetTimer()
    
    // Benchmark loop: runs b.N times
    for i := 0; i < b.N; i++ {
        processor.ProcessData()
    }
}

func BenchmarkAllocateMemory(b *testing.B) {
    data := make([]int, 10000)
    processor := &DataProcessor{data: data}
    
    b.ResetTimer()
    
    for i := 0; i < b.N; i++ {
        _ = processor.AllocateMemory()
    }
}

// Benchmark with sub-benchmarks to compare scenarios
func BenchmarkDataProcessing(b *testing.B) {
    scenarios := []struct {
        name string
        size int
    }{
        {"Small", 100},
        {"Medium", 1000},
        {"Large", 10000},
    }
    
    for _, scenario := range scenarios {
        b.Run(scenario.name, func(b *testing.B) {
            data := make([]int, scenario.size)
            for i := range data {
                data[i] = rand.Intn(100)
            }
            processor := &DataProcessor{data: data}
            
            b.ResetTimer()
            for i := 0; i < b.N; i++ {
                processor.ProcessData()
            }
        })
    }
}

Now we can run the benchmarks with profiling enabled:

# CPU profiling
go test -bench=. -cpuprofile=cpu.prof

# Memory profiling
go test -bench=. -memprofile=mem.prof

# Both at once
go test -bench=. -cpuprofile=cpu.prof -memprofile=mem.prof

# With additional allocation statistics
go test -bench=. -benchmem -memprofile=mem.prof

The -benchmem flag prints memory allocation statistics (bytes and allocations per operation) directly in the benchmark output, which is very helpful for a quick analysis before a deep dive with pprof.
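To get a feel for the allocations-per-operation figure, the sketch below measures two ways of filling a slice with testing.AllocsPerRun from the standard library. The functions growByAppend, preallocate, and allocsPerCall are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the slices reachable from a package-level variable so the
// compiler has to allocate them on the heap.
var sink []int

// growByAppend starts from a nil slice and lets append reallocate as needed.
func growByAppend(n int) {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	sink = s
}

// preallocate reserves the full capacity once, then fills it.
func preallocate(n int) {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	sink = s
}

// allocsPerCall reports the average number of heap allocations per call of f.
func allocsPerCall(f func()) float64 {
	return testing.AllocsPerRun(100, f)
}

func main() {
	fmt.Println("append from nil:", allocsPerCall(func() { growByAppend(1000) }))
	fmt.Println("preallocated:   ", allocsPerCall(func() { preallocate(1000) }))
}
```

The exact numbers depend on the Go version and its slice growth policy, but the preallocated variant should need fewer allocations per call.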

Method 2: Runtime Profiling with net/http/pprof

For applications that run as a service (web server, daemon, and so on), the net/http/pprof package provides HTTP endpoints for profiling.

// server.go - Web server with pprof endpoints
// Note: this file reuses DataProcessor from main.go. Because main.go also
// defines main(), copy the type over or build this file in its own directory.
package main

import (
    "fmt"
    "log"
    "math/rand"
    "net/http"
    _ "net/http/pprof" // Imported for its side effect: registers the pprof handlers
    "sync"
    "time"
)

type Server struct {
    processor *DataProcessor
    mu        sync.Mutex
    requests  int
}

func NewServer() *Server {
    data := make([]int, 10000)
    for i := range data {
        data[i] = rand.Intn(100)
    }
    
    return &Server{
        processor: &DataProcessor{data: data},
    }
}

func (s *Server) handleProcess(w http.ResponseWriter, r *http.Request) {
    // Simulate request handling with CPU work
    s.mu.Lock()
    s.requests++
    currentRequest := s.requests
    s.mu.Unlock()
    
    result := s.processor.ProcessData()
    
    fmt.Fprintf(w, "Request #%d processed: %d\n", currentRequest, result)
}

func (s *Server) handleAllocate(w http.ResponseWriter, r *http.Request) {
    // Endpoint that performs many allocations
    allocations := s.processor.AllocateMemory()
    
    fmt.Fprintf(w, "Allocated %d chunks of memory\n", len(allocations))
}

func (s *Server) simulateLoad() {
    // Goroutine that simulates background load
    ticker := time.NewTicker(100 * time.Millisecond)
    defer ticker.Stop()
    
    for range ticker.C {
        s.processor.ProcessData()
    }
}

func main() {
    server := NewServer()
    
    // Start background load simulation
    go server.simulateLoad()
    
    // Register application handlers
    http.HandleFunc("/process", server.handleProcess)
    http.HandleFunc("/allocate", server.handleAllocate)
    
    // The pprof handlers are already registered under /debug/pprof/
    // because of the blank import of net/http/pprof
    
    fmt.Println("Server starting on :6060")
    fmt.Println("Profiling endpoints available at:")
    fmt.Println("  http://localhost:6060/debug/pprof/")
    fmt.Println("Application endpoints:")
    fmt.Println("  http://localhost:6060/process")
    fmt.Println("  http://localhost:6060/allocate")
    
    log.Fatal(http.ListenAndServe(":6060", nil))
}

With this server running, you can access the profiling endpoints. Let's go through each one.

The /debug/pprof/ endpoint serves a human-friendly HTML overview and is a good starting point for seeing what is available. /debug/pprof/profile runs CPU profiling for 30 seconds by default (change it with a parameter such as ?seconds=60). /debug/pprof/heap returns a heap profile snapshot. /debug/pprof/goroutine shows all current goroutines. /debug/pprof/block and /debug/pprof/mutex return the blocking and mutex contention profiles; they stay empty until you enable them with runtime.SetBlockProfileRate and runtime.SetMutexProfileFraction.
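The blank import registers these handlers on http.DefaultServeMux. If you would rather keep them off your public mux, one option is to register the exported handlers from net/http/pprof on a dedicated mux. The sketch below assumes that approach; newDebugMux and statusFor are hypothetical helpers, and statusFor uses net/http/httptest only to exercise the mux without opening a port.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux returns a mux that serves only the pprof handlers, so they can
// be bound to a separate, non-public listener.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves heap, goroutine, block, mutex, ...
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// statusFor issues an in-memory GET request against the mux and returns the
// HTTP status code, without opening a real port.
func statusFor(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newDebugMux()
	fmt.Println("goroutine profile status:", statusFor(mux, "/debug/pprof/goroutine?debug=1"))
	// In a real service you might serve it on loopback only, for example:
	//   log.Fatal(http.ListenAndServe("localhost:6060", mux))
}
```

Note that importing net/http/pprof still registers the handlers on the default mux as a side effect, so this only helps if your public server does not use http.DefaultServeMux.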

Method 3: Manual Profiling with runtime/pprof

For full control, you can use the runtime/pprof package directly.

// profiling_manual.go
package main

import (
    "fmt"
    "math/rand"
    "os"
    "runtime"
    "runtime/pprof"
    "time"
)

func runWithCPUProfiling() {
    // Create the file that stores the CPU profile
    f, err := os.Create("cpu_manual.prof")
    if err != nil {
        panic(err)
    }
    defer f.Close()
    
    // Start CPU profiling
    if err := pprof.StartCPUProfile(f); err != nil {
        panic(err)
    }
    defer pprof.StopCPUProfile()
    
    // Run the workload to be profiled
    data := make([]int, 10000)
    for i := range data {
        data[i] = rand.Intn(100)
    }
    processor := &DataProcessor{data: data}
    
    for i := 0; i < 100; i++ {
        processor.ProcessData()
    }
    
    fmt.Println("CPU profiling completed, saved to cpu_manual.prof")
}

func runWithMemoryProfiling() {
    // Workload
    data := make([]int, 10000)
    processor := &DataProcessor{data: data}
    
    for i := 0; i < 50; i++ {
        _ = processor.AllocateMemory()
    }
    
    // Capture the memory profile after the workload
    f, err := os.Create("mem_manual.prof")
    if err != nil {
        panic(err)
    }
    defer f.Close()
    
    // Force a garbage collection so the profile reflects up-to-date statistics
    runtime.GC()
    
    // Write heap profile
    if err := pprof.WriteHeapProfile(f); err != nil {
        panic(err)
    }
    
    fmt.Println("Memory profiling completed, saved to mem_manual.prof")
}

func runWithGoroutineProfiling() {
    // Start several goroutines with different behavior
    done := make(chan bool)
    
    // Goroutines that finish quickly
    for i := 0; i < 10; i++ {
        go func(id int) {
            time.Sleep(100 * time.Millisecond)
            fmt.Printf("Quick goroutine %d done\n", id)
        }(i)
    }
    
    // Goroutines that block
    blockChan := make(chan int)
    for i := 0; i < 5; i++ {
        go func(id int) {
            <-blockChan // Will block indefinitely
            fmt.Printf("Blocked goroutine %d done\n", id)
        }(i)
    }
    
    // Goroutines that compute
    for i := 0; i < 3; i++ {
        go func(id int) {
            sum := 0
            for j := 0; j < 100000000; j++ {
                sum += j
            }
            done <- true
        }(i)
    }
    
    // Wait briefly so the goroutines get to run
    time.Sleep(200 * time.Millisecond)
    
    // Capture goroutine profile
    f, err := os.Create("goroutine.prof")
    if err != nil {
        panic(err)
    }
    defer f.Close()
    
    profile := pprof.Lookup("goroutine")
    if err := profile.WriteTo(f, 0); err != nil {
        panic(err)
    }
    
    fmt.Println("Goroutine profiling completed, saved to goroutine.prof")
    
    // Cleanup: close the channel to release the blocked goroutines
    close(blockChan)
    
    // Wait for the computing goroutines to finish
    for i := 0; i < 3; i++ {
        <-done
    }
}

func main() {
    fmt.Println("Running manual profiling examples...")
    
    fmt.Println("\n1. CPU Profiling...")
    runWithCPUProfiling()
    
    fmt.Println("\n2. Memory Profiling...")
    runWithMemoryProfiling()
    
    fmt.Println("\n3. Goroutine Profiling...")
    runWithGoroutineProfiling()
    
    fmt.Println("\nAll profiles generated. Use 'go tool pprof' to analyze them.")
}

Part 3: In-Depth Analysis with the pprof Tool

Now that we have profile data, it is time to learn how to read and interpret it. go tool pprof is the main interface for this analysis, and it offers many modes and commands.

Interactive Mode: Command-line Interface

The most flexible pprof mode is interactive mode. Let's start with a CPU profile.

# Open the profile in interactive mode
go tool pprof cpu.prof

You will see a (pprof) prompt. This is an interactive shell in which you can run various commands. Let's explore the important ones.

The top command shows the functions that consume the most CPU. Its output looks something like this (the numbers in these samples are illustrative):

(pprof) top
Showing nodes accounting for 2.5s, 89.29% of 2.8s total
Dropped 15 nodes (cum <= 0.014s)
      flat  flat%   sum%        cum   cum%
     1.2s 42.86% 42.86%      1.8s 64.29%  main.(*DataProcessor).ProcessData
     0.8s 28.57% 71.43%      0.8s 28.57%  runtime.memmove
     0.3s 10.71% 82.14%      0.3s 10.71%  runtime.mallocgc
     0.2s  7.14% 89.29%      0.2s  7.14%  runtime.scanobject

Let's go through each column. The flat column shows the CPU time spent directly in the function itself, excluding the functions it calls. The flat% column is that time as a percentage of the total profile. The sum% column is the running total of flat% down to that row. The cum (cumulative) column shows the time spent in the function plus everything it calls. The cum% column is the cumulative time as a percentage of the total.

The difference between flat and cum matters. A function with high flat time is doing a lot of computation itself. A function with high cum time but low flat time is calling other functions that are expensive.
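The relationship between the two columns can be sketched with a toy aggregation over stack samples. This is not how pprof is implemented internally; it is a simplified model, and the function flatAndCum and the sample stacks are made up for illustration.

```go
package main

import "fmt"

// flatAndCum tallies samples per function. Each sample is one call stack,
// ordered from the outermost caller to the leaf (the function that was
// running when the sample was taken).
func flatAndCum(stacks [][]string) (flat, cum map[string]int) {
	flat = make(map[string]int)
	cum = make(map[string]int)
	for _, stack := range stacks {
		if len(stack) == 0 {
			continue
		}
		flat[stack[len(stack)-1]]++ // only the leaf was executing
		seen := make(map[string]bool)
		for _, fn := range stack {
			if !seen[fn] { // count a recursive function once per sample
				cum[fn]++
				seen[fn] = true
			}
		}
	}
	return flat, cum
}

func main() {
	// Hypothetical samples; the names only mirror the tutorial's example.
	stacks := [][]string{
		{"main", "ProcessData"},
		{"main", "ProcessData"},
		{"main", "ProcessData", "helper"},
		{"main", "AllocateMemory", "mallocgc"},
	}
	flat, cum := flatAndCum(stacks)
	for _, fn := range []string{"main", "ProcessData", "helper"} {
		fmt.Printf("%-12s flat=%d cum=%d\n", fn, flat[fn], cum[fn])
	}
}
```

Here main never appears as a leaf, so its flat count is 0 while its cum count covers every sample, which is the typical shape for a function that mostly delegates.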

The list command shows a function's source code annotated with profiling data:

(pprof) list ProcessData
Total: 2.8s
ROUTINE ======================== main.(*DataProcessor).ProcessData
     1.2s      1.8s (flat, cum) 64.29% of Total
         .          .     10:func (dp *DataProcessor) ProcessData() int {
         .          .     11:   sum := 0
         .          .     12:   // Simulate a complex computation
      20ms       20ms     13:   for i := 0; i < len(dp.data); i++ {
     1.1s      1.7s     14:           for j := 0; j < 1000; j++ {
      70ms       70ms     15:                   sum += dp.data[i] * j
         .          .     16:           }
         .          .     17:   }
         .          .     18:   return sum
         .          .     19:}

This is very powerful because it shows, line by line, where CPU time is spent. We can see that the nested loop on lines 14-15 consumes almost all of the time.

The peek command shows caller and callee information for a function:

(pprof) peek ProcessData
Showing nodes accounting for 1.8s, 64.29% of 2.8s total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                             1.8s   100% |   main.main
     1.2s 42.86% 42.86%      1.8s 64.29%                | main.(*DataProcessor).ProcessData
                                             0.3s 16.67% |   runtime.memmove
                                             0.2s 11.11% |   runtime.mallocgc
----------------------------------------------------------+-------------

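In this sample output, ProcessData is called from main.main and has runtime.memmove and runtime.mallocgc as callees. The ProcessData shown earlier performs no allocation, so your own output will likely list no such runtime callees.

The tree command prints the call tree: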

(pprof) tree
Showing nodes accounting for 2.5s, 89.29% of 2.8s total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                               2.8s   100% |   runtime.main
     0.0s     0% 89.29%      2.8s   100%                | main.main
                                             1.8s 64.29% |   main.(*DataProcessor).ProcessData
                                             0.6s 21.43% |   main.(*DataProcessor).AllocateMemory

Web UI: Graphical Visualization

One of the most impressive pprof features is its web UI, which uses Graphviz to render the call graph.

# Open the web UI (opens a browser automatically)
go tool pprof -http=:8080 cpu.prof

# Or from a remote server (quote the URL so the shell does not interpret '?')
go tool pprof -http=:8080 "http://localhost:6060/debug/pprof/profile?seconds=30"

The web UI offers several views. The Graph view shows the call graph, where box size and color represent resource consumption and arrows show call relationships. The Flame Graph view shows stack traces as a flame graph, which is very useful for spotting hot paths. The Peek and Source views give the same information as the command line, in a friendlier UI.

Flame Graph: Understanding the Visual Pattern

A flame graph is one of the most intuitive ways to understand a CPU profile. Think of it as stack traces laid out horizontally. Each horizontal bar represents a function in a call stack. The width of a bar is proportional to how often the function was on the stack (CPU time). Stacks are layered vertically. In the classic layout, callers sit at the bottom and callees on top; the flame graph view in pprof's web UI is drawn the other way around, with the root at the top and callees growing downward.

What makes flame graphs powerful is that you can immediately see the hot paths, the execution paths that consume the most CPU, as wide columns. The functions at the far end of such a column (the top in the classic layout, the bottom in pprof's view) are the leaf functions that actually do the computation.

Part 4: Memory Profiling Deep Dive

Memory profiling is more complex than CPU profiling because there are several metrics to track: allocation count, allocated bytes, in-use objects, and in-use bytes.

Understanding the Memory Metrics

Let's analyze a memory profile in detail:

go tool pprof mem.prof

For a heap profile, pprof shows inuse_space by default: how much memory is currently held by objects that are still allocated. (Profiles written by go test -memprofile may default to alloc_space instead, depending on the Go version.) There are other metrics we can look at as well:

(pprof) top
Showing nodes accounting for 10.5MB, 100% of 10.5MB total
      flat  flat%   sum%        cum   cum%
    10MB 95.24% 95.24%     10MB 95.24%  main.(*DataProcessor).AllocateMemory
     0.5MB  4.76%   100%    0.5MB  4.76%  runtime.allocm

We can change the sample type to look at a different metric:

# List the available sample types
(pprof) sample_index
alloc_objects
alloc_space
inuse_objects
inuse_space

# Switch to allocated space (total bytes ever allocated)
(pprof) sample_index = alloc_space

# Switch to allocation count
(pprof) sample_index = alloc_objects

The difference between inuse and alloc matters. alloc_space is the total of all allocations made since the program started, including memory that has already been garbage collected; it is good for finding allocation hotspots. inuse_space is the memory still in use right now; it is good for finding memory leaks. alloc_objects is the number of objects allocated, which helps find frequent small allocations. inuse_objects is the number of objects still alive.

Case Study: Memory Leak Detection

Let's build a more realistic example with a potential memory leak:

// memory_leak_example.go
package main

import (
    "fmt"
    "net/http"
    _ "net/http/pprof"
    "sync"
    "time"
)

// ConnectionPool simulates a leaking connection pool
type ConnectionPool struct {
    connections []*Connection
    mu          sync.Mutex
}

type Connection struct {
    ID      int
    Buffer  []byte // Simulated data buffer
    Active  bool
}

func NewConnectionPool() *ConnectionPool {
    return &ConnectionPool{
        connections: make([]*Connection, 0),
    }
}

// BUG: connections created here are never released
func (cp *ConnectionPool) AcquireConnection() *Connection {
    cp.mu.Lock()
    defer cp.mu.Unlock()
    
    // Always create a new connection (this is the leak!)
    conn := &Connection{
        ID:     len(cp.connections),
        Buffer: make([]byte, 1024*1024), // 1MB buffer per connection
        Active: true,
    }
    
    cp.connections = append(cp.connections, conn)
    return conn
}

// This method exists but is not used correctly (and does not remove the connection)
func (cp *ConnectionPool) ReleaseConnection(conn *Connection) {
    cp.mu.Lock()
    defer cp.mu.Unlock()
    
    conn.Active = false
    // A correct implementation would also remove conn from the slice
}

var globalPool *ConnectionPool

func handleRequest(w http.ResponseWriter, r *http.Request) {
    // BUG: acquire a connection but never release it
    conn := globalPool.AcquireConnection()
    
    // Simulate work using the connection
    time.Sleep(10 * time.Millisecond)
    
    fmt.Fprintf(w, "Request handled with connection %d\n", conn.ID)
    // MISSING: globalPool.ReleaseConnection(conn)
}

func simulateTraffic() {
    ticker := time.NewTicker(100 * time.Millisecond)
    defer ticker.Stop()
    
    for range ticker.C {
        resp, err := http.Get("http://localhost:6060/request")
        if err == nil {
            resp.Body.Close()
        }
    }
}

func main() {
    globalPool = NewConnectionPool()
    
    http.HandleFunc("/request", handleRequest)
    
    // Start traffic simulator
    go simulateTraffic()
    
    fmt.Println("Server with memory leak running on :6060")
    fmt.Println("Monitor memory at: http://localhost:6060/debug/pprof/heap")
    http.ListenAndServe(":6060", nil)
}

To detect this leak, we can:

# Take a baseline heap profile
curl http://localhost:6060/debug/pprof/heap > heap_before.prof

# Wait a few minutes so the leak accumulates
sleep 180

# Take a heap profile after the load
curl http://localhost:6060/debug/pprof/heap > heap_after.prof

# Compare the two
go tool pprof -http=:8080 -base heap_before.prof heap_after.prof

The comparison profile shows what has grown. In this case, we will see AcquireConnection allocating memory that keeps increasing and never goes down.
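Once the profile points at the pool, the fix is to drop the pool's reference when a connection is released. The following is a minimal sketch under a few assumptions that differ from the example above: it uses a map keyed by ID instead of a slice so removal is a single delete, and the type and method names (pool, conn, acquire, release, size) are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// conn stands in for the tutorial's Connection type.
type conn struct {
	id     int
	buffer []byte
}

// pool tracks only connections that are currently in use.
type pool struct {
	mu     sync.Mutex
	nextID int
	active map[int]*conn
}

func newPool() *pool { return &pool{active: make(map[int]*conn)} }

func (p *pool) acquire() *conn {
	p.mu.Lock()
	defer p.mu.Unlock()
	c := &conn{id: p.nextID, buffer: make([]byte, 1024)}
	p.nextID++
	p.active[c.id] = c
	return c
}

// release drops the pool's reference so the connection (and its buffer)
// becomes eligible for garbage collection.
func (p *pool) release(c *conn) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.active, c.id)
}

func (p *pool) size() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.active)
}

func main() {
	p := newPool()
	first := p.acquire()
	p.acquire()
	p.acquire()
	p.release(first)
	fmt.Println("active connections:", p.size())
}
```

In a handler, calling release through defer right after acquire keeps the pairing hard to miss. After such a fix, the before/after heap comparison should no longer show steady growth at the acquire call site.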

Escape Analysis: Understanding Stack vs. Heap Allocation

The Go compiler performs escape analysis to decide whether a variable can live on the stack (fast, cleaned up automatically when the function returns) or must live on the heap (slower, requires GC). We can inspect the results of escape analysis:

# Compile and print escape analysis decisions
go build -gcflags="-m -m" main.go

The output shows the compiler's decisions:

./main.go:10:6: can inline (*DataProcessor).ProcessData
./main.go:25:26: make([]byte, 100) escapes to heap
./main.go:30:33: make([]byte, 1024 * 1024) escapes to heap

The phrase "escapes to heap" means the value is allocated on the heap. This happens for several reasons: a pointer to the variable is returned from the function, the variable is stored in a struct that is returned, it is captured by a closure that outlives the function, it is too large for the stack, or the compiler cannot prove that it does not escape.
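The first of these reasons can be observed directly by counting allocations. The sketch below compares a function that returns a pointer to a local value with one that keeps the value local. All names are hypothetical, and the go:noinline directives are there only so that inlining does not hide the effect.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

// newPoint returns a pointer to a local value, so the value must outlive the
// call and is moved to the heap.
//
//go:noinline
func newPoint(x, y int) *point {
	return &point{x, y}
}

// sumPoint keeps its point local; nothing forces it onto the heap.
//
//go:noinline
func sumPoint(x, y int) int {
	p := point{x, y}
	return p.x + p.y
}

// Package-level sinks keep the compiler from discarding the results.
var (
	sinkPtr *point
	sinkInt int
)

// measure returns the average allocations per call for both variants.
func measure() (heap, stack float64) {
	heap = testing.AllocsPerRun(1000, func() { sinkPtr = newPoint(1, 2) })
	stack = testing.AllocsPerRun(1000, func() { sinkInt = sumPoint(1, 2) })
	return heap, stack
}

func main() {
	heap, stack := measure()
	fmt.Println("allocations per call, pointer returned:", heap)
	fmt.Println("allocations per call, value kept local:", stack)
}
```

Building this file with the -gcflags flag shown earlier should report the composite literal in newPoint as escaping to the heap.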

Part 5: Goroutine and Concurrency Profiling

Goroutine Profiling: Finding Leaks and Deadlocks

Goroutine leaks are among the most subtle bugs in Go programs. Let's build an example that demonstrates several problematic patterns:

// goroutine_patterns.go
package main

import (
    "context"
    "fmt"
    "net/http"
    _ "net/http/pprof"
    "sync"
    "time"
)

// Pattern 1: goroutine leak because the channel is never closed
func leakyChannelPattern() {
    ch := make(chan int)
    
    // This goroutine will block forever
    go func() {
        for val := range ch {
            fmt.Println("Received:", val)
        }
    }()
    
    // The channel is never closed, so the goroutine above leaks
}

// Pattern 2: goroutine leak due to missing context cancellation
func leakyContextPattern() {
    ctx := context.Background() // Should be context.WithCancel
    
    go func() {
        ticker := time.NewTicker(1 * time.Second)
        defer ticker.Stop()
        
        for {
            select {
            case <-ticker.C:
                // Do some work
                fmt.Println("Working...")
            case <-ctx.Done():
                return // Never happens because the context is never cancelled
            }
        }
    }()
}

// Pattern 3: goroutine leak because WaitGroup.Done is never called
type WorkerPool struct {
    wg   sync.WaitGroup
    jobs chan int
}

func (wp *WorkerPool) leakyWorker() {
    wp.wg.Add(1)
    // MISSING: defer wp.wg.Done()
    
    go func() {
        for job := range wp.jobs {
            time.Sleep(100 * time.Millisecond)
            fmt.Printf("Processing job %d\n", job)
        }
    }()
}

// Pattern 4: CORRECT - Properly managed goroutines
type ProperWorkerPool struct {
    ctx    context.Context
    cancel context.CancelFunc
    wg     sync.WaitGroup
    jobs   chan int
}

func NewProperWorkerPool() *ProperWorkerPool {
    ctx, cancel := context.WithCancel(context.Background())
    return &ProperWorkerPool{
        ctx:    ctx,
        cancel: cancel,
        jobs:   make(chan int, 100),
    }
}

func (pwp *ProperWorkerPool) Start(numWorkers int) {
    for i := 0; i < numWorkers; i++ {
        pwp.wg.Add(1)
        go pwp.worker(i)
    }
}

func (pwp *ProperWorkerPool) worker(id int) {
    defer pwp.wg.Done()
    
    for {
        select {
        case job, ok := <-pwp.jobs:
            if !ok {
                fmt.Printf("Worker %d shutting down (channel closed)\n", id)
                return
            }
            time.Sleep(50 * time.Millisecond)
            fmt.Printf("Worker %d processed job %d\n", id, job)
            
        case <-pwp.ctx.Done():
            fmt.Printf("Worker %d shutting down (context cancelled)\n", id)
            return
        }
    }
}

func (pwp *ProperWorkerPool) Shutdown() {
    close(pwp.jobs)
    pwp.cancel()
    pwp.wg.Wait()
    fmt.Println("All workers shut down properly")
}

func main() {
    // Trigger the leaks for demonstration
    for i := 0; i < 10; i++ {
        leakyChannelPattern()
        leakyContextPattern()
    }
    
    // A proper worker pool for contrast
    pool := NewProperWorkerPool()
    pool.Start(5)
    
    // Send some jobs
    go func() {
        for i := 0; i < 50; i++ {
            pool.jobs <- i
            time.Sleep(20 * time.Millisecond)
        }
    }()
    
    // Set up the HTTP server for profiling
    fmt.Println("Server running on :6060")
    fmt.Println("Check goroutines at: http://localhost:6060/debug/pprof/goroutine")
    
    // Clean up after some time
    time.AfterFunc(10*time.Second, func() {
        pool.Shutdown()
    })
    
    http.ListenAndServe(":6060", nil)
}

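To analyze the goroutine profile:

# Fetch the goroutine profile
curl http://localhost:6060/debug/pprof/goroutine > goroutine.prof

# Analyze it with pprof
go tool pprof goroutine.prof

In interactive mode we can see something like this (illustrative output; with the program above, each leaky pattern starts 10 goroutines, and real profiles usually attribute the flat count to runtime.gopark, with the leaking functions visible in the cum column):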

(pprof) top
Showing nodes accounting for 23, 100% of 23 total
      flat  flat%   sum%        cum   cum%
        20 86.96% 86.96%         20 86.96%  main.leakyChannelPattern.func1
         2  8.70% 95.65%          2  8.70%  main.leakyContextPattern.func1
         1  4.35%   100%          1  4.35%  runtime.gopark

We can inspect where the leaked goroutines are parked:

(pprof) list leakyChannelPattern
Total: 23
ROUTINE ======================== main.leakyChannelPattern.func1
        20         20 (flat, cum) 86.96% of Total
         .          .     12:   // This goroutine will block forever
         .          .     13:   go func() {
        20         20     14:           for val := range ch {
         .          .     15:                   fmt.Println("Received:", val)
         .          .     16:           }
         .          .     17:   }()
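Once the profile has pointed at the blocked range loop, the fix for the first pattern is to close the channel when the producer is done. Here is a minimal sketch; consumeAll is a hypothetical function that also returns a result, so that the consumer's exit can be observed.

```go
package main

import "fmt"

// consumeAll starts a consumer goroutine, sends the values, then closes the
// channel so the range loop ends and the goroutine can exit.
func consumeAll(values []int) int {
	ch := make(chan int)
	done := make(chan int)

	go func() {
		total := 0
		for v := range ch { // ends once ch is closed and drained
			total += v
		}
		done <- total
	}()

	for _, v := range values {
		ch <- v
	}
	close(ch) // without this, the goroutine above would block forever
	return <-done
}

func main() {
	fmt.Println(consumeAll([]int{1, 2, 3})) // prints 6
}
```

Taking another goroutine profile after a fix like this should show the parked range loop gone.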

Block Profiling: Finding Contention

Block profiling is very useful for finding where goroutines spend time waiting. It has to be enabled explicitly because it adds overhead:

// block_profiling_example.go
package main

import (
    "fmt"
    "math/rand"
    "net/http"
    _ "net/http/pprof"
    "runtime"
    "sync"
    "time"
)

// Contention scenario 1: Hot mutex
type Counter struct {
    mu    sync.Mutex
    value int64
}

func (c *Counter) Increment() {
    c.mu.Lock()
    // Simulate long-running work while holding the lock
    time.Sleep(time.Microsecond * time.Duration(rand.Intn(100)))
    c.value++
    c.mu.Unlock()
}

// Contention scenario 2: Channel contention
type ChannelProcessor struct {
    ch chan int
}

func (cp *ChannelProcessor) Process() {
    // Multiple goroutines compete to read from the same channel
    for val := range cp.ch {
        // Simulate processing
        time.Sleep(time.Millisecond * time.Duration(rand.Intn(10)))
        _ = val * 2
    }
}

func main() {
    // CRITICAL: set the block profile rate
    // A rate of 1 records every blocking event
    // A rate of 0 disables block profiling (the default)
    runtime.SetBlockProfileRate(1)
    
    // Scenario 1: Mutex contention
    counter := &Counter{}
    for i := 0; i < 50; i++ {
        go func() {
            for {
                counter.Increment()
            }
        }()
    }
    
    // Scenario 2: Channel contention
    processor := &ChannelProcessor{ch: make(chan int, 10)}
    for i := 0; i < 20; i++ {
        go processor.Process()
    }
    
    go func() {
        for {
            processor.ch <- rand.Intn(1000)
            time.Sleep(time.Microsecond * 100)
        }
    }()
    
    fmt.Println("Server with contention running on :6060")
    fmt.Println("Block profile at: http://localhost:6060/debug/pprof/block")
    http.ListenAndServe(":6060", nil)
}

Analyzing the block profile:

# Fetch the block profile after the server has run for a few minutes
curl http://localhost:6060/debug/pprof/block > block.prof

go tool pprof block.prof

The output shows where blocking occurs:

(pprof) top
Showing nodes accounting for 45.2s, 98.48% of 45.9s total
      flat  flat%   sum%        cum   cum%
    25.3s 55.12% 55.12%     25.3s 55.12%  sync.(*Mutex).Lock
    19.9s 43.36% 98.48%     19.9s 43.36%  runtime.chanrecv1

The list command breaks the blocked time down per source line:

(pprof) list Increment
Total: 45.9s
ROUTINE ======================== main.(*Counter).Increment
    25.3s     25.3s (flat, cum) 55.12% of Total
         .          .     18:func (c *Counter) Increment() {
    25.3s     25.3s     19:   c.mu.Lock()
         .          .     20:   // Simulate slow work
         .          .     21:   time.Sleep(time.Microsecond * time.Duration(rand.Intn(100)))
         .          .     22:   c.value++
         .          .     23:   c.mu.Unlock()
         .          .     24:}
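One way to act on a profile like the one above is to shrink the critical section. The sketch below is an assumption about what such a fix could look like, not part of the original program: `FastCounter`, `simulateWork`, and `runWorkers` are hypothetical names. The slow work runs without any lock held, and the counter uses `sync/atomic` (`atomic.Int64` needs Go 1.19 or later).

```go
package main

import (
    "fmt"
    "sync"
    "sync/atomic"
    "time"
)

// FastCounter is a hypothetical replacement for Counter: the value is an
// atomic integer, so incrementing it never blocks another goroutine.
type FastCounter struct {
    value atomic.Int64
}

// simulateWork stands in for the slow work that the original Increment
// performed while holding the mutex.
func simulateWork() {
    time.Sleep(10 * time.Microsecond)
}

// Increment does the slow work first, with no lock held, and then
// publishes the result with a single atomic add.
func (c *FastCounter) Increment() {
    simulateWork()
    c.value.Add(1)
}

// Value returns the current count.
func (c *FastCounter) Value() int64 {
    return c.value.Load()
}

// runWorkers starts the given number of goroutines, each calling Increment
// perWorker times, and returns the final count once all of them finish.
func runWorkers(workers, perWorker int) int64 {
    var c FastCounter
    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < perWorker; j++ {
                c.Increment()
            }
        }()
    }
    wg.Wait()
    return c.Value()
}

func main() {
    fmt.Println("final count:", runWorkers(50, 100))
}
```

Whether an atomic is enough depends on the real workload. If several fields must change together, keeping the mutex and moving only the slow work outside the lock is the more conservative option.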

Mutex Profiling: Contention Analysis

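Mutex profiling is narrower than block profiling: it covers only contention on mutexes. It also attributes the delay differently. The block profile charges the waiting time to the goroutine that was blocked in Lock, whereas the mutex profile charges it to the call site that held the lock and released it in Unlock:

runtime.SetMutexProfileFraction(1) // Report every contention event; n reports about 1/n of them, 0 disables

We can then fetch the /debug/pprof/mutex endpoint and analyze it the same way as the block profile.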
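The mutex profile can also be captured in-process, without the HTTP endpoint. The following is a minimal sketch under stated assumptions: `captureMutexProfile` is a hypothetical helper, and the contention is generated artificially so the profile has something to record.

```go
package main

import (
    "bytes"
    "fmt"
    "runtime"
    "runtime/pprof"
    "sync"
    "time"
)

// captureMutexProfile enables mutex profiling, generates some contention on
// a shared mutex, and returns the profile in pprof's binary format.
func captureMutexProfile() ([]byte, error) {
    // Rate 1 reports every contention event; restore the old rate afterwards.
    prev := runtime.SetMutexProfileFraction(1)
    defer runtime.SetMutexProfileFraction(prev)

    var mu sync.Mutex
    var wg sync.WaitGroup
    for i := 0; i < 8; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < 5; j++ {
                mu.Lock()
                time.Sleep(time.Millisecond) // hold the lock so others must wait
                mu.Unlock()
            }
        }()
    }
    wg.Wait()

    var buf bytes.Buffer
    // debug=0 writes the gzip-compressed protobuf that go tool pprof reads.
    if err := pprof.Lookup("mutex").WriteTo(&buf, 0); err != nil {
        return nil, err
    }
    return buf.Bytes(), nil
}

func main() {
    data, err := captureMutexProfile()
    if err != nil {
        fmt.Println("capture failed:", err)
        return
    }
    fmt.Printf("mutex profile: %d bytes\n", len(data))
}
```

Writing the returned bytes to a file gives something `go tool pprof` can open directly.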

Part 6: Advanced Profiling Techniques

Differential Profiling: Comparing Profiles

One of the most powerful techniques is comparing two profiles to see what changed:

# Capture a baseline (the URL is quoted so the shell does not interpret "?")
curl "http://localhost:6060/debug/pprof/profile?seconds=30" > cpu_baseline.prof

# Change the code or trigger a different load pattern

# Capture a new profile
curl "http://localhost:6060/debug/pprof/profile?seconds=30" > cpu_after.prof

# Compare the two
go tool pprof -http=:8080 -base cpu_baseline.prof cpu_after.prof

A differential profile shows the delta between the two captures: what grew and what shrank. A positive value means the function consumed more of the resource in the second profile, and a negative value means it consumed less. This is useful for validating an optimization or checking for a regression.

Continuous Profiling: Production Monitoring

For production systems, we can implement continuous profiling by capturing snapshots periodically:

// continuous_profiler.go
package main

import (
    "fmt"
    "os"
    "path/filepath"
    "runtime/pprof"
    "time"
)

type ContinuousProfiler struct {
    outputDir string
    interval  time.Duration
    stopCh    chan struct{}
}

func NewContinuousProfiler(outputDir string, interval time.Duration) *ContinuousProfiler {
    return &ContinuousProfiler{
        outputDir: outputDir,
        interval:  interval,
        stopCh:    make(chan struct{}),
    }
}

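func (cp *ContinuousProfiler) Start() {
    // Ensure the output directory exists before the first snapshot
    if err := os.MkdirAll(cp.outputDir, 0755); err != nil {
        fmt.Printf("Error creating profile directory: %v\n", err)
        return
    }

    go cp.run()
}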

func (cp *ContinuousProfiler) run() {
    ticker := time.NewTicker(cp.interval)
    defer ticker.Stop()
    
    for {
        select {
        case <-ticker.C:
            cp.captureSnapshot()
        case <-cp.stopCh:
            return
        }
    }
}

func (cp *ContinuousProfiler) captureSnapshot() {
    timestamp := time.Now().Format("2006-01-02_15-04-05")
    
    // Capture heap profile
    heapFile := filepath.Join(cp.outputDir, fmt.Sprintf("heap_%s.prof", timestamp))
    f, err := os.Create(heapFile)
    if err != nil {
        fmt.Printf("Error creating heap profile: %v\n", err)
        return
    }
    defer f.Close()
    
    if err := pprof.WriteHeapProfile(f); err != nil {
        fmt.Printf("Error writing heap profile: %v\n", err)
    }
    
    // Capture goroutine profile
    goroutineFile := filepath.Join(cp.outputDir, fmt.Sprintf("goroutine_%s.prof", timestamp))
    g, err := os.Create(goroutineFile)
    if err != nil {
        fmt.Printf("Error creating goroutine profile: %v\n", err)
        return
    }
    defer g.Close()
    
    profile := pprof.Lookup("goroutine")
    if err := profile.WriteTo(g, 0); err != nil {
        fmt.Printf("Error writing goroutine profile: %v\n", err)
    }
    
    fmt.Printf("Snapshot captured at %s\n", timestamp)
}

func (cp *ContinuousProfiler) Stop() {
    close(cp.stopCh)
}

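// Usage in a production application
func main() {
    profiler := NewContinuousProfiler("./profiles", 5*time.Minute)
    profiler.Start()
    // Stop the profiler when main returns
    defer profiler.Stop()

    // Your application code here. It must block (for example by serving
    // requests), otherwise main returns before the first snapshot is taken.
}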
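Note that `Stop` above closes `stopCh` directly, so a second call would panic, and it does not wait for the loop to finish. The sketch below shows one way to make shutdown idempotent with `sync.Once` and a done channel. `periodicTask` and `runForMillis` are hypothetical names, and the callback only counts ticks instead of writing profiles.

```go
package main

import (
    "fmt"
    "sync"
    "sync/atomic"
    "time"
)

// periodicTask runs fn on every tick until Stop is called.
type periodicTask struct {
    interval time.Duration
    fn       func()
    stopCh   chan struct{}
    doneCh   chan struct{}
    stopOnce sync.Once
}

func newPeriodicTask(interval time.Duration, fn func()) *periodicTask {
    return &periodicTask{
        interval: interval,
        fn:       fn,
        stopCh:   make(chan struct{}),
        doneCh:   make(chan struct{}),
    }
}

// Start launches the background loop. Call it before Stop.
func (p *periodicTask) Start() {
    go func() {
        defer close(p.doneCh)
        ticker := time.NewTicker(p.interval)
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C:
                p.fn()
            case <-p.stopCh:
                return
            }
        }
    }()
}

// Stop is safe to call more than once and returns only after the loop exits.
func (p *periodicTask) Stop() {
    p.stopOnce.Do(func() { close(p.stopCh) })
    <-p.doneCh
}

// runForMillis runs a tick-counting task for ms milliseconds, stops it
// twice, and returns the number of ticks observed.
func runForMillis(ms int) int64 {
    var ticks atomic.Int64
    task := newPeriodicTask(time.Millisecond, func() { ticks.Add(1) })
    task.Start()
    time.Sleep(time.Duration(ms) * time.Millisecond)
    task.Stop()
    task.Stop() // the second call is a no-op
    return ticks.Load()
}

func main() {
    fmt.Println("ticks:", runForMillis(50))
}
```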

Custom Profiling: Manual Instrumentation

Sometimes we need profiling that is specific to our business logic. Go lets us create custom profiles. A custom profile records the call stack of each value passed to Add until that value is passed to Remove:

// custom_profiling.go
package main

import (
    "fmt"
    "os"
    "runtime/pprof"
    "time"
)

// Custom profile that records the call stacks of slow requests
var requestLatencyProfile = pprof.NewProfile("request_latency")

type Request struct {
    ID        int
    StartTime time.Time
}

func (r *Request) Complete() {
    duration := time.Since(r.StartTime)
    
    // Record into the custom profile.
    // Note: pprof.Profile stores stack traces, not durations, so this only
    // counts slow requests by call stack. For latency distributions, a
    // metrics library such as Prometheus is a better fit. This example only
    // demonstrates the custom profile API.

    if duration > 100*time.Millisecond {
        // Record slow requests; skip=1 starts the stack at the caller of Complete
        requestLatencyProfile.Add(r, 1)
    }
}

func processRequest(id int) {
    req := &Request{
        ID:        id,
        StartTime: time.Now(),
    }
    defer req.Complete()
    
    // Simulate processing with variable latency
    time.Sleep(time.Duration(id%200) * time.Millisecond)
}

func main() {
    // Process a batch of requests
    for i := 0; i < 1000; i++ {
        processRequest(i)
    }
    
    // Write custom profile
    f, err := os.Create("custom_latency.prof")
    if err != nil {
        panic(err)
    }
    defer f.Close()
    
    if err := requestLatencyProfile.WriteTo(f, 0); err != nil {
        panic(err)
    }
    
    fmt.Println("Custom profile written")
}
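Custom profiles are designed for tracking resources with an explicit lifetime, where every `Add` is eventually paired with a `Remove`. A minimal sketch of that pairing follows. The profile name `example.open_conns` and the `Conn`, `Open`, and `liveAfter` names are assumptions made for illustration.

```go
package main

import (
    "fmt"
    "runtime/pprof"
)

// openConns is a hypothetical custom profile that tracks connections which
// have been opened but not yet closed.
var openConns = pprof.NewProfile("example.open_conns")

// Conn is a stand-in for any resource with an explicit lifetime.
type Conn struct {
    ID int
}

// Open creates a Conn and records the stack that opened it.
func Open(id int) *Conn {
    c := &Conn{ID: id}
    // skip=1 starts the recorded stack at the caller of Open.
    openConns.Add(c, 1)
    return c
}

// Close removes the Conn from the profile, so only live ones remain.
func (c *Conn) Close() {
    openConns.Remove(c)
}

// liveAfter opens n connections, closes the first `closed` of them, and
// returns how many entries the profile still holds.
func liveAfter(n, closed int) int {
    conns := make([]*Conn, 0, n)
    for i := 0; i < n; i++ {
        conns = append(conns, Open(i))
    }
    for _, c := range conns[:closed] {
        c.Close()
    }
    live := openConns.Count()
    // Clean up so repeated calls start from an empty profile.
    for _, c := range conns[closed:] {
        c.Close()
    }
    return live
}

func main() {
    fmt.Println("connections still open:", liveAfter(3, 1))
}
```

A profile used this way answers "which call sites opened the resources that are still alive", which is the question a leak investigation usually starts with.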

Part 7: Optimization Workflows

Workflow 1: Performance Investigation

When you face a performance problem, follow this systematic workflow:

Step 1: Establish a Baseline. Before optimizing, measure current performance. Write a representative benchmark:

func BenchmarkCurrentImplementation(b *testing.B) {
    // Setup
    data := generateTestData(10000)
    
    b.ResetTimer()
    b.ReportAllocs() // Report memory allocations
    
    for i := 0; i < b.N; i++ {
        result := currentImplementation(data)
        _ = result // Keeps the result referenced; a package-level sink variable is a stronger guard against dead-code elimination
    }
}

Run it with profiling enabled:

go test -bench=. -benchmem -cpuprofile=cpu_before.prof -memprofile=mem_before.prof

Step 2: Identify Hotspots. Analyze the profiles to find the bottleneck:

go tool pprof -http=:8080 cpu_before.prof

Look for functions with high flat time. These are the best candidates for optimization.

Step 3: Form a Hypothesis. Based on the profiling data, form a hypothesis about the cause of the slowness. For example: "Function X is slow because it performs many string concatenations", "The allocation rate is high because values escape to the heap", or "There is lock contention on shared resource Y".

Step 4: Implement the Fix. Implement the optimization, for example:

// BEFORE: repeated string concatenation
func slowStringBuilder(items []string) string {
    result := ""
    for i, item := range items {
        if i > 0 {
            result += ", " // Each += allocates a new string
        }
        result += item
    }
    return result
}

// AFTER: strings.Builder appends into one growing buffer.
// Both versions now produce the same output (no trailing separator).
func fastStringBuilder(items []string) string {
    var builder strings.Builder
    for i, item := range items {
        if i > 0 {
            builder.WriteString(", ")
        }
        builder.WriteString(item)
    }
    return builder.String()
}

Step 5: Measure the Improvement. Run the benchmark again with profiling:

go test -bench=. -benchmem -cpuprofile=cpu_after.prof -memprofile=mem_after.prof

Compare results:

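# Use benchstat for a statistical comparison.
# before.txt and after.txt hold the saved benchmark output of each run,
# for example: go test -bench=. -benchmem -count=10 > before.txt
go install golang.org/x/perf/cmd/benchstat@latest
benchstat before.txt after.txt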

# Compare profiles
go tool pprof -http=:8080 -base cpu_before.prof cpu_after.prof

Step 6: Validate in Production. Do not rely on microbenchmarks alone. Deploy to staging or a canary and monitor with real traffic.

Workflow 2: Memory Leak Investigation

Step 1: Confirm the Leak. Monitor memory usage over time. If memory keeps climbing without reaching a plateau, you likely have a leak.

# Capture heap at different times
curl http://localhost:6060/debug/pprof/heap > heap_0min.prof
sleep 600  # Wait 10 minutes
curl http://localhost:6060/debug/pprof/heap > heap_10min.prof
sleep 600
curl http://localhost:6060/debug/pprof/heap > heap_20min.prof

Step 2: Compare Snapshots. Use differential profiling:

go tool pprof -http=:8080 -base heap_0min.prof heap_20min.prof

Look for allocations that keep growing in the delta view.

Step 3: Analyze Goroutines. Memory leaks are often caused by goroutine leaks:

curl "http://localhost:6060/debug/pprof/goroutine?debug=2" > goroutines.txt

The file goroutines.txt contains the full stack trace of every goroutine. Look for many goroutines stuck at the same location.

Step 4: Use Runtime Metrics. The Go runtime exposes many metrics that can help:

var m runtime.MemStats
runtime.ReadMemStats(&m)

fmt.Printf("Alloc = %v MiB", bToMb(m.Alloc))
fmt.Printf("\tTotalAlloc = %v MiB", bToMb(m.TotalAlloc))
fmt.Printf("\tSys = %v MiB", bToMb(m.Sys))
fmt.Printf("\tNumGC = %v\n", m.NumGC)

Monitor Alloc (bytes of allocated heap objects) and NumGC (the number of completed GC cycles). If Alloc keeps rising even though NumGC is also rising, the collector is running but cannot reclaim the memory, which strongly suggests a leak.
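The snippet above calls `bToMb` without defining it. Below is a self-contained sketch that defines it and also samples the heap right after a forced collection, which makes retained memory easier to see. The `liveHeapBytes`, `leak`, and `retained` names are assumptions for illustration, and the leak is simulated with a package-level slice.

```go
package main

import (
    "fmt"
    "runtime"
)

// bToMb converts bytes to mebibytes.
func bToMb(b uint64) uint64 {
    return b / 1024 / 1024
}

// liveHeapBytes forces a garbage collection and then reports HeapAlloc,
// which at that point approximates the memory still reachable.
func liveHeapBytes() uint64 {
    runtime.GC()
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    return m.HeapAlloc
}

// retained simulates a leak: a package-level slice that is never trimmed.
var retained [][]byte

// leak appends n buffers of 1 MiB each to retained.
func leak(n int) {
    for i := 0; i < n; i++ {
        retained = append(retained, make([]byte, 1<<20))
    }
}

func main() {
    before := liveHeapBytes()
    leak(16)
    after := liveHeapBytes()
    fmt.Printf("live heap before: %d MiB, after: %d MiB\n", bToMb(before), bToMb(after))
}
```

Forcing a GC is fine for a diagnostic like this, but it is not something to call on a hot path.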

Part 8: Real-World Case Studies

Case Study 1: API Server with High Latency

Problem: The API server shows a P99 latency of 500ms, which is unacceptable for users.

Investigation:

# Capture a CPU profile while handling production traffic
curl "http://api-server:6060/debug/pprof/profile?seconds=60" > api_cpu.prof

go tool pprof -http=:8080 api_cpu.prof

Findings: The flame graph shows 40% of the time spent in json.Marshal. The source view reveals that the JSON response is marshaled on every request, even when the responses are identical.

Solution: Cache the responses for frequently requested data:

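type CachedJSONHandler struct {
    cache map[string][]byte
    mu    sync.RWMutex
}

func (h *CachedJSONHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    // The key ignores the query string; include it if responses depend on it
    cacheKey := r.URL.Path

    h.mu.RLock()
    cached, ok := h.cache[cacheKey]
    h.mu.RUnlock()
    if ok {
        w.Header().Set("Content-Type", "application/json")
        w.Write(cached)
        return
    }

    // Cache miss: generate and marshal the response
    data := generateResponse(r)
    jsonData, err := json.Marshal(data)
    if err != nil {
        http.Error(w, "failed to encode response", http.StatusInternalServerError)
        return
    }

    h.mu.Lock()
    if h.cache == nil {
        h.cache = make(map[string][]byte) // Writing to a nil map would panic
    }
    h.cache[cacheKey] = jsonData
    h.mu.Unlock()

    w.Header().Set("Content-Type", "application/json")
    w.Write(jsonData)
}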

Result: P99 latency dropped to 50ms, a 90% reduction.

Case Study 2: Batch Processing with a Memory Spike

Problem: The batch processor runs out of memory (OOM) when processing large files.

Investigation:

# Memory profile during processing
go test -bench=BenchmarkBatchProcess -memprofile=mem.prof

# Open the alloc_space view (total bytes allocated)
go tool pprof -sample_index=alloc_space mem.prof

Findings: The alloc_space view shows 10GB allocated for buffers, most of it in the processFile function:

func processFile(filename string) error {
    // BUG: Read entire file into memory
    data, err := os.ReadFile(filename) // Replaces ioutil.ReadFile, deprecated since Go 1.16
    if err != nil {
        return err
    }
    
    lines := bytes.Split(data, []byte("\n"))
    for _, line := range lines {
        processLine(line)
    }
    return nil
}

Solution: Stream processing instead of loading everything:

func processFileStreaming(filename string) error {
    file, err := os.Open(filename)
    if err != nil {
        return err
    }
    defer file.Close()
    
    scanner := bufio.NewScanner(file)
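    // Raise the maximum line length for very long lines (the default is 64 KiB)
    buf := make([]byte, 0, 64*1024)
    scanner.Buffer(buf, 1024*1024)

    for scanner.Scan() {
        // scanner.Bytes() is overwritten by the next Scan call, so
        // processLine must copy the slice if it keeps the data
        processLine(scanner.Bytes())
    }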
    
    return scanner.Err()
}

Result: Memory usage dropped from 10GB to ~100MB. Processing was also faster, likely thanks to better cache locality.

Case Study 3: Goroutine Leak in a Microservice

Problem: The Kubernetes pod's memory usage keeps climbing until it is OOMKilled after a few hours.
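One detail of the streaming version deserves a closer look: a line longer than the scanner's limit stops the scan with `bufio.ErrTooLong`. The sketch below demonstrates that behavior on in-memory input. `countLines`, `countLinesInString`, `longLine`, and `isTooLong` are hypothetical helpers, and the limits are chosen small only to keep the demo cheap.

```go
package main

import (
    "bufio"
    "errors"
    "fmt"
    "io"
    "strings"
)

// countLines streams r line by line and returns the number of lines read.
// maxLine bounds the size of a single line in bytes.
func countLines(r io.Reader, maxLine int) (int, error) {
    // The initial buffer must not be larger than maxLine, because the
    // scanner's limit is the larger of the two values.
    initial := 4 * 1024
    if maxLine < initial {
        initial = maxLine
    }
    scanner := bufio.NewScanner(r)
    scanner.Buffer(make([]byte, 0, initial), maxLine)

    n := 0
    for scanner.Scan() {
        // scanner.Bytes() is only valid until the next Scan call, so copy
        // the slice if a line has to outlive this iteration.
        _ = scanner.Bytes()
        n++
    }
    return n, scanner.Err()
}

// countLinesInString is a convenience wrapper for in-memory input.
func countLinesInString(s string, maxLine int) (int, error) {
    return countLines(strings.NewReader(s), maxLine)
}

// longLine returns a single line of n characters followed by a newline.
func longLine(n int) string {
    return strings.Repeat("x", n) + "\n"
}

// isTooLong reports whether err means a line exceeded the scanner's limit.
func isTooLong(err error) bool {
    return errors.Is(err, bufio.ErrTooLong)
}

func main() {
    n, err := countLinesInString("alpha\nbeta\ngamma\n", 1024)
    fmt.Println("lines:", n, "error:", err)

    _, err = countLinesInString(longLine(4096), 1024)
    fmt.Println("too long:", isTooLong(err))
}
```

If input lines have no practical upper bound, reading with `bufio.Reader` instead of `bufio.Scanner` avoids the limit altogether.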

Investigation:

# Monitor goroutine count over time
while true; do
    curl -s "http://service:6060/debug/pprof/goroutine?debug=1" | grep "goroutine profile"
    sleep 60
done

# Output:
# goroutine profile: total 150
# goroutine profile: total 380
# goroutine profile: total 620
# ... keeps growing

Capture a goroutine profile and analyze it:

curl http://service:6060/debug/pprof/goroutine > goroutine.prof
go tool pprof goroutine.prof

Findings: Most goroutines are stuck in grpcClient.Subscribe:

(pprof) top
      flat  flat%   sum%        cum   cum%
       450 72.58% 72.58%        450 72.58%  grpcClient.Subscribe
       120 19.35% 91.94%        120 19.35%  runtime.gopark

Source analysis reveals:

func (s *Service) handleEvent(event Event) {
    // BUG: starts a goroutine that is never cleaned up
    go func() {
        stream, err := s.grpcClient.Subscribe(context.Background(), &SubscribeRequest{
            EventID: event.ID,
        })
        if err != nil {
            log.Error(err)
            return
        }
        
        for {
            msg, err := stream.Recv()
            if err != nil {
                return // Context never cancelled, stream never closed
            }
            processMessage(msg)
        }
    }()
}

Solution: Manage the goroutine lifecycle properly with a context:

type Service struct {
    grpcClient  GRPCClient
    ctx         context.Context
    cancel      context.CancelFunc
    wg          sync.WaitGroup
    subscribers map[string]context.CancelFunc
    mu          sync.Mutex
}

func (s *Service) handleEvent(event Event) {
    ctx, cancel := context.WithCancel(s.ctx)

    s.mu.Lock()
    s.subscribers[event.ID] = cancel
    s.mu.Unlock()

    s.wg.Add(1)
    go func() {
        defer s.wg.Done()
        defer func() {
            cancel() // Release the context's resources on every exit path
            s.mu.Lock()
            delete(s.subscribers, event.ID)
            s.mu.Unlock()
        }()

        stream, err := s.grpcClient.Subscribe(ctx, &SubscribeRequest{
            EventID: event.ID,
        })
        if err != nil {
            log.Error(err)
            return
        }

        for {
            // Recv blocks until a message arrives. For a stream created
            // with ctx, cancelling ctx makes Recv return an error, so no
            // separate select on ctx.Done() is needed.
            msg, err := stream.Recv()
            if err != nil {
                return
            }
            processMessage(msg)
        }
    }()
}

func (s *Service) Shutdown() {
    s.cancel() // Cancel all subscribers
    s.wg.Wait() // Wait for cleanup
}
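The solution stores one cancel function per event but never shows it being called for a single subscription. The sketch below fills that gap under stated assumptions: `Registry`, `Subscribe`, `Unsubscribe`, and `Active` are hypothetical names, and a plain wait on `ctx.Done()` stands in for the `stream.Recv` loop.

```go
package main

import (
    "context"
    "fmt"
    "sync"
)

// subscription holds what is needed to end one goroutine and wait for it.
type subscription struct {
    cancel context.CancelFunc
    done   chan struct{} // closed when the goroutine has exited
}

// Registry is a simplified, hypothetical stand-in for Service.
type Registry struct {
    ctx    context.Context
    cancel context.CancelFunc
    wg     sync.WaitGroup

    mu   sync.Mutex
    subs map[string]*subscription
}

func NewRegistry() *Registry {
    ctx, cancel := context.WithCancel(context.Background())
    return &Registry{ctx: ctx, cancel: cancel, subs: make(map[string]*subscription)}
}

// Subscribe starts a goroutine for id unless one is already running.
func (r *Registry) Subscribe(id string) bool {
    r.mu.Lock()
    defer r.mu.Unlock()
    if _, exists := r.subs[id]; exists {
        return false
    }
    ctx, cancel := context.WithCancel(r.ctx)
    sub := &subscription{cancel: cancel, done: make(chan struct{})}
    r.subs[id] = sub

    r.wg.Add(1)
    go func() {
        defer r.wg.Done()
        defer close(sub.done)
        defer cancel()
        <-ctx.Done() // placeholder for the stream.Recv loop
        r.mu.Lock()
        delete(r.subs, id)
        r.mu.Unlock()
    }()
    return true
}

// Unsubscribe cancels one subscription and waits for its goroutine to exit.
func (r *Registry) Unsubscribe(id string) bool {
    r.mu.Lock()
    sub, ok := r.subs[id]
    r.mu.Unlock()
    if !ok {
        return false
    }
    sub.cancel()
    <-sub.done
    return true
}

// Active returns the number of running subscriptions.
func (r *Registry) Active() int {
    r.mu.Lock()
    defer r.mu.Unlock()
    return len(r.subs)
}

// Shutdown cancels every subscription and waits for all goroutines.
func (r *Registry) Shutdown() {
    r.cancel()
    r.wg.Wait()
}

func main() {
    r := NewRegistry()
    r.Subscribe("a")
    r.Subscribe("b")
    r.Unsubscribe("a")
    fmt.Println("active:", r.Active())
    r.Shutdown()
}
```

Rejecting a duplicate ID in `Subscribe` matters: overwriting the stored cancel function would leave the earlier goroutine with no way to be stopped before shutdown.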

Result: Goroutine count stable at ~50, no more memory leaks.

Part 9: Best Practices and Pitfalls

Best Practices

1. Profile Before Optimizing. "Premature optimization is the root of all evil" (Donald Knuth). Always profile first to make sure you are optimizing the real bottleneck.

2. Use Representative Workloads. Profile with data and load patterns that reflect production. Synthetic microbenchmarks are often misleading.

3. Profile Production (Carefully). pprof has overhead, but it is usually acceptable for production profiling. CPU profiling overhead is typically below 5%, and memory profiling overhead is minimal because it samples. Avoid profiling at very high rates or for very long durations in production.

4. Automate Profiling in CI/CD. Integrate benchmarking and profiling into the CI pipeline to detect performance regressions:

# .github/workflows/benchmark.yml
name: Benchmark
on: [pull_request]
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
      - name: Run benchmarks
        run: |
          go test -bench=. -benchmem -cpuprofile=cpu.prof -memprofile=mem.prof
          go tool pprof -top cpu.prof

5. Document Performance Characteristics. Maintain documentation of expected performance metrics and known bottlenecks.

Common Pitfalls

Pitfall 1: Ignoring Allocation Count. Developers often look only at total bytes and ignore the number of allocations. Many small allocations can be worse for the GC than a few large ones:

// BAD: Many allocations
func badConcat(items []string) string {
    result := ""
    for _, item := range items {
        result += item // New allocation each iteration
    }
    return result
}

// GOOD: Single allocation with pre-sizing
func goodConcat(items []string) string {
    totalLen := 0
    for _, item := range items {
        totalLen += len(item)
    }
    
    var builder strings.Builder
    builder.Grow(totalLen) // Pre-allocate
    
    for _, item := range items {
        builder.WriteString(item)
    }
    return builder.String()
}
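The difference in allocation count can be measured directly with `testing.AllocsPerRun` from the standard library. The sketch below is one way to do that. The function names `concatNaive`, `concatPresized`, `allocsPerCall`, and `sampleItems` are assumptions, and the exact counts depend on the Go version and input size, so none are quoted here.

```go
package main

import (
    "fmt"
    "strings"
    "testing"
)

// concatNaive builds the result with repeated string concatenation.
func concatNaive(items []string) string {
    result := ""
    for _, item := range items {
        result += item
    }
    return result
}

// concatPresized computes the final length first and grows the builder once.
func concatPresized(items []string) string {
    total := 0
    for _, item := range items {
        total += len(item)
    }
    var b strings.Builder
    b.Grow(total)
    for _, item := range items {
        b.WriteString(item)
    }
    return b.String()
}

// sink keeps results reachable so the compiler cannot discard the calls.
var sink string

// allocsPerCall returns the average number of heap allocations per call.
func allocsPerCall(f func([]string) string, items []string) float64 {
    return testing.AllocsPerRun(100, func() { sink = f(items) })
}

// sampleItems returns n short strings to concatenate.
func sampleItems(n int) []string {
    items := make([]string, n)
    for i := range items {
        items[i] = fmt.Sprintf("item-%d", i)
    }
    return items
}

func main() {
    items := sampleItems(100)
    fmt.Printf("naive:    %.0f allocs per call\n", allocsPerCall(concatNaive, items))
    fmt.Printf("presized: %.0f allocs per call\n", allocsPerCall(concatPresized, items))
}
```

In a benchmark, `b.ReportAllocs()` or the `-benchmem` flag reports the same figure as allocs/op.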

Pitfall 2: Profiling Debug Builds. Always profile builds compiled with optimizations enabled, which is the Go default. Builds with optimizations disabled (for example with -gcflags="all=-N -l", as debuggers typically use) carry overhead that is not representative.

Pitfall 3: Misleading Cumulative Time. High cumulative time with low flat time means the function itself is not slow but calls slow functions. Optimize the callees, not the caller.

Pitfall 4: Sampling Bias. Remember that pprof samples. Very short-lived functions may be underrepresented, and operations that are expensive but very infrequent may be missed entirely.

Pitfall 5: Ignoring Context. Always interpret profiling data in the context of your application's requirements. An average latency of 100ms may be acceptable for a background job but unacceptable for an API endpoint.

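Part 10: Advanced Topics

Profiling with the Delve Debugger

Delve is a powerful Go debugger that complements profiling. It does not collect pprof profiles itself. Use pprof to find the hotspot or the stuck goroutines, then use Delve to inspect program state at that location. Note that dlv debug compiles with optimizations disabled, so timings measured under it are not representative:

# Start the program under Delve in headless mode
dlv debug --headless --listen=:2345 --api-version=2

# In another terminal, connect and inspect the location pprof pointed to,
# here the hot Increment method from the block profile example
dlv connect :2345
(dlv) break main.(*Counter).Increment
(dlv) continue
(dlv) goroutines
(dlv) bt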

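Integration with Observability Platforms

Production-grade profiling is often integrated with observability platforms such as Grafana, Datadog, or custom solutions. Below is an example of integration with Prometheus. Note that client_golang's default registry already exports similar runtime metrics (such as go_goroutines), so the custom gauges here are for illustration: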

import (
    "runtime"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    goroutineCount = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "go_goroutines_current",
        Help: "Current number of goroutines",
    })
    
    memoryAlloc = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "go_memory_alloc_bytes",
        Help: "Current memory allocation in bytes",
    })
)

func init() {
    prometheus.MustRegister(goroutineCount)
    prometheus.MustRegister(memoryAlloc)
}

func collectMetrics() {
    ticker := time.NewTicker(10 * time.Second)
    defer ticker.Stop()

    for range ticker.C {
        goroutineCount.Set(float64(runtime.NumGoroutine()))

        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        memoryAlloc.Set(float64(m.Alloc))
    }
}

Profiling Distributed Systems

For microservices and distributed systems, profiling individual services is not enough. Combine distributed tracing (OpenTelemetry, Jaeger) with profiling:

import (
    "context"

    "go.opentelemetry.io/otel"
)

func handleRequest(ctx context.Context, req Request) error {
    tracer := otel.Tracer("myservice")
    ctx, span := tracer.Start(ctx, "handleRequest")
    defer span.End()
    
    // CPU-intensive work
    _, cpuSpan := tracer.Start(ctx, "cpu-work") // The child context is not needed here
    result := doCPUWork()
    cpuSpan.End()
    
    // Network call
    netCtx, netSpan := tracer.Start(ctx, "external-api")
    data := callExternalAPI(netCtx)
    netSpan.End()
    
    return processResult(result, data)
}
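Tracing shows which span is slow, but not which code burned the CPU inside it. One standard-library way to connect the two is pprof labels: samples taken while a labeled function runs carry the label as a tag in the CPU profile. This is a minimal sketch, and `handleWithLabels`, `endpointLabel`, `observedLabel`, and `hasLabelOutside` are hypothetical helper names. The label key `endpoint` is likewise an assumption.

```go
package main

import (
    "context"
    "fmt"
    "runtime/pprof"
)

// handleWithLabels runs work with an "endpoint" pprof label attached, so
// CPU samples taken during work can be grouped by endpoint.
func handleWithLabels(ctx context.Context, endpoint string, work func(context.Context)) {
    pprof.Do(ctx, pprof.Labels("endpoint", endpoint), work)
}

// endpointLabel returns the "endpoint" label stored in ctx, if any.
func endpointLabel(ctx context.Context) (string, bool) {
    return pprof.Label(ctx, "endpoint")
}

// observedLabel reports the label value that is visible inside the
// labeled function.
func observedLabel(endpoint string) string {
    var seen string
    handleWithLabels(context.Background(), endpoint, func(ctx context.Context) {
        seen, _ = endpointLabel(ctx)
    })
    return seen
}

// hasLabelOutside reports whether a plain context carries the label.
func hasLabelOutside() bool {
    _, ok := endpointLabel(context.Background())
    return ok
}

func main() {
    fmt.Println("label inside Do:", observedLabel("/users"))
    fmt.Println("label outside Do:", hasLabelOutside())
}
```

The context passed to the callback should be handed to any goroutines started for the request, because labels propagate through it.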

Conclusion

Profiling is an essential skill for Go developers who are serious about performance. pprof gives deep visibility into the runtime behavior of your program, letting you identify and fix performance issues with confidence.

Key takeaways from this handbook: always profile before optimizing; use the right profile type for the problem (CPU, memory, goroutine, block, mutex); interpret data in the proper context; validate improvements with measurements; and automate profiling in the development workflow.

Profiling is an iterative process. Each optimization exposes the next bottleneck. With practice and experience, you will develop an intuition for where to look and how to interpret profiling data quickly.

Happy profiling, and may your applications always perform at their best!
