Hash Map di Go

Implementasi hash map di Golang (atau yang lebih dikenal sebagai map di bahasa Go) adalah salah satu struktur data paling penting dalam runtime Go. Map pada dasarnya adalah hash table yang menyediakan operasi O(1) rata-rata untuk insert, lookup, dan delete. Saya akan jelaskan secara lengkap berdasarkan implementasi terkini (pada Go 1.24 dan seterusnya, yang dirilis sekitar Februari 2025), termasuk tekniknya, cara handling, algoritma hashing, serta perbedaan dengan versi sebelumnya. Kemudian, saya akan buat versi sederhana kita sendiri dalam kode Go.

Saya asumsikan kita bicara tentang implementasi internal runtime Go, yang bisa berubah antar versi, tapi berdasarkan dokumentasi resmi dan source code terbaru. Implementasi ini dirancang untuk efisien, aman terhadap serangan hash flooding, dan mendukung berbagai tipe kunci (selama comparable, seperti string, int, dll.).

1. Teknik Utama Implementasi Hash Map di Go

Go menggunakan hash table sebagai dasar map. Sebelum Go 1.24, ini adalah hash table tradisional dengan separate chaining (buckets dengan overflow linked list). Mulai Go 1.24, Go beralih ke Swiss Tables (desain dari Google Abseil library), yang merupakan varian open addressing dengan optimasi modern seperti SIMD (Single Instruction Multiple Data) untuk performa lebih baik.

Mengapa Swiss Tables? Desain ini lebih cepat (hingga 60% di benchmark), hemat memori (load factor lebih tinggi ~87%), dan mendukung growth incremental tanpa pause besar. Ini mengatasi masalah latency di aplikasi real-time Go.

Struktur Data Utama

hmap struct (di runtime/map.go): Ini adalah pointer ke struktur internal map. Isinya termasuk:
- count: Jumlah elemen saat ini.
- flags: Status seperti sedang grow atau iterator active.
- B: Log base 2 dari jumlah buckets (pre-1.24) atau terkait table count.
- hash0: Seed random untuk hashing (dibuat per process untuk keamanan).
- buckets: Pointer ke array buckets (pre-1.24) atau ke array tables (post-1.24).
- Lainnya: oldbuckets untuk growth incremental, nevacuate untuk track progress resize.

Di Swiss Tables (Go 1.24+):

Map dibagi menjadi multiple independent tables (masing-masing hingga 1024 entries), bukan satu hash table besar. Jumlah table adalah power-of-two (misal 1, 2, 4, ...).
Setiap table punya groups (kelompok) dari 8 slot (key-value pair).
Setiap group punya control word 64-bit (metadata): Setiap byte (8 bit) untuk satu slot, encode status:
- Empty: 0x80 (10000000).
- Deleted: 0xFE (11111110).
- Occupied: 0x00 + lower 7 bits dari hash (h2).
Slot: Tempat simpan key dan value. Untuk map kecil (≤8 elemen), pakai single group tanpa table.

Pre-1.24:

Buckets array (2^B buckets), setiap bucket punya 8 slot + tophash array (top 8 bits hash untuk fast check).
Overflow buckets jika full (linked list).

2. Algoritma Hashing

Hashing adalah kunci efisiensi map. Go menghitung hash 64-bit untuk setiap key menggunakan fungsi runtime internal (di runtime/alg.go dan hash*.go).

Cara Kerja Hashing:
- Hash function bergantung pada tipe key (didefinisikan di type.alg untuk setiap tipe).
- Untuk string/bytes: Pakai memhash (memory-based hash) atau aeshash (AES hardware jika CPU support, fallback ke software). AES-based untuk cepat dan aman.
- Untuk int/float: Hash sederhana berdasarkan bit value.
- Untuk struct/complex: Hash rekursif atas field-fieldnya.
- Seed: Setiap process punya hash0 random (dibuat saat startup), dicampur ke hash untuk cegah serangan DoS (hash flooding, sejak Go 1.4).
- Output: 64-bit hash. Di Swiss Tables, dibagi jadi:
  - h1: Upper 57 bits → Tentukan group awal (index = h1 % jumlah groups).
  - h2: Lower 7 bits → Simpan di control word untuk fast matching.
- Upper bits hash juga tentukan table mana (extendible hashing).
Keamanan: Random seed cegah attacker buat collision masal. Hash function dirancang kriptografis mirip SipHash atau AES untuk distribusi uniform.

Contoh: Untuk key string "hello", runtime hitung hash = aeshash("hello", hash0), lalu split h1/h2.

3. Cara Handling (Operasi Lookup, Insert, Delete, dan Collision)

Collision Handling

Pre-1.24: Separate chaining. Jika bucket full (8 entries), tambah overflow bucket (linked). Collision: Cari di bucket + overflow chain.
Go 1.24+ (Swiss Tables): Open addressing dengan quadratic probing.
- Bukan linear probing (yang bisa cluster), tapi quadratic: Jika group full, probe ke group selanjutnya dengan langkah kuadratik (1, 4, 9, ...).
- Pakai control word untuk cek paralel: Bandingkan h2 dengan 8 bytes control sekaligus (SIMD jika available, atau arithmetic portable).
- Keuntungan: Probe sequence pendek, load factor tinggi (87% vs 6.5 pre-1.24).

Lookup (m[key])

Hitung hash(key) → h1 tentukan table dan group awal.
Cek control word: Cari byte yang match h2 (parallel check).
Jika match, bandingkan full key di slot itu.
Jika empty slot ditemui, stop (key tidak ada). Deleted slot lanjut probe.
Probe ke group berikut jika tidak ketemu.

Insert (m[key] = value)

Sama seperti lookup.
Jika key ada, update value.
Jika tidak, pakai slot empty pertama. Jika deleted ada, reuse deleted dulu.
Jika group full, probe quadratic ke group lain.
Jika table full (~87%), trigger growth.

Delete (delete(m, key))

Sama seperti lookup.
Jika ketemu, mark slot sebagai deleted (jika group full, untuk jaga probe chain). Jika ada empty, mark empty.
Tidak resize down (map tidak shrink).

Resizing/Growth

Pre-1.24: Trigger jika load factor >6.5 atau overflow terlalu banyak. Double buckets (B+1), lalu evacuate incremental (pindah entries lama ke baru selama operasi normal, pakai oldbuckets).
Go 1.24+: Incremental dan lokal per table.
- Setiap table grow independen.
- Saat table capai ~87%, double size hingga 1024 slots. Setelah itu, split jadi dua table baru, reinsert entries.
- Map punya array tables (power-of-two), ganti array jika perlu.
- Biaya bounded (max 1024 entries per grow), hindari copy besar.
- Selama growth, iterator anchor ke table lama, validasi di table baru.
Handling concurrency: Map tidak thread-safe by default (panic jika concurrent write). Pakai sync.Map untuk concurrent.
Lainnya: Map support iteration (for range), tapi order random karena hash random. Modifikasi selama iterasi OK di Swiss Tables (via versioning tables).

4. Performa dan Pertimbangan

Performa: O(1) rata-rata, worst-case O(n) jika collision buruk (tapi dicegah oleh random seed).
Memori: Hemat di Swiss Tables karena load factor tinggi, kurang overflow.
Limit: Key harus comparable (no slice, func, map sebagai key). Value bisa apa saja.

5. Buat Versi Kita Sendiri (Implementasi Sederhana Hash Map di Go)

Untuk demonstrasi, saya buat versi sederhana hash map dengan teknik traditional (mirip pre-1.24 Go): hash table dengan buckets array, separate chaining via slice (bukan linked list untuk simple). Kita pakai hash function bawaan (hash/maphash untuk mirip Go), load factor 0.75, dan resize double.

Ini bukan produksi-ready (tidak handle concurrency, tipe generik sederhana), tapi ilustrasi.

package main

import (
	"fmt"
	"hash/maphash"
)

// SimpleHashMap: Versi sederhana hash map untuk key string, value int.
type SimpleHashMap struct {
	buckets [][]entry // Array of buckets, each is slice of entries (separate chaining)
	size    int       // Jumlah elemen
	cap     int       // Jumlah buckets
	seed    maphash.Seed
	loadFactor float64
}

type entry struct {
	key   string
	value int
}

// NewSimpleHashMap: Buat map baru.
func NewSimpleHashMap(initialCap int) *SimpleHashMap {
	if initialCap < 1 {
		initialCap = 16
	}
	return &SimpleHashMap{
		buckets:    make([][]entry, initialCap),
		cap:        initialCap,
		seed:       maphash.MakeSeed(), // Random seed mirip Go
		loadFactor: 0.75,
	}
}

// hash: Hitung hash untuk key.
func (m *SimpleHashMap) hash(key string) uint64 {
	var h maphash.Hash
	h.SetSeed(m.seed)
	h.WriteString(key)
	return h.Sum64()
}

// index: Dapatkan bucket index dari hash.
func (m *SimpleHashMap) index(hash uint64) int {
	return int(hash % uint64(m.cap))
}

// Put: Insert atau update.
func (m *SimpleHashMap) Put(key string, value int) {
	if float64(m.size)/float64(m.cap) > m.loadFactor {
		m.resize()
	}
	h := m.hash(key)
	idx := m.index(h)
	bucket := m.buckets[idx]
	for i, e := range bucket {
		if e.key == key {
			m.buckets[idx][i].value = value // Update
			return
		}
	}
	// Insert baru
	m.buckets[idx] = append(bucket, entry{key: key, value: value})
	m.size++
}

// Get: Lookup.
func (m *SimpleHashMap) Get(key string) (int, bool) {
	h := m.hash(key)
	idx := m.index(h)
	for _, e := range m.buckets[idx] {
		if e.key == key {
			return e.value, true
		}
	}
	return 0, false
}

// Delete: Hapus key.
func (m *SimpleHashMap) Delete(key string) {
	h := m.hash(key)
	idx := m.index(h)
	bucket := m.buckets[idx]
	for i, e := range bucket {
		if e.key == key {
			// Hapus dari slice
			m.buckets[idx] = append(bucket[:i], bucket[i+1:]...)
			m.size--
			return
		}
	}
}

// resize: Double capacity.
func (m *SimpleHashMap) resize() {
	newCap := m.cap * 2
	newBuckets := make([][]entry, newCap)
	for _, bucket := range m.buckets {
		for _, e := range bucket {
			h := m.hash(e.key)
			newIdx := int(h % uint64(newCap))
			newBuckets[newIdx] = append(newBuckets[newIdx], e)
		}
	}
	m.buckets = newBuckets
	m.cap = newCap
}

func main() {
	m := NewSimpleHashMap(4)
	m.Put("one", 1)
	m.Put("two", 2)
	val, ok := m.Get("one")
	if ok {
		fmt.Println("one:", val) // Output: one: 1
	}
	m.Delete("one")
	_, ok = m.Get("one")
	fmt.Println("one exists?", ok) // false
}

Teknik: Hash table dengan separate chaining (slice per bucket).
Hashing: Pakai hash/maphash (mirip Go runtime, dengan seed random).
Handling Collision: Append ke slice bucket.
Resizing: Double buckets saat load >0.75, rehash semua (bukan incremental seperti Go asli).

Ini versi basic; untuk mirip Swiss Tables, butuh lebih kompleks (control words, probing). Jika mau extend, bisa tambah generic dengan Go 1.18+.

PreviousLRU Cache NextSimple Array Sum

Last updated 2 months ago