Work Stealing di Go

📌 1. Apa itu Work-Stealing?

Work-stealing adalah strategi penjadwalan (scheduling strategy) untuk mendistribusikan beban kerja (task) secara efisien di antara worker thread.

🔑 Intinya:

Setiap worker memiliki queue tugas sendiri (biasanya deque: double-ended queue).
Kalau worker selesai dengan antriannya sendiri dan masih idle, dia akan "mencuri" tugas dari worker lain yang masih sibuk.
Dengan ini, beban kerja merata, thread starvation (worker menganggur) berkurang.

📌 2. Bagaimana di Golang?

Di Golang, work-stealing diimplementasikan pada level runtime scheduler, khususnya untuk manajemen Goroutine.

🔧 Konsepnya:

Setiap P (Processor) punya antrian lokal berisi goroutine.
Jika M (Machine = OS Thread) idle, dia bisa mencuri goroutine dari antrian P lain.
Tujuannya: Menjaga semua core CPU tetap sibuk.

📌 3. Ilustrasi Arsitektur:

Goroutine -> dijadwalkan ke P -> dieksekusi oleh M -> idle M bisa mencuri dari P lain.

📌 4. Contoh Kode Sederhana

Karena work-stealing terjadi di runtime, kita tidak perlu mengatur secara manual. Tapi kita bisa mensimulasikan perilaku beban kerja tak seimbang supaya runtime memicu work-stealing.

Berikut contoh program Go yang memperlihatkan banyak goroutine dengan beban kerja berbeda:

package main

import (
	"fmt"
	"math/rand"
	"runtime"
	"time"
)

func worker(id int, jobs <-chan int) {
	for job := range jobs {
		time.Sleep(time.Duration(rand.Intn(500)) * time.Millisecond) // simulasi kerja
		fmt.Printf("Worker %d menyelesaikan job %d\n", id, job)
	}
}

func main() {
	runtime.GOMAXPROCS(4) // misalnya kita pakai 4 core

	jobs := make(chan int, 100)

	// Buat 4 worker goroutine
	for w := 1; w <= 4; w++ {
		go worker(w, jobs)
	}

	// Kirim 20 job
	for j := 1; j <= 20; j++ {
		jobs <- j
	}
	close(jobs)

	// Tunggu agar worker selesai
	time.Sleep(5 * time.Second)
}

🧩 Penjelasan:

Kita punya 4 worker goroutine.
Masing-masing job durasinya random → ada worker yang cepat selesai lebih dulu.
Worker idle akan mencuri dari antrian global (atau local P lain).
Runtime yang mengatur work-stealing → developer tidak perlu atur manual.

📌 5. Inti Penting

✅ Work-stealing membuat Golang efisien:

Tidak semua goroutine numpuk di satu core.
Beban dibagi otomatis.
Cocok untuk beban kerja yang tidak merata (irregular parallelism).

🔑 Ringkas:

Detail

Siapa yang mengatur?

Runtime Scheduler (M, P, G)

Tujuan?

CPU core selalu sibuk

Bagaimana?

Mencuri goroutine dari antrian P lain

Developer?

Hanya fokus bikin goroutine, runtime yang atur

📌 Gambaran

Ada beberapa worker.
Setiap worker punya queue lokal (deque).
Worker akan memproses task dari queue sendiri.
Kalau queue sendiri kosong, worker akan mencuri task dari queue worker lain.

📌 Contoh Kode: Work-Stealing Manual

Berikut implementasi sederhana, hanya untuk demo (tidak thread-safe optimal, tapi ilustratif).

package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// Task adalah fungsi yang dikerjakan worker
type Task func()

// Deque sederhana (queue double-ended)
type Deque struct {
	tasks []Task
	lock  sync.Mutex
}

func NewDeque() *Deque {
	return &Deque{
		tasks: []Task{},
	}
}

func (d *Deque) PushBottom(task Task) {
	d.lock.Lock()
	defer d.lock.Unlock()
	d.tasks = append(d.tasks, task)
}

func (d *Deque) PopBottom() Task {
	d.lock.Lock()
	defer d.lock.Unlock()
	if len(d.tasks) == 0 {
		return nil
	}
	task := d.tasks[len(d.tasks)-1]
	d.tasks = d.tasks[:len(d.tasks)-1]
	return task
}

func (d *Deque) StealTop() Task {
	d.lock.Lock()
	defer d.lock.Unlock()
	if len(d.tasks) == 0 {
		return nil
	}
	task := d.tasks[0]
	d.tasks = d.tasks[1:]
	return task
}

// Worker
type Worker struct {
	id    int
	deque *Deque
	pool  []*Worker
	wg    *sync.WaitGroup
}

func (w *Worker) Run() {
	for {
		task := w.deque.PopBottom()
		if task != nil {
			task()
			w.wg.Done()
			continue
		}

		// Coba mencuri dari worker lain
		stolen := false
		for _, other := range w.pool {
			if other == w {
				continue
			}
			task = other.deque.StealTop()
			if task != nil {
				fmt.Printf("Worker %d mencuri task dari Worker %d\n", w.id, other.id)
				task()
				w.wg.Done()
				stolen = true
				break
			}
		}

		if !stolen {
			// Semua queue kosong, istirahat sebentar
			time.Sleep(10 * time.Millisecond)
		}
	}
}

func main() {
	numWorkers := 4
	workers := make([]*Worker, numWorkers)
	wg := &sync.WaitGroup{}

	// Buat worker + deque
	for i := 0; i < numWorkers; i++ {
		workers[i] = &Worker{
			id:    i,
			deque: NewDeque(),
			pool:  workers,
			wg:    wg,
		}
	}

	// Jalankan worker
	for _, w := range workers {
		go w.Run()
	}

	// Buat 20 task secara acak ke worker
	numTasks := 20
	for i := 0; i < numTasks; i++ {
		wg.Add(1)
		taskID := i
		task := func() {
			fmt.Printf("Task %d dikerjakan oleh goroutine\n", taskID)
			time.Sleep(time.Duration(rand.Intn(500)) * time.Millisecond)
		}
		// Distribusikan ke worker random
		worker := workers[rand.Intn(numWorkers)]
		worker.deque.PushBottom(task)
	}

	wg.Wait()
	fmt.Println("Semua task selesai!")
}

📌 Cara Kerja

✅ Deque:

PushBottom → Worker menambahkan task ke bawah deque-nya sendiri.
PopBottom → Worker mengambil task dari bawah deque sendiri.
StealTop → Worker lain mencuri task dari atas deque orang lain.

✅ Worker Loop:

Kerjakan task sendiri.
Kalau habis, coba steal dari deque worker lain.
Kalau semua habis, tunggu sebentar, lalu ulang.

📌 Output

Kalau dijalankan, kamu akan lihat output mirip:

Task 0 dikerjakan oleh goroutine
Task 1 dikerjakan oleh goroutine
Worker 1 mencuri task dari Worker 0
...
Semua task selesai!

📌 Catatan Penting

Ini demo → deque pakai sync.Mutex supaya aman diakses dari beberapa worker.
Versi runtime scheduler Go aslinya jauh lebih rumit & dioptimasi.
Real-world work-stealing pakai deque lock-free seperti Chase-Lev deque.

📌 📚 1. Apa itu Go Runtime Scheduler?

Golang punya green-thread scheduler bawaan. Artinya: goroutine = lightweight thread buatan user → dipetakan ke thread OS nyata. Runtime Scheduler bertugas:

Menjadwalkan jutaan goroutine ke thread OS yang jumlahnya terbatas.
Mengatur distribusi beban kerja (work stealing).
Mengatur preemptive scheduling (bisa pause/resume goroutine).

📌 🧩 2. Komponen Utama: M, P, G

Go memakai model M-P-G (Machine, Processor, Goroutine).

Komponen

Fungsi

G (Goroutine)

Unit eksekusi paling kecil → tugas kamu.

M (Machine)

OS thread nyata yang menjalankan G.

P (Processor)

Struktur logis: menyimpan queue G & resource eksekusi (context).

🔍 Rincian

✅ G (Goroutine)

Representasi unit kerja.
Punya stack kecil → runtime bisa perbesar/mperkecil.
Ribuan G bisa aktif → cheap dibanding thread OS.
Runtime memetakan G ke M melalui P.

✅ M (Machine)

Adalah OS Thread nyata (pthread di Linux/Mac).
Membawa register, stack, runtime state.
M mengambil G dari P.
Kalau tidak ada P, M tidak bisa kerja → harus punya P.

✅ P (Processor)

Ini bukan CPU fisik, tapi slot logis di runtime.
Punya queue lokal: antrian G (siap dijalankan).
Menyediakan resource seperti memory cache, scheduler context.
Jumlah P diatur oleh GOMAXPROCS (default = jumlah CPU core).
- Contoh: runtime.GOMAXPROCS(4) → runtime membuat 4 P.

📌 🔄 3. Cara Kerja M-P-G Bersama

1️⃣ Kamu buat goroutine → misalnya go f(). 2️⃣ Goroutine (G) masuk ke queue P tertentu. 3️⃣ M aktif (thread OS) terhubung ke P. 4️⃣ M ambil G dari queue P → jalankan. 5️⃣ Kalau P kehabisan G → M bisa:

Curi G dari P lain (work stealing).
Ambil G dari global run queue. 6️⃣ Kalau M idle & tidak punya P → parkir, sleep, tunggu P tersedia.

📌 🧵 4. Kenapa Butuh P?

Tanpa P:

M tidak bisa mengeksekusi G.
P = resource execution → kalau P habis, M idle.

Tujuan:

Membatasi jumlah thread aktif.
Mengontrol scaling → 1 M : 1 P.
Membuat concurrency efisien di multi-core.

📌 🔀 5. Global Run Queue & Work-Stealing

Jika queue lokal P kosong → P steal dari P lain (work stealing).
Kalau semua P kosong → scheduler mencari G di global run queue.
Work stealing → beban kerja merata → CPU tidak idle.

📌 ⚡ 6. Preemption (Preemptive Scheduling)

Runtime Go mendukung preemptive scheduling:

Goroutine bisa dipaksa stop.
Cegah G rakus CPU terlalu lama.
Membuat concurrency tetap responsif.

📌 🏗️ 7. Contoh Diagram Sederhana

         +-----------+          +-----------+         +-----------+
         |    M1     | <------> |    P1     | <------> |  Queue G  |
         +-----------+          +-----------+         +-----------+

         +-----------+          +-----------+         +-----------+
         |    M2     | <------> |    P2     | <------> |  Queue G  |
         +-----------+          +-----------+         +-----------+

M1 & M2 = thread OS nyata.
P1 & P2 = slot logis berisi queue G.
G diproses, queue lokal, work stealing antar P.

📌 🏆 8. Kenapa Desain Ini Hebat?

✅ Ribuan G → hanya perlu P sesuai core → hemat OS thread. ✅ Work-stealing → load balancing otomatis. ✅ Preemptive → goroutine lambat tidak block yang lain. ✅ Terlihat sederhana buat developer → go f() → runtime atur semuanya.

📌 📣 Kesimpulan

Elemen

Ringkasan

Unit kerja terkecil

Thread OS nyata

Slot logis, queue G

GOMAXPROCS

Jumlah P aktif

Work-Stealing

P idle mencuri G dari P lain

Preemption

Runtime pause/resume G

✅ 1️⃣ Kode tracing `M`, `P`, `G` (pakai `runtime` package)

Kita tidak bisa melihat M, P, G langsung, tapi bisa lihat dampaknya: berapa banyak goroutine, berapa P aktif (GOMAXPROCS), dan ID goroutine. Berikut contoh:

package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

func worker(id int, wg *sync.WaitGroup) {
	defer wg.Done()
	fmt.Printf("Worker %d start | GOMAXPROCS: %d | NumGoroutine: %d\n",
		id,
		runtime.GOMAXPROCS(0),
		runtime.NumGoroutine(),
	)
	time.Sleep(time.Second)
}

func main() {
	runtime.GOMAXPROCS(2) // Batasi hanya 2 P aktif

	wg := &sync.WaitGroup{}
	numWorkers := 5

	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go worker(i, wg)
	}

	fmt.Printf("Jumlah OS thread (NumCPU): %d\n", runtime.NumCPU())
	fmt.Printf("Jumlah GOMAXPROCS: %d\n", runtime.GOMAXPROCS(0))
	fmt.Printf("Jumlah goroutine: %d\n", runtime.NumGoroutine())

	wg.Wait()
	fmt.Println("Selesai semua.")
}

Apa yang terlihat?

GOMAXPROCS(2) → cuma 2 P.
Jalankan 5 goroutine → scheduler akan switch antar goroutine di 2 P.
NumGoroutine menunjukkan jumlah G.
Kalau kamu runtime.LockOSThread(), kamu bisa paksa G tetap di 1 M.

✅ 2️⃣ Intip `runtime/proc.go`: Struktur M, P, G

Berikut cuplikan nyata dari sumber kode Go runtime:

// runtime/runtime2.go

type g struct {
	stack       stack
	sched       gobuf
	m           *m    // M yang mengeksekusi G ini
	// ...
}

type m struct {
	g0      *g    // Goroutine khusus untuk runtime
	curg    *g    // Goroutine yang sedang berjalan
	p       *p    // Processor yang terhubung
	// ...
}

type p struct {
	runqhead uint32
	runqtail uint32
	runq     [256]g // queue G lokal P ini
	// ...
}

➡️ Makna nyata:

G menyimpan stack, context, dan pointer ke M.
M punya pointer ke curg (Goroutine aktif).
P punya queue runq → antrian G di P ini.

📌 Sumber aslinya di: src/runtime/runtime2.go

✅ 3️⃣ Simulasi M, P, G pakai log custom

Kita bikin mini scheduler:

M → OS thread (goroutine di sini).
P → Slot logis dengan queue G.
G → Task.
Tunjukkan worker, queue lokal, dan stealing.

Berikut simulasi M-P-G:

package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// G = Goroutine (task)
type G struct {
	id int
}

func (g *G) run() {
	fmt.Printf("Goroutine %d running\n", g.id)
	time.Sleep(time.Millisecond * time.Duration(rand.Intn(300)))
}

// P = Processor (queue lokal)
type P struct {
	id    int
	queue []*G
	lock  sync.Mutex
}

// M = Machine (OS thread)
type M struct {
	id int
	p  *P
	wg *sync.WaitGroup
	pool []*P
}

func (m *M) run() {
	for {
		m.p.lock.Lock()
		var g *G
		if len(m.p.queue) > 0 {
			g = m.p.queue[0]
			m.p.queue = m.p.queue[1:]
			m.p.lock.Unlock()
			fmt.Printf("[M %d] eksekusi G %d dari P %d\n", m.id, g.id, m.p.id)
			g.run()
			m.wg.Done()
		} else {
			m.p.lock.Unlock()

			// Coba stealing dari P lain
			stolen := false
			for _, otherP := range m.pool {
				if otherP == m.p {
					continue
				}
				otherP.lock.Lock()
				if len(otherP.queue) > 0 {
					g = otherP.queue[0]
					otherP.queue = otherP.queue[1:]
					otherP.lock.Unlock()
					fmt.Printf("[M %d] mencuri G %d dari P %d\n", m.id, g.id, otherP.id)
					g.run()
					m.wg.Done()
					stolen = true
					break
				}
				otherP.lock.Unlock()
			}

			if !stolen {
				time.Sleep(10 * time.Millisecond)
			}
		}
	}
}

func main() {
	numP := 2
	numM := 2
	numG := 10

	ps := make([]*P, numP)
	for i := 0; i < numP; i++ {
		ps[i] = &P{
			id: i,
		}
	}

	// Sebarkan G ke P
	for i := 0; i < numG; i++ {
		p := ps[rand.Intn(numP)]
		p.lock.Lock()
		p.queue = append(p.queue, &G{id: i})
		p.lock.Unlock()
	}

	wg := &sync.WaitGroup{}
	wg.Add(numG)

	// Jalankan M, hubungkan ke P
	for i := 0; i < numM; i++ {
		m := &M{
			id:   i,
			p:    ps[i%numP],
			wg:   wg,
			pool: ps,
		}
		go m.run()
	}

	wg.Wait()
	fmt.Println("Semua G selesai diproses!")
}

📌 📌 Penjelasan Simulasi

Elemen

Peran

Task yang akan dijalankan

Queue lokal, punya lock

Goroutine OS, ambil G dari P

Stealing

Kalau queue P kosong, M mencuri dari P lain

✅ HASILNYA

Kamu akan melihat M mencuri G dari P lain.
Mirip cara kerja runtime Go sebenarnya.
Semua M akan aktif selama masih ada G.

🔑 Kesimpulan

✅ 1️⃣ Gunakan runtime untuk observasi dasar. ✅ 2️⃣ Lihat proc.go untuk struktur nyatanya. ✅ 3️⃣ Pahami bagaimana work-stealing & P-M-G saling terhubung.

Although both task stealing and continuation stealing have stalling joins, continuations happen
more often than tasks; therefore, the Go scheduling algorithm works with continuations rather
than tasks. The main disadvantage of continuation stealing is that it requires extra work from the
compiler of the programming language. Fortunately, Go provides that extra help and, therefore,
uses continuation stealing in its work-stealing algorithm. One of the benefits of continuation
stealing is that you get the same results when using function calls instead of goroutines or a
single thread with multiple goroutines. This makes perfect sense, as only one thing is executed
at any given point in both cases.

📌 1️⃣ Konteks Umum

Dalam parallel computing, work stealing artinya worker idle mencuri pekerjaan dari worker lain. Nah, di literatur teori, ada 2 pendekatan:

Task stealing
Continuation stealing

📌 2️⃣ Apa itu Task vs Continuation?

Istilah

Penjelasan

Task

Unit kerja yang berdiri sendiri. Contoh: fungsi yang harus dijalankan → doWork().

Continuation

Bukan tugas baru, tapi kelanjutan eksekusi yang belum selesai.

👉 Analogi:

Task → “Ini 1 dus berisi pekerjaan A. Ambil & kerjakan.”
Continuation → “Ini titik berhenti di pekerjaan A. Lanjutkan dari sini.”

📌 3️⃣ Task stealing → mencuri task

Worker idle akan mencuri task yang masih belum dikerjakan dari worker lain.
Contoh:
- Fork-Join → bikin task → simpan ke queue → worker lain bisa ambil.

Contoh di Go: kalau kamu go f(), maka f = task.

📌 4️⃣ Continuation stealing → mencuri kelanjutan eksekusi

Worker idle tidak mencuri task-nya, tapi mencuri kelanjutan eksekusi.
Ini muncul di bahasa functional dan runtime dengan banyak rekursi.
Biasanya: fungsi memanggil fungsi lain → akan ada kelanjutan. Nah, kelanjutannya itulah yang bisa dicuri.

Contoh:

func A() {
   B() // B jalan → A “pause” dulu
}

Continuation di sini: “apa yang harus A lakukan setelah B selesai”.

📌 5️⃣ Kenapa Go pakai Continuation stealing?

➡️ Goroutine di Go ringan & sangat rekursif → sering bikin kelanjutan dibanding bikin task baru. ➡️ Continuation muncul lebih sering → lebih murah & praktis mencuri kelanjutan eksekusi. ➡️ Compiler Go membantu split stack, pause/resume → runtime jadi bisa potong di titik tertentu.

📌 6️⃣ “Stalling joins” → apa maksudnya?

Kalau punya fork-join, contohnya:

// Fork
go A()
go B()

// Join

Ketika A & B selesai → ada titik “join” → harus nunggu → ini bisa bikin stall (idle). Baik task stealing maupun continuation stealing sama-sama kena join stall.

📌 7️⃣ Kutipan: “Continuations happen more often than tasks”

➡️ Artinya: di program nyata, fungsi memanggil fungsi lain → lebih sering terjadi daripada bikin task berdiri sendiri. ➡️ Jadi lebih efisien curi continuation daripada bikin task stealing murni.

📌 8️⃣ “Same results when using function calls instead of goroutines”

➡️ Karena scheduler Go bekerja di level continuation:

Panggilan fungsi biasa (f())
atau go f() ➡️ Keduanya bisa dipotong, dijadwal ulang, dicuri → sama saja buat scheduler.

📌 9️⃣ “Disadvantage: compiler must help”

Kenapa compiler harus bantu?

Karena continuation stealing butuh split stack, context switch point, yield.
Compiler Go menambahkan hook agar runtime bisa preempt goroutine → pause/resume.

📌 10️⃣ Kesimpulan kutipan

Bagian

Makna

Task stealing

Mencuri tugas utuh.

Continuation stealing

Mencuri kelanjutan eksekusi.

Go pakai continuation stealing

Lebih efisien di rekursif & goroutine ringan.

Compiler membantu

Split stack & titik preempt.

Hasil konsisten

Fungsi biasa & goroutine sama-sama dijadwal.

✅ Inti Singkat

🔑 Go runtime tidak cuma membagi goroutine → tapi juga memotong & memindahkan kelanjutan eksekusi. Inilah rahasia kenapa Go ringan, preemptive, & bisa work-stealing tanpa developer sadar.

🧩 Bagian 1: Entitas utama — M, P, G

“The Go scheduler works using three main kinds of entities: OS threads (M), which are related to the OS in use, goroutines (G), and logical processors (P).”

✅ Arti:

Scheduler Go bekerja dengan 3 komponen kunci:
- M = Machine → OS Thread nyata → thread yang dibuat & dikelola OS (Linux: pthread).
- G = Goroutine → unit kerja kecil → tugas program yang mau dijalankan.
- P = Processor → bukan CPU fisik, tapi slot logis → menyimpan antrian G & resource eksekusi.

💡 Inti: M = jalan, G = apa yang dijalankan, P = sumber daya + queue untuk memetakan G ke M.

🧩 Bagian 2: GOMAXPROCS

“The number of processors that can be used by a Go program is specified by the value of the GOMAXPROCS environment variable—at any given time, there are, at most, GOMAXPROCS processors.”

✅ Arti:

GOMAXPROCS = berapa banyak P aktif di runtime Go.
Default = jumlah CPU fisik (runtime.NumCPU()).
runtime.GOMAXPROCS(4) artinya: maksimal 4 P aktif → hanya 4 M yang bisa jalan paralel.

💡 Kenapa dibatasi? Supaya tidak membuat thread OS berlebihan, biar concurrency pas dengan hardware.

🧩 Bagian 3: m:n Scheduling

“Now, let us return to the m:n scheduling algorithm used in Go. Strictly speaking, at any time, you have m goroutines that are executed and, therefore, scheduled to run, on n OS threads using, at most, GOMAXPROCS number of logical processors.”

✅ Arti:

Go pakai m:n scheduling.
- m = jumlah Goroutine aktif → bisa jutaan.
- n = jumlah thread OS nyata → dibatasi jumlah P.
m:n artinya:
- Banyak G (m) di-multiplex ke Thread OS (n).
- P = slot eksekusi yang menentukan thread OS mana yang jalan.

🧩 Bagian 4: Status Goroutine

“Each goroutine can be in one of the following three stages: executing, runnable, or waiting. In the executing stage, the instructions of the goroutine are executed on an OS thread. In the runnable stage, the goroutine waits to be assigned to an OS thread for execution. Finally, in the waiting stage, the goroutine is blocked for some reason like waiting for a resource or a mutex to become available to go into one of the other two stages.”

✅ Arti: Setiap Goroutine punya status: 1️⃣ Executing → Sedang dijalankan di Thread OS (M) oleh P. 2️⃣ Runnable → Siap jalan, belum dapat M + P → antre di queue. 3️⃣ Waiting → Tunggu resource, misalnya:

Channel belum ada data.
Mutex dipegang goroutine lain.
I/O block.

➡️ Dari waiting → runnable kalau resource tersedia.

🧩 Bagian 5: Antrian Global & Lokal

“The following figure shows that there are two different kinds of queues—a global run queue and a local run queue—attached to each logical processor. Goroutines from the global queue are assigned to the queue of a logical processor in order to get executed at some point in the future.”

✅ Arti:

Global Run Queue:
- Semua G yang runnable tapi belum dapat P.
- Kalau P lokal kosong, P bisa ambil dari global.
Local Run Queue (di P):
- Setiap P punya queue sendiri.
- P bekerja dengan queue lokal dulu → lebih cepat.
- Kalau queue kosong → curi dari P lain (work stealing) → atau ambil global.

🧩 Bagian 6: Stealing

“Each logical processor can have multiple threads, and the stealing occurs between the local queues of the available logical processors.”

✅ Arti:

Stealing = work stealing.
P1 kosong → dia bisa curi G dari P2.
Ini bikin load balance → tidak ada P yang idle kalau P lain overload.

🧩 Bagian 7: Membuat OS Thread Baru

“Finally, keep in mind that the Go scheduler is allowed to create more OS threads when needed. OS threads are expensive in terms of resources and going from one status to another (context switching), which means that dealing too much with OS threads might slow down your Go applications.”

✅ Arti:

M (thread OS) bisa bertambah kalau:
- Ada G blocking syscall.
- P butuh M baru buat lanjut.
Tapi thread OS mahal:
- Context switch berat.
- Stack OS thread lebih besar.
Jadi Go runtime hemat M, makanya scheduler fokus multiplex G ke M.

🧩 Bagian 8: Inti `GOMAXPROCS`

“Next, we discuss the meaning and the use of GOMAXPROCS”

✅ Artinya: Selanjutnya, buku ini bakal bahas detail GOMAXPROCS:

Batas P aktif.
Bagaimana mempengaruhi concurrency.
Bagaimana mempengaruhi M & CPU usage.

🧑‍💻 Ringkas Visual

      +------------------+
      |    Global Queue  |   (Goroutine runnable cadangan)
      +------------------+
               |
               v
    +-------------------------+   +-------------------------+
    |         P1              |   |          P2             |
    |  Local Run Queue (G)    |   |   Local Run Queue (G)   |
    +-------------------------+   +-------------------------+
         |         |                   |          |
         v         v                   v          v
     OS Thread   OS Thread         OS Thread   OS Thread
      (M)          (M)               (M)         (M)

🔑 Inti Practical

✅ G = apa yang dijalankan ✅ M = siapa yang menjalankan (thread OS) ✅ P = slot queue & resource ✅ GOMAXPROCS = berapa banyak P aktif ✅ Work stealing = kalau P kosong → curi G di P lain ✅ Thread OS mahal → Go runtime hemat pakai M secukupnya.

✅ 💡 Gambaran Kasar

Eksekusi goroutine di Go runtime ≈ jalur begini:

Program membuat Goroutine (G)
        |
   Masuk Global Run Queue atau Local Run Queue (P)
        |
   P memilih M (thread OS)
        |
   M mengeksekusi G
        |
   G selesai → M ambil G lain → loop

📌 1️⃣ Kamu buat Goroutine

Contoh:

go myFunction()

➡️ Go runtime:

Buat G baru (struct g di runtime).
Simpan pointer myFunction + stack awal + state → status = runnable.

📌 2️⃣ Scheduler menaruh G ke Queue

➡️ Ada 2 kemungkinan:

Kalau P (Processor) lokal belum penuh, G masuk ke Local Run Queue milik P.
Kalau P lagi sibuk atau queue lokal penuh → G masuk Global Run Queue.

G -> Local Run Queue (P)
    atau
G -> Global Run Queue

Kenapa? Runtime lebih suka queue lokal (lebih cepat diakses → cache friendly). Kalau penuh, sebagian G dilempar ke global biar P lain bisa ambil.

📌 3️⃣ P yang aktif cari G

P punya queue lokal (runq).
Kalau ada G di queue → P suruh M jalankan.
Kalau queue lokal kosong:
- P cari G di global queue.
- Atau work steal dari P lain.

P: Ada kerjaan? 
   - Ya → eksekusi G.
   - Tidak → steal.

📌 4️⃣ P membutuhkan M untuk eksekusi

P tidak bisa eksekusi sendiri, dia hanya slot logis.
Butuh M (Machine): thread OS nyata.
Jadi:
- P “serahkan” G ke M.
- M jalankan G di OS thread.

➡️ 1 P → 1 M → jalankan 1 G (praktis).

📌 5️⃣ M mulai jalan

M memanggil fungsi G → instruksi jalan.
Stack pointer & context diatur.
Kalau G blocking (misal syscall):
- M bisa parkir G → cari G lain.
- Kalau M harus block, runtime bisa bikin M baru.

📌 6️⃣ G selesai

Kalau G selesai → status jadi dead → resource dikembalikan.
M minta G baru ke P.
Loop lagi ke queue lokal → global queue → work stealing → dst.

📌 7️⃣ GOMAXPROCS membatasi berapa banyak P aktif

Jumlah P = GOMAXPROCS.
Kalau kamu GOMAXPROCS = 4:
- Maksimum 4 P aktif.
- Artinya hanya 4 M bisa menjalankan G paralel.
- Sisanya ngantri → runnable.

✅ 🔗 Alur Ringkas

Tahap

Rincian

1. Buat G

go f() → runtime bikin G

2. Masuk Queue

G → Local P Queue → atau Global Queue

3. P aktif

P ambil G dari queue

4. M butuh P

P butuh M → M jalankan G

5. Eksekusi

M jalankan G di OS Thread

6. Selesai?

G selesai → M ambil G lain

7. P & GOMAXPROCS

Jumlah P dibatasi → thread OS hemat

✅ 🔍 Diagram Mermaid (Flowchart)

Ini diagram flowchart mermaid — kalau kamu pakai VSCode Extension atau https://mermaid.live bisa copy-paste ini:

flowchart TD
    A[Program membuat Goroutine] --> B{Queue Mana?}
    B --> |Local P Queue| C[Masuk Local Run Queue]
    B --> |Global Queue| D[Masuk Global Run Queue]
    C --> E[P aktif Ambil G]
    D --> E
    E --> F{P butuh M?}
    F --> |Ya| G[P pasang M aktif]
    G --> H[M jalankan G di OS Thread]
    H --> I{G selesai?}
    I --> |Ya| J[G Selesai, Resource Kembali]
    I --> |Block?| K[G Blocked, M cari G lain]
    J --> E
    K --> E

✅ 🔑 Kesimpulan

Elemen

Fungsi

Unit kerja

Slot logis + Queue G

Thread OS yang jalanin G

Queue Lokal

Utama

Queue Global

Cadangan, fallback

Work stealing

Kalau P kehabisan, curi dari P lain

GOMAXPROCS

Batasi P aktif → batasi concurrency nyata

Banyak orang paham M, P, G di Go separuh-separuh karena bagian “kenapa P butuh M” sering bikin bingung. Yuk, kita bongkar sejelas-jelasnya.

✅ 1️⃣ Kita ulang definisi dulu

Komponen

Arti

G (Goroutine)

Unit kerja.

P (Processor)

Logical Processor: Slot logis di Go runtime.

M (Machine)

Machine: Thread OS nyata (pthread).

📌 2️⃣ Fungsi P

P = Slot logis. Di dalam P:

Ada queue G (lokal).
Ada context eksekusi (GC, mem alloc, scheduler local data).

Tugas P:

Menyimpan & mengelola antrian G.
Membuat jalur “siapa next G yang akan dijalankan”.

TAPI: P tidak bisa mengeksekusi bytecode. P bukan CPU fisik — hanya manajemen.

📌 3️⃣ Fungsi M

M = Thread OS nyata → dialah yang benar-benar jalan di CPU.

M-lah yang:

Memuat stack G ke register CPU.
Menjalankan instruksi.
Berhubungan langsung dengan kernel (syscall, context switch).

➡️ Tanpa M, P tidak bisa bikin G berjalan di CPU.

📌 4️⃣ Kenapa P butuh M?

➡️ Eksekusi nyata = CPU → hanya thread OS (M) yang bisa bicara ke kernel + CPU.

➡️ P = logis. Jadi kalau P punya G, dia butuh mesin nyata (M) untuk mengeksekusi G di hardware.

Analogi sederhana:

🔧 P = Jalur produksi + tumpukan order 🏃 M = Pekerja pabrik 📦 G = Order kerja

P punya tumpukan order (queue G).
Tapi P tidak punya tangan & kaki → dia butuh pekerja nyata (M) untuk memproses barang (G).

📌 5️⃣ Kenapa M butuh P?

Kebalikannya:

M tidak bisa jalan sendirian.
M butuh P untuk:
- Dapat queue G.
- Dapat mem alloc data, runtime context.
- Tanpa P, M tidak tahu mau kerjakan apa.

➡️ Runtime Go: 1 M aktif = harus punya 1 P aktif.

📌 6️⃣ Kenapa Go mendesain begitu?

Kenapa tidak langsung G → M?

Karena:

Go mau control concurrency supaya sesuai jumlah core → GOMAXPROCS = jumlah P.
Kalau M bisa jalan tanpa P → runtime bisa membuat thread OS sebanyak G → boros resource.
P membatasi seberapa banyak eksekusi paralel nyata. Contoh:
- GOMAXPROCS = 4 → hanya 4 P → hanya 4 M bisa aktif paralel.
- Kalau G banyak, tetap antre di queue P → work stealing.

📌 7️⃣ Contoh Jalur

G (task) --> P (queue + slot) --> M (thread OS) --> CPU

✅ P:

Manage queue.
Manage context.
Batasi concurrency.

✅ M:

Jalankan G nyata di CPU.
Handle syscall.
Context switch ke kernel.

📌 8️⃣ Visual Sederhana

graph TD
  subgraph User Code
    A[go myFunc()]
  end

  A --> B[G]
  B --> C[Local Run Queue (P)]
  C --> D{M Ada?}
  D --> E[M jalankan G]
  E --> F[CPU Kerjakan Instruksi]

📌 9️⃣ Inti

Komponen

Tugas

“Jalur logis”, queue G, resource control.

“Eksekutor nyata”, thread OS → jalankan G di hardware.

Task yang diatur & dieksekusi.

P tidak bisa kerja tanpa M → M tidak punya kerja tanpa P.

✅ 🔑 Kesimpulan

P = logis, M = fisik. P = manajer shift. M = pekerja nyata. P atur tumpukan kerja & jalur. M eksekusi ke CPU.

Apa yang dimaksud “context switching” di Go runtime.

✅ 📌 1️⃣ Potongan yang kamu kutip

"If you decide to assign a value to GOMAXPROCS that is smaller than the number of cores in your machine, you might affect the performance of your program. However, using a GOMAXPROCS value that is larger than the number of available cores does not necessarily make your Go programs run faster due to the context switching of threads."

➡️ Artinya:

Kalau GOMAXPROCS kekecilan, kamu tidak memanfaatkan semua core → underutilized.
Kalau GOMAXPROCS kebanyakan, bukan berarti makin cepat → karena context switching overhead.

✅ 📌 2️⃣ Apa maksud “context switching”?

Context switching = proses CPU berpindah dari 1 thread ke thread lain.

🔑 Levelnya?

Level OS thread (M).
BUKAN di P → P itu hanya logical slot, bukan eksekusi nyata.
CPU hardware hanya tau M → karena M = thread nyata (pthread).

Jadi: ➡️ Context switch = CPU nyimpan state M1 → load state M2 → jalankan M2. ➡️ Kenapa mahal?

Register CPU harus disimpan & di-load lagi.
Kernel scheduler harus berpindah.
Cache CPU invalid → overhead.

✅ 📌 3️⃣ Apa kaitannya dengan `GOMAXPROCS`?

GOMAXPROCS = jumlah P aktif.
1 P = butuh 1 M aktif untuk mengeksekusi.
Jadi: GOMAXPROCS ~ jumlah M aktif.

Contoh:

Core CPU = 4 → idealnya GOMAXPROCS = 4.
Kalau GOMAXPROCS = 8:
- Ada 8 P.
- Runtime akan bikin hingga 8 M.
- Tapi hanya 4 core CPU → OS harus ganti-ganti thread (M) di core yang sama → context switching meningkat.

➡️ Hasilnya? Lebih banyak switch → lebih banyak overhead → program bisa lebih lambat daripada GOMAXPROCS = 4.

✅ 📌 4️⃣ Flow: Context Switch

Berikut flow sederhana kalau GOMAXPROCS lebih besar dari core:

sequenceDiagram
    participant P1 as P1
    participant M1 as M1 (Thread OS)
    participant M2 as M2 (Thread OS)
    participant CPU as CPU Core

    P1->>M1: M1 jalankan G di P1
    M1->>CPU: M1 jalan di Core
    Note over CPU: Kernel scheduler detect M2 ready
    CPU->>M1: Suspend M1 (save context)
    CPU->>M2: Load M2 (restore context)
    M2->>CPU: M2 jalan di Core

➡️ Perpindahan M1 → M2 = Context Switch ➡️ Ini diatur oleh Kernel, bukan runtime Go.

✅ 📌 5️⃣ Kenapa Context Switch Mahal?

Alasan

Dampak

Save/Restore Register CPU

Waktu eksekusi habis untuk nyimpan & load.

CPU Cache Thrashing

Data thread sebelumnya kebuang.

Overhead Kernel Scheduler

Kernel perlu ngatur prioritas thread.

➡️ Kalau terlalu sering → latency naik, throughput turun.

✅ 📌 6️⃣ Bagaimana cara Go mengurangi overhead ini?

1️⃣ Work stealing → bikin G tetap di P lokal → biar M tidak perlu switch ke P lain. 2️⃣ P membatasi concurrency → GOMAXPROCS = seberapa banyak M aktif. 3️⃣ Goroutine multiplexing → 1 M jalanin banyak G secara bergiliran, bukan bikin M banyak.

✅ 📌 7️⃣ Kesimpulan

Kasus

Konsekuensi

GOMAXPROCS < CPU Core

Tidak maksimal, CPU idle.

GOMAXPROCS = CPU Core

Ideal → M = core.

GOMAXPROCS > CPU Core

M > core → OS harus sering switch → overhead tinggi.

➡️ Context switch = level OS thread (M), BUKAN P. ➡️ P hanya queue + resource → eksekusi nyata di M, switch-nya di situ.

✅ 📌 8️⃣ Flow Ringkas

flowchart TD
  A[Program] --> B[Goroutine G]
  B --> C[P aktif]
  C --> D[M jalankan G]
  D --> E[CPU Core]
  E --> |Core Penuh| F[OS Kernel jadwalkan switch]
  F --> G[Context Switch: M pause]
  G --> H[M lain jalan di Core sama]

✅ 🔑 Inti

Context switching = di level M = thread OS
P hanya slot logis → tidak diswitch.
Kebanyakan M aktif > core = overhead context switch.

PreviousDynamic Dispatch NextLatency Numbers Every Programmer Should Know

Last updated 5 months ago

📌 2. Bagaimana di Golang?

📌 3. Ilustrasi Arsitektur:

📌 4. Contoh Kode Sederhana

📌 5. Inti Penting

🔑 Ringkas:

📌 Gambaran

📌 Contoh Kode: Work-Stealing Manual

📌 Cara Kerja

📌 Output

📌 Catatan Penting

📌 📚 1. Apa itu Go Runtime Scheduler?

📌 🧩 2. Komponen Utama: M, P, G

🔍 Rincian

✅ G (Goroutine)

✅ M (Machine)

✅ P (Processor)

📌 🔄 3. Cara Kerja M-P-G Bersama

📌 🧵 4. Kenapa Butuh P?

📌 🔀 5. Global Run Queue & Work-Stealing

📌 ⚡ 6. Preemption (Preemptive Scheduling)

📌 🏗️ 7. Contoh Diagram Sederhana

📌 🏆 8. Kenapa Desain Ini Hebat?

📌 📣 Kesimpulan

✅ 1️⃣ Kode tracing M, P, G (pakai runtime package)

✅ 2️⃣ Intip runtime/proc.go: Struktur M, P, G

✅ 3️⃣ Simulasi M, P, G pakai log custom

📌 📌 Penjelasan Simulasi

✅ HASILNYA

🔑 Kesimpulan

📌 1️⃣ Konteks Umum

📌 2️⃣ Apa itu Task vs Continuation?

📌 3️⃣ Task stealing → mencuri task

📌 4️⃣ Continuation stealing → mencuri kelanjutan eksekusi

📌 5️⃣ Kenapa Go pakai Continuation stealing?

📌 6️⃣ “Stalling joins” → apa maksudnya?

📌 7️⃣ Kutipan: “Continuations happen more often than tasks”

📌 8️⃣ “Same results when using function calls instead of goroutines”

📌 9️⃣ “Disadvantage: compiler must help”

📌 10️⃣ Kesimpulan kutipan

✅ Inti Singkat

🧩 Bagian 1: Entitas utama — M, P, G

🧩 Bagian 2: GOMAXPROCS

🧩 Bagian 3: m:n Scheduling

🧩 Bagian 4: Status Goroutine

🧩 Bagian 5: Antrian Global & Lokal

🧩 Bagian 6: Stealing

🧩 Bagian 7: Membuat OS Thread Baru

🧩 Bagian 8: Inti GOMAXPROCS

🧑‍💻 Ringkas Visual

🔑 Inti Practical

✅ 💡 Gambaran Kasar

📌 1️⃣ Kamu buat Goroutine

📌 2️⃣ Scheduler menaruh G ke Queue

📌 3️⃣ P yang aktif cari G

📌 4️⃣ P membutuhkan M untuk eksekusi

📌 5️⃣ M mulai jalan

📌 6️⃣ G selesai

📌 7️⃣ GOMAXPROCS membatasi berapa banyak P aktif

✅ 🔗 Alur Ringkas

✅ 🔍 Diagram Mermaid (Flowchart)

✅ 🔑 Kesimpulan

✅ 1️⃣ Kita ulang definisi dulu

📌 2️⃣ Fungsi P

📌 3️⃣ Fungsi M

📌 4️⃣ Kenapa P butuh M?

📌 5️⃣ Kenapa M butuh P?

📌 6️⃣ Kenapa Go mendesain begitu?

📌 7️⃣ Contoh Jalur

📌 8️⃣ Visual Sederhana

📌 9️⃣ Inti

✅ 🔑 Kesimpulan

✅ 📌 1️⃣ Potongan yang kamu kutip

✅ 📌 2️⃣ Apa maksud “context switching”?

✅ 📌 3️⃣ Apa kaitannya dengan GOMAXPROCS?

✅ 📌 4️⃣ Flow: Context Switch

✅ 📌 5️⃣ Kenapa Context Switch Mahal?

✅ 📌 6️⃣ Bagaimana cara Go mengurangi overhead ini?

✅ 📌 7️⃣ Kesimpulan

✅ 📌 8️⃣ Flow Ringkas

✅ 🔑 Inti

✅ 1️⃣ Kode tracing `M`, `P`, `G` (pakai `runtime` package)

✅ 2️⃣ Intip `runtime/proc.go`: Struktur M, P, G

🧩 Bagian 8: Inti `GOMAXPROCS`

✅ 📌 3️⃣ Apa kaitannya dengan `GOMAXPROCS`?