I started investigating this because my go-fuzz process was OOM-killed within 30 minutes two times in a row.
After spending a while puzzling over the heap profile results (before I found the MemProfileRate=0 assignment), I acquired a reasonable profile and a few logs of what happened when the process started using lots of memory.
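For context: runtime.MemProfileRate = 0 disables heap-profile sampling entirely, which is why the initial profiles were useless. A minimal, standalone sketch of the general mechanism, re-enabling sampling before the allocations of interest and then dumping a profile (this is illustrative, not the exact change I made; the helper name and output path are made up), looks something like this:

```go
// Standalone sketch: restore the runtime's default heap sampling rate early,
// then write a heap profile once memory use has grown.
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func init() {
	// 0 disables heap sampling; 512 KiB per sample is the runtime default.
	runtime.MemProfileRate = 512 * 1024
}

func dumpHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	runtime.GC() // bring the profile up to date before writing it
	return pprof.WriteHeapProfile(f)
}

func main() {
	// ... run the workload that grows the heap, then:
	if err := dumpHeapProfile("heap.pprof"); err != nil {
		log.Fatal(err)
	}
}
```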
The process in question does crash a lot (a restart rate of more than 1 in 50 executions), which is obviously a big contributor to the problem, but I believe that go-fuzz should continue without using an arbitrary amount of memory even in that situation.
Here's a screenshot of the heap profile from one such run (unfortunately I lost the profile from that run), where over 2GB of memory is kept around in Worker.crasherQueue:
Although code inspection pointed towards crasherQueue as a possible culprit, I wasn't entirely sure that's what was happening until I reproduced the issue with a log statement added that showed the current size of the queue (including its associated data) whenever the queue slice was grown.
The final line that it printed before I dumped the heap profile was:
crasherQueue 0xc0000ca380 len 37171; space 465993686 (data 430941433; error 26019700; suppression 9032553)
That 466MB was 65% of the total current heap size of 713MB. In previous runs, I observed the total alloc size to rise to more than 8GB, although I wasn't able to obtain a heap profile at that time.
This problem does not always happen! It seems to depend very much on the current workload. It might be a starvation problem, because only one of the worker queues grows in this way.
Here's the whole log printed by that run: https://gist.github.com/rogpeppe/ad97d2c83834c24b0777a4009d71d120
The crasherQueue log lines were produced by a small patch to the Worker.noteCrasher method.
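That diff isn't reproduced here, but the idea is to log the queue length and the total bytes held by queued crashers whenever the slice's backing array has to grow. A rough sketch of that idea (stand-in Worker and NewCrasherArgs types that only mirror the shape of go-fuzz's worker state; this is not the actual patch) is:

```go
// Sketch only, not the actual patch: stand-in types approximating go-fuzz's
// worker state, plus logging that fires whenever crasherQueue grows.
package main

import "log"

type NewCrasherArgs struct {
	Data        []byte
	Error       []byte
	Suppression []byte
	Hanging     bool
}

type Worker struct {
	crasherQueue []NewCrasherArgs
}

func (w *Worker) noteCrasher(data, output, suppression []byte, hanged bool) {
	oldCap := cap(w.crasherQueue)
	w.crasherQueue = append(w.crasherQueue, NewCrasherArgs{
		Data:        data,
		Error:       output,
		Suppression: suppression,
		Hanging:     hanged,
	})
	if cap(w.crasherQueue) == oldCap {
		return // the slice didn't grow its backing array, so stay quiet
	}
	// The queue grew: report its length and how many bytes its entries hold.
	var dataN, errN, suppN int
	for _, c := range w.crasherQueue {
		dataN += len(c.Data)
		errN += len(c.Error)
		suppN += len(c.Suppression)
	}
	log.Printf("crasherQueue %p len %d; space %d (data %d; error %d; suppression %d)",
		w, len(w.crasherQueue), dataN+errN+suppN, dataN, errN, suppN)
}

func main() {
	w := new(Worker)
	for i := 0; i < 1000; i++ {
		w.noteCrasher([]byte("fuzz input"), []byte("panic: boom"), []byte("boom"), false)
	}
}
```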