Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syz-cluster/controller: TestProcessor times out #5776

Open
a-nogikh opened this issue Feb 13, 2025 · 4 comments
Open

syz-cluster/controller: TestProcessor times out #5776

a-nogikh opened this issue Feb 13, 2025 · 4 comments
Assignees
Labels

Comments

@a-nogikh
Copy link
Collaborator

This will likely be gone after the refactoring I'm going to push for review soon, but let's keep it in the issues for now.

https://github.com/google/syzkaller/actions/runs/13296004987/job/37128005699

2025-02-12T22:35:19.1020779Z panic: test timed out after 10m0s
2025-02-12T22:35:19.1021197Z 	running tests:
2025-02-12T22:35:19.1021543Z 		TestProcessor (9m59s)
2025-02-12T22:35:19.1021778Z 
2025-02-12T22:35:19.1021939Z goroutine 6743 [running]:
2025-02-12T22:35:19.1022336Z testing.(*M).startAlarm.func1()
2025-02-12T22:35:19.1023133Z 	/syzkaller/.cache/gomod/golang.org/[email protected]/src/testing/testing.go:2373 +0x385
2025-02-12T22:35:19.1023954Z created by time.goFunc
2025-02-12T22:35:19.1024656Z 	/syzkaller/.cache/gomod/golang.org/[email protected]/src/time/sleep.go:215 +0x2d
2025-02-12T22:35:19.1025307Z 
< ... >
2025-02-12T22:35:19.1035592Z main.main()
2025-02-12T22:35:19.1036027Z 	_testmain.go:63 +0x9b
2025-02-12T22:35:19.1036294Z 
2025-02-12T22:35:19.1036527Z goroutine 6 [select]:
2025-02-12T22:35:19.1037232Z go.opencensus.io/stats/view.(*worker).start(0xc000356580)
2025-02-12T22:35:19.1038156Z 	/syzkaller/.cache/gomod/[email protected]/stats/view/worker.go:292 +0x9f
2025-02-12T22:35:19.1039084Z created by go.opencensus.io/stats/view.init.0 in goroutine 1
2025-02-12T22:35:19.1039917Z 	/syzkaller/.cache/gomod/[email protected]/stats/view/worker.go:34 +0x8d
2025-02-12T22:35:19.1040470Z 
2025-02-12T22:35:19.1040631Z goroutine 517 [chan send, 9 minutes]:
2025-02-12T22:35:19.1041566Z github.com/google/syzkaller/syz-cluster/controller.TestProcessor(0xc000bf2b60)
2025-02-12T22:35:19.1042756Z 	/__w/syzkaller/syzkaller/gopath/src/github.com/google/syzkaller/syz-cluster/controller/processor_test.go:73 +0x647
2025-02-12T22:35:19.1043708Z testing.tRunner(0xc000bf2b60, 0x23e2268)
2025-02-12T22:35:19.1044541Z 	/syzkaller/.cache/gomod/golang.org/[email protected]/src/testing/testing.go:1690 +0xf4
2025-02-12T22:35:19.1045384Z created by testing.(*T).Run in goroutine 1
2025-02-12T22:35:19.1046256Z 	/syzkaller/.cache/gomod/golang.org/[email protected]/src/testing/testing.go:1743 +0x390
@a-nogikh a-nogikh added the bug label Feb 13, 2025
@a-nogikh a-nogikh self-assigned this Feb 13, 2025
@a-nogikh
Copy link
Collaborator Author

No, it didn't go away: https://github.com/google/syzkaller/actions/runs/13396267430/job/37415981700?pr=5800

2025-02-18T17:27:17.1756018Z panic: test timed out after 10m0s
2025-02-18T17:27:17.1756481Z 	running tests:
2025-02-18T17:27:17.1756820Z 		TestProcessor (10m0s)
2025-02-18T17:27:17.1757071Z 
< ... >

2025-02-18T17:27:17.1777910Z goroutine 24 [chan send, 9 minutes]:
2025-02-18T17:27:17.1778684Z github.com/google/syzkaller/syz-cluster/controller.TestProcessor(0xc00019b520)
2025-02-18T17:27:17.1779857Z 	/__w/syzkaller/syzkaller/gopath/src/github.com/google/syzkaller/syz-cluster/controller/processor_test.go:74 +0x618
2025-02-18T17:27:17.1780803Z testing.tRunner(0xc00019b520, 0x2506058)
2025-02-18T17:27:17.1781598Z 	/syzkaller/.cache/gomod/golang.org/[email protected]/src/testing/testing.go:1690 +0xf4
2025-02-18T17:27:17.1782444Z created by testing.(*T).Run in goroutine 1
2025-02-18T17:27:17.1783291Z 	/syzkaller/.cache/gomod/golang.org/[email protected]/src/testing/testing.go:1743 +0x390
2025-02-18T17:27:17.1784345Z 
< .. >

2025-02-18T17:27:17.2128068Z goroutine 311 [semacquire, 9 minutes]:
2025-02-18T17:27:17.2128510Z sync.runtime_Semacquire(0x447c9f?)
2025-02-18T17:27:17.2129271Z 	/syzkaller/.cache/gomod/golang.org/[email protected]/src/runtime/sema.go:71 +0x25
2025-02-18T17:27:17.2130069Z sync.(*WaitGroup).Wait(0x1da17d5?)
2025-02-18T17:27:17.2130839Z 	/syzkaller/.cache/gomod/golang.org/[email protected]/src/sync/waitgroup.go:118 +0x48
2025-02-18T17:27:17.2132083Z github.com/google/syzkaller/syz-cluster/controller.(*SeriesProcessor).Loop(0xc000982780, {0x2738038, 0xc00095adc0})
2025-02-18T17:27:17.2133703Z 	/__w/syzkaller/syzkaller/gopath/src/github.com/google/syzkaller/syz-cluster/controller/processor.go:81 +0x397
2025-02-18T17:27:17.2135022Z github.com/google/syzkaller/syz-cluster/controller.TestProcessor.func2()
2025-02-18T17:27:17.2136145Z 	/__w/syzkaller/syzkaller/gopath/src/github.com/google/syzkaller/syz-cluster/controller/processor_test.go:63 +0x28
2025-02-18T17:27:17.2137356Z created by github.com/google/syzkaller/syz-cluster/controller.TestProcessor in goroutine 24
2025-02-18T17:27:17.2138712Z 	/__w/syzkaller/syzkaller/gopath/src/github.com/google/syzkaller/syz-cluster/controller/processor_test.go:62 +0x59b
< ... >


2025-02-18T17:27:17.2294695Z goroutine 121 [select]:
2025-02-18T17:27:17.2295640Z github.com/google/syzkaller/syz-cluster/controller.(*SeriesProcessor).streamSeries(0xc000982780, {0x2738038, 0xc00095adc0}, 0xc000ac93b0)
2025-02-18T17:27:17.2297115Z 	/__w/syzkaller/syzkaller/gopath/src/github.com/google/syzkaller/syz-cluster/controller/processor.go:87 +0xe5
2025-02-18T17:27:17.2298491Z github.com/google/syzkaller/syz-cluster/controller.(*SeriesProcessor).Loop.func2()
2025-02-18T17:27:17.2299616Z 	/__w/syzkaller/syzkaller/gopath/src/github.com/google/syzkaller/syz-cluster/controller/processor.go:78 +0x66
2025-02-18T17:27:17.2301045Z created by github.com/google/syzkaller/syz-cluster/controller.(*SeriesProcessor).Loop in goroutine 311
2025-02-18T17:27:17.2302504Z 	/__w/syzkaller/syzkaller/gopath/src/github.com/google/syzkaller/syz-cluster/controller/processor.go:76 +0x375

< .. >


2025-02-18T17:27:17.5168876Z make[1]: *** [Makefile:394: test] Error 1
2025-02-18T17:27:17.5172290Z make: *** [Makefile:323: presubmit_build] Error 2
2025-02-18T17:27:17.5177228Z github.com/google/syzkaller/syz-cluster/controller.(*SeriesProcessor).seriesRunner(0xc000982780, {0x2738038, 0xc00095adc0}, 0xc000ac93b0)
2025-02-18T17:27:17.5179737Z 	/__w/syzkaller/syzkaller/gopath/src/github.com/google/syzkaller/syz-cluster/controller/processor.go:116 +0x167
2025-02-18T17:27:17.5181107Z github.com/google/syzkaller/syz-cluster/controller.(*SeriesProcessor).Loop.func1()
2025-02-18T17:27:17.5182243Z 	/__w/syzkaller/syzkaller/gopath/src/github.com/google/syzkaller/syz-cluster/controller/processor.go:63 +0x5d
2025-02-18T17:27:17.5183615Z created by github.com/google/syzkaller/syz-cluster/controller.(*SeriesProcessor).Loop in goroutine 311
2025-02-18T17:27:17.5184906Z 	/__w/syzkaller/syzkaller/gopath/src/github.com/google/syzkaller/syz-cluster/controller/processor.go:61 +0x15d
2025-02-18T17:27:17.5185949Z FAIL	github.com/google/syzkaller/syz-cluster/controller	600.075s

@a-nogikh
Copy link
Collaborator Author

The real problem is this line of the log

2025-02-18T17:27:17.1725312Z 2025/02/18 17:17:17 failed to mark session started: spanner: code = "Canceled", desc = "context canceled, transaction outcome unknown"

And this is weird -- if the context were indeed cancelled, we would not have Loop() and streamSeries() in the dump of the running goroutines -- we do monitor for ctx.Done() in them.

Is it something inside the Spanner library?

Full log:

2025-02-18T17:27:17.1711693Z 2025/02/18 17:17:17 queried 0 unfinished sessions
2025-02-18T17:27:17.1713221Z 2025/02/18 17:17:17 scheduled session "642e5bf5-4669-4a36-9f1c-c92916dc737d" for series "8d09acad-db85-4b25-be47-a9dd54e5f5c4"
2025-02-18T17:27:17.1714575Z 2025/02/18 17:17:17 started processing session "642e5bf5-4669-4a36-9f1c-c92916dc737d"
2025-02-18T17:27:17.1716203Z 2025/02/18 17:17:17 finished processing session "642e5bf5-4669-4a36-9f1c-c92916dc737d"
2025-02-18T17:27:17.1718398Z 2025/02/18 17:17:17 scheduled session "0f526300-0cdd-46e9-bfd7-8426f9ce3023" for series "6f4ac69b-3f75-4881-af2d-9c901dd54fa6"
2025-02-18T17:27:17.1720380Z 2025/02/18 17:17:17 started processing session "0f526300-0cdd-46e9-bfd7-8426f9ce3023"
2025-02-18T17:27:17.1721404Z 2025/02/18 17:17:17 finished processing session "0f526300-0cdd-46e9-bfd7-8426f9ce3023"
2025-02-18T17:27:17.1722822Z 2025/02/18 17:17:17 scheduled session "a4b0a1cd-0c78-4528-9b08-17cbfaf296f5" for series "6dad8188-76fe-43cc-a262-9f0bb12df892"
2025-02-18T17:27:17.1724210Z 2025/02/18 17:17:17 started processing session "a4b0a1cd-0c78-4528-9b08-17cbfaf296f5"
2025-02-18T17:27:17.1725312Z 2025/02/18 17:17:17 failed to mark session started: spanner: code = "Canceled", desc = "context canceled, transaction outcome unknown"
2025-02-18T17:27:17.1726619Z 2025/02/18 17:17:17 finished processing session "a4b0a1cd-0c78-4528-9b08-17cbfaf296f5"
2025-02-18T17:27:17.1727975Z 2025/02/18 17:17:17 queried 0 unfinished sessions
2025-02-18T17:27:17.1729000Z 2025/02/18 17:17:17 scheduled session "6c31b260-e70d-4c89-8327-82e1b351ee64" for series "57186afa-c8e3-49c6-a69d-81d3211cbaf2"
2025-02-18T17:27:17.1730193Z 2025/02/18 17:17:17 started processing session "6c31b260-e70d-4c89-8327-82e1b351ee64"
2025-02-18T17:27:17.1731664Z 2025/02/18 17:17:18 finished processing session "6c31b260-e70d-4c89-8327-82e1b351ee64"
2025-02-18T17:27:17.1733036Z 2025/02/18 17:17:18 scheduled session "cac63593-91bc-4eac-816b-630377fbbd7e" for series "a96b5df5-cae1-463a-8ade-e07113b06728"
2025-02-18T17:27:17.1734171Z 2025/02/18 17:17:18 started processing session "cac63593-91bc-4eac-816b-630377fbbd7e"
2025-02-18T17:27:17.1736309Z 2025/02/18 17:17:18 finished processing session "cac63593-91bc-4eac-816b-630377fbbd7e"
2025-02-18T17:27:17.1738935Z 2025/02/18 17:17:18 scheduled session "7ddcc19c-0479-45ad-a4d3-0a2a97b222d1" for series "60399446-30c2-4f10-8745-5df8c3074e4d"
2025-02-18T17:27:17.1740001Z 2025/02/18 17:17:18 started processing session "7ddcc19c-0479-45ad-a4d3-0a2a97b222d1"
2025-02-18T17:27:17.1740858Z 2025/02/18 17:17:18 finished processing session "7ddcc19c-0479-45ad-a4d3-0a2a97b222d1"
2025-02-18T17:27:17.1742230Z 2025/02/18 17:17:18 scheduled session "ff20b562-ee14-4328-8f0c-9517be745679" for series "e3f57ed3-05a2-49eb-9c15-ed5c923beb1c"
2025-02-18T17:27:17.1743285Z 2025/02/18 17:17:18 started processing session "ff20b562-ee14-4328-8f0c-9517be745679"
2025-02-18T17:27:17.1744196Z 2025/02/18 17:17:18 finished processing session "ff20b562-ee14-4328-8f0c-9517be745679"
2025-02-18T17:27:17.1745237Z 2025/02/18 17:17:18 scheduled session "4b298876-aebb-4860-acc9-ff88433264a8" for series "030a8f3e-4b54-4688-b8eb-94e74b3454b2"
2025-02-18T17:27:17.1746328Z 2025/02/18 17:17:18 started processing session "4b298876-aebb-4860-acc9-ff88433264a8"
2025-02-18T17:27:17.1747179Z 2025/02/18 17:17:18 finished processing session "4b298876-aebb-4860-acc9-ff88433264a8"
2025-02-18T17:27:17.1748671Z 2025/02/18 17:17:18 scheduled session "c3a851df-b71c-442e-bfb8-bfbfddb06508" for series "16f3bfe1-5014-4294-8e83-6fb24cc0db41"
2025-02-18T17:27:17.1750662Z 2025/02/18 17:17:18 started processing session "c3a851df-b71c-442e-bfb8-bfbfddb06508"
2025-02-18T17:27:17.1751694Z 2025/02/18 17:17:18 finished processing session "c3a851df-b71c-442e-bfb8-bfbfddb06508"
2025-02-18T17:27:17.1752800Z 2025/02/18 17:17:18 scheduled session "29ba6e25-53ce-4062-967b-22fea2b76120" for series "b7e65ed1-f5f4-4074-a2d6-8723702ba50a"
2025-02-18T17:27:17.1753961Z 2025/02/18 17:17:18 started processing session "29ba6e25-53ce-4062-967b-22fea2b76120"
2025-02-18T17:27:17.1754880Z 2025/02/18 17:17:18 finished processing session "29ba6e25-53ce-4062-967b-22fea2b76120"

@a-nogikh
Copy link
Collaborator Author

Ah, okay, that's likely an artifact of the previous context cancellation within TestProcessor.

Still, if we failed to start session, it must be among next Loop's sessionRepo.ListWaiting results.

if err := sp.sessionRepo.Start(ctx, session.ID); err != nil {
app.Errorf("failed to mark session started: %v", err)
break
}

If the session was actually marked as started, it would be among the ListRunning results. But it isn't:

2025-02-18T17:27:17.1727975Z 2025/02/18 17:17:17 queried 0 unfinished sessions

Was it already marked finished somehow?

a-nogikh added a commit to a-nogikh/syzkaller that referenced this issue Feb 19, 2025
That should hopefully shed more light on google#5776.
a-nogikh added a commit to a-nogikh/syzkaller that referenced this issue Feb 19, 2025
That should hopefully shed more light on google#5776.
github-merge-queue bot pushed a commit that referenced this issue Feb 19, 2025
That should hopefully shed more light on #5776.
@a-nogikh
Copy link
Collaborator Author

With more logs: https://github.com/google/syzkaller/actions/runs/13440550510/job/37553644015

2025-02-20T17:11:34.1744623Z 2025/02/20 17:01:34 queried 0 unfinished sessions
2025-02-20T17:11:34.1746098Z 2025/02/20 17:01:34 scheduled session "216cc8e9-fa32-49bb-9e5e-c6ceef157d11" for series "b901b15d-4078-4509-b03a-94d075309d3d"
2025-02-20T17:11:34.1747182Z 2025/02/20 17:01:34 started processing session "216cc8e9-fa32-49bb-9e5e-c6ceef157d11"
2025-02-20T17:11:34.1748431Z 2025/02/20 17:01:34 scheduling a workflow for "216cc8e9-fa32-49bb-9e5e-c6ceef157d11"
2025-02-20T17:11:34.1749597Z 2025/02/20 17:01:34 workflow for "216cc8e9-fa32-49bb-9e5e-c6ceef157d11" completed, mark the session finished
2025-02-20T17:11:34.1750871Z 2025/02/20 17:01:34 finished processing session "216cc8e9-fa32-49bb-9e5e-c6ceef157d11"
2025-02-20T17:11:34.1752094Z 2025/02/20 17:01:34 scheduled session "f7cfbcd7-0418-4857-87f2-d7c386e2b8d8" for series "7215a2f2-6ee4-49f7-b139-ce4296f11fba"
2025-02-20T17:11:34.1754011Z 2025/02/20 17:01:34 started processing session "f7cfbcd7-0418-4857-87f2-d7c386e2b8d8"
2025-02-20T17:11:34.1755069Z 2025/02/20 17:01:34 scheduling a workflow for "f7cfbcd7-0418-4857-87f2-d7c386e2b8d8"
2025-02-20T17:11:34.1756046Z 2025/02/20 17:01:34 workflow for "f7cfbcd7-0418-4857-87f2-d7c386e2b8d8" completed, mark the session finished
2025-02-20T17:11:34.1756978Z 2025/02/20 17:01:34 finished processing session "f7cfbcd7-0418-4857-87f2-d7c386e2b8d8"
2025-02-20T17:11:34.1758008Z 2025/02/20 17:01:34 scheduled session "3be7fe2b-3301-4cbf-b275-0da22701aeaf" for series "11528906-79c2-4fd6-887a-250fef76391d"
2025-02-20T17:11:34.1759035Z 2025/02/20 17:01:34 started processing session "3be7fe2b-3301-4cbf-b275-0da22701aeaf"
2025-02-20T17:11:34.1760070Z 2025/02/20 17:01:34 scheduling a workflow for "3be7fe2b-3301-4cbf-b275-0da22701aeaf"
2025-02-20T17:11:34.1761097Z 2025/02/20 17:01:34 failed to mark session started: spanner: code = "Canceled", desc = "context canceled, transaction outcome unknown"
2025-02-20T17:11:34.1762327Z 2025/02/20 17:01:34 finished processing session "3be7fe2b-3301-4cbf-b275-0da22701aeaf"
2025-02-20T17:11:34.1763458Z 2025/02/20 17:01:34 queried 0 unfinished sessions
2

The answer to

Was it already marked finished somehow?

is no.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant