-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cockroachdb: add benchmarks #1
base: master
Are you sure you want to change the base?
Conversation
sweet/harnesses/cockroachdb.go
Outdated
return gitShallowClone( | ||
gcfg.SrcDir, | ||
"https://github.com/cockroachdb/cockroach", | ||
"release-24.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe master
is better, otherwise someone has to periodically update the next release branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but leaving it as master would mean that it might occasionally break outside of this benchmark's control. I see that the etcd
picks a specific release and leaves a comment to try and keep it close to HEAD.
I changed it to v24.1.0-rc.1
instead for now, and maybe by the time this benchmark is ready we can switch to an actual dot release? But maybe final say is up to the Go team anyway, so I'm okay with whatever for now.
go.mod
Outdated
@@ -14,6 +14,7 @@ require ( | |||
github.com/google/pprof v0.0.0-20211122183932-1daafda22083 | |||
github.com/kballard/go-shellquote v0.0.0-20180428030007-95032a82bc51 | |||
github.com/opencontainers/runtime-spec v1.0.2 | |||
github.com/pkg/errors v0.9.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using anything specifically from this package or can we just use "errors"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, I was using it for errors.Wrap
but doesn't seem worth importing it for that. Removed.
|
||
type CockroachDB struct{} | ||
|
||
func (h CockroachDB) CheckPrerequisites() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] I assume these methods are part of some interface, can we add comments mentioning that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
sweet/harnesses/cockroachdb.go
Outdated
|
||
// Run dev doctor which will set up the environment for | ||
// the dev tool to work properly. | ||
cmd := exec.Command("./dev", "doctor") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be fragile, since doctor
is trying to set up many things, including a local background cache. Can't we just runt the bazel command directly (I think bazel run //pkg/gen:code
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was originally having some trouble with this, but thanks to Ricky we figured it out 😄 Now using bazel
directly.
sweet/harnesses/cockroachdb.go
Outdated
return err | ||
} | ||
defer bazelrc.Close() | ||
setting := []byte("build --config=dev") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we should be using dev
; it looks like it configures --strip=never
, which seems ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, without crossbuild
we may run into problems on non-standard flavors of linux (or cpu architecture).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if your first comment is still relevant after removing all dev
stuff. As for your second comment, I added a if runtime.GOARCH != "arm64" && runtime.GOARCH != "amd64"
check in the prerequisites.
sweet/harnesses/cockroachdb.go
Outdated
func (h CockroachDB) Build(cfg *common.Config, bcfg *common.BuildConfig) error { | ||
// Build the cockroach binary. | ||
// We do this by using the cockroach `dev` tool. The dev tool is a bazel | ||
// wrapper normally used for building cockroach, but can also be to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "...but can also be used to generate artifacts for building with go build
."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
sweet/harnesses/cockroachdb.go
Outdated
cmd := exec.Command("./dev", "doctor") | ||
cmd.Dir = bcfg.SrcDir | ||
cmd.Env = env.Collapse() | ||
log.TraceCommand(cmd, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we redirect stdout/stderr
since the command may take a while and mislead the user, thinking that it's "hanging"?
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done for the bazel run commands.
// as the c-deps needed. | ||
cmd = exec.Command("./dev", "gen", "go") | ||
cmd.Dir = bcfg.SrcDir | ||
cmd.Env = env.Collapse() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we redirect stdout/stderr
since the command may take a while and mislead the user, thinking that it's "hanging"?
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done for the bazel run commands.
sweet/harnesses/cockroachdb.go
Outdated
return err | ||
} | ||
|
||
// Clean up the bazel workspace. If we don't do this, our _bazel directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be in defer
, otherwise return on L112 escapes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
const ( | ||
// Arbitrarily chosen to match the cockroachdb default. | ||
basePort = 26257 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this thing going to run in a container primarily? Otherwise, we may need to ensure this port is not taken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The etcd benchmark also hardcodes a port, so I think it should be okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I saw that. Let's assume it's ok unless we're told otherwise :)
sweet/benchmarks/cockroachdb/main.go
Outdated
const ( | ||
// Arbitrarily chosen to match the cockroachdb default. | ||
basePort = 26257 | ||
// The percentage of memory to allocate to the cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "...to allocate to the pebble cache."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
for i := 0; i < cfg.bench.nodeCount; i++ { | ||
instances = append(instances, &cockroachdbInstance{ | ||
name: fmt.Sprintf("roach-node-%d", i+1), | ||
sqlPort: basePort + 2*i, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above... let's double-check if [basePort, basePort+nodeCount*2) is available.
// The --ramp flag will take care of startup noise. | ||
func waitForCluster(instances []*cockroachdbInstance, cfg *config) error { | ||
ctx, cancel := context.WithCancel(context.Background()) | ||
defer cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: don't really need this defer
since cancel
doesn't escape.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure I follow 😓 Without the defer
, wouldn't it just cancel the context right away? Or do you mean it'd be better to call cancel()
in each exit path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it just cancel the context right away
cancel
is just a function pointer, so no cancellation is done until it's invoked, and the only place it's invoked is when a node successfully responds to a ping. But, you're absolutely right that we do need the defer
. Without it, the "instance" goroutines will be leaked. That is, in the case when none of the nodes respond in time,
we read from time.After(time.Minute)
and return an error, leaving the "instance" goroutines running.
} | ||
|
||
// Clean up the cluster after we're done. | ||
defer func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Methinks this defer
should be moved up, otherwise return on L479 escapes; i.e., an error in waitForCluster
would skip cleanup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, done.
// send it again will cause it to hang. | ||
inst := instances[0] | ||
if r := inst.shutdown(); r != nil { | ||
if err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err
is provably nil
here, otherwise above return
would have escaped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's wrapped in a defer
call, so it should run even if the above return exits right? (err is the the named return value of the Run()
function)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's wrapped in a defer call
Right. I guess I missed that since defer
was in the wrong place :)
sweet/benchmarks/cockroachdb/main.go
Outdated
|
||
// Start the instances with `cockroach start`. | ||
for n, inst := range instances { | ||
allOtherInstances := append([]*cockroachdbInstance{}, instances[:n]...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: allOtherInstances := append(instances[:n:n], instances[n+1:]...)
would be simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL about the a[low : high : max]
expression! Done!
sweet/benchmarks/cockroachdb/main.go
Outdated
// The node will almost certainly not be ready right away, so wait | ||
// 5 seconds first and between pings. 5 seconds was chosen through | ||
// trial and error as a time that nodes are *usually* ready by. | ||
time.Sleep(5 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: instead of the time.Sleep
in a default
case, a case <-time.After(5 * time.Second)
would allow this loop to react more quickly to context cancellation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Thanks for the tip 😄
"--ramp=15s", | ||
"--duration=1m", | ||
}, | ||
// Chosen through trial and error as the shortest time that doesn't |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The results of short runs may be more stable if we pre-split and scatter the kv table before the test run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! That might've helped? At the very least it didn't seem to make it worse so I'll keep it in there.
Question on the value to set --splits
to though. I set it to 5
because that's how many ranges there were at the end without pre-splitting. Is that reasonable? Seems low to me but it shouldn't be much data?
return gitDeepClone( | ||
gcfg.SrcDir, | ||
"https://github.com/cockroachdb/cockroach", | ||
"v24.1.0-rc.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the version upgrade story with sweet
? Will we periodically update this version and check whether any benchmark results change?
This change adds a benchmark that spins up a local CRDB cluster and runs the kv workload against it.
This is looking really nice! I'll take another pass by EOW. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I think it's ready for a CL.
This change adds a benchmark that spins up a local CRDB cluster and runs the kv workload against it.