Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cockroachdb: add benchmarks #1

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

cockroachdb: add benchmarks #1

wants to merge 1 commit into from

Conversation

DarrylWong
Copy link
Owner

This change adds a benchmark that spins up a local CRDB cluster and runs the kv workload against it.

return gitShallowClone(
gcfg.SrcDir,
"https://github.com/cockroachdb/cockroach",
"release-24.1",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe master is better, otherwise someone has to periodically update the next release branch?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but leaving it as master would mean that it might occasionally break outside of this benchmark's control. I see that the etcd picks a specific release and leaves a comment to try and keep it close to HEAD.

I changed it to v24.1.0-rc.1 instead for now, and maybe by the time this benchmark is ready we can switch to an actual dot release? But maybe final say is up to the Go team anyway, so I'm okay with whatever for now.

go.mod Outdated
@@ -14,6 +14,7 @@ require (
github.com/google/pprof v0.0.0-20211122183932-1daafda22083
github.com/kballard/go-shellquote v0.0.0-20180428030007-95032a82bc51
github.com/opencontainers/runtime-spec v1.0.2
github.com/pkg/errors v0.9.1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using anything specifically from this package or can we just use "errors"?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I was using it for errors.Wrap but doesn't seem worth importing it for that. Removed.


type CockroachDB struct{}

func (h CockroachDB) CheckPrerequisites() error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] I assume these methods are part of some interface, can we add comments mentioning that?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


// Run dev doctor which will set up the environment for
// the dev tool to work properly.
cmd := exec.Command("./dev", "doctor")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be fragile, since doctor is trying to set up many things, including a local background cache. Can't we just runt the bazel command directly (I think bazel run //pkg/gen:code)?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was originally having some trouble with this, but thanks to Ricky we figured it out 😄 Now using bazel directly.

return err
}
defer bazelrc.Close()
setting := []byte("build --config=dev")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we should be using dev; it looks like it configures --strip=never, which seems ok?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, without crossbuild we may run into problems on non-standard flavors of linux (or cpu architecture).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if your first comment is still relevant after removing all dev stuff. As for your second comment, I added a if runtime.GOARCH != "arm64" && runtime.GOARCH != "amd64" check in the prerequisites.

func (h CockroachDB) Build(cfg *common.Config, bcfg *common.BuildConfig) error {
// Build the cockroach binary.
// We do this by using the cockroach `dev` tool. The dev tool is a bazel
// wrapper normally used for building cockroach, but can also be to

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "...but can also be used to generate artifacts for building with go build."

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

cmd := exec.Command("./dev", "doctor")
cmd.Dir = bcfg.SrcDir
cmd.Env = env.Collapse()
log.TraceCommand(cmd, false)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we redirect stdout/stderr since the command may take a while and mislead the user, thinking that it's "hanging"?

cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done for the bazel run commands.

// as the c-deps needed.
cmd = exec.Command("./dev", "gen", "go")
cmd.Dir = bcfg.SrcDir
cmd.Env = env.Collapse()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we redirect stdout/stderr since the command may take a while and mislead the user, thinking that it's "hanging"?

cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done for the bazel run commands.

return err
}

// Clean up the bazel workspace. If we don't do this, our _bazel directory

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in defer, otherwise return on L112 escapes.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


const (
// Arbitrarily chosen to match the cockroachdb default.
basePort = 26257

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this thing going to run in a container primarily? Otherwise, we may need to ensure this port is not taken.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The etcd benchmark also hardcodes a port, so I think it should be okay?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I saw that. Let's assume it's ok unless we're told otherwise :)

const (
// Arbitrarily chosen to match the cockroachdb default.
basePort = 26257
// The percentage of memory to allocate to the cache.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "...to allocate to the pebble cache."

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

for i := 0; i < cfg.bench.nodeCount; i++ {
instances = append(instances, &cockroachdbInstance{
name: fmt.Sprintf("roach-node-%d", i+1),
sqlPort: basePort + 2*i,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above... let's double-check if [basePort, basePort+nodeCount*2) is available.

// The --ramp flag will take care of startup noise.
func waitForCluster(instances []*cockroachdbInstance, cfg *config) error {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: don't really need this defer since cancel doesn't escape.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure I follow 😓 Without the defer, wouldn't it just cancel the context right away? Or do you mean it'd be better to call cancel() in each exit path?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it just cancel the context right away

cancel is just a function pointer, so no cancellation is done until it's invoked, and the only place it's invoked is when a node successfully responds to a ping. But, you're absolutely right that we do need the defer. Without it, the "instance" goroutines will be leaked. That is, in the case when none of the nodes respond in time,
we read from time.After(time.Minute) and return an error, leaving the "instance" goroutines running.

}

// Clean up the cluster after we're done.
defer func() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Methinks this defer should be moved up, otherwise return on L479 escapes; i.e., an error in waitForCluster would skip cleanup.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, done.

// send it again will cause it to hang.
inst := instances[0]
if r := inst.shutdown(); r != nil {
if err == nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err is provably nil here, otherwise above return would have escaped.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's wrapped in a defer call, so it should run even if the above return exits right? (err is the the named return value of the Run() function)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's wrapped in a defer call

Right. I guess I missed that since defer was in the wrong place :)


// Start the instances with `cockroach start`.
for n, inst := range instances {
allOtherInstances := append([]*cockroachdbInstance{}, instances[:n]...)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: allOtherInstances := append(instances[:n:n], instances[n+1:]...) would be simpler.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about the a[low : high : max] expression! Done!

// The node will almost certainly not be ready right away, so wait
// 5 seconds first and between pings. 5 seconds was chosen through
// trial and error as a time that nodes are *usually* ready by.
time.Sleep(5 * time.Second)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: instead of the time.Sleep in a default case, a case <-time.After(5 * time.Second) would allow this loop to react more quickly to context cancellation.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Thanks for the tip 😄

"--ramp=15s",
"--duration=1m",
},
// Chosen through trial and error as the shortest time that doesn't

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The results of short runs may be more stable if we pre-split and scatter the kv table before the test run.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! That might've helped? At the very least it didn't seem to make it worse so I'll keep it in there.

Question on the value to set --splits to though. I set it to 5 because that's how many ranges there were at the end without pre-splitting. Is that reasonable? Seems low to me but it shouldn't be much data?

return gitDeepClone(
gcfg.SrcDir,
"https://github.com/cockroachdb/cockroach",
"v24.1.0-rc.1",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the version upgrade story with sweet? Will we periodically update this version and check whether any benchmark results change?

This change adds a benchmark that spins up a local CRDB cluster and runs the kv workload against it.
@srosenberg
Copy link

This is looking really nice! I'll take another pass by EOW.

Copy link

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think it's ready for a CL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants