Storage engine benchmarks #3
Comments
> Guarantees

+1 to documenting what guarantees are provided. My favorite one is: what are the durability guarantees for writes? In particular, we should ensure an apples-to-apples comparison with regard to when data is fsynced to disk. Some engines fsync before the write is acknowledged; some do it periodically.

Factors to Consider for Benchmarks
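One such factor, made concrete: a minimal Go sketch of the two fsync policies mentioned above. The helper names are hypothetical and it assumes a plain append-only file rather than any particular engine's WAL.

```go
package durability

import (
	"log"
	"os"
	"time"
)

// writeDurable fsyncs before returning: a successful return means
// the record is on disk (fsync-before-ack).
func writeDurable(f *os.File, rec []byte) error {
	if _, err := f.Write(rec); err != nil {
		return err
	}
	return f.Sync()
}

// writeLazy returns once the write reaches the OS page cache;
// durability then depends on a periodic sync like the one below.
func writeLazy(f *os.File, rec []byte) error {
	_, err := f.Write(rec)
	return err
}

// startPeriodicSync fsyncs every interval; writes acknowledged since
// the last tick can be lost on a crash or power failure.
func startPeriodicSync(f *os.File, every time.Duration) {
	go func() {
		for range time.Tick(every) {
			if err := f.Sync(); err != nil {
				log.Println("sync failed:", err)
			}
		}
	}()
}
```

An engine in the second mode will usually post much higher write throughput, which is exactly why the two shouldn't be compared head to head.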
I'd love to see how the SQLite B-tree that FDB is currently using stacks up against them as well.
@spullara Link? Is there a Go wrapper for it currently?
The current selection of engines looks good; all of them claim to have ACID txn support. I assume this will be a requirement for any other tested engines as well.

@danchia Nice list.
I've posted the hardware specs of the benchmark server. @hyc I think most benchmarks should be durable, because that's the mode most people run these engines in. Let's start identifying some benchmark variations. How about we begin with 4 variations and expand from there by adding more over time?
I'm thinking a common key type that databases use is the UUID. Some people use optimized 12-byte UUIDs, but probably the most common are 16-byte UUIDs. Some people use sortable UUIDs, some don't. I was thinking of using these Lua scripts to create them: https://gist.github.com/catwell/1e022833ae849180adf58d72245ce8e0
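For illustration, here is a rough Go sketch of what a sortable 16-byte key could look like (my own example, not the gist above): a big-endian timestamp prefix plus random bytes, so byte-wise order roughly tracks creation order.

```go
package keys

import (
	"crypto/rand"
	"encoding/binary"
	"time"
)

// NewSortableID returns a 16-byte ID: an 8-byte big-endian nanosecond
// timestamp followed by 8 random bytes, so lexicographic byte order
// approximates creation order.
func NewSortableID() [16]byte {
	var id [16]byte
	binary.BigEndian.PutUint64(id[:8], uint64(time.Now().UnixNano()))
	if _, err := rand.Read(id[8:]); err != nil {
		panic(err) // crypto/rand is not expected to fail
	}
	return id
}
```

Sortable keys matter for the benchmark design: random 16-byte UUIDs stress an engine's random-insert path, while time-prefixed ones look much more like sequential appends.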
@danchia Yeah I plan to have 2 different tools for benchmarks.
A typical time series workload (at least in my experience) has lots of frequent random writes and occasional large scan reads. The records would be like (series ID, timestamp) => (value), or with the key order flipped, (timestamp, series ID) => (value). The first version is optimized for reads of a few series over some time range; the second is better for reads of lots/all series over some time range. The first also has more random writes. Might be good to try both; see the sketch below.
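A minimal Go sketch of the two key layouts (an illustrative encoding, not a proposal for the harness itself):

```go
package tskeys

import "encoding/binary"

// SeriesMajorKey encodes (series ID, timestamp): points for one series
// are adjacent, so scanning a few series over a time range is cheap,
// but concurrent series scatter writes across the keyspace.
func SeriesMajorKey(series, ts uint64) []byte {
	k := make([]byte, 16)
	binary.BigEndian.PutUint64(k[:8], series)
	binary.BigEndian.PutUint64(k[8:], ts)
	return k
}

// TimeMajorKey encodes (timestamp, series ID): a time range across
// all series is one contiguous scan, and writes are mostly appends.
func TimeMajorKey(ts, series uint64) []byte {
	k := make([]byte, 16)
	binary.BigEndian.PutUint64(k[:8], ts)
	binary.BigEndian.PutUint64(k[8:], series)
	return k
}
```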
Oh, and it would be great to see how throughput and latency change with various numbers of concurrent readers and writers.
Great idea. Any news?
We should create some independent benchmarks for storage engines. We can get input from the various storage engine authors to make sure we are creating apples-to-apples comparisons and using the engines with the right settings, while we make sure nothing unfair happens.
We can compare Go native and CGO based storage engines.
Setup
Requirements
Storage engines
Others?
Architecture
I propose an architecture where each storage engine is wrapped by a standard HTTP API implemented in Go. This keeps the workload client code in one standard place and lets us use excellent HTTP-based reporting tools.
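To make that concrete, here is a minimal sketch of what the per-engine shim might look like. The `Store` interface and the `/kv/` route are placeholders I made up; the actual API surface would be agreed on in this issue.

```go
package main

import (
	"io"
	"net/http"
)

// Store is the minimal surface each engine would implement
// (hypothetical; real benchmarks would also need scans, txns, etc.).
type Store interface {
	Get(key []byte) ([]byte, error)
	Set(key, value []byte) error
}

// handler exposes a Store over HTTP: GET /kv/<key> reads, PUT writes.
func handler(s Store) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/kv/", func(w http.ResponseWriter, r *http.Request) {
		key := []byte(r.URL.Path[len("/kv/"):])
		switch r.Method {
		case http.MethodGet:
			v, err := s.Get(key)
			if err != nil {
				http.Error(w, err.Error(), http.StatusNotFound)
				return
			}
			w.Write(v)
		case http.MethodPut:
			v, err := io.ReadAll(r.Body)
			if err != nil {
				http.Error(w, err.Error(), http.StatusBadRequest)
				return
			}
			if err := s.Set(key, v); err != nil {
				http.Error(w, err.Error(), http.StatusInternalServerError)
				return
			}
		default:
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		}
	})
	return mux
}
```

Each engine would link this shim against its own `Store` implementation and serve it with `http.ListenAndServe`, so the client side never changes between engines.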
The HTTP benchmark tool will be implemented using wrk and wrk2 and their Lua scripting capabilities. I have a minimal HTTP server that can currently handle 15M ops/sec at 3% CPU utilization on the server, handling 1.5 GB/sec of requests.
Included in the benchmarks will be 2 baseline results so that we can measure the HTTP overhead and the cgo overhead.
Concerns
People may argue that C libraries like LMDB and RocksDB are at a disadvantage due to the CGO overhead, but I suspect that in long-term tests this won't be as visible. A lot of infrastructure is being written in Go, and we want to benchmark how these engines behave there.
By benchmarking baselines of HTTP and HTTP+CGO we can visualize or subtract the overhead in the results.
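As a sketch of how the cgo baseline could be measured: cgo is not allowed in `_test.go` files, so this times a no-op C call by hand instead of using `testing.B`. The numbers and the `noop` function are illustrative only.

```go
package main

/*
static void noop(void) {}
*/
import "C"

import (
	"fmt"
	"time"
)

func main() {
	const n = 10_000_000
	start := time.Now()
	for i := 0; i < n; i++ {
		C.noop() // each call pays the Go<->C boundary cost
	}
	perCall := time.Since(start) / time.Duration(n)
	fmt.Printf("cgo no-op overhead: ~%v per call\n", perCall)
}
```

Running the same loop against a pure-Go no-op gives the per-call cgo cost to subtract from the CGO-based engines' results.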
Proposed benchmark variations
For each benchmark variation we come up with, there will be a short and a long (72 hours?) version.
TODO: Define variations with the community.