Support Value Handler in KV separation #199
This is really interesting!
I ran a readrandomwriterandom benchmark with db_bench to gather statistics on blob reads:

```sh
common='--benchmarks=readrandomwriterandom --use_terark_table=false --statistics=true --threads=20 --enable_lazy_compaction=false --num=167772160 --key_size=8 --value_size=1000 --duration=36000 --use_existing_db=true --histogram=true --cache_size=4294967296'
./db_bench $common --readwritepercent=10 --db=/data02/wangyi/11_4
```

After the benchmark had run for one hour, the key statistics were:
- read.blob_invalid counts queries in the readrandomwriterandom workload that fetched a value from a blob file whose file number is no longer the newest.
- invalid_file_cnt counts blob files referenced by the SSTs' key-values that are no longer the newest.
- invalid_entry_cnt counts entries in the SSTs' key-values that are no longer the newest.

So most queries do not need to consult the blob file's index block; the validity test behind these counters is sketched below.
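For reference, here is a minimal sketch of that validity test, with hypothetical types standing in for the real VersionStorageInfo members:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// A stored blob file number is "invalid" when the dependence map resolves
// it to a different (newer) file, i.e. the entry has been rewritten by
// compaction since the reference was written. Types here are illustrative.
struct FileMetaData {
  uint64_t file_number;  // number of the file that currently owns the entries
};

using DependenceMap = std::unordered_map<uint64_t, const FileMetaData*>;

bool IsBlobRefInvalid(const DependenceMap& dependence_map,
                      uint64_t stored_file_number) {
  auto it = dependence_map.find(stored_file_number);
  assert(it != dependence_map.end());  // every reference must resolve
  return it->second->file_number != stored_file_number;
}
```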
From 2022/01/06-20:30:33.826999 to 2022/01/06-21:16:42.632655, invalid_file_cnt decreased from 405 to 99 as the LSM-tree compacted. So the cost of loading blob-file index blocks shrinks in proportion to the invalid_file_cnt ratio.
After the benchmark had run for 10 hours, the engine's blob statistics showed:

- the invalid_file_cnt ratio had risen to 99.26% under the heavy-write workload;
- the invalid_entry_cnt ratio had risen to only 13.59%, which is still considered low.
I fixed the invalid_file_cnt statistic logic like this:

```cpp
void VersionStorageInfo::CalculateBlobInfo() {
  valid_file_cnt_ = 0;
  valid_entry_cnt_ = 0;
  invalid_file_cnt_ = 0;
  invalid_entry_cnt_ = 0;
  // Referenced blob file number -> total entry count across all levels.
  std::unordered_map<uint64_t, uint64_t> file_map;
  std::unordered_set<uint64_t> invalid_file_set;
  for (int i = 0; i < num_levels_; i++) {
    for (auto& f : LevelFiles(i)) {
      for (auto fn : f->prop.dependence) {
        file_map[fn.file_number] += fn.entry_count;
      }
    }
  }
  for (auto& f : file_map) {
    // A reference is stale when the dependence map resolves the stored
    // file number to a different (newer) file.
    auto find = dependence_map_.find(f.first);
    assert(find != dependence_map_.end());
    if (find->second->fd.GetNumber() != f.first) {
      // Deduplicate by the file that currently owns the entries.
      invalid_file_set.insert(find->second->fd.GetNumber());
      invalid_entry_cnt_ += f.second;
    } else {
      valid_file_cnt_++;
      valid_entry_cnt_ += f.second;
    }
  }
  invalid_file_cnt_ = invalid_file_set.size();
}
```

Now invalid_file_cnt reflects the real number of blob files whose index blocks need to be loaded.
So we only need to load about a quarter of the blob index blocks compared with before (99 / 405 ≈ 24%).
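As a side note, the invalid ratios quoted above can be derived from these counters; a hypothetical helper (not part of the patch) could look like:

```cpp
#include <cstdint>

// Illustrative only: derive the invalid file/entry ratios (e.g. the
// 99.26% and 13.59% quoted earlier) from the counters that
// CalculateBlobInfo() maintains.
struct BlobInfoCounters {
  uint64_t valid_file_cnt = 0, invalid_file_cnt = 0;
  uint64_t valid_entry_cnt = 0, invalid_entry_cnt = 0;
};

double InvalidFileRatio(const BlobInfoCounters& c) {
  uint64_t total = c.valid_file_cnt + c.invalid_file_cnt;
  return total == 0 ? 0.0 : static_cast<double>(c.invalid_file_cnt) / total;
}

double InvalidEntryRatio(const BlobInfoCounters& c) {
  uint64_t total = c.valid_entry_cnt + c.invalid_entry_cnt;
  return total == 0 ? 0.0 : static_cast<double>(c.invalid_entry_cnt) / total;
}
```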
I benchmarked BlobDB to observe the per-entry size when building an SST file:

```json
{"file_number": 10, "file_size": 18351744,
 "table_properties":
  {"data_size": 18213147, "index_size": 200391, "raw_key_size": 35792208,
   "raw_average_key_size": 36, "raw_value_size": 8940580,
   "raw_average_value_size": 8, "num_data_blocks": 10266,
   "num_entries": 994228, "compression": "Snappy"}}
```

18351744 / 994228 ≈ 18.458 bytes per key.
We should stay backward compatible with historical data, such as the behavior of decoding value_meta.
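A minimal sketch of what backward-compatible decoding could look like, assuming new records carry a format tag byte that legacy encodings cannot collide with (all names and layouts here are hypothetical, not the actual TerarkDB format):

```cpp
#include <cstdint>
#include <cstring>

struct ValueMeta {
  uint64_t file_number = 0;  // blob file holding the value
  uint64_t handle = 0;       // value handle in the blob file (new format only)
};

enum : uint8_t { kValueHandleTag = 0xFF };  // assumed unused by legacy records

bool DecodeValueMeta(const uint8_t* data, size_t size, ValueMeta* meta) {
  if (data == nullptr || meta == nullptr) return false;
  if (size >= 1 + 16 && data[0] == kValueHandleTag) {
    // New format: tag byte, then file number and value handle.
    std::memcpy(&meta->file_number, data + 1, 8);
    std::memcpy(&meta->handle, data + 9, 8);
    return true;
  }
  if (size >= 8) {
    // Historical format: file number only; the value is still located
    // through the blob file's index block, as before.
    std::memcpy(&meta->file_number, data, 8);
    meta->handle = 0;
    return true;
  }
  return false;
}
```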
[Enhancement]
Problem
If we support the value handler in KV separation, the cost of loading the blob files' index blocks is expected to be reduced.
Solution
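A minimal sketch of the direction discussed in the comments above (hypothetical names, not the actual TerarkDB API): when the SST entry carries a value handle and the blob reference is still valid, a Get reads the value directly and skips the blob file's index block; only stale or legacy references fall back to the index-block lookup.

```cpp
#include <cstdint>
#include <string>

struct ValueHandle {
  uint64_t offset = 0;
  uint64_t size = 0;  // size == 0 marks "no handle" (legacy record)
};

// Stub standing in for real blob-file I/O; names are illustrative.
struct BlobFileReader {
  uint64_t file_number = 0;
  bool ReadAt(const ValueHandle&, std::string* value) {
    value->assign("value-read-directly");  // direct read at (offset, size)
    return true;
  }
  bool LookupViaIndexBlock(const std::string& /*user_key*/,
                           std::string* value) {
    value->assign("value-via-index");      // legacy index-block lookup
    return true;
  }
};

bool GetBlobValue(BlobFileReader* file, uint64_t stored_file_number,
                  const ValueHandle& handle, const std::string& user_key,
                  std::string* value) {
  if (file->file_number == stored_file_number && handle.size != 0) {
    // Valid reference with a handle: skip the index block entirely.
    return file->ReadAt(handle, value);
  }
  // The blob file was rewritten by compaction, or the record predates
  // value handles: resolve through the index block as before.
  return file->LookupViaIndexBlock(user_key, value);
}
```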