how do I interpret the benchmarks in this crate? #115
-
It would be nice to mention benchmark results in the README. I've run all the benches, but looking at them didn't give me any understanding of how fast this library is. I look at results, like:
And it blows my mind🤯. I've looked at the benchmarks, and it looks like they are all about measuring this library itself, perhaps to compare with a previous build. Are all these benches internal? I am interested to know how it compares to Rust's String::contains, String::replace, if we are using Perhaps it's two different issues:
Basically, when is it beneficial to use this lib, and at what number of sub-strings?
-
Yes, the benchmarks are internal. Publishing real benchmarks The Right Way is a ton of work. I'm about to do it for regex engines and it has taken me literal months of work. I have no plans of doing that for this crate.
Someone else can't answer this for you. You should define a benchmark that models your workload. Then you should run it and make a decision based on the results.
In a default configuration, this crate will use This crate is used by the If you give it a "small" number of patterns and you're running on But I have to stress: the only way to answer your question is to define a benchmark that is meaningful to your use case.
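As a starting point for such a benchmark, here is a minimal std-only sketch of a timing harness. The haystack, the pattern list, and the helper names `any_contains` and `time_it` are all hypothetical placeholders, not part of this crate or of std. It only times the naive `str::contains` baseline; the searcher you want to compare against would be built once outside the closure and timed the same way.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Naive multi-pattern baseline: one `str::contains` call per pattern.
fn any_contains(haystack: &str, patterns: &[&str]) -> bool {
    patterns.iter().any(|p| haystack.contains(p))
}

/// Runs `f` `iters` times and returns the total elapsed wall-clock time.
/// `black_box` keeps the optimizer from discarding the result.
fn time_it<F: FnMut() -> bool>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed()
}

fn main() {
    // Hypothetical workload: replace with your real haystack and patterns.
    let haystack = "lorem ipsum dolor sit amet ".repeat(10_000) + "needle";
    let patterns = ["thread", "haystack", "needle"];

    let elapsed = time_it(100, || {
        any_contains(black_box(haystack.as_str()), black_box(&patterns[..]))
    });
    println!("str::contains over {} patterns: {:?}", patterns.len(), elapsed);

    // To compare, construct the alternative searcher once, outside the
    // closure, and pass its search call to `time_it` in the same way.
}
```

A hand-rolled loop like this is only good for rough comparisons. For numbers you intend to act on, a dedicated benchmarking harness with warm-up and statistical reporting is the safer choice.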
-
Oh, wow. I thought the std version was highly SIMD optimized. Do I understand correctly: when a single pattern is provided, your library uses a different implementation than when there are many patterns, so instead of generating an automaton it is using
Oh I see. That's really cool. Thank you. I'll try doing some benchmarks myself.
-
Yeah, it's not. We would probably like it to be, but substring search is in
For a single pattern, yes. You should be able to confirm it by looking at a profile. If it isn't happening then it's probably a bug and I'd happily look into it if you can provide a reproduction. And note that as I said above, even if you provide multiple patterns, an Aho-Corasick automaton still might not be used at all. It just depends. A lot of it is heuristics at this point. (Note that an actual automaton will always be generated IIRC. It's just not always used for searches.)
Aye. Good. Let me know how it goes!
-
By the way, in std I thought that the ones that search for the index of an occurrence has as well, but it turns out that no, it doesn't. https://github.com/rust-lang/rust/blob/84dd17b56a931a631a23dfd5ef2018fd3ef49108/library/core/src/str/pattern.rs#L1062
-
Ah yeah, forgive the senility. That ends up working because SSE2 isn't an ISA extension, it's part of
-
Hmmmm, that's right. On every 64-bit Intel system, SSE2 would be present. Strange that it checks both conditions: x86_64 AND the SSE2 feature.
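For illustration, here is a minimal sketch of what checking both conditions can look like in Rust. The function names are hypothetical and this is not the code from std; it only shows that the architecture and the target feature are separate `cfg` predicates, and that a runtime query exists alongside the compile-time one.

```rust
/// True only when this build targets x86_64 *and* SSE2 is enabled
/// at compile time. These are two independent `cfg` predicates.
fn sse2_enabled_at_compile_time() -> bool {
    cfg!(all(target_arch = "x86_64", target_feature = "sse2"))
}

/// Runtime query of the CPU, available on x86 and x86_64 targets.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn sse2_detected_at_runtime() -> bool {
    std::is_x86_feature_detected!("sse2")
}

/// On every other architecture there is nothing to detect.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn sse2_detected_at_runtime() -> bool {
    false
}

fn main() {
    println!("compile time: {}", sse2_enabled_at_compile_time());
    println!("runtime:      {}", sse2_detected_at_runtime());
}
```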
-
It might be because, technically, the OS can choose not to support SSE2 by not supporting calling conventions that use
-
Why not share your code and inputs so that others can run it? And yes, there is often a latency versus throughput trade-off. See: https://docs.rs/memchr/latest/memchr/#why-use-this-crate
-
An expectation that one implementation is always faster in every case is unrealistic. This is why I said you should benchmark your specific use case. Then you can make a judgement call about whether the differences in, say, latency merit a more complicated implementation on your end. Or whether just using aho-corasick is good enough.
-
it's really dirty now :D