how do I interpret the benchmarks in this crate? #115

Mart-Bogdan · 2023-04-29T20:40:16Z

Mart-Bogdan
Apr 29, 2023

It would be nice to mention benchmark results in reamed.

I've ran all benches, but looking on them didn't give me any understanding how fast this library is.

I look at results, like:

same/packed/teddy/threebytes/nomatch                                                    
same/packed/rabinkarp/threebytes/nomatch

And it blows my mind🤯.

I've looked at becnhmarks, looks like it all about measuring this library, perhaps to compare with previous build.

All this benches are internal?

I am interested to know how it compares to rust's String::contains, String::replace, if we are using
aho-corasick with single needle (I think would be slower), two needles, N needles, etc.

Perhaps it's two different issues:

Readme for ./benches
Add benchmark to compare with standard single-string operations from std

Basically when it is beneficial to use this lib, at what number of sub-strings?

Answered by BurntSushi

Apr 29, 2023

Yes, the benchmarks are internal. Publishing real benchmarks The Right Way is a ton of work. I'm about to do it for regex engines and it has taken me literal months of work. I have no plans of doing that for this crate.

Basically when it is beneficial to use this lib, at what number of sub-strings?

Someone else can't answer this for you. You should define a benchmark that models your work load. Then you should run them and make a decision based on that.

I am interested to know how it compares to rust's String::contains, String::replace, if we are using
aho-corasick with single needle (I think would be slower), two needles, N needles, etc.

In a default configuration, this crate will use …

View full answer

BurntSushi · 2023-04-29T21:00:05Z

BurntSushi
Apr 29, 2023
Maintainer

Yes, the benchmarks are internal. Publishing real benchmarks The Right Way is a ton of work. I'm about to do it for regex engines and it has taken me literal months of work. I have no plans of doing that for this crate.

Basically when it is beneficial to use this lib, at what number of sub-strings?

Someone else can't answer this for you. You should define a benchmark that models your work load. Then you should run them and make a decision based on that.

I am interested to know how it compares to rust's String::contains, String::replace, if we are using
aho-corasick with single needle (I think would be slower), two needles, N needles, etc.

In a default configuration, this crate will use memmem from the memchr crate if you only hand it a single pattern. That routine will blow the pants off of std's substring search. So that means that, even for a single pattern, you can use this crate just fine and it should be faster than what you can do with std.

This crate is used by the regex crate for literal optimizations. It tries to do the best it can with what you give it.

If you give it a "small" number of patterns and you're running on x86_64, then it will automatically try to use a vectorized multi-substring matcher that doesn't use the Aho-Corasick algorithm at all. It can be an order of magnitude faster than Aho-Corasick.

But I have to stress: the only way to answer your question is to define a benchmark that is meaningful to your use case.

0 replies

Mart-Bogdan · 2023-04-30T17:18:48Z

Mart-Bogdan
Apr 30, 2023
Author

In a default configuration, this crate will use memmem from the memchr crate if you only hand it a single pattern. That routine will blow the pants off of std's substring search. So that means that, even for a single pattern, you can use this crate just fine and it should be faster than what you can do with std.

Oh, wow. I thought STD version is highly SIMD optimized.

Am I understand correctly -- your library, in case if single pattern is povided is suing different implementation, than if we have many items, so instead of automation generation it is using memmem?

If you give it a "small" number of patterns and you're running on x86_64, then it will automatically try to use a vectorized multi-substring matcher that doesn't use the Aho-Corasick algorithm at all.

Oh I see. That's really cool. Thank you.

I'll try doing some benchmarks myself.

0 replies

BurntSushi · 2023-04-30T17:25:21Z

BurntSushi
Apr 30, 2023
Maintainer

Oh, wow. I thought STD version is highly SIMD optimized.

Yeah, it's not. We would probably like it to be, but substring search is in core and at least right now, we don't have a great way of using ISA extensions in core. I'm not sure anyone has any great ideas on how to make it work either. It's pretty unfortunate.

Am I understand correctly -- your library, in case if single pattern is povided is suing different implementation, than if we have many items, so instead of automation generation it is using memmem?

For a single pattern, yes. You should be able to confirm it by looking at a profile. If it isn't happening then it's probably a bug and I'd happily look into it if you can provide a reproduction.

And note that as I said above, even if you provide multiple patterns, an Aho-Corasick automaton still might not be used at all. It just depends. A lot of it is heuristics at this point.

(Note that an actual automaton will always be generated IIRC. It's just not always used for searches.)

I'll try doing some benchmarks myself.

Aye. Good. Let me know how it goes!

0 replies

Mart-Bogdan · 2023-04-30T19:36:17Z

Mart-Bogdan
Apr 30, 2023
Author

By the way in STD contains has SIMD optimisation, but only on level of sse2

https://github.com/rust-lang/rust/blob/a0daf22b95ff1cd3f7ac55ea9370987535f3134d/library/core/src/str/pattern.rs#L963-L968

I thought that ones to search index of occurrence has as well, but turns out that no, it doesn't. https://github.com/rust-lang/rust/blob/84dd17b56a931a631a23dfd5ef2018fd3ef49108/library/core/src/str/pattern.rs#L1062

0 replies

BurntSushi · 2023-04-30T20:55:53Z

BurntSushi
Apr 30, 2023
Maintainer

Ah yeah, forgive the senility. That ends up working because SSE2 isn't an ISA extension, it's part of x86_64.

0 replies

Mart-Bogdan · 2023-05-01T12:53:27Z

Mart-Bogdan
May 1, 2023
Author

Hmmmm, that's right. On each 64bit intel systems SSE2 would be present.

Strange that it checks both conditions: x86_64 AND SSE2 feature.

0 replies

BurntSushi · 2023-05-01T13:03:49Z

BurntSushi
May 1, 2023
Maintainer

Hmmmm, that's right. On each 64bit intel systems SSE2 would be present.

Strange that it checks both conditions: x86_64 AND SSE2 feature.

It might be because, technically, the OS can choose not to support SSE2 by not supporting calling conventions that use xmm registers. But that's just a guess. I don't know for sure.

0 replies

Mart-Bogdan · 2023-05-02T09:52:36Z

Mart-Bogdan
May 2, 2023
Author

Benchmarks are hard :D

I've made some draft benchmarks, but results are confusing.

But preliminary results are:

str::contains is faster on small haystack on x86

str::matches is slower, good to use aho-corasick even for single needle
str::replace is also slower even for single needle.

I think I'll try creating more organized results, more likely closer to end of a week.

P.S. criterion is weird, as it shows results with diferent scale, mixing ns and micro-s in same output. That caused me to belive that creating new instance of AchoCrasick inside loop is 2 times faster, but it was actually 500 times slower

0 replies

BurntSushi · 2023-05-02T10:45:24Z

BurntSushi
May 2, 2023
Maintainer

Why not share your code and inputs so that others can run it?

And yes, there is often a latency versus throughput trade off. See: https://docs.rs/memchr/latest/memchr/#why-use-this-crate

0 replies

BurntSushi · 2023-05-02T10:47:41Z

BurntSushi
May 2, 2023
Maintainer

An expectation that one implementation is always faster in every case is unrealistic. This is why I said you should vernal your specific use case. Then you can make a judgement call about whether the differences in, say, latency merit a more complicated implementation on your end. Or whether just using aho-corasick is good enough.

0 replies

Mart-Bogdan · 2023-05-02T11:02:25Z

Mart-Bogdan
May 2, 2023
Author

Why not share your code and inputs so that others can run it?

it's really dirty now :D

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

how do I interpret the benchmarks in this crate? #115

{{title}}

Replies: 11 comments

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

how do I interpret the benchmarks in this crate? #115

Mart-Bogdan Apr 29, 2023

Replies: 11 comments

BurntSushi Apr 29, 2023 Maintainer

Mart-Bogdan Apr 30, 2023 Author

BurntSushi Apr 30, 2023 Maintainer

Mart-Bogdan Apr 30, 2023 Author

BurntSushi Apr 30, 2023 Maintainer

Mart-Bogdan May 1, 2023 Author

BurntSushi May 1, 2023 Maintainer

Mart-Bogdan May 2, 2023 Author

BurntSushi May 2, 2023 Maintainer

BurntSushi May 2, 2023 Maintainer

Mart-Bogdan May 2, 2023 Author

Mart-Bogdan
Apr 29, 2023

BurntSushi
Apr 29, 2023
Maintainer

Mart-Bogdan
Apr 30, 2023
Author

BurntSushi
Apr 30, 2023
Maintainer

Mart-Bogdan
Apr 30, 2023
Author

BurntSushi
Apr 30, 2023
Maintainer

Mart-Bogdan
May 1, 2023
Author

BurntSushi
May 1, 2023
Maintainer

Mart-Bogdan
May 2, 2023
Author

BurntSushi
May 2, 2023
Maintainer

BurntSushi
May 2, 2023
Maintainer

Mart-Bogdan
May 2, 2023
Author