-
Notifications
You must be signed in to change notification settings - Fork 648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrating fast_float with Valkey to replace strtod invocation #1260
base: unstable
Are you sure you want to change the base?
Conversation
I suppose I don't really understand the benefit of a submodule vs inlining. Since we aren't tightly controlling the versioning between the main repo and releases, it becomes much harder to know if there is a security issue impacting a specific release. At the very least we should be pinning a specific version of fast_float that we are pulling. |
We track fast_float commit id in the valkey repository with this change - see the copy pasted change from my commit below. For every valkey commit, we can look up exact version of fast_float at all times using this method. Pulling new fast_float version is as simple as git pull on the submodule. Submodule fast_float added at e800ca |
Does it get pulled automatically as part of the release into the release artifacts? |
There is a "git submodule update --init" command in Makefile to initialize it automatically. So yes, It will automatically checkout the same commit every time during build. |
So we are adding a new dependency to the release process, since you need to be able to fetch the code from github. I think we should consider figuring out how to pull the code in when we do a release so that folks don't need to do |
The other approach would be to just check-in the whole git repo as a folder under valkey, which should work. What is the issue with git dependency in github release workflows? I can put a post-checkout hook that always initializes modules on checkout, as long as checkout happens outside of the release process. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #1260 +/- ##
============================================
- Coverage 70.70% 70.68% -0.02%
============================================
Files 114 114
Lines 63152 63165 +13
============================================
- Hits 44650 44647 -3
- Misses 18502 18518 +16
|
I am dropping the submodule idea for now as it requires a larger discussion about release process. I don't have access to @swaingotnochill's repository or PR. Therefore, I pulled his commit into this CR to maintain his author-ship on the code he wrote. I integrated it with Redis and fixed the Makefile. It should build now and will be ready to push. I will work on benchmarking this separately. I can also turn off fast_float by default if folks have concerns and test it on my branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to modify. I am on a vacation so it will be difficult for me to work until I am back. Cheers
@zuiderkwast or @madolson any pointers on how to solve almalinux issue with missing g++?
|
I'm skeptical to submodules too. Offline builds get complicated. Source releases get complicated. So far, we've used either vendored dependencies or system installed ones (like OpenSSL). I prefer that we vendor this one. We could copy only one or a few files, as we've done with crc64 and some other small libraries? |
…or replacing strtod Signed-off-by: Parth Patel <[email protected]>
* Simplified the interface to remove if branches. * Simplified Makefile to be more readable. * Integrating fast_float with the redis code base in resp_parser.c file. Signed-off-by: Parth Patel <[email protected]>
Signed-off-by: Parth Patel <[email protected]>
… issues. Signed-off-by: Parth Patel <[email protected]>
…d packages. Signed-off-by: Parth Patel <[email protected]>
Signed-off-by: Parth Patel <[email protected]>
…fy it explicitly in Makefile to fix 32-bit compilation issues. Signed-off-by: Parth Patel <[email protected]>
Signed-off-by: Parth Patel <[email protected]>
…i386 compilation Signed-off-by: Parth Patel <[email protected]>
…compilations Signed-off-by: Parth Patel <[email protected]>
Does this make sense? I can help with Makefile if you need. |
Signed-off-by: Parth Patel <[email protected]>
Signed-off-by: Parth Patel <[email protected]>
We will do it with opt-in, and then at a later point decide if we want to make it opt-out. At least for now we are okay with taking the C++ dependency. @soloestoy @PingXie Please comment if you don't want to take the C++ dependency. |
|
I wouldn't want an extra function call if it affects performance. Performance is the point of this PR after all. :) This is statically linked and we use link-time-optimization by default (-flto and -O3), which can inline across compilation units, so I guess it can be inlined? Another option is to make a
We discussed this in a core team meeting today and made a decision to use OFF by default, so that's already decided by a majority of the core team. Off by default is because we'll release it in a minor versions. (Later, we might decide change the default when we bumb the major version, like Valkey 9.0.) |
Signed-off-by: Parth Patel <[email protected]>
Evaluating this further, I can't make fast_float_strtod a wrapper coz then it will require C++ compilation always. if we create another wrapper on top of fast_float_strtod, that's just too many wrappers. I recommend making fast_float_strtod the default wrapper once USE_FAST_FLOAT is defaulted to yes and c++ is required for compilation always. I am keeping existing code and enabling -flto, if compiler sees it fit then it will optimize the code to be inline.
On the topic of moving the fast_float_strtod.cpp to src/, the src/Makefile has become a monolith. From my experience so far with this Makefile, I am now strongly against this. We should move towards smaller build units. If the intention is to separate fast_float library from integration code, then we can create two separate folders under deps - one for library and one for integration with its Makefiles. We can also move integration code folder outside of deps, if it does not fit there. But let's not merge it into src/Makefile.
that said, this can still happen if we can inline string2d. I am still inclined to just keep the code as it is without relying too much on compile inlining things. |
Signed-off-by: Parth Patel <[email protected]>
deps/fast_float/Makefile
Outdated
# | ||
# Copyright (c) 2024, Valkey contributors | ||
# All rights reserved. | ||
# | ||
# Redistribution and use in source and binary forms, with or without | ||
# modification, are permitted provided that the following conditions are met: | ||
# | ||
# * Redistributions of source code must retain the above copyright notice, | ||
# this list of conditions and the following disclaimer. | ||
# * Redistributions in binary form must reproduce the above copyright | ||
# notice, this list of conditions and the following disclaimer in the | ||
# documentation and/or other materials provided with the distribution. | ||
# * Neither the name of Redis nor the names of its contributors may be used | ||
# to endorse or promote products derived from this software without | ||
# specific prior written permission. | ||
# | ||
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" | ||
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | ||
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE | ||
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR | ||
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF | ||
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN | ||
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) | ||
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE | ||
# POSSIBILITY OF SUCH DAMAGE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You only need the copyright information in the actual code files, so you can drop it here and in the header.
You won't be able to see the performance of a single branchless jump instruction since it will be prefetched into the instruction cache. |
src/util.c
Outdated
#ifdef USE_FAST_FLOAT | ||
#include "../deps/fast_float/fast_float_strtod.h" | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#ifdef USE_FAST_FLOAT | |
#include "../deps/fast_float/fast_float_strtod.h" | |
#endif | |
#ifdef USE_FAST_FLOAT | |
#include "../deps/fast_float/fast_float_strtod.h" | |
static inline double valkey_strtod(const char *str, char **endptr) { | |
errno = 0; | |
return fast_float_strtod(str, endptr); | |
} | |
#else | |
static inline double valkey_strtod(const char *str, char **endptr) { | |
return strtod(str, endptr); | |
} | |
#endif |
This cuts down the usage of ifdefs to a single location in this file. This can even be just in the header if we really want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorporated this into the new structure. Changed the interface though.
const char* fast_float_strtod(const char *str, double *value) | ||
{ | ||
double temp = 0; | ||
auto answer = fast_float::from_chars(str, str + strlen(str), temp); | ||
*value = temp; | ||
if (answer.ec != std::errc()) { | ||
errno = (answer.ec == std::errc::result_out_of_range) ? ERANGE : EINVAL; | ||
} | ||
return answer.ptr; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const char* fast_float_strtod(const char *str, double *value) | |
{ | |
double temp = 0; | |
auto answer = fast_float::from_chars(str, str + strlen(str), temp); | |
*value = temp; | |
if (answer.ec != std::errc()) { | |
errno = (answer.ec == std::errc::result_out_of_range) ? ERANGE : EINVAL; | |
} | |
return answer.ptr; | |
} | |
double fast_float_strtod(const char *str, const char *endptr) | |
{ | |
double temp = 0; | |
auto answer = fast_float::from_chars(str, str + strlen(str), temp); | |
*endptr = answer.ptr; | |
if (answer.ec != std::errc()) { | |
errno = (answer.ec == std::errc::result_out_of_range) ? ERANGE : EINVAL; | |
} | |
return temp; | |
} |
Is there a reason we aren't emulating strtod here? It seems odd we are just returning the parameters slightly differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one less if/else, as endptr can be null. This approach removes that if branch. It also removes the requirement to pass NULL since endptr is optional parameter.
deps/fast_float/test_fast_float.c
Outdated
*/ | ||
|
||
|
||
#include "fast_float_strtod.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test should go in src/unit, where the other unit tests are going.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/unit is broken - #1290
deps/fast_float/Makefile
Outdated
rm -f *.o || true; | ||
rm test || true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we always succeed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No point failing clean especially if the build had failed previously and "test" file was never generated.
deps/fast_float/Makefile
Outdated
|
||
# We don't need explicit target for fast_float_strtod.o as Makefile will use | ||
# CXXFLAGS and CXX variables to auto-generate it for us. | ||
all: fast_float_strtod.o |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to build an archive here instead of pulling the C++ toolchain into the main makefile. You can see deps/hdr_histogram as an example. That will produce a bunch of symbols that we can that compile against. As we talked about, linker time optimizations still can happen with an archive so it won't affect performance but it'll keep the C++ code more self-contained here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the new structure. fast_float integration is now moved under src/fast_float. Main makefile remains unpolluted except it will need libc++ to compile against even for gcc, because the c++ compiled library uses it.
deps/fast_float/fast_float_strtod.h
Outdated
/** | ||
* @brief Converts a null-terminated byte string to a double using the fast_float library. | ||
* | ||
* This function provides a C-compatible wrapper around the fast_float library's string-to-double | ||
* conversion functionality. It aims to offer a faster alternative to the standard strtod function. | ||
* | ||
* @param nptr A pointer to the null-terminated byte string to be converted. | ||
* @param value A pointer to the double variable where the function stores converted double value. | ||
* On success, the function stores the converted double value. On failure, it stores | ||
* 0.0 and stores error code in errno to ERANGE or EINVAL. | ||
* | ||
* @return On success, returns char pointer pointing to '\0' at the end of the string. | ||
* On failure, returns char pointer pointing to first invalid character in the string. | ||
* | ||
* @note This function uses the fast_float library (https://github.com/fastfloat/fast_float) | ||
* for the actual conversion, which can be significantly faster than standard library functions. | ||
* | ||
* @see https://github.com/fastfloat/fast_float for more information on the underlying library. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** | |
* @brief Converts a null-terminated byte string to a double using the fast_float library. | |
* | |
* This function provides a C-compatible wrapper around the fast_float library's string-to-double | |
* conversion functionality. It aims to offer a faster alternative to the standard strtod function. | |
* | |
* @param nptr A pointer to the null-terminated byte string to be converted. | |
* @param value A pointer to the double variable where the function stores converted double value. | |
* On success, the function stores the converted double value. On failure, it stores | |
* 0.0 and stores error code in errno to ERANGE or EINVAL. | |
* | |
* @return On success, returns char pointer pointing to '\0' at the end of the string. | |
* On failure, returns char pointer pointing to first invalid character in the string. | |
* | |
* @note This function uses the fast_float library (https://github.com/fastfloat/fast_float) | |
* for the actual conversion, which can be significantly faster than standard library functions. | |
* | |
* @see https://github.com/fastfloat/fast_float for more information on the underlying library. | |
*/ | |
/** | |
* Converts a null-terminated byte string to a double using the fast_float library. | |
* | |
* This function provides a C-compatible wrapper around the fast_float library's string-to-double | |
* conversion functionality. It aims to offer a faster alternative to the standard strtod function. | |
* | |
* nptr A pointer to the null-terminated byte string to be converted. | |
* value A pointer to the double variable where the function stores converted double value. | |
* On success, the function stores the converted double value. On failure, it stores | |
* 0.0 and stores error code in errno to ERANGE or EINVAL. | |
* | |
* On success, returns char pointer pointing to '\0' at the end of the string. | |
* On failure, returns char pointer pointing to first invalid character in the string. | |
* | |
* This function uses the fast_float library (https://github.com/fastfloat/fast_float) | |
* for the actual conversion, which can be significantly faster than standard library functions. | |
*/ |
We don't use doxygen, so I think the tags don't make a lot of sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still makes the documentation readable. I generally prefer it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't introduce new style in this 15 years old code base please. Consistency is more important.
Signed-off-by: Parth Patel <[email protected]>
…static inline function as wrapper. Signed-off-by: Parth Patel <[email protected]>
Signed-off-by: Parth Patel <[email protected]>
Signed-off-by: Parth Patel <[email protected]>
… and without fast_float on 32-bit as well as 64-bit systems. Signed-off-by: Parth Patel <[email protected]>
Signed-off-by: Parth Patel <[email protected]>
This build should succeed now. Here is a summary of changes from today:
I verified that this compiles for Mac and Rhel for both USE_FAST_FLOAT=yes and USE_FAST_FLOAT=no. Used "make test" to also include tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress.
On the topic of moving the fast_float_strtod.cpp to src/, the src/Makefile has become a monolith. From my experience so far with this Makefile, I am now strongly against this. We should move towards smaller build units.
This is another topic. We appear to be moving towards CMake in the long term. Let's not "solve" this now. Keep the monolith and flat directory structure.
The Makefile monolith provides a common ground for optimization flags, and debug flags and other stuff. I'm posting make code in the comments below that you can use. (Personally, I don't think this file is messy at all, given all it does. I guess you need to understand make well to like it.)
Unit test still resides in this same directory because src/unit build is broken.
What do you mean broken? They're fairly new and they work well in the CI. Please explain what problems you encountered so we can help you.
# * Redistributions in binary form must reproduce the above copyright | ||
# notice, this list of conditions and the following disclaimer in the | ||
# documentation and/or other materials provided with the distribution. | ||
# * Neither the name of Redis nor the names of its contributors may be used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not Redis. For new files, we don't use this license text with "Redis" in the 3rd clause.
Legally, we don't need the full text in each file. Let's use this very short version:
# Copyright (c) 2024, Valkey contributors
# All rights reserved.
# SPDX-License-Identifier: BSD 3-Clause
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't think we should add it to makefiles, I think just the source files is enough.
fast_float/fast_float_strtod.o: | ||
cd fast_float && $(MAKE) CFLAGS=" $(CFLAGS) " LDFLAGS=" $(LDFLAGS) " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default make target is the first target in the file. Here, you're adding this target above the all
target, so when I type make
, it only builds this file.
$ make
CC Makefile.dep
cd fast_float && make CFLAGS=" -funwind-tables " LDFLAGS=" "
make[1]: Entering directory '/home/viktor/repos/valkey/src/fast_float'
g++ -std=c++11 -O3 -fPIC -D FASTFLOAT_ALLOWS_LEADING_PLUS -funwind-tables -c fast_float_strtod.cpp -o fast_float_strtod.o
make[1]: Leaving directory '/home/viktor/repos/valkey/src/fast_float'
This needs to be somewhere below all:
.
I tried changing this to get rid of the separate Makefile under src/fast_float/ and to use the same optimization and debug flags as for the rest of Valkey, because it can be configured for debugging, etc.
I just added this code below the rules for %.o:
and unit/%.o:
ifeq ($(USE_FAST_FLOAT),yes)
CXX ?= c++
CXXFLAGS += -Wall -W $(OPT) $(DEBUG)
fast_float/fast_float_strtod.o: fast_float/fast_float_strtod.cpp ../deps/fast_float/fast_float.h
$(QUIET_CXX)$(CXX) $(CXXFLAGS) $(CFLAGS) $(SERVER_CFLAGS) -D FASTFLOAT_ALLOWS_LEADING_PLUS -c $< -o $@
endif
The QUIET_CXX
variable is just for nice output, because we have something similar for CC. The definition next to QUIET_CC:
ifndef V
QUIET_CC = @printf ' %b %b\n' $(CCCOLOR)CC$(ENDCOLOR) $(SRCCOLOR)$@$(ENDCOLOR) 1>&2;
+QUIET_CXX = @printf ' %b %b\n' $(CCCOLOR)C++$(ENDCOLOR) $(SRCCOLOR)$@$(ENDCOLOR) 1>&2;
QUIET_GEN = @printf ' %b %b\n' $(CCCOLOR)GEN$(ENDCOLOR) $(SRCCOLOR)$@$(ENDCOLOR) 1>&2;
QUIET_LINK = @printf ' %b %b\n' $(LINKCOLOR)LINK$(ENDCOLOR) $(BINCOLOR)$@$(ENDCOLOR) 1>&2;
With these changes, I don't see the need for the separate sub directory src/fast_float/
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default make target is the first target in the file. Here, you're adding this target above the all target, so when I type make, it only builds this file.
Good catch. I was too busy testing "make test" across all platforms to focus on normal make. Fixed.
Working on the Makefile changes, but I am against a single monolith Makefile. Experienced valkey developers may be able to make out what's happening there but new developers will have significant friction. If everyone else is in agreement, I will push that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call this file src/valkey_strtod.h
, named after the function it provides, rather than what it uses behind the scenes. Fast float is just one of the two implementations it can use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, strtod has the signature double strtod(const char *str, char **endptr);
, I think we should ideally conform so that it's more normal to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, strtod has the signature double strtod(const char *str, char **endptr);, I think we should ideally conform so that it's more normal to follow.
I changed the signature to remove some if checks. endptr can be null. Some of the invocations pass null and others don't. I think this is a minor change.
Let's call this file src/valkey_strtod.h,
Done.
Co-authored-by: Viktor Söderqvist <[email protected]> Signed-off-by: Parth <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]> Signed-off-by: Parth <[email protected]>
This is an alternative to this PR request by @swaingotnochill : #1170
This allows fast_float to remain a submodule that we can pull from for updates.
---- Update after the discussion
I am dropping the submodule idea for now as it requires a larger discussion about release process. I don't have access to @swaingotnochill's repository or PR. Therefore, I pulled his commit into this CR to maintain his author-ship on the code he wrote. I integrated it with Redis and fixed the Makefile. It should build now and will be ready to push.
I will work on benchmarking this separately. I can also turn off fast_float by default if folks have concerns and test it on my branch.