optimize vector component access and use GLM_ASSERT_LENGTH in dual_quaternion #1308

Tr1NgleDev · 2024-07-15T20:13:12Z

the title says it all
the explanation for the optimization:

accessing using a pointer is faster than a switch/case (which is also clearly visible if you ever try to RE some code that uses that operator[])
it's a single line
this was technically already done in type_quat so i don't understand why it wasn't done in type_vecs

also dual_quaternion operator[] for some reason didn't use the GLM_ASSERT_LENGTH macro, so i fixed that.

Mashpoe · 2024-07-15T20:42:31Z

I'm the developer of the hit game 4D Miner (which uses GLM). The vector operator[] calls are getting inlined as switch statements, which aren't getting optimized. There is a lot of code in the game that deals with component indices, so there will probably be a considerable speedup for certain functions if this is merged, and some functions will actually be inlined just because of the changes in this pr.

ZXShady · 2024-08-07T20:15:23Z

but this techincally has UB though because the standard allows padding of members of same data type although no implementation does

Mashpoe · 2024-08-07T20:44:06Z

but this techincally has UB though because the standard allows padding of members of same data type although no implementation does

I would argue that the consistency and performance improvements far outweigh the risks of UB here then. If this can't be merged, then this optimization should probably be removed from type_quat. I was originally against this for the same reason, but I was unable to find a better solution, and then I learned that the same thing has been done elsewhere in the project for at least 8 years without causing any issues.

If any major compiler ever breaks this, preprocessor checks can be added in the future for that compiler version, or this optimization can be removed altogether. For now at least, considering the large performance gains which have already been demonstrated for a real-world use case, and that this optimization has already been used elsewhere in the project for years, I see no reason this shouldn't be merged.

ZXShady · 2024-08-08T00:01:00Z

we already have UB because of union type punning but all compilers define it like C behavior and all compiler defines your issue's behavior and even if not we can add a fallback so this is worth it to implement, I agree with you .

sad that speezing out performance is technically "ub"

ZXShady · 2024-08-13T15:13:59Z

to note the thing we lose here is constexpr evaluation for any index but 0. but I would say the performance benefits is worth the change

Mashpoe · 2024-08-13T18:45:29Z

I would agree. From my experience, the vast majority of use cases for component indexing are when you have to determine which component you are accessing at runtime. Constexpr component indexing seems like a very niche use case that could be done manually with a switch statement if it is ever needed.

ZXShady · 2024-09-19T12:34:43Z

@Mashpoe this actually has a huge impact on other constexpr functions like all of glm matrix multiplications so alot of code needs to be rewritten

template<typename T, qualifier Q>
GLM_FUNC_QUALIFIER GLM_CONSTEXPR mat<4, 2, T, Q> operator*(mat<2, 2, T, Q> const& m1, mat<4, 2, T, Q> const& m2)
{
	return mat<4, 2, T, Q>(
		m1[0][0] * m2[0][0] + m1[1][0] * m2[0][1],
		m1[0][1] * m2[0][0] + m1[1][1] * m2[0][1],
		m1[0][0] * m2[1][0] + m1[1][0] * m2[1][1],
		m1[0][1] * m2[1][0] + m1[1][1] * m2[1][1],
		m1[0][0] * m2[2][0] + m1[1][0] * m2[2][1],
		m1[0][1] * m2[2][0] + m1[1][1] * m2[2][1],
		m1[0][0] * m2[3][0] + m1[1][0] * m2[3][1],
		m1[0][1] * m2[3][0] + m1[1][1] * m2[3][1]);
}

this function is no longer constexpr and needs to be rewritten in terms of accessing members

Mashpoe · 2024-09-20T08:59:30Z

@ZXShady we have tested that exact function in GCC, Clang, and MSVC, and constexpr worked for us in all cases. We even tried -Wall -Wextra and -Werror in Clang and had no issues. We have written a version that checks #if __cplusplus >= 202002L, and if so, provides a separate version of operator[]() that reverts to the old code if std::is_constant_evaluated() returns true (this feature requires C++20). This would solve any issues that this PR could possibly introduce, but it would mean that the optimization wouldn't work without C++20 support. It's also probably worth noting that type_quat already uses the exact same optimization in its operator[](), which is also already constexpr.

Could you possibly tell us which compiler you are using, as well as any compiler flags that might affect the outcome? Thanks.

ZXShady · 2024-09-20T09:35:19Z

@Mashpoe

constexpr glm::mat2 a(
	1, 0,
	0, 4
);
constexpr glm::mat2 b(
	2, 0,
	0, 2
);
static_assert((a * b).operator[](0).operator[](0) == 2, "");

compiler flags

clang++ main.cpp -I../glm -DOPTIMIZED_ACCESS_OPERATOR=0

if I set OPTIMIZED_ACCESS_OPERATOR to 1 it causes a compiler error

../glm\glm\./ext\../detail\type_mat2x2.inl:456:37: note: read of dereferenced one-past-the-end pointer is not allowed in
      a constant expression
  456 |                         m1[0][0] * m2[0][0] + m1[1][0] * m2[0][1],
      |                                                          ^
main.cpp:12:17: note: in call to 'operator*<float, glm::packed_highp>(a, b)'
   12 |         static_assert((a * b).operator[](0).operator[](0) == 2, "");

also your change will make it so almost all matrix multiplication functions are C++20 constexpr instead of C++14 (C++11 is possible too but it isnt currently supported) only which is not nice and a big API change.

we could apply this speed up in C++98 and C++03 and C++11 and C++20 and above versions but not C++14 and C++17 to not break API. or to apply this to every C++ version you will have to update all matrix multiplication operations and such to directly access the internal vector members instead of using operator[] and this doesn't even fix half the issue, there are many other functions thare use indices as generic way of iterating over different types of vec1,2,3,4 and that will cause them to not be constexpr

It's also probably worth noting that type_quat already uses the exact same optimization in its operator, which is also already constexpr.

this is different the operator[]() is not properly constexpr but all other functions related to quat are constexpr since they access the members directly
and quat doesn't have other member lengths so iterating over it is not as necessary as vectors

sharkautarch · 2024-10-01T20:33:57Z

Here’s a little godbolt link to an example of how to achieve something similar, while still being able to use constexpr, including for c++ versions before c++20:
https://godbolt.org/z/eofaM9Yh4
Note that this would require rewriting all vec constructors, plus some other constexpr bits that use .x/.y/.z/.w would have to change to use indexing in order for them to stay constexpr, because constant evaluated functions can only access the currently-active member of a union.

ZXShady · 2024-10-03T01:50:09Z

@sharkautarch

Here’s a little godbolt link to an example of how to achieve something similar, while still being able to use constexpr, including for c++ versions before c++20: https://godbolt.org/z/eofaM9Yh4 Note that this would require rewriting all vec constructors, plus some other constexpr bits that use .x/.y/.z/.w would have to change to use indexing in order for them to stay constexpr, because constant evaluated functions can only access the currently-active member of a union.

your code doesn't work if you change the active member of a union

constexpr int f() {
vec v{0};
v.x = 1;
return v[0]; // reads from data[0] which is not the active member no longer constexpr
}

static_assert(f() == 1); // not valid constant expression

the simplest solution is

adding another glm function extention called glm::fastIndexer that takes a vector and indexes into it but it isn't constexpr until C++20 (I prefer this option)
do as I said and use #if statements to control the body of the function like this so it only applies it to C++ versions 98,03,11 and 20,23,26

but doesn't apply it to 14 and 17 to not break API

#if defined __cpp_constexpr  && __cpp_constexpr >= 201301L
constexpr
#endif
int operator[](int i) 
#if CXX14 || CXX17
#if CXX20
   if (std::is_constant_evaluted()) {
#endif
      switch(i) {
          case 0: return x;
          case 1: return y;
          case 2: return z;
       }
#if CXX20
   } else {
#endif 
#else
   return (&x)[i];
#endif // CXX14 || CXX17

#if CXX20
   }
#endif 

#endif

bit the above still doesn't work because of vec memberd r,g,b,a and t,u,p,q changing the active member will break the constexpr here so I think the firdt option is the best.

actually this problem is too complicated because of the aliasing memebrs r,g,b,a and others.

look at this code for example if your change was implemented

constexpe bool f()
{
glm::vec3 v = {1,2,3};
v.x = 5;
return glm::all(glm::equal(v,{5,2,3}));
}
static_assert(f()); // non constexlr function
// rrading from inactive member `data`

I specifically choose yo use glm::equal instead of == to demonstrate that your solution can break with the glm algorithms that use indexing as a way to iterate they all will be no longer constexpr.

sharkautarch · 2024-10-03T16:29:36Z

@ZXShady
OK, I think this will work better:
https://godbolt.org/z/j1r81sE9s
haven't tested it against all of your examples, but I think it will work with all of your examples

I guess this would break reference-indexing in constexpr contexts, not sure how tolerable that is, because I'd imagine most uses of operator[] in constexpr contexts are by-value...

glm/detail/type_vec2.inl

Tr1NgleDev added 2 commits July 16, 2024 03:58

optimize vector component access

f6230f8

use GLM_ASSERT_LENGTH in dual_quaternion component access

642057b

fix constexpr ig

ce1448a

Tr1NgleDev force-pushed the master branch from 6c913ba to ce1448a Compare November 1, 2024 12:37

ZXShady reviewed Nov 1, 2024

View reviewed changes

glm/detail/type_vec2.inl Outdated Show resolved Hide resolved

make a bit more readable

9097b14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

optimize vector component access and use GLM_ASSERT_LENGTH in dual_quaternion #1308

optimize vector component access and use GLM_ASSERT_LENGTH in dual_quaternion #1308

Tr1NgleDev commented Jul 15, 2024

Mashpoe commented Jul 15, 2024 •

edited

Loading

ZXShady commented Aug 7, 2024

Mashpoe commented Aug 7, 2024

ZXShady commented Aug 8, 2024 •

edited

Loading

ZXShady commented Aug 13, 2024

Mashpoe commented Aug 13, 2024

ZXShady commented Sep 19, 2024

Mashpoe commented Sep 20, 2024

ZXShady commented Sep 20, 2024 •

edited

Loading

sharkautarch commented Oct 1, 2024 •

edited

Loading

ZXShady commented Oct 3, 2024 •

edited

Loading

sharkautarch commented Oct 3, 2024 •

edited

Loading

optimize vector component access and use GLM_ASSERT_LENGTH in dual_quaternion #1308

Are you sure you want to change the base?

optimize vector component access and use GLM_ASSERT_LENGTH in dual_quaternion #1308

Conversation

Tr1NgleDev commented Jul 15, 2024

Mashpoe commented Jul 15, 2024 • edited Loading

ZXShady commented Aug 7, 2024

Mashpoe commented Aug 7, 2024

ZXShady commented Aug 8, 2024 • edited Loading

ZXShady commented Aug 13, 2024

Mashpoe commented Aug 13, 2024

ZXShady commented Sep 19, 2024

Mashpoe commented Sep 20, 2024

ZXShady commented Sep 20, 2024 • edited Loading

sharkautarch commented Oct 1, 2024 • edited Loading

ZXShady commented Oct 3, 2024 • edited Loading

sharkautarch commented Oct 3, 2024 • edited Loading

Mashpoe commented Jul 15, 2024 •

edited

Loading

ZXShady commented Aug 8, 2024 •

edited

Loading

ZXShady commented Sep 20, 2024 •

edited

Loading

sharkautarch commented Oct 1, 2024 •

edited

Loading

ZXShady commented Oct 3, 2024 •

edited

Loading

sharkautarch commented Oct 3, 2024 •

edited

Loading