Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Worklist for nfloat #2003

Open
fredrik-johansson opened this issue May 27, 2024 · 3 comments
Open

Worklist for nfloat #2003

fredrik-johansson opened this issue May 27, 2024 · 3 comments

Comments

@fredrik-johansson
Copy link
Collaborator

fredrik-johansson commented May 27, 2024

Before I forget anything

  • fma and fmma that do the right thing when one term is smaller
  • polynomial multiplication
    • block classical, karatsuba, waksman and maybe strassen with fixed-point arithmetic
    • block multimodular
  • all _ui, _si, _fmpz variants
  • when the precision is more than a few limbs, have multiplication inspect the limbs to see if there are many trailing zeros -> strip off and do a normal mul (see fredrik-johansson@2b2c8a3)
  • vec_neg
  • any other important missing vec functions
  • inv, div, sqrt, rsqrt
  • transcendental functions
  • micro-optimization: consider changing the exponent range and redefining the exponent of zero so that one can check x*y == 0 with EXP(x)+EXP(y) < MIN_EXP (saves branches detecting zeros in multiplication and in dot products)
@albinahlback
Copy link
Collaborator

albinahlback commented May 27, 2024

I will just state it here as well: For inverses, I believe for small $n$ the fastest method is via Newton iteration (and based off of GMP, I suppose this extends to all numbers). Hardcoded routines for inverses could be implemented for limb counts that are powers of two, just generalizing mpn_invert_limb.

@fredrik-johansson
Copy link
Collaborator Author

There is also the basecase algorithm used by mpfr_divhigh_n_basecase and the variant described here: https://inria.hal.science/hal-04557431v1/document

@albinahlback
Copy link
Collaborator

There is also the basecase algorithm used by mpfr_divhigh_n_basecase and the variant described here: https://inria.hal.science/hal-04557431v1/document

I saw that one. I'm wondering how a fast mpn_invert joined with Granlund-Möller 2n-by-n division algorithm would compare to Sukop's and Zimmermann's new algorithm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants