Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mpl, par: Moves random indexing to uniform_int_distribution #6649

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

QuantamHD
Copy link
Collaborator

@QuantamHD QuantamHD commented Feb 6, 2025

uniform_real has a bug in it where it can sometimes return 1.0 for a range when it's supposed to return [0,1.0). Lesson is don't use real for this purpose use int distribution.

See the compiler bugs listed on this page for more info. https://en.cppreference.com/w/cpp/numeric/random/uniform_real_distribution

uniform_real has a bug in it where it can sometimes return 1.0
for a range when it's supposed to return [0,1.0). Lesson is
don't use real for this purpose use int distribution.

See the compiler bugs listed on this page for more info.
https://en.cppreference.com/w/cpp/numeric/random/uniform_int_distribution

Signed-off-by: Ethan Mahintorabi <[email protected]>
Copy link
Contributor

github-actions bot commented Feb 6, 2025

clang-tidy review says "All clean, LGTM! 👍"

@maliberty
Copy link
Member

You need to update the unit test. I'll start a secure CI

@AcKoucher
Copy link
Contributor

Thanks @QuantamHD

Signed-off-by: Ethan Mahintorabi <[email protected]>
Signed-off-by: Ethan Mahintorabi <[email protected]>
Copy link
Contributor

github-actions bot commented Feb 6, 2025

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
Copy link
Contributor

github-actions bot commented Feb 6, 2025

clang-tidy review says "All clean, LGTM! 👍"

@QuantamHD
Copy link
Collaborator Author

@maliberty Something seems weird with the CI. Only some of the platforms seems to not match. I wonder if they have slightly different implementations of random or something.

Copy link
Contributor

github-actions bot commented Feb 6, 2025

clang-tidy review says "All clean, LGTM! 👍"

Signed-off-by: Ethan Mahintorabi <[email protected]>
Copy link
Contributor

github-actions bot commented Feb 7, 2025

clang-tidy review says "All clean, LGTM! 👍"

@QuantamHD
Copy link
Collaborator Author

QuantamHD commented Feb 7, 2025

@maliberty This is ready to go.

@QuantamHD
Copy link
Collaborator Author

As context std::uniform_int_distribution is not platform stable so it can return different values for the same seed on different OS's. Had to replace with boost's implementation to get platform stability.

@maliberty
Copy link
Member

As context std::uniform_int_distribution is not platform stable so it can return different values for the same seed on different OS's. Had to replace with boost's implementation to get platform stability.

This is odd as it takes a random generator that is platform stable. I would like to understand this assertion as we use such in many places without observing any problem.

Also this change invalidates the secure CI I ran so I would have to start over.

@QuantamHD
Copy link
Collaborator Author

QuantamHD commented Feb 8, 2025

See this reddit thread.

https://www.reddit.com/r/cpp/comments/7i21sn/til_uniform_int_distribution_is_not_portable/

The entropy sources produce identical bits, but the distributions do not.

Also https://www.reddit.com/r/cpp/comments/e9mft4/when_did_you_want_to_use_random_but_couldnt/

@maliberty
Copy link
Member

Its a 7 year old post. I took the sample code and pasted it into godbolt and tried various compiler versions. I haven't found any so far that don't give the same output- from gcc 8.1 to 14.2 and clang 7.0.0-19.1. Have actually observed this?

Does boost give any more of a guarantee across versions and compilers?

@QuantamHD
Copy link
Collaborator Author

The std version fails in the OR CI for older versions of Debian and Ubuntu (different outputs). But passes for newer OSes and was immediately fixed by moving to boost.

Boosts algorithm doesn't rely on any compiler behavior. It just happens to be implemented in a certain way. It makes the output stable for at least any compiler version. No such guarantee is provided across versions, but arguably we don't care if the user decides to do something that's not in the CI.

The issue with the std version is that implementation for the distributions isn't specified, and there's lots of deterministic algorithms to generate a sequence of numbers that meets the spec.

@maliberty
Copy link
Member

Why is no issue observed or fixed in ppl/grt/drt/utl which all use uniform_int_distribution? What version of those OS and what compilers do they use where you see the issue?

@QuantamHD
Copy link
Collaborator Author

QuantamHD commented Feb 8, 2025

image image

The original issue is a blocking one for us @maliberty so I would like to get something merged early next week.

I'm fine if you want to experiment with the std version or look into other usages, but as far as I can tell std::uniform_int_distribution is not stable, and if other things work it just happens to be good luck (either compiler impls lining up, inadequate tests, or random algos converging).

Either way we need to fix the original bug which is a blocking segfault.

@maliberty
Copy link
Member

new CI started

@maliberty
Copy link
Member

The private designs are looking good (the largest two are still running). However there are failure in

aes-block asap7: [ERROR] cts__design__instance__count__hold_buffer fail test: 1103.0 <= 1087.0
bp_be_top nangate45: [ERROR GRT-0116] Global routing finished with congestion.
microwatt sky130hd: [ERROR GRT-0116] Global routing finished with congestion.

The first one could just be a metrics update but the other two require investigation. @QuantamHD would you take a look?

@AcKoucher
Copy link
Contributor

AcKoucher commented Feb 11, 2025

@QuantamHD Some results from our private CI are showing that these changes seem to be causing some impact on the clustering done within MPL which is made by PAR. I'm taking a look.

@QuantamHD
Copy link
Collaborator Author

Thanks @AcKoucher!

@AcKoucher
Copy link
Contributor

Updating on this: Indeed the PAR result changes and this leads to MPL having to work on a different physical hierarchical tree. The bad results do look like bad luck though.

Once #6697 and #6689 fixes are merged, I'll make some more testing with the changes here. My suspicions are perhaps we won't have the congestion problems.
If we do, I still have one idea to help solve both sky130hd/uW and ngt45/bp_be_top problems which seem to be related to big macro clusters being placed in a somewhat centered location.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants