Eval code not syncing between GPUs #83

Open
SnakeOnex opened this issue Feb 3, 2025 · 1 comment

Comments

@SnakeOnex

Hello,

looking at train_utils.py:581, in the eval_reconstruction section, there doesn't seem to be any sync of eval values between the GPUs. So each GPU computes rFID etc. on part of the eval set, but then only the value from the main GPU is reported. I'm not sure if I am reading it right, or if you are aware of this and want it this way, but I wanted to bring it to your attention just in case.
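To illustrate the concern, here is a minimal pure-Python sketch, not the repository's code. It simulates several ranks, each holding a strided shard of hypothetical per-sample scores, and compares reporting only rank 0's value against combining every rank's sum and count. The names `shard`, `local_stats`, and `global_mean` are made up for illustration. In a real multi-GPU run the combining step would typically be a collective such as `torch.distributed.all_reduce`. Note that rFID is computed from feature statistics over the whole set, so a real fix would likely gather features or their sufficient statistics rather than average per-rank values.

```python
# Hypothetical per-sample scores standing in for whatever the eval loop accumulates.
scores = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
world_size = 4  # assumed number of GPUs


def shard(samples, rank, world_size):
    """Return the strided slice of samples that a given rank would see."""
    return samples[rank::world_size]


def local_stats(values):
    """Per-rank sufficient statistics for a mean: (sum, count)."""
    return sum(values), len(values)


def global_mean(per_rank_stats):
    """Combine (sum, count) pairs from all ranks, as a sum-reduction would."""
    total = sum(s for s, _ in per_rank_stats)
    count = sum(n for _, n in per_rank_stats)
    return total / count


stats = [local_stats(shard(scores, r, world_size)) for r in range(world_size)]

# What gets reported if only the main process's partial result is used.
rank0_only = stats[0][0] / stats[0][1]

# What gets reported if the per-rank statistics are combined first.
synced = global_mean(stats)

print(f"rank 0 only: {rank0_only}, synced: {synced}")
```

If each rank really does see only part of the eval set, the two numbers differ, which is the situation described above.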

@cornettoyu
Collaborator

Thanks for the note.

The current code should run evaluation on the full val set on each device (GPU), so the online evaluation still reports the correct numbers, but the computation is redundant, since each device evaluates the whole dataset. We will check on our side and update with a distributed version to speed up the evaluation process soon. Feel free to let me know if you have any questions :)
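As a rough sketch of what a distributed version could look like (an assumption for illustration, not the planned implementation), each rank could take a strided slice of the val set indices so that the ranks together cover every sample exactly once. The helper `rank_indices` is hypothetical. One design point: PyTorch's `DistributedSampler` pads with repeated indices by default so that every rank gets the same number of samples, which can slightly bias evaluation metrics, so an unpadded split like the one below may be preferable for eval.

```python
def rank_indices(num_samples, rank, world_size):
    """Indices of the val set assigned to one rank (strided, no padding)."""
    return list(range(rank, num_samples, world_size))


num_samples = 10  # hypothetical val set size
world_size = 4    # assumed number of GPUs

parts = [rank_indices(num_samples, r, world_size) for r in range(world_size)]

# Every sample should appear exactly once across all ranks.
covered = sorted(i for part in parts for i in part)

print([len(part) for part in parts])
print(covered == list(range(num_samples)))
```

Because shard sizes can differ when the dataset size is not divisible by the number of devices, the per-rank results would need to be combined with their counts (or as gathered features) rather than averaged with equal weight.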
