
High loss and low bleu-4 for training #192

Open
loserlulin9 opened this issue Feb 16, 2023 · 7 comments

Comments

@loserlulin9

When I train a new model on the Flickr8k and Flickr30k datasets in my environment, I find that the training loss is too high (about 10) and the BLEU-4 is too low (about 2.4e-232) after 20 epochs. It is also very strange that the parameter epochs_since_improvement is 20. I didn't change the train.py code except to fix some small bugs. How can I improve this? Is anyone else having the same problem? Thanks!

@AndreiMoraru123

AndreiMoraru123 commented Feb 16, 2023

What exactly have you changed in the code?

Be wary of erasing things like

global best_bleu4, epochs_since_improvement, checkpoint, start_epoch, fine_tune_encoder, data_name, word_map

PEP 8 linters will mark those as warnings, but here they have a good use.
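
For anyone hitting this later, a minimal standalone sketch (assumed variable names, not the tutorial's exact code) of why erasing that statement breaks training state: without `global`, the assignments bind new function-local variables, so the module-level counters never change.

```python
best_bleu4 = 0.0
epochs_since_improvement = 0

def update_tracking(recent_bleu4: float) -> None:
    # Without this `global` line, the assignments below would create
    # function-local variables and the module-level state would stay frozen.
    global best_bleu4, epochs_since_improvement
    if recent_bleu4 > best_bleu4:
        best_bleu4 = recent_bleu4
        epochs_since_improvement = 0
    else:
        epochs_since_improvement += 1

update_tracking(0.12)  # improvement: counter resets to 0
update_tracking(0.10)  # no improvement: counter becomes 1
print(best_bleu4, epochs_since_improvement)  # 0.12 1
```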

@loserlulin9
Author

I just changed the code from "scores, _ = pack_padded_sequence(scores, decode_lengths, batch_first=True)" to "scores = pack_padded_sequence(scores, decode_lengths, batch_first=True).data" to debug. I also changed some data parameters at the beginning of train.py, but I don't think that has much influence. I didn't change the code with the global variables. Do you know how to make the loss converge? Should I lower the learning rate?
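
For context, a minimal sketch of the API difference with dummy tensors (the shapes here are illustrative, not the real model's outputs):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

scores = torch.randn(2, 3, 5)  # (batch, max_decode_len, vocab_size), dummy logits
decode_lengths = [3, 2]        # valid (unpadded) timesteps per sequence

# Older PyTorch: PackedSequence unpacked like a 2-tuple, so this worked:
#   scores, _ = pack_padded_sequence(scores, decode_lengths, batch_first=True)
# Newer PyTorch: PackedSequence has four fields, so take .data explicitly:
packed = pack_padded_sequence(scores, decode_lengths, batch_first=True)
scores = packed.data           # flat (sum(decode_lengths), vocab_size) tensor
print(scores.shape)            # torch.Size([5, 5])
```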

@AndreiMoraru123

Have you tried this fix instead?

@loserlulin9
Author

> Have you tried this fix instead?

Yeah, I just deleted the '_', but the cross entropy loss must accept two tensor parameters, so I added '.data' to the end of that line.
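
To make that concrete, a small self-contained sketch (dummy shapes) of feeding the packed .data tensors to the loss:

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence

criterion = nn.CrossEntropyLoss()

scores = torch.randn(2, 3, 5)          # (batch, max_len, vocab_size), dummy logits
targets = torch.randint(0, 5, (2, 3))  # (batch, max_len), dummy token ids
decode_lengths = [3, 2]

# Pack both so the padded timesteps are dropped, then pass the flat .data
# tensors: CrossEntropyLoss expects plain tensors, not PackedSequence objects.
scores = pack_padded_sequence(scores, decode_lengths, batch_first=True).data
targets = pack_padded_sequence(targets, decode_lengths, batch_first=True).data
loss = criterion(scores, targets)      # scores: (N, vocab_size), targets: (N,)
```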

@AndreiMoraru123

That's true. They should be the same in the loss when using .data. Curious, is your loss just not decreasing, or is it getting worse?

@loserlulin9
Author

> That's true. They should be the same in the loss when using .data. Curious, is your loss just not decreasing, or is it getting worse?

My train.py runs, but the loss just doesn't decrease.

@Kevinskt

I changed the code to "scores = pack_padded_sequence(scores, decode_lengths, batch_first=True)[0]", because the new PyTorch version requires this. After making that change, I didn't run into your situation.
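
(For anyone comparing the two fixes: indexing with [0] and reading .data give the same tensor, since PackedSequence is a namedtuple. A quick check:)

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

packed = pack_padded_sequence(torch.randn(2, 3, 5), [3, 2], batch_first=True)
assert torch.equal(packed[0], packed.data)  # both are the flat data tensor
```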
