I have tried a two-tower model (user and query) in a real industrial scenario using contrastive learning. The samples are all actual click samples, and the loss function is InfoNCE. I have a few questions:

1. The model performs best with only one MLP layer; the more layers I add, the worse HR@100 becomes.
2. Applying L2 normalization to the tower outputs degrades performance.

As a result, I currently use only one MLP layer and no normalization. Could you please provide some advice or share some experiences on what I should do?
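For context, here is a minimal NumPy sketch of the setup described above: InfoNCE with in-batch negatives, where each (user, clicked item) pair is a positive and the other items in the batch act as negatives. The function name and the temperature value are illustrative, not taken from any particular library.

```python
import numpy as np

def info_nce_loss(user_emb, item_emb, temperature=0.1):
    """InfoNCE with in-batch negatives.

    Row i of user_emb and row i of item_emb form a positive pair;
    all other rows of item_emb serve as negatives for user i.
    """
    # Similarity matrix: logits[i, j] = <user_i, item_j> / temperature
    logits = user_emb @ item_emb.T / temperature
    # Numerically stable log-softmax over each row
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Mean negative log-probability of the diagonal (positive) entries
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
users = rng.normal(size=(4, 8))  # toy batch of 4 user embeddings
items = rng.normal(size=(4, 8))  # matching clicked-item embeddings
print(info_nce_loss(users, items))
```

Note that the temperature interacts strongly with L2 normalization: with normalized embeddings the logits are bounded in [-1/t, 1/t], so dropping normalization (as above) changes the effective scale of the softmax, which may partly explain the observations in points 1 and 2.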
Did you write your own implementation of the InfoNCE loss function, or are you using an existing implementation? I'm interested in trying it. Are you using it in your retrieval model or ranking model?
While I haven't used an InfoNCE loss function, for my retrieval and ranking models I've found that an inverse time decay learning-rate schedule works really well to avoid overfitting, for example:
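A minimal sketch of what such a schedule computes, using the same formula as Keras's `tf.keras.optimizers.schedules.InverseTimeDecay` (non-staircase form); the hyperparameter values here are placeholders, not a recommendation:

```python
def inverse_time_decay(step, initial_lr=0.1, decay_steps=1000, decay_rate=0.5):
    """Learning rate shrinking as 1 / (1 + rate * t / decay_steps),
    mirroring tf.keras.optimizers.schedules.InverseTimeDecay."""
    return initial_lr / (1 + decay_rate * step / decay_steps)

# The learning rate decays smoothly from its initial value toward zero.
for step in (0, 1000, 10000):
    print(step, inverse_time_decay(step))
```

In Keras this would typically be passed directly to the optimizer, e.g. `tf.keras.optimizers.Adagrad(tf.keras.optimizers.schedules.InverseTimeDecay(0.1, 1000, 0.5))`.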