I have tried a two-tower model (user and query) in a real industrial scenario using contrastive learning. The samples are all actual click samples, and the loss function is InfoNCE. I have a few questions:

1. The model performs best with only one MLP layer; the more layers I add, the worse HR@100 becomes.
2. Applying L2 normalization to the tower outputs degrades performance.

As a result, I currently use only one MLP layer and no normalization. Could you please provide some advice or share some experiences on what I should do?
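For context, here is a minimal NumPy sketch of the setup described above: InfoNCE with in-batch negatives, where each (user, clicked item) pair is a positive and the other items in the batch act as negatives. The function name and the temperature value are illustrative, not taken from any particular library.

```python
import numpy as np

def info_nce_loss(user_emb, item_emb, temperature=0.1):
    """InfoNCE with in-batch negatives.

    Row i of user_emb and row i of item_emb form a positive pair;
    all other rows of item_emb serve as negatives for user i.
    """
    # Similarity matrix: logits[i, j] = <user_i, item_j> / temperature
    logits = user_emb @ item_emb.T / temperature
    # Numerically stable log-softmax over each row
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Mean negative log-probability of the diagonal (positive) entries
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
users = rng.normal(size=(4, 8))  # toy batch of 4 user embeddings
items = rng.normal(size=(4, 8))  # matching clicked-item embeddings
print(info_nce_loss(users, items))
```

Note that the temperature interacts strongly with L2 normalization: with normalized embeddings the logits are bounded in [-1/t, 1/t], so dropping normalization (as above) changes the effective scale of the softmax, which may partly explain the observations in points 1 and 2.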
Did you write your own implementation of the InfoNCE loss function, or are you using an existing implementation? I'm interested in trying it. Are you using it in your retrieval model or ranking model?
While I haven't used an InfoNCE loss function, for my retrieval and ranking models I've found that an inverse time decay learning-rate schedule works really well to avoid overfitting, for example:
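A minimal sketch of what such a schedule computes, using the same formula as Keras's `tf.keras.optimizers.schedules.InverseTimeDecay` (non-staircase form); the hyperparameter values here are placeholders, not a recommendation:

```python
def inverse_time_decay(step, initial_lr=0.1, decay_steps=1000, decay_rate=0.5):
    """Learning rate shrinking as 1 / (1 + rate * t / decay_steps),
    mirroring tf.keras.optimizers.schedules.InverseTimeDecay."""
    return initial_lr / (1 + decay_rate * step / decay_steps)

# The learning rate decays smoothly from its initial value toward zero.
for step in (0, 1000, 10000):
    print(step, inverse_time_decay(step))
```

In Keras this would typically be passed directly to the optimizer, e.g. `tf.keras.optimizers.Adagrad(tf.keras.optimizers.schedules.InverseTimeDecay(0.1, 1000, 0.5))`.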