This library is a tool for measuring the semantic similarity of sentences, based on BERT embeddings. It uses the forward pass of the BERT (bert-base-uncased) model to estimate the embedding vectors and then applies the standard cosine formulation to measure the distance between them. The distance metric can be changed, and the intermediate sentence and word embedding vectors can be obtained as well. The model has been abstracted from Google Research's BERT implementation. The PyTorch wrapper over BERT is credited to Chris McCormick.
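To make the distance step concrete, here is a minimal sketch of the cosine formulation applied to two embedding vectors, with a swappable metric. It is not the library's own code: the names `cosine_similarity`, `cosine_distance`, `euclidean_distance`, and `compare` are hypothetical, and the short vectors in the usage note stand in for real BERT embeddings (which are 768-dimensional for bert-base-uncased).

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Return the cosine of the angle between u and v (range -1 to 1)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_distance(u: Vector, v: Vector) -> float:
    """Cosine distance: 0 for vectors pointing the same way, larger when they diverge."""
    return 1.0 - cosine_similarity(u, v)


def euclidean_distance(u: Vector, v: Vector) -> float:
    """An alternative metric, shown only to illustrate swapping the distance function."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def compare(u: Vector, v: Vector,
            metric: Callable[[Vector, Vector], float] = cosine_distance) -> float:
    """Apply the chosen metric to two embedding vectors (hypothetical helper)."""
    return metric(u, v)
```

For example, `compare([1.0, 0.0], [0.0, 1.0])` applies cosine distance to two orthogonal toy vectors, while `compare(u, v, metric=euclidean_distance)` switches the metric without touching the rest of the code.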
Install the library with pip:
pip install BERTSimilarity==0.1
To use the library in a Jupyter Notebook or Python IDE, import it as follows:
import BERTSimilarity.BERTSimilarity as bertsimilarity
The 'Similarity_Test.py' file contains an example of using the library in this context.
A sample semantic similarity measurement with 4 different sentences, 2 of which are vaguely similar, is provided below:
This Colab Notebook can also be used for experimentation.
A Kaggle Kernel that uses this library for question pair similarity detection is also provided.
The Notebook is featured on QuantumStat.com.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
MIT