A rudimentary implementation of a simple neural ranking model, based on word embeddings (GloVe, fastText). The main ranking metric is the cosine similarity between the pre-trained model's vectors.

Jhex-AI/Semantic-Similarity-Ranking-v.1

Semantic Similarity Ranking

A simple implementation of ranking for search-based systems using semantic similarity.

Dataset

https://ciir.cs.umass.edu/downloads/WebAP/

Writeup

Note: A more detailed writeup will be added soon

  1. Acquired the dataset through Slack.

  2. Pre-processed the dataset

    1. Removed stop-words
    2. Lemmatized the corpus and saved it for future reference
    3. Created an inverted index (for demonstration purposes)
  3. Converted corpus to vectors using Word2Vec

  4. Tested the semantic similarity on random query words using the model:

    Most similar word examples to the query

    modelW2V.wv.similarity('cancer', 'tumor') 
    #0.8035345
    modelW2V.wv.similarity('cancer','ovarian')
    #0.860453

    Least similar word examples to the query

    modelW2V.wv.similarity('cancer', 'cloud') 
    #0.8035345
  5. Converted corpus to vectors using Doc2Vec

  6. Found the most similar documents for a given query

    new_sentence = "i love dogs".split(" ")
    # query tokens: ['i', 'love', 'dogs']
    
    model.docvecs.most_similar(positive=[model.infer_vector(new_sentence)],topn=5)
    # infer a vector for the query, then select the top n (here 5) most similar documents
    
    #Result
    #[('5235', 0.7422172427177429),
    #('4870', 0.7328481674194336),
    #('95', 0.7185875773429871),
    #('5868', 0.7118589878082275),
    #('1954', 0.6987151503562927)]
    
    # Format: (DocID, cosine similarity between the document and the query)
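
The pre-processing in step 2 can be sketched in plain Python. This is only a minimal illustration and not the repository's actual code: the stop-word list, the `preprocess` and `build_inverted_index` helpers, and the toy documents are all assumptions made for the example, and lemmatization is left out to keep the sketch dependency-free.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "i", "in", "to"}


def preprocess(text):
    """Lowercase, split on whitespace, strip punctuation, and drop stop-words."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Made-up toy documents, not taken from the WebAP dataset.
docs = {
    "d1": "Ovarian cancer is a tumor of the ovary.",
    "d2": "The cloud moved over the city.",
    "d3": "A tumor can be benign.",
}
index = build_inverted_index(docs)
print(index["tumor"])  # ['d1', 'd3']
```

An inverted index like this only supports exact term lookup, which is why the embedding-based steps are needed to rank documents that are related to a query without sharing its words.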

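The scores in steps 4 and 6 are cosine similarities between vectors. As a rough sketch of what such a ranking computes, the snippet below scores a few documents against a query vector and returns the top n. The `cosine_similarity` and `rank_documents` helpers, the document IDs, and the two-dimensional vectors are hypothetical; real Word2Vec or Doc2Vec vectors have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs, topn=5):
    """Return the topn (doc_id, similarity) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical document vectors and query vector, chosen only for illustration.
doc_vecs = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}
query_vec = [1.0, 0.0]

ranked = rank_documents(query_vec, doc_vecs, topn=2)
print([doc_id for doc_id, _ in ranked])  # ['a', 'b']
```

The output has the same shape as the result list in step 6: document IDs paired with their similarity to the query, sorted from most to least similar.
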
Cheers!
