Doc the BOW approach #3

Open
arnicas opened this issue Dec 8, 2019 · 2 comments

Comments

arnicas commented Dec 8, 2019

Hi, this looks great. I had to read the code to work out how to set up a BOW approach of my own. Could you add a few lines to the README about that? The paper also seems a little light on how the topic words were selected, unless I missed that. But awesome work!

dathath (Contributor) commented Dec 8, 2019

Oops. The current draft seems to be missing a link to where we got the wordlists from: https://www.enchantedlearning.com/wordlist/. Will add this back into the paper! Thanks for catching this.

Aside: Right now, the code only allows for words that are 1 BPE token long. Handling multiple tokens would need a few minor changes.

Thanks for the suggestion; yes, I agree it would be a good idea to make it easier to use with your own BoW. Will consider incorporating this!
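For anyone building their own wordlist in the meantime, the single-token restriction mentioned in the aside can be checked up front. The following is only a rough sketch, not the repository's code: `split_by_token_count` and `toy_encode` are hypothetical names, and `toy_encode` is a toy greedy tokenizer standing in for the real BPE tokenizer. In practice you would pass in the `encode` method of whichever tokenizer the model uses.

```python
from typing import Callable, Iterable, List, Tuple


def split_by_token_count(
    words: Iterable[str],
    encode: Callable[[str], List[int]],
) -> Tuple[List[str], List[str]]:
    """Separate words that encode to exactly one token from all others."""
    single, multi = [], []
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip blank lines in a wordlist file
        (single if len(encode(word)) == 1 else multi).append(word)
    return single, multi


# Assumption: a tiny made-up vocabulary, NOT a real BPE vocabulary.
TOY_VOCAB = {"ant": 0, "bee": 1, "carpenter": 2, " ": 3, "wasp": 4, "s": 5}


def toy_encode(text: str) -> List[int]:
    """Greedy longest-match tokenizer over TOY_VOCAB (illustration only)."""
    ids, i = [], 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in TOY_VOCAB:
                ids.append(TOY_VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
    return ids


single, multi = split_by_token_count(
    ["ant", "bee", "carpenter ant", "wasps"], toy_encode
)
print(single)  # ['ant', 'bee']
print(multi)   # ['carpenter ant', 'wasps']
```

With a real tokenizer the split will differ, since whether a word is one token depends entirely on the learned vocabulary (and often on whether a leading space is included), so treat the output above as illustrative only.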

@shamoons

So are you saying that every entry in a wordlist has to be a single word? So something like "carpenter ant" wouldn't work?
