Extract geographic coverage via NLP #20
Comments
Agreed, this would be valuable information. You mentioned NLP: do you have an idea where to get this information from? Is there any reliable and consistent source?
I have never worked with NLP, but I did some investigation in the past. On the website repo we also have an issue that discusses this problem. In my view, a first step to get started with NLP would be to create the missing topic labels for the projects. For this, one could use the READMEs of the projects that already have a topic, together with those topics, as training data. Topics are missing for about 50% of the projects and could be added to the database in this way. This would be a clear improvement of the database, would enable much better searches, and would also be very interesting for the analysis.
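To make the topic-label idea concrete, here is a minimal sketch of a bag-of-words Naive Bayes classifier trained on (README text, topic) pairs. It is only an illustration under stated assumptions: the function names (`tokenize`, `train`, `predict`) and the shape of the training data are hypothetical, it uses only the Python standard library, and a real attempt would probably use an established NLP library and a proper evaluation split.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens of 3+ letters."""
    return re.findall(r"[a-z]{3,}", text.lower())


def train(labeled_readmes):
    """Count words per topic. `labeled_readmes` is a list of
    (readme_text, topic) pairs taken from projects that already
    have a topic label (hypothetical input format)."""
    word_counts = defaultdict(Counter)
    topic_counts = Counter()
    for text, topic in labeled_readmes:
        topic_counts[topic] += 1
        word_counts[topic].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, topic_counts, vocab


def predict(model, text):
    """Return the most likely topic for an unlabeled README,
    using multinomial Naive Bayes with add-one smoothing."""
    word_counts, topic_counts, vocab = model
    total_docs = sum(topic_counts.values())
    tokens = [t for t in tokenize(text) if t in vocab]
    best_topic, best_score = None, -math.inf
    for topic, doc_count in topic_counts.items():
        total_words = sum(word_counts[topic].values())
        score = math.log(doc_count / total_docs)
        for token in tokens:
            smoothed = word_counts[topic][token] + 1
            score += math.log(smoothed / (total_words + len(vocab)))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic
```

One would call `train` on the roughly 50% of projects that already have a topic and then run `predict` on the READMEs of the rest. The predictions should be reviewed by hand before they are written to the database.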
I think there are several approaches we can consider here.
That should be feasible. I have never worked with such frameworks, only with classical CNNs for image processing so far.
Had some success last night with the DOI extraction. More details are in the separate issue protontypes/open-sustainable-technology#172. The new list is compiling a new CSV file at the moment. It looks like we are getting DOI links for about a quarter of the projects, but we are still missing some. Let us see if there are open source tools that give us more contextual information based on the DOIs.
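For readers who want to try something similar, here is a rough sketch of how DOIs could be pulled out of README text with a regular expression. This is not necessarily the approach used in the linked issue. The pattern is a commonly used heuristic for modern DOIs, and the name `extract_dois` is an assumption made for illustration.

```python
import re

# Heuristic pattern for modern DOIs: "10.", a 4-9 digit registrant
# prefix, a slash, then a suffix of common DOI characters.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(text):
    """Return the unique DOIs found in `text`, in order of appearance.

    Trailing punctuation is stripped because DOIs in prose are often
    followed by a period or a closing parenthesis. This is a heuristic
    and would truncate a DOI that legitimately ends in such a character.
    """
    seen = set()
    dois = []
    for match in DOI_PATTERN.finditer(text):
        doi = match.group(0).rstrip(".,;)")
        key = doi.lower()  # DOIs are case-insensitive
        if key not in seen:
            seen.add(key)
            dois.append(doi)
    return dois
```

Running `extract_dois` over each project's README and citation files would give the raw list, which could then be checked against a DOI resolver.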
Identifying the geographical coverage of the models and data behind each project could be very interesting for detecting areas without coverage. It could also help users find projects for a specific geographical area they are interested in.
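As a possible starting point before any real NLP, a simple gazetteer lookup could count how often known place names appear in a project's README. The sketch below assumes a tiny, hypothetical `GAZETTEER` mapping country names to ISO codes. A real attempt would need a full place-name list and would have to deal with ambiguous names, which is where named entity recognition would come in.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny gazetteer: place name -> ISO country code.
GAZETTEER = {
    "Germany": "DE",
    "France": "FR",
    "Kenya": "KE",
    "Brazil": "BR",
    "United States": "US",
}


def find_coverage(text, gazetteer=GAZETTEER):
    """Count whole-word, case-insensitive mentions of each place name
    and return a Counter keyed by country code."""
    counts = Counter()
    for name, code in gazetteer.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            counts[code] += hits
    return counts
```

Aggregating these counts over all projects would give a first, rough picture of which regions are mentioned often and which are not mentioned at all.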