Fix/counting cells #19

Closed
wants to merge 5 commits into from

AnneIsARealProgrammerNow
Contributor

The normalisation did not work well because every table cell was counted as an individual text block. As a result, the most frequent geography was Colombia and the second most frequent was Moldova, even though both countries have relatively few documents. This is a data quality issue that needs to be solved in the open data more broadly, but for now, selecting only text blocks where "text_block.type" is text, title or heading seems to get rid of most of the problem.
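As a rough illustration of the filter described above, here is a minimal sketch in plain Python. It is not the code from this PR: the row layout, the `geography` key, the `tableCell` label and the helper name `keep_prose_blocks` are all assumptions made for the example, and the exact spelling of the type values in the real data may differ.

```python
from collections import Counter

# Assumed spellings of the block types to keep; the real data may differ.
PROSE_TYPES = {"text", "title", "heading"}


def keep_prose_blocks(blocks, type_key="text_block.type"):
    """Return only the blocks whose type is one of PROSE_TYPES."""
    return [block for block in blocks if block.get(type_key) in PROSE_TYPES]


# Toy rows: geography "A" has one table that contributes three cell-level blocks.
# "geography" and "tableCell" are hypothetical names used only for this sketch.
toy_blocks = [
    {"geography": "A", "text_block.type": "tableCell"},
    {"geography": "A", "text_block.type": "tableCell"},
    {"geography": "A", "text_block.type": "tableCell"},
    {"geography": "A", "text_block.type": "text"},
    {"geography": "B", "text_block.type": "text"},
    {"geography": "B", "text_block.type": "title"},
]

# Count blocks per geography before and after the filter.
counts_before = Counter(block["geography"] for block in toy_blocks)
counts_after = Counter(block["geography"] for block in keep_prose_blocks(toy_blocks))
```

In this toy data, "A" dominates the raw counts only because of its table cells. After filtering, "B" has more blocks, which is the kind of shift the description attributes to the fix.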

AnneIsARealProgrammerNow added the bug label on Nov 14, 2024
Member

kdutia left a comment

looks like you might've pushed a new version of the loading data notebook by accident!


kdutia commented Nov 18, 2024

closing as duplicated by #21

kdutia closed this on Nov 18, 2024