1-name-detection-bisim.R
2-data-analysis.R
3-name-transliterations.py
README.md
TCPD.csv
soundalike_candidates_full.csv


Data Analysis

This folder contains the code to replicate the analysis in the story. The code is divided into three files, listed in the preferred order of execution:

1-name-detection-bisim.R - Takes in the TCPD.csv file and runs the name similarity analysis. It creates an intermediate State-Year.csv file in the data/intermediate-data folder, which is then joined into a final CSV in the data folder.
2-data-analysis.R - Takes in the State-Year.csv file and runs the analysis. The script is not perfectly structured, but the code is commented and should be easy to follow. Not every step writes its JSON output.
3-name-transliterations.py - Goes through the rows of the similar-candidates.csv file and transliterates candidate names from the original language to English using the Gemini API. It was not used in the story but is included for completeness, and possibly for future use.
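
The row-by-row transliteration pass described for the third file can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the column names `candidate_name` and `candidate_name_en`, the helper `transliterate_rows`, and the stand-in `fake_transliterate` are all hypothetical, and the sample rows are made up. A real run would pass in a function that calls the Gemini API in place of the stand-in.

```python
import csv
import io
from typing import Callable, Iterable


def transliterate_rows(
    rows: Iterable[dict],
    transliterate: Callable[[str], str],
    name_column: str = "candidate_name",       # hypothetical column name
    output_column: str = "candidate_name_en",  # hypothetical column name
) -> list[dict]:
    """Return copies of the rows with an added transliteration column."""
    cache: dict[str, str] = {}  # avoid repeating a call for a name already seen
    result = []
    for row in rows:
        name = (row.get(name_column) or "").strip()
        if name and name not in cache:
            cache[name] = transliterate(name)
        result.append({**row, output_column: cache.get(name, "")})
    return result


def fake_transliterate(name: str) -> str:
    """Stand-in for a real API call: looks names up in a tiny fixed table."""
    table = {"राम": "Ram", "सीता": "Sita"}
    return table.get(name, name)


# Made-up sample data, standing in for rows read from the candidates CSV.
sample_csv = "candidate_name,state\nराम,UP\nसीता,UP\nराम,MP\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))
out = transliterate_rows(rows, fake_transliterate)
```

Passing the transliteration function in as an argument keeps the loop testable without network access, and the cache means a name that appears in several rows is only sent once.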