Analyser to create a XLSX for Gephi from Twitter scrape #152

Merged 13 commits on May 4, 2020


breezykermo
Member

Because the Twitter selector uses twint under the hood to mine tweets, a simple search doesn't go through replies or retweets.

The TwintToGephi analyser I'm adding here uses a simple heuristic to create a relational tweet graph, and then produces an XLSX with two tabs ('Edges' and 'Vertices') that is ready to import into Gephi.

The heuristic is:

  1. Scrape using search to find all tweets that contain a search term.
  2. For each tweet returned, scrape again for reply tweets. (I'm not sure that twint returns all replies for every tweet, but it does seem to return some.)
  3. Using both original tweets and replies, construct a Gephi-ready XLSX.

This final step is done in the post_analyse phase of the analyser, which produces the XLSX in a 'FINAL' element.
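
To make step 3 concrete, here is a rough sketch of how tweets and replies could be turned into rows for the two tabs. This is not the code in this PR: the `build_graph` function and the `id`, `username`, and `in_reply_to_id` fields are hypothetical stand-ins for whatever the scrape actually returns, and writing the rows into an XLSX is left out. The column names (`Id`, `Label`, `Source`, `Target`, `Type`) follow what Gephi's spreadsheet import looks for.

```python
from typing import Dict, List, Tuple


def build_graph(
    originals: List[dict], replies: List[dict]
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Return (vertices, edges) rows for the 'Vertices' and 'Edges' tabs.

    Each tweet is assumed (hypothetically) to be a dict with "id" and
    "username"; replies also carry "in_reply_to_id".
    """
    # Index the original tweets so a reply can be linked to its parent.
    by_id = {tweet["id"]: tweet for tweet in originals}

    # One vertex per distinct user, in first-seen order.
    vertices: Dict[str, Dict[str, str]] = {}
    for tweet in originals + replies:
        user = tweet["username"]
        vertices.setdefault(user, {"Id": user, "Label": user})

    # One directed edge per reply, from the replier to the original author.
    edges: List[Dict[str, str]] = []
    for reply in replies:
        parent = by_id.get(reply["in_reply_to_id"])
        if parent is None:
            # The parent tweet was not in the search results; skip it.
            continue
        edges.append(
            {
                "Source": reply["username"],
                "Target": parent["username"],
                "Type": "Directed",
            }
        )

    return list(vertices.values()), edges
```

Replies whose parent is missing from the search results still contribute a vertex but no edge, which matches the caveat above that the reply scrape may be incomplete.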

@breezykermo
Member Author

Fix build and implement #153 before merge.

@breezykermo breezykermo merged commit b924abf into master May 4, 2020
@breezykermo breezykermo deleted the topic/twint-gephi branch May 4, 2020 11:06
breezykermo added a commit that referenced this pull request May 4, 2020
* rms faulty line in build

* factor out common twint utilities

* gets direct replies through another twint search

* WIP: start preparing graph logic

* WIP: start structuring CSV graph

* actually export CSV

* minor fix

* update requirements.txt

* correct dest_q update

* add download_videos option to Twitter selector

* lint

* proper fix

* correct info.yaml

Co-authored-by: Lachlan <Kermode>
breezykermo added a commit that referenced this pull request May 8, 2020
* rms faulty line in build

* Analyser to create a XLSX for Gephi from Twitter scrape (#152)

* rms faulty line in build

* factor out common twint utilities

* gets direct replies through another twint search

* WIP: start preparing graph logic

* WIP: start structuring CSV graph

* actually export CSV

* minor fix

* update requirements.txt

* correct dest_q update

* add download_videos option to Twitter selector

* lint

* proper fix

* correct info.yaml

Co-authored-by: Lachlan <Kermode>

* modify to pass all_elements to post_analyse by default

* WIP: scaffold rank

* make in_parallel configurable

* refactor Rank analyser to a single function

* fix tests