Distributed scraping and analysis pipeline for a range of social media platforms
The goal of this project is to raise awareness about data privacy. The means to do so is a tool that scrapes, combines, and analyzes public data from multiple social media sources.
The results will be available via an API and are intended for use in an art exhibition.
You can find a more detailed overview here.
Open it in draw.io and have a look at the different tabs "High level overview", "Distributed Scraper" and "Face Search".
part | docs | contact |
---|---|---|
API | api/README.md | @jo-fr |
Frontend | frontend/README.md | @lukas-menzel |
Postgres DB | db/README.md | @alexmorten |
If you want to join us in raising awareness for data privacy, have a look at CONTRIBUTING.md.
GitHub handle | Real name | Instagram profile | Twitter profile |
---|---|---|---|
@1Jo1 | Josef Grieb | josef_grieb | josefgrieb |
@Urhengulas | Johann Hemmann | Urhengulas | Johann |
@alexmorten | Alexander Martin | no profile :( | no profile :( |
@jo-fr | Jonathan Freiberger | jonifreiberger | Jonathan |
@m-lukas | Lukas Müller | lmglukas | Lukas Müller |
@lukas-menzel | Lukas Menzel | lukasmenzel | Lukas Menzel |
@SpringHawk | Martin Zaubitzer | / | / |
This project is deployed to Kubernetes from codeuniversity/smag-deploy (this is a private repo!).
dependency | version |
---|---|
go | v1.13 (go modules) |
docker | v19.x |
docker-compose | v1.24.x |
If this is your first time running this:
- Add `127.0.0.1 my-kafka` and `127.0.0.1 minio` to your `/etc/hosts` file
- Choose a `<user_name>` for your platform of choice `<instagram|twitter>` as a starting point and run `$ go run cli/main/main.go <instagram|twitter> <user_name>`
Run the Instagram or Twitter scraper in Docker:
$ make run-<platform_name>