Exploring Emerging Social Media: Acquiring, Processing, and Visualizing Data with Python and OSoMe Web Tools
Time: Wed, July 17, 2024, 9:00 AM to 12:30 PM
Location: Irvine: Amado Recital Hall (D), University of Pennsylvania, Philadelphia, PA
Stay ahead in the evolving field of social media research with our tutorial!
With new platforms like Bluesky and Mastodon emerging and established ones restricting data access, we will explore different innovative tools and techniques developed by the Observatory on Social Media (OSoMe) to facilitate the analysis and understanding of these platforms.
- Tools: Botometer-X, Coordiscope, TopFIBers, and more.
- Simulation: Generating simulated data using SimSoM.
- Languages/Tools: Python and Jupyter notebooks.
- Data Sources: Datasets from Mastodon and Bluesky.
- Techniques: Building networks (co-reposts, co-hashtags, co-urls), centrality measures, community detection, and text embedding with sentenceBERT.
- Tool: Helios-Web for interactive visualization.
- Techniques: Natural language processing for narrative extraction.
This tutorial is suited for anyone with an interest in social media analysis, encompassing a wide range of disciplines and expertise. While prior knowledge of Python is desirable, the tutorial is designed to be inclusive and accessible to those with varying levels of technical proficiency. The tutorial will be conducted in Python and use Jupyter notebooks preloaded with datasets and scripts. The OSoMe tools and materials will be open-source, available via the observatory’s website or GitHub, providing participants with a toolkit to kickstart or advance their social media research endeavors.
You can find the tutorial materials and instructions on how to set up your environment on our GitHub page: GitHub.
- Quick overview of the tutorial.
- Introduce the presenters.
- Demonstration on how to set up the environments and API keys.
- The organizers will help the attendees with the setup process.
- Utilizing OSoMe tools for analyzing and acquiring data, including Botometer-X, network tool, CoordiScope, Top FIBers, and OSoMe Mastodon Search.
- Data acquisition from an emerging platform using OSoMe infrastructure.
- Generate synthetic data from SimSoM, a minimal model that simulates information-sharing on a social media platform.
- Preprocessing, filtering, cleaning.
- Constructing interaction networks (re-post, reply, mention).
- Building co-hash and co-post networks.
- Generating embeddings of posts using BERT.
- Employing similarity measures to find posts with similar content.
- Illustrating a classification task using embeddings.
- Using Helios-web for visualizing user networks and post embeddings.
- Demonstrating the use of semantic axes to illustrate polarity.
- Extracting and analyzing communities.
- Identifying and discussing the narratives prevalent in each community, based on their content.
- Filipi N. Silva, Research Scientist, Observatory on Social Media
- Bao Tran Truong, Ph.D. candidate in Informatics, Observatory on Social Media
- Wanying Zhao, Ph.D. candidate in Complex Networks and Systems, Indiana University
- Kai-Cheng Yang, Postdoctoral research associate at the Network Science Institute, Northeastern University
Data collected from Bluesky and Mastodon via streaming covering the period between 2024-06-25 and 2024-07-02. Entries contains any the of following terms: biden, trump, or debate.
Dataset is available on Zenodo:
We suggest using miniconda to install python packages and setup your environment. You can also use Anaconda, but it is a larger package and may take longer to install. Alternatively, you can also setup your own python environment using pip and virtualenv.
Download the tutorial materials from the github.
- Click on the green "Code" button and download the zip file.
- Unzip the file in a folder of your choice.
Alternatively you can clone the repository using git:
git clone https://github.com/osome-iu/IC2S2_OSoMe_tutorial_2024.git
Miniconda is a minimal installer for Conda, a package manager and an environment manager. Here’s how to install it:
-
Download Miniconda:
- Visit the Miniconda download page.
- Choose the version suitable for your operating system (Windows, macOS, or Linux).
- Download the appropriate installer (Python 3.x is recommended).
-
Install Miniconda:
- Windows: Run the downloaded
.exe
file and follow the on-screen instructions. - macOS/Linux: Open a terminal, navigate to the folder containing the downloaded file, and run
bash Miniconda3-latest-MacOSX-x86_64.sh
(adjust the filename as needed).
- Windows: Run the downloaded
-
Verify the Installation:
- Open a new terminal window.
- Type
conda list
. If Miniconda is installed correctly, you'll see a list of installed packages.
Creating a separate environment for your projects is good practice:
-
Create a New Environment for this course:
- Run the command:
conda env create -f environment.yml
. This will create a new environment called ic2s2tutorial with all the necessary packages installed.
- Run the command:
-
Activate the Environment:
- Run:
conda activate ic2s2tutorial
.
- Run:
-
Launch Jupyter Lab:
- Run:
jupyter lab
. - This will open Jupyter Lab in your default web browser.
- Run:
The created environment already includes most of the packages we'll need, but if you need to install any additional packages, here's how:
-
**Installing packages using the requirements.txt **:
- All the packages used in this tutorial can also be installed using the
requirements.txt
file. - Run:
pip install -r requirements.txt
.
- All the packages used in this tutorial can also be installed using the
-
**Manually Install Packages **:
- In your activated environment, run:
conda install <name of the package>
to install the packages - For example, to install the
scikit-learn
package, run:conda install scikit-learn
. - Alternatively, you can use
pip install <name of the package>
to install packages from PyPI.
- In your activated environment, run:
Make sure everything is installed correctly:
-
Open a New Notebook in Jupyter Lab:
- In Jupyter Lab, create a new notebook.
-
Test the Packages:
- Try importing the packages:
import numpy as np
,import pandas as pd
,import matplotlib.pyplot as plt
. - If there are no errors, the packages are installed correctly.
- Try importing the packages:
- Updating Conda: Keep Conda and your packages updated with
conda update conda
andconda update --all
. - Managing Environments: View your environments with
conda env list
and switch between them usingconda activate <env_name>
. - Finding Packages: To find available packages, use
conda search <package_name>
. - Conda Cheat Sheet: For more commands, see the Conda Cheat Sheet.