GitHub - usc-cloud/parallel-high-betweenness-centrality: MPI based algorithm for detecting high centrality vertices in large graphs

Overview

Every community has a few key players that play critical roles and shape its overall behavior. Identifying these players matters in many domains, for example pinpointing key individuals or places when tracking criminal and terrorist activities. Our algorithm quickly uncovers the top n influential nodes located at major communication cross points inside a given community. It is designed for distributed environments, which makes it suited to processing geographically spread data found in cloud systems.

Objectives

The objective of the algorithm is to enable fast identification of the top n nodes, ranked by betweenness centrality, in a distributed sparse graph.
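To make the objective concrete, the sketch below computes exact betweenness centrality on a small undirected, unweighted graph and returns the top n vertices. It is a sequential illustration using Brandes' algorithm, not the distributed MPI implementation in this repository; the function names `betweenness_centrality` and `top_n` and the dictionary-based graph layout are assumptions made for the example.

```python
from collections import deque


def betweenness_centrality(adj):
    """Exact betweenness centrality (Brandes' algorithm) for an
    undirected, unweighted graph given as {vertex: [neighbours]}.
    Illustrative only; not the repository's MPI code."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        dist = {v: -1 for v in adj}    # BFS distance from s (-1 = unseen)
        preds = {v: [] for v in adj}   # predecessors on shortest paths
        sigma[s], dist[s] = 1, 0
        order = []                     # vertices in non-decreasing distance
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:        # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in bc.items()}


def top_n(adj, n):
    """Return the n vertices with the highest betweenness centrality,
    breaking ties by vertex id."""
    bc = betweenness_centrality(adj)
    return sorted(bc, key=lambda v: (-bc[v], v))[:n]


# Path graph 1-2-3-4-5: the middle vertex carries the most shortest paths.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(top_n(path, 3))  # [3, 2, 4]
```

The exact computation runs one breadth-first search per vertex, which is why a parallel, distributed approach becomes attractive on large sparse graphs.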

Benefits

It benefits data scientists who need to quickly identify the most central players in large, geographically distributed graph data.

Measures of effectiveness

The algorithm has been tested using MPI on an HPC cluster with 4 to 64 nodes and 2 workers per node, each worker with 8 GB of memory. Each node has two quad-core AMD Opteron 2376 2.3 GHz processors. Speed-ups of up to 12x have been observed for synthetic and real-life sparse graphs with up to 1M vertices and 3M edges.

Required Skill Sets

To use the algorithm on a given data set:
  Required:
    Familiarity with Linux
    Manipulating graph data (potentially to convert the given data to the Metis graph format)
    Ability to run MPI programs on a single node or a cluster, depending on graph size
  Good to have:
    Familiarity with VirtualBox and virtualization (to use the quick start guide)
    C++
    MPI library
To set up the environment on a cluster:
  Cluster administration knowledge
  Setting up an MPI environment on a cluster
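For the graph conversion step mentioned above, here is a minimal sketch that turns an undirected edge list into the plain (unweighted) Metis graph format: a header line with the vertex and edge counts, followed by one line per vertex listing its neighbours, with vertex ids starting at 1. The function name `edges_to_metis` is hypothetical, and dropping self-loops and duplicate edges is an assumption; check the installation guides for the exact variant the program expects.

```python
def edges_to_metis(num_vertices, edges):
    """Convert an undirected edge list with 1-based vertex ids into
    plain (unweighted) Metis graph text.
    Assumption: self-loops and duplicate edges are dropped."""
    neighbours = [set() for _ in range(num_vertices + 1)]  # index 0 unused
    for u, v in edges:
        if u == v:
            continue                  # skip self-loops
        neighbours[u].add(v)          # store both directions once
        neighbours[v].add(u)
    num_edges = sum(len(s) for s in neighbours) // 2
    lines = [f"{num_vertices} {num_edges}"]        # header: n m
    for v in range(1, num_vertices + 1):
        # Line i lists the neighbours of vertex i.
        lines.append(" ".join(str(w) for w in sorted(neighbours[v])))
    return "\n".join(lines) + "\n"


# A 4-cycle 1-2-3-4-1; the repeated edge (2, 1) is ignored.
print(edges_to_metis(4, [(1, 2), (2, 3), (3, 4), (4, 1), (2, 1)]), end="")
```

An isolated vertex still gets its own (empty) line, so the output always has one header line plus one line per vertex.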

How to get it

The algorithm is hosted on GitHub at https://github.com/usc-cloud/parallel-high-betweenness-centrality

Clone the repository using

git clone https://github.com/usc-cloud/parallel-high-betweenness-centrality.git

Note: You may need to install a git client to download the repository.

Installation

A quick start guide can be found in QuickStart.md, together with a precompiled VM to help you get started.

A detailed guide on how to install the software on a distributed setup can be found in GeneralInstallationGuide.md.

Future enhancements

Given the increasing popularity of cloud-oriented graph analytics frameworks, a GoFFish version of the proposed algorithm will be delivered.

