Quick Start GoFFish - A Subgraph Centric Framework for Large Scale Time Series Graph Processing

Subgraph centric programming model

The vertex centric programming model is becoming popular for large scale graph processing because of its simplicity and the public availability of software frameworks such as Apache Giraph. However, the model has downsides, mainly the slow convergence of vertex centric algorithms and high communication overhead.

The vertex centric programming model forces the programmer to think like a vertex when developing algorithms. Logic is executed at each vertex in the graph in parallel, and vertices communicate with each other using edges and vertex IDs.
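
To make the "think like a vertex" style concrete, here is a minimal single-process sketch in Python. It is not Giraph's or GoFFish's API: the function vertex_centric_min_label and its message-passing loop are hypothetical, written only to illustrate the idea. Each vertex keeps a label, reads the messages sent to it in the previous superstep, and forwards a smaller label to its neighbours when it finds one. The number of supersteps grows with the distance a label has to travel, which is one way to see the slow convergence mentioned above.

```python
from collections import defaultdict


def vertex_centric_min_label(vertices, edges):
    """Label every vertex with the smallest vertex ID in its component.

    Returns (labels, supersteps). Illustrative only, not a framework API.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    labels = {v: v for v in vertices}

    # Superstep 0: every vertex sends its own ID along each of its edges.
    inbox = defaultdict(list)
    for v in vertices:
        for n in neighbours[v]:
            inbox[n].append(labels[v])
    supersteps = 1

    # Later supersteps: a vertex only acts on the messages it received,
    # and only sends again if its label became smaller.
    while inbox:
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            smallest = min(messages)
            if smallest < labels[v]:
                labels[v] = smallest
                for n in neighbours[v]:
                    outbox[n].append(smallest)
        inbox = outbox
        supersteps += 1
    return labels, supersteps


# A path of four vertices: label 1 needs one superstep per hop to reach vertex 4.
labels, supersteps = vertex_centric_min_label([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
print(labels, supersteps)
```

On longer paths the loop runs for more supersteps, and every superstep exchanges messages across edges, even when both endpoints live on the same machine.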

The subgraph centric programming model can be thought of as an extension of the vertex centric programming model in which the programming logic is executed at the subgraph level. In this model the graph is first partitioned and distributed across multiple computing nodes, and subgraphs (weakly connected components) are identified within each partition. Computation within a subgraph takes place in memory, and subgraphs communicate with each other using messages.
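
The following Python sketch illustrates the idea for a vertex count. It is a simplified, single-process illustration and not the Gopher API: the function names, the partition layout and the sample edges are all assumptions made for this example. Each partition is split into weakly connected components using only the edges local to that partition, each subgraph counts its own vertices in memory, and only one message per subgraph is needed to produce the total.

```python
from collections import defaultdict


def weakly_connected_components(vertices, edges):
    """Split one partition's vertices into subgraphs using only local edges."""
    vertex_set = set(vertices)
    adjacency = defaultdict(set)
    for u, v in edges:
        # Edges that cross partitions are remote and do not join subgraphs.
        if u in vertex_set and v in vertex_set:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for n in adjacency[node]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        components.append(component)
    return components


def count_vertices(partitions, edges):
    """Return (total vertex count, number of messages sent by subgraphs)."""
    messages = []
    for partition in partitions:
        for subgraph in weakly_connected_components(partition, edges):
            # Local work happens in memory; one message carries the result.
            messages.append(len(subgraph))
    return sum(messages), len(messages)


# Hypothetical graph: two partitions, with edge (3, 5) crossing between them.
partitions = [[1, 2, 3, 4], [5, 6, 7]]
edges = [(1, 2), (2, 3), (3, 5), (5, 6)]
total, message_count = count_vertices(partitions, edges)
print(total, message_count)
```

The point of the design is that the amount of communication depends on the number of subgraphs rather than on the number of vertices or edges.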

This document walks you through the basic steps to run, on your local machine, a simple subgraph centric algorithm that computes the total number of vertices in a graph.

Quick Start With Development VM

This guide uses a preinstalled virtual machine that can be set up easily on your local machine. The following installation instructions assume a Linux operating system environment.

Download Virtual Machine

Download the pre installed virtual machine from here

Setting up

Install Oracle VirtualBox 4.3.8. You can download it from here.
Install and configure Vagrant:
sudo apt-get install vagrant
Add the Vagrant plugin that keeps the VirtualBox Guest Additions in sync:
vagrant plugin install vagrant-vbguest
Extract the downloaded virtual machine.

Go to the virtual machine directory and start the environment:
vagrant up

After it boots, log in to the VM:
vagrant ssh

Other useful vagrant commands:
vagrant suspend saves the current running state of the machine and stops it. vagrant up will resume it.
vagrant halt gracefully shuts down the VM. vagrant up will boot it back up.
vagrant destroy destroys the VM (and all the cruft inside of it). Running vagrant up at this point will reprovision the VM and run the deploy scripts again.
More info at http://www.vagrantup.com/

Running The Sample

Setting up GoFS (GoFFish graph file system)

Log in to the virtual machine:
vagrant ssh

Change the password of the root user and switch to the root user:
$ sudo passwd root
$ su root

Start the name node. A name node is an interface that tracks all the metadata for a particular GoFS installation. At the moment GoFS ships with a REST server based name node, which can be run through the GoFSNameNode script. This script accepts a URI argument and attempts to start a name node REST server at that URI. Optionally, a file may also be specified, which is used to save name node state. It is recommended that you always specify a save file; otherwise name node state may be lost if the process is killed. Once a name node exists, it is referenced through a Java class type and a URI. The default type is edu.usc.goffish.gofs.namenode.RemoteNameNode, which can communicate with a remote REST server based name node.

cd $GOFFISH_HOME/gofs-2.0/bin
./GoFSNameNode http://localhost:9998

Log in to the VM from another terminal:
$ vagrant ssh

Change to root:
$ su root
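
The name node started above keeps running in the first terminal, so it can be useful to confirm from this second terminal that something is listening at its URI before formatting and deploying. The following is a minimal sketch and not part of GoFS: name_node_reachable is a hypothetical helper, it assumes Python 3 is available in the VM, and it only checks that a TCP connection to http://localhost:9998 succeeds, not that the GoFS REST API itself is healthy.

```python
import socket
from urllib.parse import urlparse


def name_node_reachable(uri, timeout=2.0):
    """Return True if a TCP connection to the URI's host and port succeeds."""
    parsed = urlparse(uri)
    # Fall back to the scheme's default port when the URI does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # The URI passed to GoFSNameNode earlier in this guide.
    print(name_node_reachable("http://localhost:9998"))
```

If this prints False, check that the GoFSNameNode process is still running in the first terminal.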

Create a directory named gofs-data in $GOFFISH_HOME. It will be used to store the graph data in binary format:
$ mkdir $GOFFISH_HOME/gofs-data

Change into the GoFS binary directory and format the graph file system:
$ cd $GOFFISH_HOME/gofs-2.0/bin
$ ./GoFSFormat

A sample graph that can be used is located in the /home/vagrant/deployment/samples/gofs-samples directory. To deploy this graph in the GoFS file system we need to create a list of the graph and its graph instances. In this example we will use only the graph template, which is equivalent to a static graph.

The file named list.xml in the /home/vagrant/deployment/samples/gofs-samples directory lists the graph data locations.

Deploy the graph. This will partition the graph, convert the partitions into binary format, and copy the files into the graph storage directory. In this case we will have only a single graph partition:
$ cd $GOFFISH_HOME/gofs-2.0/bin
$ ./GoFSDeployGraph edu.usc.goffish.gofs.namenode.RemoteNameNode http://localhost:9998 "graph1" 1 /home/vagrant/deployment/samples/gofs-samples/list.xml

Now the sample graph is deployed in the GoFS file system.

Setting up and Running applications

To start and run Gopher, go to the Gopher client directory and deploy the vertex count application:
$ cd /home/vagrant/deployment/gopher-client-2.0/bin
$ ./setup-gopher.sh

This will install the user applications and set up the computation nodes.

Now everything is set up and ready to run Gopher applications. You can run Gopher applications using the GopherClient command:
$ ./GopherClient.sh /home/vagrant/deployment/gopher-client-2.0/gopher-config.xml /home/vagrant/deployment/goffish_home/gofs-2.0/conf/gofs.config graph1 vert-count-2.0.jar edu.usc.pgroup.goffish.gopher.sample.VertCounter NILL

Executing this algorithm will output the vertex count of the input graph. You can find the results in $GOFFISH_HOME/gopher-server-2.0/vert-count.txt.

Note: ./GopherClient.sh will not return to the console after it executes. You need to force a return to the console by pressing Ctrl+C. (This is a known bug.)

More detailed information regarding the general deployment of GoFS and Gopher is available on GitHub.

Publications:

GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics, Yogesh Simmhan, Alok Kumbhare, Charith Wickramaarachchi, Soonil Nagarkar, Santosh Ravi, Cauligi Raghavendra and Viktor Prasanna, European Conference on Parallel Processing (Euro-Par), 2014.