shaleen/SIGMOD17Reproducibility

SIGMOD17Reproducibility

Code Information
Programming Language: Python
Compiler Info: Python 2.7 interpreter
Packages/Libraries Needed: Python Anaconda (see details below)

Datasets

The experiments require MySQL 5.6. The code does not use any version-specific features and should also work on all later versions.

The dataset files are listed below. The scripts assume that no database named graph, qa, ssb, or tpch already exists. To import a database, run the following command, replacing <filename> with graph, qa, ssb, or tpch.

mysql -u <username> -p < <filename>.sql
  1. https://www.dropbox.com/s/7mb6snalnxndlxp/graph.sql?dl=0
  2. https://www.dropbox.com/s/aqop2af4i39pe1w/qa.sql?dl=0
  3. https://www.dropbox.com/s/y609n91exdishyf/ssb.sql?dl=0
  4. https://www.dropbox.com/s/kdk4iq9kngu5cr2/tpch.sql?dl=0

Alternatively, run the db.sh file in the root folder. If pass is your MySQL password, executing the command sudo ./db.sh pass downloads the files and sets up the databases.
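For readers who download the four dump files by hand, the manual import above can be sketched in Python as a loop over the database names. This is only an illustration, not part of the repository: the helper names (import_command, dump_paths, import_all) and the dump_dir parameter are hypothetical, and it assumes the mysql client is on the PATH and that the dumps were saved as graph.sql, qa.sql, ssb.sql, and tpch.sql.

```python
import os
import subprocess

# Database names taken from the README; each one has a <name>.sql dump.
DATABASES = ["graph", "qa", "ssb", "tpch"]


def import_command(username):
    """Build the same command as `mysql -u <username> -p` (password is prompted)."""
    return ["mysql", "-u", username, "-p"]


def dump_paths(dump_dir="."):
    """Return the expected path of each dump file inside dump_dir (hypothetical layout)."""
    return [os.path.join(dump_dir, name + ".sql") for name in DATABASES]


def import_all(username, dump_dir="."):
    """Feed each dump to the mysql client on stdin, like `mysql ... < file.sql`."""
    for path in dump_paths(dump_dir):
        with open(path, "rb") as dump:
            # check_call raises CalledProcessError if mysql exits with an error.
            subprocess.check_call(import_command(username), stdin=dump)
```

Calling import_all("root") from the folder that holds the dumps would prompt for the MySQL password once per file, just as the manual command does.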

Hardware Information

All experiments were performed on a 16 GB machine running OS X 10.10.5. There are no special hardware requirements, and the results should be easily reproducible on any standard machine.

Processor: 2.2 GHz Intel Core i7
Memory: 16 GB 1600 MHz DDR3
Cores: 4 (quad-core)
Cache: 6 MB shared L3 cache

Environment Setup

  • Execute the database scripts. Once the scripts are executed, you should see 4 databases in MySQL.
  • Check out the code from GitHub into a folder on the machine.
  • Install Anaconda as explained here
  • Replicate the Python environment using a conda virtual environment. Execute the following command from the top level of the repository, where environment.yml is located. This creates a private, Python 2.7 based Anaconda environment with all dependencies loaded:
conda env create -f environment.yml
  • Navigate to the SIGMOD17Reproducibility directory in terminal.
  • Activate the environment by using the command source activate sigmodduplicate.
  • Install mysql-python using the command pip install mysql-python.
  • Update the constants/db.py file with the username and password for the database. Setup is now complete.
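Before running any experiment, it can help to confirm that the four databases are visible to the account configured in constants/db.py. The sketch below is an assumption-laden illustration rather than repository code: the function names are hypothetical, the credentials are passed in explicitly because the variable names inside constants/db.py are not documented here, and the host is assumed to be localhost. It uses the MySQLdb module that the mysql-python package installs.

```python
# Databases the README says should exist after the import step.
EXPECTED = ("graph", "qa", "ssb", "tpch")


def missing_databases(present, expected=EXPECTED):
    """Return the expected database names that are absent from `present`."""
    present = set(present)
    return [name for name in expected if name not in present]


def list_databases(user, password, host="localhost"):
    """Ask the MySQL server for its database names (needs a running server)."""
    import MySQLdb  # installed by `pip install mysql-python`

    conn = MySQLdb.connect(host=host, user=user, passwd=password)
    try:
        cursor = conn.cursor()
        cursor.execute("SHOW DATABASES")
        return [row[0] for row in cursor.fetchall()]
    finally:
        conn.close()
```

With a server running, missing_databases(list_databases(user, password)) returning an empty list indicates that all four databases were imported.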

Running Experiments

Below is the complete set of experiments needed to reproduce the results in the paper. Keep in mind that the results (time taken or price assigned to queries) will vary somewhat between runs because query parameters, data, or both are sampled. These variations are minor and never come close to an order of magnitude. The important thing is therefore to observe the trends in the graphs, which should be close to those in the paper.
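One way to make the "never close to an order of magnitude" criterion concrete is to compare a reproduced value against the corresponding value read off the paper's figure on a log10 scale. This is a minimal sketch with hypothetical function names; it assumes both values are positive and that the reader supplies the reference numbers, since none are given here.

```python
import math


def log10_gap(reproduced, reported):
    """Absolute gap between two positive values, in orders of magnitude."""
    return abs(math.log10(float(reproduced) / float(reported)))


def same_order_of_magnitude(reproduced, reported):
    """True when the two values differ by less than a factor of 10."""
    return log10_gap(reproduced, reported) < 1.0
```

A gap well below 1.0 is consistent with the sampling variability described above, while a gap at or above 1.0 suggests a setup problem rather than noise.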

Section 2.4

  • Execute the following commands
cd integration
python CombinerBenchmarkPriceBehavior.py

This set of experiments generates 4 figures (benchmarkselect.pdf, benchmarkproject.pdf, benchmarkjoin.pdf, and benchmarkgroup.pdf) corresponding to Figure 2 in the paper. The legend labelling is consistent with the paper for ease of verification.

Section 5.1

  • Execute the following commands
cd integration
python PriceSelectivity.py #Generates benchmarkselectsupportsize.pdf corresponding to Figure 4a
python PriceAttributes.py #Generates benchmarkprojectsupportsize.pdf corresponding to Figure 4b
python SwapUpdateFraction.py #Generates benchmarkcellswapratio.pdf corresponding to Figure 4c
python SupportSetSize.py #Generates benchmarktimesssize.pdf corresponding to Figure 4d
  • Execute the following for SSB experiments
cd integration_ssb
python CombinerReproduce.py #Generates ssbstatichistorytime.pdf and ssbstatichistoryawareprice.pdf corresponding to Figures 4f and 4e respectively, and barchartssbtime.pdf for Figure 5a
python HistoryAwareQ11.py #Generates ssbq11.pdf corresponding to Figure 4g
  • Execute the following for TPCH experiment
cd integration_tpch
python CombinerReproduce.py #Generates barcharttpchtimetest.pdf for Figure 5b

Section 5.4

  • Execute the following commands
cd integration_dblp
python Combiner.py #Generates prices for queries Q^c_1, Q^c_2, Q^c_3, Q^c_4, Q^c_5, Q^c_6, Q^c_7
  • Execute the following commands
cd integration_crash
python Combiner.py #Generates prices for queries Q^d_1, Q^d_2, Q^d_3, Q^d_4
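The commands in Sections 2.4, 5.1, and 5.4 can also be run in one pass. The following is a rough sketch, not a script shipped with the repository: EXPERIMENTS, experiment_commands, and run_all are hypothetical names, and it assumes that every cd above is relative to the repository root and that the sigmodduplicate environment is already active so that python resolves to the Python 2.7 interpreter.

```python
import os
import subprocess

# (directory, script) pairs in the order they appear in the README.
EXPERIMENTS = [
    ("integration", "CombinerBenchmarkPriceBehavior.py"),  # Section 2.4, Figure 2
    ("integration", "PriceSelectivity.py"),                # Figure 4a
    ("integration", "PriceAttributes.py"),                 # Figure 4b
    ("integration", "SwapUpdateFraction.py"),              # Figure 4c
    ("integration", "SupportSetSize.py"),                  # Figure 4d
    ("integration_ssb", "CombinerReproduce.py"),           # Figures 4e, 4f, 5a
    ("integration_ssb", "HistoryAwareQ11.py"),             # Figure 4g
    ("integration_tpch", "CombinerReproduce.py"),          # Figure 5b
    ("integration_dblp", "Combiner.py"),                   # Section 5.4, Q^c queries
    ("integration_crash", "Combiner.py"),                  # Section 5.4, Q^d queries
]


def experiment_commands(repo_root, python="python"):
    """Return (working_directory, argv) for every experiment, without running it."""
    return [(os.path.join(repo_root, directory), [python, script])
            for directory, script in EXPERIMENTS]


def run_all(repo_root, python="python"):
    """Run each script from its own directory; stop at the first failure."""
    for workdir, argv in experiment_commands(repo_root, python):
        subprocess.check_call(argv, cwd=workdir)
```

Calling run_all(".") from the SIGMOD17Reproducibility directory runs the scripts sequentially; each script is started in its own directory because the README runs them that way.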
