
Commit

add some guidance steps
mnmami authored Sep 8, 2016
1 parent 681eb8f commit f50b601
Showing 1 changed file with 11 additions and 1 deletion.
12 changes: 11 additions & 1 deletion README.md
@@ -1,2 +1,12 @@
# SeBiDA
Source code and a How-to for SeBiDA
Hi, here you will find the source code of SeBiDA. The code is currently partial: it contains only the Spark code for schema extraction and for loading semantic (RDF) data in N-Triples (nt) format, which should be easy to get started with for anyone who has basic Spark knowledge. We will, however, publish more complete code and a more detailed setup description for users with no prior Spark experience.
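
For orientation only, and not the SeBiDA code itself: loading an N-Triples file in Spark typically means reading it as plain text and splitting each line into subject, predicate and object. A minimal Java sketch, assuming Spark 2.x and a placeholder input path:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class NtLoadSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("nt-load-sketch").getOrCreate();

        // Target schema for the parsed triples.
        StructType schema = new StructType()
            .add("subject", DataTypes.StringType)
            .add("predicate", DataTypes.StringType)
            .add("object", DataTypes.StringType);

        // Read the raw N-Triples lines and split each into s/p/o.
        // "hdfs:///data/input.nt" is a placeholder path; the split is naive
        // (it keeps spaces inside literal objects but does no real parsing).
        Dataset<Row> triples = spark.createDataFrame(
            spark.read().textFile("hdfs:///data/input.nt")
                .javaRDD()
                .map(line -> {
                    String[] parts = line.split(" ", 3);
                    String obj = parts[2].replaceAll("\\s*\\.\\s*$", ""); // drop trailing " ."
                    return RowFactory.create(parts[0], parts[1], obj);
                }),
            schema);

        triples.show(10, false);
        spark.stop();
    }
}
```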

In the `SchemExtration` class, make the necessary changes, such as adapting the call to `typeDataFrame.write().parquet()` to specify the write settings (output path, host, port, etc.).
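
As a hypothetical illustration of such a change, and not the actual SeBiDA code: a parameterized Parquet write in Spark's Java API could look as follows, assuming Spark 2.x. The class name, method name and HDFS path are placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class WriteSketch {
    // Write the class/type table to a location you control.
    public static void writeTypeTable(Dataset<Row> typeDataFrame, String outputDir) {
        typeDataFrame
            .write()
            .mode(SaveMode.Overwrite)   // overwrite any previous run
            .parquet(outputDir);        // e.g. "hdfs://namenode:8020/sebida/types.parquet" (placeholder)
    }
}
```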

Loading non-semantic data in SeBiDA is done in the usual way you can find documented online: read the file into a DataFrame, create a schema if needed, and save it back as a Parquet file. Reading formats such as CSV and XML requires third-party libraries, which are easy to find online. Again, more detailed getting-started steps for Spark beginners will follow later.
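
A rough sketch of that workflow for CSV, not taken from the SeBiDA sources: it assumes Spark 2.x, where a CSV reader is built in (with Spark 1.x you would use the third-party spark-csv package instead), and the input and output paths are placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CsvToParquetSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("csv-to-parquet").getOrCreate();

        // Read a CSV file into a DataFrame, letting Spark infer the schema.
        Dataset<Row> df = spark.read()
            .option("header", "true")
            .option("inferSchema", "true")
            .csv("hdfs:///data/non_semantic_table.csv");   // placeholder path

        // Save the non-semantic table as Parquet so it can be queried later.
        df.write().parquet("hdfs:///sebida/non_semantic_table.parquet");  // placeholder path

        spark.stop();
    }
}
```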

For benchmarking, we used the BSBM benchmark data generator (http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/BenchmarkRules/#datagenerator) and its 12 queries in SQL. For the latter, we did not use the exact SQL queries provided by BSBM but rather created our own SQL conversion of the BSBM SPARQL queries. This is because Spark SQL supports a more limited syntax than the standard SQL used by BSBM. The rewritten SQL queries will also be published soon.
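
For illustration only, and not one of the actual rewritten BSBM queries: running SQL over the Parquet tables in Spark looks roughly like this, assuming Spark 2.x. The table name, column names and path are placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SqlQuerySketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("sql-query-sketch").getOrCreate();

        // Register a Parquet-backed table; path and table name are placeholders.
        spark.read().parquet("hdfs:///sebida/product.parquet")
             .createOrReplaceTempView("product");

        // A simple Spark SQL query; the real BSBM-derived queries are more involved,
        // and the column names here are placeholders.
        Dataset<Row> result = spark.sql(
            "SELECT label, producer FROM product WHERE propertyNum1 > 100 ORDER BY label LIMIT 10");

        result.show(false);
        spark.stop();
    }
}
```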

For more information, please contact me at: [email protected], and I'll be happy to assist.
