Showing 1 changed file with 11 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,12 @@ | ||
# SeBiDA | ||
Source code and a How-to for SeBiDA | ||
Hi, here you will find the source code of SeBiDA. The code is partial: it contains only the Spark code for schema extraction and loading of semantic (RDF) data in N-Triples (.nt) format, which should be easy to get started with for anyone with basic Spark knowledge. We will publish more complete code, along with a more detailed setup description for users with no prior Spark knowledge. | ||
|
||
In the SchemExtration class, make the necessary changes, for example to the call: | ||
typeDataFrame.write().parquet() | ||
where you specify the write settings (host, port, etc.). | ||
|
||
Loading non-semantic data in SeBiDA follows the usual Spark approach documented online: read the file into a DataFrame, create a schema if needed, then save it back as a Parquet file. Reading formats such as CSV and XML requires third-party libraries, which can easily be found online. Again, more detailed getting-started steps for Spark beginners will follow later. | ||
|
||
For benchmarking, we used the BSBM benchmark data generator (http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/BenchmarkRules/#datagenerator) and its 12 SQL queries. We did not use the exact BSBM SQL queries, however, but wrote our own SQL conversions of the BSBM SPARQL queries, because Spark SQL supports a more limited syntax than the standard SQL used in BSBM. The rewritten SQL queries will also be published soon. | ||
|
||
For more information, please contact me at: [email protected], and I'll be happy to assist. |
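
To give a feel for the schema-extraction step mentioned above, here is a minimal plain-Java sketch that runs without Spark. It is not the SeBiDA implementation. It assumes that "schema" means the set of predicates observed for the instances of each `rdf:type` class, and it uses a naive whitespace split instead of a full N-Triples parser. The class name `NtSchemaSketch`, its methods, and the `ex.org` IRIs are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not part of the SeBiDA code base.
class NtSchemaSketch {
    static final String RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";

    // Splits one N-Triples line into {subject, predicate, object}.
    // Returns null for blank lines, comments, and lines without a final '.'.
    static String[] parseLine(String line) {
        String t = line.trim();
        if (t.isEmpty() || t.startsWith("#") || !t.endsWith(".")) {
            return null;
        }
        t = t.substring(0, t.length() - 1).trim();   // drop the terminating '.'
        String[] parts = t.split("\\s+", 3);          // the object may contain spaces
        return parts.length == 3 ? parts : null;
    }

    // Maps each class IRI to the sorted set of predicates used by its instances.
    static Map<String, Set<String>> extractSchema(List<String> lines) {
        Map<String, Set<String>> typesOf = new HashMap<>();  // subject -> classes
        Map<String, Set<String>> predsOf = new HashMap<>();  // subject -> predicates
        for (String line : lines) {
            String[] spo = parseLine(line);
            if (spo == null) {
                continue;
            }
            Map<String, Set<String>> target = spo[1].equals(RDF_TYPE) ? typesOf : predsOf;
            String value = spo[1].equals(RDF_TYPE) ? spo[2] : spo[1];
            target.computeIfAbsent(spo[0], k -> new TreeSet<>()).add(value);
        }
        Map<String, Set<String>> schema = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : typesOf.entrySet()) {
            Set<String> preds = predsOf.getOrDefault(e.getKey(), Collections.<String>emptySet());
            for (String cls : e.getValue()) {
                schema.computeIfAbsent(cls, k -> new TreeSet<>()).addAll(preds);
            }
        }
        return schema;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "<http://ex.org/p1> " + RDF_TYPE + " <http://ex.org/Product> .",
            "<http://ex.org/p1> <http://ex.org/label> \"Widget one\" .",
            "<http://ex.org/p2> " + RDF_TYPE + " <http://ex.org/Product> .",
            "<http://ex.org/p2> <http://ex.org/price> \"9.5\" .");
        // Prints one entry per class with the predicates seen on its instances.
        System.out.println(extractSchema(sample));
    }
}
```

In a Spark job, each class in the resulting map could correspond to one DataFrame whose columns are the collected predicates, written out with a call such as `typeDataFrame.write().parquet()`. That mapping is an assumption about the general approach, not a description of the published code.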