-
Notifications
You must be signed in to change notification settings - Fork 57
Direct mapping vs. CSV2RDF, problematic case with foreign keys
Consider three simple tables, called People
, Addresses
, and Departments
respectively (see the original example in the Direct Mapping document):
ID | fname | addr | deptName | deptCity |
---|---|---|---|---|
7 | Bob | 18 | accounting | Cambridge |
ID | city | state |
---|---|---|
18 | Cambridge | MA |
ID | name | city |
---|---|---|
23 | accounting | Cambridge |
A relational table always has a schema, that the RDF Direct Mapping makes use of. In this example, the schema defines that:
- For the
People
table-
ID
is a primary key in the table, and the cells are integers - The
name
column contains strings - The
addr
column contains integers
-
- For the
Addresses
table-
ID
is primary key in the table, and the cells are integers - both the
city
andstate
columns contain strings
-
- For the
Departments
table-
ID
is primary key in the table, and the cells are integers - both the
city
andstate
columns contain strings - both the
name
andcity
columns are strings
-
- The
addr
column in thePeople
table contains foreign keys that references theID
field of theAddresses
table. - The
deptName
anddeptCity
in thePeople
table contain joint foreign keys referencing candidate keys for thename
andcity
pair of theDepartment
table as unique keys
The generated RDF Graph is as follows:
<http://foo.example/DB/People/ID=7> rdf:type <http://foo.example/DB/People>;
<http://foo.example/DB/People/#ID> 7;
<http://foo.example/DB/People/#fname> "Bob";
<http://foo.example/DB/People/#addr> 18;
<http://foo.example/DB/People/#ref-addr> <http://foo.example/DB/Addresses/ID=18>;
<http://foo.example/DB/People/#deptName> "accounting";
<http://foo.example/DB/People/#deptCity> "Cambridge";
<http://foo.example/DB/People/#ref-deptName;deptCity> <http://foo.example/DB/Department/ID=23>;
.
<http://foo.example/DB/Addresses/ID=18> rdf:type <http://foo.example/DB/Addresses>.
<http://foo.example/DB/Addresses/#ID> 18;
<http://foo.example/DB/Addresses/#city> "Cambridge";
<http://foo.example/DB/Addresses/#addr> "MA"
.
<http://foo.example/DB/Departments/ID=23> rdf:type <http://foo.example/DB/Addresses>.
<http://foo.example/DB/Departments/#ID> 23;
<http://foo.example/DB/Departments/#name> "accounting";
<http://foo.example/DB/Departments/#city> "Cambridge";
<http://foo.example/DB/Departments/#manager> 8;
<http://foo.example/DB/Departments/#ref-manager> <http://foo.example/DB/People/ID=8>
.
The major difference, compared to the simpler example with foreign keys is the usage of the unique keys. To generate the right object URI in the RDF graph (i.e., http://foo.example/DB/Department/ID=23
) the RDF Direct Mapping processor has to find that unique key combination in the target table (Department
) and, using that combination, has to establish the subject for that specific row (http://foo.example/DB/Departments/ID=23
) which can then be used as an object URL. This can be done because the RDF Direct Mapping processor has access to several tables within the same Relation Database. If these tables are related through foreign and unique candidate keys, it is therefore straightforward to access all the tables in parallel and establish the relationships to generate the triples.
However, this combination cannot be handled in the CSVW Metadata. The situation for CSV files is indeed different: CSV tables are typically single and, potentially, very large files, meaning that a CSV processor cannot be expected to handle several CSV tables in parallel. That is the reason why the Metadata specification does not introduce features that would force to access several tables randomly; this also means that such an RDF mapping cannot be reproduced in CSV2RDF.