Direct mapping vs. CSV2RDF, problematic case with foreign keys

Consider three simple tables, called People, Addresses, and Departments respectively (see the original example in the Direct Mapping document):

ID	fname	addr	deptName	deptCity
7	Bob	18	accounting	Cambridge

ID	city	state
18	Cambridge	MA

ID	name	city
23	accounting	Cambridge

A relational table always has a schema, that the RDF Direct Mapping makes use of. In this example, the schema defines that:

For the People table
- ID is a primary key in the table, and the cells are integers
- The name column contains strings
- The addr column contains integers
For the Addresses table
- ID is primary key in the table, and the cells are integers
- both the city and state columns contain strings
For the Departments table
- ID is primary key in the table, and the cells are integers
- both the city and state columns contain strings
- both the name and city columns are strings
The addr column in the People table contains foreign keys that references the ID field of the Addresses table.
The deptName and deptCity in the People table contain joint foreign keys referencing candidate keys for the name and city pair of the Department table as unique keys

The generated RDF Graph is as follows:

<http://foo.example/DB/People/ID=7> rdf:type <http://foo.example/DB/People>;
    <http://foo.example/DB/People/#ID> 7;
    <http://foo.example/DB/People/#fname> "Bob";
    <http://foo.example/DB/People/#addr> 18;
    <http://foo.example/DB/People/#ref-addr> <http://foo.example/DB/Addresses/ID=18>;
    <http://foo.example/DB/People/#deptName> "accounting";
    <http://foo.example/DB/People/#deptCity> "Cambridge";
    <http://foo.example/DB/People/#ref-deptName;deptCity> <http://foo.example/DB/Department/ID=23>;
    .

<http://foo.example/DB/Addresses/ID=18> rdf:type <http://foo.example/DB/Addresses>.
    <http://foo.example/DB/Addresses/#ID> 18;
    <http://foo.example/DB/Addresses/#city> "Cambridge";
    <http://foo.example/DB/Addresses/#addr> "MA"
    .

<http://foo.example/DB/Departments/ID=23> rdf:type <http://foo.example/DB/Addresses>.
    <http://foo.example/DB/Departments/#ID> 23;
    <http://foo.example/DB/Departments/#name> "accounting";
    <http://foo.example/DB/Departments/#city> "Cambridge";
    <http://foo.example/DB/Departments/#manager> 8;
    <http://foo.example/DB/Departments/#ref-manager> <http://foo.example/DB/People/ID=8>
    .

The major difference, compared to the simpler example with foreign keys is the usage of the unique keys. To generate the right object URI in the RDF graph (i.e., http://foo.example/DB/Department/ID=23) the RDF Direct Mapping processor has to find that unique key combination in the target table (Department) and, using that combination, has to establish the subject for that specific row (http://foo.example/DB/Departments/ID=23) which can then be used as an object URL. This can be done because the RDF Direct Mapping processor has access to several tables within the same Relation Database. If these tables are related through foreign and unique candidate keys, it is therefore straightforward to access all the tables in parallel and establish the relationships to generate the triples.

However, this combination cannot be handled in the CSVW Metadata. The situation for CSV files is indeed different: CSV tables are typically single and, potentially, very large files, meaning that a CSV processor cannot be expected to handle several CSV tables in parallel. That is the reason why the Metadata specification does not introduce features that would force to access several tables randomly; this also means that such an RDF mapping cannot be reproduced in CSV2RDF.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Direct mapping vs. CSV2RDF, problematic case with foreign keys

Clone this wiki locally