Skip to content

Direct mapping vs. CSV2RDF, problematic case with foreign keys

Ivan Herman edited this page Apr 25, 2015 · 6 revisions

Consider three simple tables, called People, Addresses, and Departments respectively (see the original example in the Direct Mapping document):

ID fname addr deptName deptCity
7 Bob 18 accounting Cambridge
ID city state
18 Cambridge MA
ID name city
23 accounting Cambridge

A relational table always has a schema, that the RDF Direct Mapping makes use of. In this example, the schema defines that:

  • For the People table
    • ID is a primary key in the table, and the cells are integers
    • The name column contains strings
    • The addr column contains integers
  • For the Addresses table
    • ID is primary key in the table, and the cells are integers
    • both the city and state columns contain strings
  • For the Departments table
    • ID is primary key in the table, and the cells are integers
    • both the city and state columns contain strings
    • both the name and city columns are strings
  • The addr column in the People table contains foreign keys that references the ID field of the Addresses table.
  • The deptName and deptCity in the People table contain joint foreign keys referencing candidate keys for the name and city pair of the Department table as unique keys

The generated RDF Graph is as follows:

<http://foo.example/DB/People/ID=7> rdf:type <http://foo.example/DB/People>;
    <http://foo.example/DB/People/#ID> 7;
    <http://foo.example/DB/People/#fname> "Bob";
    <http://foo.example/DB/People/#addr> 18;
    <http://foo.example/DB/People/#ref-addr> <http://foo.example/DB/Addresses/ID=18>;
    <http://foo.example/DB/People/#deptName> "accounting";
    <http://foo.example/DB/People/#deptCity> "Cambridge";
    <http://foo.example/DB/People/#ref-deptName;deptCity> <http://foo.example/DB/Department/ID=23>;
    .

<http://foo.example/DB/Addresses/ID=18> rdf:type <http://foo.example/DB/Addresses>.
    <http://foo.example/DB/Addresses/#ID> 18;
    <http://foo.example/DB/Addresses/#city> "Cambridge";
    <http://foo.example/DB/Addresses/#addr> "MA"
    .

<http://foo.example/DB/Departments/ID=23> rdf:type <http://foo.example/DB/Addresses>.
    <http://foo.example/DB/Departments/#ID> 23;
    <http://foo.example/DB/Departments/#name> "accounting";
    <http://foo.example/DB/Departments/#city> "Cambridge";
    <http://foo.example/DB/Departments/#manager> 8;
    <http://foo.example/DB/Departments/#ref-manager> <http://foo.example/DB/People/ID=8>
    .

The major difference, compared to the simpler example with foreign keys is the usage of the unique keys. To generate the right object URI in the RDF graph (i.e., http://foo.example/DB/Department/ID=23) the RDF Direct Mapping processor has to find that unique key combination in the target table (Department) and, using that combination, has to establish the subject for that specific row (http://foo.example/DB/Departments/ID=23) which can then be used as an object URL. This can be done because the RDF Direct Mapping processor has access to several tables within the same Relation Database. If these tables are related through foreign and unique candidate keys, it is therefore straightforward to access all the tables in parallel and establish the relationships to generate the triples.

However, this combination cannot be handled in the CSVW Metadata. The situation for CSV files is indeed different: CSV tables are typically single and, potentially, very large files, meaning that a CSV processor cannot be expected to handle several CSV tables in parallel. That is the reason why the Metadata specification does not introduce features that would force to access several tables randomly; this also means that such an RDF mapping cannot be reproduced in CSV2RDF.