ruby-rdf/rdf-rdfa

RDF::RDFa reader/writer

RDFa parser for RDF.rb.

DESCRIPTION

RDF::RDFa is an RDFa reader and writer for Ruby using the RDF.rb library suite.

FEATURES

RDF::RDFa parses RDFa into statements or triples.

  • Fully compliant RDFa 1.1 parser.
  • Template-based Writer to generate XHTML+RDFa.
    • Writer uses user-replaceable Haml-based templates to generate RDFa.
  • If available, uses Nokogiri for parsing HTML/SVG, falls back to REXML otherwise.

Install with gem install rdf-rdfa

Pure Ruby

In order to run as pure Ruby (not requiring any C modules), this gem does not directly depend on Nokogiri and falls back to using REXML. As REXML is not really an HTML parsing library, the results will only be useful if the HTML is well-formed. For best performance, install the Nokogiri gem as well.
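
The fallback described above follows a common Ruby pattern for optional dependencies. The following is only a sketch of that pattern, not the gem's actual code; the `available?` helper and the `library` variable are hypothetical names introduced for illustration:

```ruby
# Hypothetical helper (not part of rdf-rdfa): report whether a library
# can be loaded, without failing when it is missing.
def available?(feature)
  require feature
  true
rescue LoadError
  false
end

# Prefer Nokogiri when it is installed; otherwise fall back to REXML.
library = available?('nokogiri') ? :nokogiri : :rexml
puts "HTML/XML parsing would use: #{library}"
```

Because the check happens at load time, installing Nokogiri later is enough to switch parsers; no code change is needed.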

Important changes from previous versions

RDFa is an evolving standard, undergoing some substantial recent changes partly due to perceived competition with Microdata. As a result, the RDF Webapps working group is currently looking at changes in the processing model for RDFa. These changes are now being tracked in {RDF::RDFa::Reader}:

RDFa 1.1 Lite

This version fully supports the limited syntax of RDFa Lite 1.1. This includes the ability to use @property exclusively.

Vocabulary Expansion

One issue with vocabularies is that they discourage re-use of existing vocabularies when terms from several vocabularies are used at the same time. As it is common (and encouraged) for RDF vocabularies to form sub-class and/or sub-property relationships with well-defined vocabularies, the RDFa vocabulary expansion mechanism takes advantage of this.

As an optional part of RDFa processing, an RDFa processor will perform limited OWL 2 RL Profile entailment, specifically rules prp-eqp1, prp-eqp2, cax-sco, cax-eqc1, and cax-eqc2. This causes the super-classes and super-properties (and equivalents) of type and property IRIs to be added to the output graph.

{RDF::RDFa::Reader} implements this using the #expand method, which looks for rdfa:usesVocabulary properties within the output graph and performs such expansion. See an example in the usage section.
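
To make the entailment concrete, here is a rough sketch of the cax-sco rule alone (if `?x rdf:type ?c1` and `?c1 rdfs:subClassOf ?c2`, then `?x rdf:type ?c2`). It works on plain arrays of strings rather than RDF.rb objects, and `expand_subclasses` and the `ex:` terms are hypothetical names used only for illustration; it is not how {RDF::RDFa::Expansion} is implemented.

```ruby
RDF_TYPE    = 'rdf:type'
SUBCLASS_OF = 'rdfs:subClassOf'

# Apply cax-sco repeatedly until no new type triples can be inferred.
# Triples are [subject, predicate, object] arrays of strings.
def expand_subclasses(triples)
  result = triples.dup
  loop do
    inferred = result.flat_map do |s, p, o|
      next [] unless p == RDF_TYPE
      # Every declared super-class of o becomes an additional type of s.
      result.select { |c1, p2, _| p2 == SUBCLASS_OF && c1 == o }
            .map { |_, _, c2| [s, RDF_TYPE, c2] }
    end
    fresh = inferred.uniq - result
    break if fresh.empty?
    result.concat(fresh)
  end
  result
end

graph = [
  ['ex:me',       RDF_TYPE,    'ex:Musician'],
  ['ex:Musician', SUBCLASS_OF, 'ex:Person'],
  ['ex:Person',   SUBCLASS_OF, 'ex:Agent']
]
expand_subclasses(graph).each { |triple| puts triple.join(' ') }
```

The loop runs to a fixed point, so types are propagated through chains of `rdfs:subClassOf` statements and not just one level up.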

Experimental support for rdfa:copy template expansion

RDFa 1.1 is just about an exact super-set of microdata, except for microdata's @itemref feature. Experimental support is added for rdfa:copy and rdfa:Pattern to get a similar effect using expansion. To use this, reference another resource using rdfa:copy. If that resource has the type rdfa:Pattern, the properties defined there will be added to the resource containing the rdfa:copy, and the pattern and rdfa:copy will be removed from the output.

For example, consider the following:

<div>
  <div typeof="schema:Person">
    <link property="rdfa:copy" resource="_:a"/>
  </div>
  <p resource="_:a" typeof="rdfa:Pattern">Name: <span property="schema:name">Amanda</span></p>
</div>

If run with vocabulary expansion, this will result in the following Turtle:

@prefix schema: <http://schema.org/> .
[a schema:Person; schema:name "Amanda"] .
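
The copy step can be sketched on plain triples as follows. This is a simplified, single-pass illustration under the assumption that patterns do not themselves copy other patterns; `expand_patterns` is a hypothetical name and the triples are plain string arrays, not RDF.rb statements.

```ruby
# Copy the properties of each rdfa:Pattern resource onto every resource
# that references it with rdfa:copy, then drop the pattern and the
# rdfa:copy links from the result.
def expand_patterns(triples)
  patterns = triples.select { |_, p, o| p == 'rdf:type' && o == 'rdfa:Pattern' }
                    .map(&:first)
  copies = triples.select { |_, p, o| p == 'rdfa:copy' && patterns.include?(o) }

  copied = copies.flat_map do |target, _, pattern|
    triples.select { |s, p, o| s == pattern && !(p == 'rdf:type' && o == 'rdfa:Pattern') }
           .map { |_, p, o| [target, p, o] }
  end

  kept = triples.reject { |t| patterns.include?(t[0]) || copies.include?(t) }
  (kept + copied).uniq
end

input = [
  ['_:b', 'rdf:type',    'schema:Person'],
  ['_:b', 'rdfa:copy',   '_:a'],
  ['_:a', 'rdf:type',    'rdfa:Pattern'],
  ['_:a', 'schema:name', 'Amanda']
]
expand_patterns(input).each { |triple| puts triple.join(' ') }
```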

RDF Collections (lists)

One significant RDF feature missing from RDFa was support for ordered collections, or lists. RDF supports this with special properties rdf:first, rdf:rest, and rdf:nil, but other RDF languages have first-class support for this concept. For example, in Turtle, a list can be defined as follows:

[ a schema:MusicPlaylist;
  schema:name "Classic Rock Playlist";
  schema:numTracks 5;
  schema:tracks (
    [ a schema:MusicRecording; schema:name "Sweet Home Alabama";       schema:byArtist "Lynard Skynard"]
    [ a schema:MusicRecording; schema:name "Shook you all Night Long"; schema:byArtist "AC/DC"]
    [ a schema:MusicRecording; schema:name "Sharp Dressed Man";        schema:byArtist "ZZ Top"]
    [ a schema:MusicRecording; schema:name "Old Time Rock and Roll";   schema:byArtist "Bob Seger"]
    [ a schema:MusicRecording; schema:name "Hurt So Good";             schema:byArtist "John Cougar"]
  )
] .

This defines a playlist with an ordered set of tracks. RDFa adds the @inlist attribute, which is used to identify values (object or literal) that are to be placed in a list. The same playlist might be defined in RDFa as follows:

<div vocab="http://schema.org/" typeof="MusicPlaylist">
  <span property="name">Classic Rock Playlist</span>
  <meta property="numTracks" content="5"/>

  <div rel="tracks" inlist="">
    <div typeof="MusicRecording">
      1.<span property="name">Sweet Home Alabama</span> -
      <span property="byArtist">Lynard Skynard</span>
    </div>

    <div typeof="MusicRecording">
      2.<span property="name">Shook you all Night Long</span> -
      <span property="byArtist">AC/DC</span>
    </div>

    <div typeof="MusicRecording">
      3.<span property="name">Sharp Dressed Man</span> -
      <span property="byArtist">ZZ Top</span>
    </div>

    <div typeof="MusicRecording">
      4.<span property="name">Old Time Rock and Roll</span>
      <span property="byArtist">Bob Seger</span>
    </div>

    <div typeof="MusicRecording">
      5.<span property="name">Hurt So Good</span>
      <span property="byArtist">John Cougar</span>
    </div>
  </div>
</div>

This basically does the same thing, but places each track in an rdf:List in the defined order.
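
For readers unfamiliar with how an rdf:List is encoded, the sketch below expands an ordered array into the underlying rdf:first/rdf:rest chain. It uses plain strings, and the `list_triples` helper and the `_:l0`, `_:l1`, ... blank node labels are hypothetical; in RDF.rb itself, lists are handled by the library rather than built by hand like this.

```ruby
# Build the rdf:first / rdf:rest triples for an ordered list of values.
# Returns [head, triples]; an empty list is just rdf:nil with no triples.
def list_triples(values)
  return ['rdf:nil', []] if values.empty?

  nodes = values.each_index.map { |i| "_:l#{i}" }
  triples = values.each_with_index.flat_map do |value, i|
    rest = nodes[i + 1] || 'rdf:nil' # the last node points at rdf:nil
    [[nodes[i], 'rdf:first', value], [nodes[i], 'rdf:rest', rest]]
  end
  [nodes.first, triples]
end

head, triples = list_triples(['Sweet Home Alabama', 'Sharp Dressed Man'])
puts "list head: #{head}"
triples.each { |triple| puts triple.join(' ') }
```

The `rel="tracks"` relationship then points at the list head, which is how the order of the tracks survives in an otherwise unordered graph.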

Magnetic @about/@typeof

The @typeof attribute has changed; previously, it always created a new subject, using a resource from @about, @resource and so forth when one was present. This has long been a source of errors for people using RDFa. The new rules cause @typeof to bind to a subject if used with @about; otherwise it binds to an object, whether used alone or in combination with some other resource attribute (such as @href, @src or @resource).

For example:

<div typeof="foaf:Person" about="https://greggkellogg.net/foaf#me">
  <p property="foaf:name">Gregg Kellogg</p>
  <a rel="foaf:knows" typeof="foaf:Person" href="https://manu.sporny.org/#this">
    <span property="foaf:name">Manu Sporny</span>
  </a>
</div>

results in

<https://greggkellogg.net/foaf#me> a foaf:Person;
  foaf:name "Gregg Kellogg";
  foaf:knows <https://manu.sporny.org/#this> .
<https://manu.sporny.org/#this> a foaf:Person;
  foaf:name "Manu Sporny" .

Note that if the explicit @href is not present, i.e.,

<div typeof="foaf:Person" about="https://greggkellogg.net/foaf#me">
  <p property="foaf:name">Gregg Kellogg</p>
  <a rel="foaf:knows" typeof="foaf:Person">
    <span property="foaf:name">Manu Sporny</span>
  </a>
</div>

this results in

<https://greggkellogg.net/foaf#me> a foaf:Person;
  foaf:name "Gregg Kellogg";
  foaf:knows [ 
        a foaf:Person;
        foaf:name "Manu Sporny" 
  ].

Support for embedded RDF/XML

If the document includes embedded RDF/XML, as is the case with many SVG documents, and the RDF::RDFXML gem is installed, the reader will add extracted triples to the default graph.

For example:

<?xml version="1.0" encoding="UTF-8"?>
<svg width="12cm" height="4cm" viewBox="0 0 1200 400"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xml:base="http://example.net/"
    xmlns="http://www.w3.org/2000/svg" version="1.2" baseProfile="tiny">
  <desc property="dc:description">A yellow rectangle with sharp corners.</desc>
  <metadata>
    <rdf:RDF>
      <rdf:Description rdf:about="">
        <dc:title>Test 0304</dc:title>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
  <!-- Show outline of canvas using 'rect' element -->
  <rect x="1" y="1" width="1198" height="398"
        fill="none" stroke="blue" stroke-width="2"/>
  <rect x="400" y="100" width="400" height="200"
        fill="yellow" stroke="navy" stroke-width="10"  />
</svg>

generates the following Turtle:

@prefix dc: <http://purl.org/dc/terms/> .

<http://example.net/> dc:title "Test 0304" ;
  dc:description "A yellow rectangle with sharp corners." .

Support for embedded N-Triples or Turtle

If the document includes a <script> element having an @type attribute whose value matches that of a loaded RDF reader (text/ntriples and text/turtle are loaded if they are available), the data will be extracted and added to the default graph. For example:

<html>
  <body>
    <script type="text/turtle"><![CDATA[
       @prefix foo:  <http://www.example.com/xyz#> .
       @prefix gr:   <http://purl.org/goodrelations/v1#> .
       @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
       @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

       foo:myCompany
         a gr:BusinessEntity ;
         rdfs:seeAlso <http://www.example.com/xyz> ;
         gr:hasLegalName "Hepp Industries Ltd."^^xsd:string .
    ]]></script>
  </body>
</html>

generates the following Turtle:

   @prefix foo:  <http://www.example.com/xyz#> .
   @prefix gr:   <http://purl.org/goodrelations/v1#> .
   @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
   @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

   foo:myCompany
     a gr:BusinessEntity ;
     rdfs:seeAlso <http://www.example.com/xyz> ;
     gr:hasLegalName "Hepp Industries Ltd."^^xsd:string .

Support for Role Attribute

The processor will generate RDF triples consistent with the Role Attribute specification. For example:

<div id="heading1" role="heading">
  <p>Some contents that are a header</p>
</div>

generates the following Turtle:

@prefix xhv: <http://www.w3.org/1999/xhtml/vocab#> .
<#heading1> xhv:role xhv:heading.

Support for microdata

The RDFa reader will call out to RDF::Microdata::Reader if an @itemscope attribute is detected and the microdata reader is loaded. This avoids a common problem when pages contain both microdata and RDFa, and only one processor is run.

Support for value property

In an RDFa+HTML erratum, it was suggested that the @value attribute could be parsed to obtain a numeric literal; this is consistent with how it's treated in microdata+rdfa. This processor now parses the value of an @value attribute to determine if it is an xsd:integer, xsd:float, or xsd:double, and uses a plain literal otherwise. The datatype can be overridden using the @datatype attribute.
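
As a rough illustration of this kind of lexical sniffing, the sketch below classifies a string by its form. The regular expressions, the mapping of plain decimal notation to xsd:float, and the `value_datatype` name are all assumptions made for this example; the patterns the reader actually uses may differ.

```ruby
# Guess a numeric datatype from the lexical form of an @value string.
# Returns nil when the value should remain a plain literal.
def value_datatype(lexical)
  case lexical
  when /\A[+-]?\d+\z/                             then 'xsd:integer'
  when /\A[+-]?(\d+\.\d*|\.\d+)\z/                then 'xsd:float'  # assumed mapping
  when /\A[+-]?(\d+(\.\d*)?|\.\d+)[eE][+-]?\d+\z/ then 'xsd:double'
  end
end

%w[42 3.14 1e3 hello].each do |value|
  puts "#{value} => #{value_datatype(value) || 'plain literal'}"
end
```

An explicit @datatype would take precedence over whatever such a guess produces.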

Usage

Reading RDF data in the RDFa format

graph = RDF::Graph.load("etc/doap.html", format: :rdfa)

Reading RDF data with vocabulary expansion

graph = RDF::Graph.load("etc/doap.html", format: :rdfa, vocab_expansion: true)

or

graph = RDF::RDFa::Reader.open("etc/doap.html").expand

Reading Processor Graph

graph = RDF::Graph.load("etc/doap.html", format: :rdfa, rdfagraph: :processor)

Reading Both Processor and Output Graphs

graph = RDF::Graph.load("etc/doap.html", format: :rdfa, rdfagraph: [:output, :processor])

Writing RDF data using the XHTML+RDFa format

require 'rdf/rdfa'

RDF::RDFa::Writer.open("etc/doap.html") do |writer|
  writer << graph
end

Note that prefixes may be chained between Reader and Writer, so that the Writer will use the same prefix definitions found during parsing:

prefixes = {}
graph = RDF::Graph.load("etc/doap.html", prefixes: prefixes)
puts graph.dump(:rdfa, prefixes: prefixes)

Template-based Writer

The RDFa writer uses Haml templates for code generation. This allows fully customizable RDFa output in a variety of host languages. The default template generates human readable HTML5 output. A minimal template generates HTML, which is not intended for human consumption.

To specify an alternative Haml template, consider the following:

require 'rdf/rdfa'

RDF::RDFa::Writer.buffer(haml: RDF::RDFa::Writer::MIN_HAML) << graph

The template hash defines four Haml templates:

  • doc: Document Template, takes an ordered list of subjects and yields each one to be rendered. From {RDF::RDFa::Writer#render_document}:

    {include:RDF::RDFa::Writer#render_document}

    This template takes locals lang, prefix, base, title in addition to subjects to create output similar to the following:

    <!DOCTYPE html>
    <html prefix='xhv: http://www.w3.org/1999/xhtml/vocab#' xmlns='http://www.w3.org/1999/xhtml'>
      <head>
        <base href="http://example/">
        <title>Document Title</title>
      </head>
      <body>
        ...
      </body>
    </html>
    

    Options passed to the Writer are used to supply lang and base locals. prefix is generated based upon prefixes found from the default profiles, as well as those provided by a previous Reader. title is taken from the first top-level subject having an appropriate title property (as defined by the heading_predicates option).

  • subject: Subject Template, takes a subject and an ordered list of predicates and yields each predicate to be rendered. From {RDF::RDFa::Writer#render_subject}:

    {include:RDF::RDFa::Writer#render_subject}

    The template takes locals rel and typeof in addition to predicates and subject to create output similar to the following:

    <div resource="http://example/">
      ...
    </div>
    

    Note that if typeof is defined, in this template, it will generate a textual description.

  • property_value: Property Value Template, used for predicates having a single value; takes a predicate, and a single-valued Array of objects. From {RDF::RDFa::Writer#render_property}:

    {include:RDF::RDFa::Writer#render_property}

    In addition to predicate and objects, the template takes inlist to indicate that the property is part of an rdf:List.

    Also, if the predicate is identified as a heading predicate (via :heading_predicates option), it will generate a heading element, and may use the value as the document title.

    Each object is yielded to the calling block, and the result is rendered, unless nil. Otherwise, rendering depends on the type of object. This is useful for recursive document descriptions.

    Creates output similar to the following:

    <div class='property'>
      <span class='label'>
        xhv:alternate
      </span>
      <a property='xhv:alternate' href='http://rdfa.info/feed/'>http://rdfa.info/feed/</a>
    </div>
    

    Note the use of methods defined in {RDF::RDFa::Writer} useful in rendering the output.

  • property_values: Similar to property_value, but for predicates having more than one value. Locals are identical to property_value, but objects is expected to have more than one value. Described further in {RDF::RDFa::Writer#render_property}.

    In this case, an unordered list is used for output. Creates output similar to the following:

    <div class='property'>
      <span class='label'>
        xhv:bookmark
      </span>
      <ul rel='xhv:bookmark'>
        <li>
          <a href='http://rdfa.info/2009/12/12/oreilly-catalog-uses-rdfa/'>
            http://rdfa.info/2009/12/12/oreilly-catalog-uses-rdfa/
          </a>
        </li>
        <li>
          <a href='http://rdfa.info/2010/05/31/new-rdfa-checker/'>
            http://rdfa.info/2010/05/31/new-rdfa-checker/
          </a>
        </li>
      </ul>
    </div>
    

    If property_values does not exist, repeated values will be replicated using property_value.

  • Type-specific templates. To simplify generation of different output types, the template may contain elements indexed by a URI. When a subject with an rdf:type matching that URI is found, subsequent Haml definitions will be taken from the associated Hash. For example:

    {
      doc: "...",
      subject: "...",
      property_value: "...",
      property_values: "...",
      RDF::URI("http://schema.org/Person") => {
        subject: "...",
        property_value: "...",
        property_values: "..."
      }
    }
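
One way such a lookup could work is sketched below. This is not the Writer's actual code: `find_template` is a hypothetical helper, type keys are plain strings instead of `RDF::URI` objects so the example runs without the gem, and falling back to the top-level template when the type-specific Hash lacks an entry is an assumption.

```ruby
# A template hash with one type-specific override (placeholder strings
# stand in for real Haml source).
TEMPLATES = {
  subject: 'default subject template',
  property_value: 'default property template',
  'http://schema.org/Person' => { subject: 'person subject template' }
}.freeze

# Look up the template called +name+ for a subject with the given rdf:type
# values, preferring a type-specific definition when one exists.
def find_template(templates, name, types)
  override = types.map { |type| templates[type] }
                  .compact
                  .find { |hash| hash.key?(name) }
  (override || templates)[name]
end

puts find_template(TEMPLATES, :subject, ['http://schema.org/Person'])
puts find_template(TEMPLATES, :property_value, ['http://schema.org/Person'])
```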

Dependencies

Documentation

Full documentation available on Rubydoc.info

Principal Classes

  • {RDF::RDFa::Format}
  • {RDF::RDFa::Reader}
    • {RDF::RDFa::Reader::Nokogiri}
    • {RDF::RDFa::Reader::REXML}
  • {RDF::RDFa::Context}
  • {RDF::RDFa::Expansion}
  • {RDF::RDFa::Writer}

TODO

  • Add support for LibXML and REXML bindings, and use the best available
  • Consider a SAX-based parser for improved performance

Resources

Change Log

See Release Notes on GitHub

Author

Contributors

Contributing

This repository uses Git Flow to manage development and release activity. All submissions must be on a feature branch based on the develop branch to ease staging and integration.

  • Do your best to adhere to the existing coding conventions and idioms.
  • Don't use hard tabs, and don't leave trailing whitespace on any line.
  • Do document every method you add using YARD annotations. Read the tutorial or just look at the existing code for examples.
  • Don't touch the .gemspec, VERSION or AUTHORS files. If you need to change them, do so on your private branch only.
  • Do feel free to add yourself to the CREDITS file and the corresponding list in the README. Alphabetical order applies.
  • Do note that in order for us to merge any non-trivial changes (as a rule of thumb, additions larger than about 15 lines of code), we need an explicit public domain dedication on record from you, which you will be asked to agree to on the first commit to a repo within the organization. Note that the agreement applies to all repos in the Ruby RDF organization.

License

This is free and unencumbered public domain software. For more information, see https://unlicense.org/ or the accompanying UNLICENSE file.

FEEDBACK