
SparkDQ

A Spark extension to support Data Quality Assessment of Semantic Triples

Docs: https://raulrc.github.io/sparkdq/index.html#package

Interlinking metric assessment:

val graph = loadGraph(sparkSession, inputFile)
val depth = 3
val result = getMeasurementSubgraph(graph.vertices, graph, depth)
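The depth parameter bounds how many hops of each vertex's neighbourhood are explored. As a rough illustration of that idea (plain Scala over a toy adjacency map; this is not SparkDQ's API), the set of vertices reachable within a given depth:

```scala
// Hypothetical illustration only, not SparkDQ's API: vertices reachable
// from `start` in at most `depth` hops over a simple adjacency map.
def withinDepth(adj: Map[Int, Seq[Int]], start: Int, depth: Int): Set[Int] = {
  var frontier = Set(start)
  var seen = Set(start)
  for (_ <- 1 to depth) {
    // Expand one hop, skipping vertices already visited.
    frontier = frontier.flatMap(v => adj.getOrElse(v, Seq.empty)).diff(seen)
    seen ++= frontier
  }
  seen
}

val adj = Map(1 -> Seq(2), 2 -> Seq(3), 3 -> Seq(4))
withinDepth(adj, 1, 3)  // Set(1, 2, 3, 4)
```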

Schema Completeness assessment:

val graph = loadGraph(sparkSession, inputFile)
val properties = Seq("Property1", "Property2")
val result = getMeasurementSubgraph(graph.vertices, graph, properties)
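Schema completeness is commonly defined as the fraction of expected properties that an entity actually carries. A minimal sketch of that idea in plain Scala (illustrative only; SparkDQ computes the measurement over the graph):

```scala
// Hypothetical illustration only, not SparkDQ's API: fraction of the
// expected properties present on a single entity.
def schemaCompleteness(expected: Seq[String], present: Set[String]): Double =
  if (expected.isEmpty) 1.0
  else expected.count(present.contains).toDouble / expected.size

schemaCompleteness(Seq("Property1", "Property2"), Set("Property1"))  // 0.5
```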

Contextual assessment:

val graph = loadGraph(sparkSession, inputFile)
val depth = 3

val setLevels = udf((value: Double) => {
  if (value <= 0.34) "BAD"
  else if (value <= 0.67) "NORMAL"
  else "GOOD"
})

val result = applyRuleSet(getMeasurementSubgraph(graph.vertices, graph, depth),
  "measurement", "contextualAssessment", setLevels).toDF()
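The setLevels UDF buckets each measurement into three quality levels. The same thresholds can be checked in isolation as a plain function, outside Spark (a sketch of the UDF's logic only):

```scala
// Same thresholding logic as the setLevels UDF, as a plain function.
def level(value: Double): String =
  if (value <= 0.34) "BAD"
  else if (value <= 0.67) "NORMAL"
  else "GOOD"

level(0.2)  // "BAD"
level(0.5)  // "NORMAL"
level(0.9)  // "GOOD"
```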
