8.7.2019 | Utrecht, NL

Digital Humanities 2019 “Complexities”

XML2RDF

Extracting RDF statements from XML resources with XTriples

Slides: https://digicademy.github.io/ws-dh2019-xml2rdf/ | bit.ly/2Nrnpwi
Repo: https://github.com/digicademy/ws-dh2019-xml2rdf | bit.ly/2JgPGjW

Max Grüntgens | Thomas Kollatz @kol_t | @_epidat | @digicademy | Twitter digicademy | CC-BY 4.0

Abstract

The tutorial is aimed at scholars who want to acquaint themselves with a low-threshold workflow (generic, simple yet powerful, and explorative) for modelling, extracting, processing (queries, visualizations), and publishing structured data (LOD, Semantic Web) from heterogeneous XML sources by means of XPath. The instructors assume basic acquaintance with XML and XPath (both will be briefly reviewed, and cheat sheets will be provided).

Focal Points

Triple-Statement-Extraction from heterogeneous XML-Resources, Basic-Research-Workflow (Statement-Formulation, Statement-Extraction, Triple-Store-Import, SPARQL-Querying, Visualization).

Workshop Setup

Webservice: XTriples
XML-Editor: oXygen-XML-Editor (trial license), XTriples-Framework (for oXygen-Author-Mode), XTriples-Config-DTA for oXygen
Data: Der-Sturm-APIs (German)
Triple Store: XML2RDFDH2019
Viz Service: Rawgraphs.io

Table of Contents

  1. A Shamelessly Short Introduction to RDF
  2. XPath Recap
  3. XTriples Configuration
  4. RDF Extraction
  5. Querying with SPARQL
  6. Visualizing with Dariah GeoBrowser & Rawgraphs.io

01

A Shamelessly Short Introduction to RDF

Website: https://maurizzzio.github.io/greuler/#/

Important takeaways

😇 (At least try to) keep it simple (at first)!

02

XPath Recap

Important takeaways

Short exercise

Please download and open the file sturm_persons.xml, then query:
  1. all xml:ids of persons,
  2. all persons that have a @source attribute,
  3. all persons that have a @source attribute with a GND-URL as value.

Please check the XPath handout & use the XPath functions.
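
As a rough sketch of the three queries, the following Python snippet uses the standard library's ElementTree. ElementTree supports only a subset of XPath (no functions such as contains() or starts-with()), so the third query filters in Python instead. The sample document, its element names, IDs, and URLs are invented placeholders, not the content of sturm_persons.xml. In a full XPath engine such as the one in oXygen, the third query might read //person[contains(@source, 'd-nb.info/gnd')], assuming GND URLs in the data follow that pattern.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for sturm_persons.xml; all values are placeholders.
SAMPLE = """<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="P.1" source="http://d-nb.info/gnd/000000001"><persName>Person A</persName></person>
  <person xml:id="P.2" source="http://example.org/person/2"><persName>Person B</persName></person>
  <person xml:id="P.3"><persName>Person C</persName></person>
</listPerson>"""

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# ElementTree stores xml:id under the fully qualified XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

root = ET.fromstring(SAMPLE)

# 1. All xml:ids of persons (full XPath: //tei:person/@xml:id)
all_ids = [p.get(XML_ID) for p in root.findall(".//tei:person", NS)]

# 2. All persons that have a @source attribute (full XPath: //tei:person[@source])
with_source = root.findall(".//tei:person[@source]", NS)

# 3. Persons whose @source holds a GND URL; filtered in Python because
#    ElementTree has no contains() function.
with_gnd = [p for p in with_source if "d-nb.info/gnd/" in p.get("source")]

print(all_ids)
print([p.get(XML_ID) for p in with_source])
print([p.get(XML_ID) for p in with_gnd])
```

The predicate [@source] narrows the node set before any value test, which mirrors the step from exercise 2 to exercise 3.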

03

XTriples Configuration

XTriples configuration sections

Preliminary checklist

  1. In what kind of structure & format is my data available? Do I have to do conversion tasks beforehand?
  2. Are all important structures & entities in my data addressable by unique ids?
  3. Are there any webservices I want to use to augment my data? Do they provide responses in XML?
  4. What entities are perceivable in my data? What relationships are perceivable between these entities?
  5. What are my research questions?

Workshop data checklist

  1. All our data is structured in TEI-XML & available via categorized web APIs.
  2. All entities (letters, persons, places, works) are referenced with project unique XML-IDs.
  3. Some entities have authority data entries, e.g. GND numbers, so we can use the Lobid-GND-API to augment them.
  4. As noted above, the main entities available in our data are letters, persons, places, and works.
  5. We are, for example, interested in the persons, places, and works mentioned in letters.

Whiteboard friendly prototyping

Drawing all entities and their relationships as circles and arrows (nodes and edges) helps to conceptualize the abstract network of relationships we want to extract from our sources and subsequently project onto a common abstract metadata layer.
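
A whiteboard drawing of this kind translates directly into a list of (subject, predicate, object) triples. The sketch below is only an illustration: the node identifiers are invented placeholders, while sturm:sends and sturm:mentions are the predicates used in the workshop's SPARQL queries.

```python
from collections import defaultdict

# Hypothetical whiteboard sketch: each arrow is one (subject, predicate, object) triple.
# Node identifiers are placeholders, not real workshop data.
EDGES = [
    ("sturm:person_A", "sturm:sends", "sturm:letter_1"),
    ("sturm:letter_1", "sturm:mentions", "sturm:person_B"),
    ("sturm:letter_1", "sturm:mentions", "sturm:place_X"),
    ("sturm:letter_1", "sturm:mentions", "sturm:work_Y"),
]

def outgoing(edges):
    """Group edges by source node: node -> list of (predicate, target)."""
    graph = defaultdict(list)
    for subject, predicate, obj in edges:
        graph[subject].append((predicate, obj))
    return dict(graph)

def nodes(edges):
    """Return every node that appears as a subject or an object."""
    return sorted({s for s, _, _ in edges} | {o for _, _, o in edges})

graph = outgoing(EDGES)
print(nodes(EDGES))
print(graph["sturm:letter_1"])
```

Listing the nodes and their outgoing edges this way is a quick check that every circle on the whiteboard has an identifier and every arrow has a named predicate.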

04

RDF Extraction

First Extraction

Please download and open the file sturm_config_works_xtriples.xml and use it to extract the first triples. To do this, set the output format to XTriples (for debugging).
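
To make the idea of statement extraction concrete, here is a minimal Python sketch of the general principle: one path selects the resources, further paths select the parts of each statement. It is not XTriples itself and does not use its configuration format. The sample XML, the element names, and the function extract are assumptions made for illustration. Only the sturm namespace URI is taken from the SPARQL prefixes in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; not the actual Sturm data.
SAMPLE = """<listBibl xmlns="http://www.tei-c.org/ns/1.0">
  <bibl xml:id="W.1"><title>Work A</title></bibl>
  <bibl xml:id="W.2"><title>Work B</title></bibl>
</listBibl>"""

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
BASE = "https://xtriples.lod.academy/sturm/"  # namespace from the slides' SPARQL prefixes
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def escape(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def extract(xml_text, resource_path, label_path):
    """Yield one N-Triples line per resource: <id> rdfs:label "label" ."""
    root = ET.fromstring(xml_text)
    for resource in root.findall(resource_path, NS):
        label = resource.findtext(label_path, default="", namespaces=NS)
        yield f'<{BASE}{resource.get(XML_ID)}> <{RDFS_LABEL}> "{escape(label)}" .'

triples = list(extract(SAMPLE, ".//tei:bibl", "tei:title"))
print("\n".join(triples))
```

Inspecting the raw statements before serializing to RDF serves the same purpose as the debugging output format: mistakes in a path expression show up as missing or empty triples.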

Further Extractions

Use the XTriples config files in the repository and extract the RDF statements.

Upload to the RDF4J Triple Store

Upload the RDF files to the Triple Store.
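
Besides the web interface, an upload can also be scripted. The sketch below prepares (but does not send) an HTTP POST to the statements endpoint of an RDF4J repository, following the RDF4J REST API layout /repositories/{id}/statements. The server address is a hypothetical local installation, and the helper name build_upload_request is made up for this example. The repository name is the one listed in the workshop setup.

```python
import urllib.request

def build_upload_request(server, repository, rdf_bytes, content_type="text/turtle"):
    """Prepare (but do not send) a POST that adds statements to an RDF4J repository."""
    url = f"{server.rstrip('/')}/repositories/{repository}/statements"
    return urllib.request.Request(
        url,
        data=rdf_bytes,
        method="POST",
        headers={"Content-Type": f"{content_type}; charset=utf-8"},
    )

# Hypothetical server address; adjust to the workshop's triple store.
SERVER = "http://localhost:8080/rdf4j-server"
# Placeholder statement; a single N-Triples line is also valid Turtle.
turtle = (
    b"<https://xtriples.lod.academy/sturm/W.1> "
    b'<http://www.w3.org/2000/01/rdf-schema#label> "Work A" .\n'
)

request = build_upload_request(SERVER, "XML2RDFDH2019", turtle)
print(request.get_method(), request.full_url)

# To actually send it (requires a running RDF4J server):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

A POST adds the statements to what is already in the repository, so repeating the upload for each extracted file accumulates all triples in one store.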

05

Querying with SPARQL

Basic SPARQL


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?letter ?letterlabel
WHERE
{
    ?letter rdfs:label ?letterlabel .
}
            

Basic SPARQL Constructs


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sturm: <https://xtriples.lod.academy/sturm/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/> 

SELECT ?letter ?letterlabel ?mentiontypelabel ?gender
WHERE
{
    ?letter sturm:mentions ?mention .
    ?letter rdfs:label ?letterlabel .
    ?mention rdf:type ?mentiontype .
    ?mentiontype rdfs:label ?mentiontypelabel .

    OPTIONAL {
        ?mention rdf:type foaf:Person .
        ?mention foaf:gender ?gender .
    }
}
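
What OPTIONAL does to the result table can be pictured as a left join: every solution of the required patterns is kept, and it is extended only where the optional block matches. The following Python sketch imitates that behaviour on hand-written bindings. The data and the helper names compatible and left_join are invented for illustration, and this is a simplified picture rather than a SPARQL engine.

```python
# Hypothetical bindings produced by the required patterns of the query.
required = [
    {"letter": "letter_1", "mention": "person_B", "mentiontypelabel": "Person"},
    {"letter": "letter_1", "mention": "place_X", "mentiontypelabel": "Place"},
]
# Hypothetical bindings produced by the OPTIONAL block (only persons have a gender).
optional = [{"mention": "person_B", "gender": "female"}]

def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def left_join(left, right):
    """Extend each left solution with matching right solutions; keep it unchanged if none match."""
    result = []
    for row in left:
        matches = [{**row, **extra} for extra in right if compatible(row, extra)]
        result.extend(matches or [row])
    return result

rows = left_join(required, optional)
for row in rows:
    # ?gender stays unbound (None here) for mentions that are not persons.
    print(row["mention"], row.get("gender"))
```

Without OPTIONAL, the place mention would drop out of the result entirely because it has no foaf:gender statement.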
            

Basic SPARQL Functions


PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sturm: <https://xtriples.lod.academy/sturm/>

SELECT ?letterlabel ?desc ?long ?lat ?place
WHERE
{
    ?letter rdfs:label ?letterlabel .
    …

    OPTIONAL {
        ?sender sturm:sends ?letter .
        ?letter rdfs:label ?letterlabel .
        ?letter rdf:type ?lettertype .
        ?lettertype rdfs:label ?lettertypelabel .
        ?sender rdfs:label ?sendername .
        BIND(CONCAT(STR(?lettertypelabel),
            " ",
            STR(?letterlabel),
            " sent from ",
            STR(?sendername)) AS ?desc) .
    }
    …
}
            

Query the Triple Store using the SPARQL queries above
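
Queries can also be sent from a script via the SPARQL protocol. The sketch below builds a GET request with the query as a URL parameter and flattens a SPARQL JSON result into plain rows. The endpoint address is a hypothetical local RDF4J installation, the sample response is hand-written, and the helper names are made up for this example. Nothing is sent over the network.

```python
import urllib.parse
import urllib.request

QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?letter ?letterlabel
WHERE { ?letter rdfs:label ?letterlabel . }"""

def build_query_request(endpoint, query):
    """Prepare (but do not send) a SPARQL protocol GET request asking for JSON results."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )

def rows_from_results(results):
    """Flatten a SPARQL JSON result document into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]

# Hypothetical endpoint; adjust to the workshop's triple store.
ENDPOINT = "http://localhost:8080/rdf4j-server/repositories/XML2RDFDH2019"
request = build_query_request(ENDPOINT, QUERY)

# Hand-written example of the SPARQL JSON results format (placeholder values).
SAMPLE_RESPONSE = {
    "head": {"vars": ["letter", "letterlabel"]},
    "results": {"bindings": [
        {"letter": {"type": "uri", "value": "https://xtriples.lod.academy/sturm/letter_1"},
         "letterlabel": {"type": "literal", "value": "Letter 1"}},
    ]},
}
rows = rows_from_results(SAMPLE_RESPONSE)
print(request.full_url)
print(rows)
```

Rows in this flat form are easy to write out as CSV for a visualization service.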

06

Visualizing with Dariah GeoBrowser & Rawgraphs.io

Visualization Services

Dariah GeoBrowser

Rawgraphs.io

F I N I S

Thank you
