
Tutorial: How to Use Ontobee SPARQL to Query the RDF Triple Store

The Ontobee program is built on SPARQL technology. This web page provides an introduction to the technology, examples of how to use SPARQL to query ontology data stored in our RDF triple store, and some relevant references and web links.

 

Table of Contents

  1. Introduction of SPARQL and RDF triple store
  2. Basic SPARQL query programming skills
    1. Query structure
    2. Common prefixes
    3. How to select?
    4. How to define from <...>?
    5. How to program inside where?
    6. How to use modifiers?
    7. Web resources for learning SPARQL programming
  3. SPARQL Examples to Query Ontobee RDF Triple Store
    1. Find all class-containing ontology graphs
    2. Find subclasses of an ontology term
    3. Find the number of all class (or object, annotation, or datatype property) terms of an ontology
    4. Retrieve definition and authors of all classes in an ontology
    5. Retrieve general annotations of an ontology
    6. Query the number of human tRNA genes
    7. Find the number of mouse genes associated with mitochondrial DNA repair
    8. Query on axiom: Find vaccines containing egg protein allergen
    9. Selected papers citing Ontobee SPARQL.
  4. Frequently Asked Questions (FAQs)
  5. References and Web Links

1. Introduction of SPARQL and RDF triple store:

RDF stands for Resource Description Framework. RDF is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model (https://www.w3.org/RDF/). The RDF data model makes statements about resources (in particular web resources) in the form of expressions known as triples. RDF triples follow a subject–predicate–object structure. The subject denotes the resource, and the predicate denotes a trait or aspect of the resource and expresses a relationship between the subject and the object.

SPARQL (pronounced "sparkle") is a recursive acronym that stands for SPARQL Protocol and RDF Query Language. The current version of SPARQL is 1.1 (https://www.w3.org/TR/sparql11-query/). The earlier version was 1.0 (https://www.w3.org/TR/rdf-sparql-protocol/).

Ontobee uses the Hegroup RDF triple store, which is built with the Virtuoso Open-Source Edition software.

 

2. Basic SPARQL query programming skills:

2.1. Query structure:

A typical query has the following structure; some parts are optional:
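
The layout can be sketched as below, following the order of clauses in the SPARQL 1.1 grammar. This is an illustrative query rather than the listing originally shown here; the graph URI is the VO graph used in Section 3, and the variable names are arbitrary.

```sparql
# Prologue (optional): prefix declarations
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>

# Query form and result variables (required)
SELECT DISTINCT ?s ?label
# Dataset clause (optional): the graph to query
FROM <http://purl.obolibrary.org/obo/merged/VO>
# Graph pattern (required)
WHERE {
  ?s a owl:Class .
  ?s rdfs:label ?label .
}
# Solution modifiers (optional)
ORDER BY ?label
LIMIT 10
```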

2.2. Common Prefixes:
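
A sketch of prefix declarations that are likely to be useful for Ontobee queries. The rdf, rdfs, owl, and xsd namespaces are the standard W3C ones; obo-term is the prefix name used in the examples of Section 3, and any other name could be chosen for it.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
# Namespace shared by OBO term IRIs, e.g. obo-term:OAE_0000001
PREFIX obo-term: <http://purl.obolibrary.org/obo/>

# A prefixed name expands to the full IRI:
# obo-term:OAE_0000001 is <http://purl.obolibrary.org/obo/OAE_0000001>
SELECT ?label
FROM <http://purl.obolibrary.org/obo/merged/OAE>
WHERE {
  obo-term:OAE_0000001 rdfs:label ?label .
}
```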

2.3. How to select?

Similar to its use in SQL, SELECT in SPARQL lets you define which variables have their values returned in the output. As in SQL, you can list the variables individually or use an asterisk (*) to return the values of every variable.
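
A minimal sketch of both forms, using the VO graph from Section 3 as an assumed target; the variable names are arbitrary.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>

# Return only the listed variables.
# Writing "SELECT *" instead would return every variable
# that appears in the WHERE clause (here also ?s and ?label).
SELECT ?s ?label
FROM <http://purl.obolibrary.org/obo/merged/VO>
WHERE {
  ?s a owl:Class .
  ?s rdfs:label ?label .
}
LIMIT 10
```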

If you don't want duplicates, you can add DISTINCT after SELECT.
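
As an illustration (not the original example), the query below lists each rdf:type value used in the VO graph once. Without DISTINCT, the same type would be returned once per typed resource.

```sparql
# List every type that occurs in the graph, without repeats
SELECT DISTINCT ?type
FROM <http://purl.obolibrary.org/obo/merged/VO>
WHERE {
  ?s a ?type .
}
```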

SPARQL 1.1 makes it possible to apply aggregate functions to selected variables. The most straightforward of these is COUNT.

Other aggregate functions include SUM, AVG, MAX, and MIN.

Note that these functions in the SELECT clause are quite basic: used on their own, they return just a single row of results. The GROUP BY modifier (Section 2.6) allows aggregation per group, for example per subject.
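
A minimal sketch of COUNT, assuming the VO graph from Section 3. The SPARQL 1.1 grammar writes an aggregate in the SELECT clause as a parenthesized expression with an AS alias; the alias name ?class_count is arbitrary.

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>

# Returns a single row holding the number of distinct classes
SELECT (COUNT(DISTINCT ?s) AS ?class_count)
FROM <http://purl.obolibrary.org/obo/merged/VO>
WHERE {
  ?s a owl:Class .
}
```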

References:

2.4. How to define from <...>?

For an Ontobee query, the question is how to find the ontology graph URI in the Ontobee RDF triple store (i.e., the Hegroup triple store).

The general naming pattern for the graph URI is to transform the PURL http://purl.obolibrary.org/obo/$foo.owl (note that foo must be all lowercase by OBO conventions) into http://purl.obolibrary.org/obo/merged/uppercase($foo). However, this pattern is not always followed. Until a formal rule is set up, if the default naming pattern does not work, run Example 1 at http://www.ontobee.org/sparql to list the graph URIs and find the right one. See more in the related Ontobee-discuss item.
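
As an illustration of the default pattern, the PURL http://purl.obolibrary.org/obo/vo.owl maps to the graph URI http://purl.obolibrary.org/obo/merged/VO, which is the graph used in several examples in Section 3. The sketch below places that URI in the FROM clause; whether a given ontology follows the pattern is an assumption that should be checked as described above.

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?s
# vo.owl  ->  merged/VO  (lowercase file name, uppercase graph name)
FROM <http://purl.obolibrary.org/obo/merged/VO>
WHERE {
  ?s a owl:Class .
}
LIMIT 5
```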

2.5. How to program inside where?

The WHERE clause contains the graph patterns that are matched against the data to find values for the variables named in the SELECT clause.

2.5.1. Triple representations.

The basic unit is a triple, which is made up of three components, a subject, a predicate and an object. Each of the three components can take one of two forms:

These triples can then be concatenated together using a full-stop (.). Eventually, an interconnected graph of nodes joined by relationships is built up.
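
A sketch of two joined triple patterns, assuming that the two forms a component can take are a variable (written with a leading ?) or a fixed term (an IRI, or a literal in the object position). The term and graph are those of Example 2 in Section 3.

```sparql
PREFIX rdfs:     <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo-term: <http://purl.obolibrary.org/obo/>

SELECT ?x ?label
FROM <http://purl.obolibrary.org/obo/merged/OAE>
WHERE {
  # Subject is a variable; predicate and object are fixed IRIs
  ?x rdfs:subClassOf obo-term:OAE_0000001 .
  # The shared variable ?x joins this triple to the previous one
  ?x rdfs:label ?label .
}
```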

A semicolon (;) can be used at the end of a triple to reuse its subject for the next triple. In this way, you only need to give the predicate and object of the next triple. e.g.,

?x rdf:type mebase:User ;
     foaf:homepage ?homepage ;
     foaf:mbox ?mbox

When you have the same predicate, you can use a comma (,) to separate each object, e.g.,

. ore:aggregates <workflow181>, <workflow246>

You can use 'a' rather than rdf:type to specify the type of an entity.
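
The shorthand forms can be combined. The sketch below, which assumes the OAE graph and term from Example 2, writes three triples about the same subject with semicolons and the 'a' keyword; the comments give the expanded form.

```sparql
PREFIX rdfs:     <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:      <http://www.w3.org/2002/07/owl#>
PREFIX obo-term: <http://purl.obolibrary.org/obo/>

SELECT ?x ?label
FROM <http://purl.obolibrary.org/obo/merged/OAE>
WHERE {
  ?x a owl:Class ;                               # ?x rdf:type owl:Class .
     rdfs:subClassOf obo-term:OAE_0000001 ;      # ?x rdfs:subClassOf obo-term:OAE_0000001 .
     rdfs:label ?label .                         # ?x rdfs:label ?label .
}
```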

2.5.2. Clauses

The UNION clause returns results that match at least one of several alternative patterns. Note that the OGG paper (http://ceur-ws.org/Vol-1327/icbo2014_paper_23.pdf) provides a good example (Fig. 7) of using UNION.
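
A minimal UNION sketch (not the query from the OGG paper), assuming the VO graph: it returns terms that are either classes or object properties.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?s ?label
FROM <http://purl.obolibrary.org/obo/merged/VO>
WHERE {
  # A term matches if it satisfies either branch
  { ?s a owl:Class } UNION { ?s a owl:ObjectProperty }
  ?s rdfs:label ?label .
}
LIMIT 20
```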

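The OPTIONAL clause is also often used. It adds bindings when its pattern matches but does not discard a result when the pattern does not match.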
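
A sketch of OPTIONAL, assuming the VO graph and the IAO_0000115 ("definition") annotation used in Example 4: classes without a definition are still returned, with ?definition left unbound.

```sparql
PREFIX rdfs:     <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:      <http://www.w3.org/2002/07/owl#>
PREFIX obo-term: <http://purl.obolibrary.org/obo/>

SELECT ?s ?label ?definition
FROM <http://purl.obolibrary.org/obo/merged/VO>
WHERE {
  ?s a owl:Class .
  ?s rdfs:label ?label .
  # Keep the class even when it has no definition
  OPTIONAL { ?s obo-term:IAO_0000115 ?definition }
}
LIMIT 20
```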

The FILTER clause filters the results based on certain conditions.
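
A sketch of FILTER with the regex function, assuming the VO graph. The search word "influenza" is an arbitrary choice for illustration, and the "i" flag makes the match case-insensitive.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>

SELECT ?s ?label
FROM <http://purl.obolibrary.org/obo/merged/VO>
WHERE {
  ?s a owl:Class .
  ?s rdfs:label ?label .
  # str() turns the label into a plain string before matching
  FILTER regex(str(?label), "influenza", "i")
}
LIMIT 20
```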

Reference:

2.6. How to use modifiers?

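A solution sequence modifier is one of: ORDER BY, projection (SELECT), DISTINCT, REDUCED, OFFSET, or LIMIT. GROUP BY, described next, is commonly used alongside them.

GROUP BY: allows aggregation over one or more properties.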

ORDER BY: e.g., ORDER BY DESC(?added), ORDER BY DESC(xsd:nonNegativeInteger(?downloaded))
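
A sketch that combines GROUP BY, ORDER BY, and LIMIT, assuming the VO graph: it counts the resources of each type and returns the five most frequent types. The alias ?n is arbitrary.

```sparql
SELECT ?type (COUNT(?s) AS ?n)
FROM <http://purl.obolibrary.org/obo/merged/VO>
WHERE {
  ?s a ?type .
}
GROUP BY ?type      # one result row per type
ORDER BY DESC(?n)   # largest count first
LIMIT 5             # keep only the first five rows
```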

References:

2.7. Web resources for learning SPARQL programming:

The W3C SPARQL recommendations:

More resources on how to learn SPARQL programming:

 

3. SPARQL Examples to Query Ontobee RDF Triple Store:

This section provides several examples of how to query the Ontobee RDF triple store:

(i). Example #1: Find all class-containing ontology graphs:

This example finds all ontologies in our RDF triple store. Typically, every ontology includes at least one class, so this SPARQL script searches for the ontology graphs in the RDF triple store that contain at least one class.

Below is the SPARQL script:


  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  PREFIX owl: <http://www.w3.org/2002/07/owl#>
  SELECT DISTINCT ?graph_uri
  WHERE
  {
    GRAPH ?graph_uri { ?s rdf:type owl:Class } .
  }
  

This script is shown on the default screen of the Ontobee SPARQL query website and is also listed there as the first example. Note that the rdf and owl prefix definitions can often be omitted, since the system uses them by default.

(ii). Example #2: Find subclasses of an ontology term:

This example finds all direct subclasses of an ontology term.

Below is the SPARQL script:


  PREFIX obo-term: <http://purl.obolibrary.org/obo/>
  SELECT DISTINCT ?x ?label
  FROM <http://purl.obolibrary.org/obo/merged/OAE>
  WHERE
  {
    ?x rdfs:subClassOf obo-term:OAE_0000001.
    ?x rdfs:label  ?label. }

This script is listed as the second example on the Ontobee SPARQL query website. Note: to make the search recursive, add the string "option (transitive)" after "?x rdfs:subClassOf obo-term:OAE_0000001". This addition makes the query search the whole branch (not only the direct children) of the term OAE_0000001. To count how many subclasses there are, refer to several of the following examples.
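
The "option (transitive)" keyword is a Virtuoso extension. On endpoints that support SPARQL 1.1 property paths, a similar whole-branch search can be sketched with the + path operator. This is an alternative technique, not the form used on the Ontobee website, and its behavior on the Ontobee endpoint has not been verified here.

```sparql
PREFIX rdfs:     <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo-term: <http://purl.obolibrary.org/obo/>

SELECT DISTINCT ?x ?label
FROM <http://purl.obolibrary.org/obo/merged/OAE>
WHERE {
  # "+" follows rdfs:subClassOf one or more times,
  # so ?x ranges over direct and indirect subclasses
  ?x rdfs:subClassOf+ obo-term:OAE_0000001 .
  ?x rdfs:label ?label .
}
```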

(iii). Example #3: Find the number of all class (or object, annotation, or datatype property) terms of an ontology:

This example finds the number of all class terms of an ontology.

Below is the SPARQL script:


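  # The parenthesized (... AS ...) form follows the SPARQL 1.1 grammar
  SELECT (count(?s) AS ?VO_class_count)
  FROM <http://purl.obolibrary.org/obo/merged/VO>
  WHERE
  {
    ?s a owl:Class .
    ?s rdfs:label ?label .

    # str() converts the term IRI to a string so that regex can match it
    FILTER regex( str(?s), "VO_" )
  }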
  

This script is listed as the third example on the Ontobee SPARQL query website.

This script can be easily modified to find other types of ontology terms. Here are some instructions:
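
One way to adapt the query, sketched under the assumption that the other term types are declared with the standard OWL types: replace owl:Class with owl:ObjectProperty, owl:AnnotationProperty, or owl:DatatypeProperty. The alias name below is arbitrary.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>

SELECT (COUNT(?s) AS ?VO_object_property_count)
FROM <http://purl.obolibrary.org/obo/merged/VO>
WHERE {
  # Swap in owl:AnnotationProperty or owl:DatatypeProperty as needed
  ?s a owl:ObjectProperty .
  ?s rdfs:label ?label .
  # Keep only terms whose IRI contains the VO prefix
  FILTER regex(str(?s), "VO_")
}
```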

(iv). Example #4: Retrieve definition and authors of all classes in an ontology:

This example retrieves the definitions and authors of all classes in an ontology that have both annotations. In this script, the OBO IAO ontology annotation terms IAO_0000115 ("definition") and IAO_0000117 ("term editor", i.e., the author of the term) are used.

Below is the SPARQL script:


  PREFIX obo-term: <http://purl.obolibrary.org/obo/>
  SELECT ?s ?label ?definition ?author
  FROM <http://purl.obolibrary.org/obo/merged/VO>
  WHERE
  {
    ?s a owl:Class .
    ?s rdfs:label ?label .
    ?s obo-term:IAO_0000115 ?definition .
    ?s obo-term:IAO_0000117 ?author . }
  

This script is listed as the fourth example on the Ontobee SPARQL query website.

 

(v). Example #5: Retrieve general annotations of an ontology:

This example retrieves the general annotations of an ontology. The result includes different types of annotations, such as "creator" and "description".

Below is the SPARQL script:


  PREFIX owl: <http://www.w3.org/2002/07/owl#>
  # Result variables are separated by spaces, not commas
  SELECT DISTINCT ?p ?o
  FROM <http://purl.obolibrary.org/obo/merged/OAE>
  WHERE {
    ?s a owl:Ontology .
    ?s ?p ?o .
  }
  

This script is listed as the fifth example on the Ontobee SPARQL query website.

 

(vi). Example #6: Query the number of human tRNA genes (OGG_2010009606):

This example finds the number of all class terms under the human tRNA gene type (OGG_2010009606) in the OGG ontology.

Below is the SPARQL script:


  PREFIX obo-term: <http://purl.obolibrary.org/obo/>
  # The parenthesized (... AS ...) form follows the SPARQL 1.1 grammar
  SELECT (count(DISTINCT ?x) AS ?count)
  FROM <http://purl.obolibrary.org/obo/merged/OGG>
  WHERE {
    ?x rdfs:subClassOf obo-term:OGG_2010009606 .
    ?x a owl:Class .
  }
  

This script is listed as the sixth example on the Ontobee SPARQL query website. Note that this example came from the OGG paper: http://ceur-ws.org/Vol-1327/icbo2014_paper_23.pdf.

 

(vii). Example #7: Count the number of mouse genes associated with mitochondrial DNA repair:

This example finds the number of mouse genes associated with GO 'mitochondrial DNA repair' (GO_0043504). The query fetches the text of the "has GO association" (OGG_0000000029) annotation for each gene from OGG, and then retrieves those genes that have the GO ID "GO_0043504" in their annotation.

Below is the SPARQL script:


  PREFIX obo-term: <http://purl.obolibrary.org/obo/>
  # The parenthesized (... AS ...) form follows the SPARQL 1.1 grammar
  SELECT (count(DISTINCT ?s) AS ?gene_count)
  FROM <http://purl.obolibrary.org/obo/merged/OGG-Mm>
  FROM <http://purl.obolibrary.org/obo/merged/GO>
  WHERE {
    # Note: Get OGG-Mm genes associated with GO_0043504
    ?s a owl:Class .
    ?s rdfs:label ?labelogg .
    ?s obo-term:OGG_0000000029 ?annotation .
    FILTER regex(?annotation, "GO_0043504")
  }
  

This script is listed as the seventh example on the Ontobee SPARQL query website.

The answer to the query is 3. To see exactly which three genes they are, replace the SELECT line of the query above with "SELECT DISTINCT ?s ?labelogg".

 

(viii). Example #8: Query on axiom: Find vaccines containing egg protein allergen:

The above examples are mostly based on ontology annotation properties. SPARQL queries can also be based on logical axioms with object properties (or relations). Here we provide some examples.

Below is one such SPARQL script:

  
    PREFIX has_vaccine_allergen: <http://purl.obolibrary.org/obo/VO_0000531>
    PREFIX chicken_egg_protein_allergen: <http://purl.obolibrary.org/obo/VO_0000912>   
    SELECT distinct ?vaccine_label ?vaccine 
    FROM <http://purl.obolibrary.org/obo/merged/VO>
    WHERE {
        ?vaccine rdfs:label ?vaccine_label .
        ?vaccine rdfs:subClassOf ?vaccine_restriction .
        ?vaccine_restriction owl:onProperty has_vaccine_allergen:; owl:someValuesFrom chicken_egg_protein_allergen: .
	}      

This script is listed as the eighth example on the Ontobee SPARQL query website. Note that ?vaccine_restriction is in essence an owl:Restriction. If you add the line " ?vaccine_restriction a owl:Restriction.", the result will be the same. The answer to the query can be obtained by executing it on the query website. This SPARQL query is the same as Figure 6 of the VICO paper (PMID:27099700).

Note that the following code has the same effect as the query above:

          
    Prefix obo: <http://purl.obolibrary.org/obo/>   
    SELECT distinct ?label ?s
    From <http://purl.obolibrary.org/obo/merged/VO>
    Where {
     ?s rdfs:label ?label .
     ?s rdfs:subClassOf ?s1 .
     ?s1 owl:onProperty obo:VO_0000531; owl:someValuesFrom obo:VO_0000912 .
     }   

The only difference between the two queries above is readability: the first one uses descriptive prefix names and is therefore easier to read.

How does this work? It helps to look at the original VO OWL code for one of the answers, VO_0000006 (i.e., Afluria).

   <!-- http://purl.obolibrary.org/obo/VO_0000006 --> 
    <owl:Class rdf:about="&obo;VO_0000006"> 
		... ...
        <rdfs:subClassOf> 
            <owl:Restriction>
                <owl:onProperty rdf:resource="&obo;VO_0000531"/> 
                <owl:someValuesFrom rdf:resource="&obo;VO_0000912"/> 
            </owl:Restriction>
        </rdfs:subClassOf> 
       ... ... 
    </owl:Class>     

Clearly, both SPARQL queries mirror the structure of the VO OWL code. Note also that the above queries do not use <owl:Restriction> directly. To use it, we can update the code as follows, with the same outcome:

          
    Prefix obo: <http://purl.obolibrary.org/obo/>   
    SELECT distinct ?label ?s
    From <http://purl.obolibrary.org/obo/merged/VO>
    Where {
     ?s rdfs:label ?label .
     ?s rdfs:subClassOf ?restriction .
     ?restriction a owl:Restriction .
     ?restriction owl:onProperty obo:VO_0000531 .
     ?restriction owl:someValuesFrom obo:VO_0000912 .
     } 

Note that the VICO GitHub file: https://github.com/VICO-ontology/VICO/blob/master/docs/developer/SPARQL%20query%20scripts.txt contains more examples.

 

(ix). Selected papers citing Ontobee SPARQL:

The following is a list of peer-reviewed articles that report the use of Ontobee SPARQL. Each paper usually includes a figure with a screenshot of Ontobee SPARQL usage. Together they show various ways of using Ontobee SPARQL in different applications.

 

4. Frequently Asked Questions (FAQs):

 

5. References and Web Links:

 

History of Document Preparation: