ADHD-KG

A knowledge graph of adult ADHD that integrates several medical resources: PubMed, clinical trials, DrugBank, SIDER, MeSH and semantic annotations generated by scispaCy. This document describes the first release of the ADHD-KG.

Data and ADHD-KG design

Using semantic web technologies, we build a knowledge graph that integrates several data sources into a well-structured network of knowledge, allowing exploration of various aspects of adult ADHD. The following table lists the contents of each folder in the repository and explains how they are reflected as knowledge resources in the ADHD-KG.

| Folder name | Data set | Format | Description |
| --- | --- | --- | --- |
| CT | Clinical Trials | .n3 | A set of RDF triples describing 660 medical studies conducted on adults with ADHD. Each study is described with numerous fields, ranging from general data such as identifier, description, summary and keywords to study-specific information, including the disease being studied, the associated interventions and drugs, and links to external biomedical sources relevant to the study. |
| MeSH | Medical Subject Headings | .nt | An extensive thesaurus of medical concepts. Triples are organized using the Simple Knowledge Organization System (SKOS) to express hierarchical relations among concepts. |
| PubMed | PubMed Publications | .n3 | A set of 9537 publications on adult ADHD expressed as triples. Each resource is described with basic information including title, authors, publishing venue and date, abstract, PubMed ID, digital object identifier and keywords. |
| SIDER | Side Effect Resource | .n3 | A collection of recorded adverse effects caused by marketed drugs. It includes 1430 drug entries expressed as RDF triples, covering drug names, adverse reactions, and their recorded frequency and classification according to MedDRA. |
| DrugBank | DrugBank | .n3 | A detailed database of drug data that includes comprehensive drug target information. The database is converted into an RDF representation containing 14594 drug entries that reflect the built-in DrugBank schema. Each drug is associated with information such as naming, description, classification (e.g., stimulant), drug interactions and further chemical or pharmaceutical details. |
| SemanticAnnotations | Custom-generated | .n3 | This data set is a product of the integration procedure, specifically the semantic annotation of free medical text using scispaCy. It links free text found in the titles and abstracts of PubMed publications and clinical trials to medical concepts defined in MeSH. In the underlying schema, PubMed or clinical trial resources are connected to Semantic Annotation instances, which in turn are described by the actual span in the text and a reference to a MeSH concept. |

Semantic links are introduced to connect the datasets described in the table above, resulting in the architectural design shown below:

[Figure: architectural design]

How to set up the ADHD-KG

A local instance of the ADHD-KG can be set up in any data management system specialized in RDF data (a triplestore). Our experiments were conducted on the GraphDB platform, a knowledge management system for storing, representing and querying RDF data. Below, we provide a step-by-step guide to building a local copy of the ADHD-KG using GraphDB.

  1. Download and install GraphDB.
  2. Import the contents of each folder into a named graph, following the naming convention shown below, where $base$ stands for the base URI shared by the named graphs.

| Folder name | Named graph |
| --- | --- |
| CT | $base$/ctrials |
| MeSH | $base$/mesh |
| sider | $base$/sider |
| DrugBank | $base$/dbcomplete |
| PubMed | $base$/pubmed |
| SemanticAnnotations | $base$/semAn |

  3. Done! You can now issue queries using the SPARQL endpoint.
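
The import in step 2 can also be scripted. The sketch below is one possible approach, not the procedure used by the authors. It assumes a GraphDB server at http://localhost:7200, a repository named adhd-kg and a base URI http://example.org/adhd-kg (all three hypothetical), and it assumes the server exposes the RDF4J-style graph store endpoint /repositories/{repo}/rdf-graphs/service. Folder names follow the tables above and may need adjusting to match your checkout; the MIME types are likewise assumptions.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import Request, urlopen

# Folder -> named-graph suffix, following the naming convention above.
GRAPH_SUFFIX = {
    "CT": "ctrials",
    "MeSH": "mesh",
    "sider": "sider",
    "DrugBank": "dbcomplete",
    "PubMed": "pubmed",
    "SemanticAnnotations": "semAn",
}

# Assumed MIME types for the two serializations found in the repository.
CONTENT_TYPE = {".n3": "text/n3", ".nt": "application/n-triples"}


def named_graph(base: str, folder: str) -> str:
    """Return the named-graph URI ($base$/<suffix>) for a repository folder."""
    return f"{base.rstrip('/')}/{GRAPH_SUFFIX[folder]}"


def upload_request(server: str, repo: str, base: str, folder: str,
                   filename: str, data: bytes) -> Request:
    """Build, without sending, a POST that adds one file to its named graph."""
    graph = quote(named_graph(base, folder), safe="")
    url = f"{server.rstrip('/')}/repositories/{repo}/rdf-graphs/service?graph={graph}"
    return Request(
        url,
        data=data,
        headers={"Content-Type": CONTENT_TYPE[Path(filename).suffix]},
        method="POST",
    )


def upload_all(server: str, repo: str, base: str, root: str) -> None:
    """Send every .n3/.nt file in each known folder (performs network I/O)."""
    for folder in GRAPH_SUFFIX:
        for path in sorted(Path(root, folder).glob("*")):
            if path.suffix in CONTENT_TYPE:
                req = upload_request(server, repo, base, folder,
                                     path.name, path.read_bytes())
                with urlopen(req) as resp:
                    print(path, resp.status)
```

Calling upload_all("http://localhost:7200", "adhd-kg", "http://example.org/adhd-kg", ".") from the repository root would attempt the full import. Each file is read into memory before sending, so for the larger dumps the import dialog of the GraphDB Workbench may be the more practical route.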

Querying the Knowledge Graph

Queries against the ADHD-KG can be issued through the built-in SPARQL endpoint. The example below retrieves the comorbid mental disorders most frequently mentioned in the ADHD literature.

step 1: prefixes & namespaces

Each data resource included in the ADHD-KG conforms to a custom schema specified by its data provider. We preserve the original format of the individual resources by reusing predefined namespaces or introducing new ones associated with the data provider. Information related to data integration is kept in separate namespaces. Finally, for convenience, we introduce a short prefix for the base URI of the named graphs.

# RDF and XML namespaces
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# external knowledge resources
# pubmed namespace
PREFIX pm: <https://pubmed.ncbi.nlm.nih.gov/>

# MeSH namespace
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
PREFIX mesh: <http://id.nlm.nih.gov/mesh/2021/>

# DrugBank namespace
PREFIX db: <http://www.drugbank.ca/>

# Clinical Trials namespace
PREFIX ct: <https://clinicaltrials.gov/>

# SIDER namespace
PREFIX si: <http://sideeffects.embl.de/>

# custom namespace - includes triples that connect the various data sources listed above
# direct pattern- and concept-based links
PREFIX adhd: <http://example.com/>

# text-based links
PREFIX sa: <http://example.com/semantic_annotations/>

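# helper namespace - !! replace $base$ with the base URI used when creating the named graphs
# (the IRI must be wrapped in <> and end with / so that graph:mesh expands to $base$/mesh)
PREFIX graph: <$base$/>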
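
For the graph: prefix to resolve, its IRI has to be wrapped in angle brackets and end with a slash, so that a name such as graph:mesh expands to $base$/mesh. The following minimal Python sketch shows that substitution; the helper names and the base URI http://example.org/adhd-kg are hypothetical and only serve as an illustration.

```python
def graph_prefix(base: str) -> str:
    """Render the PREFIX line for the named-graph helper namespace."""
    return f"PREFIX graph: <{base.rstrip('/')}/>"


def expand(base: str, prefixed_name: str) -> str:
    """Expand a prefixed name such as 'graph:mesh' into a full named-graph URI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix != "graph":
        raise ValueError(f"unsupported prefix: {prefix!r}")
    return f"{base.rstrip('/')}/{local}"


if __name__ == "__main__":
    base = "http://example.org/adhd-kg"  # hypothetical base URI
    print(graph_prefix(base))
    print(expand(base, "graph:mesh"))
```

Both helpers tolerate a trailing slash on the base, which avoids ending up with a double slash in the named-graph URIs.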

step 2: the actual query

The ADHD-KG is organized in named graphs, so queries are expressed as quad patterns: each graph clause selects a named graph, and the triple patterns inside it are matched against the triples stored in that graph, as shown in the code snippet below.

select ?label (count(distinct ?research) as ?count)
where{
    
    # fetch all instances of mental disorders by searching the leaves of the MeSH vocabulary under the Mental Disorders branch
    graph graph:mesh {
        ?family rdfs:label "Mental Disorders"@en;
                rdf:type meshv:TopicalDescriptor.
        
        ?mesh meshv:broaderDescriptor+ ?family;
              rdfs:label ?label.
        
        Filter (?mesh != mesh:D001289) # exclude ADHD
        Filter not exists {?meshD meshv:broaderDescriptor ?mesh} # exclude families of disorders
    }
    
    
    # find any publication with explicit or implicit (semantic annotation) reference to the retrieved mental disorders
    # adhd:refersTo associates resources with MeSH concepts
    
    { # implicit
        graph graph:pubmed {
            ?research rdf:type mesh:D011642. # resource is a publication
        }
        
        graph graph:semAn {
            ?research adhd:hasAnnotation ?sa.
            ?sa adhd:refersTo ?mesh.
        }
    } 
    UNION
    { # explicit
        graph graph:pubmed {
            ?research adhd:refersTo ?mesh. 
        }
    }

}

# aggregate per disorder, keep those with more than 50 publications, most frequent first
group by ?label
having (count(distinct ?research) > 50)
order by DESC(?count)

The result of this query is visualized in the figure below.

[Figure: comorbidities]
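
The same query can be run programmatically. The sketch below is a minimal illustration under stated assumptions, not part of the original workflow: it assumes the repository's SPARQL endpoint is reachable at {server}/repositories/{repo}, accepts a form-encoded POST as described in the SPARQL 1.1 Protocol, and returns results in the SPARQL JSON results format. The server URL, repository name and function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def query_request(server: str, repo: str, sparql: str) -> Request:
    """Build, without sending, a form-encoded SPARQL query asking for JSON results."""
    return Request(
        f"{server.rstrip('/')}/repositories/{repo}",
        data=urlencode({"query": sparql}).encode("utf-8"),
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def label_counts(results_json: str) -> list[tuple[str, int]]:
    """Extract (label, count) rows from a SPARQL JSON results document."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [(row["label"]["value"], int(row["count"]["value"])) for row in bindings]


def run(server: str, repo: str, sparql: str) -> list[tuple[str, int]]:
    """Send the query and return the parsed rows (performs network I/O)."""
    with urlopen(query_request(server, repo, sparql)) as resp:
        return label_counts(resp.read().decode("utf-8"))
```

Passing the prefix block from step 1 followed by the query from step 2 as the sparql argument, for example run("http://localhost:7200", "adhd-kg", prefixes + query), would return the rows in the order produced by the order by clause, ready for plotting.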