Knowledge Graphs and Machine Learning - ISWC 2018 trip report

Last week I attended the International Semantic Web Conference (ISWC) in Monterey, CA. My role was to speak at the Ada Lovelace Day Celebration about the work Elsevier is doing to measure and fix the gender gap in research (see more here). The rest of the conference felt like a journey to the past... in a good way :-)

Not only did I have a chance to meet with old friends (whom I hadn't seen since 2013) and to make new ones, I also met some of my superheroes (whose papers I had read and whose ideas I agree with, but whom I had never met in person) and revisited some of my own semantic web research from 2008.

(If you already know all about the semantic web and just want the machine learning part, feel free to jump ahead to the section titled "Deep Learning and Knowledge Graphs".)

The Semantic Web vision is alive and well - and widely adopted in industry

To recapitulate a well-trodden topic: the vision of the Semantic Web is that of a web of data that is "meaningful to computers", as described by Tim Berners-Lee, James Hendler and Ora Lassila in their Scientific American article "The Semantic Web". ISWC is a community of researchers and engineers who share that vision; their contributions, in the form of research papers, aim to make that vision a reality. Specifically, semantic web researchers' methodology is to create knowledge graphs - data structures in which entities are uniquely identified by URIs and linked to other entities via triples expressed in a language called RDF - that can be used to infer new knowledge/triples (using rule languages) or to find meaningful relationships in text (or other media) using the knowledge graph as the training set (TimBL's writings on the topic are a fun read).
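
To make the triple idea concrete, here is a minimal sketch (mine, not from the conference) of a few RDF triples in Python's rdflib; the example.org URIs and the facts themselves are invented for illustration:

```python
# A minimal sketch of knowledge graph triples in RDF, using rdflib
# (pip install rdflib). The example.org URIs and facts are invented.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/")

g = Graph()
# Each fact is a (subject, predicate, object) triple; entities are URIs.
g.add((EX.Ada_Lovelace, RDF.type, EX.Person))
g.add((EX.Ada_Lovelace, RDFS.label, Literal("Ada Lovelace")))
g.add((EX.Ada_Lovelace, EX.collaboratedWith, EX.Charles_Babbage))

# Serialize the graph in Turtle, a common RDF syntax.
print(g.serialize(format="turtle"))
```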

Whereas some have declared the semantic web dead or dying, what I observed was quite the opposite: that the semantic web is alive and well in industry was reinforced multiple times throughout the conference. Several papers in the industry and healthcare tracks were very good examples of how semantic web technologies are being used to address enterprise problems. Just as important was the panel on enterprise-scale knowledge graphs, which highlighted the investments in knowledge graphs by Microsoft, Facebook (e.g. stickers on Messenger), eBay (e.g. improved product search), Google (e.g. improved search) and IBM (which lets its users build their own knowledge graphs) - all of which are using knowledge graphs in products aimed at improving the user experience by giving better answers to keyword-based searches (pictured above is the panel, which included representatives from each of these technology 'giants').

A few other example applications presented in the industry/healthcare tracks:

  • Babylon is using a knowledge graph to make medical and healthcare knowledge accessible to everyone. They use inference to map symptoms to the right disease information and surface that through a chatbot in a mobile app. Below is an image of an automated interaction from their paper:
  • Montefiore Health System, in partnership with Franz Inc. and Intel Corp., is using knowledge graphs to identify and flag patients who are at risk, in order to help their physicians find the appropriate treatment plans. To the right is a diagram of the knowledge graph from their paper.
  • Elsevier presented a data network that provides internal developers with access to health data from different systems using Linked Data principles. The paper also discusses some of the challenges and lessons learned in the process, including how to integrate a linked data approach into the development cycle. The slides presented by Paul Groth on this paper are below:


  • NuMedii is using knowledge graphs to find effective drugs for incurable diseases - namely, by offering domain experts visuals that can be explored in the search for meaningful relationships and for cohort building. The example use case highlighted drug discovery for a fibrotic disease (idiopathic pulmonary fibrosis, or IPF) for which there is no cure. By mining 700K PubMed abstracts for any disease of fibrotic type, NuMedii was able to identify potential drugs for IPF by discovering validated targets associated with drugs approved for other fibrotic diseases. To the right is an image from their paper.


  • FINRA is using knowledge graphs (and text mining) to capture metadata about millions of documents and help their users find documents that relate to one another via metadata linkages instead of relevance-ranked text search. The knowledge graph helps them improve the effectiveness of regulatory analysis. To the left is a figure from their paper.

Deep Learning and Knowledge Graphs

The value that many engineers see in deep learning applied to knowledge graphs is its potential for creating or validating triples using nothing but the other triples in the graph. Classic knowledge representation techniques allow a knowledge engineer to create rules that can be interpreted by a reasoner to infer new or missing triples. For example, the rule "all entities of type Person must have a date of birth property" would result in the creation of a date of birth triple for every instance of type Person. Usually these rules are expressed via an ontology, which allows for the propagation of properties from superclasses to subclasses (here's a wikipedia page about it if you'd like to learn more). Identifying the right set of rules is a time-consuming manual process that can be automated with machine learning.
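
As a toy illustration of the kind of rule a reasoner would check, here is a hand-rolled sketch in Python with rdflib; in practice such a rule would live in an ontology or a shape language like SHACL and be handled by a reasoner or validator:

```python
# Hand-rolled sketch of the example rule above: every entity of type
# Person should have a dateOfBirth property. A real system would
# express this in OWL/SHACL and let a reasoner or validator handle it.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.alice, RDF.type, EX.Person))
g.add((EX.alice, EX.dateOfBirth, Literal("1815-12-10")))
g.add((EX.bob, RDF.type, EX.Person))  # bob is missing a date of birth

for person in g.subjects(RDF.type, EX.Person):
    if (person, EX.dateOfBirth, None) not in g:
        print(f"Rule violated: {person} is a Person with no dateOfBirth")
```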

Finding the right representation of the graph to feed the triples into a machine learning algorithm, however, is still an open area of research. A few approaches were presented in the "deep learning" track of ISWC:

  • Vecsigrafo: This approach relies on joint word-concept embeddings. It uses Swivel (introduced here) to generate embeddings, which relies on a co-occurrence matrix. It differs from Swivel in that, instead of using words as the rows/columns of the matrix, it uses higher-level lexical terms collected from the knowledge graph. The authors evaluated the method on English-Spanish translation (of words, not sentences, as far as I know). Their best NN was able to include the correct translation for a lexical entry in the top 5 nearest neighbors in 78% of cases, and in 90% of cases the top 5 suggested translations were indeed semantically close.
  • Researchers from the University of Mannheim compared methodologies for knowledge graph completion using rule-based approaches (learning rules from statistical regularities) and embedding-based approaches (embedding the knowledge graph into a lower-dimensional/latent space). They used three datasets for the evaluation: one from WordNet and two from Freebase. Each triple in the test set has two completion tasks: given the relation and object, complete the subject (?, p, o); given the relation and subject, complete the object (s, p, ?). They evaluated RuleN and AMIE as the rule-based approaches and TransE, RESCAL and HolE as the embedding-based ones (a toy sketch of the TransE intuition appears after this list). Overall, the authors found the rule-based approaches to be more precise. Based on these results, they also built an ensemble method that outperforms all the others.
  • Researchers from the University of Zurich used a multi-task approach, called KADE, that combines knowledge graph and document embeddings to improve predictive and analytical tasks (intuition from the paper in the figure above). For KADE, the authors created a common embedding space for documents and triples; the aim was an embedding strategy that could bridge the different models (the graph node embedding and the document embedding) without losing the characteristics of the original embeddings. This work has great potential for completing knowledge graphs with triples extracted from documents, since it represents both the triples and the documents in the same vector space.
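
To give a flavor of the embedding-based approaches, here is the promised toy sketch of the TransE intuition: a triple (s, p, o) scores well when the subject embedding plus the relation embedding lands close to the object embedding. The embeddings below are random stand-ins; a real model learns them from the graph:

```python
# Toy TransE sketch (illustrative only): score a candidate triple by
# how close subject + relation lands to the object in embedding space.
import numpy as np

rng = np.random.default_rng(42)
dim = 50
# Random stand-ins; a trained model would learn these from the graph.
entities = {name: rng.normal(size=dim) for name in ("paris", "france", "tokyo")}
relations = {"capital_of": rng.normal(size=dim)}

def transe_score(subj: str, rel: str, obj: str) -> float:
    """Higher (less negative) score = more plausible triple."""
    return -np.linalg.norm(entities[subj] + relations[rel] - entities[obj])

# Knowledge graph completion for (paris, capital_of, ?): rank candidates.
for candidate in ("france", "tokyo"):
    print(candidate, transe_score("paris", "capital_of", candidate))
```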

Knowledge Graphs, Semantic Science and Reproducible Research

A workshop worth mentioning is the one on semantic science (SemSci). This topic is particularly interesting because scientific research is the engine for the generation of new knowledge, yet the output of that knowledge generation is still optimized for human consumption. SemSci is about the vision of making scientific knowledge available in a knowledge graph.

The workshop was kicked off by Paul Groth, who made the argument for increasing science reproducibility through automation of experimental methods. In his vision, provenance of knowledge should be gathered automatically from the machines and robotic arms programmed to carry out an experiment; Paul's work showed that a lot of the methods used in a lab can be automated via API calls. Paul's slides are here. A related talk was given by Yolanda Gil, who told us about her work on using AI for automated discovery (slide above). Yolanda argued that AI can offer systematic, correct and unbiased approaches to scientific knowledge generation, not to mention much better reporting of the results of scientific experiments. In the ecosystem Yolanda presented, the AI comes up with the hypothesis and finds a way to test it - automatically.

Other interesting talks on this topic included:

  • Whyis: presented by Jim McCusker, allows users to interact with cognitive agents that rely on knowledge, ontologies and data (nanopublications) to provide useful answers and explanations. Jim describes his system as a framework for knowledge curation, interaction and inference. A demo is available at https://redrugsdev.tw.rpi.edu/ and the paper is here.
  • Evidence Extraction: Gully Burns presented very cool work on extracting the data that supports molecular interactions from the studies that report them. This work used 2K open access papers referenced in the IntAct database, extracted the images from PDF files (by looking for regions with a low density of words) and used the "you only look once" (YOLO) method for subfigure identification. Each subfigure was then classified into a subtype with varying levels of accuracy (97% for histology images but only 40% for diagrams).

The Social Semantic Web and Privacy

Tim Berners-Lee was at ISWC this year to tell us about Solid, as part of a workshop called Decentralizing the Semantic Web, co-organized with Ruben Verborgh and Tobias Kuhn. The goal of the workshop was the creation of intelligent web clients and decentralized applications that make use of knowledge graphs to create value for the user (they accepted 7 papers, listed on the workshop program). The second part of the workshop was about Solid. The idea behind Solid is to support social activities on the web (much like those supported by Facebook and LinkedIn) while allowing the people interacting with each other to still own their own data, yet co-exist with and link to others' profiles and posts using the principles of the semantic web, including validation of data shapes. Social apps must ask permission to use attributes of the user's data, which means that predatory apps that steal and sell users' social data for a profit won't be as easy to build as they are today. Libraries are currently being built as part of the Solid ecosystem that will allow developers to use Solid from JavaScript (using either LDFlex or React). Solid.inrupt.com has more information.
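
Since a public Solid profile is just an RDF document on the web, any RDF client can read it. Here is a minimal sketch in Python (Solid's own libraries are the JavaScript ones mentioned above; rdflib is used here for consistency with the earlier sketches, and the pod URL is hypothetical):

```python
# Minimal sketch of reading a (hypothetical) public Solid profile.
# A Solid pod serves the profile as an RDF document, so rdflib can
# fetch and parse it directly. Private resources would additionally
# require Solid's authentication and access-control flow.
from rdflib import Graph, Namespace

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

profile_url = "https://alice.example.org/profile/card"  # hypothetical pod URL
g = Graph()
g.parse(profile_url)  # HTTP GET + parse of the Turtle profile document

for person, name in g.subject_objects(FOAF.name):
    print(person, "is named", name)
```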

Also on the topic of privacy and consent, Jen Golbeck gave a very powerful keynote about the increasing awareness and importance of privacy. The key insight from the talk was the need to start thinking about "privacy" as the action of "giving consent". For example, Facebook apps requesting access to Facebook user data are asking for the user's consent to use that data for a specific purpose. Semantic web techniques allow the aggregation of really large graphs of data about users - from whom consent should be requested. Talking about consent instead of "privacy" will help innovation, because it allows the conversation to be much more focused on specific actions, individual data points and pragmatism.

Querying and Federation

A persistent challenge for industry's adoption of semantic web technologies (for master data management, reasoning or other applications) has been query speed. The query language for the semantic web is SPARQL, and data is usually stored in a triple store. For queries that would work well in a relational or document database, those stores are probably better options than a triple store. SPARQL-based systems are appropriate when there is a need to federate queries across other systems, or when the data necessary to answer the question is maintained in multiple places (internal or external to the business's firewall); a minimal federated-query sketch follows the list below. A few advances in this area were presented:

  • Saleem et al. presented a new benchmark for federated SPARQL querying that takes into account data metrics, query federation metrics (including complex queries that other systems cannot support) and performance metrics. See this tweet for details on the metrics considered. The authors found that some federated query systems return incomplete query results without letting the user know about the incompleteness. The following federated query engines were compared using the benchmark: FedX, Splendid, Anapsid and HiBisCus.
  • Janke et al. presented a methodology for deciding how best to distribute large RDF graphs across multiple compute nodes. Surprisingly, the authors found that balancing the query workload across all compute nodes may be more important for fast query execution than minimizing network traffic. This tweet shows some of the details of the work.
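
Here is the minimal federated-query sketch promised above, using SPARQLWrapper and SPARQL 1.1's SERVICE keyword to combine data from two public endpoints in one query; the endpoints and the query itself are illustrative and unrelated to the systems benchmarked above:

```python
# Minimal federated SPARQL sketch with SPARQLWrapper (pip install
# sparqlwrapper). The query runs on DBpedia and uses the SPARQL 1.1
# SERVICE keyword to pull populations from Wikidata in the same query.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        owl:sameAs ?wd .
  FILTER(STRSTARTS(STR(?wd), "http://www.wikidata.org/entity/"))
  SERVICE <https://query.wikidata.org/sparql> {
    ?wd wdt:P1082 ?population .
  }
}
LIMIT 5
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["city"]["value"], row["population"]["value"])
```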
