
Is The Enterprise Knowledge Graph Finally Going To Make All Data Usable?



When we ask Siri, Alexa or Google Home a question, we often get alarmingly relevant answers. Why? And more importantly, why don’t we get the same quality of answers and smooth experience in our businesses where the stakes are so much higher?

The answer is that these services are all powered by extensive knowledge graphs that allow the questions to be mapped to an organized set of information that can often provide the answer we want.

Is it impossible for anyone but the big tech companies to organize information and deliver a pleasing experience? In my view, the answer is no. The technology to collect and integrate data so we can know more about our businesses is being delivered in different ways by a number of products. Only a few use constructs similar to a knowledge graph.

But one company I have been studying this year, Cambridge Semantics, stands out because it is focused primarily on solving the problems related to creating knowledge graphs that work in businesses. Cambridge Semantics' technology is powered by AnzoGraph, its highly scalable graph database, and uses semantic standards, but the most interesting thing to me is how the company has assembled all the elements needed to create a knowledge graph factory, because in business we are going to need many knowledge graphs that can be maintained and evolved in an orderly manner.

By studying how Cambridge Semantics has productized a solution to this problem, we can understand the range of challenges to making data inside a company accessible in much richer ways that can power business decisions and new kinds of applications.

The Power of the Knowledge Graph

Siri, Alexa, and Google Home are all powered by different versions of knowledge graphs, which are integrated collections of information about keywords that also contain huge numbers of links between those keywords. The keywords represent concepts, objects, things, and people, the nouns of this world. The graph fills in the relationships, the connections between the concepts. From knowing which album a song is part of to understanding how many clinical trials used a certain research paper, a knowledge graph is a semantic web that essentially knows things by capturing all this information and allowing it to support a search.
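
To make the idea concrete, here is a minimal sketch of a knowledge graph as a set of subject-predicate-object statements, written in Python with the open-source rdflib library. The namespace, predicates, and facts are invented for illustration; none of this reflects how Apple, Amazon, or Google actually build their graphs.

```python
# A tiny knowledge graph: nodes connected by named relationships (triples).
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")  # hypothetical namespace
g = Graph()

g.add((EX.ShakeItOff, EX.partOfAlbum, EX.Album1989))   # a song belongs to an album
g.add((EX.Album1989, EX.recordedBy, EX.TaylorSwift))   # the album links to an artist
g.add((EX.Trial42, EX.citesPaper, EX.Paper2017))       # a clinical trial cites a paper

# Following the links answers questions a plain keyword index cannot:
for song, _, album in g.triples((None, EX.partOfAlbum, None)):
    print(f"{song} appears on {album}")
```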

That’s why you can ask Siri, Alexa or Google Home a question about common topics and get a great answer. Ask Siri who is the quarterback of the New York Jets and you get a list of the current quarterbacks on the roster.

But it is easy to see the limits of these systems. Ask Siri who Taylor Swift's boyfriend is, and Siri knows. Ask who the members of the Irish band The Chieftains are, and Siri doesn't know. Even if you ask about the members of the Irish band U2, it doesn't know. It can provide information about Bono, but it has no idea who the members of U2 are. But if you ask Google about the members of The Chieftains, you get a great response: a list of all the past and current members.

The reason Google does so much better is that the knowledge graph it has created is so much better.

Knowledge graphs are powerful for a variety of reasons. When they are based on semantic standards, it is possible to relate knowledge to language in an orderly way. Language then provides an entry point into the graph: you can match words to the concepts the graph represents.

Knowledge graphs also allow you to create structures for the relationships in the graph. You can tell a graph that parents have children, that parents can themselves be children, that children can be brothers or sisters, and that all of these are people. Providing such descriptive information allows new information to be inferred from the graph, such as the fact that if two people share the same parents, they must be siblings.
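
As a toy illustration of that kind of inference, the following Python sketch applies the rule "people who share a parent are siblings" to a handful of invented facts. A real semantic system would express the rule declaratively in an ontology or rule language rather than in application code.

```python
from itertools import combinations

# Explicit facts: (child, relationship, parent). All names are invented.
facts = {
    ("Alice", "hasParent", "Pat"),
    ("Bob", "hasParent", "Pat"),
    ("Carol", "hasParent", "Sam"),
}

# Index the facts: child -> set of parents.
parents = {}
for child, _, parent in facts:
    parents.setdefault(child, set()).add(parent)

# Apply the rule: a shared parent implies a sibling relationship.
inferred = set()
for a, b in combinations(sorted(parents), 2):
    if parents[a] & parents[b]:
        inferred.add((a, "siblingOf", b))

print(inferred)  # {('Alice', 'siblingOf', 'Bob')}
```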

Because they are graphs, knowledge graphs are more intuitive. People don’t think in tables, but they do immediately understand graphs. When you draw the structure of a knowledge graph on a whiteboard, it is obvious what it means to most people.

In addition, graphs are great at capturing sparse, incomplete, and messy information, just like our brains.

So it is not hard to agree that having a knowledge graph of all the data in a company would be a good idea. But how would you get the data in? How would you find it when you need it? How do you manage it over time, allowing it to grow and change? To make knowledge graphs work in business, these questions must have good, complete answers. What we really need is a knowledge graph factory.

Do We Really Need an Enterprise Knowledge Graph in Business?

In the past 20 years, the ways we can organize and use data in business have gotten much better. The data warehouse gave us a canonical model of all the data in the business along with reporting and dashboards. Data discovery tools allow end users to explore much larger collections of information.

All of this progress is to the good. But it doesn’t build an architecture that would support the kind of experience that we get from Siri, Alexa, and Google Home. The reason is that the data model in the data warehouse, while an awesome achievement, cannot absorb the huge amount of data that is coming at us. The process of creating relational data models just can’t keep up.

In addition, the extracts of data that are used to power data discovery are also too small. Some of these have expanded the size of the models that can be explored by a dashboard, but you could never run a Siri off them.

Here are the reasons we need a knowledge graph:

  • The amount of information we need to integrate has dramatically grown.
  • We don’t just need a model of that data; we need to know about its structure.
  • The data won’t be well behaved. We will have massive differences in completeness and quality.

Data lakes were one attempt to address this problem. But they succeeded only as a repository for a much larger collection of data. The data integration didn’t happen at a large scale.

The knowledge graph is the only currently implementable and sustainable way for businesses to move to the higher level of integration needed to make data truly useful for a business.

With a knowledge graph:

  • The model can be as large, wide, and deep as you want.
  • A knowledge graph uses semantic standards to describe the structure of the information in the graph to support reasoning and inference.
  • Both high-quality, complete data and sparse, incomplete data can be captured and made usable.

It’s one thing to imagine such a knowledge graph. The larger challenge is to create such graphs that work in production and can handle the tests of the real world.

How Would a Knowledge Graph Really Work?

We know from our friends at Apple, Amazon, and Google that it is possible to create knowledge graphs at scale. In essence, each of these companies has created a knowledge graph factory. And we also know from the examples cited above that their graphs are based on user interests. Where users are focused, the graph gets built out.

In the enterprise, we have a different problem. We have various hives of information that are closely linked and may link out with less frequency to other hives. And inside a hive there may be layers of graphs. Here are some examples:

  • All product information could be in one hive that had groupings for categories of products and varying levels of detail about the products in layers.
  • All information about customers could be collected in a graph.
  • All information about purchases could be collected in a graph.
  • These graphs would link to each other.

This is just a simplified example that we can use to show the complexity of making this work. Here are the challenges.

Loading and Updating the Graph: Loading graphs is not like ETL for relational databases, where data is loaded into tables with a predefined structure. Graphs can have predefined structure for properties of nodes and edges, but loading up a graph means understanding what edges must be created to and from which nodes. This knowledge has to come from a program that knows what to do or from some other metadata and structural information that allow the connections to be made. Updating graphs involves the same sort of calculations. If every graph needs a custom program, it will severely limit how many graphs can be created.
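
One common way to avoid a custom program per source is a mapping-driven loader: a small declarative mapping states which columns identify nodes, which become edges, and which become plain properties. The sketch below, in Python with rdflib, is a simplified illustration of that idea, not a description of any vendor's loader; the rows, column names, and namespace are invented.

```python
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/")  # hypothetical namespace

# Invented rows, as they might be extracted from an orders table.
rows = [
    {"order_id": "o-1", "customer_id": "c-9", "product_id": "p-3", "amount": 120.0},
    {"order_id": "o-2", "customer_id": "c-9", "product_id": "p-7", "amount": 45.5},
]

# Declarative mapping: which column names the node, which columns become
# edges to other nodes, and which become literal properties.
mapping = {
    "id_column": "order_id",
    "node_type": EX.Order,
    "edges": {"customer_id": EX.placedBy, "product_id": EX.includesProduct},
    "literals": {"amount": EX.amount},
}

def load(rows, mapping, graph):
    for row in rows:
        node = EX[row[mapping["id_column"]]]
        graph.add((node, RDF.type, mapping["node_type"]))
        for column, predicate in mapping["edges"].items():
            graph.add((node, predicate, EX[row[column]]))
        for column, predicate in mapping["literals"].items():
            graph.add((node, predicate, Literal(row[column])))

g = Graph()
load(rows, mapping, g)
print(len(g))  # 8 triples: each row yields a type, two edges, and one literal
```

The point of a mapping like this is that adding a new source becomes a configuration exercise rather than new code, and updates can flow through the same mapping.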

Allowing Powerful Queries and Reasoning: Graph query languages can express complex queries that sift through information in a precise way. In addition, graphs can be explored by algorithms and analytics. If you construct a graph properly, it is usually possible to answer questions that would be practically impossible to answer, or would take a hugely complex query, if the data were in an SQL database. But on top of that, if you create a graph with semantic standards, the queries can be more powerful. You can use the information about the structure to automate queries, algorithms, and analytics and make them intelligent. This automation depends on tagging the information correctly on the way in and on using a structure for the graph defined by an ontology.
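
Here is a small, hypothetical example of that interplay between data and ontology, again using rdflib: a SPARQL property path (rdfs:subClassOf*) lets the ontology's class hierarchy broaden the query automatically, so asking for "products" also returns instances of invented subclasses such as Laptop.

```python
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/")  # hypothetical namespace
g = Graph()

# A tiny invented ontology plus data: a Laptop is a kind of Product.
g.add((EX.Laptop, RDFS.subClassOf, EX.Product))
g.add((EX.ThinBook13, RDF.type, EX.Laptop))
g.add((EX.Widget9, RDF.type, EX.Product))

# The property path subClassOf* walks the class hierarchy, so the query
# picks up both direct Products and instances of any subclass.
query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item WHERE {
  ?item a ?type .
  ?type rdfs:subClassOf* <http://example.org/Product> .
}
"""
for row in g.query(query):
    print(row.item)  # ThinBook13 and Widget9
```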

Guiding the User through Asking Questions: One of the reasons that voice- and natural-language-powered systems are so powerful is that you just ask a question and get the answer. The knowledge graph is there, but under the covers. When you get an answer, you usually start to see the structure of the graph, because the answer often has links to related information. In a business setting, we want the same thing. Users should be able to ask questions in simple ways and then be guided through a process that recommends and suggests new ways to go. When a useful context for exploring information is defined, it should be supported by more advanced dashboards and applications to further enhance productivity.

Scaling: The knowledge graph and the process to create, load, update, and query it must be scalable. It doesn't make sense to let technical limitations constrain the scope and power of a knowledge graph.

So if we had all of this working, we would be able to do the following things:

  • Identify use cases that could benefit from knowledge graphs, quickly create them, and then evolve them to greater maturity.
  • Present users with a guided process to use the information in the knowledge graph.
  • Gradually expand the scope of the data integrated in a knowledge graph, allowing links between graphs to make them more powerful.
  • Create advanced applications and dashboards for high-value use cases.

We have wanted to expand data integration and deliver data to those who need it for a long time. Knowledge graphs just help us do a better job.

How Cambridge Semantics Approaches this Problem

Cambridge Semantics unlocks the power of the knowledge graph by providing all the elements needed to create a knowledge graph factory.  At the center of the factory is the AnzoGraph OLAP database that allows graphs to be stored and explored with queries, algorithms, and analytics.

But the complete factory must solve all the problems mentioned so far. Here’s how Cambridge Semantics does that:

A layered graph architecture: Enterprises need an approach in which all of their data (both structured data and the enormous volumes of document- and text-oriented information) can be integrated and linked into a large, coherent knowledge graph made up of harmonized data that originates from many discrete underlying systems (which continue to do their job uninterrupted). Cambridge Semantics calls this the enterprise data fabric. The data fabric is made up of layers that can be independently maintained and prepared for inclusion in the graph. With this structure, it is possible to have a detailed layer of information at the bottom and then layers above it that aggregate and integrate that data into forms that are useful for different types of analysis and applications. This layering can go all the way up to very targeted knowledge graphs that are purpose-built for high-value applications and dashboards.
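
As a rough, generic analogue of that layering (and only an analogue, not a description of how Anzo itself is built), named graphs can keep a detailed layer and a derived, analysis-ready layer side by side in one store. The data and layer names below are invented.

```python
from rdflib import Dataset, Namespace, Literal, URIRef

EX = Namespace("http://example.org/")  # hypothetical namespace
ds = Dataset()

# Detail layer: raw facts as loaded from source systems (invented data).
detail = ds.graph(URIRef("urn:layer:detail"))
detail.add((EX.Order1, EX.amount, Literal(120.0)))
detail.add((EX.Order2, EX.amount, Literal(45.5)))

# Summary layer: a derived, analysis-ready view computed from the detail layer.
summary = ds.graph(URIRef("urn:layer:summary"))
total = sum(o.toPython() for _, _, o in detail.triples((None, EX.amount, None)))
summary.add((EX.AllOrders, EX.totalAmount, Literal(total)))

print(total)  # 165.5
```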

A robust strategy for loading and updating the graph: Cambridge Semantics uses semantic standards to describe the information that is entered into the knowledge graph. In a relational database, the ETL process moves the data into a table structure. In Cambridge Semantics' knowledge graph factory, the ETL process copies data from each system, puts it into a semantic-standards format that captures the structure of the information, which is recorded in an ontology, and then consistently tags it so that the AnzoGraph OLAP database can create connections when the data is loaded into the graph. In this way, data from any source system can be made available to the knowledge graph. When new data arrives, it can flow into the graph in an orderly way without having to create a complex custom program.

Use of semantic standards: Cambridge Semantics allows you to future-proof your data by using the same standards to encode meaning into it (to make the data self-describing), so that someone or some computer (yes, the standards help to make the data machine-readable too) coming across it down the track can understand both the context and the meaning of the data and make use of it, either alone or in combination with other data described using the same standards. Once data has been described using semantic standards, the task of integrating it with additional datasets is easier and can often be done automatically. In addition, the structure of the data that is described in the ontology makes queries, algorithms, and analytics more powerful.
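
A minimal sketch of what "self-describing" means in practice, assuming RDFS as the semantic standard: schema statements (classes, properties, domains, ranges, labels) are expressed in the same triple form as the data and travel with it. The class and property names are invented.

```python
from rdflib import Graph, Namespace, RDF, RDFS, Literal

EX = Namespace("http://example.org/")  # hypothetical namespace
g = Graph()

# Schema triples: they describe the data in the same format as the data itself.
g.add((EX.Order, RDF.type, RDFS.Class))
g.add((EX.Customer, RDF.type, RDFS.Class))
g.add((EX.placedBy, RDF.type, RDF.Property))
g.add((EX.placedBy, RDFS.domain, EX.Order))
g.add((EX.placedBy, RDFS.range, EX.Customer))
g.add((EX.placedBy, RDFS.label, Literal("placed by")))

# Data triple: a reader who finds this later can consult the schema above
# to learn that ex:placedBy links an Order to a Customer.
g.add((EX.Order1, EX.placedBy, EX.Customer9))

# Serialize data and schema together in a standard format (Turtle).
print(g.serialize(format="turtle"))
```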

Guided query experience for end users: Querying graphs is complicated, and so is the language used to do so. Luckily, much of this can be automated so that users do not need to understand either. This is done through a combination of ontological models (which provide the meaning of the stored data and how it is all connected) and automated query generation, which relieves the human of having to understand the mechanics of formulating a query. On top of this, even more conversational systems (like Alexa) can be built that move the user even further away from the graph query toward something more natural for people, like conversation.
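
A highly simplified sketch of that idea (not Cambridge Semantics' actual mechanism): given the class a user picks and the properties an ontology says apply to it, a template expands into a SPARQL query the user never has to see. The "ontology" dictionary and every name in it are invented.

```python
# Toy query generator: the ontology supplies the properties for a class,
# and a SPARQL query is assembled automatically on the user's behalf.
ontology = {
    "Order": ["placedBy", "includesProduct", "amount"],
    "Customer": ["name", "segment"],
}

def build_query(cls: str, prefix: str = "http://example.org/") -> str:
    props = ontology[cls]
    lines = [
        f"SELECT ?s ?{' ?'.join(props)} WHERE {{",
        f"  ?s a <{prefix}{cls}> .",
    ]
    for p in props:
        # OPTIONAL keeps sparse, incomplete records in the result set.
        lines.append(f"  OPTIONAL {{ ?s <{prefix}{p}> ?{p} . }}")
    lines.append("}")
    return "\n".join(lines)

print(build_query("Order"))
```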

Scaling: It's no use if a graph cannot both store all the data you need and then let you retrieve it very quickly. The AnzoGraph OLAP database is highly scalable and is designed to support both chatty conversational workloads and high volume batch-style analytics.

With this approach, Cambridge Semantics provides not only a knowledge graph but a knowledge graph factory. In an important way, this capability achieves the goals of data integration and exploration that we have been seeking from the data warehouse and from data discovery tools. But it also achieves the data lake's goal of capturing and making accessible a wide swath of information.

In my opinion, the ability to put graph technology to work in effective ways will be a differentiating skill in the next 5 to 10 years. Cambridge Semantics has productized a powerful approach to creating knowledge graphs, which will be a huge benefit to those who decide to adopt graph technology at scale.

[Disclosure: Dan Woods has worked with Cambridge Semantics on messaging and content creation projects.]
