July 11th, 2019
Set up a knowledge base to encode domain knowledge for Rasa
This post is a follow-up to "Integrating Rasa with graph databases". In this tutorial, you will learn in detail about knowledge bases and how to set one up. You will also learn how to use the data from your knowledge base to improve your named entity recognition (NER).
Let's quickly recap which challenges you can solve with a knowledge base:
- Answer user questions that require domain knowledge, such as "What account do I have more money on?".
- Resolve references to previously mentioned entities, such as "What is the headquarters of the first bank you just mentioned?". To do this, the bot needs to recognise the mention of an entity and resolve it to the real-world entity it refers to, one that came up earlier in the conversation.
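To make the second challenge concrete, here is a minimal Python sketch of resolving an ordinal mention ("the first bank", "the last one") against the list of entities the bot showed earlier. The mapping, the function name, and the bank names are hypothetical; a real bot would also need to handle other kinds of mentions.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from ordinal words to positions in the listed entities.
ORDINAL_MENTIONS: Dict[str, int] = {
    "first": 0,
    "second": 1,
    "third": 2,
    "last": -1,
}


def resolve_mention(mention: str, listed_entities: List[str]) -> Optional[str]:
    """Resolve a mention such as 'the first bank' against entities listed earlier."""
    for word in mention.lower().split():
        if word in ORDINAL_MENTIONS:
            index = ORDINAL_MENTIONS[word]
            # Only resolve if the ordinal points inside the listed entities.
            if -len(listed_entities) <= index < len(listed_entities):
                return listed_entities[index]
            return None
    # No ordinal found: this simple sketch cannot resolve the mention.
    return None


banks_listed = ["Bank A", "Bank B", "Bank C"]
print(resolve_mention("the first bank", banks_listed))  # Bank A
print(resolve_mention("the last one", banks_listed))  # Bank C
```

The list of previously listed entities has to be stored somewhere in the conversation state (for example, in a slot) so that a later turn can refer back to it.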
This tutorial uses an example knowledge-base bot, called banking bot, to demonstrate how to set up a knowledge base. The code for banking bot can be found here.
What is a knowledge base?
A knowledge base can be used to represent domain knowledge. Typically, graph databases are used for this purpose. Graph databases store data in the form of entities (sometimes also called nodes), attributes, and relations. Representing domain knowledge in this form feels natural. For example, a bank becomes an entity that has attributes like name, headquarters, or whether it offers free accounts. Every bank has employees, which correspond to another entity, so banks and employees are related to each other. Graph databases let you construct complex schemas. Because graph databases work with entities, attributes, and relations, you can think in a more object-oriented way than with relational databases. You can even model hierarchies. For example, an employee is a specific type of person. You can assign general attributes, such as birthdate, to the person entity and state that the employee entity "inherits" from the person entity but has some more specific attributes, such as role.
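The object-oriented analogy can be illustrated with a minimal Python sketch using dataclasses. This is not how a graph database stores data; it only mirrors the modeling ideas from the paragraph above: entities with attributes, a relation between a bank and its employees, and an employee type that inherits from person. The field names and example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    # General attributes that every person has.
    name: str
    birthdate: str  # hypothetical format: ISO date string such as "1985-02-01"


@dataclass
class Employee(Person):
    # An employee "inherits" name and birthdate and adds a more specific attribute.
    role: str


@dataclass
class Bank:
    # Entity attributes mentioned in the text.
    name: str
    headquarters: str
    offers_free_accounts: bool
    # Relation: the employees of this bank.
    employees: List[Employee] = field(default_factory=list)


anna = Employee(name="Anna", birthdate="1985-02-01", role="advisor")
bank = Bank(name="Example Bank", headquarters="Berlin", offers_free_accounts=True, employees=[anna])
print(isinstance(anna, Person))  # True
```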
If you only have a few data points to store, setting up a graph database might be overkill. Instead, you can put your domain knowledge in a graph-like data structure, such as a Python dictionary, and use that as your knowledge base.
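A minimal sketch of such a dictionary-based knowledge base, assuming a layout of entity type to list of entities; the layout, the function names, and all data are hypothetical. Relations are expressed by storing the id of the related entity (here, each account points to its bank).

```python
from typing import Any, Dict, List

# Hypothetical in-memory knowledge base: entity type -> list of entities.
KNOWLEDGE_BASE: Dict[str, List[Dict[str, Any]]] = {
    "bank": [
        {"id": 0, "name": "Bank A", "headquarters": "Berlin", "free_accounts": True},
        {"id": 1, "name": "Bank B", "headquarters": "Amsterdam", "free_accounts": False},
    ],
    "account": [
        # "bank" holds the id of the related bank entity.
        {"id": 0, "owner": "Max", "bank": 0, "balance": 1200.0},
        {"id": 1, "owner": "Max", "bank": 1, "balance": 350.0},
    ],
}


def get_entities(entity_type: str, **filters: Any) -> List[Dict[str, Any]]:
    """Return all entities of a type whose attributes match every filter."""
    return [
        entity
        for entity in KNOWLEDGE_BASE.get(entity_type, [])
        if all(entity.get(key) == value for key, value in filters.items())
    ]


def account_with_most_money(owner: str) -> Dict[str, Any]:
    """Answer 'What account do I have more money on?'.

    Raises ValueError if the owner has no accounts.
    """
    return max(get_entities("account", owner=owner), key=lambda account: account["balance"])


print(account_with_most_money("Max")["balance"])  # 1200.0
```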
How do you set up a graph database?
To set up a graph database as your knowledge base, you need to perform three steps:
Step 1: Decide on a graph database to use
The first thing you have to do is choose a graph database. There are many different ones, such as Grakn, Neo4j, OrientDB, and GraphDB, each with its own advantages and disadvantages, so choose one based on your requirements. Banking bot uses the open-source graph database Grakn (version 1.5.7). As there is no standard query language for graph databases, almost every graph database has its own language. Grakn uses Graql. If you want to use Grakn for your bot, you can find the installation instructions here.
Step 2: Decide on a schema
Before you can store data in your graph database, you need to design your schema. To do this, consider the following:
- What entities do you need?
- What attributes do they have?
- How are they related to each other?
Let's take a look at the banking bot schema. The banking bot has four entities: bank, person, account, and card. Each of them has a couple of attributes. The bank entity, for example, has a name and a headquarters attribute. The following figure shows all attributes of the bank entity.
As mentioned before, a graph database can store relations between entities. The bank, for example, is connected to person and account via the relation contract: the bank provides a contract over an account for a customer (person). The following graph shows this relation.
As you may have noticed, the relation contract also has an attribute. You can assign as many attributes to relations and entities as you want. The complete schema of the banking bot looks like this:
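Before writing the schema in Graql, it can help to write down the answers to the three design questions as plain data and sanity-check them. The following Python sketch does that; it is not Graql and not part of the banking bot. Only the four entity names, the name and headquarters attributes, and the contract relation come from the text; all other attribute and role names are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical, simplified version of the banking bot schema.
ENTITIES: Dict[str, List[str]] = {
    "bank": ["name", "headquarters", "free-accounts"],
    "person": ["first-name", "last-name"],
    "account": ["account-number", "balance"],
    "card": ["card-number"],
}

RELATIONS: Dict[str, Dict[str, Any]] = {
    "contract": {
        # role -> entity type that plays the role (role names are hypothetical)
        "roles": {"provider": "bank", "customer": "person", "offer": "account"},
        # relations can carry attributes, too (attribute name is hypothetical)
        "attributes": ["sign-up-date"],
    },
}


def check_schema(entities: Dict[str, List[str]], relations: Dict[str, Dict[str, Any]]) -> List[str]:
    """Report relation roles that are played by undeclared entity types."""
    problems: List[str] = []
    for relation, spec in relations.items():
        for role, entity_type in spec["roles"].items():
            if entity_type not in entities:
                problems.append(f"{relation}: role '{role}' refers to unknown entity '{entity_type}'")
    return problems


print(check_schema(ENTITIES, RELATIONS))  # []
```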
Once you've designed the schema, you need to write it in Graql syntax. You can find the complete schema of the banking bot in Graql syntax here. To actually create your schema, execute the following command in the terminal of your choice:
grakn console --keyspace banking --file schema.gql
This will create a keyspace called "banking" in your graph database where the schema defined in "schema.gql" will be created.
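If you want to run this step from a Python setup script, a minimal sketch could wrap the same command with subprocess. It uses only the flags shown above; the function names are hypothetical, and it assumes the grakn executable is on your PATH.

```python
import subprocess
from typing import List


def build_schema_command(keyspace: str = "banking", schema_file: str = "schema.gql") -> List[str]:
    """Build the grakn console call that creates the schema in a keyspace."""
    return ["grakn", "console", "--keyspace", keyspace, "--file", schema_file]


def load_schema(keyspace: str = "banking", schema_file: str = "schema.gql") -> None:
    """Run the command; check=True raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_schema_command(keyspace, schema_file), check=True)
```

Passing the arguments as a list instead of a single shell string avoids quoting problems if a path contains spaces.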
Step 3: Load data into your graph database
After you've created your schema, you need to load some data into it. To do so, Grakn suggests writing a migration script that uploads data from .csv files into your graph database. The banking bot knowledge base consists of seven banks and twenty people. Every person has up to three accounts with up to a hundred transactions each. You can find the data and the migration script here.
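The following is not the banking bot's migration script, but a minimal sketch of the idea: read rows from CSV data and turn each into a Graql insert query for the bank entity. The CSV column names are hypothetical, the escaping assumes Graql accepts backslash escapes in string literals, and sending the queries to Grakn (which requires a Grakn client) is left out.

```python
import csv
import io
from typing import Dict, Iterable, List


def escape(value: str) -> str:
    """Escape backslashes and double quotes for a Graql string literal (assumption)."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def bank_insert_queries(rows: Iterable[Dict[str, str]]) -> List[str]:
    """Turn CSV rows with 'name' and 'headquarters' columns into Graql insert queries."""
    return [
        f'insert $b isa bank, has name "{escape(row["name"])}", '
        f'has headquarters "{escape(row["headquarters"])}";'
        for row in rows
    ]


# In a real script you would open the .csv file instead of using an in-memory string.
csv_data = io.StringIO("name,headquarters\nBank A,Berlin\nBank B,Amsterdam\n")
for query in bank_insert_queries(csv.DictReader(csv_data)):
    print(query)
```

Generating one query per row keeps each insert small, so a failing row is easy to identify.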
Now your knowledge base is set up and you can start using it with Rasa.
The knowledge base in action
Banking bot uses the above defined graph database to incorporate domain knowledge into the conversation. For more information on how this is done, check out the tutorial "Integrating Rasa with knowledge bases".
To run the banking bot, clone its repository, tutorial-knowledge-base. Install all required dependencies via
pip install -r requirements.txt
and train the model using rasa train. Before you can start your bot, you need to set up the knowledge base by following the steps in the README. Afterwards, you can talk to the bot on the command line via
Here is a short conversation with the banking bot showing how the bot can answer a couple of questions regarding its user's accounts and transactions.
Improving your bot by adding lookup tables
The NER sometimes misses an entity if there is little or no context. For example, if the bot asks you
and you just answer with "Max", it can be very hard for the NER to detect that "Max" is actually a person. To improve your NER, you can add lookup tables to your bot. Since you already have a knowledge base with quite a few entities, you can reuse them.
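To see why a lookup table helps in this situation, consider a simplified illustration: the lookup entries are compiled into one case-insensitive regular expression, and each token gets a binary feature saying whether it matches. This is only a sketch of the idea of lookup-based features, not Rasa's implementation; the function names and entries are hypothetical.

```python
import re
from typing import List


def build_lookup_regex(entries: List[str]) -> "re.Pattern":
    """Compile lookup entries into one case-insensitive, word-bounded pattern."""
    alternatives = "|".join(re.escape(entry) for entry in entries)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def lookup_features(tokens: List[str], pattern: "re.Pattern") -> List[int]:
    """Return 1 for each token that fully matches a lookup entry, else 0."""
    return [1 if pattern.fullmatch(token) else 0 for token in tokens]


persons = build_lookup_regex(["Max", "Anna"])
print(lookup_features(["Max"], persons))  # [1]
print(lookup_features(["I", "mean", "max"], persons))  # [0, 0, 1]
```

Even a one-word answer like "Max" now carries a signal the model can use, which is exactly what is missing when there is no surrounding context.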
As a first step, you need to get the entities from your knowledge base and write them into a file. There should be one file per entity type. The following query will give you all entities of a specific entity type:
match $x isa <entity_type>; get;
The result should be stored in a .txt file. You can find a small script to extract all entities from your knowledge base here. Feel free to adapt and reuse it.
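Once you have the values, writing the files is straightforward. The following minimal sketch is not the script linked above; it assumes you have already fetched the values for each entity type with your Grakn client (since the query returns entity concepts, you will likely need to read an attribute such as name from each of them). The function names are hypothetical; the output has one value per line, the format Rasa expects for lookup table files.

```python
from pathlib import Path
from typing import Dict, Iterable


def build_match_query(entity_type: str) -> str:
    """The Graql query from above, filled in for one entity type."""
    return f"match $x isa {entity_type}; get;"


def write_lookup_tables(values_by_type: Dict[str, Iterable[str]], directory: str = "data/lookup_tables") -> None:
    """Write one <entity_type>.txt file per entity type, one value per line."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    for entity_type, values in values_by_type.items():
        # Strip whitespace, drop empty values and duplicates, and sort for stable output.
        unique_values = sorted({value.strip() for value in values if value.strip()})
        content = "".join(f"{value}\n" for value in unique_values)
        (out_dir / f"{entity_type}.txt").write_text(content, encoding="utf-8")
```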
Once you have your lookup table files, you need to add them to your NLU training data. Assume your .txt file with all banks from your knowledge base is located at data/lookup_tables/bank.txt. Add the following lines to your NLU training data to make use of the lookup table during training:
## lookup:bank
data/lookup_tables/bank.txt
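If you have one file per entity type, you can generate these sections instead of writing them by hand. A minimal sketch, assuming Rasa 1.x's Markdown NLU format (lookup name in the header, file path on the next line) and the directory layout used above; the function name is hypothetical.

```python
from pathlib import Path
from typing import List


def lookup_sections(lookup_dir: str = "data/lookup_tables") -> str:
    """Build one '## lookup:<name>' section per .txt file in lookup_dir."""
    sections: List[str] = []
    for path in sorted(Path(lookup_dir).glob("*.txt")):
        # The file name without extension becomes the lookup table name.
        sections.append(f"## lookup:{path.stem}\n{path.as_posix()}\n")
    return "\n".join(sections)
```

You could then append the returned text to your NLU training data file.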
For more information on lookup tables, check our documentation. Note that for lookup tables to be effective, your training data must contain a few examples of matches; otherwise, the model will not learn to use the lookup table match features. For example, include training examples that contain just "Max" rather than his full name, and also add an entry for "Max" to the lookup table for the entity person.
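To check that your training data actually contains such matches, a small helper can compare annotated entity values with the lookup entries. A minimal sketch, assuming Rasa 1.x's Markdown annotation syntax [value](entity), optionally [value](entity:synonym); the function names and example data are hypothetical, and matching here is exact and case-sensitive, which is stricter than necessary.

```python
import re
from typing import Dict, Set

# Matches Markdown entity annotations such as [Max](person).
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def annotated_values(nlu_markdown: str) -> Dict[str, Set[str]]:
    """Collect annotated entity values per entity type from NLU training data."""
    values: Dict[str, Set[str]] = {}
    for value, target in ANNOTATION.findall(nlu_markdown):
        # Drop an optional synonym after the colon, e.g. [NYC](city:New York).
        entity = target.split(":", 1)[0]
        values.setdefault(entity, set()).add(value)
    return values


def lookup_coverage(nlu_markdown: str, entity: str, lookup_entries: Set[str]) -> Set[str]:
    """Return the lookup entries that also appear as annotated examples."""
    return annotated_values(nlu_markdown).get(entity, set()) & lookup_entries


nlu = "## intent:inform\n- [Max](person)\n- I am [Anna](person)\n"
print(lookup_coverage(nlu, "person", {"Max", "Tom"}))  # {'Max'}
```

An empty result for an entity type suggests the model has no examples from which to learn the lookup features.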
After you've added the lookup tables to your training data, retrain your model. Likewise, whenever you add new data to a lookup table, you need to retrain your NLU model for the new entries to take effect.
If you have any questions about this tutorial or this repository, feel free to share them on the Rasa Community Forum.