May 24th, 2018

How to build a GDPR compliant chatbot or voice assistant

Philipp Wolf

TL;DR: Many Fortune 500 companies use our machine learning toolkits at Rasa to expand bots and assistants beyond answering simple questions. The General Data Protection Regulation (GDPR) is going into effect and everyone needs to comply with it. But in order to build bots and conversational AI you need data. This data comes mostly from your customers and can be sensitive. Using open-source software lowers the risk of third-party cloud services processing data you no longer control.

The EU data regulation

Everyone is talking about it: starting on the 25th of May 2018, the new EU data protection regulation, GDPR, will be enforceable. If you don't comply, your company risks sky-high fines of up to €20 million or 4% of worldwide annual turnover, whichever is higher.

This regulation will completely change how customer data is treated. It empowers users by giving them more rights over their personal data. Why? Because privacy is a key concern of the EU and this data regulation takes it to the next level.

This affects everyone

Data breaches like the recent one at Facebook have led Europe to strengthen the privacy rights of individuals. This should restore consumers' trust in how corporations deal with their personal data. One thing is clear: the protection of customer data is becoming more important in the EU and everyone needs to care about it.

Even if your company is based outside of Europe, as long as you store or process personal data of people in the EU, you have to comply with the regulation. So almost all companies are affected by the GDPR!

No data = no bot

Bots are all about data. If you want to build good conversational software, you need to use Natural Language Understanding (NLU) and dialogue systems. The underlying machine learning algorithms need training data in order to hold conversations. Collecting this data is necessary to train the models and the more data you have the better the bot performs.

You also want to create a personalised user experience with your dialogue. More information about your customer leads to a better targeted interaction. For example, a weather bot needs the location of the user to give the right weather forecast. Otherwise, the forecast wouldn't make sense. Personal data becomes essential to determine the context of the conversation. Most bots connect to a CRM or other backend system, which lets your users have useful conversations beyond asking simple FAQs.
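To make this concrete, here is a minimal sketch of what NLU training data can look like. The structure, intent names and entity types are hypothetical and framework-neutral; they are not the file format of any specific library. The point is simply that real user messages tend to carry personal details such as locations or addresses.

```python
from collections import Counter

# Hypothetical, framework-neutral training examples (not a specific
# library's file format). Each one pairs a user message with an intent
# label and the entity values a model should learn to extract.
TRAINING_EXAMPLES = [
    {
        "text": "What's the weather in Berlin tomorrow?",
        "intent": "ask_weather",
        "entities": [{"value": "Berlin", "type": "location"}],
    },
    {
        "text": "Will it rain in Hamburg?",
        "intent": "ask_weather",
        "entities": [{"value": "Hamburg", "type": "location"}],
    },
    {
        "text": "Please update my address to 12 Example Street",
        "intent": "change_address",
        "entities": [{"value": "12 Example Street", "type": "address"}],
    },
]


def entity_spans(example):
    """Return (start, end, type) character spans for each annotated entity."""
    spans = []
    for entity in example["entities"]:
        start = example["text"].find(entity["value"])
        if start == -1:
            raise ValueError(f"{entity['value']!r} not found in message")
        spans.append((start, start + len(entity["value"]), entity["type"]))
    return spans


def personal_data_types(examples):
    """Count how often each entity type appears across the training set."""
    return Counter(e["type"] for ex in examples for e in ex["entities"])


if __name__ == "__main__":
    for ex in TRAINING_EXAMPLES:
        print(ex["intent"], entity_spans(ex))
    print(personal_data_types(TRAINING_EXAMPLES))
```

A quick count like `personal_data_types` is one way to see how much personal information sits inside a training set before deciding where that data is allowed to go.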


Data is essential - but what can you do in order to reduce the risk of data breaches and be in charge of that data?

Don't outsource your customer data!

At Rasa, more and more enterprises approach us because they are concerned about the new regulation and how it will affect chatbots.

There are two important roles defined in the GDPR that affect you as a company and the chatbot you build. Firstly, the data controller and secondly, the data processor:

  • The Data Controller represents the entity which determines the purposes and means of the processing of personal data
  • The Data Processor represents the entity which processes personal data on behalf of the controller

Data controllers are the decision makers about which personal data gets collected, stored and processed - so most companies are considered controllers! These controllers must fully comply with the EU data regulation.

Data processors include all companies that process the personal data on behalf of the controller. This includes internal entities but also external services like cloud providers and third-party companies. So if you use a cloud-based bot platform, this would be your data processor.

One important distinction is that the data controller is held accountable for the activities of the data processor. This means that if the data processor is not compliant with the regulation, the data controller can be penalised as well, even if the controller itself complies. Data controllers must therefore ensure that their data processors also comply with the GDPR.


So how can you avoid being accountable for the third party? The answer is simple: try to reduce the number of data processors and instead own the stack yourself. Using many third-party cloud solutions will send your data out of your company.

Open source is the way forward

Obviously, as a data controller, you have to comply with the GDPR if you collect personal data. But how can you ensure that the bot platform you use is also compliant? You can either build your own framework in-house or use on-premise and private cloud installations so that no data leaves your company. This not only puts you in charge of your customers' data but also lowers the risk of data breaches.

Data privacy used to be important only in a handful of industries like banking, insurance and healthcare - now it concerns everyone. Customer data needs the highest protection and cannot leave your firewall. Under the new EU data regulation these standards are even higher. That's why you cannot risk losing control over your data by sending it all to third-party cloud providers. If you now think "I can just anonymise or encrypt all data", this could lead you in the wrong direction: 1) anonymising free-text data reliably is a really difficult task for machines and 2) the data you would strip out is exactly what the algorithms need in training to extract entities and classify intents.
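A small sketch shows why simple anonymisation falls short. The regular expressions and the sample message below are illustrative assumptions, not a recommended anonymisation method: pattern matching catches well-structured identifiers such as email addresses and phone numbers, but free-text details like names and cities slip through.

```python
import re

# Illustrative patterns only (assumptions for this sketch); real-world
# emails and phone numbers come in many more shapes than these cover.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def naive_redact(text):
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)


# Hypothetical user message containing several kinds of personal data.
MESSAGE = (
    "Hi, I'm Anna Schmidt from Berlin, reach me at "
    "anna@example.com or +49 30 1234567."
)

if __name__ == "__main__":
    print(naive_redact(MESSAGE))
```

In this example the email address and phone number are replaced, while the name and the city remain in the text. Removing those as well would need a model that recognises entities, which in turn needs the kind of training data you were trying to anonymise in the first place.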

We at Rasa believe that open-source software is the way forward. You can deploy and install it wherever you want - so an on-premise deployment is no problem. In addition, open source brings the transparency you need in order to deal with personal data. You want to trace exactly where the data is stored and how it is processed. You also want to be in charge of it so that you can comply with the GDPR.

The risk of data breaches and bugs within the code can also be reduced because open-source software gives you full insight into the code base. Most proprietary software is kept secret and the software provider does not want to share the code publicly. In addition, the code of open-source software is open to review by the community, so problems can be spotted and fixed in public. Important to mention: you don't need to sacrifice performance. A paper by TU Munich has shown that Rasa performs just as well as the cloud solutions.


Next steps

It's the very first time an EU regulation on data protection has had such a wide impact on companies. That's why it is essential to take this seriously. Customer data is becoming more important and companies need to comply with the GDPR. So here are your next steps and takeaways:

  • You're responsible for your customers' data
  • Check if the services you use are GDPR-compliant
  • Check out Rasa!

Want to talk about GDPR and data privacy? Get in touch if you'd like to chat.

Rasa is the leading open source machine learning toolkit that lets developers expand bots beyond answering simple questions.

*Disclaimer: This is not legal advice. The GDPR is an important regulation and you should consult your lawyers about this matter and get professional guidance.