Open Source Natural Language Processing (NLP)

Customizable, open source NLP software for text- and voice-based AI assistants.



Turn human language into structured data

Rasa Open Source provides open source natural language processing to turn messages from your users into intents and entities that chatbots understand. Built on lower-level machine learning libraries like TensorFlow and spaCy, Rasa Open Source provides natural language processing software that’s approachable and as customizable as you need. Get up and running fast with easy-to-use default configurations, or swap in custom components and fine-tune hyperparameters to get the best possible performance for your dataset.

Rasa Open Source is the most flexible and transparent solution for conversational AI—and open source means you have complete control over building an NLP chatbot that really helps your users.

What is natural language processing?

Natural language processing is a category of machine learning that analyzes freeform text and turns it into structured data. Natural language understanding is a subset of NLP that classifies the intent, or meaning, of text based on the context and content of the message. The difference between NLP and NLU is that natural language understanding goes beyond converting text to its semantic parts and interprets the significance of what the user has said.

Rasa Open Source is a robust platform that includes natural language understanding and open source natural language processing. It’s a full toolset for extracting the important keywords, or entities, from user messages, as well as the meaning or intent behind those messages. The output is a standardized, machine-readable version of the user’s message, which is used to determine the chatbot’s next action.
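To make "machine-readable" concrete, here is a minimal sketch of what a structured result for one message could look like, and how an application might use it to pick a next step. The field names, the confidence value, the threshold, and the `make_entity` and `choose_action` helpers are all illustrative assumptions, not Rasa's guaranteed output schema or API.

```python
# Illustrative sketch only: field names and helpers are assumptions,
# not a guaranteed Rasa output schema.

def make_entity(text, value, entity_type):
    """Locate `value` in `text` and record its type and character span."""
    start = text.index(value)
    return {
        "entity": entity_type,
        "value": value,
        "start": start,
        "end": start + len(value),
    }


message = "I want to fly from Berlin to Paris"

# A hypothetical structured version of the message above.
parsed = {
    "text": message,
    "intent": {"name": "book_flight", "confidence": 0.97},  # made-up score
    "entities": [
        make_entity(message, "Berlin", "city"),
        make_entity(message, "Paris", "city"),
    ],
}


def choose_action(result, threshold=0.6):
    """Pick a hypothetical next action from the predicted intent."""
    if result["intent"]["confidence"] < threshold:
        return "ask_user_to_rephrase"
    return "handle_" + result["intent"]["name"]


print(choose_action(parsed))
```

The point of the sketch is the shape of the data: once a message is reduced to an intent and a list of entities, the rest of the application can act on it without parsing free text again.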

Natural language processing is used anywhere an application needs to take raw user text as input: whether it’s a voice assistant receiving input from speech-to-text software, or a chatbot asking a user to type in their question. Natural language processing is the essential step that turns a string of words into a form that can be interpreted and acted upon by other systems in the application.

Want to see what other developers have built with Rasa? Visit the Rasa Community Showcase.

Why open source NLP?

Rasa Open Source is licensed under the Apache 2.0 license, and the full code for the project is hosted on GitHub. Rasa Open Source is actively maintained by a team of Rasa engineers and machine learning researchers, as well as open source contributors from around the world. This collaboration fosters rapid innovation and software stability through the collective efforts and talents of the community.

Unlike NLP solutions that simply provide an API, Rasa Open Source gives you complete visibility into the underlying systems and machine learning algorithms. NLP APIs can be an unpredictable black box—you can’t be sure why the system returned a certain prediction, and you can’t troubleshoot or adjust the system parameters. Rasa Open Source is completely transparent. You can see the source code, modify the components, and understand why your models behave the way they do.

Open source NLP also offers the most flexible solution for teams building chatbots and AI assistants. The modular architecture and open code base mean you can plug in your own pre-trained models and word embeddings, build custom components, and tune models with precision for your unique dataset. Rasa Open Source works out of the box with pre-trained models like BERT, HuggingFace Transformers, GPT, spaCy, and more, and you can incorporate custom modules like spell checkers and sentiment analysis.

Leverage the latest state-of-the-art NLP research

Rasa’s dedicated machine learning Research team brings the latest advancements in natural language processing and conversational AI directly into Rasa Open Source. Working closely with the Rasa product and engineering teams, as well as the community, in-house researchers ensure ideas become product features within months, not years.

The Rasa Research team brings together some of the leading minds in the field of NLP, actively publishing work to academic journals and conferences. The latest areas of research include transformer architectures for intent classification and entity extraction, transfer learning across dialogue tasks, and compressing large language models like BERT and GPT-2. Because Rasa Open Source is an open source NLP tool, this work is highly visible and is vetted, tested, and improved by the Rasa Community.

Open source NLP for any spoken language, any domain

Rasa Open Source provides natural language processing that’s trained entirely on your data. This enables you to build models for any language and any domain, and your model can learn to recognize terms that are specific to your industry, like insurance, financial services, or healthcare.

In the insurance industry, a word like “premium” can have a unique meaning that a generic, multi-purpose NLP tool might miss. Rasa Open Source allows you to train your model on your data, to create an assistant that understands the language behind your business. This flexibility also means that you can apply Rasa Open Source to multiple use cases within your organization. You can use the same NLP engine to build an assistant for internal HR tasks and for customer-facing use cases, like consumer banking.

Regional dialects and language support can also present challenges for some off-the-shelf NLP solutions. Rasa’s NLU architecture is completely language-agnostic, and has been used to train models in Hindi, Thai, Portuguese, Spanish, Chinese, French, Arabic, and many more. You can build AI chatbots and virtual assistants in any language, or even multiple languages, using a single framework.

Support multiple intents and hierarchical entities

In the real world, user messages can be unpredictable and complex—and a user message can’t always be mapped to a single intent. Rasa Open Source is equipped to handle multiple intents in a single message, reflecting the way users really talk. Consider an example like “Yes, place my order. When will it arrive?” Rasa’s NLU engine can tease apart multiple user goals, so your virtual assistant responds naturally and appropriately, even to complex input.
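As a rough sketch of how an application might act on a message with two goals, the example below assumes the NLU step returns a combined label that joins intent names with a "+" symbol. The label format, the intent names, and the canned responses are hypothetical and chosen only for illustration.

```python
# Hypothetical convention: a multi-intent label joins intent names with "+".
RESPONSES = {
    "affirm": "Great, your order has been placed.",
    "ask_delivery_time": "It should arrive within three business days.",
}

FALLBACK = "Sorry, I didn't catch that."


def split_intents(label, separator="+"):
    """Break a combined label into its individual intent names."""
    return [part for part in label.split(separator) if part]


def respond(label):
    """Return one reply per intent found in the combined label."""
    return [RESPONSES.get(name, FALLBACK) for name in split_intents(label)]


# "Yes, place my order. When will it arrive?"
for reply in respond("affirm+ask_delivery_time"):
    print(reply)
```

Handling each goal separately lets the assistant confirm the order and answer the delivery question in the same turn.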

Rasa’s open source NLP engine also enables developers to define hierarchical entities, via entity roles and groups. This unlocks the ability to model complex transactional conversation flows, like booking a flight or hotel, or transferring money between accounts. Entity roles and groups make it possible to distinguish whether a city is the origin or destination, or whether an account is savings or checking.
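The sketch below illustrates the idea behind entity roles: two entities share the same type but carry different roles, so downstream logic can tell them apart. The dictionary layout and the `fill_slots` helper are assumptions made for this example rather than a description of Rasa's internals.

```python
# Illustrative only: two "city" entities distinguished by a "role" field.
entities = [
    {"entity": "city", "value": "Berlin", "role": "departure"},
    {"entity": "city", "value": "Paris", "role": "destination"},
]


def fill_slots(extracted):
    """Map each entity to a slot name, using its role when one is present."""
    slots = {}
    for item in extracted:
        role = item.get("role")
        key = f"{item['entity']}_{role}" if role else item["entity"]
        slots[key] = item["value"]
    return slots


print(fill_slots(entities))
```

Without the role, both cities would compete for the same slot and the assistant could not tell where the trip starts and where it ends.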

Open source NLP tools for complete control of data privacy

Protecting the security and privacy of training data and user messages is one of the most important aspects of building chatbots and voice assistants. Organizations must navigate a web of industry regulations and data requirements, like GDPR and HIPAA, while also protecting intellectual property and preventing data breaches.

Rasa Open Source deploys on premises or in your own private cloud, and none of your data is ever sent to Rasa. All user messages, especially those that contain sensitive data, remain safe and secure on your own infrastructure. That’s especially important in regulated industries like healthcare, banking, and insurance, making Rasa’s open source NLP software the go-to choice for enterprise IT environments.

Built-in NLU model performance testing and training data version control

Rasa’s open source NLP engine comes equipped with model testing capabilities out of the box, so you can confirm that your models are getting more accurate over time before you deploy to production.

Measure F1 score and model confidence, and compare the performance of different NLU pipeline configurations, to keep your assistant running at peak performance. All NLU tests support integration with industry-standard CI/CD and DevOps tools, to make testing an automated deployment step, consistent with engineering best practices.
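For readers who want to see what the F1 score captures, here is a small, self-contained sketch that computes it for a single intent from true and predicted labels. The labels are invented test data and the `f1_for_intent` function is written for this example; it is not Rasa's own test tooling.

```python
def f1_for_intent(true_labels, predicted_labels, intent):
    """Compute the F1 score (harmonic mean of precision and recall) for one intent."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == intent and p == intent)
    false_pos = sum(1 for t, p in pairs if t != intent and p == intent)
    false_neg = sum(1 for t, p in pairs if t == intent and p != intent)

    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented evaluation data: one "greet" message is misclassified as "order".
true_labels = ["greet", "greet", "order", "order", "goodbye"]
predicted = ["greet", "order", "order", "order", "goodbye"]

for name in ("greet", "order", "goodbye"):
    print(name, round(f1_for_intent(true_labels, predicted, name), 2))
```

Because F1 balances precision and recall, a single misclassified message lowers the score for both the intent that was missed and the intent that was wrongly predicted.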

The Rasa stack also connects with Git for version control. Treat your training data like code and maintain a record of every update. Easily roll back changes and implement review and testing workflows, for predictable, stable updates to your chatbot or voice assistant.

A conversation-driven approach to natural language processing

Even the best NLP systems are only as good as the training data you feed them. Compared to other tools used for language processing, Rasa emphasizes a conversation-driven approach, using insights from user messages to train and teach your model how to improve over time. Rasa’s open source NLP works seamlessly with Rasa X to capture and make sense of conversation data, turn it into training examples, and track improvements to your chatbot’s success rate.

Get started with open source NLP projects, with source code

Try Rasa’s open source NLP software using one of our pre-built starter packs for financial services or IT Helpdesk. Each of these chatbot examples is fully open source, available on GitHub, and ready for you to clone, customize, and extend. Each starter pack includes NLU training data to get you started, as well as features like context switching, human handoff, and API integrations.

GDPR compliant

Rasa Open Source runs on premises to keep your customer data secure, supporting GDPR compliance and maximum data privacy.

Customizable

Customize and train language models for domain-specific terms in any language. The modular pipeline lets you tune models and get higher accuracy with open source NLP.

Open Source

Available via the Apache 2.0 license. Free to use and modify for commercial projects, and maintained by an active community of hundreds of contributors.

Join our fast-growing developer community

Rasa Community is a diverse group of makers and conversational AI enthusiasts.

3m+

Downloads

10k+

Forum Members

450+

Contributors