Research at Rasa
Our research team enables developers to build conversational AI that couldn't be built today.
We make Rasa the best tool developers could use to build conversational AI. We engage and share ideas with the broader research community. And we attract the best people in the field to come work with us.
We've extended our model pipelines to support incremental training. This allows you to fine-tune an existing model after adding new training examples instead of training a new model from scratch. How to do this effectively is still an open research question, but our first results look promising.
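To make the idea concrete, here is a toy sketch. It is not Rasa's implementation. A bag-of-words logistic regression is first trained from scratch. After new examples arrive, it is warm-started from the previously learned weights and trained for only a few epochs on the old plus new data. The `featurize`, `predict_proba` and `train` helpers and the example data are all hypothetical.

```python
import math
import random

def featurize(text):
    # Bag-of-words: every lowercase token becomes a binary feature.
    return {tok: 1.0 for tok in text.lower().split()}

def predict_proba(weights, text):
    # Probability of label 1 (here: "greeting") under a logistic model.
    z = sum(weights.get(f, 0.0) * v for f, v in featurize(text).items())
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, weights=None, epochs=20, lr=0.5, seed=0):
    """Logistic regression trained with SGD. Passing existing `weights`
    warm-starts (fine-tunes) the model instead of starting from all zeros."""
    weights = dict(weights) if weights else {}
    data = list(examples)
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for text, label in data:
            error = label - predict_proba(weights, text)
            for f, v in featurize(text).items():
                weights[f] = weights.get(f, 0.0) + lr * error * v
    return weights

# Hypothetical data: 1 = greeting, 0 = goodbye.
old = [("hello there", 1), ("hi friend", 1), ("bye now", 0), ("see you later", 0)]
base = train(old)  # full training run from scratch

new = [("howdy", 1), ("howdy partner", 1)]
tuned = train(old + new, weights=base, epochs=5)  # short fine-tuning run

print(predict_proba(base, "howdy"))  # 0.5: the base model has never seen "howdy"
print(predict_proba(tuned, "howdy") > 0.5)  # True
```

The fine-tuning run reuses everything the base model already learned, so it needs far fewer epochs than retraining. Real neural pipelines face extra complications, such as vocabularies and label sets that grow, which this toy ignores.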
"Whatlies" in Word Embeddings
Whatlies is an open source toolkit for visually inspecting word and sentence embeddings. The project offers a unified and extensible API with current support for a range of popular embedding backends including spaCy, tfhub, huggingface transformers, gensim, fastText and BytePair embeddings.
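The sketch below shows the kind of computation such a toolkit makes easy to inspect: vector arithmetic on embeddings, nearest neighbours by cosine similarity, and projecting a word onto an axis defined by another word, which gives one coordinate of a plot. It deliberately avoids the whatlies API. The 3-dimensional vectors and helper names are hypothetical stand-ins for what a real backend such as spaCy or fastText would supply.

```python
import math

# Hypothetical toy embeddings; a real backend would supply hundreds of dimensions.
EMB = {
    "king": [0.8, 0.7, 0.1],
    "queen": [0.8, 0.1, 0.7],
    "man": [0.2, 0.8, 0.1],
    "woman": [0.2, 0.1, 0.8],
}

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))

def project(v, axis):
    """Scalar projection of v onto the direction of `axis` (one plot coordinate)."""
    return dot(v, axis) / norm(axis)

def nearest(v, exclude=()):
    """Vocabulary word whose embedding has the highest cosine similarity to v."""
    return max((w for w in EMB if w not in exclude), key=lambda w: cosine(EMB[w], v))

analogy = add(sub(EMB["king"], EMB["man"]), EMB["woman"])
print(nearest(analogy, exclude={"king", "man", "woman"}))  # queen
print(project(EMB["queen"], EMB["woman"]) > project(EMB["king"], EMB["woman"]))  # True
```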
Dual Intent and Entity Transformer (DIET)
DIET is a new state-of-the-art NLU architecture that jointly predicts intents and entities. It outperforms fine-tuning BERT and is 6x faster to train. You can use DIET together with BERT and other pre-trained language models in a plug-and-play fashion.
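The following toy sketch illustrates what "jointly" means. One shared encoding of the message feeds two heads: an intent head scored by dot-product similarity with intent embeddings, and a per-token entity head. A single loss sums the two cross-entropies. This is a heavy simplification and not DIET itself, which uses a transformer encoder and a CRF for entity tagging. Here the encoder is a fixed lookup table, each token's tag is a plain softmax, and all vectors and names are hypothetical.

```python
import math

# Hypothetical token vectors standing in for a learned, shared encoder.
TOKEN_VECS = {
    "book": [1.0, 0.0],
    "a": [0.2, 0.0],
    "flight": [0.9, 0.1],
    "to": [0.2, 0.0],
    "berlin": [0.0, 1.0],
}
# Hypothetical intent-label embeddings (intents scored by dot-product similarity).
INTENT_VECS = {"book_flight": [1.0, 0.3], "greet": [-1.0, 0.0]}
# Hypothetical per-token entity-tag weights ("O" = not part of an entity).
TAG_VECS = {"city": [0.0, 2.0], "O": [0.5, 0.0]}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    top = max(scores.values())
    exps = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def predict(text):
    """One shared encoding feeds both heads: an intent distribution for the whole
    message and an entity-tag distribution for every token."""
    tokens = text.lower().split()
    hidden = [TOKEN_VECS.get(t, [0.0, 0.0]) for t in tokens]
    sentence = [sum(col) / len(hidden) for col in zip(*hidden)]  # mean pooling
    intents = softmax({i: dot(sentence, v) for i, v in INTENT_VECS.items()})
    tags = [softmax({t: dot(h, v) for t, v in TAG_VECS.items()}) for h in hidden]
    return intents, list(zip(tokens, tags))

def joint_loss(text, intent, tags):
    """Sum of intent and entity cross-entropy: one objective trains both tasks."""
    intents, token_tags = predict(text)
    intent_loss = -math.log(intents[intent])
    entity_loss = -sum(math.log(p[tag]) for (_, p), tag in zip(token_tags, tags))
    return intent_loss + entity_loss

text = "book a flight to berlin"
intents, token_tags = predict(text)
print(max(intents, key=intents.get))  # book_flight
print([(tok, max(p, key=p.get)) for tok, p in token_tags])
```

Because both losses flow back into the same encoder during training, what the model learns for entity recognition can help intent classification, and vice versa.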
What's the best data structure for dialogue memory: a stack, a graph, or a flat list? Self-attention gives you great flexibility without a complex memory structure.
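To see why a flat list can be enough, consider this minimal sketch of scaled dot-product self-attention over a list of dialogue turns. A causal mask makes each turn attend only to itself and to earlier turns, which is all a policy can see at prediction time. For brevity, queries, keys and values are the raw turn vectors rather than learned projections. The 2-dimensional turn features are hypothetical.

```python
import math

def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(turns):
    """Scaled dot-product self-attention over a flat list of turn vectors.
    Each turn attends only to itself and earlier turns (causal mask).
    Queries, keys and values are the raw vectors (no learned projections)."""
    d = len(turns[0])
    outputs, weights = [], []
    for i, query in enumerate(turns):
        history = turns[: i + 1]
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in history]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * value[c] for wj, value in zip(w, history)) for c in range(d)])
    return outputs, weights

# Hypothetical 2-d turn features: [about the booking, small talk]
turns = [
    [1.0, 0.0],  # user asks to book a table
    [0.0, 1.0],  # user digresses into small talk
    [1.0, 0.0],  # user returns to the booking
]
_, weights = causal_self_attention(turns)
print([round(w, 2) for w in weights[2]])  # [0.4, 0.2, 0.4]
```

The third turn puts more weight on the first turn than on the digression. No stack push or pop was needed: the attention weights decide which parts of the history matter.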
Transfer Learning Across Dialogue Tasks
You've built an assistant and it can already help users with a few things. Now you're adding new functionality. How can your assistant re-use the dialogue elements it already knows about in this new context?
Compressing Transformer Language Models
Large-scale language models like BERT, GPT-2, and XLNet show excellent performance on a number of NLU tasks but are very resource intensive. Can we compress these models to get something that's almost as accurate but much faster?
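Quantization is one common compression technique. The sketch below shows symmetric post-training 8-bit linear quantization of a weight vector. Each float is stored as a one-byte integer code plus a single shared scale. This is a generic illustration and not necessarily the exact scheme used in our experiments. The function names and the weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric linear quantization of float weights to 8-bit integers in [-127, 127].
    Returns the integer codes plus the single float scale needed to undo it."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Approximate the original floats from the integer codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.27, 0.0, 0.3333, 1.0]  # hypothetical layer weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
print(codes)  # [50, -127, 0, 33, 100]
```

Each 4-byte float32 weight shrinks to a 1-byte code, roughly a 4x memory saving. The rounding error per weight is at most half the scale. Speed-ups additionally depend on hardware and kernels that compute directly in integer arithmetic.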
Entity Resolution using Knowledge Bases
Combining a dialogue system with a knowledge base allows developers to encode domain knowledge in a scalable way and integrate it with statistical NLU and dialogue models. It also helps your assistant understand messages like "the second one" or "which of those is cheaper?"
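Here is a minimal sketch of how such references could be resolved. The assistant remembers which knowledge-base objects it listed most recently. It maps an ordinal mention like "the second one" to an index in that list, and it answers "which of those is cheaper?" by comparing an attribute of the listed objects. The restaurant data, the ordinal table and the helper names are hypothetical, not a Rasa API.

```python
# Hypothetical knowledge-base entries; a real one could be a database or a graph.
RESTAURANTS = [
    {"name": "Pasta Place", "cuisine": "italian", "price": 25},
    {"name": "Sushi Bar", "cuisine": "japanese", "price": 40},
    {"name": "Taco Stand", "cuisine": "mexican", "price": 12},
]

# Hypothetical mapping from ordinal words to list positions.
ORDINALS = {"first": 0, "second": 1, "third": 2, "last": -1}

def resolve_ordinal(message, listed):
    """Resolve a mention like 'the second one' against the objects the assistant
    listed most recently. Returns None if no ordinal is found or it is out of range."""
    for word in message.lower().replace("?", "").split():
        if word in ORDINALS:
            idx = ORDINALS[word]
            if -len(listed) <= idx < len(listed):
                return listed[idx]
    return None

def cheaper_of(listed, attribute="price"):
    """Answer 'which of those is cheaper?' by comparing an attribute of the listed objects."""
    return min(listed, key=lambda obj: obj[attribute])

listed = RESTAURANTS[:2]  # the assistant just showed Pasta Place and Sushi Bar
print(resolve_ordinal("I'll take the second one", listed)["name"])  # Sushi Bar
print(cheaper_of(listed)["name"])  # Pasta Place
```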
Supervised Word Embeddings
Pre-trained word embeddings like word2vec and GloVe are a great way to build a simple text classifier. But learning supervised embeddings for your specific task helps you deal with jargon and out-of-vocabulary words. This is now our default intent classification model.
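This is a minimal sketch of learning supervised embeddings. It is not Rasa's classifier. Word vectors and intent-label vectors are learned jointly in the same space with a margin ranking loss. A message, represented as the sum of its word vectors, should score higher (dot product) with its own intent than with any other intent by at least a margin. Because every word vector is trained directly on the task, domain jargon gets a meaningful representation without relying on a pre-trained vocabulary. The function names, hyperparameters and data are hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train(examples, dim=8, epochs=100, lr=0.1, margin=0.8, seed=0):
    """Jointly learn word and intent-label embeddings with a margin ranking loss."""
    rng = random.Random(seed)

    def rand_vec():
        return [rng.uniform(-0.1, 0.1) for _ in range(dim)]

    words, labels = {}, {}
    for text, intent in examples:
        labels.setdefault(intent, rand_vec())
        for tok in text.lower().split():
            words.setdefault(tok, rand_vec())
    for _ in range(epochs):
        for text, intent in examples:
            toks = text.lower().split()
            if not toks:
                continue
            msg = [sum(col) for col in zip(*(words[t] for t in toks))]
            for other in labels:
                if other == intent:
                    continue
                pos, neg = labels[intent], labels[other]
                if margin - dot(msg, pos) + dot(msg, neg) <= 0:
                    continue  # this wrong label is already far enough away
                diff = [p - n for p, n in zip(pos, neg)]
                # Gradient step: pull the right label toward the message, push the wrong one away,
                # and move each word vector toward (right label - wrong label).
                labels[intent] = [p + lr * m for p, m in zip(pos, msg)]
                labels[other] = [n - lr * m for n, m in zip(neg, msg)]
                for t in toks:
                    words[t] = [w + lr * d for w, d in zip(words[t], diff)]
                msg = [sum(col) for col in zip(*(words[t] for t in toks))]
    return words, labels

def classify(text, words, labels):
    """Pick the intent whose embedding best matches the summed word vectors."""
    known = [words[t] for t in text.lower().split() if t in words]
    if not known:
        return None  # no known words to embed
    msg = [sum(col) for col in zip(*known)]
    return max(labels, key=lambda lab: dot(msg, labels[lab]))

examples = [
    ("hello", "greet"), ("hi there", "greet"), ("good morning", "greet"),
    ("bye", "goodbye"), ("see you later", "goodbye"), ("goodbye", "goodbye"),
]
words, labels = train(examples)
print(classify("hi", words, labels))
```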
Mixing Single and Multi-turn Dialogue
Dialogue elements like small talk and FAQs are single-turn interactions. New retrieval-based models in Rasa can handle all of these simple responses in a single action. This means your dialogue policy becomes much simpler and you need fewer training stories.
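The sketch below illustrates the single-action idea. One `respond` function answers any FAQ or chit-chat message by retrieving the stored response whose example question is most similar to the message. The dialogue policy therefore only has to learn when to call that one action. Rasa's retrieval models learn the similarity. Here it is swapped for simple token-overlap (Jaccard) similarity to keep the example self-contained. The `faq/...` and `chitchat/...` keys, the data and the function names are hypothetical.

```python
def tokens(text):
    return set(text.lower().replace("?", "").split())

def jaccard(a, b):
    """Token-overlap similarity between two token sets (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical data: one example question and one canned response per sub-intent.
FAQ = {
    "faq/opening_hours": ("when are you open", "We're open 9am to 5pm, Monday to Friday."),
    "faq/ask_name": ("what is your name", "I'm a Rasa assistant."),
    "chitchat/how_are_you": ("how are you doing", "I'm great, thanks for asking!"),
}

def respond(message):
    """A single action that answers any FAQ or chit-chat message by retrieving the
    response whose example question is most similar to the message."""
    best = max(FAQ, key=lambda key: jaccard(tokens(message), tokens(FAQ[key][0])))
    return best, FAQ[best][1]

print(respond("What's your name?"))
print(respond("are you open on sunday?"))
```

Adding a new FAQ only means adding a new entry to the response store; no new stories or actions are required.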
Most language models and word embeddings are trained on prose and don't know anything about the rules of conversation. How can we build embeddings that understand the difference between purposeful dialogue and chit-chat, and can detect non-sequiturs?
Talks and Meetups
We regularly host external speakers at our #botsBerlin meetup to talk about their research.
Excited? We're hiring!
Felicia has a master’s degree in Computer Science from Johns Hopkins University, with the human language technology concentration from the Center for Language and Speech Processing.
Before Rasa, her research was mostly focused on parallel corpus filtering for the purposes of machine translation. Through this research, she gained experience in a wide variety of natural language processing and machine learning techniques.
Alan is co-founder and CTO of Rasa, an open source company providing the tools required to build better, more resilient contextual assistants. Rasa has raised $40m from top venture firms including a16z and Accel. Prior to Rasa, Alan co-founded a productivity startup funded by Techstars. He holds a PhD in machine learning from the University of Cambridge.
Tanja studied Software Engineering with a focus on NLP. Her master's thesis dealt with the question of how to extract relations between entities from German text.
She is one of the early contributors to the NLP framework Flair. At Rasa, Tanja currently focuses on natural language understanding, in particular on how to leverage data from knowledge bases in a conversation.
Daksh holds a Master's in Data Science from IIIT Bangalore and has worked on diverse research problems in NLP and computer vision. His current research includes representation learning in conversational AI and more interpretable models in deep learning.
Johannes studied physics and mathematics, covering a wide variety of subjects from particle physics and general relativity to ocean modeling. During his PhD he became interested in machine learning and started a blog about it before joining our team at Rasa as an ML Researcher in June 2019.
His current research is focused on transfer learning across dialogue tasks.
Sam has a Master's in Informatics from the University of Edinburgh, where he focused on machine learning, cognitive science and natural language. Previously, Sam interned at Rasa and then continued an academic collaboration with us on accelerating large language models.
Thomas has a PhD in Computational Linguistics from the University of Sussex and was subsequently a postdoc in the NLP group at the University of Edinburgh before joining Rasa.
His main research interests are in the area of lexical semantics, specifically distributional semantics, distributional composition and entailment.
Adam joined Rasa from the University of Edinburgh, where he continues to advise PhD students. His research interests are broad, and he has worked on many basic algorithmic, scientific, and mathematical problems in natural language processing, publishing more than sixty papers in the field.
He has developed and taught several advanced courses in natural language processing during his time as faculty at Edinburgh and at Johns Hopkins University.
Aciel received her PhD from the University of Edinburgh, working on conversational modelling for dialog systems. She subsequently moved to developing and deploying statistical models in commercial biomedical systems, and then to a postdoc on automating speech and language therapy using acoustic and articulatory imaging data. At Rasa, Aciel is excited to return to conversational modelling while drawing on her past academic and industry experience in machine learning.
Chris Kedzie is a researcher in the areas of natural language processing, natural language generation, and machine learning. His research interests include neural network models of language generation, with a focus on building controllable and semantically accurate language generators as well as modelling useful inductive biases for learning from language data.
He holds a Ph.D. in computer science from Columbia University. Before turning to computer science, he studied classical guitar and music composition at Loyola Marymount University in Los Angeles, and occasionally, although increasingly rarely, makes music.
Saba is a Machine Learning research intern at Rasa. He is currently completing his undergraduate studies in Computer Science at Free University of Tbilisi. At Rasa he'll be working on making fine-tuning procedures faster and more efficient.
Kathrin did her PhD in theoretical machine learning with a focus on clustering algorithms. Before joining Rasa, she developed machine learning solutions for the health insurance industry. In particular, she created custom deep learning models for NLP problems as well as clinical use cases using large-scale health and insurance data.
We’re a partner of the UKRI Centre for Doctoral Training in Natural Language Processing at the University of Edinburgh. We are also sponsors of SigDIAL 2020.
Every year we supervise a few MSc students and take on some interns to work on research. If you are using Rasa in a course, get in touch and we can share materials for use in lectures and group projects.