Warning: This document is for an old version of Rasa. The latest version is 1.3.3.

Training Data Importers

By default, you use command line arguments to tell Rasa where to look for training data on disk. Rasa then loads any training files it finds there and uses them to train your assistant.

If needed, you can also customize how Rasa imports training data. Possible use cases include:

  • using a custom parser to load training data in other formats
  • using different approaches to collect training data (e.g. loading it from different resources)
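As an illustration of the first use case, here is a minimal sketch of a custom parser. It assumes a hypothetical CSV layout with the columns text and intent, and converts the rows into the JSON layout that Rasa NLU training data uses. The function name csv_to_nlu_examples and the sample rows are made up for this example and are not part of Rasa.

```python
import csv
import io
from typing import Any, Dict, List


def csv_to_nlu_examples(csv_text: str) -> Dict[str, Any]:
    """Convert CSV rows of `text,intent` into the Rasa NLU JSON layout."""
    reader = csv.DictReader(io.StringIO(csv_text))
    examples: List[Dict[str, Any]] = [
        # The CSV in this sketch carries no entity annotations.
        {"text": row["text"], "intent": row["intent"], "entities": []}
        for row in reader
    ]
    return {"rasa_nlu_data": {"common_examples": examples}}


# Hypothetical input; a real importer would read this from its own source.
sample = "text,intent\nhello there,greet\nsee you later,goodbye\n"
nlu_data = csv_to_nlu_examples(sample)
```

A custom importer could run a conversion like this inside get_nlu_data before handing the result to Rasa.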

You can instruct Rasa to load and use your custom importer by adding an importers section to the Rasa configuration file and specifying each importer by its full class path:

importers:
- name: "module.CustomImporter"
  parameter1: "value"
  parameter2: "value2"
- name: "module.AnotherCustomImporter"

The name key is used to determine which importer should be loaded. Any extra parameters are passed as constructor arguments to the loaded importer.
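The mapping from a configuration entry to constructor arguments can be pictured with the following sketch. It is a Rasa-free illustration, not Rasa's loading code: CustomImporter is a hypothetical stand-in class, split_entry is a helper invented for this example, and the file names are placeholders for the paths that would come from the command line.

```python
from typing import Any, Dict, List, Optional, Tuple, Union


class CustomImporter:
    """Hypothetical stand-in for an importer; does not depend on Rasa."""

    def __init__(
        self,
        config_file: Optional[str] = None,
        domain_path: Optional[str] = None,
        training_data_paths: Optional[Union[List[str], str]] = None,
        **kwargs: Any
    ) -> None:
        self.config_file = config_file
        self.domain_path = domain_path
        self.training_data_paths = training_data_paths
        # Extra keys from the `importers` entry end up here.
        self.extra = kwargs


def split_entry(entry: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Separate the `name` key from the remaining parameters."""
    params = dict(entry)  # copy so the caller's dict stays untouched
    class_path = params.pop("name")
    return class_path, params


entry = {"name": "module.CustomImporter", "parameter1": "value", "parameter2": "value2"}
class_path, params = split_entry(entry)
# Placeholder paths stand in for the command line arguments.
importer = CustomImporter("config.yml", "domain.yml", ["data"], **params)
```

Accepting **kwargs in the constructor keeps the importer tolerant of parameters it does not use.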

Note

You can specify multiple importers. Rasa will automatically merge their results.
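One way to picture the merge is the sketch below, which combines the configurations returned by several importers. It is an illustration under assumptions, not Rasa's merge code: StaticConfigImporter and merged_config are hypothetical names, and the rule that later importers win on key clashes is an assumption made for this example.

```python
import asyncio
from typing import Any, Dict, List


class StaticConfigImporter:
    """Hypothetical importer stand-in that returns a fixed config."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self._config = config

    async def get_config(self) -> Dict[str, Any]:
        return self._config


async def merged_config(importers: List[StaticConfigImporter]) -> Dict[str, Any]:
    # Query all importers concurrently, then fold the results together.
    configs = await asyncio.gather(*(i.get_config() for i in importers))
    merged: Dict[str, Any] = {}
    for config in configs:
        # Assumption for this sketch: later importers win on key clashes.
        merged.update(config or {})
    return merged


importers = [
    StaticConfigImporter({"language": "en"}),
    StaticConfigImporter({"policies": [{"name": "MemoizationPolicy"}]}),
]
result = asyncio.run(merged_config(importers))
```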

RasaFileImporter (default)

By default Rasa uses the importer RasaFileImporter. If you want to use it on its own, you don’t have to specify anything in your configuration file. If you want to use it together with other importers, add it to your configuration file:

importers:
- name: "RasaFileImporter"

Writing a Custom Importer

A custom importer has to implement the TrainingDataImporter interface:

from typing import Optional, Text, Dict, List, Union

import rasa.utils.io  # get_config below calls rasa.utils.io.read_config_file
from rasa.core.domain import Domain
from rasa.core.interpreter import RegexInterpreter, NaturalLanguageInterpreter
from rasa.core.training.structures import StoryGraph
from rasa.importers.importer import TrainingDataImporter
from rasa.nlu.training_data import TrainingData


class MyImporter(TrainingDataImporter):
    """Example implementation of a custom importer component."""

    def __init__(
        self,
        config_file: Optional[Text] = None,
        domain_path: Optional[Text] = None,
        training_data_paths: Optional[Union[List[Text], Text]] = None,
        **kwargs: Dict
    ):
        """Constructor of your custom file importer.

        Args:
            config_file: Path to configuration file from command line arguments.
            domain_path: Path to domain file from command line arguments.
            training_data_paths: Path to training files from command line arguments.
            **kwargs: Extra parameters passed through configuration in configuration file.
        """

        pass

    async def get_domain(self) -> Domain:
        path_to_domain_file = self._custom_get_domain_file()
        return Domain.load(path_to_domain_file)

    def _custom_get_domain_file(self) -> Text:
        pass

    async def get_stories(
        self,
        interpreter: "NaturalLanguageInterpreter" = RegexInterpreter(),
        template_variables: Optional[Dict] = None,
        use_e2e: bool = False,
        exclusion_percentage: Optional[int] = None,
    ) -> StoryGraph:
        from rasa.core.training.dsl import StoryFileReader

        path_to_stories = self._custom_get_story_file()
        return await StoryFileReader.read_from_file(path_to_stories, await self.get_domain())

    def _custom_get_story_file(self) -> Text:
        pass

    async def get_config(self) -> Dict:
        path_to_config = self._custom_get_config_file()
        return rasa.utils.io.read_config_file(path_to_config)

    def _custom_get_config_file(self) -> Text:
        pass

    async def get_nlu_data(self, language: Optional[Text] = "en") -> TrainingData:
        from rasa.nlu.training_data import loading

        path_to_nlu_file = self._custom_get_nlu_file()
        return loading.load_data(path_to_nlu_file)

    def _custom_get_nlu_file(self) -> Text:
        pass
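The importer methods above are coroutines, so a caller has to await them. The following sketch shows that calling pattern with a Rasa-free stand-in. InMemoryImporter, load_all, and the language-keyed example layout are hypothetical and only mirror the method names of the interface; a real importer would return Rasa objects such as TrainingData.

```python
import asyncio
from typing import Dict, List, Optional


class InMemoryImporter:
    """Hypothetical stand-in mirroring two importer methods without Rasa."""

    def __init__(self, examples: Dict[str, List[str]]) -> None:
        # Maps a language code to example utterances (illustrative layout).
        self._examples = examples

    async def get_config(self) -> Dict:
        return {"language": "en"}

    async def get_nlu_data(self, language: Optional[str] = "en") -> List[str]:
        # Return only the examples stored for the requested language.
        return self._examples.get(language, [])


async def load_all(importer: InMemoryImporter) -> Dict:
    # Each importer method is awaited before its result is used.
    config = await importer.get_config()
    nlu = await importer.get_nlu_data(config["language"])
    return {"config": config, "nlu": nlu}


importer = InMemoryImporter({"en": ["hello"], "de": ["hallo"]})
loaded = asyncio.run(load_all(importer))
```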

TrainingDataImporter

class rasa.importers.importer.TrainingDataImporter

Common interface for different mechanisms to load training data.

get_domain()

Retrieves the domain of the bot.

Returns: Loaded Domain.
Return type: Domain

get_config()

Retrieves the configuration that should be used for the training.

Returns: The configuration as a dictionary.
Return type: Dict

get_nlu_data(language='en')

Retrieves the NLU training data that should be used for training.

Parameters: language – Can be used to only load training data for a certain language.
Returns: Loaded NLU TrainingData.
Return type: TrainingData

get_stories(interpreter=RegexInterpreter(), template_variables=None, use_e2e=False, exclusion_percentage=None)

Retrieves the stories that should be used for training.

Parameters:
  • interpreter – Interpreter that should be used to parse end to end learning annotations.
  • template_variables – Values of templates that should be replaced while reading the story files.
  • use_e2e – Specifies whether to parse end to end learning annotations.
  • exclusion_percentage – Amount of training data that should be excluded.
Returns: StoryGraph containing all loaded stories.
Return type: StoryGraph