
PII Management Overview

New in 3.13.0

Rasa Pro provides a comprehensive solution for managing personally identifiable information (PII) collected by your assistant. The PII management capability allows you to:

  • Identify and anonymize PII in slot events, user messages, and bot messages.
  • Configure PII anonymization for specific slots.
  • Manage PII data retention policies in the tracker store.
  • Stream anonymized events to event brokers. Supported event brokers include Kafka and RabbitMQ.

PII Identification

Rasa Pro adopts a tiered approach to PII identification, which includes:

  1. Slot-based PII identification: The sensitive data is stored in a slot whose name is defined in the privacy YAML config. This enables Rasa Pro to use the existing CALM slot filling mechanisms to identify PII in user messages, bot responses and slot events. To learn more about how to configure PII slots, see the Anonymization Rules section below.

  2. GLiNER PII identification: An optional integration with the GLiNER PII model identifies PII in two edge cases:

    • The end user includes PII in a message that is not captured by any domain slot.
    • Free-text slots, such as a problem description or customer notes, may contain multiple types of PII.

    To learn more about how to configure GLiNER PII identification, see the GLiNER requirements section below.

PII Anonymization

Rasa Pro supports two types of PII anonymization:

  • redaction: replaces PII plaintext value with a redaction character (default is *) across the full length of the PII value. This anonymization method can be configured to keep N characters of the PII value to the left or right of the redaction character. For example, for a credit card number 1234-5678-9012-3456 and a redaction character * with left=0 and right=4, the anonymized value would be ****-****-****-3456.
  • masking: replaces PII plaintext value with the uppercase slot or entity name (e.g. [EMAIL_ADDRESS]).
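
The two methods above can be illustrated with a minimal Python sketch. This is not Rasa Pro's implementation: the function names `redact` and `mask` are hypothetical, and the choice to leave separators such as hyphens untouched is an assumption inferred from the credit card example above.

```python
def redact(value: str, char: str = "*", left: int = 0, right: int = 0) -> str:
    """Replace letters and digits with `char`, keeping `left` characters at the
    start and `right` characters at the end of the value visible.

    Assumption: non-alphanumeric separators (e.g. "-") are left as-is, which is
    what the credit card example in the text suggests.
    """
    n = len(value)
    out = []
    for i, ch in enumerate(value):
        keep = i < left or i >= n - right
        out.append(ch if keep or not ch.isalnum() else char)
    return "".join(out)


def mask(name: str) -> str:
    """Return the uppercase slot or entity name wrapped in square brackets."""
    return f"[{name.upper()}]"


# Example usage with the values from the text.
card = redact("1234-5678-9012-3456", char="*", left=0, right=4)
message = "You can reach me at jane@example.com".replace(
    "jane@example.com", mask("email_address")
)
```

Here `card` reproduces the redacted credit card number from the example above, and `message` ends with the `[EMAIL_ADDRESS]` placeholder.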

To learn more about how to configure PII anonymization, see the Anonymization Rules section below.

Performance

The PII management capability helps you comply with data protection regulations and handle sensitive user data appropriately without compromising the functionality of your assistant. It is built so that PII processing does not affect the performance of your assistant.

PII Management Jobs

The PII management jobs run in the background using a job scheduler. These jobs include publishing anonymized events to the supported event brokers, and anonymizing and deleting data in the tracker store. Running them in the background allows your assistant to continue processing user requests without significant delays.

If you have configured the anonymization and deletion jobs with overlapping cron triggers, Rasa Pro runs the two jobs sequentially to avoid a potential race condition when updating the tracker store. The anonymization job runs first. The deletion job follows, provided that it is due to run once the anonymization job has completed.
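
The ordering rule can be sketched as follows. This is an illustrative reading of the behavior described above, not Rasa Pro's scheduler code: the function `run_overlapping_jobs` and all of its parameters are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, List


def run_overlapping_jobs(
    anonymize: Callable[[], None],
    delete: Callable[[], None],
    deletion_due_at: datetime,
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> List[str]:
    """Run anonymization first, then deletion if deletion is due by then.

    Both jobs run in the same thread of control, so they never write to the
    tracker store at the same time.
    """
    ran: List[str] = []
    anonymize()
    ran.append("anonymization")
    # The clock is checked only after anonymization has finished.
    if now() >= deletion_due_at:
        delete()
        ran.append("deletion")
    return ran


# Example: both jobs trigger at 03:00 UTC and anonymization takes 5 minutes.
trigger = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
order = run_overlapping_jobs(
    anonymize=lambda: None,
    delete=lambda: None,
    deletion_due_at=trigger,
    now=lambda: trigger + timedelta(minutes=5),
)
```

Running the jobs one after the other trades a slightly later deletion for the guarantee that the two jobs never update the same tracker concurrently.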

Recommendation

We recommend that you schedule the PII management jobs for a low-traffic period. This minimizes the number of reads and writes to the tracker store while users are active and reduces the load on the tracker store.

Deletion Cron Job

The deletion cron job is responsible for deleting PII data from the tracker store, and it runs periodically based on the configured cron schedule.

The deletion job removes PII data that is no longer needed, based on the configured retention period. It loops through all conversation sessions in the tracker store, including trackers that contain multiple sessions. Each session begins with an action_session_start event.

To check whether a session has ended, the job computes the difference between the timestamp of the current deletion job run and the timestamp of the last event in the session. The session is eligible for deletion if this difference is greater than the sum of two values: the USER_CHAT_INACTIVITY_IN_MINUTES environment variable and the retention period configured via the min_after_session_end parameter in the privacy YAML config.
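
The eligibility check can be written as a small Python function. This is a sketch under stated assumptions rather than Rasa Pro's code: the function name is hypothetical, and min_after_session_end is assumed to be expressed in minutes, like the environment variable.

```python
from datetime import datetime, timedelta, timezone


def is_session_eligible_for_deletion(
    last_event_at: datetime,
    job_run_at: datetime,
    user_chat_inactivity_in_minutes: int,
    min_after_session_end: int,
) -> bool:
    """Return True if the session has ended and its retention period has passed.

    Assumption: both thresholds are given in minutes.
    """
    threshold = timedelta(
        minutes=user_chat_inactivity_in_minutes + min_after_session_end
    )
    # Strictly greater than, as described in the text.
    return job_run_at - last_event_at > threshold


# Example: the job runs at 12:00 UTC with a 30-minute inactivity window
# and a 60-minute retention period (illustrative values).
job_run = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
old_session = is_session_eligible_for_deletion(
    job_run - timedelta(hours=2), job_run, 30, 60
)
recent_session = is_session_eligible_for_deletion(
    job_run - timedelta(hours=1), job_run, 30, 60
)
```

With these values the threshold is 90 minutes, so a session whose last event is two hours old is eligible, while one whose last event is one hour old is not.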

When the job encounters sessions that are not eligible for deletion, either because they are still active or because they have not reached the end of the retention period, it retains the events of those sessions. It does so by overwriting the pre-existing tracker in the tracker store with only the retained events.
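
The retention step can be sketched in Python as follows. The event dictionaries are a simplified stand-in for serialized tracker events, and the helper names `split_into_sessions` and `retained_events` are hypothetical. The threshold is passed in seconds and stands for the sum of the inactivity window and the retention period.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def split_into_sessions(events: List[Event]) -> List[List[Event]]:
    """Group events into sessions; each action_session_start opens a new one."""
    sessions: List[List[Event]] = []
    for event in events:
        if event.get("name") == "action_session_start" or not sessions:
            sessions.append([])
        sessions[-1].append(event)
    return sessions


def retained_events(
    events: List[Event], job_run_ts: float, threshold_seconds: float
) -> List[Event]:
    """Return only the events of sessions that are not eligible for deletion."""
    kept: List[Event] = []
    for session in split_into_sessions(events):
        last_ts = session[-1]["timestamp"]
        if job_run_ts - last_ts > threshold_seconds:
            continue  # session ended and retention period passed: drop it
        kept.extend(session)
    return kept


# Example tracker with one old session and one recent session.
tracker_events = [
    {"event": "action", "name": "action_session_start", "timestamp": 0.0},
    {"event": "user", "text": "hi", "timestamp": 10.0},
    {"event": "action", "name": "action_session_start", "timestamp": 1000.0},
    {"event": "user", "text": "hello again", "timestamp": 1010.0},
]
kept = retained_events(tracker_events, job_run_ts=1100.0, threshold_seconds=600.0)
```

In this example only the second session survives, and `kept` is what would be written back in place of the original tracker.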