Automatic Conversation Deletion

Overview

Automatic Conversation Deletion allows for the periodic removal of old conversations and their associated data from the system. This behavior is designed to help manage data retention and comply with data protection regulations.

Configuration

Automatic Deletion is controlled by two environment variables:

  1. DELETE_CONVERSATIONS_OLDER_THAN_HOURS: Specifies the age threshold for conversations to be deleted, in hours.
  2. DELETE_CONVERSATIONS_CRON_EXPRESSION: Defines when the deletion process should run using a cron expression.

Example Helm chart configuration:

DELETE_CONVERSATIONS_OLDER_THAN_HOURS=720 # 30 days
DELETE_CONVERSATIONS_CRON_EXPRESSION="0 * * * *" # Run hourly
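
The hour value is the retention period in days multiplied by 24. The following is a minimal Python sketch of that arithmetic; the helper name retention_hours is illustrative and not part of the product.

```python
def retention_hours(days: int) -> int:
    """Convert a retention period in days to the hour value the variable expects."""
    if days <= 0:
        raise ValueError("retention period must be a positive number of days")
    return days * 24


# 30 days corresponds to the 720 used in the example above.
print(f"DELETE_CONVERSATIONS_OLDER_THAN_HOURS={retention_hours(30)}")
```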

Deletion Process

Default Behavior

Deletion is turned off by default: DELETE_CONVERSATIONS_OLDER_THAN_HOURS has no default value, and the deletion job does not run until it is set. Once it is set, the job runs hourly by default; the schedule is controlled by DELETE_CONVERSATIONS_CRON_EXPRESSION.

Scope

The system deletes entire conversations, including all associated messages and events, when the first message or event in the conversation is older than the specified time threshold. The age of that first item decides: once it crosses the threshold, the whole conversation is removed, including any newer messages it contains.
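
The rule can be illustrated with a short Python sketch. It is not the product's implementation: the function name is_eligible is hypothetical, and treating a conversation exactly at the cutoff as not yet eligible is an assumption.

```python
from datetime import datetime, timedelta, timezone


def is_eligible(first_event_at: datetime, older_than_hours: int, now: datetime) -> bool:
    """Return True when the conversation's first message or event predates the cutoff."""
    cutoff = now - timedelta(hours=older_than_hours)
    # Assumption: strictly older than the cutoff; the boundary case is not documented.
    return first_event_at < cutoff


now = datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc)
# First message 31 days ago: eligible at a 720-hour threshold, whatever came later.
print(is_eligible(now - timedelta(days=31), 720, now))  # True
# First message 29 days ago: not yet eligible.
print(is_eligible(now - timedelta(days=29), 720, now))  # False
```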

Batch Deletion

Deletion is performed in batches of 100 to manage system load and stay within database transaction limits. If a deletion process is interrupted (e.g., by a server restart), it continues from where it left off during the next scheduled run.
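
One way to get this resume-after-interruption behavior is to re-query what is still eligible before every batch, so an interrupted run leaves nothing to reconcile. The Python sketch below illustrates that pattern under stated assumptions; it is not the product's code, and fetch_eligible, delete_batch, and the in-memory store are hypothetical stand-ins for the database.

```python
from typing import Callable, List

BATCH_SIZE = 100  # batch size stated above


def delete_in_batches(
    fetch_eligible: Callable[[int], List[str]],
    delete_batch: Callable[[List[str]], None],
    batch_size: int = BATCH_SIZE,
) -> int:
    """Delete eligible conversations batch by batch; return how many were deleted."""
    deleted = 0
    while True:
        batch = fetch_eligible(batch_size)  # at most batch_size IDs still eligible
        if not batch:
            return deleted
        delete_batch(batch)  # assumed to run as one transaction per batch
        deleted += len(batch)


# In-memory stand-in for the database: 250 eligible conversation IDs.
store = [f"conv-{i}" for i in range(250)]


def fetch(limit: int) -> List[str]:
    return store[:limit]


def delete(ids: List[str]) -> None:
    for conversation_id in ids:
        store.remove(conversation_id)


print(delete_in_batches(fetch, delete))  # 250, removed in batches of 100, 100, 50
```

If the loop stops partway, the conversations already deleted are gone and the rest are still eligible, so the next scheduled run picks them up.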

Data Affected

When a conversation is deleted, the following associated data is also removed:

  • Messages
  • Conversation Events
  • Intents
  • Predicted Flows
  • Channel information
  • Machine Learning Model data
  • Utterance Entities
  • Utterance Intents

Performance Considerations

The deletion process is designed to run periodically and in batches to minimize impact on system performance. However, during times of high message ingestion, the deletion process may have a noticeable impact on system resources.

Users should consider setting the cron schedule to run during off-peak hours.
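
As an illustration, a daily run at a quiet hour uses a five-field cron expression of the form "minute hour * * *". The Python helper below builds one; the name daily_cron and the 02:00 run time are examples only, and which time zone the schedule is evaluated in is not covered here and should be checked for your deployment.

```python
def daily_cron(hour: int, minute: int = 0) -> str:
    """Build a five-field cron expression that fires once a day at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    return f"{minute} {hour} * * *"


# 02:00 is an example off-peak time; pick one that fits your traffic.
print(f'DELETE_CONVERSATIONS_CRON_EXPRESSION="{daily_cron(2)}"')
```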

Limitations and Known Issues

The deletion will only run while Studio is running. In rare cases, the deletion process may encounter transaction timeouts when dealing with a large number of conversations.

There is a potential for write conflicts or deadlocks during deletion operations, which may require the process to retry.
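
A common way to handle transient failures of this kind is to retry with exponential backoff. The Python sketch below shows the general pattern only; it does not describe how Studio retries internally, and TransientDatabaseError, with_retries, and flaky_delete are hypothetical names.

```python
import time


class TransientDatabaseError(Exception):
    """Hypothetical stand-in for a write conflict, deadlock, or timeout error."""


def with_retries(operation, attempts: int = 3, base_delay: float = 0.5, sleep=time.sleep):
    """Run operation, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientDatabaseError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)  # wait 0.5s, 1s, 2s, ...


calls = {"n": 0}


def flaky_delete() -> str:
    """Fail twice with a simulated deadlock, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientDatabaseError("deadlock detected")
    return "deleted"


# The sleep is replaced with a no-op so the example finishes immediately.
print(with_retries(flaky_delete, sleep=lambda seconds: None))  # deleted
```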

Best Practices

Set the deletion threshold (DELETE_CONVERSATIONS_OLDER_THAN_HOURS) to a value that balances your data retention needs with system performance. Monitor system performance during and after deletion runs to ensure it's not negatively impacting your application.

Regularly review and adjust the cron schedule as needed based on your system's usage patterns.

Compliance Note

While this feature can assist with data retention policies, users are responsible for ensuring their specific use of the system complies with relevant data protection regulations (e.g., GDPR).

Conversations via API

For developers looking to programmatically tag and delete conversations, please refer to our Conversations API Documentation.