The log retention period in any SIEM can have a big impact on your cost as well as on your investigation and threat hunting capabilities. A short retention period is cheaper, but it also limits your ability to find patterns in your network, to do proper incident response, and to hunt through older data based on newly discovered techniques. Not all logs keep the same value in the long run. While some events are worthless after a few days, others stay invaluable, and there are some you have to keep for compliance purposes. Azure Sentinel offers the capability to define different retention periods for different data tables.
As I said, different logs have different values, especially over a longer period. For example, antivirus logs are really useful right after they are created. They tell you when a machine is infected and whether the infection removal was successful. You can create logic in your SIEM to tell you when the same infection reappears on a machine within a short period, or when the same thing can be seen on multiple machines at once.
But the same logs are rarely used in investigations or threat hunts, so they are frequently rolled out (removed from the SIEM) after a short period of time, such as 30 days.
In other situations, you just don’t have teams which could utilize the logs. Keeping the logs without any means to use them is a waste of storage space and therefore a waste of money. And if your analysts have to query through useless logs then it is also a waste of time.
Also, a lot of logs have to be kept for various purposes, but you don’t want to query them at all. Or at least you don’t want to store them in your expensive hot storage. In this case, you can send them to cold storage.
There can be multiple reasons why you don’t want to keep your logs in an expensive SIEM. Sentinel gives you the option to define different retention periods for different tables. This way, you can roll out logs from your SIEM in a more effective way. If something is not useful after a short period, just define a lower retention period. Important, valuable logs on the other hand can be kept in the SIEM for a longer time to be used by your queries and teams.
Unfortunately, this option can’t be found in the Sentinel GUI; you must use some code to define per-table retention. Be aware that Sentinel gives you 90 days of free storage (you do not have to pay the data retention price) for all your data. So, cost-wise, it is worth keeping all the ingested data for 90 days.
In this post, I’m showing a quick and easy way to define per-table retention periods with ARM templates.
Defender for Endpoint logs in Sentinel
Microsoft provides an ARM template to change retention for individual tables but only for one at a time. I created my own ARM template because it was a constant request from companies to be able to change the retention of multiple data tables at once. One talkative log source clients frequently want to keep for a longer period is Microsoft Defender for Endpoint (MDE). Defender for Endpoint is an EDR solution that collects telemetry data from the endpoints and can easily forward this data to Sentinel.
A lot of people don’t know that you can keep Defender for Endpoint logs in the EDR itself for 180 days, and you can keep them there for free. Pushing this data to Sentinel, on the other hand, incurs an ingestion cost, and keeping it there for 180 days adds 3 months of retention cost on top of the free 90 days.
It is still typical that people want these logs in Sentinel. Having everything in one solution to provide a single pane of glass can be convenient for the analysts. Handling rules in one place and creating complex correlation rules between log sources is also easier if you have all the logs in one place. However, in these scenarios, you don’t need the MDE logs in Sentinel for 180 days. It can be a good idea to forward the logs to the SIEM, keep them there for the free 90 days, and then roll them out. At the same time, you can keep all of the logs in MDE for 180 days, so if you need them, they are still going to be available.
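To make the cost reasoning concrete, here is a minimal Python sketch of how many retention days are billable for a table. It assumes only what is stated above, namely that the first 90 days are free. The function name and the `free_days` parameter are illustrative and not part of any Azure API, and no prices are modeled.

```python
def billable_retention_days(retention_days: int, free_days: int = 90) -> int:
    """Days of retention beyond the free period (assumed to be 90 days)."""
    if retention_days < 0:
        raise ValueError("retention_days must not be negative")
    return max(0, retention_days - free_days)


# Keeping MDE logs in Sentinel for 180 days leaves 90 billable days,
# which is the extra 3 months of retention cost mentioned above.
print(billable_retention_days(180))  # 90
# Rolling the logs out after the free period adds no retention cost.
print(billable_retention_days(90))   # 0
```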
Per-Table data retention configuration with ARM
To get to the point: you can find the code below or on GitLab.
{
    "$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "metadata": {
        "comments": "Template to configure per-table retention period for multiple tables at once.",
        "author": "Sandor Tokesi"
    },
    "parameters": {
        "logAnalyticsWorkspaceName": {
            "type": "string",
            "metadata": {
                "description": "The name of the log analytics workspace."
            }
        },
        "configureDefaultRetention": {
            "type": "bool",
            "defaultValue": false
        },
        "defaultRetention": {
            "type": "int",
            "defaultValue": 90,
            "minValue": 7,
            "maxValue": 730,
            "metadata": {
                "description": "Number of days of retention as a default value for Log Analytics Workspace. Only used if configureDefaultRetention is set to 'true'."
            }
        },
        "perTableRetentionSettings": {
            "type": "array",
            "defaultValue": [],
            "metadata": {
                "description": "Array containing the list of data table - retention day pairs for which you want to change the retention period. Use the default empty array value if you want the data table to use the default Log Analytics Workspace retention period."
            }
        }
    },
    "resources": [
        {
            "name": "[parameters('logAnalyticsWorkspaceName')]",
            "type": "Microsoft.OperationalInsights/workspaces",
            "condition": "[parameters('configureDefaultRetention')]",
            "apiVersion": "2020-08-01",
            "location": "[resourceGroup().location]",
            "properties": {
                "retentionInDays": "[parameters('defaultRetention')]"
            }
        },
        {
            "name": "[concat(parameters('logAnalyticsWorkspaceName'), '/', parameters('perTableRetentionSettings')[copyIndex()][0])]",
            "type": "Microsoft.OperationalInsights/workspaces/tables",
            "apiVersion": "2017-04-26-preview",
            "dependsOn": ["[resourceId('Microsoft.OperationalInsights/workspaces', parameters('logAnalyticsWorkspaceName'))]"],
            "properties": {
                "retentionInDays": "[parameters('perTableRetentionSettings')[copyIndex()][1]]"
            },
            "copy": {
                "name": "retentionConfigurationCopy",
                "count": "[length(parameters('perTableRetentionSettings'))]"
            }
        }
    ],
    "outputs": {}
}
You can load the template with the “Build your own template in the editor” button in the Azure Custom deployment site.
There are 7 parameters you can change during the deployment. 3 of them are default ones:
- Subscription: Define the subscription your Sentinel is in.
- Resource Group: Define the resource group or create a new one.
- Region: Going to be filled automatically if you choose an existing resource group, or you can pick the location where you want to create your new resource group if you choose to do so.
And 4 custom parameters:
- logAnalyticsWorkspaceName: The name of the Log Analytics Workspace (LAW) for which you want to change the retention.
- configureDefaultRetention: ‘true’ if you want to change the default retention, ‘false’ if you want to leave the default Log Analytics Workspace retention intact.
- defaultRetention: The new default retention period of the Log Analytics Workspace in days. It is only used if “configureDefaultRetention” is set to ‘true’.
- perTableRetentionSettings: A 2D array that contains the list of tables for which you want to change the retention period, together with the desired retention period for each of them. If you only want to configure the default LAW retention period, then leave this field untouched (an empty array).
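If you prefer not to fill in the portal form by hand, the four custom parameters can also be written to an ARM deployment parameters file. The following Python snippet is a minimal sketch of generating such a file. The parameter names come from the template above, while the function name and the workspace name `my-sentinel-workspace` are placeholders I made up for illustration.

```python
import json


def build_parameters_file(workspace_name, per_table_settings,
                          configure_default=False, default_retention=90):
    """Return an ARM deployment parameters document as a Python dict."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            # Names must match the parameters declared in the template.
            "logAnalyticsWorkspaceName": {"value": workspace_name},
            "configureDefaultRetention": {"value": configure_default},
            "defaultRetention": {"value": default_retention},
            "perTableRetentionSettings": {"value": per_table_settings},
        },
    }


# Hypothetical workspace name; replace it with your own.
params = build_parameters_file("my-sentinel-workspace", [["DeviceInfo", 120]])
print(json.dumps(params, indent=2))
```

The printed JSON can be saved to a file and supplied together with the template when you deploy from the command line instead of the portal.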
For example, if you want to change the retention of all the MDE tables to 120 days, except for DeviceProcessEvents, which you want to keep for 180 days, you can use the following array as the value of the perTableRetentionSettings field:
[
["DeviceInfo",120],
["DeviceNetworkInfo",120],
["DeviceProcessEvents",180],
["DeviceNetworkEvents",120],
["DeviceFileEvents",120],
["DeviceRegistryEvents",120],
["DeviceLogonEvents",120],
["DeviceImageLoadEvents",120],
["DeviceEvents",120],
["DeviceFileCertificateInfo",120]
]
So, the pattern for the perTableRetentionSettings array is:
[
["TableName1", RetentionPeriod1],
["TableName2", RetentionPeriod2]
]
There are two ways to define the RetentionPeriod value:
- Use the null value to tell the table to follow the default Log Analytics Workspace retention period. Example: [["DeviceInfo",null]]. After this, if you change the default retention period, the retention period of this data table also changes to that value.
- Define a number between 7 and 730. In this case, the data table uses that value as its retention period in days.
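Since a typo in this array only surfaces at deployment time, it can help to check it beforehand. Below is a minimal Python sketch that validates the pairs against the two rules above (null, or a number between 7 and 730) and prints the JSON to paste into the field. The `validate_settings` helper is my own illustration, not part of the template or of any Azure tooling.

```python
import json


def validate_settings(settings):
    """Check a perTableRetentionSettings array; return it unchanged if valid."""
    for entry in settings:
        if not isinstance(entry, (list, tuple)) or len(entry) != 2:
            raise ValueError(f"expected a [table, retention] pair, got {entry!r}")
        table, retention = entry
        if not isinstance(table, str) or not table:
            raise ValueError(f"table name must be a non-empty string: {table!r}")
        if retention is None:
            continue  # None becomes JSON null: follow the workspace default
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(retention, bool) or not isinstance(retention, int):
            raise ValueError(f"{table}: retention must be an integer or null")
        if not 7 <= retention <= 730:
            raise ValueError(f"{table}: retention must be between 7 and 730 days")
    return settings


settings = [["DeviceInfo", None], ["DeviceProcessEvents", 180]]
print(json.dumps(validate_settings(settings)))
# [["DeviceInfo", null], ["DeviceProcessEvents", 180]]
```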
Finding the data tables
One easy way to find the relevant data tables is to query them with KQL. However, you can’t query table metadata in KQL, so you will only find the tables that contain data. There are multiple ways to find your tables, and sometimes it is not even needed because you already know what you want to change. So, I’m not going to go into details.
But still, here is a simple way to query this information with the ARMClient.exe tool from PowerShell:
ARMClient.exe get "/subscriptions/<subscription_id>/resourceGroups/<resource_group_name>/providers/Microsoft.OperationalInsights/workspaces/<workspace_name>/Tables?api-version=2017-04-26-preview" | ConvertFrom-Json | Select -expand value | Select name -expand properties | Select name, retentionInDays
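If you save the raw JSON output of the call instead of piping it through PowerShell, the same name and retention pairs can be extracted with a few lines of Python. This is a sketch under an assumption: the response shape (a `value` list whose items have a `name` and a `properties.retentionInDays`) is inferred from the pipeline above, and the sample data is made up.

```python
import json


def table_retentions(response_text):
    """Map table name -> retentionInDays from a Tables listing response."""
    payload = json.loads(response_text)
    return {
        table["name"]: table.get("properties", {}).get("retentionInDays")
        for table in payload.get("value", [])
    }


# Hypothetical sample, shaped like the data the pipeline above consumes.
sample = """
{"value": [
  {"name": "DeviceInfo", "properties": {"retentionInDays": 120}},
  {"name": "DeviceProcessEvents", "properties": {"retentionInDays": 180}}
]}
"""
for name, days in sorted(table_retentions(sample).items()):
    print(name, days)
```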
Interesting behavior
There are some tables that you can’t set to a value lower than 90 days. You can increase their retention period to a number higher than 90 days, but if you define a lower one, then they will default to 90 days. They also won’t follow the default log analytics workspace retention period if it is lower than 90 days. In this case, they are going to opt for 90 days. These are the systems I have found:
On the Azure portal GUI, you can only choose the LAW retention period from a fixed list. Even though there is a textbox in which you can type any number, the value changes to the closest fixed value as soon as you leave the field. The GUI accepts the following retention periods (in days): 30, 31, 60, 90, 120, 180, 270, 365, 550, 730, plus your current retention period. If you type 150 in the textbox, it changes to 120; if you type 151, it changes to 180. There is no in-between. With my template, you can define any number between 7 and 730.
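The snapping behavior can be modeled in a few lines of Python. This is only a sketch of what I observed, not the portal's actual logic: the tie-breaking rule (the lower value wins) is inferred from the 150 example, and the "current retention period" entry of the list is not modeled.

```python
GUI_RETENTION_VALUES = (30, 31, 60, 90, 120, 180, 270, 365, 550, 730)


def snap_to_gui_value(days, allowed=GUI_RETENTION_VALUES):
    """Return the closest allowed value; on a tie, the lower value wins."""
    # Compare by distance first, then by the value itself to break ties downward.
    return min(allowed, key=lambda value: (abs(value - days), value))


print(snap_to_gui_value(150))  # 120 (equally far from 120 and 180)
print(snap_to_gui_value(151))  # 180
```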
When both are configured, the template always applies the default LAW retention configuration first and the custom per-table retention configuration after that, because the per-table resources depend on the workspace resource. Be aware of this ordering.
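Putting the observations from this post together, the retention a table ends up with can be sketched as a small Python function. This models only the behavior described here (null follows the workspace default, and some tables never go below 90 days). The function and its `has_90_day_floor` flag are hypothetical names, and you have to know yourself which tables have the floor.

```python
def effective_retention(workspace_default, table_setting, has_90_day_floor=False):
    """Model the retention (in days) a table ends up with."""
    # None (JSON null) means the table follows the workspace default.
    days = workspace_default if table_setting is None else table_setting
    # Some tables were observed to never go below 90 days.
    if has_90_day_floor:
        days = max(days, 90)
    return days


print(effective_retention(30, None))                         # 30
print(effective_retention(30, 180))                          # 180
print(effective_retention(30, None, has_90_day_floor=True))  # 90
print(effective_retention(30, 60, has_90_day_floor=True))    # 90
```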