Understand your Microsoft Sentinel SIEM and data lake costs, how they behave, and how they affect your overall spending.
This page aims to consolidate the various cost components and implications of enabling the data lake feature. It includes answers to frequently asked questions and provides a simple cost calculator - useful while Microsoft continues to enhance their official calculator with more cost factors.
For an introduction to the core concepts of the data lake, how it operates, and its use cases, I recommend consulting Microsoft's official documentation. This resource focuses specifically on cost-related aspects. All information here has been validated with Microsoft (although on some topics different MS experts provided differing opinions).
Once data lake is enabled, two table types coexist with distinct roles: the familiar analytics tier and the new data lake tier. Both tiers support interactive queries, but with different performance and cost profiles.
Analytics Tier
Remains the primary repository for security data and rule execution.
Use for: Real-time detections, frequent queries, high-value security data. This tier is designed for high-performance analytics and real-time data processing.
Data Lake Tier
Serves as a long-term storage solution for less frequently used, verbose, or lower security-value data.
Use for: Cost-effective retention of large volumes of security data for up to 12 years.
With data lake enabled, data ingested into the analytics tier (for supported tables) is mirrored to the data lake tier at no extra ingestion or processing charge. As long as the data in data lake remains within the analytics retention window, storage in the data lake tier is free. Note that only newly ingested data is mirrored; existing legacy data will not be moved to data lake automatically.
With Sentinel data lake enabled, the traditional long-term retention mechanism is migrating toward a data lake-based model for supported tables. Data is mirrored from the analytics tables to data lake and stored there for at least the duration of the analytics retention period at no charge.
If your Total Retention period exceeds the analytics retention, the additional retained data beyond the analytics window will incur data lake storage costs while remaining accessible via data lake queries.
If you ingest data into an analytics table, configure the analytics tier retention to 120 days, and set the Total Retention to 180 days, the following applies: for the first 120 days the data is queryable in the analytics tier and its mirrored copy in the data lake incurs no storage charge; for the remaining 60 days (days 121-180) the data is retained only in the data lake tier, incurs data lake storage costs, and remains accessible via data lake queries.
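To make the windows explicit, here is a minimal sketch of that example; it only counts days, no prices are involved:

```python
# Sketch of the retention example above: for each ingested record, how many
# days of its data lake copy are free vs. billed.
analytics_retention_days = 120   # analytics (interactive) retention
total_retention_days = 180       # Total Retention setting

# The mirrored copy is free while the data is still within analytics retention.
free_lake_days = analytics_retention_days

# Beyond analytics retention the data lives only in the data lake, its storage
# is billed, and it remains accessible via data lake queries.
billed_lake_days = total_retention_days - analytics_retention_days

print(f"Free data lake storage: first {free_lake_days} days")
print(f"Billed data lake storage: remaining {billed_lake_days} days")
```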
💰 Cost Savings: For tables that do not support data lake, legacy archiving remains active. This reinforces the value of migrating to supported DCR-based tables. Data lake provides not only direct query access to long-term retained data but also cost savings through Microsoft's 6:1 compression discount on data lake storage. This discount reduces storage costs by a fixed factor of six.
With data lake enabled, the cost components to consider are:
- Ingesting log data into an analytics tier table
- Keeping the data in the analytics tier (beyond the free 90 days)
- Ingesting data into a data lake tier table
- Storing (compressed) data in the data lake
- Running queries against data in the data lake
- Processing the ingested data for the data lake
- Compute costs for running advanced analytics on data lake data
The cost calculator on this page offers two views: a Monthly Cost Projection that tracks cost evolution over time as data accumulates, and a Cost Summary that exports monthly data (GB) and cost (USD) projections for up to 12 years.
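As a rough illustration of how such a projection can be assembled, the sketch below sums the cost components listed above for one month. The ingestion, processing, and query prices are the illustrative North Europe figures quoted later on this page; the storage price and the example volumes are placeholder assumptions, and the function itself is a hypothetical helper, not Microsoft's calculator.

```python
# Very rough monthly projection sketch (not Microsoft's official calculator).
ANALYTICS_INGEST_PER_GB = 5.16     # analytics tier ingestion (PAYG, North Europe example)
LAKE_INGEST_PER_GB = 0.06          # data lake ingestion
LAKE_PROCESS_PER_GB = 0.12         # data processing
LAKE_QUERY_PER_GB = 0.006          # data lake queries (per uncompressed GB scanned)
LAKE_STORAGE_PER_GB_MONTH = 0.03   # placeholder -- check the official price list

def monthly_cost(analytics_gb, lake_gb, queried_gb, retained_raw_gb):
    """Sum the cost components listed above for one month.
    (Analytics retention beyond the free 90 days is omitted for brevity.)"""
    ingestion = analytics_gb * ANALYTICS_INGEST_PER_GB
    lake_ingestion = lake_gb * (LAKE_INGEST_PER_GB + LAKE_PROCESS_PER_GB)
    queries = queried_gb * LAKE_QUERY_PER_GB
    storage = (retained_raw_gb / 6) * LAKE_STORAGE_PER_GB_MONTH   # 6:1 compression assumption
    return ingestion + lake_ingestion + queries + storage

print(f"${monthly_cost(analytics_gb=300, lake_gb=1000, queried_gb=2000, retained_raw_gb=5000):,.2f}")
```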
Currently, historical data that is archived for long-term storage is not moved or mirrored from Sentinel to the data lake. Your existing archived data will remain in Sentinel and will not be accessible in the data lake. However, because the billing meter changes to the new data lake-based approach, you will still benefit from the 6:1 storage compression discount. Consequently, the cost of your long-term retained data will decrease to one-sixth of its previous price.
According to Microsoft's latest guidance, the data processing fee applies to uncompressed data that enters the ingestion pipelines. This fee does not apply to data mirrored from your analytics tier; mirrored data incurs no ingestion or processing charges.
If you use a Data Collection Rule (DCR) to filter data, you can reduce ingestion costs, but data processing costs remain unaffected.
Example:
If you send 100 GB of data to your DCR and drop 90% of it, you will pay ingestion costs on only 10% of the data, but the processing fee will still apply to the full 100%.
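A minimal sketch of that arithmetic, using the illustrative data lake prices quoted later on this page:

```python
# Sketch of the DCR filtering example above.
LAKE_INGEST_PER_GB = 0.06    # ingestion, billed on the data actually ingested
LAKE_PROCESS_PER_GB = 0.12   # data processing, billed on all data reaching the pipeline

sent_gb = 100        # data sent to the Data Collection Rule (DCR)
drop_ratio = 0.90    # share filtered out by the DCR transformation

ingested_gb = sent_gb * (1 - drop_ratio)            # 10 GB billed for ingestion
ingestion_cost = ingested_gb * LAKE_INGEST_PER_GB   # $0.60
processing_cost = sent_gb * LAKE_PROCESS_PER_GB     # $12.00 -- the full 100 GB

print(f"Ingestion: ${ingestion_cost:.2f}, processing: ${processing_cost:.2f}")
```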
Microsoft addresses this scenario in their guidance documents. The data lake feature only mirrors new data moving forward. For example, if you enable data lake and after 30 days reduce interactive (analytics) retention from 180 to 90 days, the only queryable data you will have is 90 days in the analytics (interactive) tier and 30 days in the data lake (overlapping with the most recent 30 days of analytics data). This means you lose the ability to run queries against the oldest 90 days of data, which were never mirrored to the data lake. Be sure to carefully plan your retention policy and query requirements before making changes.
This is a Microsoft Sentinel platform requirement to ensure data continuity and consistency. Since analytics tier data can transition to data lake tier after the analytics retention period (long-term retention), the data lake must be configured to store data for at least as long as analytics.
This prevents gaps in your security data and maintains a continuous timeline for compliance and forensic investigations.
Think of it as a data flow: analytics (hot) → data lake (cold/archived), where the cold tier must be able to receive and retain data from the hot tier.
Data that is not kept in the data lake tier beyond its analytics retention period does not generate data lake storage costs.
All newly ingested analytics data is mirrored to the data lake storage at no additional charge. Therefore, during the interactive retention period, data is accessible both in the free-to-query analytics tier and in the pay-to-query data lake storage. However, if you specifically query your data lake - for example, from the KQL queries page or via KQL jobs - you will incur query costs, even if the same data exists in analytics. If you access the data through Sentinel within the analytics retention window, no additional fee is applied. At present, Microsoft does not waive query fees for data lake storage data even if the same data is available for free in analytics; your costs will depend on which tool or interface you use to access the data.
Query charges are incurred when you directly query the data lake table using KQL jobs, KQL queries, or via the API (for example, through the Sentinel MCP server). All standard workloads that access the data lake in this manner will be subject to these charges. The advanced data insights charge applies when you utilize Notebooks. Whether you run Notebooks interactively or as scheduled jobs, charges are based on the CPU hours used by the given execution. When using Notebooks for querying, you will not incur regular query charges. In summary, Notebook-based exploration of the data lake is exempt from query charges but will generate advanced data insights charges for the compute resources consumed.
The most reliable method for minimizing query charges in the Sentinel data lake tier is to leverage the TimeGenerated field: explicitly define time windows for your queries to limit data scanning to the intervals required. Be careful when running queries via the Sentinel MCP server (it tends to query ALL your existing data) and when running a KQL query in the GUI (it is easy to misconfigure, and due to a bug you can easily query more data than expected).
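As a sketch, a data lake KQL query (for example, one submitted as a KQL job) should carry an explicit TimeGenerated window. The snippet below only builds the query text; the table name is a placeholder, and you would submit the query through your usual KQL query or KQL job interface:

```python
from datetime import datetime, timedelta, timezone

# Build a time-bounded KQL query for the data lake.
end = datetime.now(timezone.utc)
start = end - timedelta(days=7)

kql = f"""
Syslog
| where TimeGenerated between (datetime({start:%Y-%m-%dT%H:%M:%SZ}) .. datetime({end:%Y-%m-%dT%H:%M:%SZ}))
| summarize Events = count() by Computer
"""
print(kql)
```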
With Notebooks, billing is based on CPU hours consumed. To minimize costs, focus on developing efficient, fast-executing code. Applying filters on the TimeGenerated field speeds up queries due to partitioning. Additionally, selecting only the necessary columns (field pruning) reduces data processed, improving query performance and lowering compute time.
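A minimal notebook-style sketch of these two optimizations; the sample DataFrame stands in for a data lake table (in a real Sentinel data lake notebook the table would be loaded through the notebook's data provider), and the column names are examples:

```python
from datetime import datetime
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Stand-in for a data lake table; load the real table via the notebook's data provider.
df = spark.createDataFrame(
    [(datetime.now(), "host-a", 4624), (datetime.now(), "host-b", 4625)],
    ["TimeGenerated", "Computer", "EventID"],
)

# Filter on TimeGenerated first so partition pruning limits the data read,
# then keep only the columns the analysis needs (field pruning).
recent = (
    df.filter(F.col("TimeGenerated") >= F.expr("current_timestamp() - INTERVAL 7 DAYS"))
      .select("TimeGenerated", "Computer", "EventID")
)

# Aggregate early to keep the working set -- and the billed CPU hours -- small.
recent.groupBy("Computer").count().show()
```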
To simplify pricing, Microsoft applies a fixed 6x discount on data lake storage charges, effectively assuming that your stored data achieves an average compression ratio of 6:1. This discount applies exclusively to data lake storage fees and does not impact other costs such as query charges, which are calculated based on uncompressed data. In reality, your data's actual compression ratio may be better or worse than 6:1. If the compression is more effective, you may pay slightly more than the true compressed cost; if less effective, you may pay less. Microsoft uses the fixed 6x factor as a standard assumption across all data. Overall, this 6x discount represents a new cost-saving benefit that was not available previously. Note that the Azure Cost Management page displays raw data storage in GB but does not factor in the 6x compression, so reported storage size will appear larger than what the discounted pricing reflects.
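A quick sketch of that reasoning; the storage price used here is a placeholder assumption, not an official figure:

```python
# How the fixed 6:1 billing assumption compares with the compression your data
# actually achieves. The unit price is a placeholder assumption.
PRICE_PER_GB_MONTH = 0.03   # placeholder data lake storage price

raw_gb = 600                # uncompressed size, as Azure Cost Management reports it
billed_gb = raw_gb / 6      # fixed 6:1 assumption -> 100 GB billed

for actual_ratio in (3, 6, 10):   # worse than, equal to, better than the assumption
    true_gb = raw_gb / actual_ratio
    print(f"actual {actual_ratio}:1 -> billed ${billed_gb * PRICE_PER_GB_MONTH:.2f}/month, "
          f"'true' compressed cost ${true_gb * PRICE_PER_GB_MONTH:.2f}/month")
```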
Only advanced data insights charges apply to this activity. Although you are querying data in the data lake, no separate data query charges are incurred. Instead, data query costs are incorporated into the advanced data insights billing.
This comparison focuses solely on cost and does not consider feature differences.
- In North Europe, analytics ingestion costs $5.16 USD per GB (Pay-as-you-Go pricing).
- Ingesting data into the data lake costs $0.06 USD per GB (Ingestion) plus $0.12 USD per GB (data processing), totaling $0.18 USD per GB.
- Querying data lake data costs $0.006 USD per GB.
Therefore, ingesting data into the data lake is $4.98 USD per GB cheaper than analytics ingestion.
From a cost perspective, you can run 4.98 / 0.006 ≈ 830 queries on the same amount of data in the data lake before analytics becomes the more economical option (excluding data lake storage costs).
If your team runs queries that each look back 90 days, then with one such query per day each day's data will be scanned about 90 times over its lifetime. That leaves room for roughly 830 / 90 ≈ 9 such queries per day before the data lake becomes more expensive than ingesting the data into the analytics tier.
If your team only looks back 7 days (a more realistic scenario), you could run roughly 830 / 7 ≈ 120 queries per day. While that may seem high, interactive query execution often involves many trial runs or incorrect queries, generating additional charges. It's important to carefully monitor and manage query patterns in your environment.
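The same break-even arithmetic as a small sketch, using the prices above:

```python
# Break-even sketch for the comparison above (North Europe PAYG example prices).
ANALYTICS_INGEST = 5.16          # $/GB, analytics tier ingestion
LAKE_INGEST_TOTAL = 0.06 + 0.12  # $/GB, data lake ingestion + data processing
LAKE_QUERY = 0.006               # $/GB scanned by data lake queries

savings_per_gb = ANALYTICS_INGEST - LAKE_INGEST_TOTAL   # $4.98
break_even_scans = savings_per_gb / LAKE_QUERY          # ~830 scans of each GB

for lookback_days in (90, 7):
    queries_per_day = break_even_scans / lookback_days
    print(f"{lookback_days}-day lookback: ~{queries_per_day:.0f} queries/day before "
          "analytics ingestion becomes the cheaper option (storage costs excluded)")
```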
Contrary to Microsoft's earlier statements, a recent update confirms that the data processing charge applies to all data reaching the ingestion pipeline, not only to the data that is ultimately ingested.
This means that even if you filter out data using Data Collection Rules (DCR), you will still incur the data processing charge for all data that reaches the pipeline before being filtered.
The documentation has now been updated for auxiliary logs: Microsoft Learn - Data Collection Transformations
Note: This clarification is currently documented for auxiliary logs but not yet specifically updated for Sentinel data lake documentation.