Understand your Microsoft Sentinel SIEM and data lake costs, how they behave, and how they affect your overall spending.
This page aims to consolidate the various cost components and implications of enabling the data lake feature. It includes answers to frequently asked questions and provides a simple cost calculator - useful while Microsoft continues to enhance their official calculator with more cost factors.
For an introduction to the core concepts of the data lake, how it operates, and its use cases, I recommend consulting Microsoft's official documentation. This page covers various aspects and behaviors of Sentinel data lake, but it focuses specifically on costs.
Once Sentinel data lake is enabled, two main table types become available: the familiar analytics tier and the new data lake tier. Both support interactive queries without data restoration, but they offer different performance and cost profiles.
The analytics tier remains the primary repository for security data and rule execution.
Use for: Real-time detections, frequent queries, high-value security data. This tier is designed for high-performance analytics and real-time data processing.
- Speed: Query-optimized data provides faster interactive analysis.
- Costs: Higher ingestion and storage costs, but offers free data usage and 90 days free retention.
The data lake tier serves as a long-term storage solution for less frequently used, verbose, or lower security-value data.
Use for: Cost-effective retention of large volumes of security data for up to 12 years. Use it as a reliable source of truth for your pipelines, or for rarely accessed data you keep for compliance/DFIR reasons.
- Speed: Storage-optimized tier with slower query performance.
- Costs: Lower ingestion and storage costs, but incurs additional fees for data processing and usage.
- Long-term retention: Replaces the traditional archiving (long-term retention) feature.
With data lake enabled, data ingested into the analytics tier (for supported tables) is automatically mirrored to the data lake tier at no extra ingestion or processing charge. As long as the mirrored data in data lake remains within the analytics retention period window, storage in the data lake tier is free.
Note that only newly ingested data is mirrored; existing legacy data will not be moved to data lake automatically. Mirroring happens automatically once you enable the data lake; no additional manual configuration is needed.
Mirrored data exists simultaneously in both an analytics table and a data lake table under the same name. Which data you access depends on your method: for example the Advanced Hunting page queries free analytics data, while the Data lake exploration tab queries billable data lake events.
Mirroring data to the lake is free - there are no charges for ingestion, data processing, or storage while the data remains in the analytics tier. However, querying the mirrored logs in the data lake tier still incurs additional costs.
With Sentinel data lake enabled, the traditional long-term retention mechanism is migrating toward a data lake-based model for supported tables. Data is mirrored from the analytics tables to data lake and stored there for at least the duration of the analytics retention period at no charge.
Tables without data lake support will continue using traditional long-term retention (archive), while tables that support data lake will leverage both data lake-based and traditional retention simultaneously. Microsoft charges only once for the long-term retention setup, so there won't be duplicated costs.
If your Total Retention period exceeds the analytics retention, the additional retained data beyond the analytics window will incur data lake storage costs while remaining accessible via data lake queries.
The introduction of the Sentinel data lake fundamentally reshapes how data is retained long-term.
Pre-Data Lake Retention Model (The "Old Way")
1. Interactive Retention (Legacy Name):
- This was the primary, high-performance tier where data was ingested and made immediately available for lightning-fast queries, real-time detections, and analytics.
2. Long-Term Retention (LTR) (Legacy Name):
- This was the tier created for long-term retention. Data could not be sent here directly.
- Data was automatically moved from the Interactive tier to the long-term retention tier after its Interactive retention period expired. This was a "rolling out" process - as data aged out of Interactive, it transitioned to LTR.
- Accessing data in LTR typically involved a "rehydration" process, making it less ideal for immediate, high-volume analysis.
"Total Retention" (Old Way): Your total retention period was the sum of your Interactive retention and your LTR. For example, 90 days Interactive + 120 days LTR = 210 days Total Retention. Data existed in only one tier at any given moment.
Post-Data Lake Retention Model (The "New Way")
With data lake, the concept of "moving" data between tiers is largely replaced by mirroring and independent retention policies.
1. Analytics Tier:
- This is still the primary table tier. Important data resides in this tier for high-performance querying and is directly leveraged by Sentinel's real-time detection rules, workbooks, and automation.
- Key Change: While the analytics tier retains its own retention policy (e.g., 30, 90 days), any new data ingested into it (for supported tables) is simultaneously mirrored to the data lake from the moment of ingestion. This means that for the duration of the analytics retention, data physically exists in both the analytics tier and the data lake tier.
2. Data Lake Retention (The New "Total Retention")
- This is the new primary mechanism for long-term storage and replaces the traditional LTR.
- Both Tiers are Interactive (with a caveat): Unlike the old LTR, which involved rehydration, data in the data lake is always interactive - meaning it's always online and queryable. You don't need to explicitly restore it (but it is still possible to use the Search feature with all of its benefits).
"Data Lake Retention = Total Retention": This is the most crucial change. Because data is mirrored ingestion-time into the data lake, you simply set your data lake retention policy to your desired total data retention period. Data lake holds the mirrored copy for the entire duration.
If you ingest data into an analytics table, configure the analytics tier retention to 120 days, and set the Total Retention to 180 days, the following applies:
1. Days 1-90: Data retention in the analytics tier is free, and the mirrored data in the data lake tier is also retained free of charge for those 90 days.
2. Days 90-120: Analytics retention charges apply for that 30-day period, while the same data remains free in the data lake.
3. Days 120-180: Beyond the analytics retention, data lake storage costs are incurred for the additional 60 days.
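To make the arithmetic behind these phases explicit, here is a minimal Python sketch. The per-GB-per-month prices are placeholders chosen purely for illustration, not official Microsoft rates; substitute the figures for your region and plan.

```python
# Sketch of the three cost phases for one batch of data, following the example above.
# Prices are illustrative placeholders, NOT official Microsoft figures.
ANALYTICS_FREE_DAYS = 90             # analytics retention included for free
ANALYTICS_RETENTION_PRICE = 0.10     # hypothetical USD per GB per month
LAKE_STORAGE_PRICE = 0.026           # hypothetical USD per GB per month

def retention_cost(gb: float, analytics_days: int, total_days: int) -> dict:
    """Approximate retention cost for `gb` of data kept `analytics_days` in the
    analytics tier and `total_days` overall (mirrored/retained in the data lake)."""
    paid_analytics_days = max(analytics_days - ANALYTICS_FREE_DAYS, 0)  # days 91-120 above
    paid_lake_days = max(total_days - analytics_days, 0)                # days 121-180 above
    return {
        "analytics_retention_usd": gb * ANALYTICS_RETENTION_PRICE * paid_analytics_days / 30,
        "lake_storage_usd": gb * LAKE_STORAGE_PRICE * paid_lake_days / 30,
    }

print(retention_cost(gb=100, analytics_days=120, total_days=180))
```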
💰 Cost Savings: For tables that do not support data lake, legacy archiving remains active. This reinforces the value of migrating to supported DCR-based tables. Data lake provides not only direct query access to long-term retained data but also cost savings through Microsoft's 6:1 compression discount on data lake storage. This discount reduces storage costs by a fixed factor of six.
- Ingesting log data into an analytics tier table.
- Keeping the data in the analytics tier (beyond the free 90 days).
- Ingesting data into a data lake tier table.
- Storing (compressed) data in the data lake.
- Running queries against data in the data lake.
- Processing the ingested data for the data lake.
- Compute costs for running advanced analytics on data lake data.
Currently, historical data that is archived for long-term storage is not moved or mirrored from Sentinel to data lake. Your existing archived data will remain in Sentinel and will not be accessible in data lake. However, since the billing meter changes to the new data lake-based approach, you will benefit from the 6:1 Storage compression discount regardless. So, the cost of your long-term retained data will decrease to one-sixth of its previous price.
According to Microsoft's latest guidance and my tests, the data processing fee applies to uncompressed data that enters the ingestion pipelines and is charged on a per data flow basis. This fee does not apply to data mirrored from your analytics tier.
If you use a Data Collection Rule (DCR) to filter data, you can reduce ingestion costs, but data processing costs remain unaffected.
Example 1:
If you send 100 GB of data to your DCR and drop 90% of it, you will pay ingestion cost for only 10% of the data, but the processing fee will still apply to the full 100%.
Example 2:
If you send 100 GB of data to your DCR and, via separate data flows, route it into 3 different data lake-only tables (log splitting), you will incur the 100 GB ingestion cost 3 times, and the data processing charge is likewise incurred once per data flow.
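The same logic, as a small Python sketch. It is a simplified model for data lake-only tables, using the North Europe Pay-as-you-Go prices quoted in the comparison section further below ($0.06/GB data lake ingestion, $0.12/GB data processing); it is not an official calculator.

```python
# Simplified model of DCR-related charges for data lake-only tables.
# Prices are the North Europe PAYG figures quoted later on this page.
LAKE_INGESTION = 0.06     # USD per GB actually ingested into a data lake table
DATA_PROCESSING = 0.12    # USD per GB reaching the pipeline, charged per data flow

def dcr_charges(gb_sent: float, keep_ratio: float, data_flows: int = 1) -> dict:
    """Charges when `gb_sent` flows through a DCR that keeps `keep_ratio` of the data
    and forwards it to `data_flows` data lake-only tables (log splitting)."""
    return {
        "ingestion_usd": gb_sent * keep_ratio * LAKE_INGESTION * data_flows,
        "processing_usd": gb_sent * DATA_PROCESSING * data_flows,  # filtering does NOT reduce this
    }

print(dcr_charges(100, keep_ratio=0.1))                  # Example 1: 90% dropped, one data flow
print(dcr_charges(100, keep_ratio=1.0, data_flows=3))    # Example 2: split into 3 tables
```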
Microsoft addresses this scenario in their guidance documents. The data lake feature only mirrors new data moving forward. For example, if you enable data lake and after 30 days reduce the analytics retention from 180 to 90 days, the queryable data you will have is 90 days in analytics (interactive) and 30 days in the data lake (overlapping with the most recent 30 days of analytics). This means you lose the ability to run interactive queries on the data that is between 90 and 180 days old. The data won't be lost: after the switch, data older than 90 days goes to traditional archive, where it remains available but not interactively queryable. In short, you lose the interactivity of that data unless you wait for the full analytics retention period before making such changes, so be sure to carefully plan your retention policy and query requirements first.
This is a Microsoft Sentinel platform requirement to ensure data continuity and consistency. Since analytics tier data can transition to data lake tier after the analytics retention period (long-term retention), the data lake must be configured to store data for at least as long as analytics.
This prevents gaps in your security data and maintains a continuous timeline for compliance and forensic investigations.
Think of it as a data flow: analytics (hot) → data lake (cold/archived), where the cold tier must be able to receive and retain data from the hot tier.
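Expressed as a trivial configuration check (just a sketch of the constraint, not an actual Sentinel API):

```python
def validate_retention(analytics_days: int, total_days: int) -> None:
    """The total (data lake) retention must cover at least the analytics retention window."""
    if total_days < analytics_days:
        raise ValueError(
            f"Total retention ({total_days}d) cannot be shorter than "
            f"analytics retention ({analytics_days}d)"
        )

validate_retention(analytics_days=90, total_days=365)     # fine
# validate_retention(analytics_days=180, total_days=90)   # would raise
```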
Data that is not kept in the data lake tier beyond its analytics retention won't generate data lake storage costs.
All newly ingested analytics data is mirrored to the data lake storage at no additional charge. Therefore, during the interactive retention period, data is accessible both in the free-to-query analytics tier and in the pay-to-query data lake storage. However, if you specifically query your data lake - for example, from the KQL queries page or via KQL jobs - you will incur query costs, even if the same data exists in analytics. If you access the data through Sentinel within the analytics retention window, no additional fee is applied. At present, Microsoft does not waive query fees for data lake storage data even if the same data is available for free in analytics; your costs will depend on which tool or interface you use to access the data.
Query charges are incurred when you directly query the data lake table using KQL jobs, KQL queries, or via the API (for example, through the Sentinel MCP server). All standard workloads that access the data lake in this manner will be subject to these charges. The advanced data insights charge applies when you use managed Notebooks. Whether you run Notebooks interactively or as scheduled jobs, charges are based on the CPU hours used by the given execution. When using Notebooks for querying, you will not incur regular query charges. In summary, Notebook-based exploration of the data lake is exempt from query charges but will generate advanced data insights charges for the compute resources consumed.
The most reliable method for minimizing query charges in the Sentinel data lake tier is to leverage the TimeGenerated field. Explicitly define time windows for your queries to limit data scanning to the intervals required. Be careful when running queries via the Sentinel MCP server (it tends to query ALL your existing data) or when you run a KQL query in the GUI (it is easy to misconfigure, and due to a bug you can easily query more data than expected).
With Notebooks, billing is based on CPU hours consumed. To minimize costs, focus on developing efficient, fast-executing code. Applying filters on the TimeGenerated field speeds up queries due to partitioning. Additionally, selecting only the necessary columns (field pruning) reduces data processed, improving query performance and lowering compute time.
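As an illustration of these two techniques in a Spark-based notebook session: the snippet below builds a tiny stand-in DataFrame by hand, since loading an actual data lake table depends on the Sentinel notebook environment and is not shown here.

```python
# Illustrative PySpark pattern for keeping notebook compute (CPU hours) low:
# filter on TimeGenerated first, then select only the columns you need (field pruning).
from datetime import datetime, timedelta
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("lake-notebook-demo").getOrCreate()

# Stand-in for a data lake table; in a real Sentinel notebook you would load the table
# through the notebook environment's data provider instead of building it by hand.
df = spark.createDataFrame(
    [(datetime(2024, 1, 1), "host-a", 4624, "verbose payload ..."),
     (datetime(2024, 6, 1), "host-b", 4625, "verbose payload ...")],
    ["TimeGenerated", "Computer", "EventID", "RawPayload"],
)

cutoff = (datetime.now() - timedelta(days=7)).strftime("%Y-%m-%d %H:%M:%S")
trimmed = (
    df.filter(F.col("TimeGenerated") >= cutoff)        # limit the scanned time window
      .select("TimeGenerated", "Computer", "EventID")  # field pruning: drop unneeded columns
)
trimmed.show()
```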
To simplify pricing, Microsoft applies a fixed 6x discount on data lake storage charges, effectively assuming that your stored data achieves an average compression ratio of 6:1. This discount applies exclusively to data lake storage fees and does not impact other costs such as query charges, which are calculated based on uncompressed data. In reality, your data's actual compression ratio may be better or worse than 6:1. If the compression is more effective, you may pay slightly more than the true compressed cost; if less effective, you may pay less. Microsoft uses the fixed 6x factor as a standard assumption across all data. Overall, this 6x discount represents a new cost-saving benefit that was not available previously. Note that the Azure Cost Management page displays raw data storage in GB but does not factor in the 6x compression, so reported storage size will appear larger than what the discounted pricing reflects.
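A quick worked example (the storage price below is a placeholder, not an official figure):

```python
# Effect of the fixed 6:1 compression assumption on the data lake storage meter.
LAKE_STORAGE_PRICE = 0.026           # hypothetical USD per GB per month

raw_gb = 6000                        # uncompressed data retained in the data lake
billed_gb = raw_gb / 6               # billed as if the data compresses 6:1
monthly_cost = billed_gb * LAKE_STORAGE_PRICE

print(f"Retained {raw_gb} GB raw -> billed for {billed_gb:.0f} GB -> ${monthly_cost:.2f}/month")
# Azure Cost Management still reports the raw 6000 GB, not the billed 1000 GB.
```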
Only advanced data insights charges apply to this activity. Although you are querying data in the data lake, no separate data query charges are incurred. Instead, data query costs are incorporated into the advanced data insights billing.
This comparison focuses solely on cost and does not consider feature differences.
- In North Europe, analytics ingestion costs $5.16 USD per GB (Pay-as-you-Go pricing).
- Ingesting data into the data lake costs $0.06 USD per GB (Ingestion) plus $0.12 USD per GB (data processing), totaling $0.18 USD per GB.
- Querying data lake data costs $0.006 USD per GB.
Therefore, ingesting data into the data lake is $4.98 USD per GB cheaper than analytics ingestion.
From a cost perspective, you can run 4.98 / 0.006 ≈ 830 queries on the same amount of data in the data lake before analytics becomes the more economical option (excluding data lake storage costs).
If your team runs queries that look back 90 days, with one query per day, the same data will be queried about 90 times. This means you can conduct roughly 830 / 90 ≈ 9 queries per day. Beyond that, it becomes more expensive than ingesting the data into the analytics tier.
If your team only looks back 7 days (a more realistic scenario), you could run roughly 830 / 7 ≈ 120 queries per day. While that may seem high, interactive query execution often involves many trial runs or incorrect queries, generating additional charges. It's important to carefully monitor and manage query patterns in your environment.
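The break-even arithmetic above can be captured in a few lines of Python. It uses the North Europe Pay-as-you-Go prices quoted above, ignores data lake storage costs, and assumes every query scans the entire lookback window:

```python
# Break-even between analytics ingestion and data lake ingestion plus queries.
# North Europe PAYG prices quoted above; data lake storage costs are ignored.
ANALYTICS_INGESTION = 5.16      # USD per GB
LAKE_INGESTION_TOTAL = 0.18     # USD per GB (0.06 ingestion + 0.12 data processing)
LAKE_QUERY = 0.006              # USD per GB scanned per query

savings_per_gb = ANALYTICS_INGESTION - LAKE_INGESTION_TOTAL   # 4.98
break_even_scans = savings_per_gb / LAKE_QUERY                # ~830 scans of the same GB

def max_daily_queries(lookback_days: int) -> float:
    """Queries per day before the data lake becomes more expensive than analytics,
    assuming each query scans the full lookback window."""
    return break_even_scans / lookback_days

print(f"Break-even: ~{break_even_scans:.0f} scans per GB")
print(f"90-day lookback: ~{max_daily_queries(90):.0f} queries/day")
print(f"7-day lookback:  ~{max_daily_queries(7):.0f} queries/day")
```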
A recent update confirms that the data processing charge applies on a per data flow basis.
This means that in scenarios like log splitting, where the same data is processed multiple times to be forwarded into different tables, the charge will be incurred multiple times.
A recent update confirms that the data processing charge applies to all data reaching the ingestion pipeline, not only to the data that is ultimately ingested.
This means that even if you filter out data using Data Collection Rules (DCR), you will still incur the data processing charge for all data that reaches the pipeline before being filtered.
This has been updated for auxiliary logs now: Microsoft Learn - Data Collection Transformations
Note: This clarification is currently documented for auxiliary logs but not yet specifically updated for Sentinel data lake documentation.