Understanding Sentinel Data Lake
A comprehensive guide to Microsoft Sentinel's data lake feature and how it impacts your security data architecture.
What You'll Learn
Analytics vs Data Lake storage options
How data flows between tiers
Archive strategies and costs
Introduction
This guide consolidates the various cost components and implications of enabling the Sentinel data lake feature. It provides clarity on pricing structures and helps you make informed decisions about your security data architecture.
For core concepts about the data lake, how it operates, and its use cases, consult Microsoft's official documentation. This guide focuses specifically on cost-related aspects to help you understand charges and optimize spending.
The Two Storage Tiers
Once Sentinel data lake is enabled, two main table types become available. Both support interactive queries without data restoration, but they offer different performance and cost profiles.
Analytics Tier
The primary repository for security data used in analytics rule execution and investigations.
- Best for: Real-time detections, frequent queries, high-value data
- Speed: Query-optimized for fast interactive analysis
- Costs: Higher ingestion cost, but free queries & free 90-day retention
- Retention: Up to 2 years
Data Lake Tier
Long-term storage for verbose or lower-value security data.
- Best for: Compliance, DFIR, and a safe haven for your data
- Speed: Storage-optimized, slower query performance
- Costs: Lower ingestion cost, but additional costs for usage
- Retention: Up to 12 years, acting as the long-term retention option for any data
These table types do not operate completely separately. Before the data lake, analytics, basic, and auxiliary tables were completely independent. Now, while a data lake table can exist on its own without an analytics counterpart, every supported analytics table also has a representation in the lake.
Data Mirroring
With data lake enabled, data ingested into the analytics tier (for supported tables) is automatically mirrored to the data lake tier at no extra ingestion or processing charge.
Key Points About Mirroring
- Only newly ingested data is mirrored — existing legacy data won't be moved
- Mirrored data exists in both tiers simultaneously under the same table name, both in the SIEM and in the data lake
- Mirroring happens after DCR filtering, so data filtered or transformed by a DCR is stored in that form in both tiers
- Storage in data lake is free while within the analytics retention window
- Querying mirrored logs in data lake tier still incurs query costs
- Mirroring is the basis of the new long-term retention
Which tier you query depends on your method: Advanced Hunting queries free analytics data, while the Data Lake Exploration tab queries billable data lake events.
MCP server: The 'Data exploration' collection uses data in the data lake, so it always generates query costs. The 'Triage' collection queries Defender and the connected Sentinel SIEM (the analytics data), which is always free. Freshly mirrored data can be queried in both places, so be careful which collection and tool you use with the MCP server.
Long-term Retention: The Paradigm Shift
With Microsoft Sentinel data lake integration, the traditional long-term retention process is evolving toward a data lake–based model for supported tables. Even the terminology has changed — what was previously referred to as long-term retention is now transitioning to data lake retention.
Why the Change?
The key reason lies in how data is stored and accessed. When data lake is enabled:
- All data ingested into a data lake-only table is automatically stored in the lake at ingestion time
- Data in supported analytics tables is available both in Log Analytics (for quick querying) and in the lake (for cost-efficient storage)
- Retention settings are split into two parts (illustrated in the sketch below):
- Analytics Retention: Controls how long data is kept in the SIEM for performant, interactive queries.
- Data Lake Retention: Defines how long data is kept in the data lake for long-term, lower-cost storage.
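As a quick illustration of this split, here is a minimal Python sketch (not an official API; the 90-day and 365-day values are assumptions chosen for the example). Which tier can serve a mirrored record depends only on its age relative to the two retention settings:

```python
# Illustrative only: which tier can serve a mirrored analytics-table record of a given age.
# The retention values below are example assumptions, not defaults.
ANALYTICS_RETENTION_DAYS = 90   # interactive, free-to-query window in the SIEM
TOTAL_RETENTION_DAYS = 365      # overall retention, satisfied by the data lake

def tiers_holding(data_age_days: int) -> list[str]:
    """Return the tiers where a record of this age is still available."""
    tiers = []
    if data_age_days <= ANALYTICS_RETENTION_DAYS:
        tiers.append("analytics tier (free interactive queries)")
    if data_age_days <= TOTAL_RETENTION_DAYS:
        tiers.append("data lake tier (pay-per-query)")
    return tiers

print(tiers_holding(30))    # both tiers
print(tiers_holding(200))   # data lake only
```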
Traditional vs. Data Lake–Based Retention
- Traditional Model: Data older than the analytics retention period is archived on a rolling basis.
- Data Lake Model: Data is stored in the lake at the time of ingestion — no separate archiving step is needed.
Coexistence and Cost Considerations
- Tables without data lake support: Continue using the traditional archiving approach.
- Tables with data lake support: Currently use both methods in parallel. During this transition, data is stored in both data lake and traditional archive without additional cost — you only pay once.
Cost Benefit: Data lake storage includes an assumed 6:1 compression ratio, lowering storage costs significantly. This cost benefit applies globally, whether or not the given table supports data lake yet.
Transition Challenge: Data Coverage Alignment
A major challenge during this transition is maintaining consistent data coverage, that is, keeping logs interactively accessible.
When moving from a longer analytics retention period to a shorter one with extended total retention, the interactive accessibility of your logs can be affected. While no data is lost, some logs may only be retrievable through Search or Restoration rather than being immediately available for interactive queries.
Let's walk through a scenario:
You have 180 days of analytics retention, and data lake is not yet enabled.
You enable the data lake. From this point forward, newly ingested data is stored in the lake as well, but already existing data is not.
You now have data in both tiers, but with different coverage: the analytics tier holds up to 180 days of history, while the lake only holds data ingested since enablement.
Say 30 days later, you reduce analytics retention from 180 → 90 days and set total retention (data lake storage) to 180 days.
Without traditional archiving, the older data (days 90–180) would be deleted from the analytics tier, and since the data lake only holds 30 days of history at this point, that data would be lost forever.
Traditional archiving remains in place for your analytics tables whenever the total retention is longer than the analytics retention.
The coexistence of both systems protects your data:
As the data lake continues to ingest new data daily, it will eventually hold the full 180 days of total retention. At that point, you have complete data coverage in the data lake model.
This also shows that in order to have 180 days of data in the lake, you have to wait 180 days. If you switch straight from a long analytics retention to a shorter one with extended total retention, the older data won't yet be available in the data lake, and you lose interactive access to it.
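The scenario can be sketched in a few lines of Python (illustrative numbers only; it simply tracks how much history the lake has accumulated and ignores the traditional archive's own mechanics):

```python
# How much history the data lake actually holds, N days after it was enabled.
TOTAL_RETENTION = 180            # configured data lake retention in days

def lake_coverage(days_since_lake_enabled: int) -> int:
    """Days of history present in the lake (it only ingests from enablement onward)."""
    return min(days_since_lake_enabled, TOTAL_RETENTION)

for day in (30, 90, 180):
    covered = lake_coverage(day)
    gap = TOTAL_RETENTION - covered
    print(f"Day {day}: lake holds {covered} days of history, "
          f"{gap} days still depend on traditional archive / Search")
```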
💡 Key Takeaway: When transitioning to data lake retention, plan your retention changes carefully. Wait until the data lake has accumulated enough historical data before reducing your analytics retention to avoid coverage gaps.
Cost Elements
Understand each billing component that affects your Microsoft Sentinel data lake costs.
- Analytics Ingestion: Ingesting log data into an analytics tier table.
- Analytics Retention: Keeping the data in the analytics tier (beyond the free 90 days).
- Data Lake Processing: Processing the ingested data for the data lake.
- Data Lake Ingestion: Ingesting data into a data lake tier table.
- Data Lake Storage: Storing (compressed) data in the data lake.
- Data Lake Query: Cost for running queries against data in the data lake.
- Advanced Data Insights: Compute costs for running advanced analytics on data lake data.
Cost Calculator
Estimate your Microsoft Sentinel SIEM and data lake costs by combining the cost elements above.
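As an illustration of how these elements combine, here is a minimal Python sketch of such a calculator. It is a simplification under stated assumptions: the hypothetical monthly_cost helper reuses the North Europe per-GB prices quoted in the FAQ below, uses a placeholder storage price, and ignores analytics long-term retention and advanced data insights compute.

```python
# Simplified, illustrative cost model - not an official calculator.
def monthly_cost(analytics_gb: float, lake_only_gb: float,
                 lake_query_gb: float, lake_stored_gb: float,
                 prices: dict) -> float:
    cost = analytics_gb * prices["analytics_ingest_per_gb"]
    cost += lake_only_gb * (prices["lake_ingest_per_gb"] + prices["lake_processing_per_gb"])
    cost += lake_query_gb * prices["lake_query_per_gb"]
    # Storage is billed with the assumed 6:1 compression ratio applied.
    cost += (lake_stored_gb / 6) * prices["lake_storage_per_gb_month"]
    return cost

prices = {
    "analytics_ingest_per_gb": 5.16,      # North Europe, Pay-as-you-Go (see FAQ)
    "lake_ingest_per_gb": 0.06,
    "lake_processing_per_gb": 0.12,
    "lake_query_per_gb": 0.006,
    "lake_storage_per_gb_month": 0.02,    # placeholder - check your region's price sheet
}
# 100 GB/month to analytics, 500 GB/month to lake-only tables,
# 1 TB scanned by lake queries, 2 TB stored beyond the free window:
print(f"${monthly_cost(100, 500, 1000, 2000, prices):,.2f} per month")
```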
Frequently Asked Questions
Common questions about Microsoft Sentinel data lake costs, billing, and optimization strategies.
What happens to existing archived (long-term retention) data when the data lake is enabled?
Currently, historical data that is archived for long-term storage is not moved or mirrored from Sentinel to the data lake. Your existing archived data will remain in Sentinel and will not be accessible in the data lake.
However, since the billing meter changes to the new data lake-based approach, you will benefit from the 6:1 storage compression discount regardless. As a result, the cost of your long-term retained data will decrease to one-sixth of its previous price.
How is the data processing fee charged, and does DCR filtering reduce it?
According to Microsoft's latest guidance and my tests, the data processing fee applies to uncompressed data that enters the ingestion pipelines and is charged on a per-data-flow basis. This fee does not apply to data mirrored from your analytics tier.
If you use a Data Collection Rule (DCR) to filter data, you can reduce ingestion costs, but data processing costs remain unaffected.
Example 1: If you send 100 GB of data to your DCR and drop 90% of it, you will pay ingestion cost for only 10% of the data, but the processing fee will still apply to the full 100%.
Example 2: If you send 100 GB of data to your DCR and, via various data flows, send the data into 3 different data lake-only tables (log splitting), you will incur the 100 GB data processing cost 3 times.
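As a back-of-the-envelope check, the two examples work out roughly like this (a sketch using the North Europe prices quoted later in this guide; it models only the ingestion and processing meters):

```python
LAKE_INGEST_PER_GB = 0.06       # data lake ingestion price ($/GB)
LAKE_PROCESSING_PER_GB = 0.12   # data processing price ($/GB, applies before filtering)

sent_gb = 100

# Example 1: the DCR drops 90% of the data before ingestion.
ingested_gb = sent_gb * 0.10
example1 = ingested_gb * LAKE_INGEST_PER_GB + sent_gb * LAKE_PROCESSING_PER_GB
print(f"Example 1: ${example1:.2f}")        # ingestion on 10 GB, processing on all 100 GB

# Example 2: the same 100 GB flows through 3 data flows into 3 lake-only tables,
# so the processing fee is incurred three times.
example2_processing = 3 * sent_gb * LAKE_PROCESSING_PER_GB
print(f"Example 2 (processing only): ${example2_processing:.2f}")
```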
Why must data lake retention be at least as long as analytics retention?
This is a Microsoft Sentinel platform requirement to ensure data continuity and consistency. Since the long-term retention of analytics data is not data lake based, the lake must be configured to store data for at least as long as the analytics tier.
This prevents gaps in your security data and maintains a continuous timeline for compliance and forensic investigations.
Do I pay query costs for data that is mirrored from the analytics tier?
All newly ingested analytics data is mirrored to data lake storage at no additional charge. Therefore, during the interactive retention period, data is accessible both in the free-to-query analytics tier and in the pay-to-query data lake storage.
However, if you specifically query your data lake - for example, from the KQL queries page or via KQL jobs - you will incur query costs, even if the same data exists in analytics. If you access the data through Sentinel within the analytics retention window, no additional fee is applied.
Which activities incur data lake query charges?
Query charges are incurred when you directly query the data lake table using KQL jobs, KQL queries, or via the API (for example, through the Sentinel MCP server). All standard workloads that access the data lake in this manner will be subject to these charges.
When does the advanced data insights charge apply?
The advanced data insights charge applies when you use managed Notebooks. Whether you run Notebooks interactively or as scheduled jobs, charges are based on the CPU hours used by the given execution. When using Notebooks for querying, you will not incur regular query charges.
How does the 6:1 compression discount work?
To simplify pricing, Microsoft applies a fixed 6x discount on data lake storage charges, effectively assuming that your stored data achieves an average compression ratio of 6:1. This discount applies exclusively to data lake storage fees and does not impact other costs such as query charges, which are calculated based on uncompressed data.
In reality, your data's actual compression ratio may be better or worse than 6:1. If the compression is more effective, you may pay slightly more than the true compressed cost; if less effective, you may pay less. Microsoft uses the fixed 6x factor as a standard assumption across all data.
Note that the Azure Cost Management page displays raw data storage in GB but does not factor in the 6x compression, so reported storage size will appear larger than what the discounted pricing reflects.
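A tiny numeric illustration of the assumption (the storage price here is a placeholder, not a list price):

```python
raw_gb = 600                         # raw size shown by Azure Cost Management
billed_gb = raw_gb / 6               # effective size after the fixed 6:1 assumption
storage_price_per_gb_month = 0.02    # placeholder price
print(f"Displayed: {raw_gb} GB; billed as {billed_gb:.0f} GB "
      f"= ${billed_gb * storage_price_per_gb_month:.2f}/month")
```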
From a pure cost perspective, when is the data lake cheaper than the analytics tier?
This comparison focuses solely on cost and does not consider feature differences.
• In North Europe, analytics ingestion costs $5.16 USD per GB (Pay-as-you-Go pricing).
• Ingesting data into the data lake costs $0.06 USD per GB (Ingestion) plus $0.12 USD per GB (data processing), totaling $0.18 USD per GB.
• Querying data lake data costs $0.006 USD per GB.
Therefore, ingesting data into the data lake is $4.98 USD per GB cheaper than analytics ingestion. From a cost perspective, you can run approximately 830 full scans of the same data in the data lake before analytics becomes the more economical option.
If your team runs a query every day that looks back 90 days, each day's data ends up being scanned about 90 times by that routine. Within the ~830-scan budget, that allows roughly 9 such daily queries; beyond that, it becomes more expensive than ingesting the data into the analytics tier.
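The break-even arithmetic behind these figures, as a short sketch:

```python
analytics_ingest = 5.16            # $/GB, analytics ingestion (North Europe PAYG)
lake_ingest_total = 0.06 + 0.12    # $/GB, data lake ingestion + processing
lake_query = 0.006                 # $/GB scanned per data lake query

saving_per_gb = analytics_ingest - lake_ingest_total       # ~$4.98
breakeven_scans = saving_per_gb / lake_query                # ~830 scans of the same GB
print(f"Saving per GB: ${saving_per_gb:.2f}")
print(f"Break-even: ~{breakeven_scans:.0f} scans of the same data")

# A daily query with a 90-day lookback scans each GB ~90 times over its lifetime,
# so roughly this many such daily queries stay cheaper than analytics ingestion:
print(f"~{breakeven_scans / 90:.0f} daily 90-day-lookback queries")
```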
Can you route data from one data lake table to another?
You can route data from one data lake table to another data lake-only table using KQL jobs.
This unlocks sophisticated data management use cases.
Note that this undocumented feature is viewed by Microsoft as an unintended behavior. If you rely on it, keep a close watch - Microsoft may remove it at any time.
Blog
Stay updated with the latest insights on Microsoft Sentinel, security operations, and cloud security.
Visit Tokesi Cloud Blog
Explore in-depth articles about Microsoft Sentinel, data lake architecture, cost optimization strategies, and security best practices. The blog covers practical guides, tutorials, and insights from real-world implementations.