Understanding Sentinel Data Lake
A comprehensive guide to Microsoft Sentinel's data lake feature and how it impacts your security data architecture.
What You'll Learn
Analytics vs Data Lake storage options
How data flows between tiers
Archive strategies and costs
Introduction
This guide consolidates the various cost components and implications of enabling the Sentinel data lake feature. It provides clarity on pricing structures and helps you make informed decisions about your security data architecture.
For core concepts about the data lake, how it operates, and its use cases, consult Microsoft's official documentation. This guide focuses specifically on cost-related aspects to help you understand charges and optimize spending.
The Two Storage Tiers
Once Sentinel data lake is enabled, two main table types become available. Both support interactive queries without data restoration, but they offer different performance and cost profiles.
Analytics Tier
The primary repository for security data used in analytics rule execution and investigations.
- Best for: Real-time detections, frequent queries, high-value data
- Speed: Query-optimized for fast interactive analysis
- Costs: Higher ingestion cost, but free queries & free 90-day retention
- Retention: Up to 2 years
Data Lake Tier
Long-term storage for verbose or lower-value security data.
- Best for: Compliance, DFIR, and a safe haven for your data
- Speed: Storage-optimized, slower query performance
- Costs: Lower ingestion cost, but additional costs for usage
- Retention: Up to 12 years, acting as the long-term retention option for any data
These table types do not operate completely separately. Before the data lake, analytics, basic, and auxiliary tables were completely independent. Now, while a data lake table can exist on its own without an analytics counterpart, every supported analytics table also has a representation in the lake.
Data Mirroring
With data lake enabled, data ingested into the analytics tier (for supported tables) is automatically mirrored to the data lake tier at no extra ingestion or processing charge.
Key Points About Mirroring
- Only newly ingested data is mirrored — existing legacy data won't be moved
- Mirrored data exists in both tiers simultaneously under the same table name, both in the SIEM and in the data lake
- Mirroring happens after DCR filtering, so data filtered or transformed by a DCR is stored in that form in both tiers
- Storage in data lake is free while within the analytics retention window
- Querying mirrored logs in data lake tier still incurs query costs
- Mirroring is the basis of the new long-term retention
Which tier you query depends on your method: Advanced Hunting queries free analytics data, while the Data Lake Exploration tab queries billable data lake events.
MCP server: The 'Data exploration' collection uses data in the data lake, so it always generates query costs. The 'Triage' collection queries Defender and the connected Sentinel SIEM (the analytics data), which is always free. Freshly mirrored data can be queried in both places, so be careful which collection and tool you use with the MCP server.
Long-term Retention: The Paradigm Shift
With Microsoft Sentinel data lake integration, the traditional long-term retention process is evolving toward a data lake–based model for supported tables. Even the terminology has changed — what was previously referred to as long-term retention is now transitioning to data lake retention.
Why the Change?
The key reason lies in how data is stored and accessed. When data lake is enabled:
- All data ingested into a data lake-only table is automatically stored in the lake at ingestion time
- Data in supported analytics tables is available both in Log Analytics (for quick querying) and in the lake (for cost-efficient storage)
- Retention settings are split into two parts (illustrated in the sketch below):
- Analytics Retention: Controls how long data is kept in the SIEM for performant, interactive queries.
- Data Lake Retention: Defines how long data is kept in the data lake for long-term, lower-cost storage.
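As a quick illustration of this split, here is a minimal Python sketch (not an official API; the 90-day and 365-day values are assumptions chosen for the example). Which tier can serve a mirrored record depends only on its age relative to the two retention settings:

```python
# Illustrative only: which tier can serve a mirrored analytics-table record of a given age.
# The retention values below are example assumptions, not defaults.
ANALYTICS_RETENTION_DAYS = 90   # interactive, free-to-query window in the SIEM
TOTAL_RETENTION_DAYS = 365      # overall retention, satisfied by the data lake

def tiers_holding(data_age_days: int) -> list[str]:
    """Return the tiers where a record of this age is still available."""
    tiers = []
    if data_age_days <= ANALYTICS_RETENTION_DAYS:
        tiers.append("analytics tier (free interactive queries)")
    if data_age_days <= TOTAL_RETENTION_DAYS:
        tiers.append("data lake tier (pay-per-query)")
    return tiers

print(tiers_holding(30))    # both tiers
print(tiers_holding(200))   # data lake only
```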
Traditional vs. Data Lake–Based Retention
- Traditional Model: Data older than the analytics retention period is archived on a rolling basis.
- Data Lake Model: Data is stored in the lake at the time of ingestion — no separate archiving step is needed.
Coexistence and Cost Considerations
- Tables without data lake support: Continue using the traditional archiving approach.
- Tables with data lake support: Currently use both methods in parallel. During this transition, data is stored in both data lake and traditional archive without additional cost — you only pay once.
Cost Benefit: Data lake storage includes an assumed 6:1 compression ratio, lowering storage costs significantly. This cost benefit applies globally, whether or not the given table supports data lake yet.
Transition Challenge: Data Coverage Alignment
A major challenge during this transition is maintaining consistent data coverage, that is, keeping logs interactively accessible.
When moving from a longer analytics retention period to a shorter one with extended total retention, the interactive accessibility of your logs can be affected. While no data is lost, some logs may only be retrievable through Search or Restoration rather than being immediately available for interactive queries.
Let's walk through a scenario:
You have 180 days of analytics retention, and data lake is not yet enabled.
You enable the data lake. From this point forward, newly ingested data is stored in the lake as well, but already existing data is not.
You now have data in both tiers, but with different coverage: the analytics tier holds up to 180 days of history, while the lake only holds data ingested since enablement.
Say 30 days later, you reduce analytics retention from 180 → 90 days and set total retention (data lake storage) to 180 days.
Without traditional archiving, the older data (days 90–180) would be deleted from the analytics tier, and since the data lake only holds 30 days of history at this point, that data would be lost forever.
Traditional archiving remains in place for your analytics tables whenever the total retention is longer than the analytics retention.
The coexistence of both systems protects your data:
As the data lake continues to ingest new data daily, it will eventually hold the full 180 days of total retention. At that point, you have complete data coverage in the data lake model.
This also shows that in order to have 180 days of data in the lake, you have to wait 180 days. If you switch straight from a long analytics retention to a shorter one with extended total retention, the older data won't yet be available in the data lake, and you lose interactive access to it.
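The scenario can be sketched in a few lines of Python (illustrative numbers only; it simply tracks how much history the lake has accumulated and ignores the traditional archive's own mechanics):

```python
# How much history the data lake actually holds, N days after it was enabled.
TOTAL_RETENTION = 180            # configured data lake retention in days

def lake_coverage(days_since_lake_enabled: int) -> int:
    """Days of history present in the lake (it only ingests from enablement onward)."""
    return min(days_since_lake_enabled, TOTAL_RETENTION)

for day in (30, 90, 180):
    covered = lake_coverage(day)
    gap = TOTAL_RETENTION - covered
    print(f"Day {day}: lake holds {covered} days of history, "
          f"{gap} days still depend on traditional archive / Search")
```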
💡 Key Takeaway: When transitioning to data lake retention, plan your retention changes carefully. Wait until the data lake has accumulated enough historical data before reducing your analytics retention to avoid coverage gaps.
Cost Elements
Understand each billing component that affects your Microsoft Sentinel data lake costs.
- Analytics Ingestion: Ingesting log data into an analytics tier table.
- Analytics Retention: Keeping the data in the analytics tier (beyond the free 90 days).
- Data Lake Processing: Processing the ingested data for the data lake.
- Data Lake Ingestion: Ingesting data into a data lake tier table.
- Data Lake Storage: Storing (compressed) data in the data lake.
- Data Lake Query: Cost for running queries against data in the data lake.
- Advanced Data Insights: Compute costs for running advanced analytics on data lake data.
Cost Calculator
Estimate your Microsoft Sentinel SIEM and data lake costs by combining the cost elements above.
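As an illustration of how these elements combine, here is a minimal Python sketch of such a calculator. It is a simplification under stated assumptions: the hypothetical monthly_cost helper reuses the North Europe per-GB prices quoted in the FAQ below, uses a placeholder storage price, and ignores analytics long-term retention and advanced data insights compute.

```python
# Simplified, illustrative cost model - not an official calculator.
def monthly_cost(analytics_gb: float, lake_only_gb: float,
                 lake_query_gb: float, lake_stored_gb: float,
                 prices: dict) -> float:
    cost = analytics_gb * prices["analytics_ingest_per_gb"]
    cost += lake_only_gb * (prices["lake_ingest_per_gb"] + prices["lake_processing_per_gb"])
    cost += lake_query_gb * prices["lake_query_per_gb"]
    # Storage is billed with the assumed 6:1 compression ratio applied.
    cost += (lake_stored_gb / 6) * prices["lake_storage_per_gb_month"]
    return cost

prices = {
    "analytics_ingest_per_gb": 5.16,      # North Europe, Pay-as-you-Go (see FAQ)
    "lake_ingest_per_gb": 0.06,
    "lake_processing_per_gb": 0.12,
    "lake_query_per_gb": 0.006,
    "lake_storage_per_gb_month": 0.02,    # placeholder - check your region's price sheet
}
# 100 GB/month to analytics, 500 GB/month to lake-only tables,
# 1 TB scanned by lake queries, 2 TB stored beyond the free window:
print(f"${monthly_cost(100, 500, 1000, 2000, prices):,.2f} per month")
```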
Frequently Asked Questions
Common questions about Microsoft Sentinel data lake costs, billing, and optimization strategies.
What happens to existing archived (long-term retention) data when the data lake is enabled?
Currently, historical data that is archived for long-term storage is not moved or mirrored from Sentinel to the data lake. Your existing archived data will remain in Sentinel and will not be accessible in the data lake.
However, since the billing meter changes to the new data lake-based approach, you will benefit from the 6:1 storage compression discount regardless. As a result, the cost of your long-term retained data will decrease to one-sixth of its previous price.
How is the data processing fee charged, and does DCR filtering reduce it?
According to Microsoft's latest guidance and my tests, the data processing fee applies to uncompressed data that enters the ingestion pipelines and is charged on a per-data-flow basis. This fee does not apply to data mirrored from your analytics tier.
If you use a Data Collection Rule (DCR) to filter data, you can reduce ingestion costs, but data processing costs remain unaffected.
Example 1: If you send 100 GB of data to your DCR and drop 90% of it, you will pay ingestion cost for only 10% of the data, but the processing fee will still apply to the full 100%.
Example 2: If you send 100 GB of data to your DCR and, via various data flows, send the data into 3 different data lake-only tables (log splitting), you will incur the 100 GB data processing cost 3 times.
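As a back-of-the-envelope check, the two examples work out roughly like this (a sketch using the North Europe prices quoted later in this guide; it models only the ingestion and processing meters):

```python
LAKE_INGEST_PER_GB = 0.06       # data lake ingestion price ($/GB)
LAKE_PROCESSING_PER_GB = 0.12   # data processing price ($/GB, applies before filtering)

sent_gb = 100

# Example 1: the DCR drops 90% of the data before ingestion.
ingested_gb = sent_gb * 0.10
example1 = ingested_gb * LAKE_INGEST_PER_GB + sent_gb * LAKE_PROCESSING_PER_GB
print(f"Example 1: ${example1:.2f}")        # ingestion on 10 GB, processing on all 100 GB

# Example 2: the same 100 GB flows through 3 data flows into 3 lake-only tables,
# so the processing fee is incurred three times.
example2_processing = 3 * sent_gb * LAKE_PROCESSING_PER_GB
print(f"Example 2 (processing only): ${example2_processing:.2f}")
```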
Why must data lake retention be at least as long as analytics retention?
This is a Microsoft Sentinel platform requirement to ensure data continuity and consistency. Since the long-term retention of analytics data is not data lake based, the lake must be configured to store data for at least as long as the analytics tier.
This prevents gaps in your security data and maintains a continuous timeline for compliance and forensic investigations.
Do I pay query costs for data that is mirrored from the analytics tier?
All newly ingested analytics data is mirrored to data lake storage at no additional charge. Therefore, during the interactive retention period, data is accessible both in the free-to-query analytics tier and in the pay-to-query data lake storage.
However, if you specifically query your data lake - for example, from the KQL queries page or via KQL jobs - you will incur query costs, even if the same data exists in analytics. If you access the data through Sentinel within the analytics retention window, no additional fee is applied.
Which activities incur data lake query charges?
Query charges are incurred when you directly query the data lake table using KQL jobs, KQL queries, or via the API (for example, through the Sentinel MCP server). All standard workloads that access the data lake in this manner will be subject to these charges.
When does the advanced data insights charge apply?
The advanced data insights charge applies when you use managed Notebooks. Whether you run Notebooks interactively or as scheduled jobs, charges are based on the CPU hours used by the given execution. When using Notebooks for querying, you will not incur regular query charges.
How does the 6:1 compression discount work?
To simplify pricing, Microsoft applies a fixed 6x discount on data lake storage charges, effectively assuming that your stored data achieves an average compression ratio of 6:1. This discount applies exclusively to data lake storage fees and does not impact other costs such as query charges, which are calculated based on uncompressed data.
In reality, your data's actual compression ratio may be better or worse than 6:1. If the compression is more effective, you may pay slightly more than the true compressed cost; if less effective, you may pay less. Microsoft uses the fixed 6x factor as a standard assumption across all data.
Note that the Azure Cost Management page displays raw data storage in GB but does not factor in the 6x compression, so reported storage size will appear larger than what the discounted pricing reflects.
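A tiny numeric illustration of the assumption (the storage price here is a placeholder, not a list price):

```python
raw_gb = 600                         # raw size shown by Azure Cost Management
billed_gb = raw_gb / 6               # effective size after the fixed 6:1 assumption
storage_price_per_gb_month = 0.02    # placeholder price
print(f"Displayed: {raw_gb} GB; billed as {billed_gb:.0f} GB "
      f"= ${billed_gb * storage_price_per_gb_month:.2f}/month")
```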
From a pure cost perspective, when is the data lake cheaper than the analytics tier?
This comparison focuses solely on cost and does not consider feature differences.
• In North Europe, analytics ingestion costs $5.16 USD per GB (Pay-as-you-Go pricing).
• Ingesting data into the data lake costs $0.06 USD per GB (Ingestion) plus $0.12 USD per GB (data processing), totaling $0.18 USD per GB.
• Querying data lake data costs $0.006 USD per GB.
Therefore, ingesting data into the data lake is $4.98 USD per GB cheaper than analytics ingestion. From a cost perspective, you can run approximately 830 full scans of the same data in the data lake before analytics becomes the more economical option.
If your team runs a query every day that looks back 90 days, each day's data ends up being scanned about 90 times by that routine. Within the ~830-scan budget, that allows roughly 9 such daily queries; beyond that, it becomes more expensive than ingesting the data into the analytics tier.
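The break-even arithmetic behind these figures, as a short sketch:

```python
analytics_ingest = 5.16            # $/GB, analytics ingestion (North Europe PAYG)
lake_ingest_total = 0.06 + 0.12    # $/GB, data lake ingestion + processing
lake_query = 0.006                 # $/GB scanned per data lake query

saving_per_gb = analytics_ingest - lake_ingest_total       # ~$4.98
breakeven_scans = saving_per_gb / lake_query                # ~830 scans of the same GB
print(f"Saving per GB: ${saving_per_gb:.2f}")
print(f"Break-even: ~{breakeven_scans:.0f} scans of the same data")

# A daily query with a 90-day lookback scans each GB ~90 times over its lifetime,
# so roughly this many such daily queries stay cheaper than analytics ingestion:
print(f"~{breakeven_scans / 90:.0f} daily 90-day-lookback queries")
```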
Can you route data from one data lake table to another?
You can route data from one data lake table to another data lake-only table using KQL jobs.
This unlocks sophisticated data management use cases.
Note that this undocumented feature is viewed by Microsoft as an unintended behavior. If you rely on it, keep a close watch - Microsoft may remove it at any time.
Blog
Stay updated with the latest insights on Microsoft Sentinel, security operations, and cloud security.
Visit Tokesi Cloud Blog
Explore in-depth articles about Microsoft Sentinel, data lake architecture, cost optimization strategies, and security best practices. The blog covers practical guides, tutorials, and insights from real-world implementations.