August 25, 2025 (updated October 17, 2025)

Microsoft Sentinel Data lake (preview)

UPDATE: Sentinel data lake is now generally available (GA). It was released on 30/9/2025. Read more here.

Microsoft released the Sentinel data lake to public preview about three weeks ago. What does it mean?

Sentinel has been a great tool for SOC analysts to see, from a single point, everything that is happening in their cybersecurity environment. When it has been properly configured with different data sources, analytics rules, playbooks and other settings, Sentinel has been your “all-you-need” tool, at least for those who like to use Microsoft Security products.

"You can’t protect what you can’t see"

But because there is such a huge amount of security logs everywhere, there is always the $cost$ of using these products. IT decision makers who aim to manage all their security data within a SIEM need to make painful tradeoffs: reduce logging and risk blind spots, or shorten retention and compromise forensic depth. This creates a paradox: more data makes security harder. Without unified, long-term visibility, even advanced AI falls short, leading to missed threats, slower investigations, and wasted tools.

Microsoft Sentinel data lake is trying to solve this problem. Alongside it, Microsoft is also bringing Microsoft Defender Threat Intelligence (MDTI) capabilities to Defender XDR and Sentinel free of charge: starting in October 2025, all Microsoft first-party threat reports, including intel profiles and indicators of compromise (IoCs), will be available in Defender XDR.

SOURCE: Microsoft Defender Threat Intelligence web page

So what is the Sentinel data lake?

Microsoft Sentinel data lake is a cloud-native security data lake that transforms how organizations manage and analyze security data. It is fully managed, so you don’t need to deploy or maintain data infrastructure. It provides a unified data platform for end-to-end threat analysis and response.
It stores a single copy of security data across assets, activity logs, and threat intelligence in the lake and leverages multiple analytics tools like KQL and Jupyter notebooks for deep security analytics.

Traditional SIEM solutions struggle with the cost and complexity of storing and querying long-term security data. Sentinel data lake enables deep security insights with up to 12 years of security data and telemetry you can query and analyze.

Sentinel data lake brings SOC teams into the next era of security operations. Being able to ensure coverage of your security estate, across all security data sources and vast time horizons, enables security teams to proactively detect latent cyberattacks, detect emerging cyberthreats with AI-powered models, reconstruct cyberattack timelines in forensic detail, and retroactively uncover indicators of compromise that might otherwise go unnoticed. It’s built on Azure’s scalable infrastructure. More in MS Learn.

SOURCE: Microsoft Security

Microsoft Sentinel has expanded with modern security data lake infrastructure to unify and manage all of your data with a single copy and run analytics over it.

Let’s explore the key features:

1. Unified data management. This is absolutely critical. Think about it: security teams need to easily onboard all their security data, regardless of its format: structured, semi-structured, or unstructured. In addition, customers need access to their existing security data and platforms through federation. This means integrations with popular tools such as AWS S3. This flexibility is essential, allowing us to unify data and provide complete visibility across the ecosystem.

2. Microsoft delivers cost-effective storage in the data lake using Delta Parquet (built on the Microsoft Fabric Lakehouse data architecture platform). This is a game-changer. It not only allows us to perform real-time analytics with SQL and Spark, but it does so while maintaining a single copy of the data.
And we need to be able to easily manage data tiering between hot and cold to ensure flexibility, value and significant cost reductions.

3. The real power of a modern security data lake is its advanced analytics capabilities. We’re talking about things like the graph and machine learning models. These advanced techniques enable real-time processing and semantic search, transforming the data lake into a truly powerful tool for addressing today’s complex security challenges.

By leveraging Microsoft’s infrastructure leadership with a purpose-built data lake and management system, Sentinel is an even more powerful tool for modern security needs.

Storage tiers: Analytics or Data lake tier

You can retain data in Microsoft Sentinel in one of two tiers:

Analytics tier: This tier makes data available for alerting, hunting, workbooks, and all Microsoft Sentinel features. It retains data in two states:

- Analytics retention: In this “hot” state, data is fully available for real-time analytics (including high-performance queries and analytics rules) and threat hunting. By default, Microsoft Sentinel and Microsoft Defender XDR retain data in this tier for 30 days. You can extend the retention period of all tables to up to two years at a prorated monthly long-term retention charge. You can extend the retention period of Microsoft Sentinel solution tables to 90 days for free.
- Total retention: By default, all data in the analytics tier is mirrored to the data lake for the same retention period. You can extend the retention of your data in the lake beyond the analytics retention, for up to 12 years of total retention at a low cost.

Data lake tier: In this low-cost “cold” tier, Microsoft Sentinel retains your data in the lake only. Data in the data lake tier isn’t available for real-time analytics features and threat hunting.
However, you can access data in the lake whenever you need it through KQL jobs, analyze trends over time by running scheduled KQL or Spark jobs, and aggregate insights from incoming data at a regular cadence by using summary rules.

A comparison of these tiers is available here. I will also describe it below.

SOURCE: MS Learn

The comparison of data tiers

| Comparison | Analytics tier | Data lake tier |
| --- | --- | --- |
| Key characteristics | High-performance querying and indexing for logs (also known as hot, or interactive retention) | Cost-effective long-term retention of large data volumes (also known as cold storage) |
| Best for | Real-time analytics rules, alerting, hunting, workbooks, and all Sentinel features | Compliance and regulatory logging; historical trend analysis and forensics; low-touch data that’s not needed for real-time alerts |
| Ingestion cost | Standard | Minimal |
| Query price included | ✅ | ❌ |
| Query performance optimized | ✅ | ❌ |
| Query capabilities | Full query capabilities in the Microsoft Defender and Azure portals and using APIs | Full query capabilities including unions and joins; run scheduled KQL or Spark jobs; use notebooks |
| Full set of real-time analytics features | ✅ | ❌ Limitations on some features, including analytics rules, hunting queries, parsers, watchlists, workbooks, and playbooks (in preview) |
| Search jobs | ✅ | ✅ |
| Summary rules | ✅ | ✅ Single-table KQL, which you can extend with data from an analytics table using lookup |
| Restore | ✅ | ❌ |
| Data export | ✅ | ❌ |
| Retention period | 90 days for Microsoft Sentinel, 30 days for Microsoft Defender XDR. Can be extended to up to two years at a prorated monthly long-term retention charge. | Same as analytics retention, by default. Can be extended to up to 12 years. |
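To make the retention model more concrete, here is a minimal Python sketch of how the age of a record maps to a tier. The names `RetentionPolicy` and `tier_for_age` are my own, made up for illustration, and are not part of any Microsoft API. The limits (up to two years of analytics retention, up to 12 years of total retention) come from the comparison above, and the 365-day year is a simplifying assumption.

```python
from dataclasses import dataclass

DAYS_PER_YEAR = 365  # simplifying assumption; ignores leap years
MAX_ANALYTICS_DAYS = 2 * DAYS_PER_YEAR   # analytics retention: up to two years
MAX_TOTAL_DAYS = 12 * DAYS_PER_YEAR      # total retention: up to 12 years


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical per-table retention settings (illustration only)."""
    analytics_days: int  # "hot" analytics retention
    total_days: int      # total retention, including time spent only in the lake

    def __post_init__(self) -> None:
        if not 0 < self.analytics_days <= MAX_ANALYTICS_DAYS:
            raise ValueError("analytics retention must be between 1 day and 2 years")
        if not self.analytics_days <= self.total_days <= MAX_TOTAL_DAYS:
            raise ValueError("total retention must be between analytics retention and 12 years")


def tier_for_age(age_days: int, policy: RetentionPolicy) -> str:
    """Return where data of the given age can be queried under the policy."""
    if age_days < 0:
        raise ValueError("age_days cannot be negative")
    if age_days <= policy.analytics_days:
        # Still in analytics retention (and, by default, mirrored to the lake).
        return "analytics"
    if age_days <= policy.total_days:
        # Past analytics retention: reachable only through the data lake.
        return "data lake"
    return "expired"


# Example: 90 days of analytics retention, 12 years of total retention.
policy = RetentionPolicy(analytics_days=90, total_days=12 * DAYS_PER_YEAR)
for age in (7, 400, 5000):
    print(age, tier_for_age(age, policy))
```

The point of the sketch is only the ordering of the two boundaries: data first ages out of analytics retention, then out of total retention.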
Access to Sentinel data lake

Once you have met the prerequisites and completed the onboarding, you can access the data lake from the Defender XDR portal menu:

SOURCE: Defender portal / MS Learn

KQL Jobs

You can create a job to run a KQL query against the data in the data lake tier and promote the results to the analytics tier. Jobs can run on a schedule or one time. When you create a job, you specify the destination workspace and table for the results. The results can be written to a new table or appended to an existing table in the analytics tier.

You can create and manage jobs from the Jobs management page under Data lake exploration in the navigation panel. Use this page to create new jobs, view their status and details, and run, edit, delete, or disable jobs. For more information, see Manage KQL jobs.

There are prerequisites for KQL jobs.

SOURCE: Defender portal / MS Learn

Jupyter notebooks

Jupyter notebooks in the Microsoft Sentinel data lake offer a powerful environment for data analysis and machine learning. Use Python libraries to build and run machine learning models, conduct advanced analytics, and visualize your data. The notebooks support rich visualizations, enabling you to gain insights from your security data. Schedule notebooks to summarize data regularly, run machine learning models, and promote data from the data lake tier to the analytics tier.

The notebooks are provided by the Microsoft Sentinel Visual Studio Code extension (preview), which allows you to interact with the data lake using Python for Spark (PySpark). More info in MS Learn.

Creating a Jupyter notebook with VS Code:

SOURCE: VS Code / MS Learn

Data lake exploration scenarios for notebooks

| Scenario | Description |
| --- | --- |
| User behavior from failed sign-ins | Establish a baseline of normal user behavior by analyzing patterns of failed sign-in attempts. Investigate operations attempted before and after the failed sign-ins to detect potential compromise or brute-force activity. |
| Sensitive data paths | Identify users and devices that have access to sensitive data assets. Combine access logs with organizational context to assess risk exposure, map access paths, and prioritize areas for security review. |
| Anomaly threat analysis | Analyze threats by identifying deviations from established baselines, such as sign-ins from unusual locations, devices, or times. Overlay user behavior with asset data to identify high-risk activity, including potential insider threats. |
| Risk-scoring prioritization | Apply custom risk scoring models to security events in the data lake. Enrich events with contextual signals such as asset criticality and user role to quantify risk, assess blast radius, and prioritize incidents for investigation. |
| Exploratory analysis and visualization | Perform exploratory data analysis across multiple log sources to reconstruct attack timelines, determine root causes, and build custom visualizations that help communicate findings to stakeholders. |

Microsoft Sentinel extension setup for VS Code

VS Code extension installation

Read more about the setup: Running notebooks on the Microsoft Sentinel data lake (preview) – Microsoft Security | Microsoft Learn

(Preview) Pricing

NOTE: The preview pricing below is provided in USD for reference only, based on pricing in the East US region, and does not include taxes or currency adjustments.

| SKU | Meter type | Price |
| --- | --- | --- |
| Data lake ingestion | Data Processed (GB) | $0.05 |
| Data processing | Data Processed (GB) | $0.10 |
| Data lake storage | Data Stored (GB/Month) | $0.026 |
| Data lake query | Data Analyzed (GB) | $0.005 |
| Advanced data insights | 1 Compute Hour | $0.15 |

Starting August 4, 2025, refer to the Microsoft Sentinel pricing page for current pricing within your relevant region.

NOTE: Prices are estimates only and are not intended as actual price quotes. Actual pricing may vary depending on the type of agreement entered with Microsoft, the date of purchase, the currency exchange rate, and any applicable taxes.
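To get a feel for how these meters add up, here is a small Python sketch that multiplies a month of usage by the preview prices listed above. The function name `estimate_monthly_cost`, the meter keys, and the usage figures are hypothetical, chosen only for illustration. The sketch ignores preview discounts, regional differences, taxes, and agreement-specific pricing, so treat the result as rough arithmetic and not as a quote.

```python
from decimal import Decimal

# Preview prices in USD (East US), copied from the pricing table above.
# The dictionary keys are my own naming, not official meter identifiers.
PREVIEW_PRICES_USD = {
    "ingestion_gb": Decimal("0.05"),            # Data lake ingestion, per GB processed
    "processing_gb": Decimal("0.10"),           # Data processing, per GB processed
    "storage_gb_month": Decimal("0.026"),       # Data lake storage, per GB stored per month
    "query_gb": Decimal("0.005"),               # Data lake query, per GB analyzed
    "insights_compute_hours": Decimal("0.15"),  # Advanced data insights, per compute hour
}


def estimate_monthly_cost(usage: dict) -> dict:
    """Return the cost per meter plus a 'total', all rounded to cents.

    `usage` maps a meter key from PREVIEW_PRICES_USD to the quantity
    consumed in one month (GB, GB stored, or compute hours).
    """
    unknown = set(usage) - set(PREVIEW_PRICES_USD)
    if unknown:
        raise KeyError(f"unknown meters: {sorted(unknown)}")

    cents = Decimal("0.01")
    costs = {
        meter: (Decimal(str(quantity)) * PREVIEW_PRICES_USD[meter]).quantize(cents)
        for meter, quantity in usage.items()
    }
    costs["total"] = sum(costs.values(), Decimal("0"))
    return costs


# Hypothetical month: 1 TB ingested, 5 TB stored, 2 TB scanned by queries,
# and 20 compute hours of notebook jobs.
example = estimate_monthly_cost({
    "ingestion_gb": 1000,
    "storage_gb_month": 5000,
    "query_gb": 2000,
    "insights_compute_hours": 20,
})
for meter, cost in example.items():
    print(f"{meter}: ${cost}")
```

`Decimal` is used instead of floats so that small per-GB prices such as $0.005 do not pick up binary rounding noise.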
During preview, the data lake tier includes 30 days of free storage. Data processing in the data lake is also available at no cost during this time. For more information on these meters, see Data lake tier.

Preview limitations

Public preview has limitations:

- Microsoft Sentinel tables with the Basic plan can only be managed from the Log Analytics workspace. For more information, see Manage tables in a Log Analytics workspace. To manage retention and tiering from the Microsoft Defender portal, change the table plan to Analytics from the Log Analytics workspace.
- Some Microsoft Defender XDR tables can only be viewed in the Microsoft Defender portal. Currently, the Microsoft Defender portal supports managing these Microsoft Defender XDR tables: AlertEvidence, AlertInfo, CampaignInfo, CloudAppEvents, DeviceEvents, DeviceFileCertificateInfo, DeviceFileEvents, DeviceImageLoadEvents, DeviceInfo, DeviceLogonEvents, DeviceNetworkEvents, DeviceNetworkInfo, DeviceProcessEvents, DeviceRegistryEvents, EmailAttachmentInfo, EmailEvents, EmailPostDeliveryEvents, EmailUrlInfo, FileMaliciousContentInfo, IdentityDirectoryEvents, IdentityLogonEvents, IdentityQueryEvents, SecurityAlert, SecurityIncident, UrlClickEvents.

Public resources

There are some public resources available:

- News: Blog
- Azure security podcast: Link
- Blog: Tech community blog, Cut costs & boost threat detection
- Pricing blog: Pricing
- Sentinel in MS Learn: Link
- Sentinel data lake in MS Learn: Link
- How to use/enable and set up the unified data lake: Jeffrey Appel’s blog about the setup

Jussi Metso

The author is a lifelong IT enthusiast and Microsoft Security MVP, interested in Cloud Security, XDR, SIEM and AI. Motto: Learning is the key for your future.