Chapter 1. What Is Microsoft Fabric?

Before we really get started, we’d like to introduce the structure of this book. The book is organized into three parts. Part 1 covers the foundations of Fabric (Chapters 1-3). Part 2 dives deeper into all the features and offerings of Fabric (Chapters 4-15). Part 3 provides guidelines on when to use what, along with governance and administration (Chapters 16-18).

The Why and the What of Microsoft Fabric

Let us start with the question: what is Microsoft Fabric? Or, to take it even further: why do we need such an offering?

Fabric is the answer to a (if not the) challenge of our times as data professionals:

All organizations have data that surfaces in different formats, is constantly growing, and needs to be consumed by users to drive decisions and generate insights. Generating those insights involves different workloads, from real-time queries through analytical workloads and reporting to AI-driven applications, as you’ll see in Figure 1-1.

Figure 1-1. Requirements across an organization

Those users or personas usually have vastly different skill sets when it comes to working with data. They may have no coding skills at all, or they may use different languages, such as Python or SQL, across multiple teams in your organization, as illustrated in Figure 1-2.

Figure 1-2. Skillsets across an organization

To cope with these challenges in our data-driven world, such as responding to unprecedented amounts of data in real time, you need a solution that caters to all those different needs. It must let different users work with tools that match their skill sets and needs, and let them organize their data in a way that works for them, whether that is raw data, a data lake, a data warehouse, or just a data mart.

Microsoft Fabric aims to solve all these problems in one single offering.

The Big Picture of Fabric

Fabric is an end-to-end solution that addresses exactly those complexities, challenges, and needs with a single product – or at least a single brand, as you’ll see in Figure 1-3.

Figure 1-3. Fabric Overview

Fabric includes solutions, features, services, and tools for every step along the way, from ETL to building a data lake or warehouse, real-time analytics, and visualization. On top of all that, Fabric provides artificial intelligence and Copilots to support your entire experience and journey.

While we will talk about all the concepts, components, and offerings in Fabric in more detail in Chapter 4, let us give you a 30,000-foot overview of what they are and how they contribute to the Fabric ecosystem.

Workspaces and Domains

Your main organizational units in Fabric are called workspaces and your workspaces can be grouped into domains.

Domains and workspaces are the overarching organizational concepts for your data as well as for your lakehouses, notebooks, warehouses, pipelines, and all your other artifacts.

Workspaces are also the unit that CI/CD integrations operate on.

All workspaces on a Fabric capacity support Power BI Premium features.

OneLake

Fabric uses a single, shared storage layer called OneLake as its foundation to house all your data. OneLake is often referred to as the “OneDrive for data.” Figure 1-4 shows the general overview of OneLake, including its main components and APIs.

Figure 1-4. Overview of OneLake

OneLake is built on Azure Data Lake Storage Gen2 (ADLS Gen2, Microsoft’s data lake solution for big data analytics) and manages all your data and artifacts across your domains and workspaces. In addition to storing data, which can be ingested in a multitude of ways (more on that in Chapter 3), it comes with a data virtualization concept called shortcuts, which lets you access and use data that sits in other storage locations, such as an Amazon S3 bucket or Dataverse, in place and without copying it.

Data in OneLake is natively stored in the Delta Lake format, but OneLake also supports access to storage that uses the Apache Iceberg format, another open-source table format that, just like Delta, is independent of any particular processing engine.

Data Factory

Microsoft Fabric’s Data Factory is designed to offer cloud-scale data movement and transformation services, simplifying the management of complex data integration and ETL (Extract, Transform, Load) processes. It provides a user-friendly, robust, and enterprise-grade experience for data management. As the successor to Azure Data Factory, this service has evolved to incorporate cloud-scale capabilities, addressing the most intricate ETL challenges. Data Factory delivers a modern data integration experience, enabling users to ingest, prepare, and transform data from various sources, including databases, data warehouses, lakehouses, and real-time data streams.

Data Engineering

Synapse Data Engineering in Microsoft Fabric is a comprehensive suite tailored for data engineers to efficiently manage and transform large data volumes using Spark, facilitating the creation of lakehouse architectures. A lakehouse integrates the scalable storage of a data lake with the management capabilities of a data warehouse, providing a unified platform for data ingestion, preparation, and sharing. This architecture supports SQL queries, analytics, machine learning, and other advanced techniques on both structured and unstructured data. Microsoft Fabric enables users to create and manage lakehouses, design data movement pipelines, and utilize Spark job definitions for batch or streaming jobs. Additionally, notebooks are available for writing code related to data ingestion, preparation, and transformation.

Data Science

Synapse Data Science in Microsoft Fabric provides a robust platform for data scientists, supporting an end-to-end workflow for building, deploying, and operationalizing machine learning models. Key features include the use of R and Python in notebooks for data exploration, Data Wrangler for simplified analysis, and MLflow for tracking and comparing model experiments. Users can efficiently perform batch scoring at scale with PREDICT, benefiting from deep integration across Fabric’s stack. This integration enables seamless data scoring in lakehouses, writing back predictions, and visualizing data in reports.

Data Warehouse

Synapse Data Warehouse in Microsoft Fabric streamlines data insights while ensuring robust data security and governance through T-SQL constructs. It uses a distributed engine to deliver high performance and scalability. Data is stored as Parquet files in the open Delta format, which supports ACID transactions and interoperability across platforms, eliminating the need to duplicate data. Microsoft Fabric provides two warehousing experiences: the SQL analytics endpoint of the Lakehouse for read-only queries, and the Synapse Data Warehouse for full transactional support, accommodating various data ingestion methods. This dual offering caters to diverse user needs, from data engineering to complex SQL operations.

Real-Time Intelligence

Synapse Real-Time Intelligence in Microsoft Fabric is a fully managed big data analytics platform tailored for streaming and time-series data. It simplifies data integration, allowing organizations to scale their analytics solutions and make data accessible to various users, from citizen data scientists to advanced data engineers. The platform includes features specifically designed for real-time analytics, such as automatic data streaming, indexing, and partitioning for any data source or format, along with on-demand query generation and visualizations. This enables organizations to quickly access data insights with minimal effort and high efficiency.

Power BI

You probably know this one, but if not: Power BI is a business analytics tool that enables users to visualize data and share insights across their organization or embed them in an app or website. It provides interactive visualizations and business intelligence capabilities with an interface simple enough for end users to create their reports and dashboards. Power BI is now part of Microsoft Fabric, enhancing its integration and scalability within the broader ecosystem of Microsoft’s cloud services. This integration allows users to seamlessly connect Power BI with other Fabric services, streamlining the process of data analysis and visualization within a unified analytics solution.

Data Activator

Data Activator in Microsoft Fabric is a no-code tool designed to automate actions based on patterns or conditions detected in changing data. It allows users to monitor data in Power BI reports and eventstreams, triggering actions when specified thresholds are crossed or patterns are detected. This streamlines responding to data changes, enabling users to manage data-driven actions efficiently without writing code.

Copilots

As previously mentioned, Fabric offers a variety of AI and Copilot support along the way. There is not a single Copilot in Fabric but several, each serving a different purpose. They need to be enabled and require a minimum capacity size. More on that later in this chapter as well as in the documentation (https://learn.microsoft.com/en-us/fabric/get-started/copilot-fabric-overview).

The Fabric Roadmap

Note

Another thing to be aware of is that Fabric evolves very quickly. Some of the features and options described in this book will probably have been enhanced in one way or another by the time you try them.

The Fabric public preview was announced at Microsoft Build in May 2023.

It became generally available (GA) at Microsoft Ignite in November 2023.

Another huge feature wave was announced at the Fabric conferences in the US and Europe in 2024, as well as at Build and Ignite 2024. Figure 1-5 gives you an overview of the initial and post-launch feature waves of Fabric.

Figure 1-5. Microsoft Fabric Release Cycle and History

Since then, a long list of new features has been released, some of them still in preview.

Fabric keeps gaining new features and capabilities, so expect new additions to appear on a very short cadence.

No matter how fast we publish this book or an update of it, any roadmap we print will be out of date by the time it is published.

Check out https://aka.ms/FabricRoadmap for an always up-to-date list of features that are currently in the making or in preview.

The Fabric Pricing Model

Another interesting aspect of Fabric is licensing, or rather, pricing. Rather than sizing, deploying, and paying for a variety of individual services and offerings, you are basically paying for one single product.

Fabric pricing consists of three components: storage, compute, and user licenses, as you’ll see in Figure 1-6.

Figure 1-6. The three components of Fabric pricing

Compute & Capacities

When we talk about compute in Fabric, we talk about something called capacities. Capacities provide the compute power for your entire Fabric deployment, no matter which specific offerings you’re using, so the words compute and capacity are synonymous in the context of Fabric. Figure 1-7, for example, shows two capacities of different sizes in different regions.

Figure 1-7. Capacity sizes and settings

Your entire Fabric environment can be based on a single capacity, or you can run multiple capacities, each with its own name, which may sit in different regions and have different sizes. A capacity’s size determines how many workspaces and how much workload it can host.

Capacity types

There are three kinds of Fabric capacities: Fabric, Fabric Reserved, and Fabric Trial. They are all shown in Figure 1-8.

Figure 1-8. Fabric capacity types

A regular Fabric capacity is billed on a pay-as-you-go basis, so you only pay for the time your capacities are running. With a reserved capacity, you commit to a certain term in exchange for a discount.

Both Fabric and Fabric Reserved capacities are billed through Azure.

Trials give you access to almost the full feature set of Fabric at no cost for 60 days, with the equivalent of an F64 capacity (more on that in the next section). The most important features currently missing from trials are the Copilots.

While any capacity can be paused, pausing usually only makes sense for pay-as-you-go capacities: you pay for those only while they are running, whereas a reserved capacity is billed 24/7 anyway. Note that billing is tied to the capacity, not to the workload. Unless you pause your capacity, you will still be billed even when no workload is running.
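To see why pausing mostly pays off for pay-as-you-go capacities, here is a minimal sketch comparing the two billing models over one billing period. The hourly rate and the 40% reservation discount are hypothetical parameters, not actual Fabric prices, and the model deliberately ignores bursting and smoothing.

```python
def payg_cost(hourly_rate: float, running_hours: float) -> float:
    """Pay-as-you-go: billed only for the hours the capacity is running (not paused)."""
    return hourly_rate * running_hours


def reserved_cost(hourly_rate: float, discount: float, hours_in_period: float) -> float:
    """Reserved: billed for every hour of the period (24/7), at a discounted rate.

    `discount` is a hypothetical fraction, e.g. 0.4 for 40% off the pay-as-you-go rate.
    """
    return hourly_rate * (1 - discount) * hours_in_period


def break_even_utilization(discount: float) -> float:
    """Fraction of the period a pay-as-you-go capacity can run before reserved becomes cheaper."""
    return 1 - discount


RATE = 1.0        # hypothetical price per hour, arbitrary currency units
DISCOUNT = 0.4    # hypothetical reservation discount
HOURS = 30 * 24   # a 30-day billing period

# Pay-as-you-go capacity paused outside a 10-hour business day
print(payg_cost(RATE, 30 * 10))              # 300.0
print(reserved_cost(RATE, DISCOUNT, HOURS))  # 432.0
print(break_even_utilization(DISCOUNT))      # 0.6
```

Under these assumptions, a capacity that runs less than 60% of the time is cheaper on pay-as-you-go with pausing; above that, the reservation wins.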

Capacity sizes

Fabric capacities come in different sizes. The size determines the number of capacity units (CUs), an abstract metric much like those you may have come across in other cloud offerings, and therefore the amount, performance, and size of the workloads the capacity can handle.

The smallest is an F2, which has 2 capacity units, followed by an F4, with sizes doubling at each step up to the largest one currently available: an F2048, which has 2,048 capacity units.

To make this a bit more concrete: a day has 86,400 seconds. An F2 capacity has 2 capacity units, so 2 × 86,400 = 172,800 compute seconds are available to you per day, while an F8 capacity gives you 8 × 86,400 = 691,200 compute seconds per day (assuming you run your capacity 24 hours a day), as you can also see in Figure 1-9.
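The same arithmetic as a small Python sketch. The SKU table simply encodes the F2 to F2048 range described above (sizes doubling at each step), and `hours_running` is an illustrative parameter for a capacity that is paused part of the day.

```python
SECONDS_PER_DAY = 86_400  # 24 h * 60 min * 60 s

# F SKUs from F2 to F2048; the number in the name is the count of capacity units (CUs).
F_SKUS = {f"F{2 ** n}": 2 ** n for n in range(1, 12)}


def daily_compute_seconds(capacity_units: int, hours_running: float = 24.0) -> int:
    """Compute (CU) seconds a capacity provides per day while running `hours_running` hours."""
    return round(capacity_units * SECONDS_PER_DAY * hours_running / 24)


for sku in ("F2", "F8", "F64"):
    print(sku, f"{daily_compute_seconds(F_SKUS[sku]):,}")
# F2 172,800
# F8 691,200
# F64 5,529,600
```

An F8 that is paused for half the day accordingly provides `daily_compute_seconds(8, 12)`, or 345,600 compute seconds.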

Figure 1-9. Meaning of capacity sizes

Unfortunately, there is no good guidance or general rule on how much can be achieved within a compute second, as that is highly workload-specific (and offering-specific). The best practice for sizing your environment is therefore still to start with an educated guess and then scale up or down from there.

Capacity Bursting (and Smoothing)

Even the smallest capacities allow you to make use of a feature called bursting. Cool. What does that mean?

Let’s assume you run an F8 capacity, and you keep it running 24/7.

We now make another assumption: You run a workload that would usually take 10 minutes to complete using your F8 capacity.

Through bursting, Fabric *MAY* grant you additional compute power, more than you pay for, so that this workload completes in 5 minutes, as if you had bought an F16 capacity (see Figure 1-10).

Figure 1-10. Bursting
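One way to read this example: the job’s cost in CU seconds stays the same, and bursting only changes how fast that compute is delivered. Here is a minimal sketch of the arithmetic, assuming, as the example does, that the job scales perfectly with extra compute. Real workloads rarely scale this linearly, and whether Fabric bursts at all is up to the platform.

```python
def job_cu_seconds(capacity_units: int, minutes: float) -> float:
    """Total compute a job consumes, in CU seconds."""
    return capacity_units * minutes * 60


def burst_duration_minutes(cu_seconds: float, effective_units: int) -> float:
    """How long the job takes at `effective_units` CUs (assumes perfect linear scaling)."""
    return cu_seconds / effective_units / 60


work = job_cu_seconds(8, 10)              # 10 minutes on an F8
print(work)                               # 4800
print(burst_duration_minutes(work, 16))   # 5.0 (bursting to the equivalent of an F16)

# While bursting, the job uses 16 CUs on a capacity that provides only 8,
# so half of its compute is "borrowed" and must be paid back through smoothing.
borrowed = work - job_cu_seconds(8, burst_duration_minutes(work, 16))
print(borrowed)                           # 2400.0
```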

So, Microsoft is gifting you free compute? Well, no. Think of it more as a credit line. Any extra compute you consume through bursting needs to be recovered through its counterpart, smoothing, within 24 hours.

How does that impact you?

If you don’t use all of your capacity units all the time, the borrowed compute is simply paid back from the spare capacity you’re not using, as shown in Figure 1-11.

Figure 1-11. Previous bursting being recovered in a less busy period

If you constantly use all the capacity units available to you, your capacity will at some point be throttled, as illustrated in Figure 1-12. While still paying for your F8 capacity, you may, for example, only get to use 4 capacity units for some time.

Figure 1-12. Previous bursting being recovered through throttling

And what if you simply pause your capacity right after bursting? Then you will still be billed until you’ve paid your dues, or in other words, until all the extra compute units used by bursting have been recovered. This is illustrated in the “Pause” section of Figure 1-13.

Figure 1-13. Previous bursting being recovered in a period where the capacity is paused
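The three scenarios above can be sketched as a toy simulation. This is a deliberately simplified model, not Fabric’s actual smoothing or throttling algorithm, which differs in detail. It uses hourly steps, treats borrowed compute as a debt in CU-hours, and relies on hypothetical knobs: `burst_factor` (how far beyond the capacity a burst may go), `throttle_at` (the debt level that triggers throttling), and halving the usable capacity while throttled, which mirrors the F8-to-4-CU example.

```python
from dataclasses import dataclass


@dataclass
class Hour:
    demand: float         # CU-hours the workloads would like to use in this hour
    paused: bool = False  # whether the capacity is paused in this hour


def simulate(capacity_units: float, hours: list[Hour],
             burst_factor: float = 2.0, throttle_at: float = 16.0):
    """Toy burst/smoothing model (hypothetical parameters).

    Returns one tuple per hour: (granted CU-hours, outstanding debt, throttled?, billed?).
    """
    debt = 0.0
    rows = []
    for h in hours:
        if h.paused:
            # Paused: no workload runs, but outstanding debt is still paid off and billed.
            billed = debt > 0
            debt = max(0.0, debt - capacity_units)
            rows.append((0.0, debt, False, billed))
            continue

        throttled = debt >= throttle_at
        # Throttled: only half the capacity is usable and the rest repays the debt.
        # Otherwise bursting may grant more than the capacity provides.
        limit = capacity_units / 2 if throttled else capacity_units * burst_factor
        granted = min(h.demand, limit)

        if granted > capacity_units:
            debt += granted - capacity_units                     # borrowed compute
        else:
            debt = max(0.0, debt - (capacity_units - granted))  # repaid from spare capacity
        rows.append((granted, debt, throttled, True))            # a running capacity is billed
    return rows


# Burst on an F8, then a quiet period: the debt goes 8 -> 2 -> 0 (Figure 1-11).
for row in simulate(8, [Hour(16), Hour(2), Hour(2)]):
    print(row)
```

Feeding it a constant demand of 16 (`[Hour(16)] * 4`) makes the debt pile up until the third hour runs throttled at 4 CUs, and pausing right after a burst keeps the first paused hour billed while the debt is repaid.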

Capacity limitations

Some features require a certain minimum SKU.

Copilots, for example, only work on an F64 capacity or higher.

Similar restrictions apply to other features.

You can find an overview of current limitations at https://learn.microsoft.com/en-us/fabric/enterprise/fabric-features. Unfortunately, that overview isn’t always complete, so if you are relying on a specific feature, make sure to double-check its prerequisites in the documentation.

Storage

Storage pricing is straightforward. You simply pay per gigabyte for the storage you use in OneLake.

Storage is always billed independently of compute, so you pay for the storage you use even when your capacity is paused. You cannot pause storage.

If you use Mirroring, a way to replicate data into OneLake from other sources, you get free storage for your mirrored data up to a limit based on the compute capacity SKU you provision: 1 TB per capacity unit. For example, an F8 capacity gives you 8 TB of free storage for mirrored data. OneLake storage for mirrored data is billed only when the free Mirroring storage limit is exceeded or when the provisioned compute capacity is paused. We will explain Mirroring in more detail in a later chapter.
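The allowance rule can be sketched as a small function. It assumes, as one reading of the text above, that none of the free allowance applies while the capacity is paused, and it only covers mirrored data; all other OneLake storage is billed per gigabyte as usual.

```python
TB_FREE_PER_CU = 1  # free Mirroring storage per capacity unit, as described above


def billable_mirrored_tb(mirrored_tb: float, capacity_units: int, paused: bool = False) -> float:
    """Mirrored storage (in TB) that is billed as regular OneLake storage."""
    if paused:
        # Assumption: while the capacity is paused, the free allowance does not apply.
        return float(mirrored_tb)
    free_tb = capacity_units * TB_FREE_PER_CU
    return max(0.0, float(mirrored_tb) - free_tb)


print(billable_mirrored_tb(6, 8))                # 0.0, within the 8 TB free on an F8
print(billable_mirrored_tb(10, 8))               # 2.0, only the excess is billed
print(billable_mirrored_tb(10, 8, paused=True))  # 10.0
```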

User Licenses

User licenses do not play a significant role within Fabric with one exception: Power BI.

If your capacity is F64 or larger, only users who create and publish Power BI content need a Power BI Pro license. If you are using Power BI on a capacity smaller than F64, every user working with Power BI in workspaces on that capacity needs a Power BI Pro license.

Otherwise, no paid user licenses are required, and your workloads are charged through the cost of capacities and storage alone. Every user who accesses a workspace without a Power BI license still needs a Free Fabric license, though.
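The rules above boil down to a small decision function. This is a simplified sketch for illustration, not official licensing guidance. It ignores other options such as Premium Per User, and the function and parameter names are hypothetical.

```python
def required_license(uses_power_bi: bool, creates_power_bi_content: bool,
                     capacity_units: int) -> str:
    """Simplified per-user license check following the rules described above."""
    if not uses_power_bi:
        return "Free"    # non-Power BI workloads are covered by capacity and storage costs
    if capacity_units >= 64:  # F64 or larger
        return "Pro" if creates_power_bi_content else "Free"
    return "Pro"         # below F64, every Power BI user needs Pro


print(required_license(False, False, 8))   # Free: data engineer on an F8, no Power BI
print(required_license(True, False, 64))   # Free: report viewer on an F64
print(required_license(True, False, 32))   # Pro: report viewer on an F32
```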

Networking

Charges for Networking have been announced but are not implemented yet.

Regional Differences

As with all cloud services, keep in mind that prices vary by region, and sometimes the differences are surprisingly big. Make sure to take a look at the Microsoft pricing page (https://azure.microsoft.com/en-us/pricing/details/microsoft-fabric/#pricing) to get a full overview of the Microsoft Fabric pricing that applies to you. Your choice of region should of course also take latency and potential data residency restrictions into account.

Also, some offerings that rely on Azure OpenAI, such as Copilots, aren’t available in every region, so pricing, feature availability, latency, and data residency should all factor into your decision on where to deploy Fabric.

Summary

Let us wrap up this first chapter. Fabric is an end-to-end software-as-a-service (SaaS) solution: you provision a single Azure resource even though you may potentially use hundreds of different workloads. This simplifies your deployments, and you receive a single bill.

Fabric is lake-centric, which means that all its data is stored in OneLake. With OneLake on one side and capacities on the other, Fabric gives you full separation of storage and compute.

Copilots enhance the entire experience to get you to your results even quicker.

Fabric is organized through shared workspaces, all of which are Premium workspaces, so users can share their data as well as their artifacts.

While we have not looked at those aspects yet, Fabric is of course a governed platform including security, monitoring and role concepts. We will get into that in Chapter 7: Microsoft Fabric - Administration, Security, Governance and Monitoring.

Our next chapter, All Roads Lead to OneLake, focuses on how to access OneLake, its architecture, and, most importantly, how to ingest data into and retrieve data from OneLake.
