# Azure Data Lake Storage

<Image
  src={storageIcon}
  alt="Azure Data Lake Storage logo"
  height={80}
  width={80}
  class:list={'float-inline-left icon'}
  data-zoom-off
/>

[Azure Data Lake Storage Gen2](https://azure.microsoft.com/services/storage/data-lake-storage/) is a set of capabilities built on Azure Blob Storage for big data analytics. The Aspire Azure Data Lake Storage integration enables you to connect to Azure Data Lake Storage instances from your applications.

## Hosting integration

The Azure Data Lake Storage hosting integration models a Data Lake resource as a child of an Azure Storage resource. To add a Data Lake resource, install the [📦 Aspire.Hosting.Azure.Storage](https://www.nuget.org/packages/Aspire.Hosting.Azure.Storage) NuGet package in your [AppHost](/get-started/app-host/) project:

<InstallPackage packageName="Aspire.Hosting.Azure.Storage" />

### Add Azure Data Lake resource

In your AppHost project, call `AddDataLake` on an `IResourceBuilder<AzureStorageResource>` to add a Data Lake resource:

```csharp title="C# — AppHost.cs"
var builder = DistributedApplication.CreateBuilder(args);

var storage = builder.AddAzureStorage("storage");
var dataLake = storage.AddDataLake("datalake");

builder.AddProject<Projects.ExampleProject>("exampleproject")
    .WithReference(dataLake);

builder.Build().Run();
```

The preceding code:

- Adds an Azure Storage resource named `storage`.
- Adds a Data Lake resource named `datalake` as a child of the storage resource.
- Passes a reference to the Data Lake resource to the `ExampleProject`.

**Emulator not supported:** Azure Data Lake Storage does **not** support the Azure Storage emulator. You must use an actual Azure Storage account with hierarchical namespace (HNS) enabled, or use connection strings for local development.

### Add Azure Data Lake file system resource

You can also add a Data Lake file system resource directly from the storage resource:

```csharp title="C# — AppHost.cs"
var builder = DistributedApplication.CreateBuilder(args);

var storage = builder.AddAzureStorage("storage");
var dataLake = storage.AddDataLake("datalake");
var fileSystem = storage.AddDataLakeFileSystem("analytics", "analytics-data");

builder.AddProject<Projects.ExampleProject>("exampleproject")
    .WithReference(dataLake)
    .WithReference(fileSystem);

builder.Build().Run();
```

The `AddDataLakeFileSystem` method takes:

- `name`: The resource name used in Aspire.
- `dataLakeFileSystemName` (optional): The actual file system name in Azure. Defaults to the resource name if not specified.

**Note:** Both `AddDataLake` and `AddDataLakeFileSystem` are called on the storage resource (`IResourceBuilder<AzureStorageResource>`), not on each other. When you call either method, the storage account is automatically configured with hierarchical namespace (HNS) enabled, which is required for Data Lake Storage Gen2.
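
Because `dataLakeFileSystemName` is optional, the second argument can be omitted when the Aspire resource name should also be the file system name in Azure. The following is a minimal sketch of that case; the `"exampleproject"` resource name is an assumption made for illustration:

```csharp
var builder = DistributedApplication.CreateBuilder(args);

var storage = builder.AddAzureStorage("storage");

// No explicit file system name is passed, so the file system name
// in Azure defaults to the resource name, "analytics".
var fileSystem = storage.AddDataLakeFileSystem("analytics");

// "exampleproject" is an illustrative resource name.
builder.AddProject<Projects.ExampleProject>("exampleproject")
    .WithReference(fileSystem);

builder.Build().Run();
```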

### Customize provisioning infrastructure

The Data Lake resource is part of the Azure Storage resource, which is a subclass of `AzureProvisioningResource`. You can customize the generated Bicep using the `ConfigureInfrastructure` API on the storage resource. For example, the following code sets the storage SKU and adds a tag to the storage account:

```csharp title="AppHost.cs"
using Azure.Provisioning.Storage;

var builder = DistributedApplication.CreateBuilder(args);

var storage = builder.AddAzureStorage("storage")
    .ConfigureInfrastructure(infra =>
    {
        var storageAccount = infra.GetProvisionableResources()
            .OfType<StorageAccount>()
            .Single();

        storageAccount.Sku = new StorageSku { Name = StorageSkuName.PremiumLrs };
        storageAccount.Tags.Add("workload", "analytics");
    });

var dataLake = storage.AddDataLake("datalake");

builder.AddProject<Projects.ExampleProject>("exampleproject")
    .WithReference(dataLake);

builder.Build().Run();
```

For more information on customizing Azure Storage provisioning, see [Azure Blob Storage: Customize provisioning infrastructure](/integrations/cloud/azure/azure-storage-blobs/azure-storage-blobs-host/#customize-provisioning-infrastructure).

## Client integration

To get started with the Aspire Azure Data Lake Storage client integration, install the [📦 Aspire.Azure.Storage.Files.DataLake](https://www.nuget.org/packages/Aspire.Azure.Storage.Files.DataLake) NuGet package in your client-consuming project:

<InstallDotNetPackage packageName="Aspire.Azure.Storage.Files.DataLake" />

### Add Azure Data Lake service client

In the `Program.cs` file of your client-consuming project, call `AddAzureDataLakeServiceClient` to register a `DataLakeServiceClient` for dependency injection:

```csharp
builder.AddAzureDataLakeServiceClient("datalake");
```

You can then retrieve the `DataLakeServiceClient` instance using dependency injection:

```csharp
public class ExampleService(DataLakeServiceClient client)
{
    // Use client...
}
```
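
To illustrate what using the client might look like, the following sketch takes the injected `DataLakeServiceClient`, obtains a file system client, and creates the file system if it doesn't exist yet. The class name `FileSystemInitializer`, the method name, and the `analytics-data` file system name are assumptions made for illustration; `GetFileSystemClient` and `CreateIfNotExistsAsync` come from the Azure.Storage.Files.DataLake SDK.

```csharp
using Azure.Storage.Files.DataLake;

// Hypothetical service; the class and method names are illustrative only.
public class FileSystemInitializer(DataLakeServiceClient client)
{
    public async Task<DataLakeFileSystemClient> EnsureFileSystemAsync(
        string fileSystemName = "analytics-data")
    {
        // Creating the client object doesn't call the service.
        DataLakeFileSystemClient fileSystem =
            client.GetFileSystemClient(fileSystemName);

        // Creates the file system only if it's missing.
        await fileSystem.CreateIfNotExistsAsync();

        return fileSystem;
    }
}
```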

### Add Azure Data Lake file system client

You can also register a `DataLakeFileSystemClient` for accessing a specific file system:

```csharp
builder.AddAzureDataLakeFileSystemClient("analytics");
```

You can then retrieve the `DataLakeFileSystemClient` instance using dependency injection:

```csharp
public class ExampleService(DataLakeFileSystemClient client)
{
    // Use client...
}
```
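
As a sketch of how the injected `DataLakeFileSystemClient` might be used, the following example uploads a small text file. The class name `ReportWriter` and the path `reports/latest.txt` are assumptions made for illustration; `GetFileClient` and `UploadAsync` come from the Azure.Storage.Files.DataLake SDK.

```csharp
using System.Text;
using Azure.Storage.Files.DataLake;

// Hypothetical service; the class, method, and path names are illustrative only.
public class ReportWriter(DataLakeFileSystemClient fileSystem)
{
    public async Task WriteReportAsync(
        string content,
        CancellationToken cancellationToken = default)
    {
        // Address a file by its path inside the file system.
        DataLakeFileClient file = fileSystem.GetFileClient("reports/latest.txt");

        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(content));

        // overwrite: true replaces the file if it already exists.
        await file.UploadAsync(
            stream,
            overwrite: true,
            cancellationToken: cancellationToken);
    }
}
```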

### Keyed services

Both client methods have keyed variants for registering multiple clients:

```csharp
builder.AddKeyedAzureDataLakeServiceClient("datalake1");
builder.AddKeyedAzureDataLakeServiceClient("datalake2");

builder.AddKeyedAzureDataLakeFileSystemClient("analytics");
builder.AddKeyedAzureDataLakeFileSystemClient("archive");
```
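
Keyed clients can then be resolved with the `[FromKeyedServices]` attribute from `Microsoft.Extensions.DependencyInjection`. The sketch below assumes that each client is registered under its connection name as the service key; the parameter names are illustrative:

```csharp
using Azure.Storage.Files.DataLake;
using Microsoft.Extensions.DependencyInjection;

// Assumes the service key of each client is its connection name.
public class ExampleService(
    [FromKeyedServices("datalake1")] DataLakeServiceClient primaryClient,
    [FromKeyedServices("datalake2")] DataLakeServiceClient secondaryClient)
{
    // Use clients...
}
```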

### Configuration

The Azure Data Lake Storage client integration supports multiple configuration approaches.

#### Use a connection string

Provide the connection name when calling `AddAzureDataLakeServiceClient`:

```csharp
builder.AddAzureDataLakeServiceClient("datalake");
```

The connection string is retrieved from the `ConnectionStrings` section. Two formats are supported:

**Service URI (recommended)**:

```json
{
  "ConnectionStrings": {
    "datalake": "https://{account_name}.dfs.core.windows.net/"
  }
}
```

When using a service URI, a [default credential](/integrations/cloud/azure/azure-default-credential/) is used for authentication.

**For file system clients**, include the file system name:

```json
{
  "ConnectionStrings": {
    "analytics": "https://{account_name}.dfs.core.windows.net/;FileSystemName=analytics-data"
  }
}
```

**Azure Storage connection string**:

```json
{
  "ConnectionStrings": {
    "datalake": "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey;EndpointSuffix=core.windows.net"
  }
}
```

#### Use configuration providers

The integration loads settings from the `Aspire:Azure:Storage:Files:DataLake` configuration section:

```json
{
  "Aspire": {
    "Azure": {
      "Storage": {
        "Files": {
          "DataLake": {
            "ServiceUri": "https://{account_name}.dfs.core.windows.net/",
            "DisableHealthChecks": false,
            "DisableTracing": false
          }
        }
      }
    }
  }
}
```

#### Use inline delegates

Configure settings programmatically:

```csharp
builder.AddAzureDataLakeServiceClient(
    "datalake",
    settings => settings.DisableHealthChecks = true);
```

Configure client options:

```csharp
builder.AddAzureDataLakeServiceClient(
    "datalake",
    configureClientBuilder: clientBuilder =>
        clientBuilder.ConfigureOptions(
            options => options.Diagnostics.ApplicationId = "myapp"));
```

### Client integration health checks

By default, the integration adds a health check that verifies connectivity to Azure Data Lake Storage. The health check:

- Is enabled when `DisableHealthChecks` is `false` (the default)
- Integrates with the `/health` HTTP endpoint
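
For a web project that doesn't already map a health endpoint (for example, through its service defaults), the following minimal sketch shows one way to expose the registered health checks at `/health` using the standard ASP.NET Core `MapHealthChecks` API. It assumes an ASP.NET Core project with the client integration package installed:

```csharp
var builder = WebApplication.CreateBuilder(args);

// Registers the client and, by default, its health check.
builder.AddAzureDataLakeServiceClient("datalake");

// Ensures health check services exist even if the integration's check is disabled.
builder.Services.AddHealthChecks();

var app = builder.Build();

// Exposes all registered health checks at /health.
app.MapHealthChecks("/health");

app.Run();
```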

### Observability and telemetry

#### Logging

The integration uses these log categories:

- `Azure.Core`
- `Azure.Identity`
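
These categories can be filtered like any other .NET log category. As a minimal sketch, assuming an ASP.NET Core project, the following limits both categories to warnings and above; the chosen level is only an example:

```csharp
using Microsoft.Extensions.Logging;

var builder = WebApplication.CreateBuilder(args);

// Only surface warnings and above from the Azure SDK log categories.
builder.Logging.AddFilter("Azure.Core", LogLevel.Warning);
builder.Logging.AddFilter("Azure.Identity", LogLevel.Warning);

builder.AddAzureDataLakeServiceClient("datalake");
```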

#### Tracing

The integration emits OpenTelemetry tracing activities:

- `Azure.Storage.Files.DataLake.DataLakeServiceClient`
- `Azure.Storage.Files.DataLake.DataLakeFileSystemClient`

#### Metrics

The Azure SDK for Data Lake Storage doesn't currently emit metrics.

## See also

- [Azure.Storage.Files.DataLake SDK documentation](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/storage/Azure.Storage.Files.DataLake/README.md)
- [Azure Data Lake Storage Gen2](https://learn.microsoft.com/azure/storage/blobs/data-lake-storage-introduction)
- [Azure Storage Blobs integration](/integrations/cloud/azure/azure-storage-blobs/azure-storage-blobs-get-started/)