Alternate: OneLake configuration for Cloud Ingest Edge Volumes
This article describes an alternate configuration for Cloud Ingest Edge Volumes (blob upload with local purge) for OneLake Lakehouses.
- Navigate to your OneLake portal; for example, https://youraccount.powerbi.com.
- Create or navigate to your workspace.
- Select Manage Access.
- Select Add people or groups.
- Enter your extension name from your Azure Container Storage enabled by Azure Arc installation. This name must be unique within your tenant.
- Change the drop-down for permissions from Viewer to Contributor.
- Select Add.
Create a file named cloudIngestPVC.yaml with the following contents. Modify the metadata.name value with a name for your Persistent Volume Claim. This name is referenced on the last line of deploymentExample.yaml in a later step. You must also update the metadata.namespace value with the namespace of your intended consuming pod. If you don't have an intended consuming pod, the metadata.namespace value is default.

Note: Use only lowercase letters and dashes. For more information, see the Kubernetes object naming documentation.

    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      ### Create a name for your PVC ###
      name: <create-a-pvc-name-here>
      ### Use a namespace that matches your intended consuming pod, or "default" ###
      namespace: <intended-consuming-pod-or-default-here>
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 2Gi
      storageClassName: cloud-backed-sc

To apply cloudIngestPVC.yaml, run:

    kubectl apply -f "cloudIngestPVC.yaml"

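The naming note above can be checked before a name is written into cloudIngestPVC.yaml. The following is a minimal sketch, assuming bash is available; is_valid_name is a hypothetical helper, and the requirement that a name start and end with a letter is an assumption borrowed from Kubernetes DNS-style naming rather than something this article states. Kubernetes itself also permits digits, so this check is deliberately stricter than the platform.

```shell
#!/usr/bin/env bash
# Check a candidate object name against the rule in this article:
# lowercase letters and dashes only.
# Assumption: the name must also start and end with a letter.
is_valid_name() {
  local LC_ALL=C                       # keep [a-z] limited to ASCII lowercase
  local name="$1"
  local re='^[a-z]([a-z-]*[a-z])?$'
  [[ "$name" =~ $re ]]
}

# Example with a hypothetical PVC name.
if is_valid_name "my-ingest-pvc"; then
  echo "name ok"
else
  echo "name rejected"
fi
```

The same check applies to the subvolume and ingest policy names created later in this article.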
You can use the following process to create a subvolume using Extension Identity to connect to your OneLake Lakehouse.
Get the name of your Edge Volume using the following command:
kubectl get edgevolumes
Create a file named edgeSubvolume.yaml and copy and paste the following contents. The following variables must be updated with your information:

- metadata.name: Create a name for your subvolume.
- spec.edgevolume: This name was retrieved from the previous step using kubectl get edgevolumes.
- spec.path: Create your own subdirectory name under the mount path. The following example already contains an example name (exampleSubDir). If you change this path name, line 33 in deploymentExample.yaml must be updated with the new path name. If you choose to rename the path, don't use a leading slash.
- spec.container: Details of your OneLake Data Lake Lakehouse (for example, <WORKSPACE>/<DATA_LAKE>.Datalake/Files).
- spec.storageaccountendpoint: Your storage account endpoint is the prefix of your Power BI web link. For example, if your OneLake page is https://contoso-motors.powerbi.com/, then your endpoint is https://contoso-motors.dfs.fabric.microsoft.com.

Note: Use only lowercase letters and dashes. For more information, see the Kubernetes object naming documentation.

    apiVersion: "arccontainerstorage.azure.net/v1"
    kind: EdgeSubvolume
    metadata:
      name: <create-a-subvolume-name-here>
    spec:
      edgevolume: <your-edge-volume-name-here>
      path: exampleSubDir # If you change this path, line 33 in deploymentExample.yaml must be updated. Don't use a leading slash.
      auth:
        authType: MANAGED_IDENTITY
      storageaccountendpoint: "https://<Your AZ Site>.dfs.fabric.microsoft.com/" # Your AZ site is the root of your Power BI OneLake interface URI, such as https://contoso-motors.powerbi.com
      container: "<WORKSPACE>/<DATA_LAKE>.Datalake/Files" # Details of your OneLake Data Lake Lakehouse
      ingestPolicy: edgeingestpolicy-default # Optional: See the following instructions if you want to update the ingestPolicy with your own configuration

To apply edgeSubvolume.yaml, run:

    kubectl apply -f "edgeSubvolume.yaml"

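The rule for spec.storageaccountendpoint (keep the prefix of the Power BI link and swap the domain) can be sketched as a small bash function. This is an illustration only: onelake_endpoint is a hypothetical helper name, not part of kubectl or any Azure tool, and it assumes the OneLake page URL uses the https scheme and ends in powerbi.com, as in the article's example.

```shell
#!/usr/bin/env bash
# Derive the spec.storageaccountendpoint value from a OneLake (Power BI) URL.
# Assumption: the URL has the form https://<prefix>.powerbi.com[/...].
onelake_endpoint() {
  local url="$1"
  local host="${url#https://}"          # drop the scheme
  host="${host%%/*}"                    # drop any path or trailing slash
  local prefix="${host%.powerbi.com}"   # keep only the account prefix
  if [[ "$prefix" == "$host" ]]; then
    echo "not a powerbi.com URL: $url" >&2
    return 1
  fi
  echo "https://${prefix}.dfs.fabric.microsoft.com"
}

onelake_endpoint "https://contoso-motors.powerbi.com/"
# prints https://contoso-motors.dfs.fabric.microsoft.com
```

The function returns the endpoint without a trailing slash, matching the prose example; the sample edgeSubvolume.yaml shows the same endpoint with a trailing slash.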
If you want to change the ingestPolicy from the default edgeingestpolicy-default, create a file named myedgeingest-policy.yaml with the following contents. The following variables must be updated with your preferences:

- metadata.name: Create a name for your ingestPolicy. This name must be updated and referenced in the spec.ingestPolicy section of your edgeSubvolume.yaml.
- spec.ingest.order: The order in which dirty files are uploaded. This is best effort, not a guarantee (defaults to oldest-first). Options for order are: oldest-first or newest-first.
- spec.ingest.minDelaySec: The minimum number of seconds before a dirty file is eligible for ingest (defaults to 60). This number can range between 0 and 31536000.
- spec.eviction.order: How files are evicted (defaults to unordered). Options for eviction order are: unordered or never.
- spec.eviction.minDelaySec: The number of seconds before a clean file is eligible for eviction (defaults to 300). This number can range between 0 and 31536000.

Note: Use only lowercase letters and dashes. For more information, see the Kubernetes object naming documentation.

    apiVersion: arccontainerstorage.azure.net/v1
    kind: EdgeIngestPolicy
    metadata:
      name: <create-a-policy-name-here> # This will need to be updated and referenced in the spec.ingestPolicy section of the edgeSubvolume.yaml
    spec:
      ingest:
        order: <your-ingest-order>
        minDelaySec: <your-min-delay-sec>
      eviction:
        order: <your-eviction-order>
        minDelaySec: <your-min-delay-sec>

To apply myedgeingest-policy.yaml, run:

    kubectl apply -f "myedgeingest-policy.yaml"

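The allowed options and ranges listed above can be checked locally before the policy file is applied. The following bash sketch encodes only the values stated in this article; validate_policy and valid_delay are hypothetical helper names, and the cluster may enforce additional rules that this check does not know about.

```shell
#!/usr/bin/env bash
# Validate EdgeIngestPolicy settings against the options and ranges
# documented in this article.
MAX_DELAY_SEC=31536000

valid_delay() {
  local value="$1"
  local re='^[0-9]{1,8}$'
  # Digits only, then a range check. 10# forces base 10 for leading zeros.
  [[ "$value" =~ $re ]] && (( 10#$value <= MAX_DELAY_SEC ))
}

# Arguments: ingest order, ingest minDelaySec, eviction order, eviction minDelaySec
validate_policy() {
  local ingest_order="$1" ingest_delay="$2" evict_order="$3" evict_delay="$4"
  case "$ingest_order" in
    oldest-first|newest-first) ;;
    *) echo "invalid spec.ingest.order: $ingest_order" >&2; return 1 ;;
  esac
  case "$evict_order" in
    unordered|never) ;;
    *) echo "invalid spec.eviction.order: $evict_order" >&2; return 1 ;;
  esac
  valid_delay "$ingest_delay" || { echo "invalid spec.ingest.minDelaySec: $ingest_delay" >&2; return 1; }
  valid_delay "$evict_delay" || { echo "invalid spec.eviction.minDelaySec: $evict_delay" >&2; return 1; }
}

# The defaults named in this article pass the check.
validate_policy oldest-first 60 unordered 300 && echo "policy values ok"
```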
To configure a generic single pod (Kubernetes native application) against the Persistent Volume Claim (PVC), create a file named deploymentExample.yaml with the following contents. Replace the values for containers.name and volumes.persistentVolumeClaim.claimName with your own. If you updated the path name from edgeSubvolume.yaml, exampleSubDir on line 33 must be updated with your new path name.

Note: Use only lowercase letters and dashes. For more information, see the Kubernetes object naming documentation.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cloudingestedgevol-deployment ### This must be unique for each deployment you choose to create.
    spec:
      replicas: 2
      selector:
        matchLabels:
          name: wyvern-testclientdeployment
      template:
        metadata:
          name: wyvern-testclientdeployment
          labels:
            name: wyvern-testclientdeployment
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - wyvern-testclientdeployment
                topologyKey: kubernetes.io/hostname
          containers:
            ### Specify the container in which to launch the busy box. ###
            - name: <create-a-container-name-here>
              image: mcr.microsoft.com/azure-cli:2.57.0@sha256:c7c8a97f2dec87539983f9ded34cd40397986dcbed23ddbb5964a18edae9cd09
              command:
                - "/bin/sh"
                - "-c"
                - "dd if=/dev/urandom of=/data/exampleSubDir/esaingesttestfile count=16 bs=1M && while true; do ls /data &>/dev/null || break; sleep 1; done"
              volumeMounts:
                ### This name must match the following volumes.name attribute ###
                - name: wyvern-volume
                  ### This mountPath is where the PVC is attached to the pod's filesystem ###
                  mountPath: "/data"
          volumes:
            ### User-defined name that's used to link the volumeMounts. This name must match volumeMounts.name as previously specified. ###
            - name: wyvern-volume
              persistentVolumeClaim:
                ### This claimName must refer to your PVC metadata.name
                claimName: <your-pvc-metadata-name-from-line-5-of-pvc-yaml>

To apply deploymentExample.yaml, run:

    kubectl apply -f "deploymentExample.yaml"

Use kubectl get pods to find the name of your pod. Copy this name, as you need it in the next step.

Note: Because spec.replicas from deploymentExample.yaml was specified as 2, two pods appear using kubectl get pods. You can choose either pod name to use for the next step.

Run the following command and replace POD_NAME_HERE with your copied value from the previous step:

    kubectl exec -it POD_NAME_HERE -- sh

Change directories into the /data mount path as specified in deploymentExample.yaml.

You should see a directory with the name you specified as your path in Step 2 of the Attach subvolume to Edge Volume section. Now, cd into /YOUR_PATH_NAME_HERE, replacing YOUR_PATH_NAME_HERE with your details.

As an example, create a file named file1.txt and write to it using echo "Hello World" > file1.txt.

In the Azure portal, navigate to your storage account and find the container specified from step 2 of Attach subvolume to Edge Volume. When you select your container, you should find file1.txt populated within the container. If the file hasn't appeared yet, wait approximately 1 minute; Edge Volumes waits a minute before uploading.
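
The manual check above can also be run without an interactive shell. The sketch below wraps standard kubectl subcommands (get with a label selector and JSONPath output, and exec) in two bash functions; the function names first_pod_name and write_test_file are hypothetical, and the label selector assumes the name: wyvern-testclientdeployment label from deploymentExample.yaml is unchanged.

```shell
#!/usr/bin/env bash
# Non-interactive variant of the manual verification steps.

# Print the name of the first pod carrying the label set in deploymentExample.yaml.
first_pod_name() {
  kubectl get pods -l name=wyvern-testclientdeployment \
    -o jsonpath='{.items[0].metadata.name}'
}

# Write a small test file into the subvolume directory of the given pod.
# The second argument is the spec.path value from edgeSubvolume.yaml
# (defaults to the article's exampleSubDir).
write_test_file() {
  local pod="$1" subdir="${2:-exampleSubDir}"
  kubectl exec "$pod" -- sh -c "echo 'Hello World' > /data/${subdir}/file1.txt"
}

# Usage (requires a cluster with the deployment applied):
#   pod="$(first_pod_name)"
#   write_test_file "$pod" exampleSubDir
```

Because either replica is acceptable for this check, taking the first pod in the list is sufficient.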
After you complete these steps, begin monitoring your deployment using Azure Monitor and Kubernetes Monitoring, or third-party monitoring with Prometheus and Grafana.