Using KubernetesPodOperator in ADF Managed Airflow

Igor Veselinovic 20 Reputation points
2024-01-15T20:59:58.78+00:00

I've been experimenting with the Managed Airflow feature in Azure Data Factory (ADF), and I've noticed that the user interface for creating/editing an Airflow integration runtime mentions Kubernetes secrets and Azure Kubernetes Service (AKS), as seen in the screenshot below: [screenshot of the Airflow integration runtime setup page]

So it seems like the Managed Airflow service is powered by AKS under the hood. Is there a way for us to access this AKS cluster using an Airflow KubernetesPodOperator (is the kube config file located somewhere in the file system that we can access?), or is this not supported? Would we have to set up a separate AKS cluster ourselves in order to use the KubernetesPodOperator?

Tags: Azure Data Factory, Azure Kubernetes Service

Accepted answer
  1. Konstantinos Passadis 19,586 Reputation points MVP
    2024-01-16T21:27:59.0133333+00:00

    Hello @Igor Veselinovic!

    Thank you for your message!

    Secrets in Kubernetes are namespaced, meaning each secret belongs to a specific namespace; to access a secret from a Deployment, you need to deploy into that same namespace.

    There are ways to sync or replicate them across namespaces, for example:

    https://cert-manager.io/docs/devops-tips/syncing-secrets-across-namespaces/
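    In a cluster you manage yourself, the same replication can also be scripted with the Kubernetes Python client. This is a minimal sketch, assuming illustrative names ("my-secret", the "default" and "adf" namespaces) and a kubeconfig with access to the cluster:

        from kubernetes import client, config

        # Load credentials from the local kubeconfig (use
        # config.load_incluster_config() when running inside the cluster).
        config.load_kube_config()
        v1 = client.CoreV1Api()

        # Read the secret from its source namespace...
        src = v1.read_namespaced_secret(name="my-secret", namespace="default")

        # ...and recreate it in the target namespace, keeping only the
        # portable fields (server-assigned metadata is dropped).
        copy = client.V1Secret(
            metadata=client.V1ObjectMeta(name=src.metadata.name, namespace="adf"),
            data=src.data,
            type=src.type,
        )
        v1.create_namespaced_secret(namespace="adf", body=copy)

    Note that Managed Airflow doesn't give you direct access to the underlying cluster, so this applies to clusters you operate yourself.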

    A managed service is supposed to be simple and straightforward!

    For Airflow running in a Kubernetes environment, especially when managed by a service like Azure Data Factory, it's best to follow the service's guidelines for secret management. If Airflow is managing the deployment of the KubernetesPodOperator, it will typically expect the secrets to be in the same namespace as where the operator is running. If you need to use secrets in Airflow tasks across different namespaces, you would typically replicate those secrets to the namespaces where the tasks will run.
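    As a sketch of what that looks like in a DAG, the cncf.kubernetes provider lets you mount a namespaced secret into the pod that KubernetesPodOperator launches. The secret and task names below are illustrative, and on older provider versions the operator import is airflow.providers.cncf.kubernetes.operators.kubernetes_pod instead:

        from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
        from airflow.providers.cncf.kubernetes.secret import Secret

        # Expose key "password" of the Kubernetes secret "my-secret" (which must
        # live in the namespace the pod runs in) as env var DB_PASSWORD in the pod.
        db_password = Secret(
            deploy_type="env",
            deploy_target="DB_PASSWORD",
            secret="my-secret",
            key="password",
        )

        use_secret = KubernetesPodOperator(
            task_id="use_secret",
            name="use-secret",
            namespace="adf",  # same namespace as the secret
            image="hello-world",
            secrets=[db_password],
        )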

    The dropdown you're seeing for the secret type (Private registry auth, Basic auth, Generic, etc.) specifies the format or usage of the secret.

    For instance, "Private registry auth" could be used to store credentials for accessing private Docker registries.
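    A secret of that type can then be referenced when pulling a private image. This is a minimal sketch; the registry URL and secret name are illustrative:

        from kubernetes.client import models as k8s
        from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

        pull_private = KubernetesPodOperator(
            task_id="pull_private_image",
            name="pull-private-image",
            namespace="adf",
            # Hypothetical image in a private registry:
            image="myregistry.azurecr.io/my-app:latest",
            # Name of the "Private registry auth" secret created for the environment:
            image_pull_secrets=[k8s.V1LocalObjectReference(name="my-registry-secret")],
        )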

    You are grasping the concept correctly. Be aware that this is a Preview service, meaning it isn't intended for production use, and you may face issues! Take into account the limitations:

    Limitations

    Managed Airflow will be available in other regions by GA.
    
    Data sources connecting through Airflow should be accessible through a public endpoint (network).
    
    DAGs stored in Blob Storage inside a VNet or behind a firewall are currently not supported. Instead, we recommend using the Git sync feature of Managed Airflow. See Sync a GitHub repository in Managed Airflow.
    
    Importing DAGs from Azure Key Vault isn't supported in Linked Services.
    

    Finally, follow the documentation for updates!


    I hope this helps!

    Kindly mark the answer as Accepted and Upvote in case it helped! Regards


2 additional answers

  1. Konstantinos Passadis 19,586 Reputation points MVP
    2024-01-15T21:17:17.1333333+00:00

    Hello @Igor Veselinovic!

    Welcome to Microsoft QnA!

    You are referring to this:

    https://learn.microsoft.com/en-us/azure/data-factory/kubernetes-secret-pull-image-from-private-container-registry

    Airflow here is a managed environment, meaning all of the infrastructure is managed for you.

    Managed Airflow for Azure Data Factory relies on the open-source Apache Airflow application. You can find documentation and more tutorials for Airflow on the Apache Airflow Documentation or Community webpages.

    More options on the Airflow environment setup page:

    • Enable git sync: You can allow your Airflow environment to automatically sync with a Git repository instead of manually importing DAGs. For more information, see Sync a GitHub repository in Managed Airflow.
    • Airflow configuration overrides: You can override any Airflow configurations that you set in airflow.cfg. Examples are name: AIRFLOW__VAR__FOO and value: BAR. For more information, see Airflow configurations.
    • Environment variables: You can use this key value store within Airflow to store and retrieve arbitrary content or settings (see the sketch after this list).
    • Requirements: You can use this option to preinstall Python libraries. You can update these requirements later.
    • Kubernetes secrets: You can create a custom Kubernetes secret for your Airflow environment. An example is private registry credentials to pull images for KubernetesPodOperator.
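    As an illustration of the environment-variables option, a value added there should be readable from a task as an ordinary process environment variable; "MY_SETTING" below is a hypothetical key:

        import os

        from airflow.decorators import task

        @task
        def read_setting() -> str:
            # "MY_SETTING" is a hypothetical key added under "Environment
            # variables" on the Airflow environment setup page.
            return os.environ.get("MY_SETTING", "not set")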

    This is all the extensibility we have so far for this feature!


    I hope this helps!

    Kindly mark the answer as Accepted and Upvote in case it helped! Regards


  2. Igor Veselinovic 20 Reputation points
    2024-01-16T19:35:22.3633333+00:00

    Hi Konstantinos, Thanks for getting back to me so quickly. Since asking this question, I was able to successfully use the KubernetesPodOperator, using a basic DAG like this:

    from datetime import datetime
    
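    A minimal version of that DAG (the dag_id, schedule, and start date shown here are illustrative) looks something like this:

        from datetime import datetime

        from airflow import DAG
        from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

        with DAG(
            dag_id="kpo_hello_world",  # illustrative dag_id
            start_date=datetime(2024, 1, 1),
            schedule=None,  # run on demand
            catchup=False,
        ) as dag:
            KubernetesPodOperator(
                task_id="hello_world",
                name="hello-world",
                namespace="adf",  # namespace shown for Kubernetes secrets in the IR UI
                image="hello-world",
                get_logs=True,
            )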

    As you can see, I've specified the "adf" namespace and just used the simple "hello-world" image, but altogether it was very straightforward, with minimal configuration. Is this the intended way to use the KubernetesPodOperator in ADF Managed Airflow? I also noticed when adding Kubernetes secrets to my Airflow integration runtime that the "Secret namespace" is "adf", which is why I'm using that namespace, although "default" also works. But if I want access to the secrets I specify in my Airflow tasks, then obviously I need to use the "adf" namespace. [screenshot of the Kubernetes secret settings]

    Please let me know if this lines up with your understanding of using the KubernetesPodOperator in ADF Managed Airflow. Thanks.

