Apache Airflow is an open-source tool, originally created at Airbnb and now an Apache Software Foundation project, for managing complex workflows as Directed Acyclic Graphs (DAGs). It can express complex, repetitive workflows in relatively little code, and it handles scheduling and task delegation.
Kubernetes is an open-source container orchestration platform, originally developed by Google, for managing production environments at very large scale.
Kubernetes is a widely used way to run containers in production, and Airflow manages complex workflows at scale. The two meet in Airflow's KubernetesPodOperator, which lets you create and run pods on a Kubernetes cluster.
KubernetesPodOperator is simple to use: you add it to a DAG, provide the container image name, and it runs. You can also add secrets, volumes, pod affinity, Docker run-time arguments, and so on. But what if you want to run a container whose image name is known only at run time?
At run time, Airflow calls the KubernetesPodOperator's execute method, which creates the pod with the image name specified when the DAG was defined.
So how do we supply the image name at run time?
Airflow has the concept of a DagRun configuration. With it, we can pass run-time configuration to a specific Airflow DAG run. That means we can override the execute method of KubernetesPodOperator so that it reads its settings from the DagRun configuration before creating the pod.
The result: the Docker image name, commands, arguments, and environment variables can all be passed via the REST Application Programming Interface (API) or the command line.
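The override described above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked repository: the helper name `apply_dag_run_conf`, the subclass name in the comment, and the set of overridable keys are assumptions made for this example. It also assumes that assigning the operator's attributes before calling the parent `execute` is enough, which may depend on the Airflow and provider version in use. Stand-in objects replace the real operator and DagRun so the snippet runs without Airflow installed.

```python
from types import SimpleNamespace

# Keys the DagRun configuration may override (names chosen for this sketch).
OVERRIDABLE_FIELDS = ("image", "cmds", "arguments", "env_vars")


def apply_dag_run_conf(operator, context):
    """Copy any overridable keys found in dag_run.conf onto the operator."""
    dag_run = context.get("dag_run")
    conf = getattr(dag_run, "conf", None) or {}
    for field in OVERRIDABLE_FIELDS:
        if field in conf:
            setattr(operator, field, conf[field])
    return operator


# In a real DAG the helper would be called from an overridden execute(), e.g.
# (hypothetical subclass name):
#
#   class RuntimeKubernetesPodOperator(KubernetesPodOperator):
#       def execute(self, context):
#           apply_dag_run_conf(self, context)
#           return super().execute(context)

# Stand-ins for the operator and the DagRun, so the sketch runs on its own.
operator = SimpleNamespace(
    image="busybox:latest", cmds=["echo"], arguments=["default"], env_vars={}
)
context = {
    "dag_run": SimpleNamespace(
        conf={"image": "python:3.11-slim", "arguments": ["hello"]}
    )
}

apply_dag_run_conf(operator, context)
print(operator.image, operator.cmds, operator.arguments)
```

Keys missing from the configuration leave the operator's defaults untouched, so a DAG triggered without any configuration still runs the image it was defined with.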
Instead of triggering the DAG with no configuration, we trigger it with a DagRun configuration that carries these values.
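As a rough illustration of what such a trigger could look like, the sketch below builds the command line and the REST request body in Python. The DAG id `runtime_pod_dag` and the configuration keys are hypothetical and must match whatever the overridden operator reads. The command uses the Airflow 2 form `airflow dags trigger --conf`; Airflow 1.10 used `airflow trigger_dag --conf` instead. The REST body assumes the Airflow 2 stable REST API, where a DAG run is created by a POST to `/api/v1/dags/{dag_id}/dagRuns` with the configuration under the `conf` key.

```python
import json
import shlex


def build_trigger_command(dag_id, conf):
    """Return the argv list for triggering a DAG run with a JSON configuration."""
    return ["airflow", "dags", "trigger", "--conf", json.dumps(conf), dag_id]


def build_rest_payload(conf):
    """Return the JSON body for a DAG-run creation request."""
    return json.dumps({"conf": conf})


# Hypothetical run-time settings for the pod.
run_conf = {
    "image": "python:3.11-slim",
    "cmds": ["python", "-c"],
    "arguments": ["print('hello')"],
    "env_vars": {"STAGE": "dev"},
}

command = build_trigger_command("runtime_pod_dag", run_conf)
print(shlex.join(command))           # shell-quoted command line
print(build_rest_payload(run_conf))  # request body for the REST API
```

Building the JSON with `json.dumps` and quoting it with `shlex.join` avoids hand-escaping nested quotes in the shell.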
Feel free to check the code at: https://github.com/saivarunr/KubernetesPodOperator
An earlier version of this blog was published on Medium by the author.