max_active_runs in Airflow

In Apache Airflow, if you want to ensure that a new run of a DAG doesn’t start before the previous one has completed, you can use the max_active_runs parameter in the DAG definition. Setting this parameter to 1 ensures that only one instance of the DAG is running at any given time.

Setting max_active_runs for a DAG

Here’s an example of how to set up a DAG with max_active_runs=1:

from airflow import DAG
# Airflow 2.x import path; airflow.operators.dummy_operator is the deprecated pre-2.0 location
from airflow.operators.dummy import DummyOperator
from datetime import datetime

# Define the DAG
with DAG(
    'my_dag',
    description='A sample DAG',
    schedule_interval='@daily',      # Set your schedule
    start_date=datetime(2023, 1, 1),
    catchup=False,
    max_active_runs=1                # This prevents new runs before the previous run completes
) as dag:

    # Define tasks
    start = DummyOperator(task_id='start')
    end = DummyOperator(task_id='end')

    # Define task dependencies
    start >> end

Explanation of max_active_runs

  • max_active_runs=1: Ensures that only one DAG run is active at any given time. If there is already a run in progress, any new scheduled or triggered runs will be queued until the current one finishes.
  • Where to Use: Pass it as an argument to DAG(...). When it is omitted, the DAG falls back to the max_active_runs_per_dag value from the Airflow configuration.
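To make the queueing behaviour concrete, here is a toy model in plain Python. It is only a sketch of the semantics described above, not Airflow’s scheduler code: the ToyDagRunQueue class and its trigger and finish methods are hypothetical names, and promoting the oldest queued run first is an assumption of the model.

```python
from collections import deque


class ToyDagRunQueue:
    """Toy model of max_active_runs gating (hypothetical, not Airflow code)."""

    def __init__(self, max_active_runs=1):
        self.max_active_runs = max_active_runs
        self.queued = deque()  # runs waiting for a free slot, oldest first
        self.running = []      # runs currently counted as active

    def _promote(self):
        # Move queued runs to running while a slot is free.
        while self.queued and len(self.running) < self.max_active_runs:
            self.running.append(self.queued.popleft())

    def trigger(self, run_id):
        # Every new run enters the queue first, then is promoted if possible.
        self.queued.append(run_id)
        self._promote()

    def finish(self, run_id):
        # Finishing a run frees its slot for the oldest queued run.
        self.running.remove(run_id)
        self._promote()


toy = ToyDagRunQueue(max_active_runs=1)
toy.trigger("2023-01-01")
toy.trigger("2023-01-02")
toy.trigger("2023-01-03")
print(toy.running, list(toy.queued))

toy.finish("2023-01-01")
print(toy.running, list(toy.queued))
```

With max_active_runs=1, the second and third runs stay queued until the first one finishes, and then only one of them is promoted.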
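Two related settings are easy to confuse with max_active_runs, but neither one prevents overlapping DAG runs:

  1. concurrency (DAG-Level Task Limit): Set concurrency on the DAG to cap how many task instances can run at the same time, counted across all active runs of that DAG. This limits task parallelism, but it doesn’t prevent overlapping DAG runs. In Airflow 2.2 and later this parameter is named max_active_tasks, and concurrency is deprecated.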

    DAG(
        'my_dag',
        concurrency=1,  # Limits task concurrency
        ...
    )
    
  2. depends_on_past=True (Task-Level Control): Setting depends_on_past=True on a task means the task runs only if its own instance in the previous DAG run succeeded or was skipped. This doesn’t limit the number of active DAG runs, but it ties each task to its own history, which adds an extra layer of control.

    DummyOperator(
        task_id='my_task',
        depends_on_past=True,
        dag=dag
    )
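The depends_on_past rule can be sketched in plain Python as well. This is a simplified illustration rather than Airflow’s implementation: simulate_task_history is a hypothetical helper, and "blocked" is a label used only in this sketch for a task instance that Airflow would leave waiting, not a real Airflow state.

```python
# States of the previous task instance that let the next one run.
ALLOWED_PREVIOUS_STATES = {"success", "skipped"}


def simulate_task_history(outcomes):
    """Apply a depends_on_past-style rule to one task across consecutive runs.

    outcomes: the state each run would end in if it were allowed to run.
    Returns what is recorded per run; "blocked" marks a run that could not
    start because the previous instance neither succeeded nor was skipped.
    """
    recorded = []
    for index, outcome in enumerate(outcomes):
        previous = recorded[-1] if recorded else None
        # The first run has no predecessor, so it is always allowed.
        if index == 0 or previous in ALLOWED_PREVIOUS_STATES:
            recorded.append(outcome)
        else:
            recorded.append("blocked")
    return recorded


print(simulate_task_history(["success", "failed", "success"]))
```

Once one instance fails, every later instance of that task stays blocked until the failure is resolved. That makes this a per-task guard, not a limit on how many DAG runs are active.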
    

Summary

For preventing concurrent DAG runs, max_active_runs=1 is the most effective option. This will ensure that Airflow queues any new DAG runs until the current one completes, helping you maintain sequential DAG executions without overlap.
