max_active_runs in Airflow

In Apache Airflow, if you want to ensure that a new run of a DAG doesn’t start before the previous one has completed, you can use the max_active_runs parameter in the DAG definition. Setting this parameter to 1 ensures that only one instance of the DAG is running at any given time.

Setting max_active_runs for a DAG

Here’s an example of how to set up a DAG with max_active_runs=1:

from datetime import datetime

from airflow import DAG
# DummyOperator is deprecated in newer releases; Airflow 2.3+ provides
# EmptyOperator in airflow.operators.empty as its replacement.
from airflow.operators.dummy_operator import DummyOperator

# Define the DAG
with DAG(
    'my_dag',
    description='A sample DAG',
    schedule_interval='@daily',      # Set your schedule (Airflow 2.4+ prefers the `schedule` argument)
    start_date=datetime(2023, 1, 1),
    catchup=False,                   # Don't backfill runs between start_date and now
    max_active_runs=1,               # Allow at most one active run of this DAG at a time
) as dag:

    # Define placeholder tasks
    start = DummyOperator(task_id='start')
    end = DummyOperator(task_id='end')

    # Run `start` before `end`
    start >> end

Explanation of max_active_runs

  • max_active_runs=1: Ensures that at most one run of this DAG is active at any given time. If a run is already in progress, new scheduled or manually triggered runs wait in the queue until the current one finishes.
  • Where to use: Set it directly in the DAG definition. DAGs that don't set it fall back to the max_active_runs_per_dag option in the [core] section of the Airflow configuration.

Two related settings are sometimes confused with max_active_runs:

  1. concurrency (DAG-Level Task Limit): Set concurrency on the DAG to cap the number of task instances that can run at the same time within the DAG. This limits parallelism across tasks, but it doesn't prevent overlapping DAG runs. In Airflow 2.2 and later, this parameter is named max_active_tasks, and concurrency is deprecated.

    DAG(
        'my_dag',
        concurrency=1,  # Limits task concurrency
        ...
    )
    
  2. depends_on_past=True (Task-Level Control): Setting depends_on_past=True on a task makes each instance of that task wait until the same task in the previous DAG run has succeeded. This doesn't limit how many DAG runs are active, but it ties each task to its own history, which adds an extra layer of ordering between runs.

    DummyOperator(
        task_id='my_task',
        depends_on_past=True,
        dag=dag
    )
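
The queuing behavior described for max_active_runs can be made concrete with a small toy model. This is only a sketch in plain Python, not Airflow's scheduler: the function name simulate_runs, the hour-based numbers, and the assumption that waiting runs start in scheduled order are all hypothetical choices made for illustration.

```python
import heapq


def simulate_runs(durations, interval, max_active_runs):
    """Toy model of a run limit (NOT Airflow's real scheduler).

    One run is scheduled every `interval` hours; run i takes durations[i] hours.
    Returns a list of (scheduled, started, finished) tuples, one per run.
    Assumption: waiting runs start in the order they were scheduled.
    """
    finish_times = []  # min-heap of finish times for runs holding a slot
    timeline = []
    for i, duration in enumerate(durations):
        scheduled = i * interval
        if len(finish_times) < max_active_runs:
            # A slot is free, so the run starts on schedule.
            started = scheduled
        else:
            # All slots are taken: wait for the earliest run to finish,
            # or start on schedule if that run is already done.
            earliest_free = heapq.heappop(finish_times)
            started = max(scheduled, earliest_free)
        finished = started + duration
        heapq.heappush(finish_times, finished)
        timeline.append((scheduled, started, finished))
    return timeline


# A daily schedule (24 h) where the first run is slow and takes 30 h.
print(simulate_runs([30, 5, 5], interval=24, max_active_runs=1))
# [(0, 0, 30), (24, 30, 35), (48, 48, 53)]  -> the second run waits until hour 30

print(simulate_runs([30, 5, 5], interval=24, max_active_runs=2))
# [(0, 0, 30), (24, 24, 29), (48, 48, 53)]  -> the first two runs overlap
```

With a limit of 1, the second run is held back until the slow first run finishes. With a limit of 2, it starts on schedule and overlaps the first run, which is exactly the situation max_active_runs=1 is meant to avoid.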
    

Summary

To prevent concurrent DAG runs, max_active_runs=1 is the most direct option. Airflow queues any new DAG runs until the current one completes, so executions stay sequential and never overlap.
