max_active_runs in Airflow
In Apache Airflow, if you want to ensure that a new run of a DAG doesn't start before the previous one has completed, you can use the max_active_runs parameter in the DAG definition. Setting this parameter to 1 ensures that only one run of the DAG is active at any given time.
Setting max_active_runs for a DAG
Here's an example of how to set up a DAG with max_active_runs=1:
    from datetime import datetime

    from airflow import DAG
    # Newer Airflow releases replace DummyOperator with EmptyOperator
    # (airflow.operators.empty); adjust the import to match your version.
    from airflow.operators.dummy_operator import DummyOperator

    # Define the DAG
    with DAG(
        'my_dag',
        description='A sample DAG',
        schedule_interval='@daily',  # Set your schedule (newer releases use `schedule`)
        start_date=datetime(2023, 1, 1),
        catchup=False,
        max_active_runs=1,  # This prevents new runs before the previous run completes
    ) as dag:
        # Define tasks
        start = DummyOperator(task_id='start')
        end = DummyOperator(task_id='end')

        # Define task dependencies
        start >> end
Explanation of max_active_runs
- max_active_runs=1: Ensures that only one DAG run is active at any given time. If there is already a run in progress, any new scheduled or triggered runs will be queued until the current one finishes.
- Where to Use: This option is set directly in the DAG definition.
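The queueing behavior described above can be illustrated with a toy model in plain Python. This is only a sketch, not Airflow's scheduler code: the function name simulate_start_times and its arguments are hypothetical, and it assumes runs are created in time order and started in that same order.

```python
import heapq


def simulate_start_times(created_at, durations, max_active_runs):
    """Toy model (not Airflow code): return (start, end) for each run
    when at most `max_active_runs` runs may be active at once."""
    active_ends = []  # min-heap holding the end times of active runs
    schedule = []
    for created, duration in zip(created_at, durations):
        start = created
        # Forget runs that finished before this run was created.
        while active_ends and active_ends[0] <= start:
            heapq.heappop(active_ends)
        # Cap reached: the run stays queued until the earliest active run ends.
        if len(active_ends) >= max_active_runs:
            start = heapq.heappop(active_ends)
        end = start + duration
        heapq.heappush(active_ends, end)
        schedule.append((start, end))
    return schedule


created = [0, 10, 20]    # runs are created every 10 time units
durations = [25, 5, 5]   # the first run is slow

print(simulate_start_times(created, durations, max_active_runs=1))
# [(0, 25), (25, 30), (30, 35)]
print(simulate_start_times(created, durations, max_active_runs=2))
# [(0, 25), (10, 15), (20, 25)]
```

With a cap of 1, the second and third runs wait for the slow first run; with a cap of 2, they start as soon as they are created and overlap with it.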
Other Related Options
- concurrency (DAG-Level Control): You can set concurrency on the DAG if you want to control the maximum number of tasks that can run simultaneously within the DAG. This is helpful if you want to limit parallelism across tasks, but it doesn't specifically prevent overlapping DAG runs. Newer Airflow releases rename this parameter to max_active_tasks.

      DAG(
          'my_dag',
          concurrency=1,  # Limits task concurrency
          ...
      )
- depends_on_past=True (Task-Level Control): At the task level, you can set depends_on_past=True to ensure that a task only runs if the previous run of that task succeeded (or was skipped). While this doesn't directly limit the DAG runs, it creates dependencies on past runs of individual tasks, which adds an extra layer of control.

      DummyOperator(
          task_id='my_task',
          depends_on_past=True,
          dag=dag
      )
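The depends_on_past rule can be summarized as a small decision function. This is a simplified model rather than Airflow's implementation: can_run is a hypothetical helper, states are plain strings, and None stands for the first run of a task, which has no previous instance to check.

```python
def can_run(previous_state, depends_on_past):
    """Simplified model (not Airflow code) of the depends_on_past check."""
    if not depends_on_past:
        # Without depends_on_past, earlier runs of the task are ignored.
        return True
    # The first run has no previous instance; otherwise the previous
    # instance of the same task must have succeeded or been skipped.
    return previous_state in (None, "success", "skipped")


print(can_run(None, depends_on_past=True))       # True
print(can_run("success", depends_on_past=True))  # True
print(can_run("failed", depends_on_past=True))   # False
print(can_run("failed", depends_on_past=False))  # True
```

Note that a failed task blocks the same task in later runs, but the later DAG runs themselves are still created, which is why this setting complements max_active_runs rather than replacing it.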
Summary
For preventing concurrent DAG runs, max_active_runs=1 is the most effective option. This will ensure that Airflow queues any new DAG runs until the current one completes, helping you maintain sequential DAG executions without overlap.