Heartbeat timeout docs #46257

Draft
wants to merge 10 commits into
base: main

karenbraganz
Contributor


In Airflow 2.10, task logs in the Airflow UI do not use the term "zombie task" to describe a local task job whose heartbeat times out. Instead, such events are recorded on the Event Log page as a "heartbeat timeout". To make troubleshooting these issues more intuitive for Airflow users, the terminology in the documentation should match the terminology in the logs.

I have given the heartbeat timeout terminology more emphasis in the documentation so that Airflow users can easily find this documentation when they see "heartbeat timeout" events and relate it to what they see in the logs.


Contributor

@RNHTTR RNHTTR left a comment

I added a few suggestions, but this is a great start!

docs/apache-airflow/core-concepts/tasks.rst (outdated)
@@ -216,7 +216,7 @@ If you'd like to reproduce zombie tasks for development/testing processes, follo
sleep_dag()


-Run the above DAG and wait for a while. You should see the task instance becoming a zombie task and then being killed by the scheduler.
+Run the above DAG and wait for a while. You should see the task experiencing a heartbeat timeout and then being killed by the scheduler.
Contributor

Suggested change
-Run the above DAG and wait for a while. You should see the task experiencing a heartbeat timeout and then being killed by the scheduler.
+Run the above DAG and wait for a while. The ``TaskInstance`` will be marked failed after <scheduler_zombie_task_threshold> seconds.

This should link to this doc: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#scheduler-zombie-task-threshold
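
For readers following the thread, here is a rough sketch of the idea behind that threshold: a task is treated as timed out once its last heartbeat is older than the configured number of seconds. This is an illustration only, not Airflow's actual scheduler code. The function names `zombie_threshold_seconds` and `has_heartbeat_timed_out` are hypothetical, the fallback of 300 seconds is an assumption, and the strict "older than" comparison is a choice made for this sketch. The environment variable name follows Airflow's `AIRFLOW__{SECTION}__{KEY}` convention for overriding config options.

```python
import os
from datetime import datetime, timedelta, timezone

# Environment override for [scheduler] scheduler_zombie_task_threshold,
# following Airflow's AIRFLOW__{SECTION}__{KEY} naming convention.
ENV_VAR = "AIRFLOW__SCHEDULER__SCHEDULER_ZOMBIE_TASK_THRESHOLD"


def zombie_threshold_seconds(default: int = 300) -> int:
    """Hypothetical helper: read the threshold from the environment.

    The fallback value of 300 is an assumption made for this sketch.
    """
    return int(os.environ.get(ENV_VAR, default))


def has_heartbeat_timed_out(
    latest_heartbeat: datetime, now: datetime, threshold_seconds: int
) -> bool:
    """Hypothetical check: True when the last heartbeat is older than the threshold."""
    return now - latest_heartbeat > timedelta(seconds=threshold_seconds)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    last_seen = now - timedelta(seconds=600)  # pretend the task went silent 10 minutes ago
    print(has_heartbeat_timed_out(last_seen, now, zombie_threshold_seconds()))
```

Setting the environment variable to a small value before starting the scheduler is one way to shorten the wait when reproducing a heartbeat timeout locally.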

Contributor

Honestly, the name of this config should change too.
