
Change weight priority type from Integer to Float #42410

Conversation

@molcay (Contributor) commented Sep 23, 2024

This PR changes the type of the priority_weight property of the TaskInstance. Different databases have different maximum values for integer columns, so large priority weights can overflow.

There have been several attempts to fix this problem:

This PR contains the changes from #38160 (created by Taragolis), on top of #38222, as mentioned here.
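For illustration only, here is a minimal sketch of the kind of change involved - not the actual diff in this PR - switching the priority_weight column from Integer to Float in an Alembic migration (table and column names follow the existing TaskInstance model; everything else is illustrative):

import sqlalchemy as sa
from alembic import op

def upgrade():
    # Widen priority_weight from a 32-bit integer to a float so that
    # accumulated weights cannot overflow the integer column.
    op.alter_column(
        "task_instance",
        "priority_weight",
        existing_type=sa.Integer(),
        type_=sa.Float(),
    )

def downgrade():
    # Revert to the original integer column.
    op.alter_column(
        "task_instance",
        "priority_weight",
        existing_type=sa.Float(),
        type_=sa.Integer(),
    )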


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg bot added labels area:API, area:db-migrations, area:UI, area:webserver, kind:documentation on Sep 23, 2024
@molcay force-pushed the fix/out-of-range-for-priority-weight branch from a82c385 to 7385aad on September 23, 2024 13:06
@jscheffl (Contributor)

I don't understand the reason for, or the need to, change the priority weight from int to float. To me this just adds extra complexity, and I see no benefit.

@molcay (Contributor, Author) commented Sep 27, 2024

Hi @jscheffl,
There was this issue/discussion: #22781, which is about an integer out-of-range error for the priority weight field. This error can happen on PostgreSQL. There were a lot of discussions around it in different PRs (which I mentioned in this PR's description). As far as I understand, the next planned move was to make this change after #38222, as mentioned in #38160 (comment).

@pierrejeambrun (Member) commented Sep 27, 2024

This seems a little odd to me. Just to be sure I understand what we are trying to solve: are we trying to address integer overflow for weights and operations on weights?

I don't see the use case for using such big numbers for priority weights.

@molcay (Contributor, Author) commented Oct 10, 2024

Hi @pierrejeambrun,

The primary target is to address integer overflow, because the integer range differs between database engines.
Since we don't give users any documented range, I think we just need to make sure it does not fail, or make this change to avoid failures. I read all the earlier discussions; this was one of the previous implementations, but it was closed and put on hold until after #38222.

@jscheffl (Contributor)

Do we know what the lowest supported (positive and negative) range is? Then we could build proper validation to prevent roll-over, and document it.

Making this a float will, I assume, create other problems... for example, if you have a very large number, adding one will still give the same number :-D
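To make the float-precision concern concrete, a tiny illustration (plain Python, assuming IEEE-754 doubles):

# Above 2**53, a double cannot represent every integer,
# so incrementing a weight by 1 can silently be a no-op.
big = float(2**53)
print(big + 1 == big)  # True: the +1 is rounded away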

@VladaZakharova (Contributor)

Hi @potiuk!
As I understand it, there was a discussion about how we want to implement this logic (the whole list is in the PR description). It seems we now have some new ideas about what we want here, so maybe we can agree with @jscheffl on the best way to implement it. WDYT?

@potiuk (Member) commented Oct 31, 2024

Yeah. Actually I like the idea of changing it to float - it has far fewer problems than int (rollover etc.), and the +1/-1 problem is virtually non-existent IMHO - because by the time you reach a value where +1/-1 makes no difference, the number is already so big that it no longer matters.

I'd be for rebasing and merging this change (@jscheffl ?)

@jscheffl (Contributor) commented Nov 1, 2024

I am not a big fan of floats - maybe because in my job history I always had long, long discussions about FMA and reproducibility of execution... but whatever.

I would not block this, but I still feel it is somewhat strange. Integers have a very wide range by default, and we are talking about potentially a couple of thousand tasks for which we want to set priorities. I don't see a real "functional need" to set priorities exponentially, or in any other manner where - with normal modelling - you would need such large ranges. Anyway, that would be hard to manage from a functional standpoint. I also don't see a realistic setup where the priority weight strategy would reach such boundaries just because of a very huge, long DAG.
As for the argument that at a certain level a big, big float plus 1 is the same big, big float and no harm is done... by the same token we could figure out the smallest INT bound supported by Postgres/MySQL/whichever DB and just cap the upper/lower boundaries to prevent a roll-over.

It would be more convincing if there were a real problem/use case for such big priorities, rather than one caused by DAG developers just adding t00000000 many zeroes to the priority.

@potiuk (Member) commented Nov 1, 2024

I don't see a real "functional need" to set priorities exponentially, or in any other manner where - with normal modelling - you would need such large ranges

Let's not forget that we have priority-weight accumulation: upstream, downstream and, eventually, custom: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/priority-weight.html#custom-weight-rule

I can quite easily imagine some big numbers when we have huge DAGs with 1000s of tasks, multiple upstream/downstream tasks (and layers); especially with custom weights I can easily imagine those numbers adding up (or maybe multiplying, in custom rules, if one wants more aggressive priority propagation).
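A hypothetical, simplified illustration of how weights can add up under the "downstream" rule (this is not Airflow's actual implementation): a task's effective priority is its own priority_weight plus the weights of all tasks downstream of it.

# Simplified model of the "downstream" weight rule (illustrative only).
def downstream_priority(own_weight: int, downstream_weights: list[int]) -> int:
    return own_weight + sum(downstream_weights)

# A deep DAG: 1,000 downstream tasks, each carrying priority_weight=10_000.
print(downstream_priority(10_000, [10_000] * 1_000))  # 10010000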

@jscheffl (Contributor) commented Nov 2, 2024

I don't see a real "functional need" to set priorities exponentially, or in any other manner where - with normal modelling - you would need such large ranges

Let's not forget that we have priority-weight accumulation: upstream, downstream and, eventually, custom: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/priority-weight.html#custom-weight-rule

I can quite easily imagine some big numbers when we have huge DAGs with 1000s of tasks, multiple upstream/downstream tasks (and layers); especially with custom weights I can easily imagine those numbers adding up (or maybe multiplying, in custom rules, if one wants more aggressive priority propagation).

I still don't see a real need for, or use of, such high numbers. Yes, we accumulate priority weights by summing. Assume we have a DAG with 1000 tasks chained (I hope nobody is modelling this; it would really run for a long time) and we use a priority of 10k (=10000). Then the accumulated priority is 10 million.

Looking at the INT type we use today, the supported databases have integer ranges of roughly ±2.1 billion: both PostgreSQL INTEGER and MySQL INT are signed 32-bit columns (-2,147,483,648 to 2,147,483,647).

This means I can still have 1000 tasks with a priority of 1 million each in my DAG, which is also something you can realistically model to fit into the INT range.

Instead of switching to float, I think we should rather cap the values and ensure they cannot roll over, and add documentation about the limits. The limits of Postgres and MySQL are the same and sound reasonable (otherwise there is also the option to switch to BIGINT, of course, if you want to support incredibly large numbers without floats).
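A minimal sketch of the capping idea (hypothetical; not the implementation that was eventually merged in #43611): clamp every computed priority weight into the signed 32-bit range that both Postgres and MySQL accept.

# Hypothetical capping helper - illustrative only.
DB_SAFE_MIN = -(2**31)      # -2,147,483,648
DB_SAFE_MAX = 2**31 - 1     #  2,147,483,647

def clamp_priority_weight(weight: int) -> int:
    # Clamp an accumulated weight into the range every supported DB accepts.
    return max(DB_SAFE_MIN, min(DB_SAFE_MAX, weight))

# Two chained tasks each requesting the maximum weight no longer roll over:
print(clamp_priority_weight((2**31 - 1) + (2**31 - 1)))  # 2147483647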

@pierrejeambrun (Member) commented Nov 4, 2024

I am a little undecided on this one. On the one hand, if there is a use case, then why not; on the other hand, I tend to agree with Jens. I am having a hard time imagining a legitimate scenario for such big priority weights where downscaling to more reasonable values is not possible.

@potiuk (Member) commented Nov 4, 2024

@molcay -> is there a case you could give as an example where this would be needed?

@molcay (Contributor, Author) commented Nov 7, 2024

Hi @potiuk, I am sorry, but I don't have a specific case that can be used as an example. I only know that #22781 was the start. Maybe @kosteev has an idea?

@kosteev (Contributor) commented Nov 7, 2024

I am not pretending this is a very practical case, but in Cloud Composer we have seen this (and not only once):

Imagine that you give a customer an example DAG that looks like this:

...

dag = DAG(
    'dag_id',
    default_args=default_args,
    schedule_interval='*/10 * * * *',
)

# task with the highest possible priority
# (2**31 - 1 == 2,147,483,647, the maximum of a signed 32-bit integer column)
t1 = BashOperator(
    task_id='task_id',
    bash_command='echo test',
    dag=dag,
    priority_weight=2**31 - 1)

The customer modifies the DAG and adds an extra task t2 by copying t1 and setting a dependency between them:

...

dag = DAG(
    'dag_id',
    default_args=default_args,
    schedule_interval='*/10 * * * *',
)

# task with highest priority
t1 = BashOperator(
    task_id='task_id',
    bash_command='echo test',
    dag=dag,
    priority_weight=2**31 - 1)

t2 = BashOperator(
    task_id='task_id2',
    bash_command='echo test2',
    dag=dag,
    priority_weight=2**31 - 1)

t1 >> t2

This DAG will then cause an issue and break the scheduler (because the accumulated priority_weight will overflow).

Btw, I found an example DAG like this on Stack Overflow: https://stackoverflow.com/questions/66098050/airflow-dag-not-triggered-at-schedule-time.

I am not saying this is at all common, but it is very unexpected for a user to have the scheduler break after a slight modification of the DAG like this.

@pierrejeambrun (Member) commented Nov 7, 2024

Those are just unreasonable values, and it is not a surprise to me that things break down the line.

If I write a mapped task that expands to (2**32 - 1) instances, most likely things will also break.

We can always add a safeguard, or better error handling on the scheduler side, to make that more explicit, but changing the underlying datatype to allow extremely high values does not achieve much (IMO).

Also, I believe this can already be solved by a custom cluster policy (preventing people from using values that are too high) if a company encounters this edge case.
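A hedged sketch of the cluster-policy approach mentioned above: a task_policy hook in airflow_local_settings.py that rejects tasks whose priority_weight exceeds a deployment-chosen limit (the limit value below is purely illustrative).

# airflow_local_settings.py - illustrative cluster policy rejecting huge weights.
from airflow.exceptions import AirflowClusterPolicyViolation
from airflow.models.baseoperator import BaseOperator

MAX_ALLOWED_PRIORITY_WEIGHT = 100_000  # deployment-specific, illustrative value

def task_policy(task: BaseOperator) -> None:
    if abs(task.priority_weight) > MAX_ALLOWED_PRIORITY_WEIGHT:
        raise AirflowClusterPolicyViolation(
            f"Task {task.task_id!r} uses priority_weight={task.priority_weight}, "
            f"which exceeds the allowed maximum of {MAX_ALLOWED_PRIORITY_WEIGHT}."
        )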

@kosteev (Contributor) commented Nov 7, 2024

I agree that changing the datatype doesn't solve the issue completely.

I haven't read all the comments here, but my vote is to have a validation/safeguard for this (e.g. validation during DAG parsing that throws an import error for such a DAG, so the customer can see it).

In general, the fact that the Airflow scheduler (or DAG processor) can break because of user code doesn't look right, IMHO.

@potiuk (Member) commented Nov 11, 2024

Yeah. I am convinced now that we should rather cap the values, not change the type.

@molcay (Contributor, Author) commented Nov 13, 2024

Since #43611 is already approved and looks like it is about to be merged, I am closing this PR.

@molcay closed this Nov 13, 2024