I can see the row-parallel linear layer with all-reduce, i.e. `LinearAllreduce`, but there is no implementation of a column-parallel linear layer with all-gather.

How should I set the linear type when running DiT models?

I can set `attn.to_out.0`, `attn.to_add_out`, `ff.net.2`, and `ff_context.net.2` to `LinearAllreduce`, but how should I deal with `norm1.linear` and `norm1_context.linear`? I need to all-gather the results of that single linear layer; otherwise it causes an error, because the inputs of both `norm1` and `norm1_context` are the whole `hidden_states`.
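To make the distinction concrete, here is a toy sketch in plain Python of the two sharding schemes mentioned above. It is not DeepSpeed or PyTorch code: it simulates the ranks with lists, and every function name in it is hypothetical. It only illustrates that a column-sharded linear layer takes the full input on every rank and needs its outputs concatenated (all-gather), while a row-sharded layer takes a sharded input and needs its outputs summed (all-reduce).

```python
# Toy simulation of tensor-parallel linear layers using nested lists.
# No real communication happens; "ranks" are just list entries.
# All names are hypothetical and exist only for illustration.

def matmul(x, w):
    """x: [batch][in_features], w: [in_features][out_features]."""
    return [[sum(row[k] * w[k][j] for k in range(len(w)))
             for j in range(len(w[0]))] for row in x]

def split_columns(m, world_size):
    """Shard a matrix along its second (feature/column) dimension."""
    step = len(m[0]) // world_size
    return [[row[r * step:(r + 1) * step] for row in m]
            for r in range(world_size)]

def split_rows(m, world_size):
    """Shard a matrix along its first (row) dimension."""
    step = len(m) // world_size
    return [m[r * step:(r + 1) * step] for r in range(world_size)]

def column_parallel_all_gather(x, w, world_size):
    # Every rank sees the whole input and owns a slice of output features.
    partial = [matmul(x, shard) for shard in split_columns(w, world_size)]
    # All-gather: concatenate the per-rank outputs along the feature dim.
    return [sum((p[b] for p in partial), []) for b in range(len(x))]

def row_parallel_all_reduce(x, w, world_size):
    # Every rank sees a slice of the input features and the matching rows.
    x_shards = split_columns(x, world_size)
    w_shards = split_rows(w, world_size)
    partial = [matmul(xs, ws) for xs, ws in zip(x_shards, w_shards)]
    # All-reduce: sum the per-rank outputs elementwise.
    return [[sum(p[b][j] for p in partial) for j in range(len(w[0]))]
            for b in range(len(x))]

# Integer data keeps the comparison exact.
x = [[1, 2, 3, 4], [5, 6, 7, 8]]
w = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 2, 0, 1], [1, 3, 1, 0]]
full = matmul(x, w)
assert column_parallel_all_gather(x, w, 2) == full
assert row_parallel_all_reduce(x, w, 2) == full
```

In these terms, `norm1.linear` and `norm1_context.linear` would correspond to the first case: each receives the whole `hidden_states`, so its sharded output would have to be gathered before use.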