Weights for relation_detr_focalnet_large_lrf_fl4_800_1333.py #15
Comments
The same model works for both image sizes. My GPU memory is not enough even for 800 x 1333 images. Did you get the same accuracy with both input image sizes, (800, 1333) and (1200, 2000)?
Is there any way to reduce the memory usage? I think the memory issue comes from the O(n^2) memory of the transformer attention. Can we replace it with an O(n) version or something like xFormers?
Now that I have gone into the details, I see it is using deformable attention, meaning we don't need an O(n) version of the transformer. Please correct me if I am wrong. Maybe we can fine-tune it using LoRA? I don't know how straightforward that process is for this model. Could you suggest some ideas for fine-tuning your biggest model on a single RTX 3090 Ti machine (24 GB VRAM)?
Hi @ck-amrahd, thanks for your question. Actually, we haven't pre-trained relation_detr_focalnet_large_lrf_fl4_800_1333 on COCO. To fine-tune on a custom dataset, you can directly load the weights of the 1200_2000 model into the 800_1333 model, since the image size does not change the model architecture. The accuracy at image size (800, 1333) should be a little lower than at (1200, 2000), but it will save a lot of memory. The memory cost mainly comes from the backbone and the transformer encoder.
For the backbone:
Here is a backbone setting with a better trade-off between GPU memory and accuracy:
backbone = FocalNetBackbone("focalnet_large_lrf_fl4", weights=False, return_indices=(1, 2, 3), freeze_indices=(0,))
For the transformer encoder: yes, we don't need O(n) attention, since deformable attention is already an O(n) version. LoRA mainly addresses the memory cost of large parameter counts by decomposing W into low-rank matrices, but the memory of Relation-DETR mainly comes from the intermediate outputs of the model, not the model parameters, so our model may not need LoRA. If you want to try it, you can wrap the following linear layers in the MultiScaleDeformableAttention of the transformer encoder with LoRA:
self.attention_weights = nn.Linear(embed_dim, num_heads * num_levels * num_points)
self.value_proj = nn.Linear(embed_dim, embed_dim)
self.output_proj = nn.Linear(embed_dim, embed_dim)
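For illustration, wrapping those three projections could look like the sketch below. It assumes a minimal hand-rolled LoRA layer rather than any particular library; LoRALinear and wrap_msda_with_lora are illustrative names, not part of this repo.
import torch
from torch import nn

class LoRALinear(nn.Module):
    # Wraps a frozen nn.Linear and adds a trainable low-rank update.
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original projection
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # start as a no-op update
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

def wrap_msda_with_lora(attn: nn.Module, rank: int = 8) -> None:
    # Replace the three projections listed above with LoRA-wrapped versions,
    # leaving the rest of the MultiScaleDeformableAttention module untouched.
    attn.attention_weights = LoRALinear(attn.attention_weights, rank)
    attn.value_proj = LoRALinear(attn.value_proj, rank)
    attn.output_proj = LoRALinear(attn.output_proj, rank)
Only the low-rank matrices are trainable here, so parameter memory drops, but as noted above the activation memory of the encoder stays the same.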
Thank you @xiuqhou, I will give it a try.
Hi @xiuqhou, thanks for the idea. I am able to fine-tune the larger model with:
I also reduced the number of queries and the number of hybrid proposals, and now I am able to fine-tune the version that takes 1200 * 2000 images. However, the model fine-tuned with this setting performed poorly compared to a model I fine-tuned from the Swin-L backbone, and I am not able to figure out why. I will keep looking into it; in the meantime, do you have any intuition about why that may be the case?
Hi @ck-amrahd, thanks for your feedback. Please use … Did you change …?
On the other hand, I strongly suggest using the 800 * 1333 version, since larger image sizes give diminishing returns in accuracy but greatly increase the memory cost. And please do not reduce or increase num_queries or hybrid_num_proposals. To make it simple, I put my own modified 800 * 1333 model config for Focal-large here. I have successfully run it on my own 3090 GPU with:
from torch import nn
from models.backbones.focalnet import FocalNetBackbone
from models.bricks.position_encoding import PositionEmbeddingSine
from models.bricks.post_process import PostProcess
from models.bricks.relation_transformer import (
RelationTransformer,
RelationTransformerDecoder,
RelationTransformerEncoder,
RelationTransformerEncoderLayer,
RelationTransformerDecoderLayer,
)
from models.bricks.set_criterion import HybridSetCriterion
from models.detectors.relation_detr import RelationDETR
from models.matcher.hungarian_matcher import HungarianMatcher
from models.necks.channel_mapper import ChannelMapper
# mostly changed parameters
embed_dim = 256
num_classes = 91
num_queries = 900
hybrid_num_proposals = 1500
hybrid_assign = 6
num_feature_levels = 5
transformer_enc_layers = 6
transformer_dec_layers = 6
num_heads = 8
dim_feedforward = 2048
# instantiate model components
position_embedding = PositionEmbeddingSine(
embed_dim // 2, temperature=10000, normalize=True, offset=-0.5
)
backbone = FocalNetBackbone("focalnet_large_lrf_fl4", weights=False, return_indices=(1, 2, 3), freeze_indices=(0,))
neck = ChannelMapper(backbone.num_channels, out_channels=embed_dim, num_outs=num_feature_levels)
transformer = RelationTransformer(
encoder=RelationTransformerEncoder(
encoder_layer=RelationTransformerEncoderLayer(
embed_dim=embed_dim,
n_heads=num_heads,
dropout=0.0,
activation=nn.ReLU(inplace=True),
n_levels=num_feature_levels,
n_points=4,
d_ffn=dim_feedforward,
),
num_layers=transformer_enc_layers,
),
decoder=RelationTransformerDecoder(
decoder_layer=RelationTransformerDecoderLayer(
embed_dim=embed_dim,
n_heads=num_heads,
dropout=0.0,
activation=nn.ReLU(inplace=True),
n_levels=num_feature_levels,
n_points=4,
d_ffn=dim_feedforward,
),
num_layers=transformer_dec_layers,
num_classes=num_classes,
),
num_classes=num_classes,
num_feature_levels=num_feature_levels,
two_stage_num_proposals=num_queries,
hybrid_num_proposals=hybrid_num_proposals,
)
matcher = HungarianMatcher(
cost_class=2, cost_bbox=5, cost_giou=2, focal_alpha=0.25, focal_gamma=2.0
)
# construct weight_dict for loss
weight_dict = {"loss_class": 1, "loss_bbox": 5, "loss_giou": 2}
weight_dict.update({"loss_class_dn": 1, "loss_bbox_dn": 5, "loss_giou_dn": 2})
aux_weight_dict = {}
for i in range(transformer.decoder.num_layers - 1):
aux_weight_dict.update({k + f"_{i}": v for k, v in weight_dict.items()})
weight_dict.update(aux_weight_dict)
weight_dict.update({"loss_class_enc": 1, "loss_bbox_enc": 5, "loss_giou_enc": 2})
weight_dict.update({k + "_hybrid": v for k, v in weight_dict.items()})
criterion = HybridSetCriterion(
num_classes=num_classes, matcher=matcher, weight_dict=weight_dict, alpha=0.25, gamma=2.0
)
postprocessor = PostProcess(select_box_nums_for_evaluation=300)
# combine above components to instantiate the model
model = RelationDETR(
backbone=backbone,
neck=neck,
position_embedding=position_embedding,
transformer=transformer,
criterion=criterion,
postprocessor=postprocessor,
num_classes=num_classes,
num_queries=num_queries,
hybrid_assign=hybrid_assign,
denoising_nums=100,
min_size=800,
max_size=1333,
)
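As a usage note, loading the released 1200_2000 weights into the 800_1333 model built above might look like the sketch below. The checkpoint filename and the "model" key are assumptions about the released file's layout, and model refers to the RelationDETR instance from the config above.
import torch

# Hypothetical filename: substitute the actual released 1200_2000 weight file.
checkpoint = torch.load("relation_detr_focalnet_large_lrf_fl4_1200_2000.pth", map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)  # some checkpoints nest the weights under a "model" key
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")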
Hi @xiuqhou, thank you so much for the detailed feedback. I will try it and let you know.
Hi @xiuqhou, thank you for the feedback. I am now fine-tuning the large FocalNet model on a custom dataset, and I am able to fine-tune it according to your instructions. I am only training for 1-2 epochs for now due to the computational cost. The total loss starts around 60 and goes to around 40 at the end of the first epoch. I will train for more epochs, but this loss seems quite high for a detection task. Do you have any intuition about this? Is this what you observe when fine-tuning on some datasets?
Hi @ck-amrahd, the total loss for the first epoch looks OK. Our method has an extra branch compared to DETRs like DINO, so it contains more loss terms and a larger total loss. When I trained Relation-DETR on COCO, the loss also started around 60 and went down to around 35 at the end of the first epoch. That is similar to your result, and the difference may come from the different dataset sizes. As long as the loss goes down steadily, the training process should be OK. You can refer to our released training log for details.
Question
Hi, thanks for the awesome repo. I am trying to fine-tune your model on a custom dataset. My GPU memory is not enough to fine-tune the relation_detr_focalnet_large_lrf_fl4_1200_2000.py version, even with batch_size=1 and fp16 mixed-precision training. Could you please release the weights and accuracy info for the relation_detr_focalnet_large_lrf_fl4_800_1333.py version? Thank you.
Additional
No response