Commit

Merge branch 'PaddlePaddle:main' into main
sfc-gh-dkajtoch authored Jan 16, 2025
2 parents 65d4e39 + cf4c059 commit 4e5db9e
Showing 109 changed files with 127,234 additions and 269 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -36,7 +36,7 @@ PaddleOCR 由 [PMC](https://github.com/PaddlePaddle/PaddleOCR/issues/12122) 监
- 🎨 [**模型丰富一键调用**](https://paddlepaddle.github.io/PaddleOCR/latest/paddlex/quick_start.html):将文本图像智能分析、通用OCR、通用版面解析、通用表格识别、公式识别、印章文本识别涉及的**17个模型**整合为6条模型产线,通过极简的**Python API一键调用**,快速体验模型效果。此外,同一套API,也支持图像分类、目标检测、图像分割、时序预测等共计**200+模型**,形成20+单功能模块,方便开发者进行**模型组合**使用。
- 🚀[**提高效率降低门槛**](https://paddlepaddle.github.io/PaddleOCR/latest/paddlex/overview.html):提供基于**统一命令****图形界面**两种方式,实现模型简洁高效的使用、组合与定制。支持**高性能推理、服务化部署和端侧部署**等多种部署方式。此外,对于各种主流硬件如**英伟达GPU、昆仑芯、昇腾、寒武纪和海光**等,进行模型开发时,都可以**无缝切换**

-- 支持文档场景信息抽取v3[PP-ChatOCRv3-doc](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction.md)、基于RT-DETR的[高精度版面区域检测模型](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/ocr_modules/layout_detection.md)和PicoDet的[高效率版面区域检测模型](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/ocr_modules/layout_detection.md)、高精度表格结构识别模型[SLANet_Plus](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/ocr_modules/table_structure_recognition.md)、文本图像矫正模型[UVDoc](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/ocr_modules/text_image_unwarping.md)、公式识别模型[LatexOCR](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/ocr_modules/formula_recognition.md)、基于PP-LCNet的[文档图像方向分类模型](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/ocr_modules/doc_img_orientation_classification.md)
+- 支持文档场景信息抽取v3[PP-ChatOCRv3-doc](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction.html)、基于RT-DETR的[高精度版面区域检测模型](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/layout_detection.html)和PicoDet的[高效率版面区域检测模型](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/layout_detection.html)、高精度表格结构识别模型[SLANet_Plus](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/table_structure_recognition.html)、文本图像矫正模型[UVDoc](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/text_image_unwarping.html)、公式识别模型[LatexOCR](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/formula_recognition.html)、基于PP-LCNet的[文档图像方向分类模型](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/doc_img_orientation_classification.html)

- **🔥2024.7 添加 PaddleOCR 算法模型挑战赛冠军方案**
- 赛题一:OCR 端到端识别任务冠军方案——[场景文本识别算法-SVTRv2](https://paddlepaddle.github.io/PaddleOCR/latest/algorithm/text_recognition/algorithm_rec_svtrv2.html)
4 changes: 2 additions & 2 deletions README_en.md
@@ -35,7 +35,7 @@ PaddleOCR is being oversight by a [PMC](https://github.com/PaddlePaddle/PaddleOC

- 🚀 [**High Efficiency and Low barrier of entry**](https://paddlepaddle.github.io/PaddleOCR/latest/en/paddlex/overview.html): Provides two methods based on **unified commands** and **GUI** to achieve simple and efficient use, combination, and customization of models. Supports multiple deployment methods such as **high-performance inference, service-oriented deployment, and edge deployment**. Additionally, for various mainstream hardware such as **NVIDIA GPU, Kunlunxin XPU, Ascend NPU, Cambricon MLU, and Haiguang DCU**, models can be developed with **seamless switching**.

-- Supports [PP-ChatOCRv3-doc](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_en.md), [high-precision layout detection model based on RT-DETR](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/ocr_modules/layout_detection_en.md) and [high-efficiency layout area detection model based on PicoDet](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/ocr_modules/layout_detection_en.md), [high-precision table structure recognition model](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/ocr_modules/table_structure_recognition_en.md), text image unwarping model [UVDoc](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/ocr_modules/text_image_unwarping_en.md), formula recognition model [LatexOCR](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/ocr_modules/formula_recognition_en.md), and [document image orientation classification model based on PP-LCNet](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/ocr_modules/doc_img_orientation_classification_en.md).
+- Supports [PP-ChatOCRv3-doc](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction.html), [high-precision layout detection model based on RT-DETR](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/layout_detection.html) and [high-efficiency layout area detection model based on PicoDet](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/layout_detection.html), [high-precision table structure recognition model](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_structure_recognition.html), text image unwarping model [UVDoc](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/text_image_unwarping.html), formula recognition model [LatexOCR](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/formula_recognition.html), and [document image orientation classification model based on PP-LCNet](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/doc_img_orientation_classification.html).

- **🔥2024.7 Added PaddleOCR Algorithm Model Challenge Champion Solutions**:
- Challenge One, OCR End-to-End Recognition Task Champion Solution: [Scene Text Recognition Algorithm-SVTRv2](https://paddlepaddle.github.io/PaddleOCR/algorithm/text_recognition/algorithm_rec_svtrv2.html);
@@ -50,7 +50,7 @@ Full documentation can be found on [docs](https://paddlepaddle.github.io/PaddleO
PaddleOCR support a variety of cutting-edge algorithms related to OCR, and developed industrial featured models/solution [PP-OCR](https://paddlepaddle.github.io/PaddleOCR/latest/en/ppocr/overview.html)[PP-Structure](https://paddlepaddle.github.io/PaddleOCR/latest/en/ppstructure/overview.html) and [PP-ChatOCR](https://aistudio.baidu.com/aistudio/projectdetail/6488689) on this basis, and get through the whole process of data production, model training, compression, inference and deployment.

<div align="center">
-<img src="./docs/images/ppocrv4.png">
+<img src="./docs/images/ppocrv4_en.jpg">
</div>

> It is recommended to start with the “quick experience” in the document tutorial
1 change: 1 addition & 0 deletions configs/det/ch_PP-OCRv4/ch_PP-OCRv4_det_cml.yml
@@ -16,6 +16,7 @@ Global:
   use_visualdl: false
   infer_img: doc/imgs_en/img_10.jpg
   save_res_path: ./checkpoints/det_db/predicts_db.txt
+  d2s_train_image_shape: [3, 640, 640]
   distributed: true
 Architecture:
   name: DistillationModel
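The `d2s_train_image_shape` entries added in this commit give a per-sample shape in channel, height, width order. The following is a minimal sketch of how such a value might be turned into a batched input signature. Reading "d2s" as "dynamic to static" is an assumption, and `build_input_signature` is a hypothetical helper, not a PaddleOCR function; `None` stands for the variable batch dimension.

```python
from typing import List, Optional, Sequence


def build_input_signature(chw_shape: Sequence[int], dtype: str = "float32") -> dict:
    """Prepend a variable batch dimension to a [C, H, W] shape.

    Hypothetical helper for illustration only; the real consumer of
    `d2s_train_image_shape` is not shown in this commit.
    """
    if len(chw_shape) != 3 or any(dim <= 0 for dim in chw_shape):
        raise ValueError(f"expected three positive dims, got {list(chw_shape)}")
    batched: List[Optional[int]] = [None, *chw_shape]
    return {"shape": batched, "dtype": dtype}


# Shapes taken from the configs touched by this commit.
det_signature = build_input_signature([3, 640, 640])    # detection configs
rec_signature = build_input_signature([3, 48, 320])     # recognition configs
latex_signature = build_input_signature([1, 256, 256])  # rec_latex_ocr.yml
```

Keeping the batch dimension open means the same signature would serve any `batch_size_per_card`; only the per-sample shape is fixed by the config.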
1 change: 1 addition & 0 deletions configs/det/ch_PP-OCRv4/ch_PP-OCRv4_det_student.yml
@@ -16,6 +16,7 @@ Global:
   use_visualdl: false
   infer_img: doc/imgs_en/img_10.jpg
   save_res_path: ./checkpoints/det_db/predicts_db.txt
+  d2s_train_image_shape: [3, 640, 640]
   distributed: true

 Architecture:
1 change: 1 addition & 0 deletions configs/det/ch_PP-OCRv4/ch_PP-OCRv4_det_teacher.yml
@@ -16,6 +16,7 @@ Global:
   use_visualdl: false
   infer_img: doc/imgs_en/img_10.jpg
   save_res_path: ./checkpoints/det_db/predicts_db.txt
+  d2s_train_image_shape: [3, 640, 640]
   distributed: true

 Architecture:
117 changes: 117 additions & 0 deletions configs/rec/PP-FormuaNet/rec_pp_formulanet_l.yml
@@ -0,0 +1,117 @@
+Global:
+  use_gpu: True
+  epoch_num: 10
+  log_smooth_window: 10
+  print_batch_step: 10
+  save_model_dir: ./output/rec/pp_formulanet_l/
+  save_epoch_step: 2
+  # evaluation is run every 417 iterations (1 epoch)(batch_size = 24) # max_seq_len: 1024
+  eval_batch_step: [0, 417 ]
+  cal_metric_during_train: True
+  pretrained_model:
+  checkpoints:
+  save_inference_dir:
+  use_visualdl: False
+  infer_img: doc/datasets/pme_demo/0000013.png
+  infer_mode: False
+  use_space_char: False
+  rec_char_dict_path: &rec_char_dict_path ppocr/utils/dict/unimernet_tokenizer
+  max_new_tokens: &max_new_tokens 1024
+  input_size: &input_size [768, 768]
+  save_res_path: ./output/rec/predicts_unimernet_latexocr.txt
+  allow_resize_largeImg: False
+  start_ema: True
+
+Optimizer:
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  weight_decay: 0.05
+  lr:
+    name: LinearWarmupCosine
+    learning_rate: 0.0001
+
+Architecture:
+  model_type: rec
+  algorithm: PP-FormulaNet-L
+  in_channels: 3
+  Transform:
+  Backbone:
+    name: Vary_VIT_B_Formula
+    image_size: 768
+    encoder_embed_dim: 768
+    encoder_depth: 12
+    encoder_num_heads: 12
+    encoder_global_attn_indexes: [2, 5, 8, 11]
+  Head:
+    name: PPFormulaNet_Head
+    max_new_tokens: *max_new_tokens
+    decoder_start_token_id: 0
+    decoder_ffn_dim: 2048
+    decoder_hidden_size: 512
+    decoder_layers: 8
+    temperature: 0.2
+    do_sample: False
+    top_p: 0.95
+    encoder_hidden_size: 1024
+    is_export: False
+    length_aware: False
+    use_parallel: False
+    parallel_step: 0
+
+Loss:
+  name: PPFormulaNet_L_Loss
+
+PostProcess:
+  name: UniMERNetDecode
+  rec_char_dict_path: *rec_char_dict_path
+
+Metric:
+  name: LaTeXOCRMetric
+  main_indicator: exp_rate
+  cal_bleu_score: False
+
+Train:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./ocr_rec_latexocr_dataset_example
+    label_file_list: ["./ocr_rec_latexocr_dataset_example/train.txt"]
+    transforms:
+      - UniMERNetImgDecode:
+          input_size: *input_size
+      - UniMERNetTrainTransform:
+      - LatexImageFormat:
+      - UniMERNetLabelEncode:
+          rec_char_dict_path: *rec_char_dict_path
+          max_seq_len: *max_new_tokens
+      - KeepKeys:
+          keep_keys: ['image', 'label', 'attention_mask']
+
+  loader:
+    shuffle: False
+    drop_last: False
+    batch_size_per_card: 6
+    num_workers: 0
+    collate_fn: UniMERNetCollator
+
+Eval:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./ocr_rec_latexocr_dataset_example
+    label_file_list: ["./ocr_rec_latexocr_dataset_example/val.txt"]
+    transforms:
+      - UniMERNetImgDecode:
+          input_size: *input_size
+      - UniMERNetTestTransform:
+      - LatexImageFormat:
+      - UniMERNetLabelEncode:
+          max_seq_len: *max_new_tokens
+          rec_char_dict_path: *rec_char_dict_path
+      - KeepKeys:
+          keep_keys: ['image', 'label', 'attention_mask', 'filename']
+  loader:
+    shuffle: False
+    drop_last: False
+    batch_size_per_card: 10
+    num_workers: 0
+    collate_fn: UniMERNetCollator
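The comment on `eval_batch_step` ties the evaluation interval to one epoch. The sketch below reproduces that arithmetic. Two inputs are assumptions rather than values from the config: a training set of 10,000 samples, and 4 cards (inferred from the comment's batch_size = 24 against `batch_size_per_card: 6`). `iterations_per_epoch` is a hypothetical helper, not part of PaddleOCR.

```python
import math


def iterations_per_epoch(num_samples, batch_size_per_card, num_cards, drop_last=False):
    """Number of training steps needed to see every sample once."""
    global_batch = batch_size_per_card * num_cards
    if drop_last:
        return num_samples // global_batch  # incomplete final batch is skipped
    return math.ceil(num_samples / global_batch)  # incomplete final batch is kept


ASSUMED_NUM_SAMPLES = 10_000  # assumption: not stated anywhere in the config
ASSUMED_NUM_CARDS = 4         # assumption: 24 / batch_size_per_card (6)

# rec_pp_formulanet_l.yml: batch_size_per_card is 6.
steps_l = iterations_per_epoch(ASSUMED_NUM_SAMPLES, 6, ASSUMED_NUM_CARDS)
# rec_pp_formulanet_s.yml: batch_size_per_card is 14.
steps_s = iterations_per_epoch(ASSUMED_NUM_SAMPLES, 14, ASSUMED_NUM_CARDS)
print(steps_l, steps_s)
```

Under these assumptions the two results match the `eval_batch_step` values in the PP-FormulaNet configs (417 and 179). With a different dataset size or card count the interval would need to be recomputed.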
115 changes: 115 additions & 0 deletions configs/rec/PP-FormuaNet/rec_pp_formulanet_s.yml
@@ -0,0 +1,115 @@
+Global:
+  use_gpu: True
+  epoch_num: 20
+  log_smooth_window: 10
+  print_batch_step: 10
+  save_model_dir: ./output/rec/pp_formulanet_s/
+  save_epoch_step: 2
+  # evaluation is run every 179 iterations (1 epoch)(batch_size = 56) # max_seq_len: 1024
+  eval_batch_step: [0, 179]
+  cal_metric_during_train: True
+  pretrained_model:
+  checkpoints:
+  save_inference_dir:
+  use_visualdl: False
+  infer_img: doc/datasets/pme_demo/0000013.png
+  infer_mode: False
+  use_space_char: False
+  rec_char_dict_path: &rec_char_dict_path ppocr/utils/dict/unimernet_tokenizer
+  max_new_tokens: &max_new_tokens 1024
+  input_size: &input_size [384, 384]
+  save_res_path: ./output/rec/predicts_unimernet_latexocr.txt
+  allow_resize_largeImg: False
+  start_ema: True
+
+Optimizer:
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  weight_decay: 0.05
+  lr:
+    name: LinearWarmupCosine
+    learning_rate: 0.0001
+
+Architecture:
+  model_type: rec
+  algorithm: PP-FormulaNet-S
+  in_channels: 3
+  Transform:
+  Backbone:
+    name: PPHGNetV2_B4
+    class_num: 1024
+
+  Head:
+    name: PPFormulaNet_Head
+    max_new_tokens: *max_new_tokens
+    decoder_start_token_id: 0
+    decoder_ffn_dim: 1536
+    decoder_hidden_size: 384
+    decoder_layers: 2
+    temperature: 0.2
+    do_sample: False
+    top_p: 0.95
+    encoder_hidden_size: 2048
+    is_export: False
+    length_aware: True
+    use_parallel: True,
+    parallel_step: 3
+
+Loss:
+  name: PPFormulaNet_S_Loss
+  parallel_step: 3
+
+PostProcess:
+  name: UniMERNetDecode
+  rec_char_dict_path: *rec_char_dict_path
+
+Metric:
+  name: LaTeXOCRMetric
+  main_indicator: exp_rate
+  cal_bleu_score: False
+
+Train:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./ocr_rec_latexocr_dataset_example
+    label_file_list: ["./ocr_rec_latexocr_dataset_example/train.txt"]
+    transforms:
+      - UniMERNetImgDecode:
+          input_size: *input_size
+      - UniMERNetTrainTransform:
+      - LatexImageFormat:
+      - UniMERNetLabelEncode:
+          rec_char_dict_path: *rec_char_dict_path
+          max_seq_len: *max_new_tokens
+      - KeepKeys:
+          keep_keys: ['image', 'label', 'attention_mask']
+
+  loader:
+    shuffle: False
+    drop_last: False
+    batch_size_per_card: 14
+    num_workers: 0
+    collate_fn: UniMERNetCollator
+
+Eval:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./ocr_rec_latexocr_dataset_example
+    label_file_list: ["./ocr_rec_latexocr_dataset_example/val.txt"]
+    transforms:
+      - UniMERNetImgDecode:
+          input_size: *input_size
+      - UniMERNetTestTransform:
+      - LatexImageFormat:
+      - UniMERNetLabelEncode:
+          max_seq_len: *max_new_tokens
+          rec_char_dict_path: *rec_char_dict_path
+      - KeepKeys:
+          keep_keys: ['image', 'label', 'attention_mask', 'filename']
+  loader:
+    shuffle: False
+    drop_last: False
+    batch_size_per_card: 30
+    num_workers: 0
+    collate_fn: UniMERNetCollator
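One detail worth checking in a config like this is the type each flag has after loading. In YAML, only exact plain scalars such as `True` or `false` resolve to booleans. A value with extra characters, such as the `use_parallel: True,` entry in this file with its trailing comma, is loaded as the string `"True,"`, and any non-empty string is truthy in Python. The sketch below is a strict coercion helper that could guard against this; `as_bool` is a hypothetical name, not a PaddleOCR function, and the loaded value is written out by hand rather than read through a YAML library.

```python
_TRUE = {"true", "yes", "on", "1"}
_FALSE = {"false", "no", "off", "0"}


def as_bool(value):
    """Strictly coerce a loaded config value to bool, tolerating a stray comma."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().rstrip(",").strip().lower()
        if text in _TRUE:
            return True
        if text in _FALSE:
            return False
    raise ValueError(f"not a boolean config value: {value!r}")


# What a YAML loader would hand back for a scalar written as `True,`.
loaded_value = "True,"

naive = bool(loaded_value)      # truthy only because the string is non-empty
naive_false = bool("False,")    # the pitfall: this is truthy as well
strict = as_bool(loaded_value)
strict_false = as_bool("False,")
```

With `True,` the naive and strict readings happen to agree, so the flag still behaves as enabled; the same typo on a `False` value would silently invert the setting.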
1 change: 1 addition & 0 deletions configs/rec/PP-OCRv4/ch_PP-OCRv4_rec.yml
@@ -19,6 +19,7 @@ Global:
   use_space_char: true
   distributed: true
   save_res_path: ./output/rec/predicts_ppocrv3.txt
+  d2s_train_image_shape: [3, 48, 320]


 Optimizer:
2 changes: 1 addition & 1 deletion configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_hgnet.yml
@@ -19,7 +19,7 @@ Global:
   use_space_char: true
   distributed: true
   save_res_path: ./output/rec/predicts_ppocrv3.txt
-
+  d2s_train_image_shape: [3, 48, 320]

 Optimizer:
   name: Adam
1 change: 1 addition & 0 deletions configs/rec/SVTRv2/rec_repsvtr_ch.yml
@@ -19,6 +19,7 @@ Global:
   use_space_char: true
   distributed: true
   save_res_path: ./output/rec/predicts_repsvtr.txt
+  d2s_train_image_shape: [3, 48, 320]

 Optimizer:
   name: AdamW
2 changes: 1 addition & 1 deletion configs/rec/SVTRv2/rec_svtrv2_ch.yml
@@ -19,7 +19,7 @@ Global:
   use_space_char: true
   distributed: true
   save_res_path: ./output/rec/predicts_svrtv2.txt
-
+  d2s_train_image_shape: [3, 48, 320]

 Optimizer:
   name: AdamW
3 changes: 2 additions & 1 deletion configs/rec/rec_latex_ocr.yml
@@ -18,6 +18,7 @@ Global:
   use_space_char: False
   rec_char_dict_path: ppocr/utils/dict/latex_ocr_tokenizer.json
   save_res_path: ./output/rec/predicts_latexocr.txt
+  d2s_train_image_shape: [1,256,256]

 Optimizer:
   name: AdamW
@@ -64,7 +65,7 @@ PostProcess:
 Metric:
   name: LaTeXOCRMetric
   main_indicator: exp_rate
-  cal_blue_score: False
+  cal_bleu_score: False

 Train:
   dataset:
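The second hunk above renames `cal_blue_score` to `cal_bleu_score`. A misspelled option can go unnoticed when the consumer accepts arbitrary keyword arguments and falls back to a default. The sketch below illustrates that failure mode with a stand-in class. `SketchMetric`, its signature, and `unknown_keys` are assumptions for illustration and do not reproduce `LaTeXOCRMetric`; the option is set to `True` here (the config uses `False`) so that the effect is visible.

```python
class SketchMetric:
    """Stand-in metric whose constructor swallows unknown options."""

    def __init__(self, main_indicator="exp_rate", cal_bleu_score=False, **kwargs):
        self.main_indicator = main_indicator
        self.cal_bleu_score = cal_bleu_score
        self.ignored = sorted(kwargs)  # unknown keys, kept only for inspection


def unknown_keys(options, allowed):
    """Return option names that the consumer does not recognise."""
    return sorted(set(options) - set(allowed))


old_options = {"main_indicator": "exp_rate", "cal_blue_score": True}  # misspelled
new_options = {"main_indicator": "exp_rate", "cal_bleu_score": True}  # corrected

old_metric = SketchMetric(**old_options)  # typo is absorbed; default False is used
new_metric = SketchMetric(**new_options)  # option takes effect
typos = unknown_keys(old_options, {"main_indicator", "cal_bleu_score"})
```

A check like `unknown_keys` run at config-load time would surface such a typo immediately instead of leaving the default in force.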