Fails to load PE binaries with non-utf-8 decodable bytes in section name #438

Open
AlexVanMechelen opened this issue Nov 8, 2023 · 8 comments

@AlexVanMechelen

AlexVanMechelen commented Nov 8, 2023

Description

Loading a PE binary in which one section's name contains bytes that cannot be decoded as UTF-8 causes a crash when the name is decoded:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd2 in position 6: invalid continuation byte

A possible fix is to pass errors='ignore' to .decode(), which drops any bytes that are not valid UTF-8.
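
For illustration, here is a minimal sketch of the failure and of the proposed workaround. The section name `raw_name` is a made-up 8-byte value, not taken from a real binary, and the helper names are hypothetical; this is plain Python, not the loader's actual code.

```python
# Hypothetical 8-byte PE section name with a stray 0xd2 byte at index 6.
raw_name = b".packd\xd2A"


def decode_strict(raw: bytes) -> str:
    """Strict UTF-8 decoding: raises on invalid bytes."""
    return raw.decode("utf-8")


def decode_ignore(raw: bytes) -> str:
    """UTF-8 decoding that silently drops invalid bytes."""
    return raw.decode("utf-8", errors="ignore")


try:
    decode_strict(raw_name)
except UnicodeDecodeError as exc:
    # exc.start is the offset of the offending byte.
    print(f"strict decode failed at offset {exc.start}: {exc.reason}")

print(decode_ignore(raw_name))
```

One caveat with this approach: because the invalid bytes are dropped, two different raw names can end up with the same decoded name.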

@ltfish
Member

ltfish commented Nov 8, 2023

@rhelmot I've been thinking about this for a while. Should we use bytes instead of str for section and segment names?

@rhelmot
Member

rhelmot commented Nov 8, 2023

My question for OP is: are your section names encoded in some other encoding, or are they garbage?

@AlexVanMechelen
Author

@rhelmot The section names are garbage. Some executable packers create such garbage section names, leading to the above error for all executables packed with them.

@rhelmot
Member

rhelmot commented Nov 8, 2023

Does any compiler support generating UTF-8 section names? If so, I would recommend decoding as UTF-8 with errors='replace'. If not, Latin-1.
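
To make the trade-off between the two options concrete, here is a small sketch comparing them on a garbage name and on a name that really is UTF-8. Both byte strings and both helper names are invented for this example.

```python
def decode_replace(raw: bytes) -> str:
    """UTF-8 decoding; each invalid byte becomes U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def decode_latin1(raw: bytes) -> str:
    """Latin-1 decoding; every byte maps to the code point of the same value."""
    return raw.decode("latin-1")


garbage = b".packd\xd2A"                  # hypothetical packer-generated name
utf8_name = ".caf\u00e9".encode("utf-8")  # hypothetical name that is valid UTF-8

for raw in (garbage, utf8_name):
    print(repr(raw), repr(decode_replace(raw)), repr(decode_latin1(raw)))

# Latin-1 is lossless: the original bytes can always be recovered.
recovered = decode_latin1(garbage).encode("latin-1")
print(recovered == garbage)
```

UTF-8 with replacement keeps genuine UTF-8 names readable but loses the original garbage bytes. Latin-1 keeps every byte recoverable but shows genuine UTF-8 names as mojibake.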

@ltfish
Member

ltfish commented Nov 9, 2023

@rhelmot Some malware intentionally uses garbage section names. I don't think we want to fail to load those binaries in such cases.

@rhelmot
Member

rhelmot commented Nov 9, 2023

Neither of those solutions will fail with garbage bytes.
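
A quick way to check this claim is to throw arbitrary bytes at both decoders. The sketch below is only an illustration: the helper name and the sample set are made up, and it tests every single-byte value plus a batch of pseudo-random 8-byte names.

```python
import random


def decodes_without_error(raw: bytes) -> bool:
    """Return True if both candidate decodings accept the raw bytes."""
    try:
        raw.decode("utf-8", errors="replace")
        raw.decode("latin-1")
    except UnicodeDecodeError:
        return False
    return True


rng = random.Random(0)  # fixed seed so the run is repeatable

# Every possible single byte, plus 1000 pseudo-random 8-byte names.
samples = [bytes([value]) for value in range(256)]
samples += [bytes(rng.getrandbits(8) for _ in range(8)) for _ in range(1000)]

all_ok = all(decodes_without_error(sample) for sample in samples)
print(all_ok)
```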

@ltfish
Member

ltfish commented Nov 9, 2023

Why don't we default to Latin-1?

@rhelmot
Member

rhelmot commented Nov 9, 2023

That's why I asked whether compilers let you generate UTF-8 section names manually.
