Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up instr_zeroes_ymmh/zmmh() #1

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

dvyukov
Copy link
Owner

@dvyukov dvyukov commented Jul 20, 2023

No description provided.

dvyukov added 3 commits July 20, 2023 10:13
Check operands before decoding the instruction
because it's much faster.
If we have raw bits, then look at them instead of iterating
over all encodings because it's much faster.
Copy link

@derekbruening derekbruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks overall reasonable to me, if you're happy with it...simpler than trying to get the flag even if the overhead maybe isn't quite as low.

byte *bits;

/* Handle zeroall/vzeroupper special case. */
if (instr->opcode == OP_vzeroall || (zmm && instr->opcode == OP_vzeroupper))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per DynamoRIO#6217 we want to remove the zmm condition on vzeroupper.

vex_evex = false;
num_prefixes = 0;
bits = instr_get_raw_bits(instr);
decode_sizeof(GLOBAL_DCONTEXT, bits, &num_prefixes, &rip_rel_pos);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best to check the return value to ensure success.

bits = instr_get_raw_bits(instr);
decode_sizeof(GLOBAL_DCONTEXT, bits, &num_prefixes, &rip_rel_pos);
for (i = 0; i < num_prefixes; i++) {
vex_evex |= bits[i] == VEX_2BYTE_PREFIX_OPCODE ||

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is C where bool is not a first-class type and using bitwise operators can get into trouble...even when all components are pure logical 0-or-1 operations it still makes me nervous (maybe some numeric-as-bool condition will be added in the future and result in a subtle hard to find bug...it has happened before) and I would request logical rather than bitwise combination.

bool
instr_zeroes_ymmh(instr_t *instr)
static bool
instr_zeroes_high(instr_t *instr, bool zmm)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instr_zeroes_simd_high or instr_zeroes_high_simd to distinguish from GPRs?

vex_evex = false;
num_prefixes = 0;
bits = instr_get_raw_bits(instr);
decode_sizeof(GLOBAL_DCONTEXT, bits, &num_prefixes, &rip_rel_pos);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh so you asked about the dcontext: right. It is used only for the ISA mode. Really we want to use the ISA mode of the instr, from instr_get_isa_mode(instr).

Hmm, this ends up being messy, unfotunately:

I see code in instr_get_eflags() among others which does this for such a situation:

        dcontext_t *dcontext = get_thread_private_dcontext();
        dr_isa_mode_t old_mode;
        dr_set_isa_mode(dcontext, instr_get_isa_mode(instr), &old_mode);
        decode_eflags_usage(dcontext, instr_get_raw_bits(instr), &instr->eflags,
                            DR_QUERY_INCLUDE_ALL);
        dr_set_isa_mode(dcontext, old_mode, NULL);

But...for standalone decoding there is no thread-private dcontext, and it's racy to mess with the global dcontext for a local operation: that is DynamoRIO#1595 and it looks like nobody has tried to solve it.

A different solution would be to pass in the ISA mode instead of the dcontext, maybe to a non-exported inner function.

(This will only matter for mixed-bitwidth decoding.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants