Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for highway ArgSort #2472

Open
zafeerali943 opened this issue Feb 19, 2025 · 14 comments
Open

Support for highway ArgSort #2472

zafeerali943 opened this issue Feb 19, 2025 · 14 comments

Comments

@zafeerali943
Copy link

Hi Developers!

I could see highway repository supports variety of hardware to enable optimised sort algorithms. As one of such sort algorithms ArgSort are primarily used for many scientific libraries. Is there any plan to add support for ArgSort implementation from Highway? So that it can be used for popular Scientific libraries such as NumPy which can improve performance of ArgSort on ARM devices.

Thanks!

@jan-wassenberg
Copy link
Member

Hi, it is possible to get argsort reasonably efficiently using the existing VQSort calls. If you arrange the input so that 32 or 64-bit keys are in K32V32 or K64V64 structs, and place iota (0..N-1) into the value part of that struct, then the resulting values will be argsort. Does that fit your use case?

@zafeerali943
Copy link
Author

Hi @jan-wassenberg thanks for your valuable information! I pretty new to highway source, kindly can you explain more precisely like how we can implement this probably with some example, like we can implement and call argsort function via VQSort.

Thanks in advance

@zafeerali943
Copy link
Author

@jan-wassenberg are you referring to this part of code vqsort_kv64a.cc

@jan-wassenberg
Copy link
Member

Yes, that's right. A somewhat more detailed sketch:

K32V32 kv[];
for (size_t i = 0; i < keys.size(); ++i) {
kv[i].key = keys[i];
kv[i].value = i;
}
VQSort(kv, keys.size(), SortAscending());
// output kv, or copy kv[i].value into output array

@zafeerali943
Copy link
Author

@zafeerali943 wonderful! thanks for your valuable information! just one more clarification what are dtypes does this KV sort offers?
Could it be applicable for int32 and int64 ? Also what could be the necessary source to build VQsort for this scenario

@zafeerali943
Copy link
Author

zafeerali943 commented Feb 20, 2025

Yes, that's right. A somewhat more detailed sketch:

K32V32 kv[];
for (size_t i = 0; i < keys.size(); ++i) {
kv[i].key = keys[i];
kv[i].value = i;
}
VQSort(kv, keys.size(), SortAscending());
// output kv, or copy kv[i].value into output array

Just now I tried compiling sample source using clang-cl:

`#include "hwy/highway.h"

#include "hwy/contrib/sort/vqsort-inl.h"

int main(){
hwy::K32V32 kv[20];
int keys[20] = {184, 100, 61, 207, 58, 141, 5, 150, 115, 33, 72, 47, 195, 130, 213, 177, 225, 165, 72, 89};

for (size_t i = 0; i < 20; ++i) {
    kv[i].key = keys[i];
    kv[i].value = i;
}
hwy::VQSort(kv,20,hwy::SortAscending());
return 0;

}
`

Error:
lld-link: error: undefined symbol: void __cdecl hwy::VQSort(struct hwy::K32V32 *__restrict, unsigned __int64, struct hwy::SortAscending)

referenced by C:\Users\HCKTest\AppData\Local\Temp\kvsort-e27948.obj:(main)

@jan-wassenberg
Copy link
Member

Yes, for int32 you can use K32V32 and int64 K64V64.
For calling VQSort, the correct and sufficient header is vqsort.h, but you will have to include in your project vqsort*.cc.

@zafeerali943
Copy link
Author

I am trying to compile using LLVM's clang-cl on windows, can you give a sample source for the above use case to compile using clang-cl, I am facing linking errors

@jan-wassenberg
Copy link
Member

The above is as close to full source code as we are willing and able to provide.

@zafeerali943
Copy link
Author

zafeerali943 commented Feb 21, 2025

Hi @jan-wassenberg, thanks for your patience and support!

I tried including all vqsort sources in cmakelist and tried compiling my sample source, but ended up with someother undefined errors.

My Program:

#include "hwy/contrib/sort/vqsort.h"
int main(){
    hwy::K32V32 kv[20];
    int keys[20] = {184, 100, 61, 207, 58, 141, 5, 150, 115, 33, 72, 47, 195, 130, 213, 177, 225, 165, 72, 89};
    for (size_t i = 0; i < 20; ++i) {
        kv[i].key = keys[i];
        kv[i].value = i;
    }
    hwy::VQSort(kv,20,hwy::SortAscending()); 
    return 0;
};

My Cmakelist:

# CMakeLists.txt
cmake_minimum_required(VERSION 3.10)

# Set Clang-CL as the compiler
set(CMAKE_C_COMPILER "clang-cl")
set(CMAKE_CXX_COMPILER "clang-cl")

project(vqsort_example)

# Specify C++ standard
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Find all vqsort source files
file(GLOB VQSORT_SOURCES 
    "highway/hwy/contrib/sort/vqsort*.cc"
)

# Add source files
set(SOURCES
    samplekv.cpp
    ${VQSORT_SOURCES}
)

# Create executable
add_executable(sample_sort ${SOURCES})

# Add include directory for Highway
target_include_directories(sample_sort PRIVATE
    C:/Users/HCKTest/Desktop/highway
)


target_compile_options(sample_sort PRIVATE 
    /EHsc    
    -fuse-ld=lld  
)
```
Find the error in my attached logs:

[logs.txt](https://github.com/user-attachments/files/18904902/logs.txt)

@Mugundanmcw
Copy link

@zafeerali943 the issue can be fixed by adding abort.cc file from hwy folder

@Mugundanmcw
Copy link

I was able to reproduce this issue, compiling with necessary cc source file included in the project would fix the above issue

@Mugundanmcw
Copy link

Yes, for int32 you can use K32V32 and int64 K64V64. For calling VQSort, the correct and sufficient header is vqsort.h, but you will have to include in your project vqsort*.cc.

@jan-wassenberg I just went through source code of KV sort sources, currently it supports only uint32\uint64 right? do you have any plan to support int32/int64 in future release?

@jan-wassenberg
Copy link
Member

Hi @Mugundanmcw , that's right. Because we are likely anyway copying from the user's input to the KV type, it is probably best to then also apply a transformation that makes int32 sort in the desired order, when in fact the underlying sort is a u32. (Same also applies for int64 and u64.) This transformation extracts the sign bit, broadcasts it to all bits, then XORs the signed input with that. For example, uint32_t flip = (input < 0) ? ~0u : 0; transformed = static_cast<uint32_t>(input) ^ flip.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants