The One Billion Row Challenge (1BRC) is a fun exploration of how far modern languages can be pushed to aggregate one billion rows from a text file. The dataset to aggregate is around 13 GB.
The challenge was originally posed for Java, but here I have implemented it in Go.
The original repo can be found at https://github.com/gunnarmorling/1brc
The implementation is done in several different methods, all benchmarked on a machine with the following specs:
- Processor: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz, 1800 MHz, 4 cores, 8 logical processors
- Physical memory (RAM): 8 GB
In the first method, a custom hash map is used, where each key is hashed to an index in an array. No goroutines are used; the program executes serially. It takes approximately 7m 35s to run.
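As a rough sketch of that idea (illustrative only, not the repo's actual code; the names `stats` and `update`, the table size, and the FNV-1a hash are assumptions), station names are hashed into a fixed-size array and collisions are resolved by linear probing:

```go
package main

import "fmt"

const numBuckets = 1 << 17 // power of two so the hash can be masked, not modded

// stats holds the running aggregate for one station; temperatures are
// assumed to be stored as integer tenths of a degree.
type stats struct {
	key        string
	min, max   int32
	sum, count int64
	used       bool
}

var table [numBuckets]stats

// fnv1a hashes a station name to a table index.
func fnv1a(key []byte) uint32 {
	h := uint32(2166136261)
	for _, b := range key {
		h ^= uint32(b)
		h *= 16777619
	}
	return h
}

// update aggregates one measurement, resolving collisions by linear probing.
func update(key []byte, temp int32) {
	idx := fnv1a(key) & (numBuckets - 1)
	for table[idx].used && table[idx].key != string(key) {
		idx = (idx + 1) & (numBuckets - 1)
	}
	s := &table[idx]
	if !s.used {
		s.key = string(key)
		s.min, s.max = temp, temp
		s.used = true
	}
	if temp < s.min {
		s.min = temp
	}
	if temp > s.max {
		s.max = temp
	}
	s.sum += int64(temp)
	s.count++
}

func main() {
	update([]byte("Hamburg"), 123) // 12.3 °C as tenths
	update([]byte("Hamburg"), -45) // -4.5 °C
	for i := range table {
		if table[i].used {
			s := &table[i]
			fmt.Printf("%s min=%d max=%d count=%d\n", s.key, s.min, s.max, s.count)
		}
	}
}
```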
The second method is similar to the first, but instead of the custom hash map, Go's built-in map is used, mapping each key to a pointer to its aggregate values. Execution time dropped to around 5m.
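A minimal sketch of the pointer-valued map approach (the `stats` struct and `update` helper are assumptions for illustration):

```go
package main

import "fmt"

// stats holds the running aggregate for one station; temperatures are
// assumed to be stored as integer tenths of a degree.
type stats struct {
	min, max   int32
	sum, count int64
}

func main() {
	m := make(map[string]*stats)

	update := func(name string, temp int32) {
		s, ok := m[name]
		if !ok {
			// Storing a pointer means later updates mutate the struct
			// in place instead of copying it back into the map.
			m[name] = &stats{min: temp, max: temp, sum: int64(temp), count: 1}
			return
		}
		if temp < s.min {
			s.min = temp
		}
		if temp > s.max {
			s.max = temp
		}
		s.sum += int64(temp)
		s.count++
	}

	update("Hamburg", 123)
	update("Hamburg", -45)
	s := m["Hamburg"]
	fmt.Printf("Hamburg min=%d max=%d mean=%.1f\n",
		s.min, s.max, float64(s.sum)/10/float64(s.count))
}
```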
In the third method, execution is parallelized through batch processing combined with the custom hash maps. Lines are grouped into chunks whose sizes are multiples of 1024 bytes and processed concurrently. Execution time drops to 4m 45s.
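A sketch of the chunking pattern, under stated assumptions: a reader goroutine hands byte chunks (cut at line boundaries) to a pool of workers, each of which would aggregate into its own local map. The file name, chunk size, and the `processChunk` stub are illustrative, not the repo's exact code:

```go
package main

import (
	"bufio"
	"bytes"
	"io"
	"os"
	"runtime"
	"sync"
)

const chunkSize = 128 * 1024 // a multiple of 1024 bytes, as in the benchmarks

func main() {
	f, err := os.Open("measurements.txt")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	chunks := make(chan []byte, runtime.NumCPU())
	var wg sync.WaitGroup

	// One worker per logical CPU drains the chunk channel.
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for chunk := range chunks {
				processChunk(chunk)
			}
		}()
	}

	r := bufio.NewReaderSize(f, chunkSize*2)
	buf := make([]byte, chunkSize)
	var leftover []byte
	for {
		n, err := r.Read(buf)
		if n > 0 {
			data := append(append([]byte{}, leftover...), buf[:n]...)
			// Cut at the last newline so no line is split across chunks.
			if i := bytes.LastIndexByte(data, '\n'); i >= 0 {
				chunks <- data[:i+1]
				leftover = append([]byte{}, data[i+1:]...)
			} else {
				leftover = data
			}
		}
		if err == io.EOF {
			break
		} else if err != nil {
			panic(err)
		}
	}
	if len(leftover) > 0 {
		chunks <- leftover
	}
	close(chunks)
	wg.Wait()
}

// processChunk would parse "station;temperature" lines and update a
// worker-local aggregate; omitted to keep the sketch focused on chunking.
func processChunk(chunk []byte) {
	_ = chunk
}
```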
The fourth method is the same as the third, except Go's built-in maps are used. Execution time was measured for different chunk lengths and core counts:
| Cores   | 64×1024 | 128×1024 | 256×1024 |
|---------|---------|----------|----------|
| 8 cores | 3m54s   | 3m22s    | 3m23s    |
| 6 cores | 3m54s   | 3m21s    | 3m49s    |
Ideas for further optimization:
- Using buffered reads instead of scanner.Scan()
- Different methods to convert temperatures to int (see the parsing sketch below)
- Profiling the program
- Improving the hashing algorithm used in place of Go maps
- Execution time can be further reduced by running on more CPU cores to increase parallelization
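One common way to convert a 1BRC temperature to an int is to parse it by hand as integer tenths of a degree, avoiding strconv and floating point entirely. This is a sketch of that technique, not necessarily the method this repo will adopt; `parseTemp` is a hypothetical helper:

```go
package main

import "fmt"

// parseTemp converts a temperature like "-12.3" into tenths of a degree
// (-123). It assumes the 1BRC format: optional sign, then digits with
// exactly one digit after the decimal point.
func parseTemp(b []byte) int32 {
	neg := false
	if b[0] == '-' {
		neg = true
		b = b[1:]
	}
	var v int32
	for _, c := range b {
		if c == '.' {
			continue // skip the single decimal point
		}
		v = v*10 + int32(c-'0')
	}
	if neg {
		return -v
	}
	return v
}

func main() {
	fmt.Println(parseTemp([]byte("-12.3"))) // -123
	fmt.Println(parseTemp([]byte("4.7")))   // 47
}
```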