
📸 Clean your image folder with perceptual hashing and BK-trees, written in Go!


adriacabeza/go-imagecleaner


Image Cleaner 🏞🏞 ➡ 🏞


This tool takes your image gallery and creates a new folder containing one subfolder per cluster of similar-looking images. It uses a perceptual image hashing algorithm and a configurable threshold to cluster them. A possible improvement would be to bring in deep learning and cluster the images based on learned features. To cluster the images efficiently, it uses a BK-tree, since naively checking for duplicates means comparing every pair of images, which quickly becomes an O(N^2) problem.

Before



After


Image Cleaner was created at a friend's request. After a trip with friends, he had many pictures that looked alike (taken with different smartphones) and he wanted to select the best ones. He tried using fdupes to remove the exact duplicates first, but it did not work because some of the images had been sent through Google Photos, WhatsApp, etc. (different compression algorithms and sizes). After a quick search I found some Python examples like: duplicate images or Fast Near Duplicate image search, but I did not find anything similar written in Go, so here it is.

As a note, this is my first piece of code written in Go, so it is probably not as good as I'd like it to be. Any comments or improvements will be gladly received :D

Installation

To start using Image Cleaner, install Go and run go get:

go get -u github.com/adriacabeza/go-imagecleaner

This will retrieve the library.

Alternatively, if you prefer, you can use the released binary.

Usage

imagecleaner -imagesPath=IMAGE_PATH -threshold=THRESHOLD

If you do not specify a value for threshold, the default value of 10 is used.
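As an illustration of how the two flags and the default could be handled, here is a small sketch using the standard `flag` package. It is not taken from the repository: the `options` type, the `parseOptions` helper, the help strings, and the rule that `-imagesPath` is mandatory are all assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the two command-line settings (hypothetical type).
type options struct {
	imagesPath string
	threshold  int
}

// parseOptions parses args (without the program name) into options.
// The threshold falls back to 10 when the flag is not given.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("imagecleaner", flag.ContinueOnError)
	fs.StringVar(&o.imagesPath, "imagesPath", "", "folder containing the images to cluster")
	fs.IntVar(&o.threshold, "threshold", 10, "clustering threshold")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	// Assumption: running without a path is treated as an error.
	if o.imagesPath == "" {
		return o, fmt.Errorf("-imagesPath is required")
	}
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("imagesPath=%s threshold=%d\n", o.imagesPath, o.threshold)
}
```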

This will create a folder called clusters, with the images arranged into one folder per cluster. Note that the code only copies images; it does not remove them.

Example:

$ go run main.go ./cluster_utils.go ./image_utils.go -imagesPath=/Users/adria/Downloads/Photos
Starting to cluster your images from /Users/adria/Downloads/Photos
Selected 6 images
 6 / 6 [=================================================================================] 100.00% 2s
Images hashed and BK-tree created
Creating clusters
 6 / 6 [=================================================================================] 100.00% 1s
Found 3 clusters in 6 images
Clusters created
 3 / 3 [=================================================================================] 100.00% 0s
Done

Note that all clusters of size 1 (just one image) are merged into a single folder of unique images.
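The merging of single-image clusters can be sketched as follows. This is only an illustration of the idea, assuming clusters are represented as slices of file paths; `splitClusters` is a hypothetical helper, not a function from this repository.

```go
package main

import "fmt"

// splitClusters separates clusters that hold several images from
// clusters that hold a single image. The single images are pooled
// into one slice, which would become the folder of unique images.
func splitClusters(clusters [][]string) (groups [][]string, unique []string) {
	for _, c := range clusters {
		switch {
		case len(c) == 1:
			unique = append(unique, c[0])
		case len(c) > 1:
			groups = append(groups, c)
		}
	}
	return groups, unique
}

func main() {
	// Made-up file names, for illustration only.
	clusters := [][]string{
		{"beach_1.jpg", "beach_2.jpg", "beach_3.jpg"},
		{"dinner.jpg"},
		{"sunset_1.jpg", "sunset_2.jpg"},
		{"selfie.jpg"},
	}
	groups, unique := splitClusters(clusters)
	fmt.Println(len(groups), "cluster folders,", len(unique), "unique images")
}
```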

TODO

  • Try other hash functions
  • Add some testing
  • Create binary

Credits