"Exclude Forks" Additional Filter gives a result still containing some forks #166

Open
wolfenmark opened this issue Aug 29, 2023 · 4 comments
Assignees
Labels
bug Something isn't working server Concerning the server

Comments

@wolfenmark

wolfenmark commented Aug 29, 2023

Description
I retrieved from GHS a list of projects with at least 10 contributors, 100 stars, and 1000 commits, and explicitly asked GHS to exclude forks using the filter checkbox in the UI.
When checking projects in the list, there are still some that are forks (9 out of about 12,000).

Replication
These are two projects that were in the list and that you can use to reproduce the bug.
If you search for them in GHS with Exclude Forks checked, they are returned, but if clicked, they clearly show as forks on GitHub (and are also forks according to the REST API):

To replicate: search for "qaprosoft" or "anurodhp" with Exclude Forks checked. Both are returned; follow the link to the repo on GitHub and note the "forked from" line at the top.

I included two examples because anurodhp/VaxProj was renamed from anurodhp/monal, whereas qaprosoft/carina was never renamed.
Both these projects had their last commit in 2021 (March and November respectively).
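
For anyone who wants to repeat the REST API check mentioned above, here is a minimal sketch. It assumes the standard `GET /repos/{owner}/{repo}` endpoint on api.github.com, whose JSON response carries a boolean `fork` field and, for forks, a `parent` object. The helper names (`repo_api_url`, `fork_info`, `fetch_repo`) are made up for illustration, and the sample payload is a trimmed, hypothetical one rather than real output.

```python
import json
import urllib.request
from typing import Optional, Tuple

API_ROOT = "https://api.github.com/repos"


def repo_api_url(full_name: str) -> str:
    """Build the REST API URL for an 'owner/repo' name."""
    owner, repo = full_name.split("/", 1)
    return f"{API_ROOT}/{owner}/{repo}"


def fork_info(payload: dict) -> Tuple[bool, Optional[str]]:
    """Return (is_fork, parent_full_name) from a repository payload."""
    is_fork = bool(payload.get("fork", False))
    parent = payload.get("parent") or {}
    return is_fork, parent.get("full_name")


def fetch_repo(full_name: str) -> dict:
    """Fetch the repository payload (unauthenticated, so rate-limited)."""
    request = urllib.request.Request(
        repo_api_url(full_name),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical, trimmed payload used here instead of a live call.
sample = {
    "full_name": "someone/project",
    "fork": True,
    "parent": {"full_name": "upstream/project"},
}
print(fork_info(sample))
```

Calling `fork_info(fetch_repo("qaprosoft/carina"))` would perform the live check. The snippet itself only exercises the parsing step on the made-up payload.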

Other Info
Other projects I didn't manually check but that were marked as forks by my analysis (if you need more cases to investigate):

  • WorldDbs/lotus - GHS Original Name: worlddbs/lotus
  • iDevision/enhanced-discord.py - GHS Original Name: idevision/enhanced-discord.py
  • terraform-providers/terraform-provider-oci - GHS Original Name: terraform-providers/terraform-provider-oci
  • krassowski/jupyterlab-lsp - GHS Original Name: krassowski/jupyterlab-lsp
  • pliablepixels/zmNinja - GHS Original Name: pliablepixels/zmninja
  • bhj/Karaoke-Forever - GHS Original Name: bhj/karaoke-forever
  • sibirrer/lenstronomy - GHS Original Name: sibirrer/lenstronomy
  1. Casing is not relevant for renames. The "GHS Original Name" values are reported as they come out of my tool, but GitHub is case-insensitive for repo names. The only rename example is anurodhp/monal -> anurodhp/VaxProj.

  2. The list might be outdated, since I am taking it from old data that I had already analyzed, but the two examples I manually checked definitely still exhibit the problem.
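
To separate case-only mismatches from real renames in a list like the one above, a small helper is enough. This is only a sketch: `classify_name_change` is a hypothetical name, and it assumes that a plain case-folded comparison mirrors GitHub's case-insensitive handling of repository names. The pairs are taken from this issue.

```python
def classify_name_change(stored: str, actual: str) -> str:
    """Classify how a stored 'owner/repo' name differs from the current one."""
    if stored == actual:
        return "identical"
    if stored.casefold() == actual.casefold():
        return "case-only"  # same repository as far as GitHub is concerned
    return "renamed"


# (name stored in GHS, name currently shown on GitHub)
pairs = [
    ("worlddbs/lotus", "WorldDbs/lotus"),
    ("pliablepixels/zmninja", "pliablepixels/zmNinja"),
    ("sibirrer/lenstronomy", "sibirrer/lenstronomy"),
    ("anurodhp/monal", "anurodhp/VaxProj"),
]

for stored, actual in pairs:
    print(f"{stored} -> {actual}: {classify_name_change(stored, actual)}")
```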

@wolfenmark
Author

The two example repositories also show stats in GHS that are different from the ones in the actual repo on GitHub.

Examples:

  • 299 stars for anurodhp/monal in GHS vs. 1 star for anurodhp/VaxProj on GitHub (a possible reason it is no longer updated: it now falls below the 10-star criterion).
  • Same for qaprosoft/carina: 570 stars according to GHS, 4 stars now on GitHub.

@dabico
Member

dabico commented Aug 30, 2023

The former fork is still reachable under its old name, which leads me to believe that the project started under the old name as a non-fork, was deleted by the owner, and was then re-created as a fork under the same name before finally being renamed. The deletion would explain the drop in stars. However, this is all guesswork on my part, and it might be better to reach out to @anurodhp directly for a timeline of events. A clear understanding of the project's lifecycle will help us avoid instances of this in the future.

The latter project was in all likelihood deleted and then re-created as a fork. Given that it never went over 10 stars afterwards and was not updated for a long time, we could never update the information. This does open a new can of worms: what should we do with projects that drop below the star threshold after they were mined? I think the best course of action would be to devise a new "maintenance" job that periodically checks repositories that have not been updated in a very long time, refreshing stale information and removing the repository if it no longer satisfies the star criterion.
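
The proposed "maintenance" job could look roughly like the sketch below. None of this reflects how GHS is implemented or how the job will actually be built: `StoredRepo`, `run_maintenance`, the 10-star threshold, and the 180-day staleness cutoff are all assumptions made for illustration. The `fetch` callable stands in for whatever returns the live repository payload (or `None` if the repository is gone), with the `full_name`, `stargazers_count`, and `fork` fields that the GitHub REST API provides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional

MIN_STARS = 10                     # assumed star threshold
STALE_AFTER = timedelta(days=180)  # assumed staleness cutoff


@dataclass
class StoredRepo:
    name: str
    stars: int
    is_fork: bool
    last_refreshed: datetime


def run_maintenance(
    stored: Dict[str, StoredRepo],
    fetch: Callable[[str], Optional[dict]],
    now: datetime,
) -> List[str]:
    """Refresh stale entries in place and return the keys that were removed.

    `stored` is keyed by whatever identifier the store uses (hypothetical).
    """
    removed = []
    for key in list(stored):
        repo = stored[key]
        if now - repo.last_refreshed < STALE_AFTER:
            continue  # refreshed recently, nothing to do
        live = fetch(repo.name)
        if live is None or live["stargazers_count"] < MIN_STARS:
            # Gone, or no longer satisfies the star criterion: drop it.
            del stored[key]
            removed.append(key)
            continue
        # Still qualifies: overwrite the stale information.
        repo.name = live["full_name"]
        repo.stars = live["stargazers_count"]
        repo.is_fork = live["fork"]
        repo.last_refreshed = now
    return removed
```

Refreshing `is_fork` and `name` along with the star count means that a repository which was deleted and re-created as a fork, or whose stored name only differs in casing, would be corrected in the same pass.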

@dabico dabico self-assigned this Aug 30, 2023
@dabico dabico added the enhancement Improvements or additions to features label Aug 30, 2023
@dabico
Member

dabico commented Aug 30, 2023

By the way, regarding the naming mismatches: GitHub does not distinguish letter case in repository names. Given a repository ghs owned by @seart-group, the same owner would not be able to create Ghs or GHS.

Example:

All of these API links point to the same repository. My point is that there is no difference between the actual name and its lower-case variant. We used to keep all names in lowercase (due to a misunderstanding by one of the maintainers), but I have since changed it so that names are stored as displayed on GitHub. As a result, some repositories that have not been updated in a long time may still show a case mismatch between the stored and the actual name. But I guess that this will also be rectified by the proposed "maintenance" job.

@wolfenmark
Author

No issue with the names, indeed GitHub is case insensitive for repo names. I reported them as coming from my tool (didn't mean to imply a difference with the casing) to check if renames are possibly linked to the problem. The only rename example seems to be anurodhp/monal -> anurodhp/VaxProj. Added a note to the issue to clarify this.

@dabico dabico added bug Something isn't working and removed enhancement Improvements or additions to features labels Sep 18, 2023
@dabico dabico added the server Concerning the server label Jan 3, 2024