Abandonware #1397

Open
Kojoley opened this issue Apr 15, 2024 · 5 comments

@Kojoley

Kojoley commented Apr 15, 2024

Thanks for the awesome project, I really love the idea!

I think we have little understanding of the scope of abandoned software and how much of it is still present in repositories. Repology shows such packages as newest, which is technically not wrong but lacks clarity about the health of the actual software and how new it actually is. I would understand if you consider this out of the scope of the Repology project.

For example:

  • The mcrypt project has been dead since 2008 but is still present in a lot of current repositories. The worst thing - it's a cryptography library that has multiple known security vulnerabilities; downstream package maintainers do patch them, but you never know.
  • SDL Image is not a dead project. Repology refers to the SDL v1 compatible version 1.2.12, which was released in 2012; the git repository branch SDL-1.2 contains version 1.2.13 with about 100 commits since the 1.2.12 tag and includes fixes for known security vulnerabilities, but we don't know what repositories actually ship.
  • Nose has been dead since 2015 and caused a major headache when Python 3.11 broke it.

IIUC Repology already collects information about the project home page, where a release date could be found, or, in the case of a repository, the date of the last commit. Showing that information would already be a great improvement, and it could later be used to automatically flag packages as possibly obsolete/abandoned.

EOL distros like CentOS 6 (released in 2011, updates stopped in 2017, EOL since 2020), which is reported at 14% of newest packages, or Ubuntu 14.04 (released in 2014, updates stopped in 2019, EOL 2024), which is reported at 22% of newest, are probably good proxies to determine abandonware. It's actually scary how much unmaintained software could still be in use.

@captn3m0

We track some of this at endoflife.date and have repology identifiers. As an example, our tomcat page includes the repology: tomcat identifier, and the API provides EOL information for all tomcat versions.
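A minimal sketch of how EOL information like this could be interpreted once it has been downloaded. It assumes each record carries a `cycle` field and an `eol` field that is either a boolean or an ISO date string; the helper names and the sample records are made up for illustration and are not real tomcat data.

```python
from datetime import date


def is_eol(record, today):
    """Interpret an `eol` field that may be a boolean or an ISO date string."""
    eol = record.get("eol")
    if isinstance(eol, bool):
        return eol
    if isinstance(eol, str):
        return date.fromisoformat(eol) <= today
    # Missing or unexpected value: do not claim the cycle is EOL.
    return False


def eol_cycles(records, today):
    """Return the release cycles that are already end-of-life on `today`."""
    return [str(r["cycle"]) for r in records if is_eol(r, today)]


# Hypothetical records, invented for this example.
sample_records = [
    {"cycle": "3", "eol": False},
    {"cycle": "2", "eol": "2030-01-01"},
    {"cycle": "1", "eol": "2020-01-01"},
]
print(eol_cycles(sample_records, date(2024, 6, 20)))
```

Records with a missing `eol` value are treated as not EOL, so nothing is flagged without an explicit signal.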

We do track EOL dates of linux distros, and we include CPE information that can be used as identifiers, but I am not sure if that is enough to link to the repology repository easily.

@AMDmi3
Member

AMDmi3 commented Jun 20, 2024

Well I don't see much we can do here. It looks like two separate issues are covered by this issue:

Discovering upstream releases which are not yet packaged anywhere

From upstream homepages - this is a very maintenance-heavy task, as a huge corpus of regex/xpath patterns must be kept up to date, and I don't have resources for that. Some repositories have their own mechanisms for that which we could use, but that needs a lot of back end support if we're going to crawl pages on our own. If there's a ready-to-use dataset of project (or local package name) - latest discovered version pairs, it could easily be used directly, but I don't know of suitable ones (I've considered support for anitya and FreeBSD's portscout but these either are hard to match to repology projects or have lots of false positives). Anyway, there's no way that would cover a noticeable fraction of the 270k+ projects known to repology. It will cover well known projects, but these are already covered by the fact that fresh packages for new releases of these appear within hours.

From upstream releases in the form of tarballs or git tags - this looks more feasible, but there are still issues. For instance, tags have random formats, and there's no way to automatically and reliably match projects to git repos, as some repos may mention fork repos instead of upstreams, and upstreams may move. There are plans to support that in a limited way: scrape tags only from github, only in a defined set of formats, only when a project uses numeric versions, and only if one github repo is associated with a project, plus a small set of manual overrides. I believe it would provide a lot of fresh release data and won't have many false positives, but it still needs a lot of code to be written.
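The restrictions listed above could be sketched roughly as follows. This is only an illustration, not Repology code: the whitelist of tag formats, the function names and the shape of the overrides mapping are all assumptions.

```python
import re

# Hypothetical whitelist of accepted tag formats; the real set is not
# specified in this thread.
TAG_PATTERNS = [
    re.compile(r"^v?(\d+(?:\.\d+)+)$"),           # 1.2.3 or v1.2.3
    re.compile(r"^release[-_](\d+(?:\.\d+)+)$"),  # release-1.2.3
]


def version_from_tag(tag):
    """Return the numeric version in a tag, or None if the format is not whitelisted."""
    for pattern in TAG_PATTERNS:
        match = pattern.match(tag)
        if match:
            return match.group(1)
    return None


def usable_versions(tags, github_repos, overrides=None):
    """Collect versions only when exactly one GitHub repo is tied to the project.

    `overrides` is a hypothetical manual mapping of tag -> version.
    """
    if len(github_repos) != 1:
        return []
    overrides = overrides or {}
    versions = []
    for tag in tags:
        version = overrides.get(tag) or version_from_tag(tag)
        if version is not None:
            versions.append(version)
    return versions
```

Tags such as `nightly` or `v1.2.3-rc1` are dropped rather than guessed at, which trades coverage for fewer false positives.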

Marking abandonware

Regardless of how much I myself despise unmaintained software, this is opinionated, technically complex and not of much use. For some kinds of software (like maybe games and desktop apps, which would do their job until they stop compiling with recent compilers and dependencies) this does not really matter. For others, like the mentioned crypto libs, the relevant signal is conveyed through vulnerabilities. And most importantly, we may not label anything as abandoned unless we know that for sure, and as mentioned above, it's not technically possible.

Still, internally repology tracks release lifetime (the first and last time each version was seen). This info is not used for anything yet, but it could be used to convey some additional info to users.
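To make the idea concrete, here is a minimal sketch of first/last seen bookkeeping. It does not reflect how Repology stores this internally; the dictionary layout and function names are assumptions for illustration.

```python
from datetime import date


def record_sighting(lifetimes, version, seen_on):
    """Update the (first_seen, last_seen) pair kept for a version."""
    first, last = lifetimes.get(version, (seen_on, seen_on))
    lifetimes[version] = (min(first, seen_on), max(last, seen_on))


def first_seen_table(lifetimes):
    """Return (version, first_seen) rows, oldest first."""
    rows = [(version, first) for version, (first, _last) in lifetimes.items()]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical sightings of two versions of some project.
lifetimes = {}
record_sighting(lifetimes, "1.1", date(2023, 5, 1))
record_sighting(lifetimes, "1.0", date(2022, 3, 10))
record_sighting(lifetimes, "1.0", date(2024, 1, 15))
```

Note that a first seen date is when a package with that version first appeared in a tracked repository, which may be later than the actual upstream release.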

Summarizing, I see two viable ways of improvement:

  • Track upstream releases, where it's possible, accurate and does not require manual maintenance
  • Display first seen dates for versions

@Kojoley
Author

Kojoley commented Jun 20, 2024

Discovering upstream releases which are not yet packaged anywhere

I think this is orthogonal. It would be nice to have, as seeing an archived repository gives a clear signal of the project status, but it's not that important.

I've considered support for anitya and FreeBSD's portscout but these either are hard to match to repology projects or have lots of false positives

I couldn't find a dedicated issue about that. It makes no sense that release-monitoring.org doesn't map to the Fedora repository (from fedora-infra/anitya#1066 it looks like it is a 1-1 mapping).

Marking abandonware

Regardless of how much I myself despise unmaintained software, this is opinionated, technically complex and not of much use. For some kinds of software (like maybe games and desktop apps, which would do their job until they stop compiling with recent compilers and dependencies) this does not really matter. For others, like the mentioned crypto libs, the relevant signal is conveyed through vulnerabilities. And most importantly, we may not label anything as abandoned unless we know that for sure, and as mentioned above, it's not technically possible.

Yes, there are different views about that. While one may state that it is "feature complete software", that doesn't change the fact that it is unmaintained and was released years ago. Even if a distro patches it.

  • Display first seen dates for versions

I would really love a version - date table for a package. I had to crawl for GCC/LLVM version release dates before.

For releases that are older (let's say 2 years) a badge text could also include that (like 1.2.3 (2y old)). Badge "alt text" (when you hover the cursor over it or click on a touch display) could always have the version release date (like Released on xxxx-xx-xx (x days/months/years ago)).
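A rough sketch of that labelling, assuming a release date is available for the version. The function names, the 2-year default threshold and the 365-day year approximation are illustrative choices, not an existing Repology feature.

```python
from datetime import date


def age_label(version, released, today, threshold_years=2):
    """Append an age suffix to the version once it is older than the threshold."""
    age_years = (today - released).days // 365  # approximate, ignores leap days
    if age_years >= threshold_years:
        return f"{version} ({age_years}y old)"
    return version


def hover_text(released, today):
    """Build the hover text with the release date and its age in days."""
    days = (today - released).days
    return f"Released on {released.isoformat()} ({days} days ago)"


print(age_label("1.2.3", date(2020, 1, 1), date(2024, 6, 20)))
print(hover_text(date(2024, 6, 10), date(2024, 6, 20)))
```

Recent versions keep the plain version text, so only stale entries draw attention.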


Btw, what happens when all tracked repositories drop some package? Does Repology still list it as newest? Or does it disappear from Repology? It seems to be the latter, or there are no such packages yet, since https://repology.org/projects/?repos=0 shows No projects found matching the criteria.

@AMDmi3
Member

AMDmi3 commented Jun 24, 2024

couldn't find a dedicated issue about that.

There probably isn't one.

It makes no sense that release-monitoring.org doesn't map to the Fedora repository (from fedora-infra/anitya#1066 it looks like it is a 1-1 mapping).

And this is probably the reason it's not usable atm, as there currently is no way for a source to use name-project mappings from another source. I also suspect we'd have to use mappings from all repositories supported by anitya as it supports not only fedora, and this mechanism is fragile as package names may change. I also don't see a way to get all data from anitya with a single http request.

For releases that are older (let's say 2 years) a badge text could also include that (like 1.2.3 (2y old)).

I don't think it should affect badges, as badges are usually used on projects' homepages, and from there it's already apparent that the project is abandoned. Sometimes badges are used for projects dependencies though, and for that case we could support appending dates with a param, but that way it probably won't be used at all.

Instead, it looks like it would be most useful on Repology's project page.

Still, we cannot do this right away, as the mentioned release date tracking mechanism is currently incomplete and it matches versions incorrectly in some cases. Even if it's fixed, there are cases of premature/incorrect package releases which would provide incorrect dates, and to cope with that we'd need a complete backend revamp (planned with no due date, possibly never).

Badge "alt text" (when you hover the cursor over it or click on a touch display)

Repology has no control over badges alt text - it's defined by the page which embeds the badge.

Btw, what happens when all tracked repositories drop some package?

Versions come from packages, and if there are no packages there are no versions and thus no project, so it disappears from Repology. History is retained though and is still accessible if you know the project name.

@Kojoley
Author

Kojoley commented Jun 24, 2024

couldn't find a dedicated issue about that.

There probably isn't one.

I've created #1411.

For releases that are older (let's say 2 years) a badge text could also include that (like 1.2.3 (2y old)).

I don't think it should affect badges, as badges are usually used on projects' homepages, and from there it's already apparent that the project is abandoned. Sometimes badges are used for projects dependencies though, and for that case we could support appending dates with a param, but that way it probably won't be used at all.

I meant the version number with a background color as a badge on repology.org itself (like on https://repology.org/projects/ or https://repology.org/projects/?inrepo=arch). Sorry for the confusion.

Still, we cannot do this right away, as the mentioned release date tracking mechanism is currently incomplete and it matches versions incorrectly in some cases. Even if it's fixed, there are cases of premature/incorrect package releases which would provide incorrect dates, and to cope with that we'd need a complete backend revamp (planned with no due date, possibly never).

Perfect is the enemy of good. Something is better than nothing. Anitya/release-monitoring.org have a Retrieved on timestamp, which is also not a release date, but for new releases it is close enough.
