Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MusicbrainzNGS replacement #4936

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Conversation

Louson
Copy link

@Louson Louson commented Oct 6, 2023

Description

Suggestion for musicbrainz substitution in relation with discussion #4660

Tests have been ran with command tox -e py311-test

To Do

  • Documentation. (If you've add a new command-line flag, for example, find the appropriate page under docs/ to describe it.)
  • Changelog. (Add an entry to docs/changelog.rst near the top of the document.)
  • Tests. (Encouraged but not strictly required.)

@Louson Louson force-pushed the dev/louis/mbzero branch 2 times, most recently from 62900d8 to 6351d61 Compare November 7, 2023 10:52
Copy link

stale bot commented Mar 17, 2024

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 17, 2024
@Louson
Copy link
Author

Louson commented Mar 18, 2024

Hello, it is still relevant but the merge is corrupted since the massive style conformance that was made last summer. Unfortunately I have a serious lack of availability to fix it.

However the replacement library mbzero is not sufficiently experimented and would expect some review.

To move forward, we could split this PR to prepare for a replacement and drop the last commit.

I would be happy to work on that when I have time, but if anyone wants to take the lead, it's also fine.

As musicbrainzngs is still unmaintained, I would recommend to keep that PR open

@stale stale bot removed the stale label May 29, 2024
@Louson Louson marked this pull request as draft May 29, 2024 20:12
@Louson Louson force-pushed the dev/louis/mbzero branch 2 times, most recently from 4280735 to 6cbc3f2 Compare May 29, 2024 20:50
@Louson Louson changed the title DRAFT: dev/louis/mbzero dev/louis/mbzero May 29, 2024
@Louson Louson marked this pull request as ready for review May 29, 2024 20:50
@Louson
Copy link
Author

Louson commented May 29, 2024

Hello, as the library mbzero evolved (test coverage got pretty good), I updated the request to match the new style.

Tests done: py312 py312-lint format format_check

Again, read commit per commit.

Suggestions are welcome

(ping @sampsyo)

@Louson Louson changed the title dev/louis/mbzero MusicbrainzNGS replacement Jun 18, 2024
@Serene-Arc
Copy link
Contributor

Serene-Arc commented Jun 25, 2024

Thanks for the PR! First off, I made a bunch of comments on the code but there's one issue to deal with before anything else: mbzero requires Python 3.10, which is not the minimum-supported python version for the beets project. That would be 3.8, soon to be 3.9. We won't support 3.10 until October, 2025 at the earliest. This is why the tests are failing.

Do you have a solution to that? Because if not, this PR will have to be placed on hold until 2025-10. I see you're the maintainer for mbzero; is there a reason that 3.10 is your MSV?

@Louson
Copy link
Author

Louson commented Jun 25, 2024

Hello,

I saw that last week. I don't think there is any blocker, 3.10 was my python version at the time of development. Let me check on 3.8 and 3.9.

Louis Rannou added 2 commits June 25, 2024 14:54
Some of the work that was done in musicbrainzngs requires to be done in beets to
lighten the musicbrainz API.

Signed-off-by: Louis Rannou <[email protected]>
The structure of data changes a bit

Signed-off-by: Louis Rannou <[email protected]>
@Louson Louson force-pushed the dev/louis/mbzero branch 4 times, most recently from d3147cf to 267c1af Compare June 25, 2024 14:10
Copy link
Contributor

@Serene-Arc Serene-Arc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking that! There's a couple things to address with the code though

setup.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
Comment on lines 61 to 86
limit_interval = 1.0
limit_requests = 1
do_rate_limit = True


def set_rate_limit(limit_or_interval=1.0, new_requests=1):
"""Sets the rate limiting behavior of the module. Must be invoked
before the first Web service call.
If the `limit_or_interval` parameter is set to False then
rate limiting will be disabled. If it is a number then only
a set number of requests (`new_requests`) will be made per
given interval (`limit_or_interval`).
"""
global limit_interval
global limit_requests
global do_rate_limit
if isinstance(limit_or_interval, bool):
do_rate_limit = limit_or_interval
else:
if limit_or_interval <= 0.0:
raise ValueError("limit_or_interval can't be less than 0")
if new_requests <= 0:
raise ValueError("new_requests can't be less than 0")
do_rate_limit = True
limit_interval = limit_or_interval
limit_requests = new_requests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use something other than these types of global variables? It's doesn't fit with beet's OOP model. Perhaps a singleton class for the rate limiter would work?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can undertsand that ! So far the PR s a 1st step. That's a simple copy from what used to be in musicbrainzngs.

If the whole idea is fine for beets, I'm ok to find a solution. Would you have any tip for the singleton ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would look up singletons in Python, but they're pretty simple. If you don't know what they are (and sorry for being presumptive, if you do know then ignore), they are classes that can only create a single instance. You store an instance in the class data and that is the only one that is every created, which allows you to keep track of state like with these global variables, but in an OOP way.

Here is an explanation of how to create a singleton in Python. Again, ignore if you're already aware.

For the implementation, I would suggest perhaps a class that manages requests and handles the rate limiting. The RateLim decorator would be the best place, you can make it a singleton and then just use the decorator. If that doesn't work, then there can be a separate object that manages the rate inside the decorator should be fine. It's basically just trying to keep track of a counter and maybe a lock for the sake of concurrency. That's prime singleton stuff.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, as far as first steps go, this is good! My idea on PRs is that they should be complete enough to not need immediate changes. I think the global variables would be that type of thing; it's just a change that should be done, and there's no reason not to do it now rather than add them and then have to remove them in a separate PR.

beets/autotag/mb.py Outdated Show resolved Hide resolved
Comment on lines 307 to 308
useragent = "beets/{} (https://beets.io/)".format(beets.__version__)
mbi = MbInterface(useragent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More global variables. Is there a way to not have global variables for this? The user agent isn't much of a problem as a global but the MbInterface really shouldn't be one.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree. I had the idea to create an other file mbinterface.py that would replace the MbInterface class. The drawback is that we would have two files for the musicbrainz connection.

I'm not sure what's good here. Could we opt for an acceptable solution until mb.py is restructured ? That file is not really OOP btw

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that's a solution, properly. The issue is that the mbi object is being initialised in the module directly. Honestly a better solution, based on its structure and that the user-agent is a constant, we should just make a new object of the MbInterface whenever we call it, or even convert them to class functions. The latter is probably best. There's no actual reason to make an MbInterface object unless you're changing the source from the MusicBrainz servers to something else.

Even integrating it with the rate limiting code as a singleton could work. It's tricky. If we didn't offer support for other sources, I would just say make them class functions. Maybe passing in a destination object or reading that directly from the config could work.

What do you think?

Louis Rannou added 4 commits June 30, 2024 14:46
mbzero is much lighter and things have to be implemented in beets:

- there is no helper such as get_release_by_id
- no checking, so wrong requests will raise an exception
- there is nothing to limit the requests, it needs to be implemented
- mocks are moved to the interface

Signed-off-by: Louis Rannou <[email protected]>
mbzero 0.4 has support for python 3.8 and 3.9

Signed-off-by: Louis Rannou <[email protected]>
Comment on lines -1871 to +1901
"release": {
"title": releases[id_][0],
"id": id_,
"medium-list": [
{
"track-list": [
{
"id": "baz",
"recording": {
"title": "foo",
"id": "bar",
"length": 59,
},
"position": 9,
"number": "A2",
}
],
"position": 5,
}
],
"artist-credit": [
{
"artist": {
"name": releases[id_][1],
"id": "some-id",
},
}
],
"release-group": {
"id": "another-id",
},
"status": "Official",
}
"title": releases[id_][0],
"id": id_,
"media": [
{
"tracks": [
{
"id": "baz",
"recording": {
"title": "foo",
"id": "bar",
"length": 59,
},
"position": 9,
"number": "A2",
}
],
"position": 5,
}
],
"artist-credit": [
{
"artist": {
"name": releases[id_][1],
"id": "some-id",
},
}
],
"release-group": {
"id": "another-id",
},
"status": "Official",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to do a full review of the PR with these new changes, but there is one thing that could help me get through it a bit faster. Could you explain why the structure of the response dict has changes here? It just raises a red flag when the test fixture has this type of change.

It only works if everywhere in the code that expected the old structure has changed as well, which I'll need to check and can be time consuming. Because changing the tests without changing the codebase can lead to a mismatch between the tests and the actual code if the fixtures and mocks don't actually relate. Unnesting a value like this can cause big problems.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. For a better understanding, you may go commit by commit.

The structure of the response change because the library relies on the json interface instead of the xml as suggested in #4651 (comment)
The structure of the response is a bit different.

The difference is mainly :

  1. the "release" sublevel has disappeared
  2. "xxx-list" has become "xxxs"
  3. "medium-list" has become "media"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello. Eventually, I my split the work and first have a replacement that keeps the xml interface. That would leave the test. The xml is supported by mbzero anyway.
When that's fine, we can think about dropping the xml in favor of json.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks! I still need to review, my semester has hit like a truck so I have had very, very little spare time or energy to do code review. I'll get to it soon hopefully :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants