Breaking change: help test the new Spotify folders format #10

mikez · 2023-12-10T20:14:56Z

Spotify changed the cache storage tech as of 2023-11-30. The code was substantially rewritten. You're invited to test the new setup and report any issues here.

So far, I've tested it on one macOS machine. More testing is especially needed for Linux and Windows.

If you were able to run this and verify your folder structure, please give a thumbs up here 👍; also, feel free to report the operating system you tested it on.

Installation instructions (unchanged)

curl -L https://git.io/folders > /usr/local/bin/spotifyfolders
chmod +x /usr/local/bin/spotifyfolders

Note: If you haven't changed your folder hierarchy recently, it may be stored in a compressed format. In that case, you may receive additional instructions on how to install the "snappy" decompression library. You can circumvent this by making a tiny change to your folder structure (which you can immediately change back again); this will store the folder hierarchy in an uncompressed cache.

(/cc @technomorph, @Nitemice, @inferrinizzard, @chocolateboy, @AlexSWall, @sydoracle, @ShiromMakkad)

mikez · 2023-12-10T20:18:47Z

macOS with Python 3.9, 3.10, 3.11, and 3.12 works here and verified folder structure. 👍

Nitemice · 2023-12-16T06:13:20Z

Seems to be totally broken for me on Windows. With the new update, I'm only getting an error message: "No data found in the Spotify cache. ...". My client, at least, doesn't seem to be using the root level log or lbd file. I'll investigate a bit more hopefully next week.

mikez · 2023-12-16T09:28:25Z

My client, at least, doesn't seem to be using the root level log or lbd file. I'll investigate a bit more hopefully next week.

👍

@Nitemice If it's using the old format, we could fall back to that. If it helps, what's the folder structure of your windows_appdata_path or windows_store_path?

On macOS, the structure is:

~/Library/Application Support/Spotify/PersistentCache
- Storage/
- Users/
  - XXX-user/
    - primary.ldb/
- public.ldb/

Nitemice · 2024-01-04T13:31:20Z

I've looked into it a bit more, and it seems like Spotify no longer uses the custom cache directory for storing the lbd files. It's in the windows_store_path for me now, and the structure matches what you described.

However, even when I run the script without the custom cache path, it fails. I've tried following what exactly it's looking for, but I don't quite understand the pattern it's trying to match in the file. Either the files on Windows are a bit different, or there's something going wrong with the pattern string.
From just looking through the files naively, I've found a few that contains strings that look close to what we seem to be looking for, but they're incomplete compared to what the script is looking for.

Could you provide a bit of clarification on how the pattern string is constructed, if you know? Otherwise, I'm happy to provide some sample files if that would be helpful.

mikez · 2024-01-04T22:29:22Z

Thanks, @Nitemice!

if I understand you correctly, it sounds like the windows_store_path structure matches the macOS path structure. In that case, you should be able to run:

spotifyfolders --cache CACHE_DIR

where you replace CACHE_DIR with the windows_store_path.

If that doesn't work, then can you provide any sample files or directory structures you see? That would be most helpful. I'll look into it. — if you want to search for some pattern, look for "rootlist".

Nitemice · 2024-01-05T00:38:54Z

Hey @mikez,
Yes, the structure under windows_store_path matches what you indicated on macOS, and the script already looks in that location, so I don't need to specify a CACHE_DIR. That part is working fine now.

The problem now is that the SpotifyLevelDB.get() function is failing to find a log or ldb file that contains a match for the LEVELDB_ROOTLIST_KEY. Looking though the files, and stepping through manually, it looks like it should find a match, but it fails on the bytestring_less_or_equal function (L417). I'm not sure why yet.

mikez · 2024-01-05T06:43:04Z

@Nitemice Ah, I see. The LevelDB files are there, but something seems to fail—it can't seem to find the key.

I can think of 2 ways to debug this:

You sent me a zip of your file structure under Spotify's windows_store_path and I'll take a look.
You install https://github.com/google/leveldb/ (can be challenging if you're not used to building C++) which gives you access to the leveldbutil command. With that one you can dump the contents of levelDB into plaintext and seek the contents of the key that way.

As an aside: Spotify uses a greenbase.KeyComparator to decide the order of keys in the table files. This contents of this function are not public AFAIK, so I made some educated guesses. The assumptions may be wrong here.

Nitemice · 2024-01-05T12:13:41Z

I've stepped through the code and I've managed to get it working. However, I had to make two major (probably breaking) changes.

At first, I thought the issue was that the LEVELDB_ROOTLIST_KEY was being evaluated wrong, and the group separator symbol wasn't being encoded correctly, but it turned out that it's actually the opposite. It seems that there is no group separator in the rootlist key in my levelDB. Instead it's just a space char. So I've changed L24 to:

LEVELDB_ROOTLIST_KEY = b"!pl#slc# spotify:user:{}:rootlist#"

That's odd and all, but after I made that change, I found that still didn't fix it. Instead now the code wasn't even reaching the files that contained the rootlist key. bytestring_less_or_equal returns True in a bunch of cases where the keys don't match, which caused it to break the loop and just give up searching.
My initial fix was to edit bytestring_less_or_equal as below, because while I understand this behaviour when the function is used as a comparator, we're not really using it as one here:

        if byte1 == group_separator and byte2 != group_separator:
            return False
        if byte1 != group_separator and byte2 == group_separator:
            return False
        if byte1 < byte2:
            return False
        elif byte1 > byte2:
            return False

But on further review of the code, a simpler fix is to simply remove the break on L422.

I have pushed a branch with those changes for you to have a proper look: Nitemice@63a40c7

mikez · 2024-01-05T14:48:22Z

@Nitemice Nice debugging there! :)

Change 1 is very interesting, and I'd like to learn more about this—is this Windows specific, is there something else going on? I wonder if we could get others to give us data here.

Change 2 makes it so that every key in every table is traversed. This takes longer. However, it could be a fall-back option, in case no key is found.

Here's what I propose in terms of changes (for now):

have two search keys. On macOS seek the first one first, on Windows the other one first. Then flip if nothing is found.
make a regular (fast) search; if that fails, then make the slower search.

I'm assuming the greenbase.KeyComparator may need fixing. What would be really helpful, is if you can check all the keys in your tables, and see how they're ordered; I'm curious if you can find any patterns. For me, it seems to be an alphanumeric regular ordering, except for the group_separators. If you don't know how to output the keys in a ldb file (in their stored order), I can write you a script.

mikez · 2024-01-05T17:04:23Z

@Nitemice Digging some more, I've noticed the pattern of the keys here seem to be:

PREFIX {SPACER} SUFFIX

where PREFIX can be something like:

!pl#add#
!pl#changes#
!cit#cit#
etc.

and the spacers are

$
%
&
\x1d
etc.

Does the spacer encode some number, is there some logic behind it? I don't understand. Also, the ordering of the keys becomes more confusing the closer I look at it.

See #10 for details.

mikez · 2024-01-05T18:19:09Z

@Nitemice FYI, pushed a new version based on our current understanding.

Nitemice · 2024-01-06T05:45:56Z

Change 1 is very interesting, and I'd like to learn more about this

I agree, it'd be good to have more data from others to see if it's the same. If I had to guess, I'd say it's a Windows vs Unix thing, but who knows!

Change 2 makes it so that every key in every table is traversed.

I'll admit I didn't fully understand what a lot of this code was doing, and I probably still don't fully get it. But now that I've more closely examined this loop, and the bytestring_less_or_equal function, I think I get why the break is there.

I think I've found a simpler solution that shouldn't slow down the runtime much if at all. What if we modify the bytestring_less_or_equal function so that space and group-separator character are both treated as "group separators"? I've made a commit as a demo, and it seems to work well for me: Nitemice@974692e

have two search keys. On macOS seek the first one first, on Windows the other one first. Then flip if nothing is found.

I think this is a reasonable solution, but if we start finding even more alternatives, we may need to rethink it (maybe in combination with the solution above?).
Only issue with this is I tried running your updated code, and it wouldn't work for me. Apparently there isn't a reverse() function for tuples, so I had to turn the key_templates from a tuple to a list.

---    key_templates = (LEVELDB_ROOTLIST_KEY_1, LEVELDB_ROOTLIST_KEY_2)
+++    key_templates = [LEVELDB_ROOTLIST_KEY_1, LEVELDB_ROOTLIST_KEY_2]

make a regular (fast) search; if that fails, then make the slower search.

Like I mentioned, I think the above solution is cleaner and simpler. But it's really up to you.

If you don't know how to output the keys in a ldb file (in their stored order), I can write you a script.

I spent a large chunk of time over the last few days trying to dump the ldb various ways, but nothing has worked for me on Windows. If you've got a script that you think'll work (and doesn't depend on a dozen extra things), I'd be happy to give it a go.

I've noticed the pattern of the keys here seem to be:

Yeah, I think PREFIX {SPACER} SUFFIX is accurate, although each section can vary wildly. The details that seem most fixed to me are:

each prefix begins with a '!' and ends with a '#'
each suffix ends with a '#'

I've also seen the following as spacers:

\x16
'

mikez · 2024-01-06T10:23:10Z

I think I've found a simpler solution that shouldn't slow down the runtime much if at all. What if we modify the bytestring_less_or_equal function so that space and group-separator character are both treated as "group separators"? I've made a commit as a demo, and it seems to work well for me: Nitemice@974692e

I like simple solutions! :) In this specific case, this seems a bit speculative for me (and prone to false negatives), since we don't quite know yet what that "SPACER" symbol means. (Including its usage in greenbase.KeyComparator.) There may be times where it may have another function. I'd opt to be more conservative here and do the "slow search" until we have more clarity on greenbase.KeyComparator.

have two search keys. On macOS seek the first one first, on Windows the other one first. Then flip if nothing is found.

I think this is a reasonable solution, but if we start finding even more alternatives, we may need to rethink it (maybe in combination with the solution above?). Only issue with this is I tried running your updated code, and it wouldn't work for me. Apparently there isn't a reverse() function for tuples, so I had to turn the key_templates from a tuple to a list.

👍 This is an error. Thank you. I pushed a fix.

make a regular (fast) search; if that fails, then make the slower search.

Like I mentioned, I think the above solution is cleaner and simpler. But it's really up to you.

I much prefer cleaner and simpler. I see this as a transitory solution until we have clarity on how greenbase.KeyComparator works. Then we can massively clean up the code and only need one attempt.

If you don't know how to output the keys in a ldb file (in their stored order), I can write you a script.

I spent a large chunk of time over the last few days trying to dump the ldb various ways, but nothing has worked for me on Windows. If you've got a script that you think'll work (and doesn't depend on a dozen extra things), I'd be happy to give it a go.

https://gist.github.com/mikez/dcd8cd65319049e434820c3c9459cc4a

No dependencies; except for the folders.py script. Put this into the same directory as that file.
Read the description in the beginning for a basic primer on which keys are where in what order.

As for greenbase.KeyComparator: I haven't understood the ordering. Sometimes it seems alphanumeric; at other times, it seems the SPACER symbol has higher weight than the rest, even the characters before it.

Nitemice · 2024-01-07T04:28:00Z

In this specific case, this seems a bit speculative for me (and prone to false negatives)

I don't disagree. You're probably right; this is the best we can do for now, until we have a much better handle on what's going on with the key's structure and sorting.

I've run the script you provided (thanks for that!) on the file that contain the 'rootlist' keys, because really, that's the only one we care about. I've attached a anonymised/summarised version of the output.
042129_summ.txt

Basically, it looks to be in alphanumeric order, with group separators (in this case space chars) treated as greater than other chars.
We need some people on other platforms to confirm that it's the same for them, and we should be good from there.

mikez · 2024-01-07T12:23:49Z

Thank you, @Nitemice, this is very helpful.
I now figured out the spacer symbol. It encodes the bytelength of the upcoming data.
The same pattern seems to be used for all !-keys:

!{prefix}#{len}{data}#{len}{data}#…#"

Sometimes the data can contain #, ! and such symbols too, but the "len" prefix makes that clear.
I pushed an update making the code simpler.

Greenbase comparator is still unclear.

mikez · 2024-01-07T15:29:41Z

Ok, I think I figured out the comparator approximately too now. It's essentially alphanumeric, but ignores the varint {length} sections. Pushed new version. Leaving in slow_search as a backup in case DB gets corrupted.

mikez · 2024-01-10T09:15:04Z

@Nitemice Can you confirm that this works on your end?

Nitemice · 2024-01-11T10:11:27Z

@Nitemice Can you confirm that this works on your end?

Yes, working for me now. Thank you!

mikez mentioned this issue Dec 10, 2023

New Spotify Cache Broken - But found new file #9

Closed

mikez added a commit that referenced this issue Jan 5, 2024

Windows fix: Do four attempts at finding key (thx to @Nitemice)

a2e97c7

See #10 for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Breaking change: help test the new Spotify folders format #10

Breaking change: help test the new Spotify folders format #10

mikez commented Dec 10, 2023 •

edited

Loading

mikez commented Dec 10, 2023 •

edited

Loading

Nitemice commented Dec 16, 2023

mikez commented Dec 16, 2023 •

edited

Loading

Nitemice commented Jan 4, 2024

mikez commented Jan 4, 2024

Nitemice commented Jan 5, 2024 •

edited

Loading

mikez commented Jan 5, 2024

Nitemice commented Jan 5, 2024 •

edited

Loading

mikez commented Jan 5, 2024 •

edited

Loading

mikez commented Jan 5, 2024

mikez commented Jan 5, 2024

Nitemice commented Jan 6, 2024

mikez commented Jan 6, 2024 •

edited

Loading

Nitemice commented Jan 7, 2024

mikez commented Jan 7, 2024

mikez commented Jan 7, 2024

mikez commented Jan 10, 2024

Nitemice commented Jan 11, 2024

Breaking change: help test the new Spotify folders format #10

Breaking change: help test the new Spotify folders format #10

Comments

mikez commented Dec 10, 2023 • edited Loading

Installation instructions (unchanged)

mikez commented Dec 10, 2023 • edited Loading

Nitemice commented Dec 16, 2023

mikez commented Dec 16, 2023 • edited Loading

Nitemice commented Jan 4, 2024

mikez commented Jan 4, 2024

Nitemice commented Jan 5, 2024 • edited Loading

mikez commented Jan 5, 2024

Nitemice commented Jan 5, 2024 • edited Loading

mikez commented Jan 5, 2024 • edited Loading

mikez commented Jan 5, 2024

mikez commented Jan 5, 2024

Nitemice commented Jan 6, 2024

mikez commented Jan 6, 2024 • edited Loading

Nitemice commented Jan 7, 2024

mikez commented Jan 7, 2024

mikez commented Jan 7, 2024

mikez commented Jan 10, 2024

Nitemice commented Jan 11, 2024

mikez commented Dec 10, 2023 •

edited

Loading

mikez commented Dec 10, 2023 •

edited

Loading

mikez commented Dec 16, 2023 •

edited

Loading

Nitemice commented Jan 5, 2024 •

edited

Loading

Nitemice commented Jan 5, 2024 •

edited

Loading

mikez commented Jan 5, 2024 •

edited

Loading

mikez commented Jan 6, 2024 •

edited

Loading