Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests fails on follow up runs #211

Open
ReenigneArcher opened this issue Apr 25, 2024 · 9 comments
Open

tests fails on follow up runs #211

ReenigneArcher opened this issue Apr 25, 2024 · 9 comments

Comments

@ReenigneArcher
Copy link

ReenigneArcher commented Apr 25, 2024

I have the below code to test a praw reddit bot. The first test always succeeds, but follow up tests fail to find the required calls in the cassette. I discovered that it is related to the filter_access_token callback, and if I disable that, then follow up tests will succeed. Am I doing something wrong in the callback function?

from base64 import b64encode
import json
import os
from urllib.parse import quote_plus

from betamax import Betamax, cassette
from betamax_serializers.pretty_json import PrettyJSONSerializer
from praw.config import _NotSet
import pytest
from praw.models import Submission, Comment

from src.reddit.bot import Bot

Betamax.register_serializer(PrettyJSONSerializer)


def b64_string(input_string: str) -> str:
    """Return a base64 encoded string (not bytes) from input_string."""
    return b64encode(input_string.encode('utf-8')).decode('utf-8')


def filter_access_token(
        interaction: cassette.cassette.Interaction,
        current_cassette: cassette.cassette.Cassette,
):
    """Add Betamax placeholder to filter access token."""
    request_uri = interaction.data['request']['uri']
    response = interaction.data['response']

    # We only care about requests that generate an access token.
    if ('api/v1/access_token' not in request_uri or
            response['status']['code'] != 200):
        return
    body = response['body']['string']

    token_types = [
        'access_token',
        'refresh_token',
    ]
    for token_type in token_types:
        try:
            token = json.loads(body)[token_type]
        except (KeyError, TypeError, ValueError):
            continue

        # If we succeeded in finding the token, add it to the placeholders for this cassette.
        current_cassette.placeholders.append(
            cassette.cassette.Placeholder(placeholder=f'<{token_type.upper()}>', replace=token)
        )


def get_placeholders(bot: Bot) -> dict[str, str]:
    """Prepare placeholders for sensitive information."""
    filter_keys = [
        'client_id',
        'client_secret',
        'password',
        'username',
    ]

    placeholders = {
        attr: getattr(bot.reddit.config, attr)
        for attr in filter_keys}

    # Check if attributes exist and are not _NotSet before using them
    for key in placeholders:
        if isinstance(placeholders[key], _NotSet):
            placeholders[key] = ''

    # Password is sent URL-encoded.
    placeholders['password'] = quote_plus(placeholders['password'])

    # Client ID and secret are sent in base-64 encoding.
    placeholders['basic_auth'] = b64_string(
        "{}:{}".format(placeholders['client_id'],
                       placeholders['client_secret'])
    )

    return placeholders


class TestBot:
    @pytest.fixture(scope='session')
    def bot(self):
        return Bot(
            user_agent='Test suite',
        )

    @pytest.fixture(scope='session', autouse=True)
    def betamax_config(self, bot):
        with Betamax.configure() as config:
            config.cassette_library_dir = 'tests/fixtures/cassettes'
            config.default_cassette_options['serialize_with'] = 'prettyjson'
            config.before_record(callback=filter_access_token)

            # Add placeholders for sensitive information.
            for key, value in get_placeholders(bot).items():
                config.define_cassette_placeholder('<{}>'.format(key.upper()), value)

    @pytest.fixture(scope='session')
    def session(self, bot):
        http = bot.reddit._core._requestor._http
        http.headers['Accept-Encoding'] = 'identity'  # ensure response is human readable
        return http

    @pytest.fixture(scope='session')
    def recorder(self, session):
        return Betamax(session)

    def test_submission(self, bot, recorder, request):
        submission = bot.reddit.submission(id='w03cku')
        with recorder.use_cassette(request.node.name):
            assert submission.author

I get the following error.

self = <prawcore.requestor.Requestor object at 0x00000170BD94D8D0>, timeout = None, args = ('post', 'https://www.reddit.com/api/v1/access_token')
kwargs = {'auth': ('**********************', '******************************'), 'data': [('grant_type', 'password'), ('password', '****************'), ('username', '**************')], 'headers': {'Connection': 'close'}}

    def request(
        self, *args: Any, timeout: float | None = None, **kwargs: Any
    ) -> Response:
        """Issue the HTTP request capturing any errors that may occur."""
        try:
            return self._http.request(*args, timeout=timeout or self.timeout, **kwargs)
        except Exception as exc:  # noqa: BLE001
>           raise RequestException(exc, args, kwargs) from None
E           prawcore.exceptions.RequestException: error with request A request was made that could not be handled.
E
E           A request was made to https://www.reddit.com/api/v1/access_token that could not be found in test_submission.
E
E           The settings on the cassette are:
E
E               - record_mode: once
E               - match_options {'uri', 'method'}.

venv\Lib\site-packages\prawcore\requestor.py:70: RequestException
--------------------------------------------------------------------------------------------------------- Captured log call --------------------------------------------------------------------------------------------------------- 
WARNING  prawcore:sessions.py:161 Retrying due to ChunkedEncodingError(ProtocolError('Connection broken: IncompleteRead(93 bytes read, 755 more expected)', IncompleteRead(93 bytes read, 755 more expected))) status: GET https://oauth.reddit.com/comments/w03cku/

I've been trying to follow this (somewhat outdated) blog/tutorial (https://leviroth.github.io/2017-05-16-testing-reddit-bots-with-betamax/) on using betamax and I believe this is the last issue I have to solve.

Edit:
I did find that I can use

cassette.cassette.Placeholder(placeholder='*' * len(token), replace=token)

instead of

cassette.cassette.Placeholder(placeholder=f'<{token_type.upper()}>', replace=token)

and it will work... but I'm not sure that's the best solution.

@sigmavirus24
Copy link
Collaborator

    def test_submission(self, bot, recorder, request):
        submission = bot.reddit.submission(id='w03cku')
        with recorder.use_cassette(request.node.name):
            assert submission.author

Seems to me like it doesn't perform a request while betamax is recording but I don't know your bot code

@ReenigneArcher
Copy link
Author

praw creates lazy objects and doesn't make an api request until you access a property.

The issue is with the 'Content-Length' header mismatch, as the placeholder is shorter in length than the original value.

I can patch the content-length manually before playback... but I wonder if there's a better way.

@sigmavirus24
Copy link
Collaborator

You likely need the preserve exact body bytes option which I don't see you setting here explicitly and which is documented in cases like this

@ReenigneArcher
Copy link
Author

Well, I need to strip out an access token from the body... my understanding (based on the docs) is the body would remain unchanged with that option, thus exposing my access token.

@ReenigneArcher
Copy link
Author

This ended up working as a before playback callback... ideally I would like to do this before record, but I couldn't get it to work.

def patch_content_length(
    interaction: cassette.cassette.Interaction,
    current_cassette: cassette.cassette.Cassette,
):
    """Fix the Content-Length header in the response after sanitizing tokens."""
    request_uri = interaction.data['request']['uri']
    response = interaction.data['response']

    # # We only care about requests that generate an access token.
    if ('api/v1/access_token' not in request_uri or
            response['status']['code'] != 200):
        return
    body = response['body']['string']
    content_length = len(body.encode('utf-8'))
    response['headers']['Content-Length'] = [str(content_length)]

@sigmavirus24
Copy link
Collaborator

You're other option is that since you're using identity for accept encoding you ensure that whatever you use as a placeholder is the right length so whatever you replace it with is the same length otherwise you'll encounter this for everything.

@sigmavirus24
Copy link
Collaborator

https://betamax.readthedocs.io/en/latest/configuring.html#filtering-sensitive-data describes before record hook which happens after sanitization

@ReenigneArcher
Copy link
Author

Right... the problem being that when the length of the placeholder is different than the original length for what you want to replace... the "Content-Length" header no longer matches... and follow up tests will fail due to the IncompleteRead error, since it's expecting more bytes than it actually receives.

@sigmavirus24
Copy link
Collaborator

Right, there might be a way to disable that in urllib3/http.client but I barely have the time right now to answer this issue. I could review a PR to do that but not sure when I'd have a chance to kick off a release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants