Correctly handle aiohttp requests in Sentry reporting #5681

Merged
agners merged 2 commits into main from fix-aiohttp-sentry-reporting
Feb 27, 2025

Conversation

@agners agners commented Feb 27, 2025

Proposed change

The request headers appear to be a dictionary in the current Sentry SDK. The previous code failed with an exception when it tried to unpack each header. However, exceptions do not seem to be handled or printed in this filter function, so those failures were silently swallowed.

The new code has been tested to correctly sanitize and report issues during aiohttp requests.
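
To illustrate the failure mode described above, here is a minimal sketch. It assumes the old filter walked the headers as a list of [name, value] pairs; the loop shape, the function name mask_tokens_pairwise, and the header name used are hypothetical and are not taken from the Supervisor source.

```python
# Hypothetical reconstruction: the old loop shape and the header name are
# assumptions for illustration, not code from the Supervisor repository.
def mask_tokens_pairwise(headers):
    """Mask a token header, assuming headers is a list of [name, value] pairs."""
    for i, header in enumerate(headers):
        key, value = header  # fails when header is a dict key (a plain string)
        if key == "X-Supervisor-Token":
            headers[i] = [key, "XXXXXXXXXXXXXXXXXXX"]
    return headers


# A list of pairs is processed as intended.
pairs = [["Host", "example.lan"], ["X-Supervisor-Token", "secret"]]
masked = mask_tokens_pairwise(pairs)

# Iterating a dict yields its keys, and the string "Host" cannot be
# unpacked into exactly two names, so the loop raises ValueError.
try:
    mask_tokens_pairwise({"Host": "example.lan"})
    error = None
except ValueError as err:
    error = err
```

If the caller of such a filter discards exceptions, the event is lost without any log entry, which would match the behaviour reported in this PR.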

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature (which adds functionality to the supervisor)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:
  • Link to cli pull request:
  • Link to client library pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Ruff (ruff format supervisor tests)
  • Tests have been added to verify that the new code works.

If API endpoints or add-on configuration are added/changed:

Summary by CodeRabbit

  • New Features

    • Enhanced how network addresses and URLs are sanitized, ensuring that only authorized and valid hosts are used. This improvement reinforces the security of handling sensitive network data.
    • Introduced new test functions to validate the sanitation process for URLs and hostnames, ensuring sensitive information is properly filtered.
  • Bug Fixes

    • Improved assertions in tests to ensure that sensitive information is not present in filtered request data.
  • Refactor

    • Updated the processing of request headers by adopting a more straightforward approach, resulting in clearer logic and greater efficiency when managing sensitive header information.

@agners agners added the bugfix A bug fix label Feb 27, 2025
coderabbitai bot commented Feb 27, 2025

📝 Walkthrough

Walkthrough

A new function, sanitize_host, has been introduced to validate and sanitize host strings based on whether they fall within a designated Docker network mask. The sanitize_url function now leverages this new utility by extracting and validating the host rather than replacing it with a fixed domain. Additionally, the header processing in the filter_data function has been refactored from an index-based loop to direct dictionary access for headers such as REFERER, HEADER_TOKEN, HEADER_TOKEN_OLD, HOST, and X_FORWARDED_HOST.

Changes

Summary of changes by file

File: supervisor/misc/filter.py
  • Added sanitize_host function for host validation.
  • Updated sanitize_url to use sanitize_host for extracting and reconstructing URLs.
  • Refactored filter_data to use direct dictionary access for header sanitization (handles REFERER, HEADER_TOKEN, HEADER_TOKEN_OLD, HOST, X_FORWARDED_HOST).

File: tests/misc/test_filter_data.py
  • Added sample events SAMPLE_EVENT_AIOHTTP_INTERNAL and SAMPLE_EVENT_AIOHTTP_EXTERNAL.
  • Renamed test_sanitize to test_sanitize_user_hostname and added test_sanitize_internal.

File: tests/misc/test_sanitise_url.py
  • Added test_sanitize_host function to verify sanitize_host behavior.
  • Updated assertions in test_sanitize_url to reflect changes in expected outputs based on new sanitize_host logic.

Sequence Diagram(s)

sequenceDiagram
    participant C as Caller
    participant S as sanitize_url
    participant H as sanitize_host
    
    C->>S: Call sanitize_url(URL)
    S->>S: Validate URL using regex pattern
    S->>H: Call sanitize_host(extracted host)
    H-->>S: Return validated host or default
    S->>C: Return reconstructed URL
sequenceDiagram
    participant C as Caller
    participant F as filter_data
    
    C->>F: Call filter_data(headers)
    F->>F: Check & sanitize 'REFERER'
    F->>F: Check & sanitize 'HEADER_TOKEN'
    F->>F: Check & sanitize 'HEADER_TOKEN_OLD'
    F->>F: Check & sanitize 'HOST'
    F->>F: Check & sanitize 'X_FORWARDED_HOST'
    F-->>C: Return modified headers

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9945277 and fe30643.

📒 Files selected for processing (3)
  • supervisor/misc/filter.py (2 hunks)
  • tests/misc/test_filter_data.py (2 hunks)
  • tests/misc/test_sanitise_url.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • supervisor/misc/filter.py
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Build armv7 supervisor
  • GitHub Check: Build armhf supervisor
  • GitHub Check: Build aarch64 supervisor
  • GitHub Check: Run tests Python 3.13.2
🔇 Additional comments (7)
tests/misc/test_sanitise_url.py (3)

3-3: New host sanitization function import aligned with implementation.

The addition of sanitize_host import reflects the implementation of a new function dedicated to host validation and sanitization, which is good modular design.


6-8: Clean and focused test for the new sanitize_host function.

The test verifies the core functionality of the new host sanitization mechanism by ensuring external hostnames are properly sanitized to the expected placeholder domain.


14-22: Test assertions properly updated to reflect new sanitization approach.

The updated assertions correctly verify that the sanitize_url function now uses the sanitize_host function to sanitize just the hostname part while preserving path and query parameters. This is a more robust approach than the previous implementation.

tests/misc/test_filter_data.py (4)

21-41: Good test fixture for internal network requests.

This detailed sample event properly models aiohttp requests originating from within the internal Docker network (172.30.32.x). The inclusion of various headers provides good test coverage for filtering internal requests.


42-74: Comprehensive test fixture for external requests with sensitive data.

This fixture thoroughly represents an external request with potential PII in hostname and headers. The inclusion of multiple header types that might contain the hostname (Host, Origin, Referer, X-Forwarded-Host) ensures robust testing of the sanitization process.


138-155: Properly renamed and focused test for user hostname sanitization.

The test has been appropriately renamed to be more specific about what it's testing. The assertions now verify that the user's hostname ("debian-supervised-dev.lan") is properly sanitized from URL and various headers, which directly addresses the main issue this PR aims to fix.


157-166: Important test case for verifying internal requests remain unchanged.

This new test ensures that requests from within the Docker network aren't unnecessarily sanitized, which is critical for maintaining proper functionality. The assertion that the filtered event equals the original event confirms this behavior.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ddee79d and e47a8e8.

📒 Files selected for processing (1)
  • supervisor/misc/filter.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Run tests Python 3.13.2
  • GitHub Check: Build armv7 supervisor
  • GitHub Check: Build armhf supervisor
  • GitHub Check: Build aarch64 supervisor
🔇 Additional comments (4)
supervisor/misc/filter.py (4)

3-3: Good addition for IP address handling.

Adding the ipaddress module is necessary for the new host validation functionality.


10-10: Appropriate import addition for Docker network validation.

Including DOCKER_NETWORK_MASK is necessary for filtering internal vs. external hosts.


17-28: Well-implemented host sanitization function.

This function effectively identifies and preserves internal IP addresses within the Docker network while sanitizing external hosts. The try-except block properly handles non-IP hostnames.
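
The behaviour described in this comment can be sketched as follows. This is not the merged implementation: the network value 172.30.32.0/23 and the constant name PLACEHOLDER_HOST are assumptions, and the placeholder string is the one that appears in the tests discussed in this PR.

```python
import ipaddress

# Assumed value for illustration; the real DOCKER_NETWORK_MASK is defined
# in the Supervisor source and may differ.
DOCKER_NETWORK_MASK = ipaddress.ip_network("172.30.32.0/23")
# Placeholder host seen in the tests discussed in this PR.
PLACEHOLDER_HOST = "sanitized-host.invalid"


def sanitize_host(host: str) -> str:
    """Keep addresses inside the Docker network, replace everything else."""
    try:
        if ipaddress.ip_address(host) in DOCKER_NETWORK_MASK:
            return host
    except ValueError:
        # Not an IP address at all (for example a DNS name): fall through.
        pass
    return PLACEHOLDER_HOST
```

Under these assumptions, any input that does not parse as an IP address inside the network is replaced by the placeholder, including names such as localhost.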


32-39: Improved URL sanitization with contextual awareness.

The updated implementation is more intelligent than the previous approach. It now preserves internal hosts while sanitizing external ones, which maintains security while improving functionality.
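
A minimal sketch of this flow (match the URL, sanitize only the host, rebuild the URL) is shown below. The regular expression, the network value, and the compact sanitize_host stand-in are all assumptions made for illustration and are not copied from supervisor/misc/filter.py.

```python
import ipaddress
import re

# All values below are assumptions made for this sketch.
DOCKER_NETWORK_MASK = ipaddress.ip_network("172.30.32.0/23")
PLACEHOLDER_HOST = "sanitized-host.invalid"
# Hypothetical pattern: scheme, host, then everything else (port, path, query).
RE_URL = re.compile(r"^(\w+://)([^/:?#]+)(.*)$")


def sanitize_host(host: str) -> str:
    """Compact stand-in: keep Docker-network addresses, replace the rest."""
    try:
        if ipaddress.ip_address(host) in DOCKER_NETWORK_MASK:
            return host
    except ValueError:
        pass
    return PLACEHOLDER_HOST


def sanitize_url(url: str) -> str:
    """Replace only the host part of a URL, keeping scheme, port, path, query."""
    match = RE_URL.match(url)
    if not match:
        # Not recognised as a URL: return the input unchanged.
        return url
    scheme, host, rest = match.groups()
    return f"{scheme}{sanitize_host(host)}{rest}"
```

The simple pattern in this sketch does not handle URLs with embedded credentials (user:password@host); a production version would need to account for that.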

Comment on lines 129 to 142
headers = event["request"].get("headers", {})
if isinstance(headers, dict):
    if hdrs.REFERER in headers:
        headers[hdrs.REFERER] = sanitize_url(headers[hdrs.REFERER])
    if HEADER_TOKEN in headers:
        headers[HEADER_TOKEN] = "XXXXXXXXXXXXXXXXXXX"
    if HEADER_TOKEN_OLD in headers:
        headers[HEADER_TOKEN_OLD] = "XXXXXXXXXXXXXXXXXXX"
    if hdrs.HOST in headers:
        headers[hdrs.HOST] = sanitize_host(headers[hdrs.HOST])
    if hdrs.X_FORWARDED_HOST in headers:
        headers[hdrs.X_FORWARDED_HOST] = sanitize_host(
            headers[hdrs.X_FORWARDED_HOST]
        )

⚠️ Potential issue

Fixed header handling to prevent exceptions.

This change directly addresses the PR objective by properly checking if headers is a dictionary before attempting to access it. The previous implementation likely caused exceptions when headers wasn't a dictionary, which is now handled correctly.

The code also improves header sanitization by:

  1. Sanitizing URLs in the REFERER header
  2. Masking token headers
  3. Sanitizing host headers using the new function

@agners agners changed the title Correctly handle aiohttp requests Correctly handle aiohttp requests in Sentry reporting Feb 27, 2025
@agners agners force-pushed the fix-aiohttp-sentry-reporting branch from e47a8e8 to 9da4dca Compare February 27, 2025 11:12
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (5)
tests/misc/test_sanitise_url.py (3)

3-4: Update the docstring to match the actual module path.

The function docstrings mention "supervisor.utils" but the imports show the functions are in "supervisor.misc.filter". The docstrings should be updated to match the actual module path.


6-8: Add more test cases for sanitize_host.

The test only checks one case (external hostname). Consider adding test cases for:

  • Internal IP addresses that should not be sanitized
  • IP addresses outside the Docker network
  • Edge cases like localhost, empty strings, or None values
def test_sanitize_host():
    """Test supervisor.misc.filter.sanitize_host."""
    # External hostnames should be sanitized
    assert sanitize_host("my.duckdns.org") == "sanitized-host.invalid"
    # Internal IPs should not be sanitized
    assert sanitize_host("172.30.32.1") == "172.30.32.1"
    # Edge cases
    assert sanitize_host("localhost") == "localhost"
    assert sanitize_host("") == ""

14-22: Tests look good but could be more comprehensive.

The updated tests correctly verify that external hostnames are sanitized in various URL formats. Consider adding test cases for URLs with internal IP addresses that should not be sanitized.

tests/misc/test_filter_data.py (2)

157-166: Consider making internal test assertions more specific.

While the assertion that the filtered event equals the original event is valid, more specific assertions about what shouldn't be changed would make the test intent clearer and more robust against future changes.

-    assert filtered == event
+    # Verify that internal URLs and headers remain unchanged
+    assert filtered["request"]["url"] == event["request"]["url"]
+    assert filtered["request"]["headers"]["Host"] == event["request"]["headers"]["Host"]
+    # Verify the entire structure remains unchanged
+    assert filtered == event

21-74: Add test cases for error handling.

Given that the PR is addressing exceptions when unpacking headers, consider adding test cases for scenarios where headers are missing or have unexpected formats to ensure the error handling works correctly.

def test_sanitize_with_missing_headers(coresys):
    """Test event sanitation with missing headers."""
    # Create a copy of the event with incomplete headers
    event = SAMPLE_EVENT_AIOHTTP_EXTERNAL.copy()
    event["request"] = event["request"].copy()
    event["request"]["headers"] = {}  # Empty headers
    
    coresys.config.diagnostics = True
    coresys.core.state = CoreState.RUNNING
    
    with patch("shutil.disk_usage", return_value=(42, 42, 2 * (1024.0**3))):
        # This should not raise an exception
        filtered = filter_data(coresys, event, {})
    
    # Verify basic structure is maintained
    assert "request" in filtered
    assert "headers" in filtered["request"]

Also applies to: 138-166

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9da4dca and 9945277.

📒 Files selected for processing (2)
  • tests/misc/test_filter_data.py (2 hunks)
  • tests/misc/test_sanitise_url.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Build i386 supervisor
  • GitHub Check: Build armv7 supervisor
  • GitHub Check: Build armhf supervisor
  • GitHub Check: Build aarch64 supervisor
  • GitHub Check: Run tests Python 3.13.2
🔇 Additional comments (2)
tests/misc/test_filter_data.py (2)

21-74: Well-structured test data for both internal and external scenarios.

The sample event dictionaries are comprehensive and provide good test cases for the filter function. They include realistic headers and URL patterns that would be encountered in production.


138-155: Test correctly verifies hostname sanitization in multiple headers.

The test properly checks that the user hostname is sanitized from the URL, Host header, Referer header, and X-Forwarded-Host header.

@agners agners force-pushed the fix-aiohttp-sentry-reporting branch from 9945277 to fe30643 Compare February 27, 2025 14:46
@agners agners merged commit c5d4ebc into main Feb 27, 2025
20 checks passed
@agners agners deleted the fix-aiohttp-sentry-reporting branch February 27, 2025 14:54
@github-actions github-actions bot locked and limited conversation to collaborators Mar 1, 2025