
Add -v/--verbose flag for detailed HTTP request logging (issue #27) #48

Open
Tahoora-Tabassum wants to merge 24 commits into dbpedia:main from Tahoora-Tabassum:pr-verbose-clean

Conversation

@Tahoora-Tabassum commented Jan 7, 2026

Description

This PR introduces a -v/--verbose CLI flag for the databusclient, enabling detailed HTTP request and response logging during interactions with Databus and Vault. This helps users debug network or authentication issues, trace requests, and inspect API communication without changing default behavior.

Key Features

  • Added top-level -v / --verbose CLI option for all commands.
  • Detailed HTTP logging via Python’s logging module (urllib3), active only when -v is used.
  • Default behavior remains silent; sensitive data is not exposed.
  • Tests updated to remove network dependency and validate verbose behavior.
  • Documentation updated to include verbose usage examples.
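
The default-silent wiring described above might look roughly like the following minimal sketch, assuming urllib3 as the HTTP layer; the helper name and log format are illustrative, not the PR's actual code:

```python
import logging

def configure_http_logging(verbose: bool) -> None:
    # Default behavior stays silent; only -v/--verbose turns on HTTP logging.
    if not verbose:
        return
    logger = logging.getLogger("urllib3")
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid stacking handlers across repeated CLI runs
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(name)s %(levelname)s: %(message)s"))
        logger.addHandler(handler)
```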

Related Issues

Resolves #27

Type of change

  • New feature (non-breaking)
  • Documentation update

Checklist

  • Code follows project style guidelines.
  • Self-reviewed code changes.
  • Tests added/updated for verbose behavior.
  • Documentation updated.
  • All unit tests pass (poetry run pytest).
  • No linting errors (poetry run ruff check).

Summary

  • New -v/--verbose flag for detailed HTTP/Vault request logging.
  • Tests adjusted to stabilize auth-related behavior.
  • CLI documentation updated with usage examples.

Summary by CodeRabbit

  • New Features

    • Added global -v/--verbose CLI flag enabling HTTP request/response logging with automatic redaction of sensitive headers
    • Added mkdist command to create distribution strings from URLs and content variant key-value pairs
    • Added shell completion instruction support for bash, zsh, fish, and powershell shells
  • Bug Fixes

    • Updated CLI flag naming convention for consistency


coderabbitai bot commented Jan 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Introduces a -v/--verbose CLI flag enabling redacted HTTP request/response logging for databusclient. The feature propagates verbose mode to API calls, instruments download and Vault token exchange flows to emit HTTP details when verbose mode is active, and includes unit tests asserting logging and redaction behavior. Also adds new mkdist and completion CLI commands.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Documentation & Metadata | PR_BODY.md | Documents the verbose logging feature implementation, redaction behavior, and security considerations. |
| CLI Commands | databusclient/cli.py | Adds mkdist command for creating distribution strings with CV validation, SHA-length parsing, and deterministic ordering; adds completion command for shell completion instructions. |
| Verbose Logging Tests | tests/test_cli_verbose.py, tests/test_utils_verbose.py, tests/test_download_auth.py | Comprehensive test coverage for verbose logging behavior: validates HTTP request/response logging in CLI and utility functions, ensures sensitive headers and tokens are redacted, covers Vault token exchange and SPARQL query flows. |
| CLI Feature Tests | tests/test_cli.py | Tests for mkdist command including CV sorting, validation of CV format and SHA-length format, and completion command output verification. |
| Test Configuration | test.sh | Updates CLI flag from --version-id to --versionid. |
| Minor Whitespace & Formatting | databusclient/api/download.py, tests/test_download.py | Whitespace-only changes around existing code paths; no functional modifications. |

Sequence Diagram

sequenceDiagram
    participant CLI as CLI User
    participant Client as Databus Client
    participant Logger as Logger
    participant HTTP as HTTP Client
    participant API as Databus API

    CLI->>Client: Invoke with -v flag
    Client->>Client: Enable verbose mode
    Client->>HTTP: Prepare HTTP request
    HTTP->>API: Send request (Authorization header)
    API-->>HTTP: Response
    HTTP-->>Logger: Log request/response details
    Logger->>Logger: Redact sensitive headers<br/>(Authorization, X-API-KEY)
    Logger-->>Client: Logging complete
    Client-->>CLI: Return result with verbose logs

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • PR #40: Modifies the Vault token exchange and download authentication flow that this PR instruments for verbose, redacted HTTP logging.
  • PR #12: Adds vault token exchange and extends download functionality that this PR augments with verbose logging and redaction instrumentation.

Suggested reviewers

  • Integer-Ctrl
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Out of Scope Changes check | ⚠️ Warning | Changes in cli.py include two new commands (mkdist and completion) unrelated to the verbose logging feature described in issue #27 and PR objectives. | Remove mkdist and completion command implementations from this PR as they are out-of-scope for the verbose logging feature. Address them in separate PRs. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 54.29%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change, adding a -v/--verbose flag for HTTP request logging, and references issue #27. |
| Description check | ✅ Passed | The description covers key features, objectives, and includes a related issue reference. Type of change is identified and checklist items are completed. |
| Linked Issues check | ✅ Passed | The PR implements issue #27's requirement to add a -v/--verbose CLI flag enabling detailed HTTP request/response logging for debugging Databus and Vault interactions. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.



coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
README.md (1)

318-353: Remove unresolved merge conflict markers.

The documentation contains unresolved merge conflict markers (<<<<<<< HEAD, =======, >>>>>>> upstream/main) that will break the rendered documentation and must be resolved before merging.

📍 Locations of merge conflicts

Additional conflict markers appear at:

  • Lines 370-385
  • Lines 421-438

Please resolve all conflicts by:

  1. Reviewing both versions of the conflicting sections
  2. Choosing the correct content or merging both appropriately
  3. Removing all conflict markers
  4. Ensuring documentation consistency
tests/test_download_auth.py (1)

1-13: Remove duplicate imports.

The file contains duplicate imports that should be consolidated:

  • patch is imported on both line 1 and line 11
  • DownloadAuthError is imported on both line 10 and line 13
🔧 Proposed fix
 from unittest.mock import Mock, patch
 
 import pytest
 
 import requests
 import logging
 
 import databusclient.api.download as dl
 
-from databusclient.api.download import VAULT_REQUIRED_HOSTS, DownloadAuthError
-from unittest.mock import patch
-
-from databusclient.api.download import download, DownloadAuthError
+from databusclient.api.download import VAULT_REQUIRED_HOSTS, DownloadAuthError, download
🤖 Fix all issues with AI agents
In @databusclient/api/utils.py:
- Around line 58-68: The _redact_headers function currently uses
key.startswith("x-api-key") which unintentionally matches headers like
"x-api-key-extra"; change that logic to check for an exact header name match
(compare key == "x-api-key") alongside the existing authorization check so only
the intended "X-API-KEY" header is redacted; update the condition in
_redact_headers to use equality for the lowercase key and keep the rest of the
function intact.

In @README.md:
- Around line 524-564: The README contains duplicated "mkdist command" and
"Completion" sections; remove the second duplicate block (the repeated "mkdist
command" and its following "Completion" snippet) so only the first occurrence
remains, ensuring the remaining section titles "mkdist command" and "Completion"
are preserved exactly.

In @test.sh:
- Line 4: Replace the incorrect CLI flag usage `--versionid` in test.sh with the
correct `--version-id` to match the flag defined in databusclient/cli.py; also
search README.md for any leftover merge-conflict or inconsistent occurrences of
`--versionid`/`--version-id` and normalize them to `--version-id` so docs and
tests use the exact flag name declared in the CLI.
🧹 Nitpick comments (6)
README.md (1)

323-327: Add language specifiers to code blocks.

Several code blocks are missing language specifiers for proper syntax highlighting. This is flagged by markdownlint and reduces documentation quality.

♻️ Suggested improvement

Add appropriate language identifiers to code blocks. For example:

-```
+```bash
 databusclient deploy --versionid https://databus.dbpedia.org/...

Additional locations missing language specifiers:
- Lines 529, 534, 541, 550, 555, 562

Use `bash` or `shell` for command-line examples.

databusclient/api/utils.py (1)

71-71: Move import to top of file.

The import logging statement should be at the top of the file with other imports, per PEP 8 conventions.

Proposed fix

Move the import to the top of the file:
 from typing import Optional, Tuple
 
 import requests
+import logging

And remove line 71.

databusclient/cli.py (1)

274-277: Consider using Click's default parameter for consistency.

The default value "bash" is specified in the function signature rather than in the click.argument decorator. While this works, using Click's default parameter would be more idiomatic and would show the default in help text.

Proposed fix
 @app.command()
-@click.argument("shell", type=click.Choice(["bash","zsh","fish","powershell"]), required=False)
-def completion(shell="bash"):
+@click.argument("shell", type=click.Choice(["bash","zsh","fish","powershell"]), default="bash")
+def completion(shell):
     click.echo(f"Run: eval \"$(_DATABUSCLIENT_COMPLETE=source_{shell} python -m databusclient)\"")
databusclient/api/download.py (2)

75-83: Consider importing log_http at module level to reduce repetition.

The from databusclient.api.utils import log_http statement is repeated inside each conditional block. Moving it to the module-level imports would reduce duplication and slightly improve performance by avoiding repeated import lookups.

Proposed fix

Add to the imports at the top:

 from databusclient.api.utils import (
     fetch_databus_jsonld,
     get_databus_id_parts_from_file_url,
+    log_http,
 )

Then remove all the inline from databusclient.api.utils import log_http statements throughout the file.


304-311: Hardcoded status=200 in SPARQL logging may not reflect actual status.

The log_http call after the SPARQL query uses a hardcoded status=200. While this is generally true on success (since exceptions would be raised on failure), it's not guaranteed to be accurate. SPARQLWrapper doesn't easily expose the actual HTTP status code.

Consider either:

  • Omitting the status parameter for the response log
  • Adding a comment explaining the assumption
Proposed fix (omit status)
     if verbose:
         from databusclient.api.utils import log_http

-        log_http("POST", endpoint_url, req_headers={"X-API-KEY": databus_key} if databus_key is not None else None, status=200)
+        # Note: actual status not available from SPARQLWrapper; logged as successful since no exception was raised
+        log_http("POST", endpoint_url, req_headers={"X-API-KEY": databus_key} if databus_key is not None else None)
tests/test_cli_verbose.py (1)

27-38: Patch targets could be more isolated for better test robustness.

The current patches at requests.head and requests.get work correctly, since the test invokes only the download command and all mocked responses are configured. However, the patches affect the entire requests module globally. Since download.py uses requests directly and also calls utils.py (which also imports requests), patching at databusclient.api.download.requests.head and databusclient.api.download.requests.get would provide better test isolation and avoid unintended interactions if other modules using requests are imported in the future.
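
A minimal sketch of the narrower patch targets suggested here; the mocked attributes and the placeholder call are illustrative, not the suite's actual fixtures:

```python
from unittest.mock import patch

# Patch requests as seen by the download module only, leaving other modules'
# use of requests untouched.
with patch("databusclient.api.download.requests.head") as mock_head, patch(
    "databusclient.api.download.requests.get"
) as mock_get:
    mock_head.return_value.status_code = 200
    mock_get.return_value.status_code = 200
    mock_get.return_value.iter_content.return_value = [b"data"]
    # ...invoke the download command under test here...
```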

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9901fca and faadd58.

📒 Files selected for processing (12)
  • CHANGELOG.md
  • PR_BODY.md
  • README.md
  • databusclient/api/download.py
  • databusclient/api/utils.py
  • databusclient/cli.py
  • file.txt
  • test.sh
  • tests/test_cli.py
  • tests/test_cli_verbose.py
  • tests/test_download_auth.py
  • tests/test_utils_verbose.py
🧰 Additional context used
🧬 Code graph analysis (5)
databusclient/cli.py (1)
databusclient/api/deploy.py (1)
  • create_distribution (176-207)
tests/test_cli.py (1)
databusclient/cli.py (1)
  • app (18-32)
tests/test_cli_verbose.py (2)
tests/test_download_auth.py (1)
  • iter_content (22-26)
databusclient/cli.py (1)
  • app (18-32)
tests/test_utils_verbose.py (3)
databusclient/api/download.py (3)
  • download (669-797)
  • _get_sparql_query_of_collection (249-276)
  • _query_sparql_endpoint (279-311)
databusclient/api/utils.py (1)
  • fetch_databus_jsonld (33-55)
tests/conftest.py (1)
  • convert (24-25)
databusclient/api/download.py (1)
databusclient/api/utils.py (2)
  • log_http (74-103)
  • fetch_databus_jsonld (33-55)
🪛 markdownlint-cli2 (0.18.1)
README.md

323-323: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


327-327: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


529-529: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


534-534: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


541-541: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


550-550: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


555-555: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


559-559: Multiple headings with the same content

(MD024, no-duplicate-heading)


562-562: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (17)
CHANGELOG.md (1)

1-7: LGTM!

The changelog entry clearly documents the new verbose feature, header redaction behavior, and security considerations. The format is standard and informative.

PR_BODY.md (1)

1-19: LGTM!

The PR description is well-structured and comprehensive. It clearly explains the feature, changes, motivation, and security considerations.

README.md (1)

185-186: LGTM!

The verbose flag documentation is clear and appropriately includes security warnings about redacted headers and avoiding verbose in public CI logs.

tests/test_download_auth.py (1)

110-137: LGTM!

The test effectively validates the verbose logging feature with proper token redaction. The test setup appropriately mocks the HTTP auth flow, and assertions correctly verify that:

  1. HTTP logging is active when verbose is enabled
  2. Sensitive headers are redacted
  3. Actual token values don't leak into logs
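
In outline, such a caplog-based check can be written as below; the logger name and message format are assumptions for illustration, not the suite's actual code:

```python
import logging

def test_verbose_redaction_sketch(caplog):
    # Capture DEBUG records from the client logger, then assert on the text.
    with caplog.at_level(logging.DEBUG, logger="databusclient"):
        logging.getLogger("databusclient").debug(
            "[HTTP] GET https://example.org headers=%s",
            {"Authorization": "REDACTED"},
        )
    assert "[HTTP] GET" in caplog.text             # logging is active
    assert "REDACTED" in caplog.text               # sensitive header is masked
    assert "real-secret-token" not in caplog.text  # raw token never appears
```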
tests/test_cli.py (4)

5-19: LGTM! Good test coverage for mkdist happy path.

The test correctly validates:

  • Multiple --cv options are accepted
  • Keys are sorted alphabetically in output (a=1_b=2 not b=2_a=1)
  • All components (format, compression, sha-length) are correctly assembled

22-26: LGTM! Validates error handling for malformed content variants.


29-35: LGTM! Validates sha-length format enforcement.


38-42: LGTM! Basic validation for shell completion output.

tests/test_utils_verbose.py (4)

12-22: LGTM! Well-structured mock response helper.

The make_response helper correctly simulates a requests.Response object with status_code, headers, text, and raise_for_status behavior.


25-34: LGTM! Validates API key redaction in verbose logging.

The test correctly verifies that:

  • HTTP activity is logged with [HTTP] GET
  • Sensitive data is replaced with REDACTED
  • The actual secret value SECRET never appears in logs

38-47: LGTM! Validates redaction in SPARQL collection query flow.


51-76: LGTM! Good mock implementation for SPARQLWrapper testing.

The MockSPARQL class adequately simulates the SPARQLWrapper interface to test verbose logging around SPARQL POST operations.

databusclient/api/utils.py (1)

74-103: LGTM! Robust HTTP logging with graceful fallbacks.

The log_http function handles various response header formats defensively, which is appropriate for logging utilities that shouldn't break on unexpected input types. The fallback chain for handling Mock objects and different response types is well thought out.
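
Such a fallback chain can be sketched as follows; a hypothetical helper, not the function's actual body:

```python
def _headers_to_dict(headers):
    # Logging must never raise because of an odd header container, so
    # tolerate None, plain dicts, requests' CaseInsensitiveDict, and Mocks.
    if headers is None:
        return {}
    try:
        return dict(headers)
    except (TypeError, ValueError):
        return {}
```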

databusclient/cli.py (2)

15-32: LGTM! Well-implemented verbose flag with proper logging setup.

The verbose flag is correctly:

  • Defined at the group level for global availability
  • Stored in Click context for propagation to subcommands
  • Configured with a handler only if none exists (prevents duplicate handlers)
  • Set to DEBUG level when enabled
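
In outline, that pattern typically looks like the sketch below; logger and option names are illustrative, not the file's exact code:

```python
import logging

import click

@click.group()
@click.option("-v", "--verbose", is_flag=True, help="Enable detailed HTTP logging.")
@click.pass_context
def app(ctx, verbose):
    # Store the flag on the Click context so subcommands can read it.
    ctx.ensure_object(dict)
    ctx.obj["verbose"] = verbose
    if verbose:
        logger = logging.getLogger("databusclient")
        if not logger.handlers:  # add a handler only if none exists
            logger.addHandler(logging.StreamHandler())
        logger.setLevel(logging.DEBUG)
```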

234-271: LGTM! Well-structured mkdist command with proper validation.

The command properly:

  • Validates content variant format and forbidden characters
  • Prevents duplicate keys
  • Validates sha-length with a strict regex pattern
  • Ensures deterministic output via sorted keys
  • Supports both plain string and JSON output formats
databusclient/api/download.py (2)

9-11: LGTM! Module-level logger setup.

The logger is correctly configured at the module level, allowing consistent logging behavior across all functions.


669-797: LGTM! Consistent verbose flag propagation throughout download flows.

The verbose parameter is correctly accepted by the main download() function and propagated to all sub-functions across all code paths (collections, files, versions, artifacts, groups, and SPARQL queries).

Comment on lines 58 to 68
def _redact_headers(headers):
    if not headers:
        return headers
    redacted = {}
    for k, v in headers.items():
        key = k.lower()
        if key == "authorization" or key.startswith("x-api-key"):
            redacted[k] = "REDACTED"
        else:
            redacted[k] = v
    return redacted

⚠️ Potential issue | 🟡 Minor

Issue with x-api-key header matching logic.

The condition key.startswith("x-api-key") will match x-api-key exactly but also any header starting with that prefix (e.g., x-api-key-extra). This is likely unintended. Additionally, the header name is X-API-KEY (exact match), so startswith seems overly broad.

Proposed fix
 def _redact_headers(headers):
     if not headers:
         return headers
     redacted = {}
     for k, v in headers.items():
         key = k.lower()
-        if key == "authorization" or key.startswith("x-api-key"):
+        if key == "authorization" or key == "x-api-key":
             redacted[k] = "REDACTED"
         else:
             redacted[k] = v
     return redacted
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 def _redact_headers(headers):
     if not headers:
         return headers
     redacted = {}
     for k, v in headers.items():
         key = k.lower()
-        if key == "authorization" or key.startswith("x-api-key"):
+        if key == "authorization" or key == "x-api-key":
             redacted[k] = "REDACTED"
         else:
             redacted[k] = v
     return redacted
🤖 Prompt for AI Agents
In @databusclient/api/utils.py around lines 58 - 68, The _redact_headers
function currently uses key.startswith("x-api-key") which unintentionally
matches headers like "x-api-key-extra"; change that logic to check for an exact
header name match (compare key == "x-api-key") alongside the existing
authorization check so only the intended "X-API-KEY" header is redacted; update
the condition in _redact_headers to use equality for the lowercase key and keep
the rest of the function intact.

coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
README.md (3)

324-359: Critical: Unresolved merge conflict markers must be removed.

The README contains unresolved merge conflict markers (<<<<<<<, =======, >>>>>>>). These must be resolved before the PR can be merged.

Please resolve the conflicts by:

  1. Choosing the correct version of the content
  2. Removing all conflict markers
  3. Ensuring consistent formatting and examples

376-391: Critical: Additional unresolved merge conflict markers.

More merge conflict markers present in this section. Please resolve these conflicts and ensure the documentation is consistent throughout.


427-444: Critical: More unresolved merge conflict markers.

This section also contains unresolved merge conflict markers. All merge conflicts must be resolved before merging.

🤖 Fix all issues with AI agents
In @README.md:
- Around line 530-570: Remove the duplicated "### mkdist command" and "##
Completion" sections so only one copy of each remains; locate the repeated
headings ("mkdist command" and "Completion") and delete the second occurrence
(the block starting at the duplicate heading), preserving the first instance
with its Usage, Example, and code blocks; also ensure fenced code blocks are
marked with the language (e.g., ```bash or ```text/```python as appropriate) to
satisfy markdownlint and keep formatting consistent.
- Line 310: The README has an inconsistent CLI option name `--versionid`; change
it to the hyphenated `--version-id` to match the CLI code and other
examples—update the example line that currently shows `--versionid TEXT` to
`--version-id TEXT` and verify any surrounding examples or references use the
same `--version-id` token for consistency.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between faadd58 and a6bb139.

📒 Files selected for processing (5)
  • CHANGELOG.md
  • README.md
  • databusclient/api/utils.py
  • databusclient/cli.py
  • tests/test_download_auth.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • databusclient/api/utils.py
  • CHANGELOG.md
🧰 Additional context used
🧬 Code graph analysis (2)
databusclient/cli.py (2)
databusclient/api/download.py (1)
  • download (669-797)
databusclient/api/deploy.py (1)
  • create_distribution (218-249)
tests/test_download_auth.py (2)
tests/test_utils_verbose.py (1)
  • make_response (12-22)
databusclient/api/download.py (1)
  • _download_file (31-213)
🪛 markdownlint-cli2 (0.18.1)
README.md

329-329: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


333-333: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


535-535: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


540-540: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


547-547: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


556-556: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


561-561: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


565-565: Multiple headings with the same content

(MD024, no-duplicate-heading)


568-568: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (6)
tests/test_download_auth.py (1)

107-134: LGTM! Well-structured test for verbose logging and redaction.

The test properly validates that:

  • HTTP operations are logged when verbose is enabled
  • Sensitive authorization tokens are redacted ("REDACTED" appears)
  • Actual token values ("ACCESS", "VAULT") do not leak into logs

This aligns with the PR objectives and provides good coverage for the verbose feature.

databusclient/cli.py (4)

16-36: LGTM! Well-implemented global verbose flag.

The verbose flag is properly integrated as a global CLI option with appropriate logging configuration:

  • Context passing is correctly set up for downstream commands
  • Logger configuration is conditional and defensive (checks for existing handlers)
  • Log level appropriately set to DEBUG when verbose is enabled

182-205: LGTM! Correct context passing for verbose flag.

The download command properly receives and propagates the verbose flag from the global context to the underlying API call. The use of .get("verbose", False) provides a safe default.


232-271: LGTM! Robust implementation of mkdist command.

The command includes thorough validation:

  • Content variant format and key constraints are properly enforced
  • SHA-256 validation regex is correct (64 hex chars + ':' + length)
  • Deterministic ordering via sorted keys prevents ambiguity
  • Clear error messages for invalid inputs

The implementation aligns well with the underlying create_distribution API.
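
The sha-length check described above can be sketched like this; the helper name is hypothetical, and the regex follows the "64 hex chars + ':' + length" rule (lowercase hex is an assumption):

```python
import re

SHA_LENGTH_RE = re.compile(r"^[0-9a-f]{64}:\d+$")

def parse_sha_length(value: str) -> tuple[str, int]:
    # Reject anything that is not "<sha256-hex>:<byte-length>".
    if not SHA_LENGTH_RE.fullmatch(value):
        raise ValueError(f"invalid sha-length {value!r}, expected <sha256>:<length>")
    sha, length = value.split(":", 1)
    return sha, int(length)
```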


274-277: LGTM! Simple and effective completion command.

The completion command provides helpful instructions for enabling shell completion across multiple shells. Relies on Click's built-in completion support.

README.md (1)

191-192: LGTM! Clear documentation of verbose flag.

The documentation properly explains the verbose flag functionality and includes an appropriate warning about avoiding its use in public CI logs due to the sensitive nature of HTTP request/response details.

coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
databusclient/api/download.py (7)

145-166: Critical: verbose parameter missing from function signature but used in the body.

The docstring documents a verbose parameter (line 165), and the function body references verbose at multiple locations (lines 190, 195, 239, 246, 275, 280, 285), but verbose is not in the function signature. This will raise a NameError at runtime.

🐛 Proposed fix: add verbose parameter to signature
 def _download_file(
     url,
     localDir,
     vault_token_file=None,
     databus_key=None,
     auth_url=None,
     client_id=None,
     validate_checksum: bool = False,
     expected_checksum: str | None = None,
+    verbose: bool = False,
 ) -> None:

354-389: Critical: verbose parameter missing from _download_files signature.

The docstring documents verbose (line 374), but the parameter is not in the function signature. Additionally, this function should propagate verbose to _download_file calls.

🐛 Proposed fix
 def _download_files(
     urls: List[str],
     localDir: str,
     vault_token_file: str = None,
     databus_key: str = None,
     auth_url: str = None,
     client_id: str = None,
     validate_checksum: bool = False,
     checksums: dict | None = None,
+    verbose: bool = False,
 ) -> None:

And update the _download_file call:

         _download_file(
             url=url,
             localDir=localDir,
             vault_token_file=vault_token_file,
             databus_key=databus_key,
             auth_url=auth_url,
             client_id=client_id,
             validate_checksum=validate_checksum,
             expected_checksum=expected,
+            verbose=verbose,
         )

561-603: Critical: verbose parameter missing from _download_collection signature but used in body.

The docstring mentions verbose (line 582), and the variable is used on lines 584 and 586, but the signature on lines 561-570 lacks the parameter.

🐛 Proposed fix
 def _download_collection(
     uri: str,
     endpoint: str,
     localDir: str,
     vault_token: str = None,
     databus_key: str = None,
     auth_url: str = None,
     client_id: str = None,
-    validate_checksum: bool = False
+    validate_checksum: bool = False,
+    verbose: bool = False,
 ) -> None:

606-645: Critical: verbose parameter missing from _download_version signature.

The docstring mentions verbose (line 625), and it's used on line 627, but the signature lacks the parameter.

🐛 Proposed fix
 def _download_version(
     uri: str,
     localDir: str,
     vault_token_file: str = None,
     databus_key: str = None,
     auth_url: str = None,
     client_id: str = None,
-    validate_checksum: bool = False,
+    validate_checksum: bool = False,
+    verbose: bool = False,
 ) -> None:

648-695: Critical: verbose parameter missing from _download_artifact signature.

The docstring mentions verbose (line 669), and it's used on lines 671 and 677, but the signature lacks the parameter.

🐛 Proposed fix
 def _download_artifact(
     uri: str,
     localDir: str,
     all_versions: bool = False,
     vault_token_file: str = None,
     databus_key: str = None,
     auth_url: str = None,
     client_id: str = None,
-    validate_checksum: bool = False,
+    validate_checksum: bool = False,
+    verbose: bool = False,
 ) -> None:

762-798: Critical: verbose parameter missing from _download_group signature.

The docstring mentions verbose (line 783), and it's used on line 785, but the signature lacks the parameter. Also, _download_artifact calls need to propagate verbose.

🐛 Proposed fix
 def _download_group(
     uri: str,
     localDir: str,
     all_versions: bool = False,
     vault_token_file: str = None,
     databus_key: str = None,
     auth_url: str = None,
     client_id: str = None,
-    validate_checksum: bool = False,
+    validate_checksum: bool = False,
+    verbose: bool = False,
 ) -> None:

And propagate to _download_artifact:

         _download_artifact(
             artifact_uri,
             localDir,
             all_versions=all_versions,
             vault_token_file=vault_token_file,
             databus_key=databus_key,
             auth_url=auth_url,
             client_id=client_id,
             validate_checksum=validate_checksum,
+            verbose=verbose,
         )

834-859: Critical: verbose parameter missing from download function signature.

The docstring mentions verbose (line 858), and it's used on line 969, but the signature lacks the parameter. This is the public API entry point, so it must accept verbose from the CLI.

🐛 Proposed fix
 def download(
     localDir: str,
     endpoint: str,
     databusURIs: List[str],
     token=None,
     databus_key=None,
     all_versions=None,
     auth_url="https://auth.dbpedia.org/realms/dbpedia/protocol/openid-connect/token",
     client_id="vault-token-exchange",
-    validate_checksum: bool = False
+    validate_checksum: bool = False,
+    verbose: bool = False,
 ) -> None:

Then propagate verbose to all internal function calls throughout the function body.

databusclient/cli.py (1)

151-215: Critical: download command missing @click.pass_context decorator and not propagating verbose.

The function signature includes ctx (line 188), but the @click.pass_context decorator is missing. This will cause a runtime error. Additionally, verbose from ctx.obj is not passed to api_download.

🐛 Proposed fix
 @app.command()
+@click.pass_context
 @click.argument("databusuris", nargs=-1, required=True)
 @click.option(
     "--localdir",

And pass verbose to the API call:

+    verbose = ctx.obj.get("verbose", False)
     try:
         api_download(
             localDir=localdir,
             endpoint=databus,
             databusURIs=databusuris,
             token=vault_token,
             databus_key=databus_key,
             all_versions=all_versions,
             auth_url=authurl,
             client_id=clientid,
-            validate_checksum=validate_checksum
+            validate_checksum=validate_checksum,
+            verbose=verbose,
         )
🤖 Fix all issues with AI agents
In @databusclient/api/download.py:
- Around line 449-452: The verbose logging currently calls log_http("POST",
endpoint_url, ..., status=200) after sparql.query().convert(), which is
misleading because SPARQLWrapper does not expose an HTTP status code; remove the
hardcoded status argument (or pass None) and add a short inline comment in the
verbose block noting that SPARQLWrapper abstracts HTTP status so none is
available; update the log_http call in the verbose branch that references
databus_key and endpoint_url (and the surrounding if verbose block) accordingly.

In @databusclient/api/utils.py:
- Around line 59-71: Add the missing import for log_http at the top of the
module (e.g., from databusclient.api.logging_utils import log_http or adjust the
path to match where log_http is defined) so calls to log_http() in the GET
request flow (the two calls around requests.get) no longer raise NameError; also
insert a single blank line before the def compute_sha256_and_length function to
satisfy PEP8 spacing.
🧹 Nitpick comments (3)
databusclient/api/utils.py (1)

73-75: Missing blank line between function definitions (PEP8 E302).

There should be two blank lines before the compute_sha256_and_length function definition per PEP8 style guidelines.

✏️ Proposed fix
     response.raise_for_status()
     return response.text
+
+
 def compute_sha256_and_length(filepath):
databusclient/api/download.py (1)

190-198: Repeated inline imports of log_http within the same function.

The from databusclient.api.utils import log_http statement is repeated multiple times within _download_file. While functionally correct (Python caches imports), this adds noise and is unconventional.

✏️ Suggestion: move import to function top or module level

Either import once at the top of the function:

def _download_file(...):
    from databusclient.api.utils import log_http
    # ... rest of function

Or import at the module level (preferred if used across multiple functions):

from databusclient.api.utils import log_http

Also applies to: 239-249, 280-288

databusclient/cli.py (1)

281-284: Fix the Click 8.x shell completion environment variable format in the completion command.

The current implementation uses an incorrect environment variable format for Click 8.x. The format should be {shell}_source (e.g., bash_source), not source_{shell}. Additionally, since the project defines a console_script entry point (databusclient), the instruction can invoke it directly rather than via python -m.

For the immediate fix, update line 284 to:

Minimal fix
click.echo(f"# Add this to your shell config:\neval \"$(_DATABUSCLIENT_COMPLETE=bash_source databusclient)\"")

Or more robustly:

click.echo(f"# Add this to your shell config:\neval \"$(_DATABUSCLIENT_COMPLETE={shell}_source databusclient)\"")

For a more complete implementation that actually outputs the completion script (allowing users to pipe/source it directly), use the subprocess approach to invoke the CLI with the completion environment variable set.
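
A sketch of that subprocess variant, limited to the shells Click 8's built-in completion supports and assuming the databusclient entry point is on PATH:

```python
import os
import subprocess

import click

@click.command()
@click.argument("shell", type=click.Choice(["bash", "zsh", "fish"]), default="bash")
def completion(shell):
    # Re-invoke the installed entry point with Click's completion variable
    # set; Click then prints the completion script instead of running the CLI.
    env = {**os.environ, "_DATABUSCLIENT_COMPLETE": f"{shell}_source"}
    result = subprocess.run(["databusclient"], env=env, capture_output=True, text=True)
    click.echo(result.stdout)
```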

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a6bb139 and 467bce1.

📒 Files selected for processing (3)
  • databusclient/api/download.py
  • databusclient/api/utils.py
  • databusclient/cli.py
🧰 Additional context used
🧬 Code graph analysis (3)
databusclient/cli.py (1)
databusclient/api/deploy.py (1)
  • create_distribution (218-249)
databusclient/api/utils.py (2)
tests/test_download_auth.py (1)
  • raise_for_status (27-29)
tests/test_utils_verbose.py (1)
  • raise_for_status (18-20)
databusclient/api/download.py (1)
databusclient/api/utils.py (1)
  • fetch_databus_jsonld (40-74)
🔇 Additional comments (1)
databusclient/cli.py (1)

239-278: LGTM! The mkdist command implementation is well-structured.

Good validation of content-variant keys (lines 253-254), duplicate detection (lines 255-256), SHA-length format validation with regex (lines 261-262), and deterministic ordering via sorted keys (line 266).

Comment on lines 449 to 452
    if verbose:
        from databusclient.api.utils import log_http

        log_http("POST", endpoint_url, req_headers={"X-API-KEY": databus_key} if databus_key is not None else None, status=200)

⚠️ Potential issue | 🟡 Minor

Inaccurate status code in verbose logging for SPARQL query.

The log_http call hardcodes status=200 but this is called after sparql.query().convert() which could fail or return other status codes. The actual status isn't available from SPARQLWrapper, so either omit the status or note this limitation.

💡 Suggestion

Consider logging without the status code since SPARQLWrapper abstracts it away, or add a comment explaining the limitation:

     if verbose:
         from databusclient.api.utils import log_http
-
-        log_http("POST", endpoint_url, req_headers={"X-API-KEY": databus_key} if databus_key is not None else None, status=200)
+        # Note: actual status unknown; SPARQLWrapper abstracts HTTP layer
+        log_http("POST", endpoint_url, req_headers={"X-API-KEY": databus_key} if databus_key is not None else None)
🤖 Prompt for AI Agents
In @databusclient/api/download.py around lines 449 - 452, The verbose logging
currently calls log_http("POST", endpoint_url, ..., status=200) after
sparql.query().convert(), which is misleading because SPARQLWrapper does not
expose an HTTP status code; remove the hardcoded status argument (or pass None)
and add a short inline comment in the verbose block noting that SPARQLWrapper
abstracts HTTP status so none is available; update the log_http call in the
verbose branch that references databus_key and endpoint_url (and the surrounding
if verbose block) accordingly.

Comment on lines 59 to 71
    if verbose:
        log_http("GET", uri, req_headers=headers)

    response = requests.get(uri, headers=headers, timeout=30)
    response.raise_for_status()

    if verbose:
        log_http(
            "GET",
            uri,
            req_headers=headers,
            status=response.status_code,
            resp_headers=response.headers,
        )

    return response.text

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find where log_http is defined and where it's imported
rg -n "def log_http" --type=py
echo "---"
rg -n "from.*import.*log_http|import.*log_http" --type=py

Repository: dbpedia/databus-python-client

Length of output: 1128


🏁 Script executed:

# Check the imports section of the specific file
head -30 databusclient/api/utils.py

Repository: dbpedia/databus-python-client

Length of output: 797


🏁 Script executed:

cat -n databusclient/api/utils.py

Repository: dbpedia/databus-python-client

Length of output: 3039


log_http is used but not imported, causing NameError when verbose=True.

The function calls log_http() at lines 60 and 65–71, but there's no import statement for it. This will raise a NameError at runtime when verbose mode is enabled. Additionally, log_http is not defined in this file, yet other modules (e.g., databusclient/api/download.py) attempt to import it from utils.py.

Add the import statement to the top of the file:

from databusclient.api.logging_utils import log_http

(adjust the import path if log_http is defined elsewhere)

Also, add a blank line before line 75 (def compute_sha256_and_length) to comply with PEP 8.

🤖 Prompt for AI Agents
In @databusclient/api/utils.py around lines 59 - 71, Add the missing import for
log_http at the top of the module (e.g., from databusclient.api.logging_utils
import log_http or adjust the path to match where log_http is defined) so calls
to log_http() in the GET request flow (the two calls around requests.get) no
longer raise NameError; also insert a single blank line before the def
compute_sha256_and_length function to satisfy PEP8 spacing.

Integer-Ctrl and others added 19 commits February 9, 2026 20:34
* feat: databus api key for downloading

* refactored README.md

* feat: cli delete to delete datasets from databus
… (Issue dbpedia#19) (dbpedia#40)

* Restrict Vault token exchange to specific hosts; improve auth errors; add tests (fixes dbpedia#19)

* Restrict Vault token exchange to specific hosts; improve auth errors; add tests and docs note (fixes dbpedia#19)

* Fix vault redirect  check (dbpedia#19)

---------

Co-authored-by: DhanashreePetare <dhanashreepetare8125@gmail.com>
…EY headers; propagate verbose through download flows; add tests and docs
…s; add tests (fixes dbpedia#19)

* Restrict Vault token exchange to specific hosts; improve auth errors; add tests and docs note (fixes dbpedia#19)

* Release 0.15: bump version, add changelog, docstrings(issue dbpedia#35)

* Prepare PyPI release 0.15 with build artifacts and publishing guide (Issue dbpedia#35)

* Convert all docstrings to Google-style format

---------

Co-authored-by: DhanashreePetare <dhanashreepetare8125@gmail.com>
Co-authored-by: Fabian Hofer <57919013+Integer-Ctrl@users.noreply.github.com>
coderabbitai bot left a comment

Actionable comments posted: 15

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_deploy.py (1)

74-81: ⚠️ Potential issue | 🟡 Minor

get_file_info called with wrong arity (2 args instead of 1).

Even though this test is skipped, the call get_file_info(artifact_name, dst_string) passes two arguments, but the current signature in databusclient/api/deploy.py line 195 is get_file_info(distribution_str: str) — a single positional parameter. Additionally, the old return value was a 6-tuple (including name), but the new function returns a 5-tuple. The destructuring on lines 74-81 still expects 6 values.

When this test is re-enabled, it will fail with TypeError. Please update or remove this stale test to match the new API surface.

Proposed fix sketch
         (
-            name,
             cvs,
             formatExtension,
             compression,
             sha256sum,
             content_length,
-        ) = get_file_info(artifact_name, dst_string)
+        ) = get_file_info(dst_string)
🤖 Fix all issues with AI agents
In @.github/workflows/python-CI.yml:
- Line 26: Replace the unmaintained GitHub Action usage "Gr1N/setup-poetry@v8"
in the workflow with the actively maintained "snok/install-poetry" action:
update the step that currently uses Gr1N/setup-poetry to use snok/install-poetry
(e.g., "uses: snok/install-poetry@vX") and migrate any inputs to the new
action's expected keys (for example use "poetry-version" or "version" per
snok/install-poetry docs) so the workflow installs the desired Poetry version
and behaves the same as the previous step.

In `@databusclient/api/delete.py`:
- Around line 242-263: The loop in _delete_group is failing when
"databus:hasArtifact" is a single dict because iterating a dict yields keys;
normalize the artifacts value right after artifacts =
json_dict.get("databus:hasArtifact", []) to always be a list (if it's a dict,
wrap it in [dict]; if it's None, set to []), and when extracting the artifact
URI handle both dict items (item.get("@id")) and string items (use item
directly) before calling get_databus_id_parts_from_file_url; update references
to _delete_group, fetch_databus_jsonld, get_databus_id_parts_from_file_url,
_delete_artifact, and _delete_resource accordingly.
- Around line 19-67: DeleteQueue currently uses a set (self.queue) which loses
insertion order and can delete parents before children; change self.queue to an
ordered collection (e.g., use a dict as an ordered set or a list plus explicit
deduplication) so insertion order is preserved, update add_uri and add_uris to
avoid duplicates (e.g., only append if not already present or set
dict[key]=None), and keep execute calling _delete_list(list(self.queue),
self.databus_key, force=True) so deletions occur in insertion order; ensure type
hints for self.queue are updated accordingly and behaviour of
is_empty/is_not_empty remains correct.

In `@databusclient/api/deploy.py`:
- Around line 56-59: The current parsing of key/value pairs using kv.split("=")
in deploy.py can throw ValueError when values contain '='; change the split call
to only split on the first '=' (use kv.split("=", 1)) when assigning key and
value so that values with '=' (e.g., base64) are preserved, then continue adding
them to cvs (and optionally strip() key/value to remove surrounding whitespace).
- Around line 186-192: The current code raises
requests.exceptions.RequestException(response=resp) which doesn't set e.response
and is non-idiomatic; replace that error handling with resp.raise_for_status()
to raise an HTTPError with the response attached, and compute the hash using
resp.content directly (remove the redundant bytes(resp.content) wrapper). Update
the block around requests.get(url, timeout=30), resp.status_code check,
sha256sum = hashlib.sha256(...).hexdigest(), and content_length =
len(resp.content) accordingly.

In `@databusclient/api/download.py`:
- Around line 315-318: The code directly indexes resp.json()["access_token"]
which raises an unhelpful KeyError when the auth server returns an error
payload; update both occurrences (the assignment using access_token =
resp.json()["access_token"] and the similar access at the later token-exchange)
to parse resp.json() once, check for the "access_token" key (or use
.get("access_token")), and if missing raise a clear exception that includes the
full response body/status (e.g., include resp.status_code and resp.text or the
parsed error JSON) and a contextual message indicating the token exchange failed
so callers can diagnose auth errors.
- Around line 106-168: The streamed requests.get(...) responses are never closed
on the auth-error branches causing socket/resource leaks; ensure the initial
response and the retry response are always closed before raising
DownloadAuthError or re-raising HTTP errors by either wrapping the calls in a
context manager (with requests.get(...) as response:) or calling
response.close() in each error branch that raises (referencing the response
variable and the retry response after the Authorization header is set), and
apply the same close-before-raise logic before response.raise_for_status() to
guarantee no leaked connections; use the existing __get_vault_access__ and
DownloadAuthError symbols to find the auth handling logic to update.
- Around line 316-328: Replace the manual protocol-strip and split logic with
urlparse: call parsed = urlparse(download_url) and set audience = parsed.netloc
(which preserves host[:port] like the original split behavior); update the code
in the same block where download_url, access_token, and audience are handled
(use the existing urlparse import).
- Around line 46-56: The code in download.py assigns localDir using values from
get_databus_id_parts_from_file_url and passes them directly to os.path.join,
which will raise a TypeError if any of account/group/artifact/version are None;
modify the localDir construction in the block where localDir is None to first
validate the tuple returned by get_databus_id_parts_from_file_url (function
name) and normalize missing parts (e.g., replace None with a safe default like
"unknown" or omit None segments) or build the path by filtering out None values
before calling os.path.join, ensuring version still defaults to "latest" when
None.
- Around line 582-591: The download() function signature is missing the verbose
parameter expected by the CLI; update the download function (download) to accept
verbose (e.g. verbose: bool = False) in its parameter list and thread that
parameter through any internal helper calls that require it (or accept **kwargs
and ignore it) so the CLI call with verbose=ctx.obj.get("verbose", False) no
longer raises a TypeError; ensure the function definition in
databusclient.api.download includes the new parameter and that any places
calling internal helpers (or returning results) handle or forward verbose
appropriately.

In `@databusclient/api/utils.py`:
- Around line 54-64: The function fetch_databus_jsonld references verbose but
doesn't define it; update fetch_databus_jsonld to accept a verbose: bool = False
parameter (e.g., add verbose to the function signature), use that parameter for
the conditional log_http calls, and ensure callers/tests that pass verbose=True
still work; no module-level global is needed—keep log_http, databus_key,
headers, and response handling unchanged.

In `@databusclient/cli.py`:
- Around line 264-267: The completion command's shell argument can be None
because Click ignores the Python default; update the completion function
(completion) to handle a missing shell by normalizing None to a sensible default
(e.g., "bash") before building the echo string — for example set shell = shell
or "bash" at the start of completion, then use that normalized variable in the
click.echo call that constructs the eval string.
- Around line 15-22: The CLI group function app() currently lacks a verbose flag
and context setup, causing ctx.obj to be None when download reads
ctx.obj.get("verbose"); fix by adding a Click option and context handling:
decorate the app function with `@click.pass_context` and a
`@click.option`("-v","--verbose", is_flag=True, help=...), add a verbose parameter
to app(ctx, verbose), call ctx.ensure_object(dict) (or set ctx.obj = {}) and
store ctx.obj["verbose"] = verbose so downstream commands like download can
safely use ctx.obj.get("verbose", False).

In `@tests/test_download.py`:
- Around line 26-27: The test test_with_query currently calls
api_download("tmp", DEFAULT_ENDPOINT, [TEST_QUERY]) against the live SPARQL
endpoint; add a skip marker or mock the network to prevent hitting the external
service: either decorate test_with_query with
`@pytest.mark.skip`(reason="Integration test: requires live databus.dbpedia.org
connection") placed immediately above the test, or modify tests/conftest.py or
the test to mock SPARQLWrapper/api_download network calls (e.g., monkeypatch the
api_download call or stub SPARQLWrapper) so the test does not perform real HTTP
requests.

In `@tests/test_utils_verbose.py`:
- Around line 25-34: Tests pass verbose=True but the function signatures and
implementations don't accept or use it: add an optional verbose: bool = False
parameter to utils.fetch_databus_jsonld, datastore loader method
_get_sparql_query_of_collection, and _query_sparql_endpoint (keep default False
to preserve existing API), fix fetch_databus_jsonld to reference the local
verbose parameter instead of a bare name, propagate verbose through calls
(fetch_databus_jsonld -> _get_sparql_query_of_collection ->
_query_sparql_endpoint) and implement verbose logging in the latter two to emit
the same redacted "[HTTP] GET" messages when verbose is true (redact databus_key
values and ensure databus_key is accepted and used for redaction in log output).
🟡 Minor comments (8)
.github/pull_request_template.md-18-18 (1)

18-18: ⚠️ Potential issue | 🟡 Minor

Incorrect link: Stellarium C++ style guide is unrelated to this project.

The URL http://stellarium.org/doc/head/codingStyle.html points to the Stellarium project's C++ coding style guide, not to Ruff's documentation. This appears to be a copy-paste from another template.

Replace with the actual Ruff documentation link:

Proposed fix
-- [ ] My code follows the [ruff code style](http://stellarium.org/doc/head/codingStyle.html) of this project.
+- [ ] My code follows the [Ruff code style](https://docs.astral.sh/ruff/) of this project.
pyproject.toml-10-10 (1)

10-10: ⚠️ Potential issue | 🟡 Minor

Add Python 3.11 requirement to CHANGELOG.md and RELEASE_NOTES.md as a breaking change.

The Python 3.11 minimum requirement is documented in README.md but missing from the release notes. Since this drops support for Python 3.9 and 3.10, it should be explicitly listed in CHANGELOG.md under a "Breaking Changes" section and mentioned in RELEASE_NOTES.md so users are aware when upgrading.

README.md-142-153 (1)

142-153: ⚠️ Potential issue | 🟡 Minor

Top-level --help output does not show the -v/--verbose flag.

The PR's main objective is adding a global -v/--verbose CLI flag, but the help output shown here (lines 147–148) only lists --help. If the verbose flag is indeed a top-level option, the documented output should include it; otherwise users won't discover it from databusclient --help.

databusclient/api/download.py-686-688 (1)

686-688: ⚠️ Potential issue | 🟡 Minor

Bug: print("QUERY {}", ...) does not interpolate — literal {} in output.

print("QUERY {}", expr) prints two arguments separated by a space, producing output like QUERY {} SELECT .... This was likely intended to be an f-string or .format() call.

Proposed fix
-            print("QUERY {}", databusURI.replace("\n", " "))
+            print(f"QUERY {databusURI.replace(chr(10), ' ')}")
databusclient/api/delete.py-122-137 (1)

122-137: ⚠️ Potential issue | 🟡 Minor

Validate databus_key before prompting for confirmation.

The databus_key is None check on line 132 runs after the interactive confirmation prompt (line 124). This means a user could confirm deletion only to immediately hit a ValueError. Move the validation before the prompt for better UX.

Proposed fix
 def _delete_resource(
     databusURI: str,
     databus_key: str,
     dry_run: bool = False,
     force: bool = False,
     queue: DeleteQueue = None,
 ):
+    if databus_key is None:
+        raise ValueError("Databus API key must be provided for deletion")
+
     # Confirm the deletion request, skip the request or cancel deletion process
     if not (dry_run or force):
         action = _confirm_delete(databusURI)
         if action == "skip":
             print(f"Skipping: {databusURI}\n")
             return
         if action == "cancel":
             raise KeyboardInterrupt("Deletion cancelled by user.")

-    if databus_key is None:
-        raise ValueError("Databus API key must be provided for deletion")
-
     if dry_run:
README.md-329-331 (1)

329-331: ⚠️ Potential issue | 🟡 Minor

Missing closing single quote in Docker deploy example.

Line 330 is missing the closing ' on the distribution URL string. Compare with the Python example on line 321 which correctly closes with '...|type=swagger'.

Proposed fix
-'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger
+'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
databusclient/api/utils.py-22-35 (1)

22-35: ⚠️ Potential issue | 🟡 Minor

Duplicate docstring on get_databus_id_parts_from_file_url.

There are two consecutive docstrings (lines 22-30 and lines 31-35). Only the first one is recognized by Python as the function's __doc__; the second is a dead string literal.

Proposed fix: merge into a single docstring
 def get_databus_id_parts_from_file_url(
     uri: str,
 ) -> Tuple[...]:
     """Extract databus ID parts from a given databus URI.
 
     Args:
         uri: The full databus URI of the form "http(s)://host/accountId/groupId/artifactId/versionId/fileId".
 
     Returns:
         A tuple containing (host, accountId, groupId, artifactId, versionId, fileId).
         Each element is a string or None if not present.
     """
-    """Split a Databus URI into its six parts.
-
-    The returned tuple is (host, accountId, groupId, artifactId, versionId, fileId).
-    Missing parts are returned as ``None``.
-    """
tests/test_download_auth.py-130-137 (1)

130-137: ⚠️ Potential issue | 🟡 Minor

Fragile assertions: "ACCESS" and "VAULT" are common substrings that may appear in log messages.

Log messages like "access_token", "Vault token", or "vault access" share letters with the sentinels "ACCESS" and "VAULT". Python's in is case-sensitive, so "ACCESS" won't match "access", but if a log message ever contains "ACCESS token" or similar uppercase text, these assertions break. Consider using a more distinctive sentinel value (e.g. "s3cr3t_test_token") for the mock tokens to avoid spurious failures.

Suggested change for more robust secret values
     post_resp_1 = Mock()
-    post_resp_1.json.return_value = {"access_token": "ACCESS"}
+    post_resp_1.json.return_value = {"access_token": "xTEST_ACCESS_SECRET_42x"}
     post_resp_2 = Mock()
-    post_resp_2.json.return_value = {"access_token": "VAULT"}
+    post_resp_2.json.return_value = {"access_token": "xTEST_VAULT_SECRET_42x"}
     ...
-        assert "ACCESS" not in caplog.text
-        assert "VAULT" not in caplog.text
+        assert "xTEST_ACCESS_SECRET_42x" not in caplog.text
+        assert "xTEST_VAULT_SECRET_42x" not in caplog.text
🧹 Nitpick comments (14)
pyproject.toml (1)

17-20: black may be redundant alongside ruff.

Both black (line 18) and ruff (line 20) are listed as dev dependencies. Since Ruff can also handle formatting (ruff format), consider whether black is still needed or can be removed to simplify the toolchain.

databusclient/extensions/webdav.py (1)

114-121: subprocess.run has no timeout — could hang indefinitely.

While uploads can legitimately take a long time, a missing timeout means the process could hang forever if rclone stalls (e.g., unresponsive server). Consider adding a generous timeout or at least documenting the expectation.
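A minimal sketch of the timeout approach (the command list and one-hour limit are placeholders, not the module's actual invocation):

import subprocess

rclone_cmd = ["rclone", "copy", "local_dir", "remote:target"]  # placeholder command
try:
    # raise instead of hanging forever if rclone stalls
    subprocess.run(rclone_cmd, check=True, timeout=3600)
except subprocess.TimeoutExpired as e:
    raise RuntimeError(f"rclone upload timed out after {e.timeout} seconds") from e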

databusclient/api/delete.py (2)

129-130: KeyboardInterrupt is unconventional for user-initiated cancellation.

KeyboardInterrupt is typically reserved for Ctrl+C signals. Raising it here may be caught by unexpected handlers or produce confusing tracebacks. Consider using SystemExit or a custom exception (e.g., click.Abort if Click is available at this layer).


149-152: Use a more specific exception type than bare Exception.

raise Exception(...) is too generic and makes it harder for callers to distinguish delete failures from other errors. Consider defining a DeleteError (or reusing requests.exceptions.HTTPError) for API failures.
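A sketch of the custom-exception approach (class and helper names are illustrative):

import requests

class DeleteError(Exception):
    """Raised when a Databus delete request fails."""

def check_delete_response(resp: requests.Response, uri: str) -> None:
    # translate HTTP failures into a domain-specific error callers can catch
    if resp.status_code >= 400:
        raise DeleteError(f"Deletion of {uri} failed: {resp.status_code} {resp.text}")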

databusclient/api/download.py (2)

291-293: Dunder naming __get_vault_access__ is unconventional for a module-level private function.

Python's name-mangling dunder convention (__name__) is intended for class attributes. For module-level private functions, a single underscore prefix (_get_vault_access) is the standard convention.


28-35: Consider adding type annotations to _download_file parameters.

All other functions in the file have type annotations, but _download_file parameters (url, localDir, etc.) lack them. This hurts readability and IDE support.
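For illustration, annotations along these lines (parameter names besides url and localDir are assumptions inferred from the tests):

def _download_file(
    url: str,
    localDir: str,
    vault_token_file: str | None = None,
    verbose: bool = False,
) -> None:
    ...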

tests/test_utils_verbose.py (1)

12-22: make_response helper is duplicated across test files.

This same helper exists in tests/test_download_auth.py (lines 15-35) with a slightly different interface (content vs text, plus iter_content). Consider extracting a shared helper into tests/conftest.py to avoid divergence over time.
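A possible consolidation, sketched as a fixture in tests/conftest.py (the parameter set merges both interfaces and would need to match what each test expects):

from unittest.mock import Mock

import pytest

@pytest.fixture
def make_response():
    def _make(status=200, headers=None, text="", content=b""):
        resp = Mock()
        resp.status_code = status
        resp.headers = headers or {}
        resp.text = text
        resp.content = content
        # iter_content is required by the streaming download tests
        resp.iter_content = lambda chunk_size=1024: iter([content] if content else [])
        return resp
    return _make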

tests/test_download_auth.py (2)

1-13: Duplicate imports.

patch is imported twice (lines 1 and 11), and DownloadAuthError is imported twice (lines 10 and 13). Clean these up.

Proposed fix
 from unittest.mock import Mock, patch
 
 import pytest
 
 import requests
 import logging
 
 import databusclient.api.download as dl
 
-from databusclient.api.download import VAULT_REQUIRED_HOSTS, DownloadAuthError
-from unittest.mock import patch
-
-from databusclient.api.download import download, DownloadAuthError
+from databusclient.api.download import download, VAULT_REQUIRED_HOSTS, DownloadAuthError

48-58: monkeypatch parameter declared but unused.

The monkeypatch fixture is in the signature but never used in this test. Remove it to avoid confusion.

databusclient/api/utils.py (2)

80-80: Mid-file import logging violates PEP 8 import ordering.

Move this import to the top of the file alongside the other imports (lines 7-9).

Proposed fix
 from typing import Optional, Tuple
 
 import requests
+import logging
 
 ...
 
-import logging
-
-
 def log_http(method, url, ...):

91-108: Overly defensive resp_headers conversion with deeply nested try/except.

The triple-nested fallback chain (lines 93-108) is hard to follow. Since resp_headers will be either a requests.structures.CaseInsensitiveDict (which supports dict()) or a plain dict from mocks, the outer dict(resp_headers) should handle both. The fallback branches handling .items() and .headers attributes add complexity for unlikely edge cases.

Simplified alternative
     if resp_headers:
-        # try to convert to dict; handle Mock or response objects gracefully
-        try:
-            resp_dict = dict(resp_headers)
-        except Exception:
-            # resp_headers might be a Mock or requests.Response; try common attributes
-            if hasattr(resp_headers, "items"):
-                try:
-                    resp_dict = dict(resp_headers.items())
-                except Exception:
-                    resp_dict = {"headers": str(resp_headers)}
-            elif hasattr(resp_headers, "headers"):
-                try:
-                    resp_dict = dict(getattr(resp_headers, "headers") or {})
-                except Exception:
-                    resp_dict = {"headers": str(resp_headers)}
-            else:
-                resp_dict = {"headers": str(resp_headers)}
+        try:
+            resp_dict = dict(resp_headers)
+        except (TypeError, ValueError):
+            resp_dict = {"raw": str(resp_headers)}
databusclient/cli.py (1)

181-194: download command should also propagate verbose to other functions that support it.

Currently only api_download receives verbose. If the verbose flag is fixed at the group level (per comment above), also consider propagating it to deploy and delete commands for consistency with the PR's objective of enabling detailed HTTP logging across all Databus/Vault interactions.
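A sketch of that propagation (command bodies and API signatures are illustrative only):

import click

@click.group()
@click.option("-v", "--verbose", is_flag=True, default=False)
@click.pass_context
def app(ctx, verbose):
    ctx.ensure_object(dict)
    ctx.obj["verbose"] = verbose

@app.command()
@click.pass_context
def deploy(ctx):
    # forward the group-level flag to the API layer
    verbose = ctx.obj.get("verbose", False)
    click.echo(f"deploy called with verbose={verbose}")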

databusclient/api/deploy.py (2)

460-509: deploy prints the API key in the request headers when _debug is True.

Line 482 sets "X-API-KEY": f"{api_key}" in headers, and lines 501-502 print the data payload (not headers). This is fine currently. However, note that f"{api_key}" is equivalent to str(api_key) — the f-string wrapper is unnecessary.

Also, resp.status_code != 200 (line 504) will reject other successful 2xx codes (e.g., 201 Created, 204 No Content). If the Databus API only returns 200 on success, this is fine, but it's worth confirming.

Minor cleanup
-    headers = {"X-API-KEY": f"{api_key}", "Content-Type": "application/json"}
+    headers = {"X-API-KEY": api_key, "Content-Type": "application/json"}

79-106: _get_filetype_definition parsing logic is fragile and hard to follow.

The function relies on positional indexing (metadata_list[-3], [-2], [-1]) and a ":" heuristic to distinguish sha256:length from other fields. Edge cases where a format extension or compression string contains ":" would be misinterpreted. The branching on len(metadata_list) (2, 3, 4) combined with the ":" check creates subtle logic that's difficult to maintain.

Consider documenting the exact expected distribution string grammar (e.g., url|cvs|format|compression|sha:len) to help future maintainers reason about this parsing.
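A sketch of a less positional parse, assuming the grammar really is url|cvs|format|compression|sha:len with the trailing fields optional:

def parse_distribution(dist: str):
    """Split a distribution string into (url, other_fields, sha_length); sketch only."""
    url, _, rest = dist.partition("|")
    fields = rest.split("|") if rest else []
    # the sha256:length field is the only one expected to end in ":<digits>"
    sha_length = next(
        (f for f in fields if ":" in f and f.rsplit(":", 1)[1].isdigit()), None
    )
    other_fields = [f for f in fields if f is not sha_length]
    return url, other_fields, sha_length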

        with:
-          python-version: "3.10"
+          python-version: "3.11"
      - uses: Gr1N/setup-poetry@v8 #install poetry

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Gr1N/setup-poetry GitHub Action maintenance status 2025

💡 Result:

In 2025, Gr1N/setup-poetry appears inactive / not actively maintained:

  • The repo is not archived, but the last code push was 2024-06-16 (no pushes during 2025). [1]
  • Commit activity shows 0 commits in the past year (as of the dataset sync in 2025), indicating no ongoing development through 2025. [2]
  • The latest published major tag is still v9 (shown as released “almost 2 years ago” on the GitHub Actions registry metadata), with no newer major versions appearing in 2025. [3]

Sources: [1] [2] [3]


Consider replacing Gr1N/setup-poetry with an actively maintained alternative.

Gr1N/setup-poetry@v8 has not been actively maintained since June 2024 with no updates in 2025. The latest major version (v9) was released nearly two years ago. Use snok/install-poetry instead, which is actively maintained and more widely adopted.

🤖 Prompt for AI Agents
In @.github/workflows/python-CI.yml at line 26, Replace the unmaintained GitHub
Action usage "Gr1N/setup-poetry@v8" in the workflow with the actively maintained
"snok/install-poetry" action: update the step that currently uses
Gr1N/setup-poetry to use snok/install-poetry (e.g., "uses:
snok/install-poetry@vX") and migrate any inputs to the new action's expected
keys (for example use "poetry-version" or "version" per snok/install-poetry
docs) so the workflow installs the desired Poetry version and behaves the same
as the previous step.

Comment on lines 19 to 67
class DeleteQueue:
    """
    A queue to manage multiple Databus resource deletions.
    Allows adding multiple databus URIs to a queue and executing their deletion in batch.
    """

    def __init__(self, databus_key: str):
        """Create a DeleteQueue bound to a given Databus API key.

        Args:
            databus_key: API key used to authenticate deletion requests.
        """
        self.databus_key = databus_key
        self.queue: set[str] = set()

    def add_uri(self, databusURI: str):
        """Add a single Databus URI to the deletion queue.

        The URI will be deleted when `execute()` is called.
        """
        self.queue.add(databusURI)

    def add_uris(self, databusURIs: List[str]):
        """Add multiple Databus URIs to the deletion queue.

        Args:
            databusURIs: Iterable of full Databus URIs.
        """
        for uri in databusURIs:
            self.queue.add(uri)

    def is_empty(self) -> bool:
        """Return True if the queue is empty."""
        return len(self.queue) == 0

    def is_not_empty(self) -> bool:
        """Return True if the queue contains any URIs."""
        return len(self.queue) > 0

    def execute(self):
        """Execute all queued deletions.

        Each queued URI will be deleted using `_delete_resource`.
        """
        _delete_list(
            list(self.queue),
            self.databus_key,
            force=True,
        )

⚠️ Potential issue | 🟠 Major

Queue uses set — deletion order is not preserved.

DeleteQueue.queue is a set[str], so execute() iterates in arbitrary order. For recursive deletions (group → artifacts → versions), the parent resource may be deleted before its children, which could cause the API to return errors or leave orphaned resources.

Consider using an ordered collection (e.g., list with deduplication, or dict as an ordered set) to preserve insertion order and ensure children are deleted before parents.

Proposed fix (minimal — use dict as ordered set)
     def __init__(self, databus_key: str):
         self.databus_key = databus_key
-        self.queue: set[str] = set()
+        self.queue: dict[str, None] = {}

     def add_uri(self, databusURI: str):
-        self.queue.add(databusURI)
+        self.queue[databusURI] = None

     def add_uris(self, databusURIs: List[str]):
         for uri in databusURIs:
-            self.queue.add(uri)
+            self.queue[uri] = None

     def is_empty(self) -> bool:
         return len(self.queue) == 0

     def is_not_empty(self) -> bool:
         return len(self.queue) > 0

     def execute(self):
         _delete_list(
             list(self.queue),
             self.databus_key,
             force=True,
         )
🤖 Prompt for AI Agents
In `@databusclient/api/delete.py` around lines 19 - 67, DeleteQueue currently uses
a set (self.queue) which loses insertion order and can delete parents before
children; change self.queue to an ordered collection (e.g., use a dict as an
ordered set or a list plus explicit deduplication) so insertion order is
preserved, update add_uri and add_uris to avoid duplicates (e.g., only append if
not already present or set dict[key]=None), and keep execute calling
_delete_list(list(self.queue), self.databus_key, force=True) so deletions occur
in insertion order; ensure type hints for self.queue are updated accordingly and
behaviour of is_empty/is_not_empty remains correct.

Comment on lines 242 to 263
    group_body = fetch_databus_jsonld(databusURI, databus_key)

    json_dict = json.loads(group_body)
    artifacts = json_dict.get("databus:hasArtifact", [])

    artifact_uris = []
    for item in artifacts:
        uri = item.get("@id")
        if not uri:
            continue
        _, _, _, _, version, _ = get_databus_id_parts_from_file_url(uri)
        if version is None:
            artifact_uris.append(uri)

    # Delete all artifacts (which deletes their versions)
    for artifact_uri in artifact_uris:
        _delete_artifact(
            artifact_uri, databus_key, dry_run=dry_run, force=force, queue=queue
        )

    # Finally, delete the group itself
    _delete_resource(databusURI, databus_key, dry_run=dry_run, force=force, queue=queue)

⚠️ Potential issue | 🔴 Critical

Bug: databus:hasArtifact single-dict case not handled in _delete_group.

If the JSON-LD returns a single artifact as {"@id": "..."} instead of [{"@id": "..."}], iterating over a dict yields its keys (strings), not the dict itself. This means item.get("@id") on line 249 would fail with AttributeError since strings don't have .get().

_delete_artifact (line 202) correctly handles this pattern for versions, and _get_databus_artifacts_of_group in download.py (line 560) also handles it. Apply the same normalization here.

Proposed fix
     json_dict = json.loads(group_body)
-    artifacts = json_dict.get("databus:hasArtifact", [])
+    artifacts = json_dict.get("databus:hasArtifact", [])
+    if isinstance(artifacts, dict):
+        artifacts = [artifacts]

     artifact_uris = []
🤖 Prompt for AI Agents
In `@databusclient/api/delete.py` around lines 242 - 263, The loop in
_delete_group is failing when "databus:hasArtifact" is a single dict because
iterating a dict yields keys; normalize the artifacts value right after
artifacts = json_dict.get("databus:hasArtifact", []) to always be a list (if
it's a dict, wrap it in [dict]; if it's None, set to []), and when extracting
the artifact URI handle both dict items (item.get("@id")) and string items (use
item directly) before calling get_databus_id_parts_from_file_url; update
references to _delete_group, fetch_databus_jsonld,
get_databus_id_parts_from_file_url, _delete_artifact, and _delete_resource
accordingly.

Comment on lines 56 to 59
            )

        key, value = kv.split("=")
        cvs[key] = value

⚠️ Potential issue | 🟠 Major

kv.split("=") will fail on values containing =.

If a content variant value contains = (e.g., key=base64=encoded), kv.split("=") returns 3+ elements and the unpacking key, value = kv.split("=") raises ValueError: too many values to unpack. Use split("=", 1) to split only on the first =.

Proposed fix
-        key, value = kv.split("=")
+        key, value = kv.split("=", 1)
🤖 Prompt for AI Agents
In `@databusclient/api/deploy.py` around lines 56 - 59, The current parsing of
key/value pairs using kv.split("=") in deploy.py can throw ValueError when
values contain '='; change the split call to only split on the first '=' (use
kv.split("=", 1)) when assigning key and value so that values with '=' (e.g.,
base64) are preserved, then continue adding them to cvs (and optionally strip()
key/value to remove surrounding whitespace).

Comment on lines 186 to 192
    resp = requests.get(url, timeout=30)
    if resp.status_code >= 400:
        raise requests.exceptions.RequestException(response=resp)

    sha256sum = hashlib.sha256(bytes(resp.content)).hexdigest()
    content_length = len(resp.content)
    return sha256sum, content_length

⚠️ Potential issue | 🟠 Major

RequestException does not accept a response keyword — this will not propagate the HTTP error correctly.

requests.exceptions.RequestException.__init__ does not have a response parameter (that's requests.exceptions.HTTPError). The response=resp kwarg is silently absorbed by **kwargs but won't be accessible as e.response. Additionally, resp.raise_for_status() is the idiomatic way to raise on HTTP errors, and bytes(resp.content) is redundant since .content is already bytes.

Proposed fix
 def _load_file_stats(url: str) -> Tuple[str, int]:
     resp = requests.get(url, timeout=30)
-    if resp.status_code >= 400:
-        raise requests.exceptions.RequestException(response=resp)
-
-    sha256sum = hashlib.sha256(bytes(resp.content)).hexdigest()
+    resp.raise_for_status()
+    sha256sum = hashlib.sha256(resp.content).hexdigest()
     content_length = len(resp.content)
     return sha256sum, content_length
🤖 Prompt for AI Agents
In `@databusclient/api/deploy.py` around lines 186 - 192, The current code raises
requests.exceptions.RequestException(response=resp) which doesn't set e.response
and is non-idiomatic; replace that error handling with resp.raise_for_status()
to raise an HTTPError with the response attached, and compute the hash using
resp.content directly (remove the redundant bytes(resp.content) wrapper). Update
the block around requests.get(url, timeout=30), resp.status_code check,
sha256sum = hashlib.sha256(...).hexdigest(), and content_length =
len(resp.content) accordingly.

Comment on lines 15 to 22
@click.group()
def app():
    """Databus Client CLI"""
    """Databus Client CLI.

    Provides `deploy`, `download`, and `delete` commands for interacting
    with the DBpedia Databus.
    """
    pass

⚠️ Potential issue | 🔴 Critical

Missing -v/--verbose flag on the CLI group — the PR's primary feature is not wired up.

The download command references ctx.obj.get("verbose", False) at line 191, but:

  1. The app group has no @click.pass_context decorator and no --verbose option.
  2. ctx.obj is never initialized, so it defaults to None.
  3. Calling ctx.obj.get("verbose", False) will raise AttributeError: 'NoneType' object has no attribute 'get'.

This means the verbose feature described in the PR objectives is non-functional.

Proposed fix
 @click.group()
-def app():
+@click.option("-v", "--verbose", is_flag=True, default=False, help="Enable detailed HTTP request/response logging")
+@click.pass_context
+def app(ctx, verbose):
     """Databus Client CLI.
 
     Provides `deploy`, `download`, and `delete` commands for interacting
     with the DBpedia Databus.
     """
-    pass
+    ctx.ensure_object(dict)
+    ctx.obj["verbose"] = verbose
🤖 Prompt for AI Agents
In `@databusclient/cli.py` around lines 15 - 22, The CLI group function app()
currently lacks a verbose flag and context setup, causing ctx.obj to be None
when download reads ctx.obj.get("verbose"); fix by adding a Click option and
context handling: decorate the app function with `@click.pass_context` and
`@click.option("-v", "--verbose", is_flag=True, help=...)`, add a verbose parameter
to app(ctx, verbose), call ctx.ensure_object(dict) (or set ctx.obj = {}) and
store ctx.obj["verbose"] = verbose so downstream commands like download can
safely use ctx.obj.get("verbose", False).

Comment on lines +264 to +267
@app.command()
@click.argument("shell", type=click.Choice(["bash","zsh","fish","powershell"]), required=False)
def completion(shell="bash"):
    click.echo(f"Run: eval \"$(_DATABUSCLIENT_COMPLETE=source_{shell} python -m databusclient)\"")

⚠️ Potential issue | 🟠 Major

shell argument default not properly configured — will be None when omitted.

Click arguments with required=False default to None regardless of the Python function signature default. When a user runs databusclient completion without specifying a shell, shell will be None, producing source_None in the output.

Proposed fix
 @app.command()
-@click.argument("shell", type=click.Choice(["bash","zsh","fish","powershell"]), required=False)
-def completion(shell="bash"):
+@click.argument("shell", type=click.Choice(["bash","zsh","fish","powershell"]), default="bash")
+def completion(shell):
     click.echo(f"Run: eval \"$(_DATABUSCLIENT_COMPLETE=source_{shell} python -m databusclient)\"")
🤖 Prompt for AI Agents
In `@databusclient/cli.py` around lines 264 - 267, The completion command's shell
argument can be None because Click ignores the Python default; update the
completion function (completion) to handle a missing shell by normalizing None
to a sensible default (e.g., "bash") before building the echo string — for
example set shell = shell or "bash" at the start of completion, then use that
normalized variable in the click.echo call that constructs the eval string.

Comment on lines 26 to 27
 def test_with_query():
-    cl.download("tmp",DEFAULT_ENDPOINT,[TEST_QUERY])
+    api_download("tmp", DEFAULT_ENDPOINT, [TEST_QUERY])

⚠️ Potential issue | 🟠 Major

test_with_query hits a live SPARQL endpoint without mocking or skip marker.

This test calls api_download("tmp", DEFAULT_ENDPOINT, [TEST_QUERY]) which sends a real SPARQL query to https://databus.dbpedia.org/sparql and downloads files. Unlike test_with_collection (which is correctly marked with @pytest.mark.skip), this test has no skip marker or mocking.

Since SPARQLWrapper is a project dependency and will typically be installed, the conftest.py stub won't activate (if "SPARQLWrapper" not in sys.modules will be False). This makes the test dependent on network connectivity and the external service being available.

Consider adding @pytest.mark.skip(reason="Integration test: requires live databus.dbpedia.org connection") or mocking the SPARQL/download calls.

🤖 Prompt for AI Agents
In `@tests/test_download.py` around lines 26 - 27, The test test_with_query
currently calls api_download("tmp", DEFAULT_ENDPOINT, [TEST_QUERY]) against the
live SPARQL endpoint; add a skip marker or mock the network to prevent hitting
the external service: either decorate test_with_query with
`@pytest.mark.skip(reason="Integration test: requires live databus.dbpedia.org
connection")` placed immediately above the test, or modify tests/conftest.py or
the test to mock SPARQLWrapper/api_download network calls (e.g., monkeypatch the
api_download call or stub SPARQLWrapper) so the test does not perform real HTTP
requests.

Comment on lines +25 to +34
def test_fetch_databus_jsonld_verbose_redacts_api_key(caplog):
    caplog.set_level(logging.DEBUG, logger='databusclient')
    url = "https://databus.example/resource"
    resp = make_response(status=200, headers={"content-type": "application/ld+json"}, text='{}')
    with patch("databusclient.api.utils.requests.get", return_value=resp):
        txt = utils.fetch_databus_jsonld(url, databus_key="SECRET", verbose=True)
    assert "[HTTP] GET" in caplog.text
    assert "REDACTED" in caplog.text
    assert "SECRET" not in caplog.text
    assert txt == '{}'

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the actual function signatures
echo "=== fetch_databus_jsonld signature ==="
rg -n "def fetch_databus_jsonld" --type=py -A3

echo "=== _get_sparql_query_of_collection signature ==="
rg -n "def _get_sparql_query_of_collection" --type=py -A3

echo "=== _query_sparql_endpoint signature ==="
rg -n "def _query_sparql_endpoint" --type=py -A3

Repository: dbpedia/databus-python-client

Length of output: 1067


🏁 Script executed:

#!/bin/bash
# Check the actual test calls
echo "=== Test file lines 25-34 ==="
sed -n '25,34p' tests/test_utils_verbose.py

echo "=== Test file lines 38-47 ==="
sed -n '38,47p' tests/test_utils_verbose.py

echo "=== Test file lines 51-76 ==="
sed -n '51,76p' tests/test_utils_verbose.py

Repository: dbpedia/databus-python-client

Length of output: 2376


🏁 Script executed:

#!/bin/bash
# Check if functions use **kwargs or *args
echo "=== fetch_databus_jsonld full definition ==="
rg -A30 "def fetch_databus_jsonld" databusclient/api/utils.py

echo "=== _get_sparql_query_of_collection full definition ==="
rg -A30 "def _get_sparql_query_of_collection" databusclient/api/download.py

echo "=== _query_sparql_endpoint full definition ==="
rg -A30 "def _query_sparql_endpoint" databusclient/api/download.py

Repository: dbpedia/databus-python-client

Length of output: 3260


🏁 Script executed:

#!/bin/bash
# Check for module-level verbose variable definition
echo "=== verbose variable in utils.py ==="
rg -n "^verbose" databusclient/api/utils.py

echo "=== verbose variable in download.py ==="
rg -n "^verbose" databusclient/api/download.py

echo "=== All references to verbose in utils.py ==="
rg -n "verbose" databusclient/api/utils.py

echo "=== All references to verbose in download.py ==="
rg -n "verbose" databusclient/api/download.py

Repository: dbpedia/databus-python-client

Length of output: 282


Function signatures do not accept verbose parameter; tests will fail with TypeError at runtime.

All three test functions pass verbose=True to their respective functions, but none of the function signatures accept this parameter:

  • Line 30: utils.fetch_databus_jsonld(url, databus_key="SECRET", verbose=True) — signature is (uri: str, databus_key: str | None = None)
  • Line 43: dl._get_sparql_query_of_collection(url, databus_key="SECRET", verbose=True) — signature is (uri: str, databus_key: str | None = None)
  • Line 72: dl._query_sparql_endpoint(endpoint, query, databus_key="SECRET", verbose=True) — signature is (endpoint_url, query, databus_key=None)

Additionally, fetch_databus_jsonld (utils.py lines 57, 60) references a bare verbose variable that is never defined, which will cause a NameError if called. The other two functions (_get_sparql_query_of_collection and _query_sparql_endpoint) have no verbose logging logic implemented at all.

Either add verbose parameters to all three functions and implement the verbose logic in _get_sparql_query_of_collection and _query_sparql_endpoint, or use a different mechanism (e.g., module-level flag or monkeypatching) to control verbose output.

🤖 Prompt for AI Agents
In `@tests/test_utils_verbose.py` around lines 25 - 34, Tests pass verbose=True
but the function signatures and implementations don't accept or use it: add an
optional verbose: bool = False parameter to utils.fetch_databus_jsonld,
datastore loader method _get_sparql_query_of_collection, and
_query_sparql_endpoint (keep default False to preserve existing API), fix
fetch_databus_jsonld to reference the local verbose parameter instead of a bare
name, propagate verbose through calls (fetch_databus_jsonld ->
_get_sparql_query_of_collection -> _query_sparql_endpoint) and implement verbose
logging in the latter two to emit the same redacted "[HTTP] GET" messages when
verbose is true (redact databus_key values and ensure databus_key is accepted
and used for redaction in log output).
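A minimal sketch of the requested change for fetch_databus_jsonld (the X-API-KEY header name mirrors deploy(); the logger name and "[HTTP] GET"/"REDACTED" strings follow the tests):

import logging
import requests

logger = logging.getLogger("databusclient")

def fetch_databus_jsonld(uri: str, databus_key: str | None = None, verbose: bool = False) -> str:
    headers = {"Accept": "application/ld+json"}
    if databus_key:
        headers["X-API-KEY"] = databus_key
    if verbose:
        # log a copy with the secret replaced, never the real key
        safe = dict(headers)
        if databus_key:
            safe["X-API-KEY"] = "REDACTED"
        logger.debug("[HTTP] GET %s headers=%s", uri, safe)
    resp = requests.get(uri, headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.text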

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@databusclient/cli.py`:
- Line 270: The code uses re.match on the variable sha_length (see the
conditional using re.match(r'^[A-Fa-f0-9]{64}:\d+$', sha_length)) but the re
module is not imported; fix this by adding "import re" to the module-level
imports in databusclient/cli.py so re.match resolves correctly.

In `@tests/test_download_auth.py`:
- Around line 10-12: Remove the duplicate/unused imports in
tests/test_download_auth.py: delete the extra import of patch from unittest.mock
(it's already imported with Mock), remove the redundant DownloadAuthError import
(already imported elsewhere in the file), and drop the unused download import
(tests call dl._download_file directly); ensure only the required imports remain
so tests reference dl._download_file and any exception symbols via the existing
imports.
- Around line 108-135: The _download_file function in
databusclient/api/download.py needs a new optional parameter verbose (default
False) added to its signature and used to enable detailed HTTP request/response
logging with sensitive values redacted; update the function _download_file(...)
to accept verbose: bool = False, branch on verbose to emit debug logs (using the
existing logger used elsewhere in this module) and ensure any tokens (e.g.,
values from REFRESH_TOKEN, access_token, Authorization headers) are replaced
with a constant like "REDACTED" before logging; keep behavior unchanged when
verbose is False and consider using tmp_path in tests instead of localDir='.' to
avoid writing to the working directory.
🧹 Nitpick comments (2)
databusclient/cli.py (1)

279-281: Redundant re-import of json.

json is already imported at line 2. The local import json as _json is unnecessary.

Proposed fix
     if json_output:
-        import json as _json
-        click.echo(_json.dumps({"distribution": dist}))
+        click.echo(json.dumps({"distribution": dist}))
tests/test_download_auth.py (1)

119-135: Fragile token sentinel values — consider using unambiguous strings.

The mock token values "ACCESS" and "VAULT" (lines 120, 122) are common words that could appear in log messages for reasons unrelated to token leakage (e.g., a log line mentioning "VAULT_REQUIRED_HOSTS" or "ACCESS denied"). If the logging output ever includes these substrings in a non-token context, the assertions on lines 134–135 will produce a false failure.

Consider using unique nonsense strings that would never appear in a legitimate log message:

Suggested change
     post_resp_1 = Mock()
-    post_resp_1.json.return_value = {"access_token": "ACCESS"}
+    post_resp_1.json.return_value = {"access_token": "SENS1TIVE_TKN_AAA"}
     post_resp_2 = Mock()
-    post_resp_2.json.return_value = {"access_token": "VAULT"}
+    post_resp_2.json.return_value = {"access_token": "SENS1TIVE_TKN_BBB"}
 
     with patch("requests.head", return_value=resp_head), patch(
         "requests.get", side_effect=get_side_effects
     ), patch("requests.post", side_effect=[post_resp_1, post_resp_2]):
         monkeypatch.setenv("REFRESH_TOKEN", "x" * 90)
 
         # run download with verbose enabled
         dl._download_file(url, localDir='.', vault_token_file="/does/not/matter", verbose=True)
         assert "[HTTP] HEAD" in caplog.text or "[HTTP] GET" in caplog.text
         assert "REDACTED" in caplog.text
         # Ensure token values are not directly printed
-        assert "ACCESS" not in caplog.text
-        assert "VAULT" not in caplog.text
+        assert "SENS1TIVE_TKN_AAA" not in caplog.text
+        assert "SENS1TIVE_TKN_BBB" not in caplog.text

    # Validate sha-length
    sha_tuple = None
    if sha_length:
        if not re.match(r'^[A-Fa-f0-9]{64}:\d+$', sha_length):

⚠️ Potential issue | 🔴 Critical

NameError: re module is not imported.

re.match() is used here but re is never imported in this file. This will crash at runtime whenever --sha-length is provided.

Proposed fix — add import at top of file

Add to the imports at the top of the file:

import re
🤖 Prompt for AI Agents
In `@databusclient/cli.py` at line 270, The code uses re.match on the variable
sha_length (see the conditional using re.match(r'^[A-Fa-f0-9]{64}:\d+$',
sha_length)) but the re module is not imported; fix this by adding "import re"
to the module-level imports in databusclient/cli.py so re.match resolves
correctly.

Comment on lines +10 to +12
from unittest.mock import patch

from databusclient.api.download import download, DownloadAuthError

⚠️ Potential issue | 🟡 Minor

Duplicate and unused imports.

  • patch is already imported on line 1 (from unittest.mock import Mock, patch).
  • DownloadAuthError is already imported on line 9.
  • download (line 12) is not used anywhere in this file — all tests invoke dl._download_file directly.

These lines can be removed entirely.

Proposed fix
-from unittest.mock import patch
-
-from databusclient.api.download import download, DownloadAuthError
🤖 Prompt for AI Agents
In `@tests/test_download_auth.py` around lines 10 - 12, Remove the
duplicate/unused imports in tests/test_download_auth.py: delete the extra import
of patch from unittest.mock (it's already imported with Mock), remove the
redundant DownloadAuthError import (already imported elsewhere in the file), and
drop the unused download import (tests call dl._download_file directly); ensure
only the required imports remain so tests reference dl._download_file and any
exception symbols via the existing imports.

Comment on lines +108 to +135
def test_verbose_redacts_authorization(monkeypatch, caplog):
    caplog.set_level(logging.DEBUG, logger='databusclient')
    vault_host = next(iter(VAULT_REQUIRED_HOSTS))
    url = f"https://{vault_host}/protected/file.ttl"

    resp_head = make_response(status=200, headers={})
    resp_401 = make_response(status=401, headers={"WWW-Authenticate": "Bearer realm=\"auth\""})
    resp_200 = make_response(status=200, headers={"content-length": "0"}, content=b"")

    get_side_effects = [resp_401, resp_200]

    post_resp_1 = Mock()
    post_resp_1.json.return_value = {"access_token": "ACCESS"}
    post_resp_2 = Mock()
    post_resp_2.json.return_value = {"access_token": "VAULT"}

    with patch("requests.head", return_value=resp_head), patch(
        "requests.get", side_effect=get_side_effects
    ), patch("requests.post", side_effect=[post_resp_1, post_resp_2]):
        monkeypatch.setenv("REFRESH_TOKEN", "x" * 90)

        # run download with verbose enabled
        dl._download_file(url, localDir='.', vault_token_file="/does/not/matter", verbose=True)
        assert "[HTTP] HEAD" in caplog.text or "[HTTP] GET" in caplog.text
        assert "REDACTED" in caplog.text
        # Ensure token values are not directly printed
        assert "ACCESS" not in caplog.text
        assert "VAULT" not in caplog.text

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the _download_file function definition and check its signature
find . -type f -name "*.py" | xargs grep -l "_download_file" | head -20

Repository: dbpedia/databus-python-client

Length of output: 134


🏁 Script executed:

#!/bin/bash
# Search for _download_file function definition
rg "def _download_file" -A 10 --type py

Repository: dbpedia/databus-python-client

Length of output: 1296


🏁 Script executed:

#!/bin/bash
# Check how other tests in the test file handle file paths and tmp_path usage
head -50 tests/test_download_auth.py

Repository: dbpedia/databus-python-client

Length of output: 1411


🏁 Script executed:

#!/bin/bash
# Get the complete _download_file function signature
sed -n '/^def _download_file(/,/^[):]$/p' databusclient/api/download.py | head -30

Repository: dbpedia/databus-python-client

Length of output: 1253


🏁 Script executed:

#!/bin/bash
# Check complete test file to see how test_verbose_redacts_authorization is structured
wc -l tests/test_download_auth.py

Repository: dbpedia/databus-python-client

Length of output: 104


🏁 Script executed:

#!/bin/bash
# Read the entire test file
cat tests/test_download_auth.py

Repository: dbpedia/databus-python-client

Length of output: 5033


Add verbose parameter to _download_file function signature.

The test calls _download_file(..., verbose=True) at line 130, but the function signature in databusclient/api/download.py does not include a verbose parameter. This will raise TypeError: _download_file() got an unexpected keyword argument 'verbose' at runtime. The parameter must be added to the function signature and its logging behavior implemented before this test can pass.

Additionally, consider using pytest's tmp_path fixture instead of localDir='.' to isolate test artifacts, rather than writing to the current directory—though this applies to all tests in the file, not just this one.

🤖 Prompt for AI Agents
In `@tests/test_download_auth.py` around lines 108 - 135, The _download_file
function in databusclient/api/download.py needs a new optional parameter verbose
(default False) added to its signature and used to enable detailed HTTP
request/response logging with sensitive values redacted; update the function
_download_file(...) to accept verbose: bool = False, branch on verbose to emit
debug logs (using the existing logger used elsewhere in this module) and ensure
any tokens (e.g., values from REFRESH_TOKEN, access_token, Authorization
headers) are replaced with a constant like "REDACTED" before logging; keep
behavior unchanged when verbose is False and consider using tmp_path in tests
instead of localDir='.' to avoid writing to the working directory.
