19 commits
5b1303a
Adds remote path stripping to azure list_files function
Jsalz2000 Oct 11, 2024
261a0f5
Adds path stripping to remote_path and prefix in azure blob destination
Jsalz2000 Oct 14, 2024
2ea04fd
Formats changes with black to pass make lint
Jsalz2000 Oct 14, 2024
59ad4f9
Bumps version to 3.4.0
Jsalz2000 Oct 14, 2024
62b93be
Adds mysql hostname config argument, removes hardcoded address
Jsalz2000 Oct 14, 2024
c05b415
Adds Azure Client config parameters, refactors Azure destination
Jsalz2000 Dec 30, 2024
a71ef36
Bumps version to 3.4.1
Jsalz2000 Dec 30, 2024
2ecf1c8
Documents optional az.client configuration block
Jsalz2000 Dec 30, 2024
6094d67
Adds whitespace below file header in az config
Jsalz2000 Dec 31, 2024
0c3957d
Adds __cast_options function in config parsing
Jsalz2000 Dec 31, 2024
fc0582b
Makes az.client config optional in config parser
Jsalz2000 Jan 2, 2025
1e5cbba
Adds connection_timeout and max_concurrency settings to Azure Blob de…
Jsalz2000 Jan 2, 2025
925d98c
Formats with black, isort. Casts az config params
Jsalz2000 Jan 2, 2025
ff93bc7
Adds connection_timeout and max_concurrency Azure Blob unit tests
Jsalz2000 Jan 2, 2025
91275a1
Adds connection_timeout and max_concurrency Azure Blob options to docs
Jsalz2000 Jan 2, 2025
5470325
Adds Azure managed identity authentication support
Jsalz2000 Apr 16, 2026
a65cb7f
Bumps version to 3.5.0
Jsalz2000 Apr 16, 2026
ef0bab5
Adds cluster identifier and Azure lease-based single-writer gate
Jsalz2000 Apr 20, 2026
43c6621
Bumps version to 3.6.0
Jsalz2000 Apr 20, 2026
6 changes: 3 additions & 3 deletions README.rst
@@ -126,9 +126,9 @@ Install TwinDB Backup.
.. code-block:: console

# Download the package
wget https://twindb-release.s3.amazonaws.com/twindb-backup/3.3.0/focal/twindb-backup_3.3.0-1_amd64.deb
wget https://twindb-release.s3.amazonaws.com/twindb-backup/3.6.0/focal/twindb-backup_3.6.0-1_amd64.deb
# Install TwinDB Backup
apt install ./twindb-backup_3.3.0-1_amd64.deb
apt install ./twindb-backup_3.6.0-1_amd64.deb

Configuring TwinDB Backup
~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -157,7 +157,7 @@ The package file will be generated in ``omnibus/pkg/``:
.. code-block:: console

$ ls omnibus/pkg/*.deb
omnibus/pkg/twindb-backup_3.3.0-1_amd64.deb
omnibus/pkg/twindb-backup_3.6.0-1_amd64.deb

Once the package is built you can install it with rpm/dpkg or upload it to your repository
and install it with apt or yum.
53 changes: 53 additions & 0 deletions docs/azure_worm_compatibility.rst
@@ -0,0 +1,53 @@
Azure WORM Compatibility Notes
==============================

This note captures the phase-2 design direction for running TwinDB against immutable Azure Blob containers after the phase-1 managed identity rollout.

Current blockers
----------------

TwinDB is not WORM-compatible (write once, read many) today because it still depends on mutable and destructive blob operations in the Azure destination flow:

- ``twindb_backup/source/mysql_source.py`` deletes remote backup copies when applying retention.
- ``twindb_backup/source/file_source.py`` deletes remote file backup copies when applying retention.
- ``twindb_backup/backup.py`` deletes binlogs and cleanup copies through the destination object.
- ``twindb_backup/status/base_status.py`` overwrites ``status`` and ``binlog-status`` blobs in place.

Container-level immutability blocks those delete and overwrite patterns, so authentication changes alone are not enough to make TwinDB WORM-safe.

Phase-2 decisions
-----------------

1. Add a provider-managed retention mode.

In this mode, TwinDB must stop calling ``dst.delete(...)`` for remote retention and cleanup. Azure lifecycle management and immutable storage retention policies become the source of truth for payload expiration.

2. Move mutable status out of the immutable backup container.

``status`` and ``binlog-status`` blobs should live in a separate mutable location. The cleanest follow-on design is a separate Azure status container or destination stanza for metadata, leaving the payload container append-only.

3. Use finite immutability windows.

The target model should use time-based retention sized to recovery requirements rather than "infinite retention". That keeps the operational model compatible with Azure lifecycle cleanup once blobs age past the immutability window.

4. Validate on unlocked non-production storage before any lock decision.

The first end-to-end WORM validation should use a non-production container with an unlocked immutability policy. Validate backup writes, status writes, restore reads, and lifecycle-driven cleanup behavior before any container is locked.

Suggested implementation shape
------------------------------

The likely code changes for the phase-2 follow-on are:

- Add a configuration switch such as ``remote_delete = false`` or ``retention_mode = provider_managed`` and thread it into the retention and cleanup call sites.
- Teach the backup/status flow to use a separate mutable Azure location for status metadata.
- Keep the existing phase-1 managed identity authentication path for both payload and status destinations, but allow them to point at different containers.
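
The first bullet above can be sketched in Python. This is a minimal illustration under stated assumptions, not TwinDB's implementation: ``remote_deletes_enabled``, ``apply_retention``, the ``[retention]`` placement of the option, and the ``twindb_managed`` default value are all hypothetical. Only ``retention_mode = provider_managed`` and ``dst.delete(...)`` come from this note.

```python
from configparser import ConfigParser

PROVIDER_MANAGED = "provider_managed"  # option value suggested in this note


def remote_deletes_enabled(cfg: ConfigParser) -> bool:
    """Return False when the (hypothetical) provider-managed mode is on."""
    # Assumption: the switch lives in [retention]; the default name is made up.
    mode = cfg.get("retention", "retention_mode", fallback="twindb_managed")
    return mode.strip().lower() != PROVIDER_MANAGED


def apply_retention(dst, expired_copies, cfg: ConfigParser) -> list:
    """Delete expired remote copies unless the storage provider owns retention.

    Returns the list of copies that were actually deleted.
    """
    if not remote_deletes_enabled(cfg):
        # Lifecycle management and immutability policies expire the payloads.
        return []
    deleted = []
    for copy in expired_copies:
        dst.delete(copy)
        deleted.append(copy)
    return deleted
```

Routing every retention and cleanup call site through one predicate like this keeps the switch in a single place, which makes it easier to check that no delete path is missed.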

Recommended validation order
----------------------------

1. Enable managed identity auth first.
2. Pre-create separate payload and status containers.
3. Enable provider-managed retention mode so TwinDB stops issuing remote deletes.
4. Apply a finite, unlocked immutability policy to the payload container.
5. Confirm backups still upload, status metadata still updates, restores still read, and Azure lifecycle cleanup works after the retention window expires.
2 changes: 1 addition & 1 deletion docs/installation.rst
@@ -38,7 +38,7 @@ The package file will be generated in ``omnibus/pkg/``:
.. code-block:: console

$ ls omnibus/pkg/*.deb
omnibus/pkg/twindb-backup_3.3.0-1_amd64.deb
omnibus/pkg/twindb-backup_3.6.0-1_amd64.deb

Once the package is built you can install it with rpm/dpkg or upload it to your repository
and install it with apt or yum.
92 changes: 91 additions & 1 deletion docs/usage.rst
@@ -40,6 +40,23 @@ Personally, I added it to skip files ``.gitignore`` would ignore.
backup_dirs = /etc /root /home "/path/to/important files"
tar_options = --exclude-vcs-ignores

``server_name`` is an optional identifier that TwinDB Backup uses as the per-source
segment of the remote backup path and of the status file. When unset, it defaults
to the local hostname (``socket.gethostname()``), which produces one backup tree
per host. When multiple replicas of a MySQL cluster back up to the same
destination, set ``server_name`` to a cluster-wide identifier so every replica
writes into a single shared path instead of a per-hostname fan-out. On Azure
Blob destinations, setting ``server_name`` also enables a blob-lease-based
single-writer gate so only one replica runs a given backup cycle at a time.

.. code-block:: ini

[source]

backup_dirs = /etc /root /home
backup_mysql = yes
server_name = prod-primary-db
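
The fallback described above can be sketched in Python. This is a minimal illustration under stated assumptions, not TwinDB's own code: ``resolve_server_name`` is a hypothetical helper, and only the ``[source]`` ``server_name`` option and the ``socket.gethostname()`` default come from the text above.

```python
import socket
from configparser import ConfigParser


def resolve_server_name(cfg: ConfigParser) -> str:
    """Return [source] server_name, or the local hostname when unset or blank."""
    name = cfg.get("source", "server_name", fallback="").strip()
    return name or socket.gethostname()
```

With this shape, two replicas that both set ``server_name = prod-primary-db`` resolve to the same identifier, while hosts that leave it unset keep one backup tree per hostname.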


Backup Destination
~~~~~~~~~~~~~~~~~~
@@ -92,16 +109,88 @@ In the ``[s3]`` section you specify Amazon credentials as well as an S3 bucket w
Azure Blob Storage
~~~~~~~~~~~~~~~~~~~~

In the ``[az]`` section you specify Azure credentials as well as Azure Blob Storage container where to store backups.
In the ``[az]`` section you specify the Azure authentication settings and the Azure Blob Storage container in which to store backups.

The default mode uses a storage connection string:

.. code-block:: ini

[az]

auth_mode = connection_string # optional, defaults to connection_string
connection_string = "DefaultEndpointsProtocol=https;AccountName=ACCOUNT_NAME;AccountKey=ACCOUNT_KEY;EndpointSuffix=core.windows.net"
container_name = twindb-backups
create_container_if_missing = true # optional, defaults to true
remote_path = /backups/mysql # optional
max_concurrency = 1 # optional

For Azure VMs, managed identity authentication is also supported:

.. code-block:: ini

[az]

auth_mode = managed_identity
account_url = "https://ACCOUNT_NAME.blob.core.windows.net"
container_name = twindb-backups
# Optional: target a specific user-assigned managed identity. Set at most ONE of:
# managed_identity_resource_id = "/subscriptions/.../userAssignedIdentities/NAME"
# managed_identity_client_id = "00000000-0000-0000-0000-000000000000"
# If neither is set, the system-assigned managed identity is used (DefaultAzureCredential).
create_container_if_missing = false # optional, defaults to true
remote_path = /backups/mysql # optional
max_concurrency = 1 # optional

For Azure VM deployments, the recommended production setup is:

- Use one **user-assigned managed identity (UAMI) per workload role** and attach it
to every VM that fills that role. Keeping a single stable identity across VM
rebuilds is easier to reason about than per-VM system-assigned identities, and
it lets you scope RBAC to the exact container the workload writes to.
- Prefer ``managed_identity_resource_id`` over ``managed_identity_client_id``
when a VM has multiple UAMIs attached, or when you want the backup
configuration to be derivable from naming conventions instead of from a
Terraform output. The resource ID is the UAMI's full ARM ID, e.g.
``/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<name>``.
- Fall back to the VM's system-assigned managed identity (omit both
``managed_identity_*`` fields) when a single identity per VM is sufficient.
- Grant the identity the ``Storage Blob Data Contributor`` role scoped to the
target container. TwinDB currently reads, writes, lists, overwrites status
blobs, deletes old backups, and can optionally create the container, so it
still needs a role with blob read/write/delete plus container
read/write permissions.
- Prefer ``create_container_if_missing = false`` when infrastructure pre-creates
the container. This lets operations scope permissions to the existing
container or storage account instead of depending on first-run container
creation.

Use ``create_container_if_missing = false`` when the container should be pre-provisioned by infrastructure and the backup process should not attempt container creation.
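
The identity rules above (``account_url`` required, at most one of the two ``managed_identity_*`` options, system-assigned identity otherwise) can be sketched as a small validation step. This is an illustrative sketch, not the code in this change: ``select_managed_identity`` and its return shape are hypothetical, and mapping the result onto an actual credential object is deliberately left out.

```python
from typing import Mapping, Optional, Tuple


def select_managed_identity(az: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Decide which managed identity an [az] section asks for.

    Returns ("resource_id", value), ("client_id", value), or
    ("system_assigned", None). Raises ValueError on an invalid combination.
    """

    def option(name: str) -> str:
        # Values may be quoted in the config file; strip whitespace and quotes.
        return az.get(name, "").strip().strip('"')

    if not option("account_url"):
        raise ValueError("account_url is required for managed_identity auth")

    resource_id = option("managed_identity_resource_id")
    client_id = option("managed_identity_client_id")
    if resource_id and client_id:
        raise ValueError(
            "set at most one of managed_identity_resource_id "
            "and managed_identity_client_id"
        )
    if resource_id:
        return ("resource_id", resource_id)
    if client_id:
        return ("client_id", client_id)
    return ("system_assigned", None)
```

Failing fast on the ambiguous case means a misconfigured VM is caught at startup rather than silently authenticating as whichever identity happens to be picked.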

Validation checklist for the managed identity rollout:

- From a developer workstation, validate the token-auth code path against an accessible non-production storage account by using ``account_url`` and Microsoft Entra credentials from ``DefaultAzureCredential``.
- From the target Azure VM, validate the production path with no ``connection_string`` configured. Confirm backup upload, list/read operations, status blob updates, and any retention deletes that remain enabled in phase 1.
- If the storage account uses network rules or private endpoints, run the validation from the VM or another allowed network path. Local validation may fail even when the identity and RBAC are correct.

For the separate immutable-storage follow-on, see ``docs/azure_worm_compatibility.rst``.

In the ``[az.client]`` section you specify optional Azure Blob Storage client options.

.. code-block:: ini

[az.client]

api_version = "2019-02-02"
secondary_hostname = "ACCOUNT_NAME-secondary.blob.core.windows.net"
max_block_size = 4194304
max_single_put_size = 67108864
min_large_block_upload_threshold = 4194305
use_byte_buffer = true
max_page_size = 4194304
max_single_get_size = 33554432
max_chunk_get_size = 4194304
audience = "https://storage.azure.com/"
connection_timeout = 20
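
Values read from an INI file arrive as strings, so the numeric and boolean options above need casting before they reach the Blob Storage client. The commit list mentions a ``__cast_options`` function for this, but its body is not shown here. The sketch below is only a guess at the idea, with a hypothetical ``cast_client_options`` helper and option sets taken from the example block above.

```python
from configparser import ConfigParser

# Options from the [az.client] example above, grouped by assumed target type.
INT_OPTIONS = {
    "max_block_size",
    "max_single_put_size",
    "min_large_block_upload_threshold",
    "max_page_size",
    "max_single_get_size",
    "max_chunk_get_size",
    "connection_timeout",
}
BOOL_OPTIONS = {"use_byte_buffer"}


def cast_client_options(cfg: ConfigParser) -> dict:
    """Return [az.client] options with ints and booleans cast from strings."""
    if not cfg.has_section("az.client"):
        return {}  # the section is optional
    section = cfg["az.client"]
    options = {}
    for key in section:
        if key in INT_OPTIONS:
            options[key] = section.getint(key)
        elif key in BOOL_OPTIONS:
            options[key] = section.getboolean(key)
        else:
            # Plain strings such as api_version; drop surrounding quotes.
            options[key] = section[key].strip('"')
    return options
```

Returning an empty dict when the section is absent keeps ``[az.client]`` optional, so the result can be passed to the client as keyword arguments either way.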

Google Cloud Storage
~~~~~~~~~~~~~~~~~~~~
@@ -151,6 +240,7 @@ The ``expire_log_days`` options specifies the retention period for MySQL binlogs
mysql_defaults_file = /etc/twindb/my.cnf
full_backup = daily
expire_log_days = 7
hostname = localhost # optional, defaults to 127.0.0.1

Backing up MySQL Binlog
-----------------------
2 changes: 1 addition & 1 deletion omnibus/config/projects/twindb-backup.rb
@@ -23,7 +23,7 @@
# and /opt/twindb-backup on all other platforms
install_dir '/opt/twindb-backup'

build_version '3.3.0'
build_version '3.6.0'

build_iteration 1

1 change: 1 addition & 0 deletions requirements.in
@@ -1,5 +1,6 @@
#@IgnoreInspection BashAddShebang
azure-core ~= 1.24
azure-identity
azure-storage-blob ~= 12.19
Click ~= 8.1
PyMySQL ~= 1.0
21 changes: 20 additions & 1 deletion requirements.txt
@@ -1,13 +1,16 @@
#
# This file is autogenerated by pip-compile with Python 3.8
# This file is autogenerated by pip-compile with Python 3.9
# by the following command:
#
# pip-compile --output-file=requirements.txt requirements.in
#
azure-core==1.31.0
# via
# -r requirements.in
# azure-identity
# azure-storage-blob
azure-identity==1.25.3
# via -r requirements.in
azure-storage-blob==12.23.0
# via -r requirements.in
bcrypt==4.2.0
@@ -34,8 +37,11 @@ click==8.1.7
# via -r requirements.in
cryptography==43.0.1
# via
# azure-identity
# azure-storage-blob
# msal
# paramiko
# pyjwt
datadog==0.50.1
# via -r requirements.in
google==3.0.0
@@ -67,6 +73,12 @@ jmespath==1.0.1
# via
# boto3
# botocore
msal==1.36.0
# via
# azure-identity
# msal-extensions
msal-extensions==1.3.1
# via azure-identity
paramiko==3.5.0
# via -r requirements.in
proto-plus==1.24.0
@@ -88,6 +100,10 @@ pyasn1-modules==0.4.1
# via google-auth
pycparser==2.22
# via cffi
pyjwt[crypto]==2.12.1
# via
# msal
# pyjwt
pymysql==1.1.1
# via -r requirements.in
pynacl==1.5.0
@@ -101,6 +117,7 @@ requests==2.31.0
# datadog
# google-api-core
# google-cloud-storage
# msal
rsa==4.9
# via google-auth
s3transfer==0.10.2
@@ -119,7 +136,9 @@ statsd-tags==3.2.1.post1
typing-extensions==4.12.2
# via
# azure-core
# azure-identity
# azure-storage-blob
# pyjwt
urllib3==1.26.20
# via
# botocore
2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 3.3.0
current_version = 3.6.0
commit = True
tag = False

2 changes: 1 addition & 1 deletion setup.py
@@ -20,7 +20,7 @@

setup(
name="twindb-backup",
version="3.3.0",
version="3.6.0",
description="TwinDB Backup tool for files, MySQL et al.",
long_description=readme + "\n\n" + history,
author="TwinDB Development Team",
34 changes: 32 additions & 2 deletions support/twindb-backup.cfg
@@ -7,6 +7,12 @@ backup_mysql=no
# When backing up files it might be useful to ignore what would .gitignore ignore.
# tar_options = --exclude-vcs-ignores --exclude-caches

# server_name overrides the per-source segment of the remote backup path
# (defaults to the local hostname). Set it to a cluster-wide identifier
# so every replica in a MySQL cluster writes to one path instead of one
# per host. Example:
# server_name = prod-primary-db

# Destination
[destination]
# backup destination can be ssh, s3, gcs
@@ -35,9 +41,33 @@ BUCKET=twindb-backups

# Azure destination settings

connection_string="DefaultEndpointsProtocol=https;AccountName=ACCOUNT_NAME;AccountKey=ACCOUNT_KEY;EndpointSuffix=core.windows.net"
# auth_mode="connection_string" # optional, defaults to connection_string
# connection_string="DefaultEndpointsProtocol=https;AccountName=ACCOUNT_NAME;AccountKey=ACCOUNT_KEY;EndpointSuffix=core.windows.net"
# account_url="https://ACCOUNT_NAME.blob.core.windows.net" # required for managed_identity auth
container_name=twindb-backups
# Set at most ONE of managed_identity_resource_id / managed_identity_client_id.
# If neither is set, the system-assigned identity is used via DefaultAzureCredential.
# managed_identity_resource_id="/subscriptions/.../providers/Microsoft.ManagedIdentity/userAssignedIdentities/NAME"
# managed_identity_client_id="00000000-0000-0000-0000-000000000000"
# create_container_if_missing=true # optional, defaults to true
#remote_path = /backups/mysql # optional
#max_concurrency = 1 # optional

[az.client]

# Azure client optional settings

# api_version="2019-02-02" # optional
# secondary_hostname="ACCOUNT_NAME-secondary.blob.core.windows.net" # optional
# max_block_size=4194304 # optional
# max_single_put_size=67108864 # optional
# min_large_block_upload_threshold=4194305 # optional
# use_byte_buffer=true # optional
# max_page_size=4194304 # optional
# max_single_get_size=33554432 # optional
# max_chunk_get_size=4194304 # optional
# audience="https://storage.azure.com/" # optional
# connection_timeout=20 # optional

[gcs]

@@ -60,8 +90,8 @@ ssh_key=/root/.ssh/id_rsa
# MySQL

mysql_defaults_file=/etc/twindb/my.cnf

full_backup=daily
#hostname=localhost # optional, defaults to 127.0.0.1

[retention]

6 changes: 5 additions & 1 deletion tests/unit/backup/test_backup_binlogs.py
@@ -9,6 +9,10 @@
@mock.patch("twindb_backup.backup.osp")
def test_backup_binlogs_returns_if_no_binlogs(mock_osp, mock_save):
with mock.patch.object(MySQLClient, "variable", return_value=None):
backup_binlogs("foo", mock.Mock())
cfg = mock.Mock()
# server_name is used for the BinlogStatus status_directory; it must
# be a real string since it is joined with a filename.
cfg.server_name = "test-host"
backup_binlogs("foo", cfg)
assert mock_osp.dirname.call_count == 0
assert mock_save.call_count == 0