# Getting Started with pg_hardstorage

pg_hardstorage is a backup tool built on **continuous WAL streaming over the PostgreSQL replication protocol**. In production you run two processes in parallel. One, `wal stream`, continuously receives WAL from PostgreSQL and commits each completed 16 MiB segment to the repository. The other, `backup`, takes a base backup on a schedule (for example, nightly); the WAL stream that follows rolls forward from that base.

Nightly backup + continuously running WAL stream = segment-granular PITR: you can restore to any segment-aligned point.

It backs up PostgreSQL databases that you run yourself (bare metal, VMs, containers, Patroni clusters, and operators such as CloudNativePG) over the physical replication protocol, deduplicates and encrypts content-addressed chunks, and restores with PITR. It supports PostgreSQL 15 and later and is licensed under Apache 2.0.

The rest of this guide sets up pg_hardstorage backups from scratch.

---

## 1. Install

### Pre-built binary

Releases ship as static binaries for linux/{amd64,arm64} and darwin/arm64, packaged as tarballs; Windows is CLI-only. Download the matching release from [github.com/cybertec-postgresql/pg_hardstorage/releases](https://github.com/cybertec-postgresql/pg_hardstorage/releases), verify the signature, and put the binary on your `$PATH`:

```bash
VERSION=1.0.4   # latest release: https://github.com/cybertec-postgresql/pg_hardstorage/releases/latest
curl -LO https://github.com/cybertec-postgresql/pg_hardstorage/releases/download/v${VERSION}/pg_hardstorage_${VERSION}_linux_amd64.tar.gz
tar xzf pg_hardstorage_${VERSION}_linux_amd64.tar.gz
sudo install -m 0755 pg_hardstorage /usr/local/bin/
pg_hardstorage version
```

### .deb (Debian / Ubuntu)

```bash
VERSION=1.0.4   # the release you downloaded
sudo dpkg -i pg-hardstorage_${VERSION}_amd64.deb
```

The package installs the binary at `/usr/bin/pg_hardstorage`, places a systemd unit at `/lib/systemd/system/pg_hardstorage.service`, and creates `/etc/pg_hardstorage/`, `/var/lib/pg_hardstorage/`, and `/var/log/pg_hardstorage/` with mode 0750, owned by `pg-hardstorage`.

### .rpm (Fedora / RHEL / Rocky / Alma)

```bash
VERSION=1.0.4   # the release you downloaded
sudo rpm -i pg-hardstorage-${VERSION}-1.x86_64.rpm
```

### Container image

```bash
VERSION=1.0.4   # latest release tag
docker pull ghcr.io/cybertec-postgresql/pg_hardstorage:v${VERSION}
docker run --rm ghcr.io/cybertec-postgresql/pg_hardstorage:v${VERSION} version
```

The image is distroless. Mount a configuration directory at `/etc/pg_hardstorage` and a state directory at `/var/lib/pg_hardstorage`; both must be writable by UID 65532.

### From source

```bash
git clone https://github.com/cybertec-postgresql/pg_hardstorage
cd pg_hardstorage
make                      # produces bin/pg_hardstorage
sudo install -m 0755 bin/pg_hardstorage /usr/local/bin/
```

Requires Go 1.26. `make test` runs the full unit test suite under the race detector; `make test-integration` uses
testcontainers-go to run against a real PostgreSQL 17 container (requires Docker).

```console
$ pg_hardstorage version
pg_hardstorage 1.0.4 (6bf20a6, built 2026-06-24T20:10:58Z)
```

---

## 2. Five-minute quickstart

### 2.1 Configure a replication user on PostgreSQL

```sql
CREATE ROLE pgbackup REPLICATION LOGIN PASSWORD 'pgbackup';
```

Add a line to `pg_hba.conf` that allows the agent host to replicate as that role:

```text
host  replication  pgbackup  192.192.103.117/32  scram-sha-256
```

or, to accept connections from any IPv4 address:

```text
host  replication  pgbackup  0.0.0.0/0  scram-sha-256
```

Reload PG:

```sql
SELECT pg_reload_conf();
```

### 2.2 Create the repository

```bash
# Create the local directory /postgresql/backup beforehand; the postgres
# user needs read/write access to it (chmod 700).
pg_hardstorage repo init file:///postgresql/backup/
```

A repository is a directory or an S3 object store that holds chunks, manifests, and WAL. One repository can hold multiple deployments. `repo init` never re-initializes an existing repository: re-running it against a URL that already holds a repository returns `conflict.repo_exists` (exit 7).

S3 works the same way:

```bash
pg_hardstorage repo init 's3://acme-backups/?region=eu-central-1'
```

Other backends use the same form; pick the URL scheme that matches your storage:

| Backend                      | Example URL                                         |
| ---------------------------- | --------------------------------------------------- |
| Local filesystem             | `file:///postgresql/backup`                         |
| AWS S3 / MinIO / R2 / B2     | `s3://acme-backups/?region=eu-central-1`            |
| Google Cloud Storage         | `gcs://acme-backups/`                               |
| Azure Blob                   | `azblob://account.blob.core.windows.net/container/` |
| Remote host over SSH (SFTP)  | `sftp://backupnas.example.com/srv/backups`          |
| Remote host over SSH (ssh-exec) | `scp://backupnas.example.com/srv/backups`        |

Both `sftp://` and `scp://` run over SSH. Choose `sftp://` by default, and `scp://` when the remote server has the SFTP subsystem disabled. For authentication, known-hosts, and additional mapping settings, see ["Adding an SFTP repository"](https://github.com/cybertec-postgresql/pg_hardstorage/blob/main/docs/how-to/adding/repository-sftp.md) and ["Adding an SCP repository"](https://github.com/cybertec-postgresql/pg_hardstorage/blob/main/docs/how-to/adding/repository-scp.md).

### 2.3 Verify that PG is ready for backup and restore (one-time preflight)

`wal stream` runs the preflight automatically on every start, but you can run it on its own first to confirm that the source PostgreSQL meets the replication requirements before you configure systemd:

```bash
pg_hardstorage wal preflight db1 --pg-connection \
  postgres://pgbackup:pgbackup@192.192.103.117:5432/postgres
```

Output:

```json
{
  "deployment": "db1",
  "role": "pgbackup",
  "pg_version_num": 180003,
  "findings": [
    {
      "severity": "info",
      "code": "max_slot_wal_keep_size.unbounded",
      "message": "max_slot_wal_keep_size = -1 (PG default); the slot will retain WAL until the streamer reconnects, even if pg_wal/ fills the partition",
      "suggestion": "pair with a disk-free alert on the partition holding pg_wal/ AND a streamer-lag alert; or set max_slot_wal_keep_size to bound the slot — see docs/how-to/operating/slot-disk-safety.md for the full trade-off",
      "observed": "-1"
    },
    {
      "severity": "info",
      "code": "wal_keep_size",
      "message": "wal_keep_size = 0 (informational; the slot is the primary retention guarantee)",
      "observed": "0"
    }
  ]
}
```

Fatal findings (`wal_level.too_low`, `max_replication_slots.full`, `max_wal_senders.saturated`, `role.no_replication`) block the run: the command exits non-zero, and each finding carries a `suggestion:`. Warnings (`max_slot_wal_keep_size.set`, and `idle_replication_slot_timeout.set` on PG 17 and later) are shown but do not block.

### 2.4 Start WAL streaming (the always-on core)

Once the preflight is clean, start the WAL stream. This is the primary function of pg_hardstorage and the process that must run 24/7:

```bash
pg_hardstorage wal stream db1 \
  --pg-connection postgres://pgbackup:pgbackup@192.192.103.117:5432/postgres \
  --repo file:///postgresql/backup
```

Output:

```text
08:26:18 [INFO ] wal.preflight.max_slot_wal_keep_size.unbounded
  body: {"message": "max_slot_wal_keep_size = -1 (PG default); the slot will retain WAL until the streamer reconnects, even if pg_wal/ fills the partition","observed": "-1","required": "","suggestion": "pair with a disk-free alert on the partition holding pg_wal/ AND a streamer-lag alert; or set max_slot_wal_keep_size to bound the slot — see docs/how-to/operating/slot-disk-safety.md for the full trade-off"}
08:26:18 [INFO ] wal.preflight.wal_keep_size
  body: {"message": "wal_keep_size = 0 (informational; the slot is the primary retention guarantee)","observed": "0","required": "","suggestion": ""}
08:26:19 [INFO ] wal.stream.starting deployment=db1 timeline=1
  body: {"attempt": 1,"compression": "","encryption": false,"resume_strategy": "fresh-slot-restart-lsn","slot": "pg_hardstorage_db1","start_lsn": "0/85000000","status_interval": "10s","system_id": "7639282777750685031"}
08:43:04 [WARNING] wal.stream.reconnecting
  body: {"attempt": 1,"backoff": "1s","error": "streaming: inactivity timeout (no message from server) (after 5m0s)","reason": "stream_break","synced_lsn": "0/87000000"}
08:43:05 [INFO ] wal.stream.starting deployment=db1 timeline=1
  body: {"attempt": 2,"compression": "","encryption": false,"last_lsn_in_repo": "0/87000000","resume_strategy": "resume-from-repo","slot": "pg_hardstorage_db1","start_lsn": "0/87000000","status_interval": "10s","system_id": "7639282777750685031"}
08:48:05 [WARNING] wal.stream.reconnecting
  body: {"attempt": 2,"backoff": "1s","error": "streaming: inactivity timeout (no message from server) (after 5m0s)","reason": "stream_break","synced_lsn": "0/0"}
08:48:06 [INFO ] wal.stream.starting deployment=db1 timeline=1
  body: {"attempt": 3,"compression": "","encryption": false,"last_lsn_in_repo": "0/87000000","resume_strategy": "resume-from-repo","slot": "pg_hardstorage_db1","start_lsn": "0/87000000","status_interval": "10s","system_id": "7639282777750685031"}
...
```

At this point the process runs in the foreground; if it is terminated, streaming stops. To run it in the background, use one of the following methods.

**Method 1: `nohup` (temporary / lightweight)**

```bash
nohup pg_hardstorage wal stream db1 \
  --pg-connection postgres://pgbackup:pgbackup@192.192.103.117:5432/postgres \
  --repo file:///postgresql/backup \
  > /postgresql/backup/wal_stream.log 2>&1 &
```

**Method 2: a systemd service (best practice for production)**

Step 1: Create the service file. As root or with sudo, create `/etc/systemd/system/pg_hardstorage-wal.service`:

```ini
[Unit]
Description=PostgreSQL Hardstorage WAL Stream for db1
After=network.target postgresql.service

[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/bin/pg_hardstorage wal stream db1 \
  --pg-connection postgres://pgbackup:pgbackup@192.192.103.117:5432/postgres \
  --repo file:///postgresql/backup
Restart=always
RestartSec=5
# Send output to the journal so that log files do not fill the system disk
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
```

Step 2: Start and enable the service.

```bash
# Reload the systemd configuration
sudo systemctl daemon-reload
# Start the service
sudo systemctl start pg_hardstorage-wal
# Start it automatically at boot
sudo systemctl enable pg_hardstorage-wal
```

Step 3:
View logs and status.

```bash
# Check the service status
sudo systemctl status pg_hardstorage-wal
# Follow the streaming log in real time
journalctl -u pg_hardstorage-wal -f
```

If the slot does not exist, the agent issues `CREATE_REPLICATION_SLOT pg_hardstorage_db1 PHYSICAL RESERVE_WAL`. `RESERVE_WAL` pins the slot's `restart_lsn` at creation time, so PG retains WAL from that moment on. The agent then issues `START_REPLICATION SLOT pg_hardstorage_db1 PHYSICAL` against that slot. The stream survives agent restarts: it resumes from the last LSN in the repository (`resume_strategy: resume-from-repo` in the log). Supervise it with systemd or a container scheduler.

`--skip-preflight` is the explicit override for when you have already audited PG. `--no-slot` is the explicit escape hatch for archive-only deployments in which another mechanism guarantees WAL retention. Both emit loud warnings; using either should be a deliberate choice.

Leave it running. The remaining steps run in another terminal or under a separate scheduler.

### 2.5 Take the first base backup

Take the base backup while the streamer runs concurrently. The two processes share the repository URL but otherwise do not coordinate: `backup` streams `BASE_BACKUP` over its own replication connection while `wal stream` keeps shipping WAL.

The wizard probes PG, generates a signing key pair and a KEK, writes `pg_hardstorage.yaml`, and by default takes the first backup:

```bash
pg_hardstorage init \
  --pg-connection postgres://pgbackup:pgbackup@192.192.103.117:5432/postgres \
  --repo file:///postgresql/backup \
  --deployment db1 \
  --yes
```

Output:

```text
08:30:38 [INFO ] init.probe.starting
  body: {"deployment": "db1"}
08:30:38 [NOTICE] init.probe.ok
  body: {"pg_version": 18,"system_id": "7639282777750685031","timeline": 1}
08:30:38 [NOTICE] init.repo.ready
  body: {"repo_id": "9bc618d1f252be3ba8f3c6422d3c3c55","url": "file:///postgresql/backup"}
08:30:39 [NOTICE] init.kek.ready
  body: {"generated": true,"path": "/home/postgres/.config/pg_hardstorage/keyring/kek.bin"}
08:30:39 [NOTICE] init.config.written
  body: {"deployment": "db1"}
08:30:39 [INFO ] init.backup.starting
  body: {"deployment": "db1"}
08:43:58 [NOTICE] init.backup.ok
  body: {"backup_id": "db1.full.20260625T083040Z.68ac","duration_ms": 799160}

✓ pg_hardstorage initialized
  Deployment:   db1
  PostgreSQL:   18
  Cluster ID:   7639282777750685031
  Timeline:     1
  Repository:   file:///postgresql/backup
  Config:       /home/postgres/.config/pg_hardstorage/pg_hardstorage.yaml
  Keyring:      /home/postgres/.config/pg_hardstorage/keyring
  Encryption:   AES-256-GCM (KEK generated at /home/postgres/.config/pg_hardstorage/keyring/kek.bin)
  First backup: db1.full.20260625T083040Z.68ac
    logical:  1.3 GiB
    duration: 799160ms

Next steps:
  1. Start the agent (drives scheduled backups + retention):
       pg_hardstorage agent
  2. Continuously archive WAL for PITR:
       pg_hardstorage wal stream db1 --pg-connection ... --repo file:///postgresql/backup
  3. Inspect deployment health:
       pg_hardstorage doctor db1
```

To take later backups without going through the wizard, use the following command. This is what your scheduler runs each night:

```bash
pg_hardstorage backup db1 \
  --pg-connection postgres://pgbackup:pgbackup@192.192.103.117:5432/postgres \
  --repo file:///postgresql/backup
```

Output:

```text
08:48:41 [INFO ] backup.dedup_hints_loaded deployment=db1
  body: {"hint_chunks": 20698}
08:48:42 [INFO ] backup.pg_probed
  body: {"pg_version": 18,"raw": "18.3"}
08:48:42 [INFO ] backup.identified
  body: {"system_id": "7639282777750685031","timeline": 1,"xlogpos": "0/870001A0"}
08:48:42 [INFO ] backup.started tenant=default deployment=db1 backup_id=db1.full.20260625T084842Z.2dd5
08:48:57 [INFO ] backup.stream_complete
  body: {"bytes_received": 1422521710,"messages": 46080,"tablespaces": 1}
08:49:03 [WARNING] backup.essential_files_unchecked deployment=db1 backup_id=db1.full.20260625T084842Z.2dd5
  body: {"error": "backup: probe config locations: pg: read data_directory: ERROR: permission denied to examine \"data_directory\" (SQLSTATE 42501)","hint": "could not query data_directory / config_file from source PG; skipping the pre-commit essential-files check"}
08:49:05 [INFO ] backup.committed tenant=default deployment=db1 backup_id=db1.full.20260625T084842Z.2dd5 timeline=1 lsn=0/88000158
  body: {"chunks_deduped": 21160,"chunks_written": 2,"dedup_hit_rate": 0.9999054909743881,"duration_ms": 24044,"file_count": 1782,"logical_bytes": 1421272834,"primary_key": "manifests/db1/backups/db1.full.20260625T084842Z.2dd5/manifest.json","total_chunk_refs": 21162,"unique_chunk_bytes": 1409386943,"unique_chunk_count": 20698}

✓ Backup committed
  ID:             db1.full.20260625T084842Z.2dd5
  Deployment:     db1
  PostgreSQL:     18
  Cluster ID:     7639282777750685031
  Stop LSN / TLI: 0/88000158 / 1
  Files:          1782 in 1 tablespace(s)
  Logical bytes:  1.3 GiB
  Unique chunks:  20698 (1.3 GiB after dedup)
  Dedup ratio:    1.01x
  Duration:       24044 ms
  Encryption:     AES-256-GCM (per-backup DEK, wrapped under local KEK)
  Manifest:       manifests/db1/backups/db1.full.20260625T084842Z.2dd5/manifest.json
```

In production, schedule `backup` (cron / systemd timer / Kubernetes CronJob) and supervise `wal stream` (systemd / Kubernetes Deployment). Base backups are the periodic anchors; the streamer keeps PITR byte-accurate between anchors.

### 2.6 Restore

```bash
pg_hardstorage restore db1 latest \
  --target /postgresql/restored \
  --repo file:///postgresql/backup
```

Output:

```text
08:50:18 [INFO ] restore.manifest_loaded tenant=default deployment=db1 backup_id=db1.full.20260625T084842Z.2dd5 timeline=1 lsn=0/88000158
  body: {"file_count": 1782,"pg_version": 18,"tablespaces": 1}
08:50:18 [INFO ] restore.started deployment=db1 backup_id=db1.full.20260625T084842Z.2dd5
  body: {"resumed": false,"target": "/postgresql/restored"}
09:00:47 [INFO ] restore.verifybackup_ok deployment=db1 backup_id=db1.full.20260625T084842Z.2dd5
  body: {"algorithm": "CRC32C","bytes_hashed": 1421273066,"files_checked": 1783}
09:00:49 [INFO ] restore.auto_recovery_armed deployment=db1 backup_id=db1.full.20260625T084842Z.2dd5
  body: {"recovery_target": "immediate","restore_command": "wired","signal": "standby.signal"}
```

PITR accepts a natural-language time. Preview the restore first:

```console
[postgres@pg117 ~]$ pg_hardstorage restore db1 latest --target /postgresql/restored --repo file:///postgresql/backup --to "5 minutes ago" --preview
Restore plan (preview only — no files written)
  Backup:            db1.full.20260626T071735Z.b2be
  Deployment:        db1
  Target:            /postgresql/restored
  PostgreSQL:        18
  Cluster ID:        7639282777750685031
  Backup stop LSN:   0/A2000158 (TLI 1)
  Recovery target:   time 2026-06-29T07:40:33.510854269Z (inclusive=true)
  On target reached: pause
  Recovery TLI:      latest
  Files:             1787
  Total bytes:       1.3 GiB
  Chunk refs:        21166 (20702 unique, 1.3 GiB after dedup)
  backup_label:      232 bytes
  Estimated RTO:     13554 ms (assuming 100.0 MiB/s)
  Pre-flight:        ✗ 1 issue(s)
```

Then run the restore:

```bash
pg_hardstorage restore db1 latest \
  --target /postgresql/restored5 \
  --repo file:///postgresql/backup \
  --to "5 minutes ago"
```

Output:

```text
09:03:36 [INFO ] restore.manifest_loaded tenant=default deployment=db1 backup_id=db1.full.20260625T084842Z.2dd5 timeline=1 lsn=0/88000158
  body: {"file_count": 1782,"pg_version": 18,"tablespaces": 1}
09:03:36 [INFO ] restore.started deployment=db1 backup_id=db1.full.20260625T084842Z.2dd5
  body: {"resumed": false,"target": "/postgresql/restored5"}
09:14:19 [INFO ] restore.verifybackup_ok deployment=db1 backup_id=db1.full.20260625T084842Z.2dd5
  body: {"algorithm": "CRC32C","bytes_hashed": 1421273066,"files_checked": 1783}
09:14:20 [INFO ] restore.recovery_armed deployment=db1 backup_id=db1.full.20260625T084842Z.2dd5
  body: {"action": "pause","inclusive": true,"target_lsn": "","target_name": "","target_time": "2026-06-25T08:58:36Z","timeline": "latest"}
```

Alternatively, restore to a specific LSN with `--to-lsn 0/3000028`, or to a named restore point with `--to-name pre_release`.

Restore writes `recovery.signal` and a managed block in `postgresql.auto.conf` whose `restore_command` invokes `pg_hardstorage wal fetch <deployment> %f %p --repo ...`. Start PG and recovery proceeds.

`pg_verifybackup` checks the data directory before restore declares success. Skip the check with `--verify=skip` only if you know exactly what you are doing; a failing exit code means the verifier rejected the restored data directory.

### 2.7 Inspect backup status

```console
[postgres@pg117 ~]$ pg_hardstorage status db1
DEPLOYMENT  BACKUPS  HEALTH  LATEST                          AGE  STOP LSN    TLI  STREAMS
db1         5        ✓       db1.full.20260626T071735Z.b2be  3d   0/A2000158  1    -

Audit anchor:      ⚠ none (6 event(s) un-anchored — run pg_hardstorage audit anchor)
Pending approvals: 0
```

```console
[postgres@pg117 ~]$ pg_hardstorage list db1
Backups for db1 (5):
BACKUP ID                       TYPE  WHEN              FILES  SIZE     DEDUP  DURATION
db1.full.20260626T071735Z.b2be  full  2026-06-26 07:17  1787   1.3 GiB  1.01x  10805ms
db1.full.20260625T094450Z.9826  full  2026-06-25 09:45  1785   1.3 GiB  1.01x  18790ms
db1.full.20260625T092448Z.25f6  full  2026-06-25 09:25  1782   1.3 GiB  1.01x  13424ms
db1.full.20260625T084842Z.2dd5  full  2026-06-25 08:48  1782   1.3 GiB  1.01x  15683ms
db1.full.20260625T083040Z.68ac  full  2026-06-25 08:31  1782   1.3 GiB  1.01x  27398ms
```

```console
[postgres@pg117 ~]$ pg_hardstorage show db1 db1.full.20260625T084842Z.2dd5
Backup         db1.full.20260625T084842Z.2dd5
Deployment     db1
Type           full
PostgreSQL     18
Cluster ID     7639282777750685031
Start LSN      0/88000028
Stop LSN       0/88000158
Timeline       1
Started        2026-06-25 08:48:42 UTC
Stopped        2026-06-25 08:48:57 UTC
Duration       15683 ms
Compression    zstd
Tablespaces    1
  oid=0
Files          1782
Logical bytes  1.3 GiB
Unique chunks  20698 (1.3 GiB)
Dedup ratio    1.01x
backup_label   232 bytes
Signature      ed25519 / fingerprint 25aeb62195860b7b
```

`status` reports the RPO (current time minus the completion time of the most recent backup), WAL lag, and the next scheduled run. `show` dumps the full manifest: LSN range, timeline, dedup ratio, encryption envelope details, and any verification records already written.

### 2.8 Retention

Default policy:

```yaml
deployments:
  db1:
    retention:
      policy: gfs
      keep_daily: 7
      keep_weekly: 4
      keep_monthly: 12
      keep_yearly: 5
```

```console
[postgres@pg117 ~]$ pg_hardstorage rotate db1
Rotation plan (dry-run — no manifests soft-deleted)
Policy: gfs

db1
  keep:   2
  delete: 3
  [keep] db1.full.20260626T071735Z.b2be  2026-06-26T07:17:46Z  (daily-1,weekly-1,monthly-1,yearly-1)
  [keep] db1.full.20260625T094450Z.9826  2026-06-25T09:45:08Z  (daily-2)
  [del ] db1.full.20260625T092448Z.25f6  2026-06-25T09:25:02Z
  [del ] db1.full.20260625T084842Z.2dd5  2026-06-25T08:48:57Z
  [del ] db1.full.20260625T083040Z.68ac  2026-06-25T08:31:07Z
```

---

## 3. Verify the installation

`doctor` is the single-command "is anything wrong?" check:

```console
$ pg_hardstorage doctor
Mode: user

PATHS
  Config       /home/postgres/.config/pg_hardstorage                    [xdg]      exists  drwx------
  drop-in      /home/postgres/.config/pg_hardstorage/conf.d             [derived]  missing
  deployments  /home/postgres/.config/pg_hardstorage/deployments        [derived]  missing
  sinks        /home/postgres/.config/pg_hardstorage/sinks              [derived]  missing
  skills       /home/postgres/.config/pg_hardstorage/skills             [derived]  missing
  keyring      /home/postgres/.config/pg_hardstorage/keyring            [derived]  exists  drwx------
  State        /home/postgres/.local/share/pg_hardstorage               [xdg]      missing
  bookkeeping  /home/postgres/.local/share/pg_hardstorage/bookkeeping   [derived]  missing
  inflight     /home/postgres/.local/share/pg_hardstorage/inflight      [derived]  missing
  crashes      /home/postgres/.local/share/pg_hardstorage/crashes       [derived]  missing
  Cache        /home/postgres/.cache/pg_hardstorage                     [xdg]      missing
  Logs         /home/postgres/.local/state/pg_hardstorage               [xdg]      missing
  Runtime      /run/user/1001/pg_hardstorage                            [xdg]      missing
  Shared data  /home/postgres/.local/share/pg_hardstorage/share         [xdg]      missing

CONFIG
  Status:      configured
  Schema:      pg_hardstorage.config.v1
  Deployments: 1 (db1)
    db1  class: internal
  Sinks:       0
  Source files:
    [loaded ] /home/postgres/.config/pg_hardstorage/pg_hardstorage.yaml

KEYSTORE
  Dir:         /home/postgres/.config/pg_hardstorage/keyring
  Signing key: ✓ present
  KEK:         ✓ present (encryption ON by default for new backups)

AIRGAP
  Mode: off

REPOS
  file:///postgresql/backup — reachable
    audit chain: 6 event(s)
    anchor:      ✗ NONE (chain has events, no transparency anchor)

ISSUES
  [WARNING] audit.anchor_missing: doctor: 6 audit event(s) at file:///postgresql/backup but no transparency-log anchor; run pg_hardstorage audit anchor
    hint: run pg_hardstorage audit anchor --repo <url> (or wire periodic anchoring via the agent's audit_anchor schedule)
```

Any line marked ✗ is a finding with a suggested fix. Run `pg_hardstorage doctor -o json` for machine-readable output. `doctor` exits with code 0 when healthy; with `--exit-on-issues` it exits with code 10 when it finds issues. Wire that into your alerting system if you want a hard failure signal.
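
The exit codes described for `doctor` can be mapped onto an alerting hook with a few lines of shell. The following is a minimal sketch and not part of pg_hardstorage: `check_doctor` is a hypothetical helper that runs whatever command it is given, prints `ok` for exit code 0, `issues` for exit code 10, and `error:<code>` for anything else, and then returns the same code. It assumes that `pg_hardstorage doctor --exit-on-issues` behaves as described in this guide.

```shell
#!/bin/sh
# check_doctor (hypothetical helper): run a health-check command and
# translate its exit code into a short status word on stdout.
# Usage: check_doctor <command> [args...]
check_doctor() {
    # Discard the command's own output; only the exit code matters here.
    "$@" >/dev/null 2>&1
    rc=$?
    case "$rc" in
        0)  echo "ok" ;;          # healthy
        10) echo "issues" ;;      # doctor found issues (with --exit-on-issues)
        *)  echo "error:$rc" ;;   # anything else: the check itself failed
    esac
    # Propagate the original exit code so callers can branch on it.
    return "$rc"
}

# Intended call, assuming pg_hardstorage is on PATH (left commented out so
# the sketch can be sourced on a machine without the binary):
#   check_doctor pg_hardstorage doctor --exit-on-issues
```

From cron or a systemd timer, the function's return code can drive whatever notification command your environment provides, for example `check_doctor pg_hardstorage doctor --exit-on-issues || <your alert command>`. Keeping the status word on stdout and the raw code in the return value lets the same helper feed both log lines and conditional logic.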