Postgres: Clean up stopped replication slot

When a slot is inactive, that means that Murty and his team have stopped the replication on their side. The SQL below will identify inactive slots. We should schedule this script in a job , running every hour.

(1)

SELECT

slot_name,

pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),restart_lsn)) AS size,

active,

slot_type

FROM pg_replication_slots;

(2)

Once you identify the replication slot, the job should drop it with the following command:

SELECT pg_drop_replication_slot ('pg_replication_slot_name_from_above');

(3)

Restart the database instance to clean up unwanted disk space.

Notes from Google:-

Thank you for contacting Google Cloud Support. My name is Jagadeesh and I will be assisting you with this issue.

I understand that WAL logs are expanding to over 1.5TB and you want to know what is causing WAL logs to grow and you want to reduce it. Please confirm if I have understood this correctly.

I have inspected your project “np-storeapps-1-d490” Cloud SQL PostgreSQL instance “pg-cusfit-cftest” observed that WAL log has consumed 1.44 TB of your disk and noticed Point-in-time recovery is disabled.

Upon further checking I noticed that there is one inactive logical replication slot for your instance “pg-cusfit-cftest”. Please note that a slot that is not actively used to track data causes PostgreSQL to hold onto WAL segments indefinitely, causing the disk space to grow indefinitely. Refer to the documentation [1] for more information and you can verify inactive slots by running the following query [2].

If the inactive slot is no longer needed you may consider dropping the slot by running the query [3] to remove WAL segments.

I hope the above provided information may resolve your issue. Additionally please let me know if you have any further questions, I will be happy to help you.

Please note that my shift hours are 5:00 AM to 2:00 PM IST (UTC+5:30). At any point if you prefer to keep this case in a particular timezone for your effective engagement, feel free to let us know and we can align the case in the respective time zone.

Looking forward to your response.

Best regards,

Jagadeesh,

Google Cloud Support, Bangalore.

Working Hours : [5:00 AM to 2:00 PM IST (UTC+5:30)]

[1] https://cloud.google.com/sql/docs/postgres/replication/configure-logical-replication#postgresql-resources

[2] SELECT

slot_name,

pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),restart_lsn)) AS size,

active,

slot_type

FROM pg_replication_slots;

[3] SELECT pg_drop_replication_slot('slot_name');

---

second response:

Hello Gillian,

Thank you for your response.

Please allow me to answer your questions in the following manner :

Q. Do you know the root cause? Why did the account stchadmin cause the WAL logs to grow uncontrollably? What can we do to avoid this going forward?

When using logical replication, PostgreSQL keeps track of what changes (WAL segments) have been sent to each subscriber and makes sure to not discard those changes until they have been sent to the subscriber. If replication stops for whatever reason, PostgreSQL will keep around all WAL segments going forward, leading to unending growth in WAL usage.
Sometimes, replication slots can leak. There are a lot of reasons this can happen and some of them include:

A. Interrupted actions (such as killing a copy)

B. Sudden instance death

C. Replica is deleted while master is doing some other operation

It is possible to determine the cause of the following symptoms:

Sudden large disk usage
a large growing write ahead log

As informed you in the earlier response on 30th March we noticed that there was one inactive logical replication slot for the instance “pg-cusfit-cftest”. As mentioned in the documentation [1] A slot that's not actively used to track data causes PostgreSQL to hold onto WAL segments indefinitely, causing the disk space to grow indefinitely. Most tools built on logical decoding create and drop replication slots automatically. Unused replication slots can be detected by querying the pg_replication_slots [2] system view and filtering on the active column. Unused slots can be dropped to remove WAL segments using the pg_drop_replication_slot [3] command .

Going forward in this situation you can verify inactive slots by running the following query [4] either you may resume logical replication, or, if replication no longer needs to happen, you need to drop the replication slot, at which point PostgreSQL will realize it can discard all the old WAL segments.

I hope this information is helpful for you. Let me know if you have any further queries, I will be happy to help you.

Looking forward to your response.

Best regards,

Jagadeesh,

Google Cloud Support, Bangalore.

Working Hours : [5:00 AM to 2:00 PM IST (UTC+5:30)]

[1] https://cloud.google.com/sql/docs/postgres/replication/configure-logical-replication#postgresql-resources

[2] https://www.postgresql.org/docs/current/view-pg-replication-slots.html

[3] https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-REPLICATION

[4]

SELECT

slot_name,

pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),restart_lsn)) AS size,

active,

slot_type

FROM pg_replication_slots;

Search This Blog

Groovy DBA

Postgres: Clean up stopped replication slot

Comments

Post a Comment

Popular posts from this blog

MySQL: To show privileges/ grants for a user

DB2: To check table status in the event of SQLCODE=-668, SQLSTATE=57007

Oracle: To clean up WRI$_ADV_OBJECTS