
Sync replication and replication slot problem #41

Open
pbrugier opened this issue Oct 6, 2016 · 5 comments

Comments

@pbrugier

pbrugier commented Oct 6, 2016

Hello,

For one of our customers we have to use synchronous replication with synchronous_standby_names = '*' in postgresql.conf, combined with replication slots in the database.

There is a problem when a standby leaves the cluster: its replication slot stays in the database and PostgreSQL keeps its WAL, due to synchronous_standby_names in the config, until the standby comes back. If the standby is absent for a long time, the WAL directory will grow until the partition is full...
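To see the symptom described above, you can inspect the slots on the master. A minimal sketch, assuming a local superuser psql connection; the slot name 'standby2_slot' is a placeholder, not from this issue:

```shell
# List replication slots; an inactive slot with an old restart_lsn
# is the one pinning WAL on disk:
psql -c "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"

# Manual workaround while the standby is gone: drop the stale slot
# so PostgreSQL can recycle the retained WAL segments.
psql -c "SELECT pg_drop_replication_slot('standby2_slot');"
```
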

So, do you think it would be possible to manage replication slots with the PAF RA?

Regards,
Pascal.

@ioguix
Member

ioguix commented Oct 7, 2016

Hello @pbrugier ,

This is something we have in mind, yes. We are not sure yet how we will implement it.

Do not hesitate to share here if you already have some ideas about how this feature should be configured and managed, and how it should work.

On my side, I was thinking about another RA dedicated to replication slot management.

Regards,

@pbrugier
Author

Hello @ioguix,

I think the new feature should be in the same RA to avoid race conditions between the two RAs; I'll explain why.

For the same customer, whose PostgreSQL cluster runs in the Azure cloud, it was not possible to use a VIP on the nodes because of Azure. So we had to set up NAT on the PostgreSQL master node to let the Azure load balancer detect the master and redirect the standbys' replication connections through the Azure load balancer. Strange, I know, but it was the only solution. To set up the NAT I defined a second RA, but sometimes master and standbys were demoted/promoted too fast, replication connections stayed in place, and some standbys were still connected to the old master. To avoid this problem we had to patch your RA and insert the NAT redirection (and conntrack cleaning) into it. Not great, but it was for a really special case.
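The NAT-plus-conntrack-cleanup idea above could look roughly like this. A hedged sketch only: the port, the master address (10.0.0.10), and the rules are placeholders, not the actual patch applied to the RA:

```shell
# On promotion: DNAT incoming replication traffic to the new master.
iptables -t nat -A PREROUTING -p tcp --dport 5432 \
  -j DNAT --to-destination 10.0.0.10:5432

# On demotion: flush the tracked connections for that port, otherwise
# established replication sessions keep flowing to the old master
# even after the NAT rule changes.
conntrack -D -p tcp --dport 5432
```

The conntrack flush is the key step: NAT rules only affect new connections, so without it a standby can stay attached to the demoted node.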

To come back to the new feature: if replication slots are managed by a separate RA, the PAF RA should be aware of this kind of race condition.

Hope this can help you a little.

Regards

@vuntz

vuntz commented Apr 4, 2017

Would it make sense to use the pre-start/post-stop notifications for slaves on the master to automatically create/drop replication slots? Of course it requires that max_replication_slots is configured correctly... But it feels like a fair requirement.
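The notification-driven approach suggested above might boil down to the master running something like the following. A sketch under assumptions: deriving the slot name from the standby's node name ('node2' here) is an illustration, not an agreed design:

```shell
# On a standby's pre-start notification, create its physical slot
# (idempotency/error handling omitted for brevity):
psql -c "SELECT pg_create_physical_replication_slot('node2_slot');"

# On its post-stop notification, drop the slot so WAL is not
# retained for a node that has left the cluster:
psql -c "SELECT pg_drop_replication_slot('node2_slot');"
```

Dropping on post-stop is exactly what would prevent the disk-full scenario from the original report, at the cost of the standby needing a fresh base backup if it is away too long.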

@sousaaguilherme

Hi @pbrugier,

Can you please share your configuration?

I'm facing the same problem, since Azure still doesn't have a VIP implementation... I've tried to tinker a bit with the Azure load balancer, blocking the probe on the slave with iptables so it would only point to the master (and, in case of failover, blocking the probe on the master and allowing it on the slave), but without any luck, since I'm a huge rookie with Pacemaker...
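For reference, the probe-blocking idea could be sketched as below. Assumptions, not a tested Pacemaker setup: the probe port (61000) is a placeholder, and 168.63.129.16 is Azure's documented health-probe source address:

```shell
# On a standby: reject the LB health probe so the balancer marks
# the node as down and only routes traffic to the master.
iptables -A INPUT -p tcp -s 168.63.129.16 --dport 61000 -j REJECT

# On promotion: delete the rule so the new master answers the probe.
iptables -D INPUT -p tcp -s 168.63.129.16 --dport 61000 -j REJECT
```

In a Pacemaker context these two commands would need to be wrapped in a resource agent ordered with the promote/demote actions, which is the part that is easy to get wrong.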

If anyone knows how to implement this in Azure (with or without the LB playing as VIP) please share 😀

Best regards

@YanChii
Contributor

YanChii commented Apr 22, 2018

Hi @sousaaguilherme,

as I wrote in another issue, you might be able to use the cluster without a VIP thanks to the new PostgreSQL 10 connection failover:
https://wiki.postgresql.org/wiki/New_in_postgres_10#Connection_Failover_and_Routing_in_libpq
The VIP itself is still a better solution but if you are unable to use it, the libpq failover is IMHO the second best option.
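The libpq failover mentioned above works via multi-host connection strings. A minimal example with placeholder host names; target_session_attrs=read-write makes the client skip standbys and settle on the writable node:

```shell
# The client tries node1, node2, node3 in order and keeps only
# a connection where writes are accepted (i.e. the master):
psql "host=node1,node2,node3 port=5432 dbname=mydb target_session_attrs=read-write"
```

This shifts failover handling to every client, which is why a VIP is still the cleaner option when the platform allows it.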

Jan
