
Sync replication and replication slot problem #41

Open
pbrugier opened this issue Oct 6, 2016 · 5 comments

Comments

@pbrugier

pbrugier commented Oct 6, 2016

Hello,

For one of our customers we have to use synchronous replication with synchronous_standby_names = '*' in postgresql.conf, combined with replication slots in the database.

There is a problem when a standby leaves the cluster: its replication slot stays in the database and PostgreSQL keeps its WAL, due to synchronous_standby_names in the config, until the standby comes back. If the standby is absent for a long time, the WAL directory will grow until the partition is full...
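To see the symptom described above, you can inspect the slots on the master. A minimal sketch, assuming a local superuser psql connection; the slot name 'standby2_slot' is a placeholder, not from this issue:

```shell
# List replication slots; an inactive slot with an old restart_lsn
# is the one pinning WAL on disk:
psql -c "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"

# Manual workaround while the standby is gone: drop the stale slot
# so PostgreSQL can recycle the retained WAL segments.
psql -c "SELECT pg_drop_replication_slot('standby2_slot');"
```
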

So, do you think it would be possible to manage replication slots with the PAF RA?

Regards,
Pascal.

@ioguix
Member

ioguix commented Oct 7, 2016

Hello @pbrugier ,

This is something we have in mind, yes. We are not sure yet how we will implement it.

Do not hesitate to share here if you already have some ideas about how this feature should be configured and managed, and how it should work.

On my side, I was thinking about another RA dedicated to replication slot management.

Regards,

@pbrugier
Author

Hello @ioguix,

I think the new feature should be in the same RA to avoid race conditions between the two RAs; I'll explain why.

For the same customer, whose PostgreSQL cluster runs in the Azure cloud, it was not possible to use a VIP on the nodes because of Azure. So we had to set up NAT on the PostgreSQL master node to let the Azure load balancer detect the master and redirect the standbys' replication connections through the Azure load balancer. Strange, I know, but it was the only solution. To set up the NAT I defined a second RA, but sometimes master and standbys were demoted/promoted too fast, replication connections stayed in place, and some standbys were still connected to the old master. To avoid this problem we had to patch your RA and insert the NAT redirection (and conntrack cleaning) into it. Not great, but it was for a really special case.
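The NAT-plus-conntrack-cleanup idea above could look roughly like this. A hedged sketch only: the port, the master address (10.0.0.10), and the rules are placeholders, not the actual patch applied to the RA:

```shell
# On promotion: DNAT incoming replication traffic to the new master.
iptables -t nat -A PREROUTING -p tcp --dport 5432 \
  -j DNAT --to-destination 10.0.0.10:5432

# On demotion: flush the tracked connections for that port, otherwise
# established replication sessions keep flowing to the old master
# even after the NAT rule changes.
conntrack -D -p tcp --dport 5432
```

The conntrack flush is the key step: NAT rules only affect new connections, so without it a standby can stay attached to the demoted node.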

To come back to the new feature: if replication slots are managed by a separate RA, the PAF RA should be aware of this kind of race condition.

Hope this can help you a little.

Regards

@vuntz

vuntz commented Apr 4, 2017

Would it make sense to use the pre-start/post-stop notifications for slaves on the master to automatically create/drop replication slots? Of course it requires that max_replication_slots is configured correctly... But it feels like a fair requirement.
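The notification-driven approach suggested above might boil down to the master running something like the following. A sketch under assumptions: deriving the slot name from the standby's node name ('node2' here) is an illustration, not an agreed design:

```shell
# On a standby's pre-start notification, create its physical slot
# (idempotency/error handling omitted for brevity):
psql -c "SELECT pg_create_physical_replication_slot('node2_slot');"

# On its post-stop notification, drop the slot so WAL is not
# retained for a node that has left the cluster:
psql -c "SELECT pg_drop_replication_slot('node2_slot');"
```

Dropping on post-stop is exactly what would prevent the disk-full scenario from the original report, at the cost of the standby needing a fresh base backup if it is away too long.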

@sousaaguilherme

Hi @pbrugier,

Can you please share your configuration?

I'm facing the same problem, since Azure still doesn't have a VIP implementation... I've tried to tinker a bit with the Azure load balancer, blocking the probe on the slave with iptables so it would only point to the master (and, in case of failover, blocking the probe on the master and allowing it on the slave), but without any luck, since I'm a huge rookie with Pacemaker...
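For reference, the probe-blocking idea could be sketched as below. Assumptions, not a tested Pacemaker setup: the probe port (61000) is a placeholder, and 168.63.129.16 is Azure's documented health-probe source address:

```shell
# On a standby: reject the LB health probe so the balancer marks
# the node as down and only routes traffic to the master.
iptables -A INPUT -p tcp -s 168.63.129.16 --dport 61000 -j REJECT

# On promotion: delete the rule so the new master answers the probe.
iptables -D INPUT -p tcp -s 168.63.129.16 --dport 61000 -j REJECT
```

In a Pacemaker context these two commands would need to be wrapped in a resource agent ordered with the promote/demote actions, which is the part that is easy to get wrong.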

If anyone knows how to implement this in Azure (with or without the LB playing as VIP) please share 😀

Best regards

@YanChii
Contributor

YanChii commented Apr 22, 2018

Hi @sousaaguilherme,

as I wrote in another issue, you might be able to use the cluster without a VIP thanks to the new PostgreSQL 10 connection failover:
https://wiki.postgresql.org/wiki/New_in_postgres_10#Connection_Failover_and_Routing_in_libpq
The VIP itself is still a better solution but if you are unable to use it, the libpq failover is IMHO the second best option.
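The libpq failover mentioned above works via multi-host connection strings. A minimal example with placeholder host names; target_session_attrs=read-write makes the client skip standbys and settle on the writable node:

```shell
# The client tries node1, node2, node3 in order and keeps only
# a connection where writes are accepted (i.e. the master):
psql "host=node1,node2,node3 port=5432 dbname=mydb target_session_attrs=read-write"
```

This shifts failover handling to every client, which is why a VIP is still the cleaner option when the platform allows it.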

Jan
