FAQ section of README.md updated #172
Conversation
Thanks for the PR! Some notes and a better description of the "synchronous" handling status.
> ### Does stolon use Consul as a DNS server as well?
>
> Consul (or etcd) is used only as a key-value storage.
I don't completely get the meaning of this question. Do you mean registering a service in consul? If so which service (the proxies?)? I don't see why stolon should do this.
Consul has a concept of services (registered through the API, e.g. `curl -XPUT -d @req.json http://10.0.3.223:8500/v1/agent/service/register`). These services are available not only via the REST API (e.g. `curl -s http://10.0.3.224:8500/v1/catalog/service/postgresql-replica`) but also via DNS, which is built into Consul:
```
$ dig @127.0.0.1 -p 8600 postgresql-replica.service.consul

;; QUESTION SECTION:
;postgresql-replica.service.consul. IN A

;; ANSWER SECTION:
postgresql-replica.service.consul. 0 IN A 10.0.3.223
postgresql-replica.service.consul. 0 IN A 10.0.3.224
```
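(For reference, the `req.json` passed to the register endpoint above is just a service definition. A minimal sketch, with hypothetical values matching the records above, could look like this; see the Consul agent service API docs for the full set of fields.)

```sh
# Minimal Consul service definition (hypothetical name/port for this example).
cat > req.json <<'EOF'
{
  "Name": "postgresql-replica",
  "Port": 5432
}
EOF
# Register the service with the local agent.
curl -XPUT -d @req.json http://10.0.3.223:8500/v1/agent/service/register
```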
You can also request an SRV record; in this case you will also receive port numbers:
```
$ dig srv @127.0.0.1 -p 8600 postgresql-replica.service.consul

;; QUESTION SECTION:
;postgresql-replica.service.consul. IN SRV

;; ANSWER SECTION:
postgresql-replica.service.consul. 0 IN SRV 1 1 5432 postgresql-slave-2.node.dc1.consul.
postgresql-replica.service.consul. 0 IN SRV 1 1 5432 postgresql-slave.node.dc1.consul.

;; ADDITIONAL SECTION:
postgresql-slave-2.node.dc1.consul. 0 IN A 10.0.3.224
postgresql-slave.node.dc1.consul. 0 IN A 10.0.3.223
```
This is very convenient for applications that are not aware of Consul. All you need is a DNS resolver without any caching (Consul's TTL is 0) and domain names like `current-postgresql-master.service.consul` and `current-standby-3.service.consul`. No proxy is required. Naturally, when you promote a standby you'd better close all clients' connections so clients become aware that something changed (e.g. `SELECT pg_is_in_recovery();`) - see http://stackoverflow.com/a/5408501/1565238
So basically the question is whether a client can determine where the current master and standbys are using the DNS protocol.
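For illustration, such a client-side check could look roughly like this (a sketch; the DNS name is the hypothetical one mentioned above):

```sh
# pg_is_in_recovery() returns "t" on a standby and "f" on the master,
# so a client can verify it is still talking to the node it expects.
psql -h current-postgresql-master.service.consul -U postgres -tAc \
  "SELECT pg_is_in_recovery();"
```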
> > Specifies a comma-separated list of standby names that can support synchronous replication, as described in Section 25.2.8. At any one time there will be at most one active synchronous standby; transactions waiting for commit will be allowed to proceed after this standby server confirms receipt of their data. The synchronous standby will be the first standby named in this list that is both currently connected and streaming data in real-time (as shown by a state of streaming in the `pg_stat_replication` view). Other standby servers appearing later in this list represent potential synchronous standbys.
>
> It means that in case of a netsplit the synchronous standby may not be among the majority nodes. In this case some recent changes will be lost. Although this is not a major problem for most web projects, currently you shouldn't use stolon for storing data that must not be lost under any circumstances.
I won't add the concept of quorum here since it creates more confusion. Also the postgres doc doesn't talk about "quorum".
In addition, the real problem here is not a netsplit (that's just one of the possible causes) but the fact that we let postgres choose the active synchronous standby, so the sentinel cannot know which standby was the active synchronous one when the master was declared dead. So the only way the sentinel has is to find the "best" standby based on the last known xlog position. But if both the master and the active synchronous standby go down at the same time, another standby will be chosen and it may not be in full sync.
I opened #173 with a description of and a solution to this. It will also work with postgresql <= 9.5, but with the limitation of setting only one sync standby. Thoughts?
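For context, comparing standbys by xlog position can be sketched like this (illustration only, using the 9.x function names; this is not necessarily how the sentinel does it internally):

```sh
# On each candidate standby, read the last xlog location replayed so far.
psql -h standby1 -U postgres -tAc "SELECT pg_last_xlog_replay_location();"
psql -h standby2 -U postgres -tAc "SELECT pg_last_xlog_replay_location();"
# The returned locations (e.g. 0/3000060) can be compared with
# pg_xlog_location_diff() to pick the most advanced standby.
```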
And I've realized that the answer here is not entirely true anyway. If the cluster size is 3, then the master + one synchronous replica make a quorum, so in this case data can't be lost. I'll rewrite this.
#173 looks good to me. Determining which version of PostgreSQL is running is simple, and knowing that, we know what to write to postgresql.conf if the user would like real consistency.
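For example, something along these lines (a sketch; the standby names are hypothetical):

```sh
# Detect the running PostgreSQL version.
psql -U postgres -tAc "SHOW server_version_num;"   # e.g. 90504 or 90600

# Up to 9.5, synchronous_standby_names is a plain list (one active sync
# standby at a time); 9.6 adds the "N (name, ...)" multi-sync syntax.
# postgresql.conf:
#   synchronous_standby_names = 'keeper1'               # 9.5 and earlier
#   synchronous_standby_names = '2 (keeper1, keeper2)'  # 9.6+
```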
> Currently the proxy redirects all requests to the master. There is a [feature request](https://github.com/sorintlab/stolon/issues/132) for using the proxy also for standbys but it's low in the priority list. There is a workaround though.
>
> The application can learn the cluster configuration from the `stolon/cluster/mycluster/clusterdata` key. Consul allows subscribing to updates of this key like this:
I'll add an example (even if this is going to change with #160) of what to do with that data (i.e. get the clusterview.keeperole infos).
The real problem with this is that, without more logic, there's no assurance whether the standbys are in sync, dead, or something else.
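For example, a Consul blocking query on that key could look roughly like this (a sketch; local agent address assumed, key path taken from the quote above):

```sh
# The first read returns the current value plus an X-Consul-Index header.
curl -si http://127.0.0.1:8500/v1/kv/stolon/cluster/mycluster/clusterdata | grep -i x-consul-index

# Long-poll: this request blocks (up to 5 minutes here) until the key
# changes past the index obtained above (placeholder shown).
curl -s "http://127.0.0.1:8500/v1/kv/stolon/cluster/mycluster/clusterdata?index=<X-Consul-Index>&wait=5m"
```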
> The application can learn the cluster configuration from the `stolon/cluster/mycluster/clusterdata` key. Consul allows subscribing to updates of this key like this:
This is also available with etcd. Not sure if this should be detailed; we could just say that one can use the watch features of the store.
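For comparison, the etcd (v2 API) equivalent would be something along these lines (a sketch; the exact key prefix used by stolon on etcd may differ):

```sh
# etcd v2 API: a GET with ?wait=true blocks until the key changes.
curl -s "http://127.0.0.1:2379/v2/keys/stolon/cluster/mycluster/clusterdata?wait=true"

# Or with etcdctl (v2):
etcdctl watch /stolon/cluster/mycluster/clusterdata
```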
OK, I'll fix this.
> ### Lets say I have multiple stolon clusters. Do I need a separate Consul / etcd cluster for each stolon cluster?
>
> It depends on your architecture and where the different stolon clusters are located. In general, if two clusters live on complitely different hardware, to to handle all possible courner cases (like netslits) you need a separate Consul / etcd cluster for each stolon cluster.
- complitely -> completely
- courner -> corner
- netslits -> netsplits
After #219 I'll open a PR to update these FAQs and change them to reflect the newly implemented features.
Merge and update sorintlab#164 and sorintlab#172.
Reworked in #224.
Based on discussion: #168 (comment)