環(huán)境:
PostgreSQL 9.2.4
主機(jī):192.25.10.76
從機(jī):192.25.10.71
做postgresql的流復(fù)制主從時(shí),會(huì)遇到調(diào)整max_wal_sengers這個(gè)參數(shù),官方文檔對(duì)這個(gè)參數(shù)做了一個(gè)簡(jiǎn)要的說(shuō)明(9.2.4比早先版本多了幾句話并做了一些微調(diào)),但沒(méi)有實(shí)際的例子。
1.參數(shù)說(shuō)明:
Specifies the maximum number of concurrent connections from standby servers or streaming base backup clients (i.e., the maximum number of simultaneously running WAL sender processes). The default is zero, meaning replication is disabled. WAL sender processes count towards the total number of connections, so the parameter cannot be set higher than max_connections. This parameter can only be set at server start. wal_level must be set to archive or hot_standby to allow connections from standby servers.
也就是說(shuō),這個(gè)參數(shù)是在主機(jī)上設(shè)置的,是從機(jī)連接到主機(jī)的并發(fā)連接數(shù)之總和,所以這個(gè)參數(shù)是個(gè)正整型。默認(rèn)值是0,也即默認(rèn)沒(méi)有流復(fù)制功能。該并發(fā)連接數(shù)從進(jìn)程上看,就是各個(gè)wal sender進(jìn)程數(shù)之和,可以通過(guò)ps -ef|grep senders來(lái)查看,所以該值不能超過(guò)系統(tǒng)的最大連接數(shù)(max_connections,該BUG在9.1.5被修復(fù)),可以允許超過(guò)實(shí)際的流復(fù)制用戶數(shù)。該參數(shù)更改需要重啟DB,比如我只配了一個(gè)從機(jī):
[postgres@ndb2 database]$ ps -ef|grep sender
postgres 21257 21247 0 20:57 ? 00:00:00 postgres: wal sender process repuser 192.25.10.71(46161) streaming 0/4018ED8
postgres 22193 20949 0 23:02 pts/0 00:00:00 grep sender
2.異常情況
很多時(shí)候配置主從的時(shí)候會(huì)遺漏這個(gè)參數(shù),或者沒(méi)有設(shè)置正確(比如實(shí)際配的從機(jī)數(shù)超過(guò)設(shè)置的連接數(shù)),這個(gè)時(shí)候一般會(huì)報(bào)錯(cuò) number of requested standby connections exceeds max_wal_senders (currently X): 備機(jī)上顯示的日志異常:
2013-08-12 20:53:42.132 CST,,,8859,,5208dad6.229b,1,,2013-08-12 20:53:42 CST,,0,FATAL,XX000,"could not connect to the primary server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 0)
",,,,,,,,,""
2013-08-12 20:53:47.137 CST,,,8861,,5208dadb.229d,1,,2013-08-12 20:53:47 CST,,0,FATAL,XX000,"could not connect to the primary server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 0)
",,,,,,,,,""
2013-08-12 20:53:52.142 CST,,,8862,,5208dae0.229e,1,,2013-08-12 20:53:52 CST,,0,FATAL,XX000,"could not connect to the primary server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 0)
",,,,,,,,,""
2013-08-12 20:53:57.148 CST,,,8864,,5208dae5.22a0,1,,2013-08-12 20:53:57 CST,,0,FATAL,XX000,"could not connect to the primary server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 0)
",,,,,,,,,"
主機(jī)上顯示的日志異常:
receiver"
2013-08-12 20:43:26.937 CST,,,21064,"",5208d86e.5248,1,"",2013-08-12 20:43:26 CST,,0,LOG,00000,"connection received: host=192.25.10.71 port=46085",,,,,,,,,""
2013-08-12 20:43:26.938 CST,"repuser","",21064,"192.25.10.71:46085",5208d86e.5248,2,"authentication",2013-08-12 20:43:26 CST,2/1195,0,LOG,00000,"replication connection authorized: user=repuser",,,,,,,,,""
2013-08-12 20:43:26.938 CST,"repuser","",21064,"192.25.10.71:46085",5208d86e.5248,3,"startup",2013-08-12 20:43:26 CST,2/0,0,FATAL,53300,"number of requested standby connections exceeds max_wal_senders (currently 0)",,,,,,,,,"walreceiver"
2013-08-12 20:43:26.939 CST,"repuser","",21064,"192.25.10.71:46085",5208d86e.5248,4,"startup",2013-08-12 20:43:26 CST,,0,LOG,00000,"disconnection: session time: 0:00:00.002 user=repuser database= host=192.25.10.71 port=46085",,,,,,,,,"wa
lreceiver"
2013-08-12 20:43:41.513 CST,,,21066,"",5208d87d.524a,1,"",2013-08-12 20:43:41 CST,,0,LOG,00000,"connection received: host=192.25.10.71 port=46086",,,,,,,,,""
2013-08-12 20:43:41.514 CST,"repuser","",21066,"192.25.10.71:46086",5208d87d.524a,2,"authentication",2013-08-12 20:43:41 CST,2/1198,0,LOG,00000,"replication connection authorized: user=repuser",,,,,,,,,""
2013-08-12 20:43:41.514 CST,"repuser","",21066,"192.25.10.71:46086",5208d87d.524a,3,"startup",2013-08-12 20:43:41 CST,2/0,0,FATAL,53300,"number of requested standby connections exceeds max_wal_senders (currently 0)",,,,,,,,,"walreceiver"
2013-08-12 20:43:41.515 CST,"repuser","",21066,"192.25.10.71:46086",5208d87d.524a,4,"startup",2013-08-12 20:43:41 CST,,0,LOG,00000,"disconnection: session time: 0:00:00.002 user=repuser database= host=192.25.10.71 port=46086",,,,,,,,,"wa
lreceiver"
...
遇到如此問(wèn)題,則需要檢查postgresql.conf文件的max _wal_senders參數(shù)了。
3.參數(shù)生效
文檔上說(shuō)明需要重啟DB使之生效,如果動(dòng)態(tài)使之生效會(huì)報(bào)錯(cuò)如下:
[postgres@proxy1 ]$ psql
psql (9.2.4)
Type "help" for help.
postgres=# show max_wal_senders ;
max_wal_senders
-----------------
0
(1 row)
postgres=# set max_wal_senders=1;
ERROR: parameter "max_wal_senders" cannot be changed without restarting the server
postgres=#
補(bǔ)充: postgresql基于流復(fù)制 (streaming replication)的warm-standby
實(shí)例一枚:
Primary:
l 歸檔設(shè)置:
Wal_level=archive
Archive_mode=on
archive_command = 'cp -i %p /data/pgsql/archived_wal/%f'
l 流復(fù)制相關(guān)設(shè)置:
max_wal_senders = '10' #啟動(dòng)復(fù)制進(jìn)程數(shù)量限制,必須大于0
max_replication_slots = '10' #為使用replication slot,必須大于0;replication slot作用是保證wal沒(méi)有同步到standby之前不能從pg_xlog移走;
wal_keep_segments = '50' #指定pg_xlog中最少保留的wal數(shù)量
select pg_create_physical_replication_slot(‘gp1_a_slot'); #創(chuàng)建replication slot
select * from pg_replication_slots; #查詢創(chuàng)建的replication slot
l 編輯pg_hba.conf
# Allow replication connections from localhost, by a user with the replication privilege.
#host replication postgres 127.0.0.1/32 trust
#host replication postgres ::1/128 trust
local replication postgres trust
host replication postgres 192.168.12.0/24 trust
l 聯(lián)機(jī)備份過(guò)程(基礎(chǔ)備份)
#touch /var/lib/pgsql/backup_in_progress
$psql –c "select pg_start_backup('hot_backup');"
$tar -cf /var/lib/pgsql/backup.tar /var/lib/pgsql/data/
$psql -c "select pg_stop_backup();"
#rm /var/lib/pgsql/backup_in_progress
tar -rf /var/lib/pgsql/backup.tar /var/lib/pgsql/archive/ #打包歸檔
實(shí)例:
psql -c "select pg_start_backup('pgbk10');"
tar -zcf pgbk10.tgz data/
psql -c "select pg_stop_backup();"
Standby:
l 編輯recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=192.168.12.38 port=5666 user=postgres'
primary_slot_name='gp1_a_slot'
#restore_command = 'cp /data/pgsql/archived_wal/%f %p'
#archive_cleanup_command = 'pg_archivecleanup /data/pgsql/archived_wal %r'
l 將primary上的基礎(chǔ)備份傳輸?shù)絪tandby上
$scp primary: /var/lib/pgsql/backup.tar .
解壓備份到standby上的$PGDATA
l 啟動(dòng)standby
啟動(dòng)standby后,postgres開(kāi)始從primary上接收wal日志進(jìn)行恢復(fù),并且一直保持恢復(fù)狀態(tài),psql不能登錄;
以上為個(gè)人經(jīng)驗(yàn),希望能給大家一個(gè)參考,也希望大家多多支持腳本之家。如有錯(cuò)誤或未考慮完全的地方,望不吝賜教。
您可能感興趣的文章:- PostgreSQL 邏輯復(fù)制 配置操作
- postgresql流復(fù)制原理以及流復(fù)制和邏輯復(fù)制的區(qū)別說(shuō)明
- PostgreSQL 流復(fù)制異步轉(zhuǎn)同步的操作
- Postgresql 檢查數(shù)據(jù)庫(kù)主從復(fù)制進(jìn)度的操作
- CentOS PostgreSQL 12 主從復(fù)制(主從切換)操作