# "[ERROR] WSREP: Will never receive state. Need to abort." occurs when trying to add a mariadb node to an existing Galera cluster.
# Issue
Adding a MariaDB node to an existing Galera cluster fails. The log on the joining node is as follows:
Dec  5 10:01:59 c mysqld: 2023-12-05 10:01:59 1 [Note] WSREP: IST receiver addr using tcp://c.uedasoft.com:4568
Dec  5 10:01:59 c mysqld: 2023-12-05 10:01:59 1 [Warning] WSREP: Failed to prepare for incremental state transfer: Failed to open IST listener at tcp://c.uedasoft.com:4568', asio error 'Failed to listen: bind: Cannot assign requested address: 99 (Cannot assign requested address)
Dec  5 10:01:59 c mysqld: #011 at /home/buildbot/buildbot/build/galerautils/src/gu_asio_stream_react.cpp:listen():850': 99 (Cannot assign requested address)
Dec  5 10:01:59 c mysqld: #011 at /home/buildbot/buildbot/build/galera/src/ist.cpp:prepare():331. IST will be unavailable.
Dec  5 10:01:59 c mysqld: 2023-12-05 10:01:59 0 [Note] WSREP: Member 1.0 (c) requested state transfer from '*any*'. Selected 0.0 (parsifal)(SYNCED) as donor.
Dec  5 10:01:59 c mysqld: 2023-12-05 10:01:59 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 84180)
Dec  5 10:01:59 c mysqld: 2023-12-05 10:01:59 1 [Note] WSREP: Requesting state transfer: success, donor: 0
Dec  5 10:01:59 c mysqld: 2023-12-05 10:01:59 1 [Note] WSREP: Resetting GCache seqno map due to different histories.
Dec  5 10:01:59 c mysqld: 2023-12-05 10:01:59 1 [Note] WSREP: GCache history reset: 6d4a0464-620e-11eb-9e95-32f29f45a3ef:0 -> 6d4a0464-620e-11eb-9e95-32f29f45a3ef:84180
Dec  5 10:01:59 c mysqld: 2023-12-05 10:01:59 0 [Warning] WSREP: 0.0 (parsifal): State transfer to 1.0 (c) failed: -42 (No message of desired type)
Dec  5 10:01:59 c mysqld: 2023-12-05 10:01:59 0 [ERROR] WSREP: /home/buildbot/buildbot/build/gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1207: Will never receive state. Need to abort.
# Log analysis
Here c is the hostname of the machine on which I'm trying to start the additional MariaDB node, and c.uedasoft.com is its FQDN (Fully Qualified Domain Name, i.e. hostname plus domain name). parsifal, named after the title role of Wagner's opera, is the hostname of a node already running in the Galera cluster; in this case it acts as the donor node for the state transfer.
The log above clearly says that:
- 2nd line: Failed to open IST listener at tcp://c.uedasoft.com:4568
- 3rd line: Cannot assign requested address.
- 4th line: IST will be unavailable.

So the node gives up on IST (Incremental State Transfer) and falls back to SST (State Snapshot Transfer). As a result:

- 10th line: The state transfer from parsifal (donor) to c (the joining MariaDB node) failed with error code -42 (No message of desired type).
- 11th line: Need to abort.
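Reading the log by eye works for eleven lines, but the same reading can be sketched in Python. This is only an illustration: the function names wsrep_problems and transfer_failure and both regular expressions are my own assumptions, written against the log format shown above, not part of MariaDB or Galera.

```python
import re

# Matches the MariaDB part of a syslog line:
# "<date> <time> <thread id> [Level] WSREP: <message>"
WSREP_LINE = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+ "
    r"\[(?P<level>Note|Warning|ERROR)\] WSREP: (?P<message>.*)"
)
# Matches the failure report, e.g. "... failed: -42 (No message of desired type)"
TRANSFER_FAILED = re.compile(
    r"State transfer to .* failed: (?P<code>-?\d+) \((?P<reason>[^)]*)\)"
)


def wsrep_problems(lines):
    """Return (level, message) for every WSREP warning or error in the lines."""
    problems = []
    for line in lines:
        match = WSREP_LINE.search(line)
        if match and match["level"] in ("Warning", "ERROR"):
            problems.append((match["level"], match["message"]))
    return problems


def transfer_failure(lines):
    """Return (code, reason) of the first failed state transfer, or None."""
    for line in lines:
        match = TRANSFER_FAILED.search(line)
        if match:
            return int(match["code"]), match["reason"]
    return None
```

Fed the lines of the log above, transfer_failure picks out the code and reason from the 10th line, and wsrep_problems keeps only the warnings and the final error.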
# port 4568
Interesting. At first glance the root cause seems to be that port 4568 is blocked or occupied. Node c is a Scaleway instance, and its "Security Group" accepts everything for both inbound and outbound traffic, for better or worse. ^^;; Just in case, I also checked iptables, which likewise accepts everything.
Next, to check whether port 4568 was occupied by another process, I ran lsof -i :4568, but the result was also negative: no process was occupying port 4568.
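As a cross-check for lsof, the same question can be asked by simply trying to bind the port. This is a minimal Python sketch, not something from my original troubleshooting: port_in_use is a name made up for illustration, it only probes IPv4, and a bind can also fail with EADDRINUSE for reasons other than a live listener (for example lingering TIME_WAIT sockets).

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding host:port fails because the address is already in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # any other error (e.g. permissions) is a different problem
        return False


if __name__ == "__main__":
    # 4568 is the IST port from the log above.
    print("port 4568 in use:", port_in_use(4568))
```

On node c, a result of False would match the empty lsof output.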
I read the log again and noticed that it says the joining MariaDB node fails to listen on c.uedasoft.com:4568. So I tried listening on c.uedasoft.com:4568 myself. The result:
ueda@c:/usr/local/mysql$ nc -l c.uedasoft.com 4568
nc: Cannot assign requested address
Bingo! I can't listen on c.uedasoft.com:4568. But why not? There is no firewall, and no other process is occupying the port. One more thing I tried:
ueda@c:/usr/local/mysql$ nc -l localhost 4568
With localhost it works. I didn't know why, but I changed the wsrep_node_address setting from "c.uedasoft.com" to "localhost". The result changed slightly, as follows:
Dec  5 11:34:00 c mysqld: 2023-12-05 11:34:00 1 [Note] WSREP: IST receiver addr using tcp://localhost:4568
Dec  5 11:34:00 c mysqld: 2023-12-05 11:34:00 1 [Note] WSREP: Prepared IST receiver for 0-84184, listening at: tcp://127.0.0.1:4568
Dec  5 11:34:00 c mysqld: 2023-12-05 11:34:00 0 [Note] WSREP: Member 1.0 (c) requested state transfer from '*any*'. Selected 0.0 (parsifal)(SYNCED) as donor.
Dec  5 11:34:00 c mysqld: 2023-12-05 11:34:00 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 84184)
Dec  5 11:34:00 c mysqld: 2023-12-05 11:34:00 1 [Note] WSREP: Requesting state transfer: success, donor: 0
Dec  5 11:34:00 c mysqld: 2023-12-05 11:34:00 1 [Note] WSREP: Resetting GCache seqno map due to different histories.
Dec  5 11:34:00 c mysqld: 2023-12-05 11:34:00 1 [Note] WSREP: GCache history reset: 6d4a0464-620e-11eb-9e95-32f29f45a3ef:0 -> 6d4a0464-620e-11eb-9e95-32f29f45a3ef:84184
Dec  5 11:34:00 c mysqld: 2023-12-05 11:34:00 0 [Warning] WSREP: 0.0 (parsifal): State transfer to 1.0 (c) failed: -111 (Connection refused)
Dec  5 11:34:00 c mysqld: 2023-12-05 11:34:00 0 [ERROR] WSREP: /home/buildbot/buildbot/build/gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1207: Will never receive state. Need to abort.
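The IST listener now opens, but the state transfer still fails, this time with -111 (Connection refused). Aha, of course: the donor (parsifal) is a different host, so it cannot connect to this node's localhost.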
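The two nc experiments in this section can be repeated in Python, with the added benefit of showing which IP address the name resolves to before the bind is attempted. On Linux, errno 99 is EADDRNOTAVAIL, which means the requested address is not assigned to any local interface. One possible explanation, which I did not verify, is that the FQDN resolves to a public IP that the cloud provider maps onto the instance from outside, so the instance itself never holds it. The function name try_listen is hypothetical.

```python
import errno
import socket


def try_listen(host, port):
    """Try to listen on host:port, roughly like `nc -l host port`.

    Returns (resolved_ip, None) on success, or (resolved_ip, errno_name) on failure.
    """
    # Resolve the name first so the caller can see which IP the bind targets.
    resolved_ip = socket.getaddrinfo(
        host, port, socket.AF_INET, socket.SOCK_STREAM
    )[0][4][0]
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            listener.bind((resolved_ip, port))
            listener.listen(1)
        except OSError as exc:
            return resolved_ip, errno.errorcode.get(exc.errno, str(exc.errno))
    return resolved_ip, None


if __name__ == "__main__":
    for name in ("c.uedasoft.com", "localhost"):
        print(name, try_listen(name, 4568))
```

If my guess is right, the first call on node c would report EADDRNOTAVAIL together with an IP that does not appear on any local interface, while the localhost call would succeed.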
# Succeeded by removing node_address setting
I guessed it might work to specify the node's IP address directly. But I didn't feel like spelling out an IP address that node c should naturally know by itself, so I tried commenting out the setting instead and watched the result.
#wsrep_node_address="localhost"
Working well! The third node was successfully added to the cluster.
| wsrep_cluster_size            | 3
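For the alternative I skipped, setting wsrep_node_address to an IP address directly, the first step would be finding an address that the node actually holds. The following is a rough Python sketch under my own assumptions: outbound_ipv4 is a made-up name, 192.0.2.1 is a documentation-only address used purely as a routing probe, and the address it returns is simply the one the kernel would use for outbound traffic. Whether the other cluster nodes can reach that address is a separate question that this sketch does not answer.

```python
import socket


def outbound_ipv4(probe_target="192.0.2.1"):
    """Return the local IPv4 address the kernel would use to reach probe_target, or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        try:
            # connect() on a UDP socket only selects a route; no packet is sent.
            probe.connect((probe_target, 9))
        except OSError:
            return None  # no usable route, e.g. the host is offline
        return probe.getsockname()[0]


if __name__ == "__main__":
    print("candidate for wsrep_node_address:", outbound_ipv4())
```

Unlike the FQDN, an address found this way is by construction one the node can bind to, which is exactly what the IST listener needs.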