Discussion:
Setting up CTDB on OCFS2 and VMs ...
Richard Sharpe
2014-12-04 18:08:36 UTC
Hi folks,

Here are the steps I used, as far as I can remember them. Please
excuse any mistakes and be prepared to think for yourself when
following them.

1. Create two VirtualBox VMs with enough memory and disk for your
Linux distro. I used CentOS 6.6 with 4GB of memory and a 20GB disk.
(I actually installed CentOS 6.3 and upgraded because I had the ISO
handy.) You will also need an extra interface on each VM for the
clustering private network. I set those to the internal network type.

2. Because you will need a shared disk, create one:

vboxmanage createhd --filename ~/VirtualBox\ VMs/SharedHD1 --size 10240 --variant Fixed --format VDI # Creates a 10GB fixed-size disk
vboxmanage modifyhd 22ae1fcc-fda7-4e42-be9f-3b8bd7fc0c0e --type shareable # Make it shareable.

Note, for that second command use the UUID for your disk which you can
find with:

vboxmanage list hdds --brief

Also, use the GUI to add the shared disk to both VMs.
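
If you would rather script the UUID lookup than copy it by hand, something
like the following might work. It is only a sketch: it assumes that the
full (non-brief) output of `vboxmanage list hdds` has a line starting with
"UUID:" and, later, a line starting with "Location:" for each disk, and
find_hdd_uuid is a made-up helper name, not part of VirtualBox.

```shell
# Sketch only: assumes `vboxmanage list hdds` prints, for each disk, a line
# starting with "UUID:" followed later by a line starting with "Location:".
# find_hdd_uuid is a hypothetical helper, not part of VirtualBox.
find_hdd_uuid() {
    # $1: text to look for in the disk's Location path (e.g. SharedHD1)
    # stdin: the output of `vboxmanage list hdds`
    awk -v want="$1" '
        /^UUID:/     { uuid = $2 }                       # remember the latest disk UUID
        /^Location:/ { if (index($0, want)) print uuid } # print it if the path matches
    '
}
```

Possible usage: vboxmanage modifyhd "$(vboxmanage list hdds | find_hdd_uuid SharedHD1)" --type shareable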

3. Install the OS on each of the VMs.

4. I installed a bunch of clustering RPMs next:

yum install openais corosync pacemaker-libs pacemaker-libs-devel gcc \
    corosync-devel openais-devel rpm-build e2fsprogs-devel libuuid-devel \
    git pygtk2 python-devel readline-devel clusterlib-devel redhat-lsb \
    sqlite-devel gnutls-devel byacc flex nss-devel

It is not clear to me that openais was needed, for example.

5. Next I installed Oracle's UEK and ocfs2-tools

wget -P /etc/yum.repos.d http://public-yum.oracle.com/public-yum-ol6.repo
wget http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol6 -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
yum install kernel-uek kernel-uek-devel
yum install ocfs2-tools
yum install openaislib-devel corosync-devel # It is not clear that I needed to install the first

echo 'KERNEL=="ocfs2_control", NAME="misc/ocfs2_control", MODE="0660"' > /etc/udev/rules.d/99-ocfs2_control.rules
reboot # on each
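
After the reboot it may be worth confirming that each node actually came
up on the UEK kernel rather than the stock one. A minimal sketch, assuming
that UEK kernel release strings contain "uek" (is_uek_release is a
hypothetical helper name):

```shell
# Sketch only. is_uek_release is a hypothetical helper; it assumes UEK
# kernel release strings contain "uek".
is_uek_release() {
    case "$1" in
        *uek*) return 0 ;;
        *)     return 1 ;;
    esac
}

# After the reboot, on each node:
if is_uek_release "$(uname -r)"; then
    echo "running UEK kernel: $(uname -r)"
else
    echo "NOT running UEK: $(uname -r); check the default entry in grub.conf"
fi
```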

6. Configure cman and pacemaker

# configure corosync first
cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf

# Make sure that bindnetaddr is defined and points to your private
# interface. I set it to 192.168.122.0

# Make sure that the mcastaddr is defined. I used 239.255.1.1
# Make sure that the mcastport is defined. I used 5405

# Copy that file to the other node.
scp /etc/corosync/corosync.conf ***@172.16.170.6:/etc/corosync
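
A quick way to confirm that the three settings are really present before
going further could look like this. It is a sketch only:
check_corosync_conf is a hypothetical helper, and it only checks that each
key has some value, not that the value is right for your network.

```shell
# Sketch only: check_corosync_conf is a hypothetical helper that reports any
# of the three settings mentioned above that are missing from the file.
check_corosync_conf() {
    conf="${1:-/etc/corosync/corosync.conf}"
    rc=0
    for key in bindnetaddr mcastaddr mcastport; do
        # Match "key: value" lines; commented-out lines do not count
        if ! grep -Eq "^[[:space:]]*${key}:[[:space:]]*[^[:space:]]" "$conf"; then
            echo "missing: $key" >&2
            rc=1
        fi
    done
    return $rc
}
```

Run it on both nodes after the scp, e.g. check_corosync_conf && echo OK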

/etc/init.d/pacemaker stop # Stop these in case they were running
/etc/init.d/corosync stop # Same here

yum install ccs pcs

# Create a cluster

ccs -f /etc/cluster/cluster.conf --createcluster ctdbdemo
ccs -f /etc/cluster/cluster.conf --addnode ocfs2-1
ccs -f /etc/cluster/cluster.conf --addnode ocfs2-2
ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect ocfs2-1
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect ocfs2-2
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk ocfs2-1 pcmk-redirect port=ocfs2-1
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk ocfs2-2 pcmk-redirect port=ocfs2-2

# Copy the cluster config file to the other node:
scp /etc/cluster/cluster.conf ***@172.16.170.6:/etc/cluster
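
The ccs calls above repeat the same three operations per node, so for more
than two nodes it may be easier to generate them. The sketch below only
prints the commands, using the same flags as above and in the same order,
so that they can be reviewed before being run. print_ccs_commands is a
hypothetical helper name.

```shell
# Sketch only: print_ccs_commands is a hypothetical helper. It only prints
# the ccs commands (same flags as above) so they can be reviewed first.
print_ccs_commands() {
    cluster="$1"; shift            # remaining arguments are the node names
    conf=/etc/cluster/cluster.conf
    echo "ccs -f $conf --createcluster $cluster"
    for node in "$@"; do
        echo "ccs -f $conf --addnode $node"
    done
    echo "ccs -f $conf --addfencedev pcmk agent=fence_pcmk"
    for node in "$@"; do
        echo "ccs -f $conf --addmethod pcmk-redirect $node"
    done
    for node in "$@"; do
        echo "ccs -f $conf --addfenceinst pcmk $node pcmk-redirect port=$node"
    done
}
```

Possible usage: print_ccs_commands ctdbdemo ocfs2-1 ocfs2-2 | sh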

# Now turn off NetworkManager:
chkconfig NetworkManager off
service NetworkManager stop

# now start the cluster
service cman start
pcs property set stonith-enabled=false
service pacemaker start

# Also start it on the other node(s).

# Now check the status:
[***@ocfs2-1 ~]# crm_mon -1
Last updated: Thu Dec 4 09:40:16 2014
Last change: Tue Dec 2 10:12:50 2014
Stack: cman
Current DC: ocfs2-2 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
0 Resources configured


Online: [ ocfs2-1 ocfs2-2 ]

If you do not see all the other nodes online, then you have to debug
the problem.

These are essentially the steps from here:
http://clusterlabs.org/quickstart-redhat.html
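
If you want to check the node list from a script instead of by eye, a
sketch along these lines might do. It assumes the "Online: [ ... ]" line
format shown in the crm_mon output above; nodes_online is a hypothetical
helper name.

```shell
# Sketch only: nodes_online is a hypothetical helper. It assumes the
# "Online: [ node1 node2 ]" line format shown in the crm_mon output above.
nodes_online() {
    # stdin: output of `crm_mon -1`; arguments: the node names you expect
    online=$(sed -n 's/^Online: \[ \(.*\) \]$/\1/p')
    rc=0
    for node in "$@"; do
        case " $online " in
            *" $node "*) ;;                              # node is listed as online
            *) echo "not online: $node" >&2; rc=1 ;;
        esac
    done
    return $rc
}
```

Possible usage: crm_mon -1 | nodes_online ocfs2-1 ocfs2-2 && echo "all nodes online"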

7. Configure the Oracle cluster

o2cb add-cluster ctdbdemo
o2cb add-node --ip 192.168.122.10 --port 7777 --number 1 ctdbdemo ocfs2-1
o2cb add-node --ip 192.168.122.11 --port 7777 --number 2 ctdbdemo ocfs2-2

service o2cb configure # This step will fail, claiming that it can't find /sbin/ocfs2_controld.cman
#
# However, it does the important stuff.
#
# NOTE: during the configuration steps you MUST SELECT cman AS THE CLUSTER STACK!
#
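
The add-node commands at the start of this step need an explicit --port
value and a distinct --number for each node, which is easy to get wrong
when typing them by hand. A sketch that prints them from a small
"name ip" table (print_o2cb_commands is a hypothetical helper; it only
prints, it does not run anything):

```shell
# Sketch only: print_o2cb_commands is a hypothetical helper that prints the
# o2cb commands for review. Input lines on stdin are "<node-name> <private-ip>".
print_o2cb_commands() {
    cluster="$1"
    port="${2:-7777}"                   # 7777 is the port used above
    echo "o2cb add-cluster $cluster"
    n=0
    while read -r name ip; do
        [ -n "$name" ] || continue      # skip blank lines
        n=$((n + 1))                    # node numbers start at 1, as above
        echo "o2cb add-node --ip $ip --port $port --number $n $cluster $name"
    done
}
```

Possible usage: printf 'ocfs2-1 192.168.122.10\nocfs2-2 192.168.122.11\n' | print_o2cb_commands ctdbdemo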

8. Fetch and build ocfs2-tools from its git repo

git clone git://oss.oracle.com/git/ocfs2-tools.git ocfs2-tools

# install stuff needed
yum install libaio libaio-devel
yum install pacemaker-libs-devel

# Now build
cd ocfs2-tools
./configure
make

# This will likely fail. If it first fails complaining about
# xml/tree.h then you can do the following:
CPPFLAGS='-I/usr/include/libxml2' ./configure
make

# It might fail again, complaining about some AIS include files that are no
# longer in the packages installed. That is OK. It should have built
# ocfs2_controld.cman, so copy it where it needs to be:

cp ocfs2_controld.cman /usr/sbin/
scp ocfs2_controld.cman ***@172.16.170.6:/usr/sbin/
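
The cp and scp above assume that ocfs2_controld.cman is in the current
directory. If it is not there, the build may have left it in a
subdirectory of the checkout; a sketch for locating it
(find_built_controld is a hypothetical helper name):

```shell
# Sketch only: find_built_controld is a hypothetical helper. It searches the
# source tree for the daemon, since make may leave it in a subdirectory.
find_built_controld() {
    # $1: top of the ocfs2-tools checkout (defaults to the current directory)
    find "${1:-.}" -type f -name ocfs2_controld.cman | head -n 1
}
```

Possible usage: bin=$(find_built_controld .); [ -n "$bin" ] && cp "$bin" /usr/sbin/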

# Now stop those two you started and start everything:

service pacemaker stop
service cman stop

service cman start
service o2cb start
service pacemaker start

8. Create the shared file system on one node:

mkfs.ocfs2 -L CTDBdemocommon --cluster-name ctdbdemo --cluster-stack ocfs2 -N 4 /dev/sdb

9. Mount it on both and ensure that you can create files/dirs on one
node and see them on the other node.
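
One way to do the cross-node check is to write a uniquely named file on
one node and test for it on the other. The sketch below assumes the file
system is mounted at /mnt/ctdbdemo on both nodes (for example with
mount -t ocfs2 /dev/sdb /mnt/ctdbdemo); that mount point and both helper
names are made up for illustration.

```shell
# Sketch only: both helpers are hypothetical; /mnt/ctdbdemo in the usage
# note is an assumed mount point, not one given in the steps above.
write_marker() {
    # $1: mount point. Creates a uniquely named file and prints its name.
    marker="visible-from-$(uname -n)-$$"
    echo "written on $(uname -n)" > "$1/$marker" && echo "$marker"
}

check_marker() {
    # $1: mount point, $2: marker name printed by write_marker on the other node
    [ -f "$1/$2" ]
}
```

On ocfs2-1 run write_marker /mnt/ctdbdemo and note the name it prints; on
ocfs2-2 run check_marker /mnt/ctdbdemo <that name> && echo "shared OK".
Then repeat in the other direction.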

10. Install ctdb and samba

11. Configure samba for the domain you want to join

# Make sure you have clustering = yes and the other things you need.
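
A simple text check for the clustering setting could look like the sketch
below. has_clustering is a hypothetical helper; it does not follow
"include" lines and says nothing about the other settings you need for
your domain.

```shell
# Sketch only: has_clustering is a hypothetical helper; it does a plain text
# check and does not follow "include" lines or validate the rest of smb.conf.
has_clustering() {
    # $1: path to smb.conf; commented-out lines (# or ;) do not match
    grep -Eiq '^[[:space:]]*clustering[[:space:]]*=[[:space:]]*(yes|true|1)[[:space:]]*$' "$1"
}
```

Possible usage on each node: has_clustering /etc/samba/smb.conf || echo "clustering is not enabled"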

12. Configure ctdb (/etc/sysconfig/ctdb) and make sure that you disable winbindd

13. Start ctdb on all nodes

# You must have ctdb started so that the secrets file will get distributed

14. Join the domain

15. Enable winbindd in the ctdb config

16. Restart ctdb on all nodes
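
For steps 12 to 16, the relevant part of /etc/sysconfig/ctdb might look
roughly like the sketch below. The variable names are the ones used by
the ctdb releases of this era (newer releases configure these things
differently), and the paths are assumptions: /mnt/ctdbdemo stands for
wherever you mounted the OCFS2 file system, and /etc/ctdb/nodes for the
file listing one private IP address per node.

```shell
# /etc/sysconfig/ctdb: sketch only. The paths are assumptions: the recovery
# lock must be a file on the shared OCFS2 mount (assumed here to be
# /mnt/ctdbdemo), and the nodes file lists one private IP per line.
CTDB_RECOVERY_LOCK=/mnt/ctdbdemo/.ctdb/reclock
CTDB_NODES=/etc/ctdb/nodes
CTDB_MANAGES_SAMBA=yes
# Steps 12-14: keep winbindd out of CTDB's hands until the domain join is done.
CTDB_MANAGES_WINBIND=no
# Step 15: change the line above to "yes", then restart ctdb on all nodes.
```

The sequence would then be: start ctdb on all nodes with
CTDB_MANAGES_WINBIND=no, join the domain (for an AD domain, typically
net ads join -U Administrator), change the setting to yes, and restart
ctdb on all nodes.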

At this point you should be done. The steps you need might vary.

I have limited time to help you with this.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Michael Adam
2014-12-04 21:38:04 UTC
Hi Richard,

without having tried it myself, this looks like a
really good set of instructions.

IMHO they deserve a place on the wiki for future
reference and elaboration!

Cheers - Michael
Post by Richard Sharpe
Hi folks,
Here are the steps I used, as far as I can remember them. Please
excuse any mistakes and be prepared to think for yourself when
following them.
...
Scott Lovenberg
2014-12-05 03:55:24 UTC
Post by Michael Adam
Hi Richard,
without having tried it myself, this looks like a
really good set of instructions.
IMHO they deserve a place on the wiki for future
reference and elaboration!
Cheers - Michael
Post by Richard Sharpe
Hi folks,
Here are the steps I used, as far as I can remember them. Please
excuse any mistakes and be prepared to think for yourself when
following them.
...
/Gah! Was the top post a mistake or is my email client trying to be
"helpful"?/
Agreed! Thanks, Richard. If you guys don't mind waiting a day or four, I
can verify the steps and add them to the wiki. Richard has already gone
above and beyond.
--
Peace and Blessings,
-Scott.
Rowland Penny
2014-12-06 10:58:02 UTC
Post by Richard Sharpe
Hi folks,
Here are the steps I used, as far as I can remember them. Please
excuse any mistakes and be prepared to think for yourself when
following them.
...
OK, I have followed Richard's 'howto' but using Debian 7.7 instead of
CentOS. I also used the standard Debian kernel. I have got up to step 9,
after a bit of a battle, and it all started so well. :-)

Most of the required packages are available from the repos:

apt-get install openais corosync pacemaker cman ocfs2-tools-cman ocfs2-tools-pacemaker

Unfortunately, it turned out that pacemaker is not built to use the cman
stack, so I had to rebuild it.

Next problem: ccs and pcs are not available, so I had to download and
build them. Even this was not without problems: ccs put
'empty_cluster.conf' in the wrong place and pcs is hardwired to use
/usr/libexec.

Next problem: 'o2cb' appears to be called 'o2cb_ctl' on Debian.

Started cman, o2cb and pacemaker (first time round, this is where I
found that pacemaker wouldn't work with cman)

I then created the shared file system and mounted it on both nodes.

At this point I have a shared cluster, but in a way that I cannot see
any sane sysadmin using. Most of the software is heavily modified or not
available from the distro's repos. I am going to have to stop and think
about this and see if there is a Debian way of doing this, without
modifying anything or using anything that is not available from a repo.

Rowland
Richard Sharpe
2014-12-06 15:24:50 UTC
Post by Rowland Penny
...
I then created the shared file system and mounted it on both nodes
OK, looks like you got the real stuff done.
At this point I have a shared cluster, but in a way that I cannot see any
sane sysadmin using. Most of the software is heavily modified or not
available from the distro's repos. I am going to have to stop and think about
this and see if there is a Debian way of doing this, without modifying
anything or using anything that is not available from a repo.
Indeed, I cannot imagine anyone using the approach I used for
production either. Having stuff that needs to be rebuilt is not a
good idea, and it would be useful to get a minimal complete set of
RPMs together and fix ocfs2-tools so that the correct things are there
and the build works out what is needed.

However, I currently only need the OCFS2 setup for testing purposes,
so that I have a reference to work with while we get our file system
supporting FCNTL locks.

Not sure if I have the time to fix anything.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Rowland Penny
2014-12-06 15:33:51 UTC
Post by Rowland Penny
...
At this point I have a shared cluster, but in a way that I cannot see any
sane sysadmin using. Most of the software is heavily modified or not
available from the distro's repos. I am going to have to stop and think about
this and see if there is a Debian way of doing this, without modifying
anything or using anything that is not available from a repo.
OK, I am getting closer :-)

I have got it working with just packages available from Debian repos,
apart from 'ccs'. Once I find a replacement for this, I will move on to
ctdb & samba.

Rowland
steve
2014-12-06 17:34:43 UTC
Post by Richard Sharpe
Post by Richard Sharpe
Hi folks,
Here are the steps I used, as far as I can remember them. Please
excuse any mistakes and be prepared to think for yourself when
following them.
1. Create two VirtualBox VMs with enough memory and disk for your
Linux Distro. I used CentOS 6.6 with 4GB and 20GB. (I actually
installed CentOS 6.3 and upgraded because I had the ISO handy.) You
will also need an extra interface on each VM for the clustering
private network. I set them to an internal type.
vboxmanage createhd --filename ~/VirtualBox\ VMs/SharedHD1 --size
10240 --variant Fixed --format VDI # Creates a 10GB fixed sized disk
vboxmanage modifyhd 22ae1fcc-fda7-4e42-be9f-3b8bd7fc0c0e --type
shareable # Make it shareable.
Note, for that second command use the UUID for your disk which you can
vboxmanage list hdds --brief
Also, use the GUI to add the shared disk to both VMs.
3. Install the OS on each of the VMs.
yum install openais corosync pacemaker-libs pacemaker-libs-devel gcc
corosync-devel openais-devel rpm-build e2fsprogs-devel libuuid-devel
git pygtk2 python-devel readline-devel clusterlib-devel redhat-lsb
sqlite-devel gnutls-devel byacc flex nss-devel
It is not clear to me that openais was needed, for example
5. Next I installed Oracles UEK and ocfs2-tools
wget http://public-yum.oracle.com/public-yum-ol6.repo /etc/yum.repos.d
wget http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol6 -O
/etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
yum install kernel-uek kernel-uek-devel
yum install ocfs2-tools
yum install openaislib-devel corosync-devel # It is not clear that I
needed to install the first
echo 'KERNEL=="ocfs2_control", NAME="misc/ocfs2_control", MODE="0660"'
/etc/udev/rules.d/99-ocfs2_control.rules
reboot # on each
6. Configure cman and pacemaker
# configure corosync first
cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
# Make sure that bindnetaddr is defined and points to your private
interface. I set it to 192.168.122.0
# Make sure that the mcastaddr is defined. I used 239.255.1.1
# make sure that the mcastport is defined. I used 5405
# Copy that file to the other node.
/etc/init.d/pacemaker stop # Stop these in case they were running
/etc/init.d/corosync stop # Same here
yum install ccs pcs
# Create a cluster
ccs -f /etc/cluster/cluster.conf --createcluster ctdbdemo
ccs -f /etc/cluster/cluster.conf --addnode ocfs2-1
ccs -f /etc/cluster/cluster.conf --addnode ocfs2-2
ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect ocfs2-1
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect ocfs2-2
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk ocfs2-1
pcmk-redirect port=ocfs2-1
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk ocfs2-2
pcmk-redirect port=ocfs2-2
chkconfig NetworkManager off
service NetworkManager stop
# now start the cluster
service cman start
pcs property set stonith-enabled=false
service pacemaker start
# Also start it on the other node(s).
Last updated: Thu Dec 4 09:40:16 2014
Last change: Tue Dec 2 10:12:50 2014
Stack: cman
Current DC: ocfs2-2 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
0 Resources configured
Online: [ ocfs2-1 ocfs2-2 ]
If you do not see all the other nodes online, then you have to debug
the problem.
http://clusterlabs.org/quickstart-redhat.html
7. Configure the Oracle cluster
o2cb add-cluster ctdbdemo
o2cb add-node --ip 192.168.122.10 --port --number 1 ctdbdemo ocfs2-1
o2cb add-node --ip 192.168.122.10 --port 7777 --number 1 ctdbdemo ocfs2-1
o2cb add-node --ip 192.168.122.11 --port 7777 --number 2 ctdbdemo ocfs2-2
service o2cb configure # This step will fail, claiming that it can't
# find /sbin/ocfs2_controld.cman
#
# However, it does the important stuff.
#
# NOTE, during the configuration steps you MUST SELECT cman AS THE
# CLUSTER STACK!
#
8. Find and install the ocfs2-tools git repos
git clone git://oss.oracle.com/git/ocfs2-tools.git ocfs2-tools
# install stuff needed
yum install libaio libaio-devel
yum install pacemaker-libs-devel
# Now build
cd ocfs2-tools
./configure
make
# This will likely fail. If it first fails complaining about missing
# libxml2 include files, rerun configure with the include path set:
CPPFLAGS='-I/usr/include/libxml2' ./configure
make
# It might fail again, complaining about some AIS include files that are no
# longer in the packages installed. That is OK. It should have built
# ocfs2_controld.cman.
cp ocfs2_controld.cman /usr/sbin/
service pacemaker stop
service cman stop
service cman start
service o2cb start
service pacemaker start
mkfs.ocfs2 -L CTDBdemocommon --cluster-name ctdbdemo --cluster-stack ocfs2 -N 4 /dev/sdb
9. Mount it on both and ensure that you can create files/dirs on one
node and see them on the other node.
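The cross-node visibility test in step 9 can be sketched like this. The helper names and the mount point /mnt/ctdbdemo are assumptions made for the example; only /dev/sdb comes from the steps above.

```shell
#!/bin/sh
# Hypothetical helpers for the cross-node visibility test in step 9.
# write_marker DIR NAME  : run on the first node, creates DIR/NAME
# check_marker DIR NAME  : run on the second node, succeeds if DIR/NAME exists
write_marker() {
    mkdir -p "$1" && printf 'written by %s\n' "$(uname -n)" > "$1/$2"
}

check_marker() {
    [ -f "$1/$2" ] && cat "$1/$2"
}

# Example, assuming the volume is mounted at /mnt/ctdbdemo on both nodes
# (the mount point is an assumption, not from the steps above):
#   node 1: mount -t ocfs2 /dev/sdb /mnt/ctdbdemo && write_marker /mnt/ctdbdemo probe
#   node 2: mount -t ocfs2 /dev/sdb /mnt/ctdbdemo && check_marker /mnt/ctdbdemo probe
```

If check_marker on the second node prints the first node's hostname, both nodes are seeing the same file system.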
10. Install ctdb and samba
11. Configure samba for the domain you want to join
# Make sure you have clustering = yes and the other things you need.
12. Configure ctdb (/etc/sysconfig/ctdb) and make sure that you disable
winbindd
13. Start ctdb on all nodes
# You must have ctdb started so that the secrets file will get distributed
14. Join the domain
15. Enable winbindd in the ctdb config
16. Restart ctdb on all nodes
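For step 11, a small check that smb.conf really has clustering switched on might look like the sketch below. The helper name clustering_enabled and the smb.conf path are assumptions; the check only looks for the one setting mentioned above and ignores which section it is in.

```shell
#!/bin/sh
# clustering_enabled FILE
# Hypothetical helper: succeeds if FILE contains an uncommented
# "clustering = yes" line (case-insensitive, any spacing). It does not
# check which section the line is in.
clustering_enabled() {
    grep -Eiq '^[[:space:]]*clustering[[:space:]]*=[[:space:]]*yes[[:space:]]*$' "$1"
}

# Example (the smb.conf path is an assumption; it varies by distro and build):
# clustering_enabled /etc/samba/smb.conf || echo "clustering is not enabled" >&2
```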
At this point you should be done. The steps you need might vary.
I have limited time to help you with this.
OK, I have followed Richard's 'howto', but using Debian 7.7 instead of CentOS.
I also used the standard Debian kernel. I have got up to step 9 after a bit
of a battle, and it all started so well. :-)
apt-get install openais corosync pacemaker cman ocfs2-tools-cman ocfs2-tools-pacemaker
Unfortunately, it turned out that pacemaker is not built to use the cman
stack, so I had to rebuild it.
Next problem: ccs and pcs are not available, so I had to download & build
them, though even this was not without problems: ccs put
'empty_cluster.conf' in the wrong place and pcs is hardwired to use
/usr/libexec.
Next problem: 'o2cb' appears to be called 'o2cb_ctl' on Debian.
Started cman, o2cb and pacemaker (first time round, this is where I found
that pacemaker wouldn't work with cman).
I then created the shared file system and mounted it on both nodes.
OK, looks like you got the real stuff done.
At this point I have a shared cluster, but in a way that I cannot see any
sane sysadmin using. Most of the software is heavily modified or not
available from the distro's repos. I am going to have to stop and think about
this and see if there is a Debian way of doing this, without modifying
anything or using anything that is not available from a repo.
Indeed, I cannot imagine anyone using the approach I used for
production as well. Having stuff that needs to be rebuilt is not a
good idea, and it would be useful to get a minimal complete set of
RPMs together and fix ocfs2-tools so that the correct things are there
and the build works out what is needed.
However, I only currently need the OCFS2 setup for testing purposes
while we get our file system supporting FCNTL locks and so I have a
reference to work with.
Not sure if I have the time to fix anything.
Hi
We have ocfs2 working fine with ctdb on both Ubuntu and openSUSE using
only the packages that they supply. We have been in production on the
latter since September. On real hardware. There is extensive step by
step documentation for both distros:
openSUSE
http://linuxcostablanca.blogspot.com.es/2014/07/samba4-cluster-for-ad-drbd-ocfs2-ctdb.html

Ubuntu (tested only on vms)
http://linuxcostablanca.blogspot.com.es/2014/08/ubuntu-samba4-cluster-ctdb-ocfs2-drbd.html

HTH
Rowland Penny
2014-12-06 17:40:20 UTC
Permalink
Post by steve
Hi
We have ocfs2 working fine with ctdb on both Ubuntu and openSUSE using
only the packages that they supply. We have been in production on the
latter since September. On real hardware. There is extensive step by
step documentation for both distros:
openSUSE
http://linuxcostablanca.blogspot.com.es/2014/07/samba4-cluster-for-ad-drbd-ocfs2-ctdb.html
Ubuntu (tested only on vms)
http://linuxcostablanca.blogspot.com.es/2014/08/ubuntu-samba4-cluster-ctdb-ocfs2-drbd.html
HTH
Yes, I know, but I am trying to do it without DRBD.

Rowland
Richard Sharpe
2014-12-06 18:05:20 UTC
Permalink
Post by Rowland Penny
Post by steve
Post by Richard Sharpe
Indeed, I cannot imagine anyone using the approach I used for
production as well. Having stuff that needs to be rebuilt is not a
good idea, and it would be useful to get a minimal complete set of
RPMs together and fix ocfs2-tools so that the correct things are there
and the build works out what is needed.
However, I only currently need the OCFS2 setup for testing purposes
while we get our file system supporting FCNTL locks and so I have a
reference to work with.
Not sure if I have the time to fix anything.
Hi
We have ocfs2 working fine with ctdb on both Ubuntu and openSUSE using
only the packages that they supply. We have been in production on the latter
since September. On real hardware. There is extensive step by step
documentation for both distros:
openSUSE
http://linuxcostablanca.blogspot.com.es/2014/07/samba4-cluster-for-ad-drbd-ocfs2-ctdb.html
Ubuntu (tested only on vms)
http://linuxcostablanca.blogspot.com.es/2014/08/ubuntu-samba4-cluster-ctdb-ocfs2-drbd.html
HTH
Yes, I know, but I am trying to do it without DRBD.
What they have is not true clustering. It is simple HA. You really do
not need CTDB for that.
--
Regards,
Richard Sharpe
(What can dispel sorrow? Only Du Kang. --Cao Cao)
steve
2014-12-06 18:05:26 UTC
Permalink
Post by Rowland Penny
Yes, I know, but I am trying to do it without DRBD.
Rowland
Not sure of your use case but it's not let us down all term and we've
really thrown everything we could at it. It's serving 80 real boxes from
real hardware too.

We've had only 1 sb (after a power cut) and drbd messaged me. It's then
just a case of choosing your node.
steve
2014-12-06 18:10:14 UTC
Permalink
Post by Richard Sharpe
Post by Rowland Penny
Post by steve
Post by Richard Sharpe
Indeed, I cannot imagine anyone using the approach I used for
production as well. Having stuff that needs to be rebuilt is not a
good idea, and it would be useful to get a minimal complete set of
RPMs together and fix ocfs2-tools so that the correct things are there
and the build works out what is needed.
However, I only currently need the OCFS2 setup for testing purposes
while we get our file system supporting FCNTL locks and so I have a
reference to work with.
Not sure if I have the time to fix anything.
Hi
We have ocfs2 working fine with ctdb on both Ubuntu and openSUSE using
only the packages that they supply. We have been in production on the latter
since September. On real hardware. There is extensive step by step
documentation for both distros:
openSUSE
http://linuxcostablanca.blogspot.com.es/2014/07/samba4-cluster-for-ad-drbd-ocfs2-ctdb.html
Ubuntu (tested only on vms)
http://linuxcostablanca.blogspot.com.es/2014/08/ubuntu-samba4-cluster-ctdb-ocfs2-drbd.html
HTH
Yes, I know, but I am trying to do it without DRBD.
What they have is not true clustering. It is simple HA. You really do
not need CTDB for that.
How else do you do IP failover for Samba?
ronnie sahlberg
2014-12-06 19:03:42 UTC
Permalink
Steve, please. You are confused about what CTDB is.


What you are describing and using is basically a primitive
Active/Passive failover pair where you have to manually trigger
failover by disabling/enabling one of the two CTDB nodes.
For such Active/Passive failover pairs there are heaps of really good
solutions. Use one of those solutions, not CTDB, if all you want is a
two-node active/passive failover pair.
CTDB is not a good solution for your use case.


What Richard and Rowland are aiming for is a proper multi-node cluster
with CTDB and OCFS2 for the use case that CTDB is designed for, i.e.
an All Active cluster where ALL nodes are active simultaneously and
serving data.
The whole point of ctdb is to NOT have active/passive and do failover
but instead a cluster where ALL nodes are active simultaneously.
In such a cluster there are no longer any failovers in the traditional
sense but merely a shuffle of ip addresses when nodes enter/leave the
active cluster.
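
To make the "shuffle of ip addresses" idea concrete, here is a toy sketch. It is an illustration only and NOT CTDB's real allocation algorithm: it just spreads a set of public IPs round-robin over whichever nodes are currently active. The function name, IP addresses and node names are all made up.

```shell
#!/bin/sh
# assign_ips "IP1 IP2 ..." "NODE1 NODE2 ..."
# Illustration only: spreads the public IPs round-robin over the nodes that
# are currently active and prints one "IP NODE" pair per line.
# This is NOT CTDB's real allocation algorithm.
assign_ips() {
    ips=$1
    nodes=$2
    set -- $nodes              # active nodes become $1, $2, ...
    count=$#
    [ "$count" -gt 0 ] || { echo "no active nodes" >&2; return 1; }
    i=0
    for ip in $ips; do
        idx=$(( i % count + 1 ))
        eval "node=\${$idx}"   # pick the idx-th active node
        echo "$ip $node"
        i=$(( i + 1 ))
    done
}

# All four nodes active: each node serves one address.
# assign_ips "10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4" "n1 n2 n3 n4"
# Node n3 leaves: its address is picked up by a surviving node.
# assign_ips "10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4" "n1 n2 n4"
```

The point of the sketch is that every address stays served as long as at least one node is active, and no node sits idle waiting to take over.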

For proper CTDB use in a production environment DRBD is not a viable solution.
Not sure of your use case but it's not let us down all term and we've really thrown everything we could at it. It's serving 80 real boxes from real hardware too.
That is not exactly enterprise. People using CTDB use clusters usually
ranging from 4 to 16 nodes, serving tens of thousands of simultaneous
clients.
I would say that while you CAN build a 2 node CTDB cluster, that is
probably not a great idea for a production environment. I think that 4
nodes is probably the smallest cluster you should use.
Feel free to disagree.
We've had only 1 sb (after a power cut) and drbd messaged me. It's then just a case of choosing your node.
That is not how CTDB is supposed to be used. If a node fails, CTDB is
supposed to automatically redistribute all traffic across the
surviving nodes. No human intervention required.
Seems you have a single point of failure since you need to manually intervene.




Please do not misrepresent what ctdb is or how it should be used. It
is a disservice to the mailing list readers that come here for
information and may form a broken idea of how/what ctdb is.


regards
ronnie sahlberg
steve
2014-12-06 19:17:02 UTC
Permalink
Post by ronnie sahlberg
Steve, please. You are confused on what CTDB is.
LOL! We use it in production.
Post by ronnie sahlberg
What you are describing and you use is basically a primitive
Active/Passive failover pair where you have to manually trigger
failover by disabling/enabling one of the two CTDB nodes.
No it isn't. We have 2 nodes active at the same time. The failure of
either node is invisible to the users. None of this needs manual
intervention.

Maybe you are the one who is confused? Please read our documentation.
Rowland Penny
2014-12-06 19:31:50 UTC
Permalink
Post by steve
Post by ronnie sahlberg
Steve, please. You are confused on what CTDB is.
LOL! We use it in production.
Post by ronnie sahlberg
What you are describing and you use is basically a primitive
Active/Passive failover pair where you have to manually trigger
failover by disabling/enabling one of the two CTDB nodes.
No it isn't. We have 2 nodes active at the same time. The failure of
either node is invisible to the users. None of this needs manual
intervention.
Maybe you are the one who is confused? Please read our documentation.
Maybe you are using it in production, but that doesn't mean that you are
using it in the way you think you are. I would also recommend you to
moderate the way you write, you are coming across as a know-it-all
adolescent, which you probably are.

Rowland
steve
2014-12-06 19:59:37 UTC
Permalink
Post by Rowland Penny
Maybe you are using it in production, but that doesn't mean that you are
using it in the way you think you are. I would also recommend you to
moderate the way you write, you are coming across as a know-it-all
adolescent, which you probably are.
Rowland
Adolescents. Never work children eh? But know it all, no. What I do know
is that a poster has stated something that is not only wrong, but
misleading. I then correct him.

Now please let us talk about Setting up CTDB on OCFS2 and VMs ...

One way to set up an active:active 2 node cluster is to use ocfs2, drbd
and ctdb. We have documented it and 3 independent domains are using it.
It is available now. It has been tested in the field since September.
That's what we offer. We thought we'd mention it. It may save you a lot
of time getting your cluster into production. Take it or leave it. But
please do not misrepresent it.
Richard Sharpe
2014-12-06 23:01:56 UTC
Permalink
Post by Rowland Penny
Maybe you are using it in production, but that doesn't mean that you are
using it in the way you think you are. I would also recommend you to
moderate the way you write, you are coming across as a know-it-all
adolescent, which you probably are.
Rowland
Adolescents. Never work children eh? But know it all, no. What I do know is
that a poster has stated something that is not only wrong, but misleading. I
then correct him.
Now please let us talk about Setting up CTDB on OCFS2 and VMs ...
One way to set up an active:active 2 node cluster is to use ocfs2, drbd and
ctdb. We have documented it and 3 independent domains are using it. It is
available now. It has been tested in the field since September. That's what
we offer. We thought we'd mention it. It may save you a lot of time getting
your cluster into production. Take it or leave it. But please do not
misrepresent it.
Actually, you seem to not understand what DRBD provides. What you seem
to be describing is not an ACTIVE/ACTIVE cluster using CTDB because it
does not provide a key feature that CTDB requires.

Perhaps if I get a chance next week I will try to repro it to see what
sort of a mish-mash you have produced.
--
Regards,
Richard Sharpe
(What can dispel sorrow? Only Du Kang. --Cao Cao)
steve
2014-12-06 23:09:16 UTC
Permalink
Post by Richard Sharpe
Post by Rowland Penny
Maybe you are using it in production, but that doesn't mean that you are
using it in the way you think you are. I would also recommend you to
moderate the way you write, you are coming across as a know-it-all
adolescent, which you probably are.
Rowland
Adolescents. Never work children eh? But know it all, no. What I do know is
that a poster has stated something that is not only wrong, but misleading. I
then correct him.
Now please let us talk about Setting up CTDB on OCFS2 and VMs ...
One way to set up an active:active 2 node cluster is to use ocfs2, drbd and
ctdb. We have documented it and 3 independent domains are using it. It is
available now. It has been tested in the field since September. That's what
we offer. We thought we'd mention it. It may save you a lot of time getting
your cluster into production. Take it or leave it. But please do not
misrepresent it.
Actually, you seem to not understand what DRBD provides. What you seem
to be describing is not an ACTIVE/ACTIVE cluster using CTDB because it
does not provide a key feature that CTDB requires.
Perhaps if I get a chance next week I will try repro it to see what
sort of a mish-mash you have produced.
A mish-mash which works:) Now start documenting.
Michael Adam
2014-12-06 23:48:56 UTC
Permalink
Post by steve
Now please let us talk about Setting up CTDB on OCFS2 and VMs ...
One way to set up an active:active 2 node cluster is to use ocfs2, drbd and
ctdb. We have documented it and 3 independent domains are using it. It is
available now. It has been tested in the field since September. That's what
we offer. We thought we'd mention it. It may save you a lot of time getting
your cluster into production. Take it or leave it. But please do not
misrepresent it.
From reading your various posts here on the list,
which were pretty vague, one could get the impression that

you are using drbd in parallel/addition to ocfs2
to provide a split brain prevention mechanism
as a substitute for the recovery lock file
placed on ocfs2.

When asked, you never detailed how you accomplished
this (to use drbd for split brain), though.
(Neither did you elaborate in which way the recovery
lock file did not work in your setup.)

Now I read your blog post, and it sheds some light...
I am not an expert on ocfs2 or drbd, and you don't
really explain it but merely give the instructions
(and do some amount of bashing of available documentation..).
But here is what I understand from your post:

- You use drbd to replicate the block storage for ocfs2
between the two nodes.
- You format the drbd block device with ocfs2.
This gives you a clustered ocfs2 active on both nodes.
- You configure ctdb on the two node cluster
with management of public addresses and Samba,
but *without* split brain protection.
- Samba is run in all-active clustered mode using ctdb
for databases and failover and the ocfs2 for shares.

So the important bit is that in your case ctdb
is running unprotected from split brain.
The only reference to split brain is a notification
of user steve in case drbd detects a split brain.
If I get it right (there are no details about this
in the blog post), this means that until user steve
reacts to that notification the ctdb/samba cluster
runs happily in the split brain situation and
corrupts the users' data.

Note: notification of a split brain does
not imply protection from the damage it can do!
So the question is really: what apart from
notifying steve is your cluster doing when
a split brain occurs?


One more comment to your blog post:

According to your instructions, you call "net ads join"
before starting ctdb. This cannot work. net ads join
needs ctdb running to operate, because it needs to
store the join information in the clustered secrets.tdb
handled by ctdb.

A procedure I'd recommend is this:

- configure ctdb without MANAGES_SAMBA and MANAGES_WINBIND
- start ctdb
- do net ads join
- configure ctdb to use MANAGES_SAMBA and MANAGES_WINBIND
- restart ctdb
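
The procedure above could be scripted roughly as follows. This is a sketch under assumptions: the helper set_ctdb_option is invented for the example, the config file is taken to be /etc/sysconfig/ctdb, the variables are assumed to be spelled CTDB_MANAGES_SAMBA and CTDB_MANAGES_WINBIND, and the account passed to net ads join is a placeholder.

```shell
#!/bin/sh
# set_ctdb_option FILE KEY VALUE
# Hypothetical helper: sets KEY=VALUE in a sysconfig-style FILE, replacing an
# existing (possibly commented-out) KEY line or appending one if none exists.
set_ctdb_option() {
    file=$1 key=$2 value=$3
    if grep -Eq "^[#[:space:]]*${key}=" "$file"; then
        tmp=$(mktemp) || return 1
        sed -E "s|^[#[:space:]]*${key}=.*|${key}=${value}|" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# The procedure above, assuming /etc/sysconfig/ctdb and the
# CTDB_MANAGES_SAMBA / CTDB_MANAGES_WINBIND variable names:
#   set_ctdb_option /etc/sysconfig/ctdb CTDB_MANAGES_SAMBA no
#   set_ctdb_option /etc/sysconfig/ctdb CTDB_MANAGES_WINBIND no
#   service ctdb start          # on all nodes
#   net ads join -U Administrator   # account name is a placeholder
#   set_ctdb_option /etc/sysconfig/ctdb CTDB_MANAGES_SAMBA yes
#   set_ctdb_option /etc/sysconfig/ctdb CTDB_MANAGES_WINBIND yes
#   service ctdb restart        # on all nodes
```

The config edits have to be made on every node, since each node reads its own copy of the file.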


Cheers - Michael
Michael Adam
2014-12-07 00:04:02 UTC
Permalink
Post by Richard Sharpe
Post by steve
Now please let us talk about Setting up CTDB on OCFS2 and VMs ...
One way to set up an active:active 2 node cluster is to use ocfs2, drbd and
ctdb. We have documented it and 3 independent domains are using it. It is
available now. It has been tested in the field since September. That's what
we offer. We thought we'd mention it. It may save you a lot of time getting
your cluster into production. Take it or leave it. But please do not
misrepresent it.
Actually, you seem to not understand what DRBD provides. What you seem
to be describing is not an ACTIVE/ACTIVE cluster using CTDB because it
does not provide a key feature that CTDB requires.
The blog post does in fact describe an all-active samba+ctdb
cluster with ocfs2 as a cluster file system, and drbd replicating
ocfs2's storage, but without ctdb using a recovery lock for split
brain prevention.

See my other mail for details.

Michael
Michael Adam
2014-12-07 00:21:37 UTC
Permalink
Post by Michael Adam
So the important bit is that in your case ctdb
is running unprotected from split brain.
The only reference to split brain is a notification
of user steve in case drbd detects a split brain.
If I get it right (there are no details about this
in the blog post), this means that until user steve
reacts to that notification the ctdb/samba cluster
runs happily in the split brain situation and
corrupts the users' data.
Ok, maybe it is not quite as bad. The config snippet

net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
}

Which is explained to some extent in

http://www.drbd.org/users-guide/s-configure-split-brain-behavior.html

seems to indicate that in case of split brain
certain measures are potentially taken.

Also read the explanations about DRBD split brain here:

http://www.drbd.org/users-guide/s-split-brain-notification-and-recovery.html

This states that DRBD split brain is different from
cluster split brain (also called cluster partition).

So I'd really like to know what happens in your
setup in a split brain situation.

Cheers - Michael
Chan Min Wai
2014-12-07 12:42:21 UTC
Permalink
Dear Michael,

Thanks for the mail with the explanation.

I'm using CTDB with OCFS2 and without DRBD (worse than Steve's situation, I believe).

But based on your explanation I now roughly understand the risk and also what I am missing.

I really believe that I got CTDB running by accident.

As per your description, due to the locking problem I ran CTDB in one mode and joined the AD.

Thanks to Steve's suggestion I turned off locking and everything worked.

So I'm in that situation.

I'll try to add pacemaker and cman in future and update the Gentoo wiki guide.

I was really wondering what all the screaming was for, and now I know why.

In Steve's situation and mine, the split-brain issue in CTDB is not really well handled; we are totally dependent on the recovery script for that.

But DRBD would have handled the ocfs2 issue. (For me, sharing one network storage across 2 servers, that might not be an issue.)

It would be a big issue if we were using CTDB with a PDC or classic domain. But as a member server, I'm not sure how much of a problem we will have, as all the information comes from the AD DC.

Thanks again for the explanation.

Regards,
Min Wai
Richard Sharpe
2014-12-07 13:27:55 UTC
Permalink
Post by Michael Adam
Post by Michael Adam
So the important bit is that in your case ctdb
is running unprotected from split brain.
The only reference to split brain is a notification
of user steve in case drbd detects a split brain.
If I get it right (there are no details about this
in the blog post), this means that until user steve
reacts to that notification the ctdb/samba cluster
runs happily in the split brain situation and
corrupts the users' data.
Ok, maybe it is not quite as bad. The config snippet
net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
}
Which is explained to some extent in
http://www.drbd.org/users-guide/s-configure-split-brain-behavior.html
seems to indicate that in case of split brain
certain measures are potentially taken.
http://www.drbd.org/users-guide/s-split-brain-notification-and-recovery.html
This states that DRBD split brain is different from
cluster split brain (also called cluster partition).
So I'd really like to know what happens in your
setup in a split brain situation.
Well, it turns out that drbd has this thing called dual-master mode,
which turns it into shared storage for two nodes only.

So, as long as the OCFS2 DLM is also running, there should not be any
split-brain events.

Making sure that the DLM was running was why I put so much effort into
getting the ocfs2-tools code running.

The disadvantage of using DRBD is that you cannot run more than a
2-node cluster.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
steve
2014-12-08 10:35:55 UTC
Permalink
Post by steve
On Sat, Dec 6, 2014 at 9:40 AM, Rowland Penny
Post by Rowland Penny
Post by steve
On Sat, Dec 6, 2014 at 2:58 AM, Rowland Penny
Indeed, I cannot imagine anyone using the approach I used for
production as well. Having stuff that needs to be rebuilt is not a
good idea, and it would be useful to get a minimal complete set of
RPMs together and fix ocfs2-tools so that the correct things are there
and the build works out what is needed.
However, I only currently need the OCFS2 setup for testing purposes
while we get our file system supporting FCNTL locks and so I have a
reference to work with.
Not sure if I have the time to fix anything.
Hi
We have ocfs2 working fine with ctdb on both Ubuntu and openSUSE using
only the packages that they supply. We have been in production on
the latter
since September. On real hardware. There is extensive step by step
openSUSE
http://linuxcostablanca.blogspot.com.es/2014/07/samba4-cluster-for-ad-drbd-ocfs2-ctdb.html
Ubuntu (tested only on vms)
http://linuxcostablanca.blogspot.com.es/2014/08/ubuntu-samba4-cluster-ctdb-ocfs2-drbd.html
HTH
Yes, I know, but I am trying to do it without DRBD.
What they have is not true clustering. It is simple HA. You really do
not need CTDB for that.
How else do you do IP failover for Samba?
Well?
Min Wai Chan
2014-12-08 12:02:42 UTC
Permalink
Which means that

we are running CTDB outside of its original design...

and outside of the design specification :)

And we are not sure what will happen.

That would be a nice way to put it...
On Sat, Dec 6, 2014 at 9:40 AM, Rowland Penny
Post by Rowland Penny
Post by steve
On Sat, Dec 6, 2014 at 2:58 AM, Rowland Penny
Indeed, I cannot imagine anyone using the approach I used for
production as well. Having stuff that needs to be rebuilt is not a
good idea, and it would be useful to get a minimal complete set of
RPMs together and fix ocfs2-tools so that the correct things are there
and the build works out what is needed.
However, I only currently need the OCFS2 setup for testing purposes
while we get our file system supporting FCNTL locks and so I have a
reference to work with.
Not sure if I have the time to fix anything.
Hi
We have ocfs2 working fine with ctdb on both Ubuntu and openSUSE using
only the packages that they supply. We have been in production on
the latter
since September. On real hardware. There is extensive step by step
openSUSE
http://linuxcostablanca.blogspot.com.es/2014/07/samba4-cluster-for-ad-drbd-ocfs2-ctdb.html
Ubuntu (tested only on vms)
http://linuxcostablanca.blogspot.com.es/2014/08/ubuntu-samba4-cluster-ctdb-ocfs2-drbd.html
HTH
Yes, I know, but I am trying to do it without DRBD.
What they have is not true clustering. It is simple HA. You really do
not need CTDB for that.
How else do you do IP failover for Samba?
Well?
steve
2014-12-08 13:54:32 UTC
Permalink
Post by Min Wai Chan
Which meant that
We are running CTDB with not their initial design...
And out of the design specification :)
And not sure what will happen.
That would be a nice way to put it...
So you've clustered Samba without ctdb? What do you use for the IP takeover
(or shuffle)?
Post by Min Wai Chan
On Sat, Dec 6, 2014 at 9:40 AM, Rowland Penny
On Sat, Dec 6, 2014 at 2:58 AM, Rowland Penny
Indeed, I cannot imagine anyone using the approach I used for
production as well. Having stuff that needs to be rebuilt is not a
good idea, and it would be useful to get a minimal complete set of
RPMs together and fix ocfs2-tools so that the correct things are there
and the build works out what is needed.
However, I only currently need the OCFS2 setup for testing purposes
while we get our file system supporting FCNTL locks and so I have a
reference to work with.
Not sure if I have the time to fix anything.
Hi
We have ocfs2 working fine with ctdb on both Ubuntu and openSUSE using
only the packages that they supply. We have been in production on
the latter
since September. On real hardware. There is extensive step by step
openSUSE
http://linuxcostablanca.blogspot.com.es/2014/07/samba4-cluster-for-ad-drbd-ocfs2-ctdb.html
Ubuntu (tested only on vms)
http://linuxcostablanca.blogspot.com.es/2014/08/ubuntu-samba4-cluster-ctdb-ocfs2-drbd.html
HTH
Yes, I know, but I am trying to do it without DRBD.
What they have is not true clustering. It is simple HA. You really do
not need CTDB for that.
How else do you do IP failover for Samba?
Well?
Richard Sharpe
2014-12-08 14:52:52 UTC
Permalink
Post by Min Wai Chan
Which meant that
We are running CTDB with not their initial design...
And out of the design specification :)
And not sure what will happen.
That would be a nice way to put it...
So you've clustered Samba without ctdb? What do you use for the IP takeover (or
shuffle)?
Well, you do not actually have clustered Samba without CTDB.

You can also do IP address takeover with Pacemaker but it will not
work as well with Samba.
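
For comparison, CTDB's own IP takeover is driven by two small files. The sketch below writes them into a directory passed as an argument (normally /etc/ctdb). The private addresses are the ones from the howto; the public addresses, the eth0 interface name and the write_ctdb_ip_config helper are assumptions made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write the two files CTDB uses for IP takeover
# into the directory given as $1.
write_ctdb_ip_config() {
    dir="$1"   # normally /etc/ctdb

    # Private cluster addresses, one per node; identical on every node.
    cat > "$dir/nodes" <<EOF
192.168.122.10
192.168.122.11
EOF

    # Public addresses that CTDB moves between healthy nodes.
    # Addresses and interface are placeholders, not from the thread.
    cat > "$dir/public_addresses" <<EOF
10.0.0.101/24 eth0
10.0.0.102/24 eth0
EOF
}

# Example (commented out so sourcing this file changes nothing):
#   write_ctdb_ip_config /etc/ctdb
```

Once ctdb is running, "ctdb ip" should list which node currently hosts each public address.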
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Richard Sharpe
2014-12-08 14:57:16 UTC
Permalink
Post by Min Wai Chan
Which meant that
We are running CTDB with not their initial design...
Yes, or certainly with a setup that has not been tested. The DRBD
dual-master feature was something I was unaware of since I have not
looked at DRBD for years.

It would seem to work for shared storage but you also need the OCFS2
distributed lock manager feature working for CTDB's lock recovery to
work.

In the steps I used and documented, the biggest problem was with
getting the DLM running for OCFS2.
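
One way to check that fcntl locking really is coherent across the nodes is the ping_pong tool that ships with ctdb, run at the same time on every node against the same file on the shared file system. The helper below only builds the command line, using the usual rule of thumb of one more lock than there are nodes. The mount point in the example and the ping_pong_cmd name are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the ping_pong command line for a cluster
# of $2 nodes, using <nodes + 1> locks on the shared test file $1.
ping_pong_cmd() {
    file="$1"    # test file on the shared OCFS2 mount
    nodes="$2"   # number of nodes in the cluster
    echo "ping_pong $file $((nodes + 1))"
}

# Example for the two-node setup (the mount point is a placeholder):
#   $(ping_pong_cmd /mnt/ctdbdemo/test.dat 2)    # run on both nodes at once
```

If the lock rate collapses or the tool hangs when the second node joins in, the DLM is probably not doing its job.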
Post by Min Wai Chan
And out of the design specification :)
And not sure what will happen.
That would be a nice way to put it...
steve
2014-12-08 15:27:40 UTC
Permalink
Post by Richard Sharpe
Post by Min Wai Chan
Which meant that
We are running CTDB with not their initial design...
And out of the design specification :)
And not sure what will happen.
That would be a nice way to put it...
So you've clustered Samba without ctdb? What do you use for the IP takeover (or
shuffle)?
Well, you do not actually have clustered Samba without CTDB.
You can also do IP address takeover with Pacemaker but it will not
work as well with Samba.
So what are you talking about when you say:
'What they have is not true clustering. It is simple HA. You really do
not need CTDB for that.'

Samba shares on ocfs2 in an AD domain with pacemaker? If it were that
easy, we'd have done it!
steve
2014-12-08 15:32:15 UTC
Permalink
Post by Richard Sharpe
In the steps I used and documented, the biggest problem was with
getting the DLM running for OCFS2.
Depends how you go about it. On both Ubuntu and openSUSE, it's up out of
the box.
Richard Sharpe
2014-12-08 16:01:12 UTC
Permalink
Post by Richard Sharpe
In the steps I used and documented, the biggest problem was with
getting the DLM running for OCFS2.
Depends how you go about it. On both Ubuntu and openSUSE, it's up out of the
box.
And yet, people still seem to have problems getting it to work ...
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
steve
2014-12-08 16:22:09 UTC
Permalink
Post by Richard Sharpe
Post by Richard Sharpe
In the steps I used and documented, the biggest problem was with
getting the DLM running for OCFS2.
Depends how you go about it. On both Ubuntu and openSUSE, it's up out of the
box.
And yet, people still seem to have problems getting it to work ...
It depends what you call 'work':
https://lists.samba.org/archive/samba-technical/2014-December/104185.html
Richard Sharpe
2014-12-08 22:08:39 UTC
Permalink
Post by steve
Post by Richard Sharpe
Post by steve
Post by Richard Sharpe
In the steps I used and documented, the biggest problem was with
getting the DLM running for OCFS2.
Depends how you go about it. On both Ubuntu and openSUSE, it's up out of
the
box.
And yet, people still seem to have problems getting it to work ...
https://lists.samba.org/archive/samba-technical/2014-December/104185.html
Your contributions to this list are close to being indistinguishable
from noise ...
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Richard Sharpe
2014-12-08 22:10:07 UTC
Permalink
Post by Rowland Penny
Post by Richard Sharpe
Post by Rowland Penny
Post by Richard Sharpe
Hi folks,
Here are the steps I used, as far as I can remember them. Please
excuse any mistakes and be prepared to think for yourself when
following them.
1. Create two VirtualBox VMs with enough memory and disk for your
Linux Distro. I used CentOS 6.6 with 4GB and 20GB. (I actually
installed CentOS 6.3 and upgraded because I had the ISO handy.) You
will also need an extra interface on each VM for the clustering
private network. I set them to an internal type.
vboxmanage createhd --filename ~/VirtualBox\ VMs/SharedHD1 --size
10240 --variant Fixed --format VDI # Creates a 10GB fixed sized disk
vboxmanage modifyhd 22ae1fcc-fda7-4e42-be9f-3b8bd7fc0c0e --type
shareable # Make it shareable.
Note, for that second command use the UUID for your disk which you can
vboxmanage list hdds --brief
Also, use the GUI to add the shared disk to both VMs.
3. Install the OS on each of the VMs.
yum install openais corosync pacemaker-libs pacemaker-libs-devel gcc
corosync-devel openais-devel rpm-build e2fsprogs-devel libuuid-devel
git pygtk2 python-devel readline-devel clusterlib-devel redhat-lsb
sqlite-devel gnutls-devel byacc flex nss-devel
It is not clear to me that openais was needed, for example
5. Next I installed Oracles UEK and ocfs2-tools
wget http://public-yum.oracle.com/public-yum-ol6.repo /etc/yum.repos.d
wget http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol6 -O
/etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
yum install kernel-uek kernel-uek-devel
yum install ocfs2-tools
yum install openaislib-devel corosync-devel # It is not clear that I
needed to install the first
echo 'KERNEL=="ocfs2_control", NAME="misc/ocfs2_control", MODE="0660"'
/etc/udev/rules.d/99-ocfs2_control.rules
reboot # on each
6. Configure cman and pacemaker
# configure corosync first
cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
# Make sure that bindnetaddr is defined and points to your private
interface. I set it to 192.168.122.0
# Make sure that the mcastaddr is defined. I used 239.255.1.1
# make sure that the mcastport is defined. I used 5405
# Copy that file to the other node.
/etc/init.d/pacemaker stop # Stop these in case they were running
/etc/init.d/corosync stop # Same here
yum install ccs pcs
# Create a cluster
ccs -f /etc/cluster/cluster.conf --createcluster ctdbdemo
ccs -f /etc/cluster/cluster.conf --addnode ocfs2-1
ccs -f /etc/cluster/cluster.conf --addnode ocfs2-2
ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect ocfs2-1
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect ocfs2-2
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk ocfs2-1
pcmk-redirect port=ocfs2-1
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk ocfs2-2
pcmk-redirect port=ocfs2-2
chkconfig NetworkManager off
service NetworkManager stop
# now start the cluster
service cman start
pcs property set stonith-enabled=false
service pacemaker start
# Also start it on the other node(s).
Last updated: Thu Dec 4 09:40:16 2014
Last change: Tue Dec 2 10:12:50 2014
Stack: cman
Current DC: ocfs2-2 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
0 Resources configured
Online: [ ocfs2-1 ocfs2-2 ]
If you do not see all the other nodes online, then you have to debug
the problem.
http://clusterlabs.org/quickstart-redhat.html
7. Configure the Oracle cluster
o2cb add-cluster ctdbdemo
o2cb add-node --ip 192.168.122.10 --port --number 1 ctdbdemo ocfs2-1
o2cb add-node --ip 192.168.122.10 --port 7777 --number 1 ctdbdemo
ocfs2-1
o2cb add-node --ip 192.168.122.11 --port 7777 --number 2 ctdbdemo
ocfs2-2
service o2cb configure # This step will fail claiming that it can't
find /sbin/ocfs2_controld.cman
#
# However, it does the important stuff.
#
# NOTE, during the configuration steps you MUST SELECT cman AS THE
CLUSTER STACK!
#
8. Find and install the ocfs2-tools git repos
git clone git://oss.oracle.com/git/ocfs2-tools.git ocfs2-tools
# install stuff needed
yum install libaio libaio-devel
yum install pacemaker-libs-devel
# Now build
cd ocfs2-tools
./configure
make
# This will likely fail. If it first fails complaining about
CPPFLAGS='-I/usr/include/libxml2' ./configure
make
# It might complain again complaining about some AIS include files that
are no
# longer in the packages installed. That is OK. It should have built
ocfs2_controld.cman,
cp ocfs2_controld.cman /usr/sbin/
service pacemaker stop
service cman stop
service cman start
service o2cb start
service pacemaker start
mkfs.ocfs2 -L CTDBdemocommon --cluster-name ctdbdemo --cluster-stack
ocfs2 -N 4 /dev/sdb
9. Mount it on both and ensure that you can create files/dirs on one
node and see them on the other node.
10. Install ctdb and samba
11. Configure samba for the domain you want to join
# Make sure you have clustering = yes and the other things you need.
12. Configure ctdb (/etc/sysconfig/ctdb) and make sure that you disable
winbindd
13. Start ctdb on all nodes
# You must have ctdb started so that the secrets file will get
distributed
14. join the domain
15. Enable winbindd in the ctdb config
16. Restart ctdb on all nodes
At this point you should be done. The steps you need might vary.
I have limited time to help you with this.
OK, I have followed Richards 'howto' but using Debian 7.7 instead of
Centos,
I also used the standard Debian kernel. I have got up to step 9, after a
bit
of a battle and it all started so well. :-)
apt-get install openais corosync pacemaker cman ocfs2-tools-cman
ocfs2-tools-pacemaker
Unfortunately, it turned out that pacemaker is not built to use the cman
stack, so I had to rebuild it
Next problem, ccs and pcs are not available, so I had to download & build
them, though even this was not without problems, ccs put
'empty_cluster.conf' in the wrong place and pcs is hardwired to use
/usr/libexec
Next problem 'o2cb' appears to be called 'o2cb_ctl' on Debian.
Started cman, o2cb and pacemaker (first time round, this is where I found
that pacemaker wouldn't work with cman)
I then created the shared shared file system and mounted it on both nodes
OK, looks like you got the real stuff done.
Post by Rowland Penny
At this point I have a shared cluster, but in a way that I cannot see any
sane sysadmin using. Most of the software is heavily modified or not
available from the distros repos. I am going to have to stop and think
about
this and see if there is a Debian way of doing this, without modifying
anything or using anything that is not available from a repo.
OK, I am getting closer :-)
I have got it working with just packages available from Debian repos, apart
from 'ccs', once I find a replacement for this, I will move onto ctdb &
samba.
I have made some progress on getting ocfs2-tools to build under CentOS
6.6 as well. Still one (or more) build problems to resolve.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Richard Sharpe
2014-12-08 22:32:01 UTC
Permalink
On Mon, Dec 8, 2014 at 2:10 PM, Richard Sharpe
Post by Richard Sharpe
Post by Rowland Penny
Post by Richard Sharpe
Post by Rowland Penny
Post by Richard Sharpe
Hi folks,
Here are the steps I used, as far as I can remember them. Please
excuse any mistakes and be prepared to think for yourself when
following them.
1. Create two VirtualBox VMs with enough memory and disk for your
Linux Distro. I used CentOS 6.6 with 4GB and 20GB. (I actually
installed CentOS 6.3 and upgraded because I had the ISO handy.) You
will also need an extra interface on each VM for the clustering
private network. I set them to an internal type.
vboxmanage createhd --filename ~/VirtualBox\ VMs/SharedHD1 --size
10240 --variant Fixed --format VDI # Creates a 10GB fixed sized disk
vboxmanage modifyhd 22ae1fcc-fda7-4e42-be9f-3b8bd7fc0c0e --type
shareable # Make it shareable.
Note, for that second command use the UUID for your disk which you can
vboxmanage list hdds --brief
Also, use the GUI to add the shared disk to both VMs.
3. Install the OS on each of the VMs.
yum install openais corosync pacemaker-libs pacemaker-libs-devel gcc
corosync-devel openais-devel rpm-build e2fsprogs-devel libuuid-devel
git pygtk2 python-devel readline-devel clusterlib-devel redhat-lsb
sqlite-devel gnutls-devel byacc flex nss-devel
It is not clear to me that openais was needed, for example
5. Next I installed Oracles UEK and ocfs2-tools
wget http://public-yum.oracle.com/public-yum-ol6.repo /etc/yum.repos.d
wget http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol6 -O
/etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
yum install kernel-uek kernel-uek-devel
yum install ocfs2-tools
yum install openaislib-devel corosync-devel # It is not clear that I
needed to install the first
echo 'KERNEL=="ocfs2_control", NAME="misc/ocfs2_control", MODE="0660"'
/etc/udev/rules.d/99-ocfs2_control.rules
reboot # on each
6. Configure cman and pacemaker
# configure corosync first
cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
# Make sure that bindnetaddr is defined and points to your private
interface. I set it to 192.168.122.0
# Make sure that the mcastaddr is defined. I used 239.255.1.1
# make sure that the mcastport is defined. I used 5405
# Copy that file to the other node.
/etc/init.d/pacemaker stop # Stop these in case they were running
/etc/init.d/corosync stop # Same here
yum install ccs pcs
# Create a cluster
ccs -f /etc/cluster/cluster.conf --createcluster ctdbdemo
ccs -f /etc/cluster/cluster.conf --addnode ocfs2-1
ccs -f /etc/cluster/cluster.conf --addnode ocfs2-2
ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect ocfs2-1
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect ocfs2-2
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk ocfs2-1
pcmk-redirect port=ocfs2-1
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk ocfs2-2
pcmk-redirect port=ocfs2-2
chkconfig NetworkManager off
service NetworkManager stop
# now start the cluster
service cman start
pcs property set stonith-enabled=false
service pacemaker start
# Also start it on the other node(s).
Last updated: Thu Dec 4 09:40:16 2014
Last change: Tue Dec 2 10:12:50 2014
Stack: cman
Current DC: ocfs2-2 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
0 Resources configured
Online: [ ocfs2-1 ocfs2-2 ]
If you do not see all the other nodes online, then you have to debug
the problem.
http://clusterlabs.org/quickstart-redhat.html
7. Configure the Oracle cluster
o2cb add-cluster ctdbdemo
o2cb add-node --ip 192.168.122.10 --port --number 1 ctdbdemo ocfs2-1
o2cb add-node --ip 192.168.122.10 --port 7777 --number 1 ctdbdemo
ocfs2-1
o2cb add-node --ip 192.168.122.11 --port 7777 --number 2 ctdbdemo
ocfs2-2
service o2cb configure # This step will fail claiming that it can't
find /sbin/ocfs2_controld.cman
#
# However, it does the important stuff.
#
# NOTE, during the configuration steps you MUST SELECT cman AS THE
CLUSTER STACK!
#
8. Find and install the ocfs2-tools git repos
git clone git://oss.oracle.com/git/ocfs2-tools.git ocfs2-tools
# install stuff needed
yum install libaio libaio-devel
yum install pacemaker-libs-devel
# Now build
cd ocfs2-tools
./configure
make
# This will likely fail. If it first fails complaining about
CPPFLAGS='-I/usr/include/libxml2' ./configure
make
# It might complain again complaining about some AIS include files that
are no
# longer in the packages installed. That is OK. It should have built
ocfs2_controld.cman,
cp ocfs2_controld.cman /usr/sbin/
service pacemaker stop
service cman stop
service cman start
service o2cb start
service pacemaker start
mkfs.ocfs2 -L CTDBdemocommon --cluster-name ctdbdemo --cluster-stack
ocfs2 -N 4 /dev/sdb
9. Mount it on both and ensure that you can create files/dirs on one
node and see them on the other node.
10. Install ctdb and samba
11. Configure samba for the domain you want to join
# Make sure you have clustering = yes and the other things you need.
12. Configure ctdb (/etc/sysconfig/ctdb) and make sure that you disable
winbindd
13. Start ctdb on all nodes
# You must have ctdb started so that the secrets file will get
distributed
14. join the domain
15. Enable winbindd in the ctdb config
16. Restart ctdb on all nodes
At this point you should be done. The steps you need might vary.
I have limited time to help you with this.
OK, I have followed Richards 'howto' but using Debian 7.7 instead of
Centos,
I also used the standard Debian kernel. I have got up to step 9, after a
bit
of a battle and it all started so well. :-)
apt-get install openais corosync pacemaker cman ocfs2-tools-cman
ocfs2-tools-pacemaker
Unfortunately, it turned out that pacemaker is not built to use the cman
stack, so I had to rebuild it
Next problem, ccs and pcs are not available, so I had to download & build
them, though even this was not without problems, ccs put
'empty_cluster.conf' in the wrong place and pcs is hardwired to use
/usr/libexec
Next problem 'o2cb' appears to be called 'o2cb_ctl' on Debian.
Started cman, o2cb and pacemaker (first time round, this is where I found
that pacemaker wouldn't work with cman)
I then created the shared shared file system and mounted it on both nodes
OK, looks like you got the real stuff done.
Post by Rowland Penny
At this point I have a shared cluster, but in a way that I cannot see any
sane sysadmin using. Most of the software is heavily modified or not
available from the distros repos. I am going to have to stop and think
about
this and see if there is a Debian way of doing this, without modifying
anything or using anything that is not available from a repo.
OK, I am getting closer :-)
I have got it working with just packages available from Debian repos, apart
from 'ccs', once I find a replacement for this, I will move onto ctdb &
samba.
I have made some progress on getting ocfs2-tools to build under CentOS
6.6 as well. Still one (or more) build problems to resolve.
I have a clean build of the ocfs2-tools package on CentOS 6.6 with one
small change to ocfs2-mount in the cman-based branch.

I had to hack out an include of <asm/page.h>.

I will test that it works tomorrow ...
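
For anyone repeating this, the include can be dropped with sed rather than by hand. This is only a sketch: the thread does not say which source file carried the include, so the file is passed in as an argument, and drop_include is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: delete "#include <HEADER>" lines from a C source
# file, keeping a .bak copy of the original next to it.
drop_include() {
    file="$1"     # source file to patch (not named in the thread)
    header="$2"   # e.g. asm/page.h
    sed -i.bak "\|^[[:space:]]*#[[:space:]]*include[[:space:]]*<${header}>|d" "$file"
}

# Example (commented out; the path is left for the reader to fill in):
#   drop_include path/to/source.c asm/page.h
```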
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
steve
2014-12-09 06:50:33 UTC
Permalink
Post by Richard Sharpe
Post by steve
Post by Richard Sharpe
Post by steve
Post by Richard Sharpe
In the steps I used and documented, the biggest problem was with
getting the DLM running for OCFS2.
Depends how you go about it. On both Ubuntu and openSUSE, it's up out of
the
box.
And yet, people still seem to have problems getting it to work ...
https://lists.samba.org/archive/samba-technical/2014-December/104185.html
Your contributions to this list are close to being indistinguishable
from noise ...
We are trying to help you get your cluster working by pointing out
problems that you have overlooked.
Richard Sharpe
2014-12-09 17:46:06 UTC
Permalink
Post by Richard Sharpe
Post by steve
Post by Richard Sharpe
Post by steve
Post by Richard Sharpe
In the steps I used and documented, the biggest problem was with
getting the DLM running for OCFS2.
Depends how you go about it. On both Ubuntu and openSUSE, it's up out
of
the
box.
And yet, people still seem to have problems getting it to work ...
https://lists.samba.org/archive/samba-technical/2014-December/104185.html
Your contributions to this list are close to being indistinguishable
from noise ...
We are trying to help you get your cluster working by pointing out problems
that you have overlooked.
So, I already have two clusters working, one on OCFS2 and the other on
our file system.

Since I am helping to develop a product that must expand to three and
above node clusters, your little cluster methodology is about as
interesting as Mozart's Requiem is to a man who is dying.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
steve
2014-12-09 17:51:56 UTC
Permalink
Post by Richard Sharpe
Post by Richard Sharpe
Post by steve
Post by Richard Sharpe
Post by steve
Post by Richard Sharpe
In the steps I used and documented, the biggest problem was with
getting the DLM running for OCFS2.
Depends how you go about it. On both Ubuntu and openSUSE, it's up out
of
the
box.
And yet, people still seem to have problems getting it to work ...
https://lists.samba.org/archive/samba-technical/2014-December/104185.html
Your contributions to this list are close to being indistinguishable
from noise ...
We are trying to help you get your cluster working by pointing out problems
that you have overlooked.
So, I already have two clusters working, one on OCFS2 and the other on
our file system.
Since I am helping to develop a product that must expand to three and
above node clusters, your little cluster methodology is about as
interesting as Mozart's Requiem is to a man who is dying.
Please try not to get so upset. We are discussing a piece of software.
We have no cluster methodology. The problems which you have overlooked
are outlined in the post that we sent to you. They were written by the
ctdb developer. We recommend you read them before you go to production.
Thanks,
Steve
Michael Adam
2014-12-09 19:42:26 UTC
Permalink
Post by Richard Sharpe
So, I already have two clusters working, one on OCFS2 and the other on
our file system.
Since I am helping to develop a product that must expand to three and
above node clusters, your little cluster methodology is about as
interesting as Mozart's Requiem is to a man who is dying.
Please try not to get so upset. We are discussing a piece of software. We
have no cluster methodology. The problems which you have overlooked
Er, what problems has Richard overlooked?
Rather has he treated and solved problems that you
have ignored or overlooked, as it seems to me.
are outlined in the post that we sent to you.
Are you referring to the blog post you sent earlier?
This one:
http://linuxcostablanca.blogspot.com.es/2014/07/samba4-cluster-for-ad-drbd-ocfs2-ctdb.html

That one quite frankly does not seem to share a lot of insight or
valuable information for someone seeking to solve problems
within a cluster setup.
They were written by the ctdb developer.
Er... by whom?

Michael
steve
2014-12-09 20:01:53 UTC
Permalink
Post by Michael Adam
Post by Richard Sharpe
So, I already have two clusters working, one on OCFS2 and the other on
our file system.
Since I am helping to develop a product that must expand to three and
above node clusters, your little cluster methodology is about as
interesting as Mozart's Requiem is to a man who is dying.
Please try not to get so upset. We are discussing a piece of software. We
have no cluster methodology. The problems which you have overlooked
Er, what problems has Richard overlooked?
Rather has he treated and solved problems that you
have ignored or overlooked, as it seems to me.
are outlined in the post that we sent to you.
Are you referring to the blog post you sent earlier?
No.
Post by Michael Adam
http://linuxcostablanca.blogspot.com.es/2014/07/samba4-cluster-for-ad-drbd-ocfs2-ctdb.html
No. Please read the thread, not just the bits we choose to trim.
Post by Michael Adam
That one quite frankly does not seem to share a lot of insight or
valuable information for someone seeking to solve problems
within a cluster setup.
No obligation for you to follow it, although many do. I've no idea why
you posted it.
Post by Michael Adam
They were written by the ctdb developer.
Er... by whom?
Answer? Yep, you guessed it.
Post by Michael Adam
Michael
Michael Adam
2014-12-09 20:36:39 UTC
Permalink
Post by steve
Post by Michael Adam
Post by steve
are outlined in the post that we sent to you.
Are you referring to the blog post you sent earlier?
No. Please read the thread, not just the bits we choose to trim.
Ah, I was assuming that you were referring to a post of yours.
I think now I know what you mean. Were you referring to the
post about the ping_pong test by Martin Schwenke?
(The thread is way too long to find all the quotations easily...)
Post by steve
No obligation for you to follow it, although many do.
I've no idea why you posted it.
Right, that was superfluous - sorry for that!
But I thought you were referring to a post of yours,
and this came to my mind, because you _did_ post it here.
Post by steve
Post by Michael Adam
Post by steve
They were written by the ctdb developer.
Er... by whom?
Answer? Yep, you guessed it.
Well, if my guess above was correct, Martin is
indeed one of the ctdb developers. Some more of us
who have commented in this and related threads are
as well.

By the way, did you also read my reply to Martin's mail? :-)

Cheers - Michael
steve
2014-12-09 20:53:51 UTC
Permalink
Post by Michael Adam
Post by steve
No obligation for you to follow it, although many do.
I've no idea why you posted it.
Right, that was superfluous - sorry for that!
But I thought you were referring to a post of yours,
and this came to my mind, because you _did_ post it here.
It has been posted on many occasions. Our posts attract hundreds of
visits per day. There really is no need to apologise. If you don't like
it or you do not find it useful then no one really cares. Please
remember that we are simply talking about a piece of software, not about
who is buying the next round of drinks!
Saludos,
Steve
Rowland Penny
2014-12-11 16:47:24 UTC
Permalink
Post by Rowland Penny
On Sat, Dec 6, 2014 at 2:58 AM, Rowland Penny
Post by Rowland Penny
Post by Richard Sharpe
Hi folks,
Here are the steps I used, as far as I can remember them. Please
excuse any mistakes and be prepared to think for yourself when
following them.
1. Create two VirtualBox VMs with enough memory and disk for your
Linux Distro. I used CentOS 6.6 with 4GB and 20GB. (I actually
installed CentOS 6.3 and upgraded because I had the ISO handy.) You
will also need an extra interface on each VM for the clustering
private network. I set them to an internal type.
vboxmanage createhd --filename ~/VirtualBox\ VMs/SharedHD1 --size
10240 --variant Fixed --format VDI # Creates a 10GB fixed sized disk
vboxmanage modifyhd 22ae1fcc-fda7-4e42-be9f-3b8bd7fc0c0e --type
shareable # Make it shareable.
Note, for that second command use the UUID for your disk which you can
vboxmanage list hdds --brief
Also, use the GUI to add the shared disk to both VMs.
3. Install the OS on each of the VMs.
yum install openais corosync pacemaker-libs pacemaker-libs-devel gcc
corosync-devel openais-devel rpm-build e2fsprogs-devel libuuid-devel
git pygtk2 python-devel readline-devel clusterlib-devel redhat-lsb
sqlite-devel gnutls-devel byacc flex nss-devel
It is not clear to me that openais was needed, for example
5. Next I installed Oracles UEK and ocfs2-tools
wget http://public-yum.oracle.com/public-yum-ol6.repo /etc/yum.repos.d
wget http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol6 -O
/etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
yum install kernel-uek kernel-uek-devel
yum install ocfs2-tools
yum install openaislib-devel corosync-devel # It is not clear that I
needed to install the first
echo 'KERNEL=="ocfs2_control", NAME="misc/ocfs2_control", MODE="0660"'
/etc/udev/rules.d/99-ocfs2_control.rules
reboot # on each
6. Configure cman and pacemaker
# configure corosync first
cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
# Make sure that bindnetaddr is defined and points to your private
interface. I set it to 192.168.122.0
# Make sure that the mcastaddr is defined. I used 239.255.1.1
# make sure that the mcastport is defined. I used 5405
# Copy that file to the other node.
/etc/init.d/pacemaker stop # Stop these in case they were running
/etc/init.d/corosync stop # Same here
yum install ccs pcs
# Create a cluster
ccs -f /etc/cluster/cluster.conf --createcluster ctdbdemo
ccs -f /etc/cluster/cluster.conf --addnode ocfs2-1
ccs -f /etc/cluster/cluster.conf --addnode ocfs2-2
ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect ocfs2-1
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect ocfs2-2
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk ocfs2-1 pcmk-redirect port=ocfs2-1
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk ocfs2-2 pcmk-redirect port=ocfs2-2
chkconfig NetworkManager off
service NetworkManager stop
# now start the cluster
service cman start
pcs property set stonith-enabled=false
service pacemaker start
# Also start it on the other node(s).
Last updated: Thu Dec 4 09:40:16 2014
Last change: Tue Dec 2 10:12:50 2014
Stack: cman
Current DC: ocfs2-2 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
0 Resources configured
Online: [ ocfs2-1 ocfs2-2 ]
If you do not see all the other nodes online, then you have to debug
the problem.
http://clusterlabs.org/quickstart-redhat.html
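For reference, the three corosync settings mentioned in step 6 all sit in
the interface block of the totem section of corosync.conf. The sketch
below just prints such a block so it can be compared with the edited file.
The helper name is made up, the values are the ones quoted above, and it
assumes the corosync 1.x file layout shipped with CentOS 6.

```shell
#!/bin/sh
# Hypothetical helper: print the totem/interface block that step 6 edits.
# Arguments: bind network address, multicast address, multicast port.
corosync_totem_fragment() {
    bindnetaddr="$1"
    mcastaddr="$2"
    mcastport="$3"
    cat <<EOF
totem {
        version: 2
        interface {
                ringnumber: 0
                bindnetaddr: ${bindnetaddr}
                mcastaddr: ${mcastaddr}
                mcastport: ${mcastport}
        }
}
EOF
}

# Values quoted in the post; adjust them to your own private network.
corosync_totem_fragment 192.168.122.0 239.255.1.1 5405
```

Note that bindnetaddr is a network address, not the address of one host,
which is why the same file can be copied unchanged to the other node.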
7. Configure the Oracle cluster
o2cb add-cluster ctdbdemo
o2cb add-node --ip 192.168.122.10 --port 7777 --number 1 ctdbdemo ocfs2-1
o2cb add-node --ip 192.168.122.11 --port 7777 --number 2 ctdbdemo ocfs2-2
service o2cb configure # This step will fail claiming that it can't find /sbin/ocfs2_controld.cman
#
# However, it does the important stuff.
#
# NOTE: during the configuration steps you MUST SELECT cman AS THE CLUSTER STACK!
#
8. Find and install the ocfs2-tools git repos
git clone git://oss.oracle.com/git/ocfs2-tools.git ocfs2-tools
# install stuff needed
yum install libaio libaio-devel
yum install pacemaker-libs-devel
# Now build
cd ocfs2-tools
./configure
make
# This will likely fail. If it first fails complaining about the libxml2
# headers (which is what the flag below points at), rerun configure as:
CPPFLAGS='-I/usr/include/libxml2' ./configure
make
# It might fail again, complaining about some AIS include files that are
# no longer in the installed packages. That is OK. It should have built
# ocfs2_controld.cman, so copy it into place:
cp ocfs2_controld.cman /usr/sbin/
service pacemaker stop
service cman stop
service cman start
service o2cb start
service pacemaker start
mkfs.ocfs2 -L CTDBdemocommon --cluster-name ctdbdemo --cluster-stack ocfs2 -N 4 /dev/sdb
9. Mount it on both and ensure that you can create files/dirs on one
node and see them on the other node.
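One way to do the step 9 check is sketched below. It assumes the file
system has already been mounted on both nodes, for example with
"mount -t ocfs2 /dev/sdb /mnt/ctdb" (the mount point is an assumption,
and so are the two helper names).

```shell
#!/bin/sh
# Hypothetical helpers for the step 9 check. The mount point is passed in.

# Run on the first node: drop a marker file named after this host
# and print its path.
write_marker() {
    dir="$1"
    marker="$dir/marker-$(uname -n)"
    date > "$marker" && echo "$marker"
}

# Run on the second node: succeed only if some marker file is visible.
see_markers() {
    dir="$1"
    ls "$dir"/marker-* >/dev/null 2>&1
}
```

Run write_marker /mnt/ctdb on one node and see_markers /mnt/ctdb on the
other, then repeat in the opposite direction.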
10. Install ctdb and samba
11. Configure samba for the domain you want to join
# Make sure you have clustering = yes and the other things you need.
12. Configure ctdb (/etc/sysconfig/ctdb) and make sure that you disable
winbindd
13. Start ctdb on all nodes
# You must have ctdb started so that the secrets file will get distributed
14. Join the domain
15. Enable winbindd in the ctdb config
16. Restart ctdb on all nodes
At this point you should be done. The steps you need might vary.
I have limited time to help you with this.
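Steps 11 and 12 are terse, so here is a rough idea of what the two files
might contain. This is a sketch under assumptions, not the configuration
used in the post: the helper names, workgroup, realm, mount point and
recovery lock path are placeholders, and reading "disable winbindd" as
CTDB_MANAGES_WINBIND=no is an interpretation. The variable names are the
ones used by the ctdb sysconfig file of that era.

```shell
#!/bin/sh
# Hypothetical helpers that print example fragments for steps 11 and 12.

# smb.conf [global] settings; workgroup and realm are placeholders.
smb_conf_fragment() {
    workgroup="$1"
    realm="$2"
    cat <<EOF
[global]
        clustering = yes
        security = ads
        workgroup = ${workgroup}
        realm = ${realm}
EOF
}

# /etc/sysconfig/ctdb settings. The recovery lock has to live on the
# shared file system. Pass "no" for winbind before the domain join
# (step 12) and "yes" afterwards (step 15).
ctdb_sysconfig_fragment() {
    shared_mount="$1"
    manage_winbind="$2"
    cat <<EOF
CTDB_RECOVERY_LOCK=${shared_mount}/.ctdb/reclock
CTDB_NODES=/etc/ctdb/nodes
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
CTDB_MANAGES_SAMBA=yes
CTDB_MANAGES_WINBIND=${manage_winbind}
EOF
}

smb_conf_fragment EXAMPLE EXAMPLE.COM
ctdb_sysconfig_fragment /mnt/ctdb no
```

The same files are needed on every node.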
OK, I have followed Richard's 'howto', but using Debian 7.7 instead of
CentOS. I also used the standard Debian kernel. I have got up to step 9,
after a bit of a battle, and it all started so well. :-)
apt-get install openais corosync pacemaker cman ocfs2-tools-cman ocfs2-tools-pacemaker
Unfortunately, it turned out that pacemaker is not built to use the cman
stack, so I had to rebuild it.
Next problem: ccs and pcs are not available, so I had to download and
build them, though even this was not without problems: ccs put
'empty_cluster.conf' in the wrong place and pcs is hardwired to use
/usr/libexec.
Next problem: 'o2cb' appears to be called 'o2cb_ctl' on Debian.
Started cman, o2cb and pacemaker (first time round, this is where I found
that pacemaker wouldn't work with cman).
I then created the shared file system and mounted it on both nodes.
OK, looks like you got the real stuff done.
Post by Rowland Penny
At this point I have a shared cluster, but in a way that I cannot see any
sane sysadmin using. Most of the software is heavily modified or not
available from the distro's repos. I am going to have to stop and think
about this and see if there is a Debian way of doing this, without
modifying anything or using anything that is not available from a repo.
OK, I am getting closer :-)
I have got it working with just packages available from Debian repos,
apart from 'ccs', once I find a replacement for this, I will move
onto ctdb & samba.
Rowland
Indeed, I cannot imagine anyone using my approach in production either.
Having stuff that needs to be rebuilt is not a good idea, and it would be
useful to get a minimal complete set of RPMs together and fix ocfs2-tools
so that the correct things are there and the build works out what is
needed.
However, I only currently need the OCFS2 setup for testing purposes
while we get our file system supporting FCNTL locks and so I have a
reference to work with.
Not sure if I have the time to fix anything.
OK, I officially give up! :-D

I can get a two node cluster working on Debian 7.7 very easily, with no
compiling of software, but after this it goes downhill fast. Whatever I
do, I cannot get ctdb to work. I feel a lot of this is down to the very
small amount of documentation, and what there is seems to be biased
towards Red Hat.

Searching the internet brings up very little, and what I find is
contradictory. Quite a lot of what I found would say something like
'set up ctdb' but wouldn't explain how or why. Don't get me started on
'nodes': just what IP address should I use for each node? Do I really
need two network interfaces, and why?

So I have come to the conclusion that I personally cannot set up Debian,
ocfs2, corosync, ctdb and samba, because I cannot find any documentation
that is complete and easily understandable. It is no good only the people
writing ctdb knowing how to set it up; they need to look at the
documentation before they go much further.

Sorry if this upsets anybody, it is just how I feel at the moment.

Rowland
steve
2014-12-11 17:28:19 UTC
Permalink
Post by Rowland Penny
OK, I officially give up! :-D
Aw, c'mon Rowland. You're almost there! Just take corosync out of the
mix and use something else which _is_ documented instead;)
Post by Rowland Penny
I can get a two node cluster working on Debian 7.7 very easily, with no
compiling of software, but after this it goes downhill fast. Whatever I
do, I cannot get ctdb to work. I feel a lot of this is down to the very
small amount of documentation, and what there is seems to be biased
towards Red Hat.
Is Debian's smbd built with cluster support? Ubuntu's isn't.
Post by Rowland Penny
Searching the internet brings up very little, and what I find is
contradictory. Quite a lot of what I found would say something like
'set up ctdb' but wouldn't explain how or why. Don't get me started on
'nodes': just what IP address should I use for each node?
One address needs to be on your domain network, i.e. whatever subnet your
current filer uses. Connect that to the switch. The other, for the
internal traffic, is on a separate network. We use 192.168.1 for the
domain and 192.168.0 for the crossover.
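In CTDB terms, that split corresponds to two files: the nodes file lists
the private (crossover) address of every node, and the public_addresses
file lists the client-facing addresses that CTDB moves between nodes. A
minimal sketch follows. The helper name, host numbers, prefix length and
interface name are invented for illustration; only the two subnets come
from the text above.

```shell
#!/bin/sh
# Hypothetical helper: write the two CTDB address files into a directory.
# nodes            -> one private (crossover) address per line,
#                     identical on all nodes
# public_addresses -> "address/prefix interface" lines for the
#                     client-facing (domain) network
write_ctdb_address_files() {
    dir="$1"
    printf '%s\n' 192.168.0.10 192.168.0.11 > "$dir/nodes"
    printf '%s\n' \
        '192.168.1.80/24 eth0' \
        '192.168.1.81/24 eth0' > "$dir/public_addresses"
}
```

On a real system the target directory would normally be /etc/ctdb; trying
it against a scratch directory first makes the result easy to inspect.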
Post by Rowland Penny
Do I really need two network interfaces, and why?
No, but it's faster if you do and it saves having to route. Our advice
would be to keep the internal traffic away from the domain network if you
can. With only 1 cable, you can't
Post by Rowland Penny
So I have come to the conclusion that I personally cannot set up Debian,
ocfs2, corosync, ctdb and samba, because I cannot find any documentation
that is complete and easily understandable. It is no good only the people
writing ctdb knowing how to set it up; they need to look at the
documentation before they go much further.
+1
I see that there has been some activity of late, prompted by some
searching questioning techniques. This is great to see.
Post by Rowland Penny
Sorry if this upsets anybody, it is just how I feel at the moment.
Meanwhile, if you want a no nonsense, documented 2 node cluster that
just works, just see our blog.
HTH
Post by Rowland Penny
Rowland
Rowland Penny
2014-12-11 17:46:04 UTC
Permalink
Hi Steve, I know that I can set it up as you have done. I was trying to
follow Richard's setup, but using Debian instead of Red Hat. I just cannot
get it to continue working once I bring CTDB into the picture. And yes,
Samba on Debian does have cluster support.

There just doesn't seem to be full and easily readable documentation
available, the type of documentation that not only tells you what to do
but also why. The documentation that is available seems to be biased
towards Red Hat and expects a deeper knowledge of the subject than a
normal user would have.

So, as I said, I give up.

Rowland
Richard Sharpe
2014-12-11 18:00:13 UTC
Permalink
Post by Rowland Penny
OK, I officially give up! :-D
Aw, c'mon Rowland. You're almost there! Just take corosync out of the mix
and use something else which _is_ documented instead;)
Post by Rowland Penny
I can get a two node cluster working on Debian 7.7 very easily, no
compiling of software, but after this, it goes downhill fast. Whatever I
do I cannot get ctdb to work, I feel a lot of this is down to the very
small amount of documentation and what there is, seems to be biased
towards redhat.
Is Debian smbd built with cluster support? Ubuntu isn't.
Post by Rowland Penny
Searching the internet brings up very little and what I find is
contradictory, also quite a lot of what I found would say something
like, 'setup ctdb' but wouldn't explain how or why. Don't get me started
on 'nodes', just what ipaddress should I use for each node?
One addr. needs to be your domain; whatever subnet your current filer
uses. Connect that to the switch. The other, for the internal traffic is
another network. We use 192.168.1 for the domain and 192.168.0 for the
crossover.
do I really
Post by Rowland Penny
need two network interfaces and why?
No, but it's faster if you do and it saves having to route. Keep the
internal stuff away from the domain if you can would be our advice. With
only 1 cable, you can't
Post by Rowland Penny
So, I have come to the conclusion, I personally cannot setup Debian,
ocfs2, corosync, ctdb and samba, because I cannot find any documentation
that is complete and easily understandable. It is no good the people
writing ctdb knowing how to set it up, they need to look at the
documentation before they go much further.
+1
I see that there has been some activity of late, prompted by some
searching questioning techniques. This is great to see.
Post by Rowland Penny
Sorry if this upsets anybody, it's just how I feel at the moment.
Meanwhile, if you want a no nonsense, documented 2 node cluster that just
works, just see our blog.
HTH
Post by Rowland Penny
Rowland
Hi Steve, I know that I can set it as you have done, I was trying to follow
Richards setup, but using Debian instead of redhat, I just cannot get it to
continue working once I bring CTDB into the picture and yes, Samba on Debian
does have cluster support.
There just doesn't seem to be full and easily readable documentation
available, the type of documentation that not only tells you what to do, but
also why. The documentation that is available seems to be biased towards
redhat and expects a deeper knowledge of the subject than a normal user
would have.
So, as I said, I give up.
If I get some time over the Christmas break I might try to set that
up. At the moment I do not have the VMs available to try this because
my main machine already has 4 VMs running on it and I am not sure I
can afford the extra memory hit. Hmmm, maybe I can. I have 64GB ...
can you document for me the steps you took? I am not familiar with
Debian.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
ronnie sahlberg
2014-12-11 18:03:18 UTC
Permalink
So you can not even get ctdbd to start ?

I can help you setting it up.
Have you looked at http://ctdb.samba.org/configuring.html ?

You really only need to set three(two) arguments for a basic ctdb setup :

CTDB_NODES
CTDB_RECOVERY_LOCK
CTDB_PUBLIC_ADDRESSES

You can skip/remove CTDB_RECOVERY_LOCK if you do not have a cluster
filesystem that supports fcntl() locking.
So maybe leave that out for now until Richard's text on how to build
the lock manager is finished.
This just means that you do not have split-brain protection, but you
can skip that until you plan to start running in production.

CTDB_NODES is the file where you list the static ip address for every
node in the cluster.
Each node must have an identical version of this file.
On big clusters, the recommendation is usually to run this on a
separate private network that is dedicated for ctdb traffic.
This is just to make sure that latency and recovery bandwidth is as
good as possible for the ctdb traffic, by not having to compete with
the i/o from tens of thousands of windows clients.
But you don't NEED this to be a private network.
If you have a small cluster and only light traffic you can just use a
single network for everything and thus have these addresses and the public
addresses below all share the same interface/subnet.
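
As a rough illustration of the nodes file described above, here is a minimal
sketch for a two-node cluster. The two addresses are hypothetical (a private
192.168.0.0/24 subnet for inter-node traffic), and the file is written to a
scratch directory rather than to /etc/ctdb/nodes so nothing on the system is
changed.

```shell
#!/bin/sh
# Sketch only: the addresses are hypothetical. The file is written to a
# scratch directory; copy it to /etc/ctdb/nodes on every node when happy.
SCRATCH=$(mktemp -d)
NODES_FILE="$SCRATCH/nodes"

# One static address per line, one line per node.
cat > "$NODES_FILE" <<'EOF'
192.168.0.1
192.168.0.2
EOF

# Every node needs an identical copy, so print a checksum that can be
# compared across the nodes.
NODES_SUM=$(cksum < "$NODES_FILE")
echo "nodes checksum: $NODES_SUM"
```

Running the same `cksum` on each node's copy and comparing the output is one
simple way to confirm that the files really are identical.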


CTDB_PUBLIC_ADDRESSES
Is a file that contains all the public addresses for the cluster.
Each line contains one ip address/mask and associated interface.
For small clusters, this file will usually be identical on all nodes
in the cluster.
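
A public addresses file along those lines might look like the sketch below.
The addresses and the interface name eth0 are assumptions for illustration,
and check_public_addresses is a hypothetical helper (not part of ctdb) that
only understands the simple IPv4 "address/mask interface" form described
above.

```shell
#!/bin/sh
# Sketch only: addresses and interface name (eth0) are hypothetical.
# check_public_addresses is an illustrative helper, not a ctdb tool.
check_public_addresses() {
    # Succeeds only if every non-empty line looks like "a.b.c.d/len iface".
    awk '
        NF == 0 { next }
        NF != 2 || $1 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+$/ {
            printf "line %d looks wrong: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Written to a scratch file; the real file would be /etc/ctdb/public_addresses.
PUB_FILE=$(mktemp)
cat > "$PUB_FILE" <<'EOF'
192.168.1.101/24 eth0
192.168.1.102/24 eth0
EOF

check_public_addresses "$PUB_FILE" && echo "public_addresses looks sane"
```

The check is deliberately loose: it catches a missing mask or interface
column but does not validate that the addresses are routable.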


And that should really be all there is to it to at least get ctdb working.

If you configure this and start ctdb on all nodes, what happens and
what does 'ctdb status' print ?


Also see
man ctdb
and
man ctdbd

The two manpages contain a lot of good information.

regards
ronnie sahlberg
ronnie sahlberg
2014-12-11 18:32:31 UTC
Permalink
I just tried building a single-node "cluster" on debian with ctdb.
I can check building a 4 node cluster next week when I am home from my travels.

To get ctdb running on ubuntu 14.10, as root:

1, Install the ctdb package:
apt-get install ctdb

2, create a missing directory
mkdir -p /var/lib/run/ctdb

3, remove the reclock file
vi /etc/default/ctdb
and comment out CTDB_RECOVERY_LOCK

4, create a nodes file
vi /etc/ctdb/nodes
and add the line 127.0.0.1

5, create a public addresses file
vi /etc/ctdb/public_addresses
and add the two lines
127.0.0.2/8 lo
127.0.0.3/8 lo

6, start ctdb
service ctdb start


then check everything looks fine with 'ctdb status' and 'tail
/var/log/ctdb/log.ctdb'
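
If you want to script that check, the following is a minimal sketch. It
assumes that the status output contains one line per node of the form
"pnn:0 127.0.0.1 OK (THIS NODE)", with the node state in the third field;
count_not_ok is a hypothetical helper name, not a ctdb command.

```shell
#!/bin/sh
# Sketch only: assumes the status output has one line per node such as
#   pnn:0 127.0.0.1 OK (THIS NODE)
# count_not_ok is an illustrative helper, not a ctdb command.
count_not_ok() {
    # Reads status output on stdin; prints how many nodes are not OK.
    awk '/^pnn:/ && $3 != "OK" { n++ } END { print n + 0 }'
}
```

Usage would be along the lines of `ctdb status | count_not_ok`, where a
result of 0 suggests every listed node is healthy. The log file is still the
place to look when the count is non-zero.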


That will not really create a very interesting cluster, just one node,
two public addresses and all on loopback.
But this should at least verify that ctdbd will start and run.
Then you can just shut it down and edit
/etc/ctdb/nodes|public_addresses and make them more interesting.

I personally suggest never running anything smaller than 4 node
clusters for real data.


Please see
man ctdb
man ctdbd
less /etc/default/ctdb
http://ctdb.samba.org/configuring.html

These should contain most of what you need to get started with ctdb.


regards
ronnie sahlberg.


Rowland Penny
2014-12-12 10:36:08 UTC
Permalink
Post by ronnie sahlberg
I just tried building a single-node "cluster" on debian with ctdb.
Why a single node ????
Post by ronnie sahlberg
I can check building a 4 node cluster next week when I am home from my travels.
Try it with two nodes
Hang on, you said 'debian' above
Post by ronnie sahlberg
apt-get install ctdb
2, create a missing directory
mkdir -p /var/lib/run/ctdb
Why is there a missing directory, sounds like a bug to me.
Post by ronnie sahlberg
3, remove the reclock file
vi /etc/default/ctdb
and comment out CTDB_RECOVERY_LOCK
But I want the lock.
Post by ronnie sahlberg
4, create a nodes file
vi /etc/ctdb/nodes
and add the line 127.0.0.1
Yes, but why '127.0.0.1' ???
Post by ronnie sahlberg
5, create a public addresses file
vi /etc/ctdb/public_addresses
and add the two lines
127.0.0.2/8 lo
127.0.0.3/8 lo
Do you have to create these ipaddresses, if so where and how
Post by ronnie sahlberg
6, start ctdb
service ctdb start
That is the first part I really understood.
Post by ronnie sahlberg
then check everything looks fine with 'ctdb status' and 'tail
/var/log/ctdb/log.ctdb'
That will not really create a very interesting cluster, just one node,
two public addresses and all on loopback.
But this should at least verify that ctdbd will start and run.
Then you can just shut it down and edit
/etc/ctdb/nodes|public_addresses and make them more interesting.
Again, why just one node. ??
Post by ronnie sahlberg
I personally suggest never running anything smaller than 4 node
clusters for real data.
Yes, but I am testing, so where is the documentation for people like me,
who just want to get a couple of nodes up and running ???

Rowland
Post by ronnie sahlberg
Please see
man ctdb
man ctdbd
less /etc/default/ctdb
http://ctdb.samba.org/configuring.html
it should contain most to get started with ctdb.
regards
ronnie sahlberg.
steve
2014-12-12 11:08:09 UTC
Permalink
Post by Rowland Penny
Yes, but I am testing, so where is the documentation for people like me,
who just want to get a couple of nodes up and running ???
There isn't anything even close to what you are asking for. Maybe we
should take this to the samba list? It's one thing sitting with a laptop
and making ctdb startup on its own, something very different making it
work with samba, debian, wires, cables, network cards, switches and a
real domain. It is also very different setting up a cluster on vms. I
suppose this is where e.g. SUSE-HA comes into its own. The devs have
real hardware to test upon, time to write documentation and support it.
They'll come over and set it up for you and you just call if it doesn't
work. We can't afford that though, devs who do it voluntarily or as a
hobby can't ('can't be expected to', careful :-Ed.) write documentation
at user level. But things are moving again with ctdb.
ronnie sahlberg
2014-12-12 14:13:37 UTC
Permalink
Can you please just go away.

We are trying to help here and your content-free spam is not helping.
ronnie sahlberg
2014-12-12 14:21:57 UTC
Permalink
Post by Rowland Penny
Post by ronnie sahlberg
I just tried building a single-node "cluster" on debian with ctdb.
Why a single node ????
I was under the impression you could not get ctdbd to run at all, so
as an example to troubleshoot and test
keep it as simple as possible.

A single node cluster running on loopback is as simple as it gets.
Once/if you can get that trivial cluster to run ctdbd successfully it
is trivial to change the configuration files to make it a multi node
cluster.
Post by Rowland Penny
Post by ronnie sahlberg
I can check building a 4 node cluster next week when I am home from my
travels.
Try it with two nodes
Hang on, you said 'debian' above
Post by ronnie sahlberg
apt-get install ctdb
2, create a missing directory
mkdir -p /var/lib/run/ctdb
Why is there a missing directory, sounds like a bug to me.
Yep, it definitely is.
There is something wrong with the deb package that at least Ubuntu
14.10 ships, since it does not create the required directory
structure.
Someone should open a bug with Debian and/or Ubuntu so that the people who
package this can fix it.
Post by Rowland Penny
Post by ronnie sahlberg
3, remove the reclock file
vi /etc/default/ctdb
and comment out CTDB_RECOVERY_LOCK
But I want the lock.
Then you need an fcntl()-capable distributed lock manager.
I think Richard's earlier emails showed what to do. It is a lot of manual
work, apparently, since the Debian packages did not ship with a
functioning lock manager
and he had to tweak the sources they ship and recompile locally.
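
For the single-node test, step 3 (commenting out CTDB_RECOVERY_LOCK) could be scripted roughly as follows. This is only a sketch: the helper name disable_reclock is made up for illustration and is not part of ctdb, and it assumes the setting appears as an uncommented CTDB_RECOVERY_LOCK=... line in the file.

```shell
#!/bin/sh
# Sketch: comment out CTDB_RECOVERY_LOCK in a ctdb defaults file.
# disable_reclock is a hypothetical helper name, not part of ctdb.
disable_reclock() {
    conf="$1"
    tmp="$(mktemp)" || return 1
    # Prefix any uncommented CTDB_RECOVERY_LOCK= line with '#';
    # every other line is copied through unchanged.
    sed 's/^[[:space:]]*CTDB_RECOVERY_LOCK=/#&/' "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"    # overwrite in place, keeping owner and mode
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

On the Debian layout discussed here it would be run as root as `disable_reclock /etc/default/ctdb`, ideally after trying it on a copy of the file first.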
Post by Rowland Penny
Post by ronnie sahlberg
4, create a nodes file
vi /etc/ctdb/nodes
and add the line 127.0.0.1
Yes, but why '127.0.0.1' ???
It is the simplest possible cluster.
You wanted a test to see if you could configure / run ctdbd at all,
so let's do it using the simplest possible test.
Post by Rowland Penny
Post by ronnie sahlberg
5, create a public addresses file
vi /etc/ctdb/public_addresses
and add the two lines
127.0.0.2/8 lo
127.0.0.3/8 lo
Do you have to create these ipaddresses, if so where and how
No, you should not/need not create them on the system.
Ctdbd will create and assign these addresses automatically and
dynamically while the cluster is running.
Post by Rowland Penny
Post by ronnie sahlberg
6, start ctdb
service ctdb start
That is the first part I really understood.
Post by ronnie sahlberg
then check everything looks fine with 'ctdb status' and 'tail
/var/log/ctdb/log.ctdb'
That will not really create a very interesting cluster, just one node,
two public addresses and all on loopback.
But this should at least verify that ctdbd will start and run.
Then you can just shut it down and edit
/etc/ctdb/nodes|public_addresses and make them more interesting.
Again, why just one node. ??
Simplest possible cluster to see that you can get ctdbd running.
Post by Rowland Penny
Post by ronnie sahlberg
I personally suggest never running anything smaller than 4 node
clusters for real data.
Yes, but I am testing, so where is the documentation for people like me, who
just want to get a couple of nodes up and running ???
/etc/sysconfig/ctdb on RPM based systems.
/etc/default/ctdb on DEB based systems

man ctdb
man ctdbd
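
The nodes and public_addresses files from steps 4 and 5 can be sketched as a small script. It writes into a staging directory rather than /etc, so the result can be inspected before anything is copied into place; stage_single_node is a hypothetical helper name, and the addresses are the loopback ones given in the steps above.

```shell
#!/bin/sh
# Sketch: write the two files from steps 4 and 5 into a staging directory.
# stage_single_node is a hypothetical helper name; nothing here touches /etc.
stage_single_node() {
    stage="$1"
    mkdir -p "$stage/etc/ctdb" || return 1
    # Step 4: nodes file, one address per line (a single loopback node).
    printf '%s\n' '127.0.0.1' > "$stage/etc/ctdb/nodes"
    # Step 5: public addresses, one "address/mask interface" pair per line.
    printf '%s\n' '127.0.0.2/8 lo' '127.0.0.3/8 lo' \
        > "$stage/etc/ctdb/public_addresses"
}
```

For example, `stage_single_node /tmp/ctdb-test` leaves both files under /tmp/ctdb-test/etc/ctdb, from where they can be copied to /etc/ctdb as root.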
Post by Rowland Penny
Rowland
Post by ronnie sahlberg
Please see
man ctdb
man ctdbd
less /etc/default/ctdb
http://ctdb.samba.org/configuring.html
it should contain most to get started with ctdb.
regards
ronnie sahlberg.
steve
2014-12-12 15:34:17 UTC
Permalink
Post by ronnie sahlberg
Can you please just go away.
We are trying to help here and your content-free spam is not helping.
It's a sight more positive than your single node cluster. Which has
helped no one LOL!
Michael Adam
2014-12-12 20:52:58 UTC
Permalink
Hi Rowland,

I am really sorry that you are frustrated.
Please keep on!

We do try to document stuff, but clusters with
ctdb is still an area that is not done by
the average admin. And there are not so many
people working on the code, so as usual,
documentation sadly lags behind.

Someone like you, who with the help of us devs
finally gets to a working system, might help
us in the end to improve the docs! :-)

Clustering also relies on a whole additional
area which is the corresponding cluster file
systems used underneath, which makes it difficult
to have comprehensive documentation for a
concrete setup.

That being said, there is documentation out
there, but not necessarily terribly up to date,
and not precisely what you want.

On the samba wiki, there is the generic but
somewhat superficial

https://wiki.samba.org/index.php/CTDB_Setup

and as a specific example the GFS/CTDB/Samba howto:

https://wiki.samba.org/index.php/GFS_CTDB_HowTo

which is of course redhat-cluster/gfs specific as
far as the file system is concerned.

The manual pages of ctdb are also pretty good by now!

It is true that Debian is especially rare in
these clustered Samba docs. This is due to
the fact that for a long time, there was no
big focus on cluster file systems in Debian.

A couple of years ago, I gave a couple
of talks and courses about clustered Samba
partly also using Debian and Gluster, but also
RedHat and GFS. My 2010 notes

http://www.samba.org/~obnox/presentations/sambaXP-2010/sambaxp-2010-tutorial-ctdb-handout.pdf

even contain some notes on what is different
and needs to be considered on a Debian system.
(The main difference is that /etc/default is
used instead of /etc/sysconfig by most packages.)

At that time, OCFS2 was not yet ready for
CTDB/Samba, and GlusterFS was mainly available
in Debian, but did not perform very well.
Now Gluster has been taken on by RedHat, and
I think we can expect updates from that corner in the
nearer future.

Here is a paper explaining some of the
fundamentals of CTDB and the basic configuration
of CTDB and clustered Samba on top.

http://www.samba.org/~obnox/presentations/sambaXP-2009/samba-and-ctdb.pdf

And a pretty similar article from Linux Magazine:

http://www.linux-magazine.com/Issues/2009/105/Samba-for-Clusters

and an accompanying one on registry configuration:

http://www.linux-magazine.com/Issues/2009/105/Samba-s-Registry

Some of the details need to be checked because the
info is 4-5 years old, but the basics are still
correct.

As a last but possibly important remark (for those
who have not fallen asleep by now ;-) let me
mention that as far as I know, Stefan Kania
(whom I have copied in this reply, since I am not
certain that he is subscribed to the list) is currently
working on a reproducible setup with Debian/Gluster/CTDB/Samba.
Maybe he can give some insight on the specialities of
a CTDB setup on current Debian.


I apologize for having pointed mainly to my own
writings, but at the time I put
considerable effort into creating some documentation
and explanations, so I hope this is of some use. :-)

Cheers - Michael

PS: Also, you are asking why Ronnie suggests a single node
cluster.
Because it is always good to start out with a simple case.
And once you have mastered that, move on to more complicated things.
And you can even see most config points in that 1-node cluster.
Rowland Penny
2014-12-12 21:11:53 UTC
Permalink
Post by Michael Adam
Hi Rowland,
I am really sorry that you are frustrated.
Please keep on!
Frustrated might just be an understatement :-D
Post by Michael Adam
We do try to document stuff, but clusters with
ctdb is still an area that is not done by
the average admin. And there are not so many
people working on the code, so as usual,
documentation sadly lags behind.
Someone like you, who with the help of us devs
finally gets to a working system, might help
us in the end to improve the docs! :-)
Er, that's why I am trying to get it to work, so that I can then update the Samba wiki.
Post by Michael Adam
Clustering also relies on a whole additional
area which is the corresponding cluster file
systems used underneath, which makes it difficult
to have comprehensive documentation for a
concrete setup.
That being said, there is documentation out
there, but not necessarily terribly up to date,
and not precisely what you want.
Yes, I think I have read most of it :-)
Post by Michael Adam
On the samba wiki, there is the generic but
somewhat superficial
https://wiki.samba.org/index.php/CTDB_Setup
https://wiki.samba.org/index.php/GFS_CTDB_HowTo
which is of course redhat-cluster/gfs specific as
far as the file system is concerned.
The manual pages of ctdb are also pretty good by now!
It is true that Debian is especially rare in
these clustered Samba docs. This is due to
the fact that for a long time, there was no
big focus on cluster file systems in Debian.
Setting up the cluster on Debian seems to be easier than on redhat, I
don't have to compile anything. The problems only start when I try to
set up CTDB.
Post by Michael Adam
A couple of years ago, I gave a couple
of talks and courses about clustered Samba
partly also using Debian and Gluster, but also
RedHat and GFS. My 2010 notes
http://www.samba.org/~obnox/presentations/sambaXP-2010/sambaxp-2010-tutorial-ctdb-handout.pdf
even contain some notes on what is different
and needs to be considered on a Debian system.
(The main difference is that /etc/default is
used instead of /etc/sysconfig by most packages.)
At that time, OCFS2 was not yet ready for
CTDB/Samba, and GlusterFS was mainly available
in Debian, but did not perform very well.
Now Gluster has been taken on by RedHat, and
I think we can expect updates from that corner in the
nearer future.
Here is a paper explaining some of the
fundamentals of CTDB and the basic configuration
of CTDB and clustered Samba on top.
http://www.samba.org/~obnox/presentations/sambaXP-2009/samba-and-ctdb.pdf
http://www.linux-magazine.com/Issues/2009/105/Samba-for-Clusters
http://www.linux-magazine.com/Issues/2009/105/Samba-s-Registry
Some of the details need to be checked because the
info is 4-5 years old, but the basics are still
correct.
As a last but possibly important remark (for those
who have not fallen asleep by now ;-) let me
mention that as far as I know, Stefan Kania
(whom I have copied in this reply, since I am not
certain that he is subscribed to the list) is currently
working on a reproducible setup with Debian/Gluster/CTDB/Samba.
Maybe he can give some insight on the specialities of
a CTDB setup on current Debian.
If he could chime in, it would be very welcome.
Post by Michael Adam
I apologize for having pointed mainly to my own
writings, but at the time I put
considerable effort into creating some documentation
and explanations, so I hope this is of some use. :-)
Cheers - Michael
PS: Also, you are asking why Ronnie suggests a single node
cluster.
Because it is always good to start out with a simple case.
And once you have mastered that, move on to more complicated things.
And you can even see most config points in that 1-node cluster.
OK. I can take a hint, I will start again with one node :-)

Rowland
Michael Adam
2014-12-12 21:47:24 UTC
Permalink
Post by Rowland Penny
Post by Michael Adam
I am really sorry that you are frustrated.
Please keep on!
Frustrated might just be an understatement :-D
I think that might also partly stem from the fact that
this kind of troubleshooting/support is extremely difficult
to do via email.

Interactive support via IRC / chat / voice / shared screen/tmux
session is much better for stuff like that.
Post by Rowland Penny
Post by Michael Adam
We do try to document stuff, but clusters with
ctdb is still an area that is not done by
the average admin. And there are not so many
people working on the code, so as usual,
documentation sadly lags behind.
Someone like you, who with the help of us devs
finally gets to a working system, might help
us in the end to improve the docs! :-)
Er, that's why I am trying to get it to work, so that I can then update the Samba wiki.
Cool! Keep it up! :-)
Post by Rowland Penny
Post by Michael Adam
That being said, there is documentation out
there, but not necessarily terribly up to date,
and not precisely what you want.
Yes, I think I have read most of it :-)
Ok... so far so good.
Post by Rowland Penny
Setting up the cluster on Debian seems to be easier than on redhat, I don't
have to compile anything.
Well, that's not the case any more.
And there are the RPMs by SerNet if you don't
want to go with the distro versions.
Post by Rowland Penny
The problems only start when I try to setup CTDB.
Ok, I suggest you move over to IRC
and try to grab one of us in the
#ctdb (and/or #samba-technical) channel
and do stuff interactively.

Cheers - Michael (obnox)
Martin Schwenke
2014-12-13 09:02:07 UTC
Permalink
[+CC: Mathieu]

On Fri, 12 Dec 2014 09:21:57 -0500, ronnie sahlberg
Post by ronnie sahlberg
Post by Rowland Penny
Post by ronnie sahlberg
2, create a missing directory
mkdir -p /var/lib/run/ctdb
Why is there a missing directory, sounds like a bug to me.
Yepp it definitely is.
There is something wrong with the deb package that at least ubuntu
14.10 ships with since it does not create required directory
structure.
Should open a bug with debian and/or ubuntu so that their folks that
package this can fix the bug.
The main issue here is not a missing directory. It is this in
debian/rules:

    conf_args = \
        --localstatedir=/var/lib \
        --with-socketpath=/var/run/ctdb/ctdbd.socket \
        --with-logdir=/var/log/ctdb \
        --enable-pmda

I don't understand why --localstatedir isn't /var. :-(
If /var is used then the missing directory problem goes away
because /var/run/ctdb/ is used as expected.
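
To make the effect of that setting concrete, here is a tiny sketch. It rests on an assumption inferred from this thread rather than read from the ctdb build files: that the run directory is simply the configured localstatedir with run/ctdb appended. ctdb_rundir is a made-up helper name.

```shell
#!/bin/sh
# Sketch: where the run directory ends up for a given --localstatedir.
# Assumption (inferred from this thread, not from the ctdb build files):
# the directory is <localstatedir>/run/ctdb.
ctdb_rundir() {
    # Strip one trailing slash so "/var/" and "/var" give the same result.
    printf '%s/run/ctdb\n' "${1%/}"
}

ctdb_rundir /var/lib   # the Debian setting quoted above
ctdb_rundir /var       # the setting Martin expected
```

Under that assumption the first call gives the directory Ronnie had to create by hand, and the second gives the usual /var/run/ctdb.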

Similarly, I couldn't find the log file until I re-read Ronnie's email,
and the reason is also above. Sure, the log can go into /var/log/ctdb/
but then things like /usr/sbin/ctdbd_wrapper need to be edited
correspondingly. Right now ctdbd_wrapper logs a message at startup
saying that syslog isn't being used and that logs can be found
in /var/log/log.ctdb... but that's not where they are.

I've reported the issue of --localstatedir=/var/lib resulting in a
missing directory as a grave bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=773016

Hopefully that means it can be fixed for jessie.

I'll leave it to someone else to report the log location issue.

peace & happiness,
martin
Rowland Penny
2014-12-13 12:17:11 UTC
Permalink
Hi Michael, hi Rowland,
Post by Michael Adam
Hi Rowland,
I am really sorry that you are frustrated. Please keep on!
We do try to document stuff, but clusters with ctdb is still an
area that is not done by the average admin. And there are not so
many people working on the code, so as usual, documentation sadly
lags behind.
I'm in the process of writing documentation for ADMINS ;-). As soon as I
have finished the German version (that will be once everything is
working) I will translate it. Rowland, if you want, I can send you my
German documentation; at least you can get all the
commands and config-file information out of it.
I am quite willing to read any documentation, so yes please.

Rowland
Post by Michael Adam
Someone like you, who with the help of us devs finally gets to a
working system, might help us in the end to improve the docs! :-)
That's what I want.
Post by Michael Adam
Clustering also relies on a whole additional area which is the
corresponding cluster file systems used underneath, which makes it
difficult to have comprehensive documentation for a concrete
setup.
That's what I'm fighting with: no documentation, or only
developers' Klingon ;-)
Post by Michael Adam
That being said, there is documentation out there, but not
necessarily terribly up to date, and not precisely what you want.
On the samba wiki, there is the generic but somewhat superficial
https://wiki.samba.org/index.php/CTDB_Setup
https://wiki.samba.org/index.php/GFS_CTDB_HowTo
which is of course redhat-cluster/gfs specific as far as the file
system is concerned.
The manual pages of ctdb are also pretty good by now!
That's right, BUT the wiki does not mention how to get the cluster into a
samba4 domain or how permissions work on a cluster FS.
Post by Michael Adam
It is true that Debian is especially rare in these clustered Samba
docs. This is due to the fact that for a long time, there was no
big focus on cluster file systems in Debian.
A couple of years ago, I gave a couple of talks and courses
about clustered Samba partly also using Debian and Gluster, but
also RedHat and GFS. My 2010 notes
http://www.samba.org/~obnox/presentations/sambaXP-2010/sambaxp-2010-tutorial-ctdb-handout.pdf
even contain some notes on what is different and needs to be
considered on a Debian system. (The main difference is that
/etc/default is used instead of /etc/sysconfig by most packages.)
At that time, OCFS2 was not yet ready for CTDB/Samba, and GlusterFS
was mainly available in Debian, but did not perform very well. Now
Gluster has been taken on by RedHat, and I think we can expect
updates from that corner in the nearer future.
Here is a paper explaining some of the fundamentals of CTDB and the
basic configuration of CTDB and clustered Samba on top.
http://www.samba.org/~obnox/presentations/sambaXP-2009/samba-and-ctdb.pdf
http://www.linux-magazine.com/Issues/2009/105/Samba-for-Clusters
http://www.linux-magazine.com/Issues/2009/105/Samba-s-Registry
Some of the details need to be checked because the info is 4-5
years old, but the basics are still correct.
As a last but possibly important remark (for those who have not
fallen asleep by now ;-) let me mention that as far as I know,
Stefan Kania (whom I have copied in this reply, since I am not
certain that he is subscribed to the list) is currently working on
a reproducible setup with Debian/Gluster/CTDB/Samba. Maybe he can
give some insight on the specialities of a CTDB setup on current
Debian.
I'm reading the samba mailing list. Is there another one?
Post by Michael Adam
I apologize for having pointed mainly to my own writings, but at
the time I put considerable effort into creating some
documentation and explanations, so I hope this is of some use. :-)
Regards
Stefan
Post by Michael Adam
Cheers - Michael
PS: Also, you are asking why Ronnie suggests a single node
cluster. Because it is always good to start out with a simple
case. And once you have mastered that, move on to more complicated
things. And you can even see most config points in that 1-node
cluster.
Rowland Penny
2014-12-13 12:31:53 UTC
Permalink
OK, I now have a single node up and running as per the instructions
provided by Ronnie. I just have a few questions:

there is this in the ctdb log

2014/12/13 11:52:43.522708 [ 5740]: Set runstate to INIT (1)
2014/12/13 11:52:43.540992 [ 5740]: 00.ctdb: awk: line 2: function gensub never defined
2014/12/13 11:52:43.543178 [ 5740]: 00.ctdb: awk: line 2: function gensub never defined
2014/12/13 11:52:43.545354 [ 5740]: 00.ctdb: awk: line 2: function gensub never defined

2014/12/13 11:52:56.931393 [recoverd: 5887]: We are still serving a public IP '127.0.0.3' that we should not be serving. Removing it
2014/12/13 11:52:56.931536 [ 5740]: Could not find which interface the ip address is hosted on. can not release it
2014/12/13 11:52:56.931648 [recoverd: 5887]: We are still serving a public IP '127.0.0.2' that we should not be serving. Removing it

The above three lines are there 4 times

the final 4 lines are:

2014/12/13 11:53:02.982441 [ 5740]: monitor event OK - node re-enabled
2014/12/13 11:53:02.982480 [ 5740]: Node became HEALTHY. Ask recovery master 0 to perform ip reallocation
2014/12/13 11:53:02.982733 [recoverd: 5887]: Node 0 has changed flags - now 0x0 was 0x2
2014/12/13 11:53:02.983266 [recoverd: 5887]: Takeover run starting
2014/12/13 11:53:03.046859 [recoverd: 5887]: Takeover run completed successfully

ctdb status shows:

Number of nodes:1
pnn:0 127.0.0.1 OK (THIS NODE)
Generation:740799152
Size:1
hash:0 lmaster:0
Recovery mode:NORMAL (0)
Recovery master:0
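
A status check like the one above can be automated with a few lines of shell. The following is a rough sketch, not a ctdb tool: summarise_status is a hypothetical helper that reads 'ctdb status' output on stdin, looks only at the "pnn:" lines in the format shown above, and treats any state other than OK as unhealthy.

```shell
#!/bin/sh
# Sketch: summarise "ctdb status" output read from stdin.
# summarise_status is a hypothetical helper name. It only looks at the
# "pnn:" lines and treats any state other than OK as unhealthy.
summarise_status() {
    awk '
        /^pnn:/ { total++; if ($3 != "OK") bad++ }
        END {
            printf "nodes=%d unhealthy=%d\n", total, bad
            # Exit 0 only if at least one node was seen and all are OK.
            exit !(total > 0 && bad == 0)
        }
    '
}
```

Typical use would be `ctdb status | summarise_status`, with the exit code usable in a monitoring script. It deliberately avoids gensub, so it does not depend on which awk is installed.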

So, how do I get rid of the 'awk' error lines? And why should the public
IPs not be served?

Now I know it works, I just have to pull it all together.

Rowland
Michael Adam
2014-12-13 13:55:31 UTC
Permalink
Post by Rowland Penny
OK, I now have a single node up and running as per the
instructions provided by Ronnie.
Yay!
Post by Rowland Penny
there is this in the ctdb log
2014/12/13 11:52:43.522708 [ 5740]: Set runstate to INIT (1)
2014/12/13 11:52:43.540992 [ 5740]: 00.ctdb: awk: line 2: function gensub never defined
2014/12/13 11:52:43.543178 [ 5740]: 00.ctdb: awk: line 2: function gensub never defined
2014/12/13 11:52:43.545354 [ 5740]: 00.ctdb: awk: line 2: function gensub never defined
In Debian, there are several packages
that provide awk, at least mawk, gawk and original-awk.
Which one is used when several are installed depends on
the alternatives mechanism:
update-alternatives --display awk
update-alternatives --config awk

A quick web search has revealed that gensub is a gawk
(GNU awk) extension, so the other variants may well
lack it.

Maybe we should change "awk" to "gawk" in our scripts,
and packages would need to adapt their dependencies.
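
Whether the awk on a given machine has gensub can be probed directly. This is a minimal sketch with a made-up helper name (awk_has_gensub); it simply runs a one-line gensub call and looks at the exit status, so an awk without the function fails the probe.

```shell
#!/bin/sh
# Sketch: probe whether an awk binary implements gensub().
# awk_has_gensub is a hypothetical helper name.
awk_has_gensub() {
    # $1: awk binary to probe; defaults to whatever "awk" resolves to.
    "${1:-awk}" 'BEGIN {
        if (gensub(/a/, "b", "g", "aaa") == "bbb") exit 0
        exit 1
    }' >/dev/null 2>&1
}

if awk_has_gensub; then
    echo "awk provides gensub"
else
    echo "awk lacks gensub; on Debian, installing gawk is one option"
fi
```

A specific binary can be tested by passing it as the argument, for example `awk_has_gensub mawk`.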
Post by Rowland Penny
2014/12/13 11:52:56.931393 [recoverd: 5887]: We are still serving a public IP '127.0.0.3' that we should not be serving. Removing it
2014/12/13 11:52:56.931536 [ 5740]: Could not find which interface the ip address is hosted on. can not release it
2014/12/13 11:52:56.931648 [recoverd: 5887]: We are still serving a public IP '127.0.0.2' that we should not be serving. Removing it
The above three lines are there 4 times
I guess this will not be the case any more when you
move to a more realistic setup where you don't use loopback
for hosting the nodes' internal and public addresses, but
for a start that is OK.
Post by Rowland Penny
2014/12/13 11:53:02.982441 [ 5740]: monitor event OK - node re-enabled
2014/12/13 11:53:02.982480 [ 5740]: Node became HEALTHY. Ask recovery master 0 to perform ip reallocation
2014/12/13 11:53:02.982733 [recoverd: 5887]: Node 0 has changed flags - now 0x0 was 0x2
2014/12/13 11:53:02.983266 [recoverd: 5887]: Takeover run starting
2014/12/13 11:53:03.046859 [recoverd: 5887]: Takeover run completed successfully
Number of nodes:1
pnn:0 127.0.0.1 OK (THIS NODE)
Generation:740799152
Size:1
hash:0 lmaster:0
Recovery mode:NORMAL (0)
Recovery master:0
Great!
Post by Rowland Penny
Now I know it works, I just have to pull it all together.
Right. Next step: take a "real" ethernet interface
and use it for the node address first. You can even
start here with a single node.

You can also move towards more realistic clusters in two
steps: first no public addresses, only the nodes file.
That is the core of a ctdb cluster. Then you can move
towards cluster resource management and add public
addresses and also CTDB_MANAGES_SAMBA and friends.

One further note:
Virtual machines or even containers (lxc or docker) are
awesome for setting up such clusters for learning and
testing. I use them for development myself.

And here is one (imho) very neat trick:
If you use lxc containers (docker can probably do
this too), you can take the complexity
of setting up a cluster file system out of
the equation completely: you can just bind mount a directory
of the host file system into the node containers'
root file systems via the lxc fstab file.
That gives you a POSIX file system that is shared
between the nodes, and you can use it as the cluster FS.

This way, you can concentrate on ctdb and samba immediately
until you are comfortable with that.
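
To make the bind-mount trick a little more concrete, here is a minimal sketch that prints an fstab-style line of the kind an lxc container's fstab file would hold. The helper name lxc_bind_entry and both paths are hypothetical, and the sketch assumes the usual lxc convention that the mount target is given relative to the container's root file system and that the mount point already exists inside the container.

```shell
# Print one fstab-style bind-mount line for an lxc container.
# $1: directory on the host; $2: mount point inside the container.
# The target is written relative to the container's root file system,
# so a leading "/" is stripped. lxc_bind_entry is a hypothetical name.
lxc_bind_entry() {
    printf '%s %s none bind 0 0\n' "$1" "${2#/}"
}

# Hypothetical paths: the same host directory is offered to every node container.
lxc_bind_entry /srv/ctdb-shared /cluster
```

Adding the same line to the fstab of each node container would give all nodes one shared directory, which is the property the cluster FS needs to provide here.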

I wanted at some point to provide a mechanism to set
such a thing up automatically, by just providing some
config files. Maybe I'll investigate the vagrant+puppet
approach that Ralph Böhme has recently posted in this
or a related thread...

Cheers - Michael
Rowland Penny
2014-12-13 14:10:02 UTC
Permalink
Post by Michael Adam
Post by Rowland Penny
OK, I now have a single node up and running as per the
instructions provided by Ronnie.
Yay!
Post by Rowland Penny
there is this in the ctdb log
2014/12/13 11:52:43.522708 [ 5740]: Set runstate to INIT (1)
2014/12/13 11:52:43.540992 [ 5740]: 00.ctdb: awk: line 2: function gensub
never defined
2014/12/13 11:52:43.543178 [ 5740]: 00.ctdb: awk: line 2: function gensub
never defined
2014/12/13 11:52:43.545354 [ 5740]: 00.ctdb: awk: line 2: function gensub
never defined
In Debian, several packages can provide awk: at least
mawk, gawk, and original-awk. Which one is used when more than
one is installed depends on the alternatives mechanism:
update-alternatives --display awk
update-alternatives --config awk
A quick web search has revealed that only the gawk
(GNU awk) variant seems to provide the needed gensub
function.
Maybe we should change "awk" to "gawk" in our scripts;
packages would then need to adapt their dependencies.
Installing gawk fixed the first problem, so I think something needs
to be done: perhaps check for 'gawk' and refuse to do anything if it
cannot be found?
Post by Michael Adam
Post by Rowland Penny
2014/12/13 11:52:56.931393 [recoverd: 5887]: We are still serving a public
IP '127.0.0.3' that we should not be serving. Removing it
2014/12/13 11:52:56.931536 [ 5740]: Could not find which interface the ip
address is hosted on. can not release it
2014/12/13 11:52:56.931648 [recoverd: 5887]: We are still serving a public
IP '127.0.0.2' that we should not be serving. Removing it
The above three lines are there 4 times
I guess this will not be the case any more, when you
move to a more realistic setup where you don't use loopback
for hosting nodes internal and public addresses, but
for a start that is ok.
OK
Post by Michael Adam
Post by Rowland Penny
2014/12/13 11:53:02.982441 [ 5740]: monitor event OK - node re-enabled
2014/12/13 11:53:02.982480 [ 5740]: Node became HEALTHY. Ask recovery master
0 to perform ip reallocation
2014/12/13 11:53:02.982733 [recoverd: 5887]: Node 0 has changed flags - now
0x0 was 0x2
2014/12/13 11:53:02.983266 [recoverd: 5887]: Takeover run starting
2014/12/13 11:53:03.046859 [recoverd: 5887]: Takeover run completed
successfully
Number of nodes:1
pnn:0 127.0.0.1 OK (THIS NODE)
Generation:740799152
Size:1
hash:0 lmaster:0
Recovery mode:NORMAL (0)
Recovery master:0
Great!
Post by Rowland Penny
Now I know it works, I just have to pull it all together.
Right. Next step: take a "real" ethernet interface
and use it for the node address first. You can even
start here with a single node.
You can also move towards more realistic clusters in two
steps: first no public addresses, only the nodes file.
That is the core of a ctdb cluster. Then you can move
towards cluster resource management and add public
addresses and also CTDB_MANAGES_SAMBA and friends.
Virtual machines or even containers (lxc or docker) are
awesome for setting up such clusters for learning and
testing. I use them for development myself.
I am testing in a couple of VMs.
OK, onwards and upwards :-)

Rowland
Stefan Kania
2014-12-13 11:48:01 UTC
Permalink
Hi Michael, hi Rowland,
Post by Michael Adam
Hi Rowland,
I am really sorry that you are frustrated. Please keep at it!
We do try to document stuff, but clustering with ctdb is still an
area that the average admin does not deal with. And there are not so
many people working on the code, so as usual, documentation sadly
lags behind.
I'm in the process of writing documentation for ADMINS ;-). As soon as I
have finished the German version (that will be once everything is
working) I will translate it. Rowland, if you want, I can send you my
German documentation; at least you can get all the
commands and config-file information out of it.
Post by Michael Adam
Someone like you, who with the help of us devs finally gets to a
working system, might help us in the end to improve the docs! :-)
That's what I want.
Post by Michael Adam
Clustering also relies on a whole additional area which is the
corresponding cluster file systems used underneath, which makes it
difficult to have comprehensive documentation for a concrete
setup.
That is what I'm fighting with: no documentation, or only
developers' Klingon ;-)
Post by Michael Adam
That being said, there is documentation out there, but not
necessarily terribly up to date, and not precisely what you want.
On the samba wiki, there is the generic but somewhat superficial
https://wiki.samba.org/index.php/CTDB_Setup
https://wiki.samba.org/index.php/GFS_CTDB_HowTo
which is of course redhat-cluster/gfs specific as far as the file
system is concerned.
The manual pages of ctdb are also pretty good by now!
That's right, BUT the wiki does not mention how to get the cluster into a
samba4 domain or how permissions work on a cluster FS.
Post by Michael Adam
It is true that Debian is especially rare in these clustered Samba
docs. This is due to the fact, that for a long time, there was no
big focus on cluster file systems in Debian.
A couple of years ago, I gave a couple of talks and courses
about clustered Samba, partly using Debian and Gluster, but
also RedHat and GFS. My 2010 notes
http://www.samba.org/~obnox/presentations/sambaXP-2010/sambaxp-2010-tutorial-ctdb-handout.pdf
even contain some notes on what is different and needs to be
considered on Debian systems. (The main difference is that
/etc/default is used instead of /etc/sysconfig by most packages.)
At that time, OCFS2 was not yet ready for CTDB/Samba, and GlusterFS
was mainly available in Debian, but did not perform very well. Now
Gluster has been taken on by RedHat, and I think we can expect
updates from that corner in the nearer future.
Here is a paper explaining some of the fundamentals of CTDB and the
basic configuration of CTDB and clustered Samba on top:
http://www.samba.org/~obnox/presentations/sambaXP-2009/samba-and-ctdb.pdf
http://www.linux-magazine.com/Issues/2009/105/Samba-for-Clusters
http://www.linux-magazine.com/Issues/2009/105/Samba-s-Registry
Some of the details need to be checked because the info is 4-5
years old, but the basics are still correct.
As a last but possibly important remark (for those who have not
fallen asleep by now ;-) let me mention that as far as I know,
Stefan Kania (whom I have copied in this reply, since I am not
certain that he is subscribed to the list) is currently working on
a reproducible setup with Debian/Gluster/CTDB/Samba. Maybe he can
give some insight on the specialities of a CTDB setup on current
Debian.
I'm reading the samba mailing list. Is there another one?
Post by Michael Adam
I apologize for having pointed mainly to my own writings, but at
the time I put considerable effort into creating some
documentation and explanations, so I hope this is of some use. :-)
Regards

Stefan
Post by Michael Adam
Cheers - Michael
PS: Also, you are asking why Ronnie suggests a single node
cluster. Because it is always good to start out with a simple
case. And once you mastered that, move on to more complicated
things. And you can even see most config points in that 1-node
cluster.
Post by Rowland Penny
Post by ronnie sahlberg
I just tried building a single-node "cluster" on debian with
ctdb.
Why a single node ????
Post by ronnie sahlberg
I can check building a 4 node cluster next week when I am home
from my travels.
Try it with two nodes
Hang on, you said 'debian' above
Post by ronnie sahlberg
1, Install the ctdb package: apt-get install ctdb
2, create a missing directory mkdir -p /var/lib/run/ctdb
Why is there a missing directory, sounds like a bug to me.
Post by ronnie sahlberg
3, remove the reclock file vi /etc/default/ctdb and comment out
CTDB_RECOVERY_LOCK
But I want the lock.
Post by ronnie sahlberg
4, create a nodes file vi /etc/ctdb/nodes and add the line
127.0.0.1
Yes, but why '127.0.0.1' ???
Post by ronnie sahlberg
5, create a public addresses file: vi
/etc/ctdb/public_addresses and add the two lines
127.0.0.2/8 lo
127.0.0.3/8 lo
Do you have to create these ipaddresses, if so where and how
Post by ronnie sahlberg
6, start ctdb service ctdb start
That is the first part I really understood.
Post by ronnie sahlberg
then check everything looks fine with 'ctdb status' and 'tail
/var/log/ctdb/log.ctdb'
That will not really create a very interesting cluster, just
one node, two public addresses and all on loopback. But this
should at least verify that ctdbd will start and run. Then you
can just shut it down and edit /etc/ctdb/nodes|public_addresses
and make them more interesting.
Again, why just one node. ??
Post by ronnie sahlberg
I personally suggest never running anything smaller than 4
node clusters for real data.
Yes, but I am testing, so where is the documentation for people
like me, who just want to get a couple of nodes up and running
???
Rowland
Post by ronnie sahlberg
Please see
man ctdb
man ctdbd
less /etc/default/ctdb
http://ctdb.samba.org/configuring.html
These should contain most of what you need to get started with ctdb.
regards ronnie sahlberg.
- --
Stefan Kania
Landweg 13
25693 St. Michaelisdonn


Signing every e-mail helps to reduce spam. Sign your
e-mail. Further information at http://www.gnupg.org

My key is available at

hkp://subkeys.pgp.net
Martin Schwenke
2014-12-14 09:31:44 UTC
Permalink
On Thu, 11 Dec 2014 13:03:18 -0500, ronnie sahlberg
Post by ronnie sahlberg
Also see
man ctdb
and
man ctdbd
The two manpages contain a lot of good information.
These days there is also ctdb(7) (via "man 7 ctdb"), which is meant to
give a reasonable overview of CTDB. We've moved a lot of the overview
and common material into this manpage. It also has pointers to the
other manpages.

peace & happiness,
martin
Martin Schwenke
2014-12-14 09:42:17 UTC
Permalink
Post by Michael Adam
Maybe we should change "awk" to "gawk" in our scripts
and packages would need to adapt their dependencies.
Yeah, perhaps. That is probably the simplest way.

The eventscripts are hardly portable but, given that there are only a
few uses of gensub(), I'll take a look at the relevant code and see if
it is worth switching to gsub() instead for a little added portability.
Either way, you'll probably see some patches for that this week.
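
For readers who have not met the difference, here is a small sketch of a global replacement done with gsub() only. It is not taken from the CTDB eventscripts; the helper name spaces_to_underscores and the sample input are invented for illustration. The main thing given up compared with gawk's gensub() is that gsub() modifies a variable in place rather than returning the new string, and it has no backreferences such as "\\1".

```shell
# Replace every space in the input with an underscore using only gsub(),
# which POSIX awk specifies. gsub() edits the variable in place and returns
# the number of substitutions, so the text is copied into s first.
# With gawk the same result could be written as: print gensub(/ /, "_", "g")
# spaces_to_underscores is a hypothetical helper name.
spaces_to_underscores() {
    awk '{ s = $0; gsub(/ /, "_", s); print s }'
}

echo "eth0 eth1 bond0" | spaces_to_underscores
```

This form should behave the same under mawk, gawk and original-awk, which is the portability gain being weighed against simply depending on gawk.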
Post by Michael Adam
Post by Rowland Penny
2014/12/13 11:52:56.931393 [recoverd: 5887]: We are still serving a public
IP '127.0.0.3' that we should not be serving. Removing it
2014/12/13 11:52:56.931536 [ 5740]: Could not find which interface the ip
address is hosted on. can not release it
2014/12/13 11:52:56.931648 [recoverd: 5887]: We are still serving a public
IP '127.0.0.2' that we should not be serving. Removing it
The above three lines are there 4 times
I guess this will not be the case any more, when you
move to a more realistic setup where you don't use loopback
for hosting nodes internal and public addresses, but
for a start that is ok.
Right. CTDB checks whether a public IP address is on an interface by
attempting to bind a socket to that address. Loopback addresses in
127.0.0.0/8 are magic and can always be bound to, something we take
advantage of in tests.

peace & happiness,
martin
Martin Schwenke
2014-12-14 09:46:36 UTC
Permalink
On Sat, 13 Dec 2014 14:10:02 +0000, Rowland Penny
Post by Rowland Penny
Installing gawk fixed the first problem, so I think that something needs
to be done, check for 'gawk' and refuse to do anything if it cannot be
found ?
If we switch to explicitly using "gawk" then this should become a
Debian package dependency, so ctdb shouldn't even install without it.

I'll see which way to go by comparing POSIX awk, mawk and gawk.

peace & happiness,
martin
Rowland Penny
2014-12-15 13:25:34 UTC
Permalink
Getting closer :-)

I now have two ctdb nodes up and running:

***@cluster1:~# ctdb status
Number of nodes:3 (including 1 deleted nodes)
pnn:1 192.168.1.10 OK (THIS NODE)
pnn:2 192.168.1.11 OK
Generation:1073761636
Size:2
hash:0 lmaster:1
hash:1 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:1

This is with CTDB_RECOVERY_LOCK turned off; if I turn it on, the nodes
go unhealthy. I am putting the lockfile on the shared cluster. Should I
be putting it somewhere else? It says at the top of /etc/default/ctdb:

# Shared recovery lock file to avoid split brain. No default.
#
# Do NOT run CTDB without a recovery lock file unless you know exactly
# what you are doing.
#CTDB_RECOVERY_LOCK=/some/place/on/shared/storage

As I don't know what I am doing, I need to run the recovery lockfile :-D
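
When chasing this kind of problem, a first sanity check is whether the setting is really active and whether its directory exists on the node. The sketch below does only that; recovery_lock_path is a made-up helper name and the check says nothing about whether locking on the shared storage is coherent across nodes (the ping_pong utility that ships with ctdb is the usual tool for that question).

```shell
# Print the value of the last uncommented CTDB_RECOVERY_LOCK= line in a
# config file, or nothing if the setting is absent or commented out.
# recovery_lock_path is a hypothetical helper name.
recovery_lock_path() {
    sed -n 's/^[[:space:]]*CTDB_RECOVERY_LOCK=//p' "$1" | tail -n 1 | tr -d '"'
}

conf=/etc/default/ctdb
if [ -r "$conf" ]; then
    lock=$(recovery_lock_path "$conf")
    if [ -z "$lock" ]; then
        echo "CTDB_RECOVERY_LOCK is not set in $conf"
    elif [ -d "$(dirname "$lock")" ]; then
        echo "lock directory exists: $(dirname "$lock")"
    else
        echo "lock directory missing: $(dirname "$lock")"
    fi
fi
```

Running this on every node and comparing the output is a cheap way to spot a node where the shared file system is not mounted at the expected place.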

Rowland
Stefan Kania
2014-12-15 14:14:37 UTC
Permalink
Hi Rowland,
Are there any errors in /var/log/log.ctdb? Which version of ctdb are
you using? And could you post your /etc/sysconfig/ctdb file?


Regards

Stefan
Stefan Kania
2014-12-16 07:53:42 UTC
Permalink
Hi Rowland,

did you see that you have some problems with IPs on node 1?
2014/12/15 16:32:28.300370 [recoverd: 2497]: Failed to find node to
cover ip 192.168.0.9
2014/12/15 16:32:28.300412 [recoverd: 2497]: Failed to find node to
cover ip 192.168.0.8
I also had some problems with IPs and name resolution at the beginning.
After I solved those problems everything was fine.

Stefan
On 15/12/14 14:14, Stefan Kania wrote: Hi Rowland,
Are there any errors in /var/log/log.ctdb? Which version of ctdb
are you using? And could you post your /etc/sysconfig/ctdb file?
Regards
Stefan
Well I could answer: there is nothing in /var/log/log.ctdb,
Version 2.5.3 and I do not have /etc/sysconfig/ctdb :-D
But instead, I will attach a tarball containing
/var/log/ctdb/log.ctdb from both nodes, it also contains
/etc/default/ctdb :-)
Rowland
Rowland Penny
2014-12-16 09:30:44 UTC
Permalink
Post by Stefan Kania
Hi Rowland,
did you see that you have some problems with IPs on node 1?
2014/12/15 16:32:28.300370 [recoverd: 2497]: Failed to find node to
cover ip 192.168.0.9
2014/12/15 16:32:28.300412 [recoverd: 2497]: Failed to find node to
cover ip 192.168.0.8
I also had some problems with IPs and name resolution at the beginning.
After I solved those problems everything was fine.
Stefan
I did wonder about those lines, I do not have 192.168.0.8 & 192.168.0.9,
but Ronnie posted this:

No, you should not/need not create them on the system.
Ctdbd will create and assign these addresses automatically and
dynamically while the cluster is running.

So, do I need to create them and if so, where? This is one of those
areas of CTDB that doesn't seem to be documented at all.

Rowland
Stefan Kania
2014-12-16 13:12:18 UTC
Permalink
Hi Rowland,

If these addresses are meant to be the IPs that clients use to access the
cluster, you must put these IPs in your public_addresses file, and in
your DNS the hostname for your cluster should point to both addresses.
Remember that you have to install "ethtool" on all nodes! When you
start the cluster, the system will pick an IP address out of the file.
If there is no public_addresses file, your system will not get any IP.
If there is a public_addresses file but no ethtool, the node can't set one
of the IPs. If one node of your cluster fails, the second node will get
the IP address from the failed host. BUT REMEMBER: you won't see the IPs
with "ifconfig"; you MUST use "ip a l".
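
To check from a script whether a node currently hosts one of the public addresses, the output of the ip tool can be filtered as in the sketch below. The helper name ip_is_assigned is invented for this example, and it assumes the one-line-per-address layout printed by "ip -o addr show". The addresses are probably invisible to ifconfig because they are added as additional addresses without an alias label, but treat that explanation as an assumption rather than something confirmed in this thread.

```shell
# Succeed if the address $1 appears as an IPv4 address in `ip -o addr show`
# style text read from stdin. ip_is_assigned is a hypothetical helper name.
ip_is_assigned() {
    awk -v ip="$1" '
        $3 == "inet" { split($4, a, "/"); if (a[1] == ip) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Typical use on a node (192.168.0.8 is one of the addresses from the log above):
if command -v ip >/dev/null 2>&1; then
    if ip -o addr show | ip_is_assigned 192.168.0.8; then
        echo "192.168.0.8 is hosted on this node"
    else
        echo "192.168.0.8 is not hosted on this node"
    fi
fi
```

Run on both nodes, this shows which node holds which public address at the moment and whether an address moves over when one node is stopped.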

Stefan
Rowland Penny
2014-12-16 14:27:03 UTC
Permalink
Post by Stefan Kania
Hi Rowland,
If these addresses are meant to be the IPs that clients use to access the
cluster, you must put these IPs in your public_addresses file
OK, they are in the public_addresses file.
Post by Stefan Kania
and in
your DNS the hostname for your cluster should point to both addresses.
DOH! CNAME.
Post by Stefan Kania
Remember that you have to install "ethtool" on all nodes!
What do you mean, remember? I haven't seen anywhere that you must install
ethtool. It is installed anyway :-D
Post by Stefan Kania
When you
start the cluster, the system will pick an IP address out of the file.
If there is no public_addresses file, your system will not get any IP.
If there is a public_addresses file but no ethtool, the node can't set one
of the IPs. If one node of your cluster fails, the second node will get
the IP address from the failed host. BUT REMEMBER: you won't see the IPs
with "ifconfig"; you MUST use "ip a l".
That is something else that I haven't seen anywhere! :-)

Rowland
Stefan Kania
2014-12-16 17:38:31 UTC
Permalink
Hi Rowland,
On 16/12/14 13:12, Stefan Kania wrote: Hi Rowland,
If these addresses should be your IPs for clients accessing the
cluster, you must put these IPs in your public_addresses file
Post by Rowland Penny
OK, they are in the public_addresses file
and in your DNS the hostname for your cluster should point to both
addresses.
Post by Rowland Penny
DOH! CNAME.
Remember that you have to install "ethtool" on all nodes!
Post by Rowland Penny
What do you mean, remember? I haven't seen it anywhere that you must
install ethtool. It is installed anyway :-D
I had an error message on the node without ethtool, and the nodes were
unhealthy. After I installed ethtool it worked for me and the
error message was gone.
So if you start the cluster the system will pick an IP address out
of the file. If there is no public_addresses file your system will
not get any IP. If there is no ethtool but a public_addresses file,
the node can't set one of the IPs. If one node of your cluster fails,
the second node will get the IP address from the failed host. BUT
REMEMBER, you won't see the IPs with "ifconfig"; you MUST use
"ip a l".
Post by Rowland Penny
That is something else that I haven't seen anywhere! :-)
Read this:
http://unix.stackexchange.com/questions/93412/difference-between-ifconfig-and-ip-commands

What I think is that CTDB is assigning the virtual IP using "ip" and
configuring the NIC with ethtool. So if you set an IP with the
"ip" command, this address is not shown by "ifconfig".

Did you get rid of the IP error message?

I think that's your main problem.

Stefan
Rowland Penny
2014-12-16 18:22:02 UTC
Permalink
Hi Rowland,
On 16/12/14 13:12, Stefan Kania wrote: Hi Rowland,
If these addresses should be your IPs for clients accessing the
cluster, you must put these IPs in your public_addresses file
Post by Rowland Penny
OK, they are in the public_addresses file
and in your DNS the hostname for your cluster should point to both
addresses.
Post by Rowland Penny
DOH! CNAME.
Remember that you have to install "ethtool" on all nodes!
Post by Rowland Penny
What do you mean, remember? I haven't seen it anywhere that you must
install ethtool. It is installed anyway :-D
I had an error message on the node without ethtool, and the nodes were
unhealthy. After I installed ethtool it worked for me and the
error message was gone.
So if you start the cluster the system will pick an IP address out
of the file. If there is no public_addresses file your system will
not get any IP. If there is no ethtool but a public_addresses file,
the node can't set one of the IPs. If one node of your cluster fails,
the second node will get the IP address from the failed host. BUT
REMEMBER, you won't see the IPs with "ifconfig"; you MUST use
"ip a l".
Post by Rowland Penny
That is something else that I haven't seen anywhere! :-)
http://unix.stackexchange.com/questions/93412/difference-between-ifconfig-and-ip-commands
What I think is that CTDB is assigning the virtual IP using "ip" and
configuring the NIC with ethtool. So if you set an IP with the
"ip" command, this address is not shown by "ifconfig".
Did you get rid of the IP error message?
It would seem so, the last time it appeared in log.ctdb was here:

2014/12/16 14:48:41.784044 [recoverd:13666]: Takeover run starting
2014/12/16 14:48:41.784284 [recoverd:13666]: Failed to find node to cover ip 192.168.0.9
2014/12/16 14:48:41.784305 [recoverd:13666]: Failed to find node to cover ip 192.168.0.8
2014/12/16 14:48:41.850344 [recoverd:13666]: Takeover run completed successfully

A short while later there is this:

2014/12/16 14:52:34.242911 [recoverd:13666]: Takeover run starting
2014/12/16 14:52:34.243356 [13513]: Takeover of IP 192.168.0.9/8 on interface eth0
2014/12/16 14:52:34.261916 [13513]: Takeover of IP 192.168.0.8/8 on interface eth0
2014/12/16 14:52:34.490010 [recoverd:13666]: Takeover run completed successfully

The IP addresses never appear again.
I think that's your main problem.
I don't think so; tailing the log shows this:

***@cluster1:~# tail /var/log/ctdb/log.ctdb
2014/12/16 18:11:23.866612 [13513]: Thawing priority 2
2014/12/16 18:11:23.866634 [13513]: Release freeze handler for prio 2
2014/12/16 18:11:23.866666 [13513]: Thawing priority 3
2014/12/16 18:11:23.866685 [13513]: Release freeze handler for prio 3
2014/12/16 18:11:23.873189 [recoverd:13666]: ctdb_control error: 'managed to lock reclock file from inside daemon'
2014/12/16 18:11:23.873235 [recoverd:13666]: ctdb_control error: 'managed to lock reclock file from inside daemon'
2014/12/16 18:11:23.873246 [recoverd:13666]: Async operation failed with ret=-1 res=-1 opcode=16
2014/12/16 18:11:23.873254 [recoverd:13666]: Async wait failed - fail_count=1
2014/12/16 18:11:23.873261 [recoverd:13666]: server/ctdb_recoverd.c:412 Unable to set recovery mode. Recovery failed.
2014/12/16 18:11:23.873268 [recoverd:13666]: server/ctdb_recoverd.c:1996 Unable to set recovery mode to normal on cluster

This appears to be happening over and over again.

ctdb status shows this:

Number of nodes:3 (including 1 deleted nodes)
pnn:1 192.168.1.10 OK (THIS NODE)
pnn:2 192.168.1.11 UNHEALTHY
Generation:1226492970
Size:2
hash:0 lmaster:1
hash:1 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:1

ip a l
Shows this:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:d6:92:30 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.6/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.8/8 brd 192.255.255.255 scope global eth0
    inet 192.168.0.9/8 brd 192.255.255.255 scope global secondary eth0
    inet6 fe80::a00:27ff:fed6:9230/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:03:79:17 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth1
    inet6 fe80::a00:27ff:fe03:7917/64 scope link
       valid_lft forever preferred_lft forever

Rowland
ronnie sahlberg
2014-12-16 18:57:43 UTC
Permalink
Hi Rowland,
On 16/12/14 13:12, Stefan Kania wrote: Hi Rowland,
If these addresses should be your IPs for clients accessing the
cluster, you must put these IPs in your public_addresses file
Post by Rowland Penny
OK, they are in the public_addresses file
and in your DNS the hostname for your cluster should point to both
addresses.
Post by Rowland Penny
DOH! CNAME.
Remember that you have to install "ethtool" on all nodes!
Post by Rowland Penny
What do you mean, remember? I haven't seen it anywhere that you must
install ethtool. It is installed anyway :-D
I had an error message on the node without ethtool, and the nodes were
unhealthy. After I installed ethtool it worked for me and the
error message was gone.
ethtool should be used by one of the eventscripts in ctdb.
The idea here is that ethtool is used in the script to check that the
link/interface that is used to host public addresses is good.
So that IF you unplug the cable, or if the switch goes down, then
the script/ethtool will detect that the node is bad and the node
becomes UNHEALTHY.
When the node becomes unhealthy all public ip addresses will be moved
over to other nodes that are not unhealthy.

I.e. ethtool is to make sure that if the interface/link goes bad,
ctdb can detect it and take action.
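As a rough illustration of the kind of check described above, here is a minimal Python sketch that decides whether a link is up from the text that `ethtool <iface>` prints. It is not CTDB's eventscript; the function name and the abbreviated sample output are made up for the example, and only the "Link detected:" line is relied on.

```python
def link_detected(ethtool_output):
    """Return True if the ethtool text contains 'Link detected: yes'.

    ethtool_output is the text printed by `ethtool <iface>`.
    Returns False when the line is missing or reports anything else.
    """
    for line in ethtool_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "Link detected":
            return value.strip() == "yes"
    return False


# Hypothetical, abbreviated ethtool output, for illustration only.
SAMPLE_UP = """Settings for eth0:
\tSpeed: 1000Mb/s
\tDuplex: Full
\tLink detected: yes
"""
SAMPLE_DOWN = SAMPLE_UP.replace("Link detected: yes", "Link detected: no")

print(link_detected(SAMPLE_UP))
print(link_detected(SAMPLE_DOWN))
```

To try it against a real interface, the text could come from `subprocess.run(["ethtool", "eth0"], capture_output=True, text=True).stdout`, assuming ethtool is installed and eth0 is the interface carrying the public addresses.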
So if you start the cluster the system will pick an IP address out
of the file. If there is no public_addresses file your system will
not get any IP. If there is no ethtool but a public_addresses file,
the node can't set one of the IPs. If one node of your cluster fails,
the second node will get the IP address from the failed host. BUT
REMEMBER, you won't see the IPs with "ifconfig"; you MUST use
"ip a l".
Post by Rowland Penny
That is something else that I haven't seen anywhere! :-)
http://unix.stackexchange.com/questions/93412/difference-between-ifconfig-and-ip-commands
What I think is that CTDB is assigning the virtual IP using "ip" and
configuring the NIC with ethtool. So if you set an IP with the
"ip" command, this address is not shown by "ifconfig".
The difference is more that ifconfig is a really old command from the
days when you could only have one ip address assigned to an interface.
As such ifconfig will only show a single ip address, even if there
are several addresses assigned.
(which ip it will show is kind of undefined)


ip addr show is the command to use to show all addresses that are assigned.
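To make that concrete, here is a small Python sketch that pulls every IPv4 address per interface out of `ip addr show` output. The sample text is the eth0 stanza Rowland posted earlier in this thread; the function name is only for illustration.

```python
import re
from collections import defaultdict


def ipv4_by_interface(ip_addr_output):
    """Map interface name -> list of 'address/prefix' strings.

    Parses the text printed by `ip addr show`. Interface header lines
    look like '2: eth0: <FLAGS> ...'; IPv4 address lines start with 'inet'.
    """
    result = defaultdict(list)
    current = None
    for line in ip_addr_output.splitlines():
        header = re.match(r"\d+:\s+([^:@\s]+)", line)
        if header:
            current = header.group(1)
            continue
        fields = line.split()
        if current and len(fields) >= 2 and fields[0] == "inet":
            result[current].append(fields[1])
    return dict(result)


SAMPLE = """\
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:d6:92:30 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.6/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.8/8 brd 192.255.255.255 scope global eth0
    inet 192.168.0.9/8 brd 192.255.255.255 scope global secondary eth0
    inet6 fe80::a00:27ff:fed6:9230/64 scope link
"""

print(ipv4_by_interface(SAMPLE))
```

All three IPv4 addresses on eth0 are listed, including the two public addresses that ctdb added next to the static one.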
ronnie sahlberg
2014-12-16 19:05:27 UTC
Permalink
Post by Rowland Penny
Hi Rowland,
On 16/12/14 13:12, Stefan Kania wrote: Hi Rowland,
If these addresses should be your IPs for clients accessing the
cluster, you must put these IPs in your public_addresses file
Post by Rowland Penny
OK, they are in the public_addresses file
and in your DNS the hostname for your cluster should point to both
addresses.
Post by Rowland Penny
DOH! CNAME.
Remember that you have to install "ethtool" on all nodes!
Post by Rowland Penny
What do you mean, remember? I haven't seen it anywhere that you must
install ethtool. It is installed anyway :-D
I had an error message on the node without ethtool, and the nodes were
unhealthy. After I installed ethtool it worked for me and the
error message was gone.
So if you start the cluster the system will pick an IP address out
of the file. If there is no public_addresses file your system will
not get any IP. If there is no ethtool but a public_addresses file,
the node can't set one of the IPs. If one node of your cluster fails,
the second node will get the IP address from the failed host. BUT
REMEMBER, you won't see the IPs with "ifconfig"; you MUST use
"ip a l".
Post by Rowland Penny
That is something else that I haven't seen anywhere! :-)
http://unix.stackexchange.com/questions/93412/difference-between-ifconfig-and-ip-commands
What I think is that CTDB is assigning the virtual IP using "ip" and
configuring the NIC with ethtool. So if you set an IP with the
"ip" command, this address is not shown by "ifconfig".
Did you get rid of the IP error message?
2014/12/16 14:48:41.784044 [recoverd:13666]: Takeover run starting
2014/12/16 14:48:41.784284 [recoverd:13666]: Failed to find node to cover ip 192.168.0.9
2014/12/16 14:48:41.784305 [recoverd:13666]: Failed to find node to cover ip 192.168.0.8
2014/12/16 14:48:41.850344 [recoverd:13666]: Takeover run completed successfully
If this only happens during startup I would not worry about it.
It may be that none of the nodes are ready to accept IP addresses yet
and this is then just a benign but annoying message.
Post by Rowland Penny
2014/12/16 14:52:34.242911 [recoverd:13666]: Takeover run starting
2014/12/16 14:52:34.243356 [13513]: Takeover of IP 192.168.0.9/8 on interface eth0
2014/12/16 14:52:34.261916 [13513]: Takeover of IP 192.168.0.8/8 on interface eth0
This looks wrong.
I suspect you want these to use /24 netmasks, not /8 netmasks.
See also the 'ip addr show' output below, where the mask is /24 for
the static address.

I.e. you should probably change your public addresses file and set
the netmask to 24.
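A quick way to see why the /8 looks out of place is to compute the network and broadcast address for each prefix length with Python's standard `ipaddress` module. The addresses are the ones from the log and `ip a l` output in this thread; the `public_addresses` lines in the final comment are only an assumption about what a corrected file might contain, since the actual file was never posted.

```python
import ipaddress


def describe(cidr):
    """Return (network, broadcast) as strings for an 'address/prefix' value."""
    iface = ipaddress.ip_interface(cidr)
    return str(iface.network), str(iface.network.broadcast_address)


static = describe("192.168.0.6/24")   # the node's own static address on eth0
wrong = describe("192.168.0.8/8")     # the public address as ctdb took it over
fixed = describe("192.168.0.8/24")    # the same address with a 24-bit mask

print(static)
print(wrong)
print(fixed)

# Assumed shape of a corrected public_addresses file (not shown in the thread):
#   192.168.0.8/24 eth0
#   192.168.0.9/24 eth0
```

With /8 the address sits in 192.0.0.0/8 with broadcast 192.255.255.255, which matches the `brd` value in the posted output; with /24 it lands in the same 192.168.0.0/24 network as the static address.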
Post by Rowland Penny
2014/12/16 14:52:34.490010 [recoverd:13666]: Takeover run completed successfully
The IP addresses never appear again.
I think that's your main problem.
2014/12/16 18:11:23.866612 [13513]: Thawing priority 2
2014/12/16 18:11:23.866634 [13513]: Release freeze handler for prio 2
2014/12/16 18:11:23.866666 [13513]: Thawing priority 3
2014/12/16 18:11:23.866685 [13513]: Release freeze handler for prio 3
2014/12/16 18:11:23.873189 [recoverd:13666]: ctdb_control error: 'managed to lock reclock file from inside daemon'
2014/12/16 18:11:23.873235 [recoverd:13666]: ctdb_control error: 'managed to lock reclock file from inside daemon'
2014/12/16 18:11:23.873246 [recoverd:13666]: Async operation failed with ret=-1 res=-1 opcode=16
2014/12/16 18:11:23.873254 [recoverd:13666]: Async wait failed - fail_count=1
2014/12/16 18:11:23.873261 [recoverd:13666]: server/ctdb_recoverd.c:412 Unable to set recovery mode. Recovery failed.
2014/12/16 18:11:23.873268 [recoverd:13666]: server/ctdb_recoverd.c:1996 Unable to set recovery mode to normal on cluster
This appears to be happening over and over again.
Number of nodes:3 (including 1 deleted nodes)
pnn:1 192.168.1.10 OK (THIS NODE)
pnn:2 192.168.1.11 UNHEALTHY
You can run 'ctdb scriptstatus' on node 1 and it should give you
more detail about why the node is unhealthy.
Post by Rowland Penny
Generation:1226492970
Size:2
hash:0 lmaster:1
hash:1 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:1
ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:d6:92:30 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.6/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.8/8 brd 192.255.255.255 scope global eth0
    inet 192.168.0.9/8 brd 192.255.255.255 scope global secondary eth0
    inet6 fe80::a00:27ff:fed6:9230/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:03:79:17 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth1
    inet6 fe80::a00:27ff:fe03:7917/64 scope link
       valid_lft forever preferred_lft forever
Rowland Penny
2014-12-16 20:02:54 UTC
Permalink
Post by ronnie sahlberg
Post by Rowland Penny
Hi Rowland,
On 16/12/14 13:12, Stefan Kania wrote: Hi Rowland,
If these addresses should be your IPs for clients accessing the
cluster, you must put these IPs in your public_addresses file
Post by Rowland Penny
OK, they are in the public_addresses file
and in your DNS the hostname for your cluster should point to both
addresses.
Post by Rowland Penny
DOH! CNAME.
Remember that you have to install "ethtool" on all nodes!
Post by Rowland Penny
What do you mean, remember? I haven't seen it anywhere that you must
install ethtool. It is installed anyway :-D
I had an error message on the node without ethtool, and the nodes were
unhealthy. After I installed ethtool it worked for me and the
error message was gone.
So if you start the cluster the system will pick an IP address out
of the file. If there is no public_addresses file your system will
not get any IP. If there is no ethtool but a public_addresses file,
the node can't set one of the IPs. If one node of your cluster fails,
the second node will get the IP address from the failed host. BUT
REMEMBER, you won't see the IPs with "ifconfig"; you MUST use
"ip a l".
Post by Rowland Penny
That is something else that I haven't seen anywhere! :-)
http://unix.stackexchange.com/questions/93412/difference-between-ifconfig-and-ip-commands
What I think is that CTDB is assigning the virtual IP using "ip" and
configuring the NIC with ethtool. So if you set an IP with the
"ip" command, this address is not shown by "ifconfig".
Did you get rid of the IP error message?
2014/12/16 14:48:41.784044 [recoverd:13666]: Takeover run starting
2014/12/16 14:48:41.784284 [recoverd:13666]: Failed to find node to cover ip 192.168.0.9
2014/12/16 14:48:41.784305 [recoverd:13666]: Failed to find node to cover ip 192.168.0.8
2014/12/16 14:48:41.850344 [recoverd:13666]: Takeover run completed successfully
If this only happens during startup I would not worry about it.
It may be that none of the nodes are ready to accept IP addresses yet
and this is then just a benign but annoying message.
Post by Rowland Penny
2014/12/16 14:52:34.242911 [recoverd:13666]: Takeover run starting
2014/12/16 14:52:34.243356 [13513]: Takeover of IP 192.168.0.9/8 on interface eth0
2014/12/16 14:52:34.261916 [13513]: Takeover of IP 192.168.0.8/8 on interface eth0
This looks wrong.
I suspect you want these to use /24 netmasks, not /8 netmasks.
See also the 'ip addr show' output below, where the mask is /24 for
the static address.
I.e. you should probably change your public addresses file and set
the netmask to 24.
So I changed the netmask to 24 and now neither of the nodes comes up.
Change it back to 8 and node 1 becomes OK; back to 24 and both are
UNHEALTHY again.

Now Steve said that this wasn't rocket science, he was wrong, it is!!

I cannot spend any more time on this, it might work on redhat but I
cannot make it work on Debian. I must be missing something, but for the
life of me, I cannot find what it is.

Rowland
Martin Schwenke
2014-12-16 20:37:33 UTC
Permalink
On Tue, 16 Dec 2014 18:38:31 +0100, Stefan Kania
Post by Stefan Kania
On 16/12/14 13:12, Stefan Kania wrote: Hi Rowland,
Remember that you have to install "ethtool" on all nodes!
Post by Rowland Penny
What do you mean, remember? I haven't seen it anywhere that you must
install ethtool. It is installed anyway :-D
I had an error message on the node without ethtool, and the nodes were
unhealthy. After I installed ethtool it worked for me and the
error message was gone.
Here are the dependencies for the current ctdb package in Debian
testing/unstable:

Depends: libc6 (>= 2.8), libpopt0 (>= 1.14), libtalloc2 (>= 2.0.4~git20101213), libtdb1 (>= 1.3.0), libtevent0 (>= 0.9.16), lsb-base, iproute2, psmisc, tdb-tools, time, sudo
Recommends: ethtool
Suggests: logrotate, lsof, libctdb-dev

So it recommends ethtool. I guess this is because you can use CTDB
without public addresses (e.g. with LVS). That seems sane.

I seem to remember that you have to try hard in Debian to switch off
automatic installation of recommended packages, since doing so indicates
that you know what you're doing. I do this on my machines because I
like to keep them lightweight and I occasionally know what I'm
doing. ;-)

So, on Debian that dependency looks sane.

More generally, you could argue that someone should extract the
dependencies that we have in our RPM spec file and put them in an
"INSTALL" file or somewhere on the wiki.

peace & happiness,
martin
Martin Schwenke
2014-12-16 20:46:42 UTC
Permalink
On Tue, 16 Dec 2014 11:05:27 -0800, ronnie sahlberg
Post by ronnie sahlberg
Post by Rowland Penny
Post by Stefan Kania
Did you get rid of the IP error message?
2014/12/16 14:48:41.784044 [recoverd:13666]: Takeover run starting
2014/12/16 14:48:41.784284 [recoverd:13666]: Failed to find node to cover ip 192.168.0.9
2014/12/16 14:48:41.784305 [recoverd:13666]: Failed to find node to cover ip 192.168.0.8
2014/12/16 14:48:41.850344 [recoverd:13666]: Takeover run completed successfully
If this only happens during startup I would not worry about it.
It may be that none of the nodes are ready to accept IP addresses yet
and this is then just a benign but annoying message.
That's right. This is seen more often during startup than it used to
be, since we now don't allocate IPs to nodes until they are in the
"RUNNING" runstate, which means that the "startup" event has completed.

Those messages are somewhat difficult to avoid. If they were easy to
avoid then I would have gotten rid of them. However, I've taken
another look and I see a possible but slightly ugly way... I'll add it
to the list... :-)

peace & happiness,
martin
Martin Schwenke
2014-12-16 20:50:44 UTC
Permalink
On Tue, 16 Dec 2014 20:02:54 +0000, Rowland Penny
Post by Rowland Penny
Post by ronnie sahlberg
I.e. you should probably change your public addresses file and set
the netmask to 24
So I changed the netmask to 24 and now neither of the nodes comes up.
Change it back to 8 and node 1 becomes OK; back to 24 and both are
UNHEALTHY again.
So, what do you see if you run "ctdb scriptstatus" on an unhealthy
node? This should give you the reason that it is unhealthy. If it
says "monitor cycle never run" then please try "ctdb scriptstatus
startup".

I'm guessing that you've restarted CTDB on both nodes after making the
netmask change? There's a small chance that you've used "ctdb
reloadips", which is missing some logic to deal with change of
netmasks... but it hasn't bitten anyone yet, to my knowledge, so is low
on the priority list.
Post by Rowland Penny
Now Steve said that this wasn't rocket science, he was wrong, it is!!
Not rocket science. Just a bit harder than setting up Samba. More
dependencies...

peace & happiness,
martin
Martin Schwenke
2014-12-16 20:59:36 UTC
Permalink
On Tue, 16 Dec 2014 18:22:02 +0000, Rowland Penny
Post by Rowland Penny
2014/12/16 18:11:23.866612 [13513]: Thawing priority 2
2014/12/16 18:11:23.866634 [13513]: Release freeze handler for prio 2
2014/12/16 18:11:23.866666 [13513]: Thawing priority 3
2014/12/16 18:11:23.866685 [13513]: Release freeze handler for prio 3
2014/12/16 18:11:23.873189 [recoverd:13666]: ctdb_control error: 'managed to lock reclock file from inside daemon'
2014/12/16 18:11:23.873235 [recoverd:13666]: ctdb_control error: 'managed to lock reclock file from inside daemon'
2014/12/16 18:11:23.873246 [recoverd:13666]: Async operation failed with ret=-1 res=-1 opcode=16
2014/12/16 18:11:23.873254 [recoverd:13666]: Async wait failed - fail_count=1
2014/12/16 18:11:23.873261 [recoverd:13666]: server/ctdb_recoverd.c:412 Unable to set recovery mode. Recovery failed.
2014/12/16 18:11:23.873268 [recoverd:13666]: server/ctdb_recoverd.c:1996 Unable to set recovery mode to normal on cluster
This appears to be happening over and over again.
That is the indicator that you have a lock coherency problem. Please
see the stuff I made bold in:

https://wiki.samba.org/index.php/Ping_pong

Yes, this is hard and it tripped me up when I rushed through the
ping-pong test... and there was nothing in bold there to draw my
attention to that detail. As Michael Adam has mentioned, some cluster
filesystems will look like they fail this test when they actually pass,
so it is difficult to have a test that works everywhere...

I'll try to update that message to make this clearer and send users
back to the ping-pong test.

peace & happiness,
martin
Rowland Penny
2014-12-16 21:12:12 UTC
Permalink
Post by Martin Schwenke
On Tue, 16 Dec 2014 18:22:02 +0000, Rowland Penny
Post by Rowland Penny
2014/12/16 18:11:23.866612 [13513]: Thawing priority 2
2014/12/16 18:11:23.866634 [13513]: Release freeze handler for prio 2
2014/12/16 18:11:23.866666 [13513]: Thawing priority 3
2014/12/16 18:11:23.866685 [13513]: Release freeze handler for prio 3
'managed to lock reclock file from inside daemon'
'managed to lock reclock file from inside daemon'
2014/12/16 18:11:23.873246 [recoverd:13666]: Async operation failed with
ret=-1 res=-1 opcode=16
2014/12/16 18:11:23.873254 [recoverd:13666]: Async wait failed -
fail_count=1
2014/12/16 18:11:23.873261 [recoverd:13666]: server/ctdb_recoverd.c:412
Unable to set recovery mode. Recovery failed.
2014/12/16 18:11:23.873268 [recoverd:13666]: server/ctdb_recoverd.c:1996
Unable to set recovery mode to normal on cluster
This appears to be happening over and over again.
That is the indicator that you have a lock coherency problem. Please
see the stuff I made bold in:
https://wiki.samba.org/index.php/Ping_pong
Yes, this is hard and it tripped me up when I rushed through the
ping-pong test... and there was nothing in bold there to draw my
attention to that detail. As Michael Adam has mentioned, some cluster
filesystems will look like they fail this test when they actually pass,
so it is difficult to have a test that works everywhere...
I'll try to update that message to make this clearer and send users
back to the ping-pong test.
peace & happiness,
martin
I ran the ping_pong test this morning, following the wiki page and as
far as I could see it passed all tests.

I have come to the conclusion that you need to be a CTDB dev to set CTDB
up, only they seem to have ALL the information required.

I absolutely give up, I cannot make it work, god knows I have tried, but
I just cannot make it work with the information available. I can find
bits here and bits there, but there still seems to be something missing,
or is it just me. Debian 7.7, Pacemaker, Corosync and Ocfs2 work OK, it
is just when you try to add CTDB.

Rowland
Ralph Böhme
2014-12-16 21:19:40 UTC
Permalink
Post by Rowland Penny
Post by Martin Schwenke
On Tue, 16 Dec 2014 18:22:02 +0000, Rowland Penny
Post by Rowland Penny
2014/12/16 18:11:23.866612 [13513]: Thawing priority 2
2014/12/16 18:11:23.866634 [13513]: Release freeze handler for prio 2
2014/12/16 18:11:23.866666 [13513]: Thawing priority 3
2014/12/16 18:11:23.866685 [13513]: Release freeze handler for prio 3
'managed to lock reclock file from inside daemon'
'managed to lock reclock file from inside daemon'
2014/12/16 18:11:23.873246 [recoverd:13666]: Async operation failed with
ret=-1 res=-1 opcode=16
2014/12/16 18:11:23.873254 [recoverd:13666]: Async wait failed -
fail_count=1
2014/12/16 18:11:23.873261 [recoverd:13666]: server/ctdb_recoverd.c:412
Unable to set recovery mode. Recovery failed.
2014/12/16 18:11:23.873268 [recoverd:13666]: server/ctdb_recoverd.c:1996
Unable to set recovery mode to normal on cluster
This appears to be happening over and over again.
That is the indicator that you have a lock coherency problem. Please
see the stuff I made bold in:
https://wiki.samba.org/index.php/Ping_pong
Yes, this is hard and it tripped me up when I rushed through the
ping-pong test... and there was nothing in bold there to draw my
attention to that detail. As Michael Adam has mentioned, some cluster
filesystems will look like they fail this test when they actually pass,
so it is difficult to have a test that works everywhere...
I'll try to update that message to make this clearer and send users
back to the ping-pong test.
peace & happiness,
martin
I ran the ping_pong test this morning, following the wiki page and
as far as I could see it passed all tests.
I have come to the conclusion that you need to be a CTDB dev to set
CTDB up, only they seem to have ALL the information required.
I absolutely give up, I cannot make it work, god knows I have tried,
but I just cannot make it work with the information available. I can
find bits here and bits there, but there still seems to be something
missing, or is it just me. Debian 7.7, Pacemaker, Corosync and Ocfs2
work OK, it is just when you try to add CTDB.
can you share the bits from Debian to ocfs2? I'll set this up the next
day and see if I get ctdb to behave.

Cheerio!
-Ralph
--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:***@sernet.de
Rowland Penny
2014-12-16 22:08:08 UTC
Permalink
Post by Ralph Böhme
Post by Rowland Penny
Post by Martin Schwenke
On Tue, 16 Dec 2014 18:22:02 +0000, Rowland Penny
Post by Rowland Penny
2014/12/16 18:11:23.866612 [13513]: Thawing priority 2
2014/12/16 18:11:23.866634 [13513]: Release freeze handler for prio 2
2014/12/16 18:11:23.866666 [13513]: Thawing priority 3
2014/12/16 18:11:23.866685 [13513]: Release freeze handler for prio 3
'managed to lock reclock file from inside daemon'
'managed to lock reclock file from inside daemon'
2014/12/16 18:11:23.873246 [recoverd:13666]: Async operation failed with
ret=-1 res=-1 opcode=16
2014/12/16 18:11:23.873254 [recoverd:13666]: Async wait failed -
fail_count=1
2014/12/16 18:11:23.873261 [recoverd:13666]: server/ctdb_recoverd.c:412
Unable to set recovery mode. Recovery failed.
2014/12/16 18:11:23.873268 [recoverd:13666]: server/ctdb_recoverd.c:1996
Unable to set recovery mode to normal on cluster
This appears to be happening over and over again.
That is the indicator that you have a lock coherency problem. Please
see the stuff I made bold in:
https://wiki.samba.org/index.php/Ping_pong
Yes, this is hard and it tripped me up when I rushed through the
ping-pong test... and there was nothing in bold there to draw my
attention to that detail. As Michael Adam has mentioned, some cluster
filesystems will look like they fail this test when they actually pass,
so it is difficult to have a test that works everywhere...
I'll try to update that message to make this clearer and send users
back to the ping-pong test.
peace & happiness,
martin
I ran the ping_pong test this morning, following the wiki page and
as far as I could see it passed all tests.
I have come to the conclusion that you need to be a CTDB dev to set
CTDB up, only they seem to have ALL the information required.
I absolutely give up, I cannot make it work, god knows I have tried,
but I just cannot make it work with the information available. I can
find bits here and bits there, but there still seems to be something
missing, or is it just me. Debian 7.7, Pacemaker, Corosync and Ocfs2
work OK, it is just when you try to add CTDB.
can you share the bits from Debian to ocfs2? I'll set this up the next
day and see if I get ctdb to behave.
Cheerio!
-Ralph
OK, I based this on what Richard posted:

1. Create two VirtualBox VMs with enough memory and disk for your Linux
Distro. I used Debian 7.7 with 512MB and 8GB. You will also need an
extra interface on each VM for the clustering private network. I set
them to an internal type.

2. Because you will need a shared disk, create one:

vboxmanage createhd --filename ~/VirtualBox\ VMs/SharedHD1 --size 10240 \
  --variant Fixed --format VDI # Creates a 10GB fixed-size disk

vboxmanage modifyhd ~/VirtualBox\ VMs/SharedHD1.vdi --type shareable

Also, use the GUI to add the shared disk to both VMs.

3. Install the OS on each of the VMs.

4. Install clustering packages:

apt-get install openais corosync pacemaker ocfs2-tools-pacemaker dlm-pcmk

5. Configure corosync

nano /etc/corosync/corosync.conf

Make sure that bindnetaddr is defined and points to your private
interface. I set it to 192.168.1.0
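
bindnetaddr wants the network address of the private interface, not the host address. A small sketch of the arithmetic, assuming the private network is a /24 (the prefix length is my assumption, it is not stated above) and using a made-up helper name:

```shell
#!/bin/sh
# Sketch: derive the network address to put in bindnetaddr from a node's
# private IPv4 address and prefix length. network_addr is a hypothetical
# helper, not a corosync tool.
network_addr() {
    ip=$1
    prefix=$2
    # Split the dotted quad into $1..$4.
    old_ifs=$IFS
    IFS=.
    set -- $ip
    IFS=$old_ifs
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # Build the netmask from the prefix length and apply it.
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    net=$(( addr & mask ))
    printf '%d.%d.%d.%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
        $(( (net >> 8) & 255 )) $(( net & 255 ))
}
```

With the node addresses used later in these steps, network_addr 192.168.1.10 24 prints 192.168.1.0, which matches the value above.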

# Copy the file to the other node.
scp /etc/corosync/corosync.conf ***@192.168.0.7:/etc/corosync/

# ON BOTH NODES
service pacemaker stop # Stop these in case they were running
service corosync stop # Same here

nano /etc/default/corosync
Change:
START=no

To:

START=yes
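
If you would rather script that edit than open an editor on every node, something like this should do it. A sketch only; the function name is made up and it assumes the file contains a plain START=no line.

```shell
#!/bin/sh
# Sketch: flip START=no to START=yes in the corosync defaults file.
# enable_corosync_start is a hypothetical helper; the default path is the
# Debian one used in this step.
enable_corosync_start() {
    file=${1:-/etc/default/corosync}
    # Write to a temporary copy first, then overwrite the original in place
    # so its ownership and permissions are kept.
    sed 's/^START=no$/START=yes/' "$file" > "$file.tmp" &&
        cat "$file.tmp" > "$file" &&
        rm -f "$file.tmp"
}
```

Call enable_corosync_start as root on each node, with no argument to use the default path.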

# now start the cluster
service corosync start

# Also start it on the other node(s).

# Now check the status:
***@cluster1:~# crm_mon -1
============
Last updated: Mon Dec 15 10:46:20 2014
Last change: Mon Dec 15 10:44:18 2014 via crmd on cluster1
Stack: openais
Current DC: cluster1 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ cluster1 cluster2 ]

If you do not see all the other nodes online, then you have to debug the
problem.
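
To make that check less eyeball-dependent, here is a rough sketch that parses the "Online:" line. The helper name is made up, and it assumes the line has exactly the format shown in the output above.

```shell
#!/bin/sh
# Sketch: check that every expected node appears in the "Online: [ ... ]"
# line of "crm_mon -1". nodes_online is a hypothetical helper.
# stdin: crm_mon -1 output; arguments: the node names you expect online.
nodes_online() {
    online=$(sed -n 's/^Online: \[ \(.*\) ]$/\1/p')
    for n in "$@"; do
        case " $online " in
            *" $n "*) ;;          # node is listed as online
            *) return 1 ;;        # node missing from the Online line
        esac
    done
}
```

Usage would be along the lines of: crm_mon -1 | nodes_online cluster1 cluster2 && echo "all online"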

6. Configure the Oracle cluster

dpkg-reconfigure ocfs2-tools
Configuring ocfs2-tools
# Would you like to start an OCFS2 cluster (O2CB) at boot time?
#
# <Yes>
#
# Name of the cluster to start at boot time:
#
# ctdbdemo

# Create the ocfs2 cluster conf file
o2cb_ctl -C -n ctdbdemo -t cluster
o2cb_ctl -C -n cluster1 -t node -a number=1 -a ip_address=192.168.1.10 \
  -a ip_port=7777 -a cluster=ctdbdemo
o2cb_ctl -C -n cluster2 -t node -a number=2 -a ip_address=192.168.1.11 \
  -a ip_port=7777 -a cluster=ctdbdemo

# ON BOTH NODES
service corosync stop

# Copy files to the other node.
scp /etc/default/o2cb ***@192.168.0.7:/etc/default/
scp /etc/ocfs2/cluster.conf ***@192.168.0.7:/etc/ocfs2/

service o2cb start
service corosync start

crm configure property stonith-enabled=false

7. Create the shared file system on one node:

mkfs.ocfs2 -L CTDBdemocommon -T datafiles -N 4 /dev/sdb

8. Mount it on both and ensure that you can create files/dirs on one
node and see them on the other node.

mkdir /cluster
mount -t ocfs2 /dev/sdb /cluster
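
A quick way to do the "create on one node, see on the other" check, as a sketch. The helper names and the marker file name are made up. Note this only checks that file data is visible across nodes; it does not exercise fcntl locking at all.

```shell
#!/bin/sh
# Sketch: cross-node visibility check for the shared mount.
# write_marker and check_marker are hypothetical helpers.

# Run on the first node: writes a token into the shared directory and
# prints it so you can pass it to the other node.
write_marker() {
    dir=$1
    token="$(uname -n)-$$"
    printf '%s\n' "$token" > "$dir/.visibility-test" && printf '%s\n' "$token"
}

# Run on the second node: succeeds only if the marker file holds the token.
check_marker() {
    dir=$1
    expected=$2
    [ "$(cat "$dir/.visibility-test" 2>/dev/null)" = "$expected" ]
}
```

On one node run write_marker /cluster, then on the other run check_marker /cluster followed by the token it printed.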

This gets you a shared cluster??

For me it all goes pear shaped when I try to add CTDB.

If you find that I have missed something or done something wrong, then I
will not be surprised, info is very hard to find.

Rowland
Martin Schwenke
2014-12-16 23:45:41 UTC
Permalink
On Tue, 16 Dec 2014 21:12:12 +0000, Rowland Penny
Post by Rowland Penny
I ran the ping_pong test this morning, following the wiki page and as
far as I could see it passed all tests.
When I run "ping_pong /clusterfs/test.dat 3" on 1 node of a 2 node OCFS2
cluster, I see a very high locking rate - in the 10000s. When I run it
on another node I see the same high locking rate and I don't see the
rate drop on the 1st node. That's a fail.
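
Put as a rule of thumb in script form, purely as a sketch: note the locks/sec figure on the first node while it runs alone, note it again once the second node has started ping_pong on the same file, and compare. The helper name and the "less than half" threshold are arbitrary assumptions on my part, not something ping_pong defines.

```shell
#!/bin/sh
# Sketch: judge lock coherence from two ping_pong readings on the FIRST node.
#   solo   = locks/sec while only the first node runs ping_pong
#   shared = locks/sec on the first node after the second node joins
# ping_pong_verdict is a hypothetical helper; the factor of 2 is arbitrary.
ping_pong_verdict() {
    solo=$1
    shared=$2
    if [ "$shared" -lt $(( solo / 2 )) ]; then
        echo PASS   # rate dropped once the second node contended for locks
    else
        echo FAIL   # rate unchanged: the nodes are not seeing each other's locks
    fi
}

# The readings come from running this on each node in turn
# (3 = number of nodes + 1 for a 2 node cluster):
#   ping_pong /clusterfs/test.dat 3
```

For example, ping_pong_verdict 40000 39000 prints FAIL, which is the situation described above.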

This is on a cluster where I haven't worked out the extra steps to get
lock coherence.
Post by Rowland Penny
I have come to the conclusion that you need to be a CTDB dev to set CTDB
up, only they seem to have ALL the information required.
Sorry, but that line is starting to grate. I'm concerned that
statements like this are likely to put people off using CTDB. There are
many non-CTDB-devs out there running CTDB with other cluster
filesystems.

When the CTDB recovery lock is configured then CTDB has a hard
requirement that the cluster filesystem *must* provide lock coherence.
So the problem you have is a lack of lock coherence in OCFS2.
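
For reference, this is roughly the setting in question. A sketch only: the file location assumes Debian's /etc/default/ctdb, and the lock path assumes the /cluster mount point from Rowland's steps, with a directory name I made up.

```shell
# Sketch of /etc/default/ctdb (assumed Debian location), identical on all nodes.

# The recovery lock must live on the cluster filesystem, and that filesystem
# must provide coherent fcntl locks across nodes. The ctdb/ subdirectory and
# file name here are illustrative.
CTDB_RECOVERY_LOCK=/cluster/ctdb/.ctdb.lock

# One private address per line, same file on every node.
CTDB_NODES=/etc/ctdb/nodes

# Public addresses in "address/netmask interface" form.
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
```

The directory holding the lock file has to exist on the shared mount before CTDB starts.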

I am a CTDB dev. I haven't yet got OCFS2 working, partly due to lack
of time to figure out which pieces I'm missing. I have a simple recipe
that gets me to a similar point to where you are at and I haven't even
looked at corosync. At some time I will try to go through Richard's
instructions and try to distill out the part that adds lock coherence.

I was confused by the ping pong test results so I tried to clarify the
documentation for that test.

It seems like OCFS2 is stupendously difficult to set up with lock
coherence. This is not CTDB's fault. Perhaps you need to be an OCFS2
dev to set up CTDB with OCFS2? ;-)
Post by Rowland Penny
I absolutely give up, I cannot make it work, god knows I have tried, but
I just cannot make it work with the information available. I can find
bits here and bits there, but there still seems to be something missing,
or is it just me. Debian 7.7, Pacemaker, Corosync and Ocfs2 work OK, it
is just when you try to add CTDB.
If all those other things provided lock coherence on the cluster
filesystem then CTDB would work. So adding CTDB makes you notice the
problem but CTDB does not cause it. :-)

peace & happiness,
martin
Rowland Penny
2014-12-17 09:08:47 UTC
Permalink
Post by Martin Schwenke
On Tue, 16 Dec 2014 21:12:12 +0000, Rowland Penny
Post by Rowland Penny
I ran the ping_pong test this morning, following the wiki page and as
far as I could see it passed all tests.
When I run "ping_pong /clusterfs/test.dat 3" on 1 node of a 2 node OCFS2
cluster, I see a very high locking rate - in the 10000s. When I run it
on another node I see the same high locking rate and I don't see the
rate drop on the 1st node. That's a fail.
All I can say is that it did what the page said it would.
Post by Martin Schwenke
This is on a cluster where I haven't worked out the extra steps to get
lock coherence.
Post by Rowland Penny
I have come to the conclusion that you need to be a CTDB dev to set CTDB
up, only they seem to have ALL the information required.
Sorry, but that line is starting to grate. I'm concerned that
statements like this are likely to put people off using CTDB. There are
many non-CTDB-devs out there running CTDB with other cluster
filesystems.
Sorry if what I said upsets you, but I have put a lot of time into
trying to get this setup to work, but it seems to fail when I try to add
CTDB.
Post by Martin Schwenke
When the CTDB recovery lock is configured then CTDB has a hard
requirement that the cluster filesystem *must* provide lock coherence.
So the problem you have is a lack of lock coherence in OCFS2.
But it passes the ping_pong test.
Post by Martin Schwenke
I am a CTDB dev. I haven't yet got OCFS2 working, partly due to lack
of time to figure out which pieces I'm missing. I have a simple recipe
that gets me to a similar point to where you are at and I haven't even
looked at corosync. At some time I will try to go through Richard's
instructions and try to distill out the part that adds lock coherence.
I was confused by the ping pong test results so I tried to clarify the
documentation for that test.
It seems like OCFS2 is stupendously difficult to set up with lock
coherence. This is not CTDB's fault. Perhaps you need to be an OCFS2
dev to set up CTDB with OCFS2? ;-)
You could be right :-D
Post by Martin Schwenke
Post by Rowland Penny
I absolutely give up, I cannot make it work, god knows I have tried, but
I just cannot make it work with the information available. I can find
bits here and bits there, but there still seems to be something missing,
or is it just me. Debian 7.7, Pacemaker, Corosync and Ocfs2 work OK, it
is just when you try to add CTDB.
If all those other things provided lock coherence on the cluster
filesystem then CTDB would work. So adding CTDB makes you notice the
problem but CTDB does not cause it. :-)
I can well believe what you are saying, so it might help if CTDB could
print something in the logs.

Rowland
Post by Martin Schwenke
peace & happiness,
martin
Ralph Böhme
2014-12-17 09:27:16 UTC
Permalink
Hi Rowland,
Post by Richard Sharpe
Post by Ralph Böhme
can you share the bits from Debian to ocfs2? I'll set this up the next
day and see if I get ctdb to behave.
1. Create two VirtualBox VMs with enough memory and disk for your
Linux Distro. I used Debian 7.7 with 512MB and 8GB. You will also
need an extra interface on each VM for the clustering private
network. I set them to an internal type.
vboxmanage createhd --filename ~/VirtualBox\ VMs/SharedHD1 --size
10240 --variant Fixed --format VDI # Creates a 10GB fixed sized disk
vboxmanage modifyhd ~/VirtualBox\ VMs/SharedHD1.vdi --type shareable
Also, use the GUI to add the shared disk to both VMs.
3. Install the OS on each of the VMs.
apt-get install openais corosync pacemaker ocfs2-tools-pacemaker dlm-pcmk
5. Configure corosync
nano /etc/corosync/corosync.conf
Make sure that bindnetaddr is defined and points to your private
interface. I set it to 192.168.1.0
# Copy the file to the other node.
# ON BOTH NODES
service pacemaker stop # Stop these in case they were running
service corosync stop # Same here
nano /etc/default/corosync
START=no
START=yes
# now start the cluster
service corosync start
# Also start it on the other node(s).
============
Last updated: Mon Dec 15 10:46:20 2014
Last change: Mon Dec 15 10:44:18 2014 via crmd on cluster1
Stack: openais
Current DC: cluster1 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ cluster1 cluster2 ]
If you do not see all the other nodes online, then you have to debug
the problem.
6. Configure the Oracle cluster
dpkg-reconfigure ocfs2-tools
Configuring ocfs2-tools
# Would you like to start an OCFS2 cluster (O2CB) at boot time?
#
# <Yes>
#
#
# ctdbdemo
# Create the ocfs2 cluster conf file
o2cb_ctl -C -n ctdbdemo -t cluster
o2cb_ctl -C -n cluster1 -t node -a number=1 -a
ip_address=192.168.1.10 -a ip_port=7777 -a cluster=ctdbdemo
o2cb_ctl -C -n cluster2 -t node -a number=2 -a
ip_address=192.168.1.11 -a ip_port=7777 -a cluster=ctdbdemo
# ON BOTH NODES
service corosync stop
# Copy files to the other node.
service o2cb start
service corosync start
crm configure property stonith-enabled=false
mkfs.ocfs2 -L CTDBdemocommon -T datafiles -N 4 /dev/sdb
8. Mount it on both and ensure that you can create files/dirs on one
node and see them on the other node.
mkdir /cluster
mount -t ocfs2 /dev/sdb /cluster
This gets you a shared cluster??
For me it all goes pear shaped when I try to add CTDB.
If you find that I have missed something or done something wrong,
then I will not be surprised, info is very hard to find.
ok, this looks like you're enabling the lock manager (DLM) which is a
pita. Configuring ctdb is a breeze compared to that, I swear! ;)

The problem is a lack of documentation for configuring OCFS2 with
Pacemaker and friends on Debian and derived distros. Here are some
pointers I bookmarked over the last days:

<http://blog.clusterlabs.org/blog/2012/pacemaker-and-cluster-filesystems/>
<http://www.hastexo.com/resources/hints-and-kinks/ocfs2-pacemaker-debianubuntu>

Afaict even the official Pacemaker guide for Ubuntu doesn't cover
enabling the DLM:
<http://clusterlabs.org/quickstart-ubuntu.html>

The next step on my todo list for this is to head over to
<http://oss.clusterlabs.org/mailman/listinfo/pacemaker>

and see if someone can point us in the right direction.

I'll also look into enhancing ping_pong to provide a simple and
reliable test for cluster-wide locking, stay tuned. :)

-Ralph
--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:***@sernet.de