Discussion:
RAFT and CTDB
Richard Sharpe
2014-11-15 18:31:30 UTC
Hi Volker,

At SDC you mentioned that you have an implementation of RAFT and I
assumed, perhaps incorrectly, that you were thinking of using RAFT to
manage things like recovery in CTDB.

Can you tell me more about your ideas in this regard and point me at any code?

I am working on a project where I might be able to use some of this.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Volker Lendecke
2014-11-17 07:41:13 UTC
Post by Richard Sharpe
Hi Volker,
At SDC you mentioned that you have an implementation of RAFT and I
assumed, perhaps incorrectly, that you were thinking of using RAFT to
manage things like recovery in CTDB.
Can you tell me more about your ideas in this regard and point me at any code?
It's not finished yet, sorry. I have the basic algorithm and
configuration changes done, but log compaction is still
missing, so this is nothing for general consumption yet.

Apart from that, I want to have a dbwrap_raft eventually;
the main goal is to meet the persistence requirements that resilient
and persistent file handles need.

What would your project be?

With best regards,

Volker Lendecke
--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:***@sernet.de
Richard Sharpe
2014-11-17 21:20:39 UTC
On Sun, Nov 16, 2014 at 11:41 PM, Volker Lendecke
Post by Volker Lendecke
Post by Richard Sharpe
Hi Volker,
At SDC you mentioned that you have an implementation of RAFT and I
assumed, perhaps incorrectly, that you were thinking of using RAFT to
manage things like recovery in CTDB.
Can you tell me more about your ideas in this regard and point me at any code?
It's not finished yet, sorry. I have the basic algorithm and
configuration changes done, but log compaction is still
missing, so this is nothing for general consumption yet.
Apart from that, I want to have a dbwrap_raft eventually,
the main goal is to meet the persistence requirements that resilient
and persistent file handles need.
What would your project be?
Well, I have to get CTDB working with a clustered file system that
does not currently support fcntl locks :-(

I have also thought for a while that Corosync could be used to
implement all the functionality that CTDB needs ... and I might have a
look at that.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Michael Adam
2014-11-17 22:31:45 UTC
Post by Richard Sharpe
On Sun, Nov 16, 2014 at 11:41 PM, Volker Lendecke
Post by Volker Lendecke
Post by Richard Sharpe
At SDC you mentioned that you have an implementation of RAFT and I
assumed, perhaps incorrectly, that you were thinking of using RAFT to
manage things like recovery in CTDB.
Can you tell me more about your ideas in this regard and point me at any code?
It's not finished yet, sorry. I have the basic algorithm and
configuration changes done, but log compaction is still
missing, so this is nothing for general consumption yet.
Apart from that, I want to have a dbwrap_raft eventually,
the main goal is to meet the persistence requirements that resilient
and persistent file handles need.
What would your project be?
Well, I have to get CTDB working with a clustered file system that
does not currently support fcntl locks :-(
You don't necessarily need to.
CTDB only uses the fcntl lock for the recovery lock file,
which is its means of split-brain prevention.

It can run without a recovery lock file, but then you should
implement split-brain prevention for ctdb differently. I think we
still need good hooks in ctdb for mechanisms other than the
recovery lock.

Cheers - Michael
Richard Sharpe
2014-11-19 17:45:48 UTC
On Sun, Nov 16, 2014 at 11:41 PM, Volker Lendecke
Post by Volker Lendecke
Post by Richard Sharpe
Hi Volker,
At SDC you mentioned that you have an implementation of RAFT and I
assumed, perhaps incorrectly, that you were thinking of using RAFT to
manage things like recovery in CTDB.
Can you tell me more about your ideas in this regard and point me at any code?
It's not finished yet, sorry. I have the basic algorithm and
configuration changes done, but log compaction is still
missing, so this is nothing for general consumption yet.
Apart from that, I want to have a dbwrap_raft eventually,
the main goal is to meet the persistence requirements that resilient
and persistent file handles need.
So, you guys keep saying that but never let on what the issue with
CTDB as it stands is :-( Is there some sort of secret handshake
required?

It seems to me that the problem is a result of a design decision
taken by CTDB where some types of TDBs are fetched on demand ...
however, if a Samba node has just recorded info that must be
persistent, and no one fetches it before that node crashes, then we
have just lost that persistent info.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Volker Lendecke
2014-11-19 21:29:10 UTC
Post by Richard Sharpe
Post by Volker Lendecke
It's not finished yet, sorry. I have the basic algorithm and
configuration changes done, but log compaction is still
missing, so this is nothing for general consumption yet.
Apart from that, I want to have a dbwrap_raft eventually,
the main goal is to meet the persistence requirements that resilient
and persistent file handles need.
So, you guys keep saying that but never let on what the issue with
CTDB as it stands is :-( Is there some sort of secret handshake
required?
It seems to me the problem is that it is a result of a design decision
taken by CTDB where some types of TDBs are fetch on demand ...
however, if a Samba node has just recorded info that must be
persistent and no one fetches it before that node crashes then we have
just lost that persistent info.
Correct. The basic design decision of ctdb was an insight Tridge and
I had years ago: we CAN lose data in ctdb. The main goal was to make
locking.tdb fast. locking.tdb contains entries for all open files. If
you transfer locking.tdb data ownership to the node that holds the file
open, it does not have to tell everybody else proactively. It does not
even have to replicate the open file information for failover purposes,
because the information about open files on a node is worthless anyway
if the node crashes. All ctdb has to make sure of is that records are
transferred on demand when someone else actively asks for them. There
are some really deep subtleties around this, like getting rid
of deleted records in a correct and reasonably scalable manner without
them coming back as zombies or multiple nodes chasing the same record,
but ctdb as a pure on-demand data mover is basically it. This breaks down,
of course, if you want to hand out persistence guarantees for open files,
so we have to find ways somewhere between the nonpersistent (cheap/fast
writes, no guarantees) and persistent (expensive writes, all nodes always
have the same copy on rotating rust) databases.
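[The trade-off described here can be sketched as a toy model. The class and key names below are hypothetical illustrations, not ctdb's API: a non-persistent write touches only the owning node and dies with it, while a persistent write must reach every node before it returns.]

```python
# Toy model of the trade-off above (hypothetical names, not ctdb code):
# non-persistent writes are local and cheap, persistent writes are
# replicated to every node and therefore survive a node crash.

class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}

class NonPersistentDB:
    """Cheap writes: only the record's current owner holds the data."""
    def __init__(self, nodes):
        self.nodes = nodes

    def write(self, node, key, value):
        node.store[key] = value          # one local write, no network

    def crash(self, node):
        node.store.clear()               # data on a dead node is lost

class PersistentDB:
    """Expensive writes: every node gets a copy before write() returns."""
    def __init__(self, nodes):
        self.nodes = nodes

    def write(self, _node, key, value):
        for n in self.nodes:             # replicate to all nodes
            n.store[key] = value

    def crash(self, node):
        node.store.clear()               # survivors still hold a copy


nodes = [Node("a"), Node("b"), Node("c")]

np = NonPersistentDB(nodes)
np.write(nodes[0], "open-file", "handle-1")
np.crash(nodes[0])
print("open-file" in nodes[1].store)     # False: info died with the node

p = PersistentDB(nodes)
p.write(nodes[0], "lease", "handle-2")
p.crash(nodes[0])
print("lease" in nodes[1].store)         # True: a survivor has a copy
```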

Volker
--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:***@sernet.de
Richard Sharpe
2014-11-20 23:24:39 UTC
Post by Michael Adam
Post by Richard Sharpe
On Sun, Nov 16, 2014 at 11:41 PM, Volker Lendecke
Post by Volker Lendecke
Post by Richard Sharpe
At SDC you mentioned that you have an implementation of RAFT and I
assumed, perhaps incorrectly, that you were thinking of using RAFT to
manage things like recovery in CTDB.
Can you tell me more about your ideas in this regard and point me at any code?
It's not finished yet, sorry. I have the basic algorithm and
configuration changes done, but log compaction is still
missing, so this is nothing for general consumption yet.
Apart from that, I want to have a dbwrap_raft eventually,
the main goal is to meet the persistence requirements that resilient
and persistent file handles need.
What would your project be?
Well, I have to get CTDB working with a clustered file system that
does not currently support fcntl locks :-(
You don't necessarily need to.
CTDB only uses the fcntl lock for the recovery lock file
which is its means of split-brain prevention.
It can run without a recovery lock file.
But then you should try to implement split-brain prevention for
ctdb differently. I think we still need good hooks in ctdb for
mechanisms other than the recovery lock.
Hmmm, so the essential abstraction here is that any node that is no
longer a member of the cluster (because it can't get a lock on that
file) cannot try to run recovery. I.e., in ctdb_recovery_lock we try to
open the recovery lock file and then take out a lock on it.

The first should/will fail if we are no longer a member of the cluster
and the second will fail if the cluster properly supports fcntl locks
but another recovery daemon has already locked the file ...

To make this support other clustering approaches would probably
involve redesigning that somewhat.
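[That open-then-lock sequence can be sketched in a few lines. This is an illustrative sketch, not ctdb's actual implementation; the path and helper name are hypothetical.]

```python
import fcntl
import os

def try_take_recovery_lock(path):
    """Open the recovery lock file, then try a non-blocking fcntl lock.

    Returns the fd on success, None on failure -- roughly the two
    failure modes described above: open() fails if we can no longer
    reach the cluster file system, the lock fails if another recovery
    daemon already holds it. (Sketch only, not ctdb's code.)
    """
    try:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    except OSError:
        return None          # e.g. no longer a member of the FS cluster
    try:
        # fcntl byte-range lock; raises if another process holds it
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None          # another recovery daemon beat us to it
    return fd

fd = try_take_recovery_lock("/tmp/recovery.lock")
print(fd is not None)        # True if we got the lock
```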
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Martin Schwenke
2014-11-20 23:41:20 UTC
On Thu, 20 Nov 2014 15:24:39 -0800, Richard Sharpe
Post by Richard Sharpe
Hmmm, so the essential abstraction here is that any node that is no
longer a member of the cluster (because it can't get a lock on that
file) cannot try to run recovery. Ie, in ctdb_recovery_lock we try to
open the recovery lock file and then take out a lock on it.
The first should/will fail if we are no longer a member of the cluster
and the second will fail if the cluster properly supports fcntl locks
but another recovery daemon has already locked the file ...
No, only the recovery master can hold the recovery lock. Other nodes
would not be able to take the lock but they are still cluster members.

Cluster membership is defined by being connected to the node that is
currently the recovery master. That is, nodes that the recovery master
knows about (i.e. connected) and are active (i.e. not stopped or
banned) will take part in recovery.

If a node becomes disconnected then it will try to become the recovery
master of its own cluster. If it can take the recovery lock then it is
allowed to do that.

So the recovery lock simply helps to stop a split brain where there are
multiple clusters operating independently. Each would have
a different cluster database and so would have inconsistent ideas of, for
example, locking.tdb ... and this can obviously lead to file data
corruption.

peace & happiness,
martin
Richard Sharpe
2014-11-20 23:55:39 UTC
Post by Martin Schwenke
On Thu, 20 Nov 2014 15:24:39 -0800, Richard Sharpe
Post by Richard Sharpe
Hmmm, so the essential abstraction here is that any node that is no
longer a member of the cluster (because it can't get a lock on that
file) cannot try to run recovery. Ie, in ctdb_recovery_lock we try to
open the recovery lock file and then take out a lock on it.
The first should/will fail if we are no longer a member of the cluster
and the second will fail if the cluster properly supports fcntl locks
but another recovery daemon has already locked the file ...
No, only the recovery master can hold the recovery lock. Other nodes
would not be able to take the lock but they are still cluster members.
Isn't that what I said? When I said cluster above I was referring to a
GPFS cluster.
Post by Martin Schwenke
Cluster membership is defined by being connected to the node that is
currently the recovery master. That is, nodes that the recovery master
knows about (i.e. connected) and are active (i.e. not stopped or
banned) will take part in recovery.
OK, that is a wrinkle I had not thought of. What if they have lost
connection to the GPFS cluster but are still talking to the recovery
master?
Post by Martin Schwenke
If a node becomes disconnected then it will try to become the recovery
master of its own cluster. If it can take the recovery lock then it is
allowed to do that.
So the recovery lock simply helps to stop a split brain where there are
multiple independent clusters operating independently. Each would have
a different cluster database so would have inconsistent ideas of, for
example, locking.tdb... and this can obviously lead to file data
corruption.
peace & happiness,
martin
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Martin Schwenke
2014-11-21 00:04:32 UTC
On Thu, 20 Nov 2014 15:55:39 -0800, Richard Sharpe
Post by Richard Sharpe
Post by Martin Schwenke
On Thu, 20 Nov 2014 15:24:39 -0800, Richard Sharpe
Post by Richard Sharpe
Hmmm, so the essential abstraction here is that any node that is no
longer a member of the cluster (because it can't get a lock on that
file) cannot try to run recovery. Ie, in ctdb_recovery_lock we try to
open the recovery lock file and then take out a lock on it.
The first should/will fail if we are no longer a member of the cluster
and the second will fail if the cluster properly supports fcntl locks
but another recovery daemon has already locked the file ...
No, only the recovery master can hold the recovery lock. Other nodes
would not be able to take the lock but they are still cluster members.
Isn't that what I said? When I said cluster above I was referring to a
GPFS cluster.
CTDB has its own independent notion of cluster membership and I thought
you were referring to that. I didn't notice you mentioning GPFS. :-)
Post by Richard Sharpe
Post by Martin Schwenke
Cluster membership is defined by being connected to the node that is
currently the recovery master. That is, nodes that the recovery master
knows about (i.e. connected) and are active (i.e. not stopped or
banned) will take part in recovery.
OK, that is a wrinkle I had not thought of. What if they have lost
connection to the GPFS cluster but are still talking to the recovery
master?
Then you would hope that they can't take the recovery lock. ;-)

If a node in a break-away cluster (i.e. lost CTDB connection with
main cluster - perhaps just 1 node) wins an election then it will try to
become recovery master. When it tries to take the recovery lock and
fails it will ban itself. Rinse and repeat for other nodes in the
break-away cluster.

So, provided nodes in a break-away cluster can't take the recovery lock
then they will all get banned and can do no harm.

If such nodes can still take the recovery lock after being expelled
from the GPFS cluster then you should probably have the appropriate GPFS
callback shut down CTDB. Depending on the CTDB configuration, this will
probably take down Samba and other services, preventing any issues.
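[The election/ban behaviour described above can be condensed into a small decision function. This is an illustrative sketch, not ctdb code; the function name is hypothetical.]

```python
def breakaway_node_outcome(wins_election, can_take_recovery_lock):
    """Sketch of the split-brain protection described above (not ctdb
    code): a node in a break-away cluster that wins an election may
    only become recovery master if it can take the recovery lock;
    otherwise it bans itself and can do no harm."""
    if not wins_election:
        return "follower"              # another node runs recovery
    if can_take_recovery_lock:
        return "recovery master"       # dangerous if locking is broken!
    return "banned"                    # self-ban: split brain averted

# A break-away node that cannot take the lock bans itself:
print(breakaway_node_outcome(True, False))   # banned
# If broken fcntl locking lets it take the lock anyway, a second,
# independent recovery master (split brain) becomes possible:
print(breakaway_node_outcome(True, True))    # recovery master
```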

peace & happiness,
martin
Chan Min Wai
2014-11-21 02:08:53 UTC
Dear Martin,

Since we have touched on the lock:
I have some experience with how the lock is defined.

I pointed the lock at the shared ocfs2 cluster.

CTDB would not start and kept asking for the lock,
which is something I'm not sure about.

I followed this guide:
http://linuxcostablanca.blogspot.com/2014/07/samba4-cluster-for-ad-drbd-ocfs2-ctdb.html?m=1

The difference is that my ocfs2 is shared storage between the 2 nodes, and thus no DRBD.

Does the lock really work in this scenario?

Thank you.

PS: sorry to cut in like this.

Regards,
Min Wai, Chan
Post by Martin Schwenke
On Thu, 20 Nov 2014 15:55:39 -0800, Richard Sharpe
Post by Richard Sharpe
Post by Martin Schwenke
On Thu, 20 Nov 2014 15:24:39 -0800, Richard Sharpe
Post by Richard Sharpe
Hmmm, so the essential abstraction here is that any node that is no
longer a member of the cluster (because it can't get a lock on that
file) cannot try to run recovery. Ie, in ctdb_recovery_lock we try to
open the recovery lock file and then take out a lock on it.
The first should/will fail if we are no longer a member of the cluster
and the second will fail if the cluster properly supports fcntl locks
but another recovery daemon has already locked the file ...
No, only the recovery master can hold the recovery lock. Other nodes
would not be able to take the lock but they are still cluster members.
Isn't that what I said? When I said cluster above I was referring to a
GPFS cluster.
CTDB has its own independent notion of cluster membership and I thought
you were referring to that. I didn't notice you mentioning GPFS. :-)
Post by Richard Sharpe
Post by Martin Schwenke
Cluster membership is defined by being connected to the node that is
currently the recovery master. That is, nodes that the recovery master
knows about (i.e. connected) and are active (i.e. not stopped or
banned) will take part in recovery.
OK, that is a wrinkle I had not thought of. What if they have lost
connection to the GPFS cluster but are still talking to the recovery
master?
Then you would hope that they can't take the recovery lock. ;-)
If a node in a break-away cluster (i.e. lost CTDB connection with
main cluster - perhaps just 1 node) wins an election then it will try to
become recovery master. When it tries to take the recovery lock and
fails it will ban itself. Rinse and repeat for other nodes in the
break-away cluster.
So, provided nodes in a break-away cluster can't take the recovery lock
then they will all get banned and can do no harm.
If such nodes can still take the recovery lock after being expelled
from the GPFS cluster then you should probably have the appropriate GPFS
callback shutdown CTDB. Depending on the CTDB configuration, this will
probably take down Samba and other services, preventing any issues.
peace & happiness,
martin
steve
2014-11-21 08:27:33 UTC
Post by Chan Min Wai
Dear Martin,
Since we have touched on the lock:
I have some experience with how the lock is defined.
I pointed the lock at the shared ocfs2 cluster.
CTDB would not start and kept asking for the lock,
which is something I'm not sure about.
I followed this guide:
http://linuxcostablanca.blogspot.com/2014/07/samba4-cluster-for-ad-drbd-ocfs2-ctdb.html?m=1
The difference is that my ocfs2 is shared storage between the 2 nodes, and thus no DRBD.
Does the lock really work in this scenario?
Hi
Confirmed: CTDB will not start with the lock on the ocfs2 shared disk.
You have to do the sb separately, hence DRBD. Maybe it's ocfs2's fault?
Has anyone tried with another fs?
steve
2014-11-23 19:50:07 UTC
Post by Chan Min Wai
Dear Martin,
Since we have touched on the lock:
I have some experience with how the lock is defined.
I pointed the lock at the shared ocfs2 cluster.
CTDB would not start and kept asking for the lock,
which is something I'm not sure about.
I followed this guide:
http://linuxcostablanca.blogspot.com/2014/07/samba4-cluster-for-ad-drbd-ocfs2-ctdb.html?m=1
The difference is that my ocfs2 is shared storage between the 2 nodes, and thus no DRBD.
Does the lock really work in this scenario?
Thank you.
PS: sorry to cut in like this.
Regards,
Min Wai, Chan
Post by Martin Schwenke
On Thu, 20 Nov 2014 15:55:39 -0800, Richard Sharpe
Post by Richard Sharpe
Post by Martin Schwenke
On Thu, 20 Nov 2014 15:24:39 -0800, Richard Sharpe
Post by Richard Sharpe
Hmmm, so the essential abstraction here is that any node that is no
longer a member of the cluster (because it can't get a lock on that
file) cannot try to run recovery. Ie, in ctdb_recovery_lock we try to
open the recovery lock file and then take out a lock on it.
The first should/will fail if we are no longer a member of the cluster
and the second will fail if the cluster properly supports fcntl locks
but another recovery daemon has already locked the file ...
No, only the recovery master can hold the recovery lock. Other nodes
would not be able to take the lock but they are still cluster members.
Isn't that what I said? When I said cluster above I was referring to a
GPFS cluster.
CTDB has its own independent notion of cluster membership and I thought
you were referring to that. I didn't notice you mentioning GPFS. :-)
Post by Richard Sharpe
Post by Martin Schwenke
Cluster membership is defined by being connected to the node that is
currently the recovery master. That is, nodes that the recovery master
knows about (i.e. connected) and are active (i.e. not stopped or
banned) will take part in recovery.
OK, that is a wrinkle I had not thought of. What if they have lost
connection to the GPFS cluster but are still talking to the recovery
master?
Then you would hope that they can't take the recovery lock. ;-)
If a node in a break-away cluster (i.e. lost CTDB connection with
main cluster - perhaps just 1 node) wins an election then it will try to
become recovery master. When it tries to take the recovery lock and
fails it will ban itself. Rinse and repeat for other nodes in the
break-away cluster.
So, provided nodes in a break-away cluster can't take the recovery lock
then they will all get banned and can do no harm.
If such nodes can still take the recovery lock after being expelled
from the GPFS cluster then you should probably have the appropriate GPFS
callback shutdown CTDB. Depending on the CTDB configuration, this will
probably take down Samba and other services, preventing any issues.
peace & happiness,
martin
@Chan: Please see the thread 'Re: posix locking on OCFS2'.
We are being asked for information to solve the lock problem :)
You will most likely be able to supply:

- precise versions of software used (file system, ctdb, ...)
- exact description of what fails
- configuration (ctdb, file system, ...)
- logs (ctdb, syslog/file system ...)

Cheers,
Steve
Min Wai Chan
2014-11-24 05:31:52 UTC
Dear Steve,

OK, I will send it over when no one is logged in to the server, maybe
around midnight.

I saw something about the ping_pong -rw test,
something like below:

ping_pong -rw test.dat N

Do you need this test result as well?

I just wonder if we mounted the ocfs2 wrong?
Is there a way to mount ocfs2 via user space?


Thank You.
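[What ping_pong -rw exercises is coherent fcntl byte-range locking on the cluster file system. A single-host probe of the same mechanism can be sketched as below; this is an illustrative script with a hypothetical file path, not a replacement for running ping_pong from several nodes.]

```python
import fcntl
import os

def fcntl_locks_enforced(path):
    """Fork a child and check that the parent's exclusive fcntl lock
    really blocks the child's non-blocking attempt. Returns True when
    byte-range locks are enforced on the file. Single-host probe only;
    ping_pong run from several nodes is the real cluster test."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    fcntl.lockf(fd, fcntl.LOCK_EX)           # parent takes the lock
    pid = os.fork()
    if pid == 0:                             # child process
        cfd = os.open(path, os.O_RDWR)
        try:
            fcntl.lockf(cfd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            os._exit(1)                      # got the lock: NOT enforced
        except OSError:
            os._exit(0)                      # blocked, as it should be
    _, status = os.waitpid(pid, 0)
    os.close(fd)
    return os.WEXITSTATUS(status) == 0

# On a local file system this prints True; on a cluster file system
# whose fcntl locking is broken it would print False.
print(fcntl_locks_enforced("/tmp/lock-probe.dat"))
```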
Post by steve
Post by Chan Min Wai
Dear Martin,
Since we have touched on the lock:
I have some experience with how the lock is defined.
I pointed the lock at the shared ocfs2 cluster.
CTDB would not start and kept asking for the lock,
which is something I'm not sure about.
I followed this guide:
http://linuxcostablanca.blogspot.com/2014/07/samba4-cluster-for-ad-drbd-ocfs2-ctdb.html?m=1
The difference is that my ocfs2 is shared storage between the 2 nodes, and thus no DRBD.
Does the lock really work in this scenario?
Thank you.
PS: sorry to cut in like this.
Regards,
Min Wai, Chan
Post by Martin Schwenke
On Thu, 20 Nov 2014 15:55:39 -0800, Richard Sharpe
Post by Richard Sharpe
Post by Martin Schwenke
On Thu, 20 Nov 2014 15:24:39 -0800, Richard Sharpe
Post by Richard Sharpe
Hmmm, so the essential abstraction here is that any node that is no
longer a member of the cluster (because it can't get a lock on that
file) cannot try to run recovery. Ie, in ctdb_recovery_lock we try to
open the recovery lock file and then take out a lock on it.
The first should/will fail if we are no longer a member of the cluster
and the second will fail if the cluster properly supports fcntl locks
but another recovery daemon has already locked the file ...
No, only the recovery master can hold the recovery lock. Other nodes
would not be able to take the lock but they are still cluster members.
Isn't that what I said? When I said cluster above I was referring to a
GPFS cluster.
CTDB has its own independent notion of cluster membership and I thought
you were referring to that. I didn't notice you mentioning GPFS. :-)
Post by Richard Sharpe
Post by Martin Schwenke
Cluster membership is defined by being connected to the node that is
currently the recovery master. That is, nodes that the recovery master
knows about (i.e. connected) and are active (i.e. not stopped or
banned) will take part in recovery.
OK, that is a wrinkle I had not thought of. What if they have lost
connection to the GPFS cluster but are still talking to the recovery
master?
Then you would hope that they can't take the recovery lock. ;-)
If a node in a break-away cluster (i.e. lost CTDB connection with
main cluster - perhaps just 1 node) wins an election then it will try to
become recovery master. When it tries to take the recovery lock and
fails it will ban itself. Rinse and repeat for other nodes in the
break-away cluster.
So, provided nodes in a break-away cluster can't take the recovery lock
then they will all get banned and can do no harm.
If such nodes can still take the recovery lock after being expelled
from the GPFS cluster then you should probably have the appropriate GPFS
callback shutdown CTDB. Depending on the CTDB configuration, this will
probably take down Samba and other services, preventing any issues.
peace & happiness,
martin
@Chan: Please see the thread: 'Re: posix locking on OCFS2'
We are being asked for information to solve the lock problem:)
- precise versions of software used (file system, ctdb, ...)
- exact description of what fails
- configuration (ctdb, file system, ...)
- logs (ctdb, syslog/file system ...)
Cheers,
Steve
Richard Sharpe
2014-11-24 15:24:33 UTC
Post by Min Wai Chan
Dear Steve,
OK, I will send it over when no one is logged in to the server, maybe
around midnight.
I saw something about the ping_pong -rw test,
something like below:
ping_pong -rw test.dat N
Do you need this test result as well?
I just wonder if we mounted the ocfs2 wrong?
Is there a way to mount ocfs2 via user space?
It is not an issue of mounting ocfs2 incorrectly, I believe.

Rather, you have to have their user-space posix locking daemon running.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Rowland Penny
2014-11-24 15:35:06 UTC
Post by Richard Sharpe
Post by Min Wai Chan
Dear Steve,
OK, I will send it over when no one is logged in to the server, maybe
around midnight.
I saw something about the ping_pong -rw test,
something like below:
ping_pong -rw test.dat N
Do you need this test result as well?
I just wonder if we mounted the ocfs2 wrong?
Is there a way to mount ocfs2 via user space?
It is not an issue of mounting ocfs2 incorrectly, I believe.
Rather, you have to have their user-space posix locking daemon running.
I think I see a possible answer: does this mean that you would get two
locking daemons, the CTDB one AND the OCFS2 one?

Rowland
Volker Lendecke
2014-11-24 15:38:36 UTC
Post by Rowland Penny
Post by Richard Sharpe
Post by Min Wai Chan
Dear Steve,
OK, I will send it over when no one is logged in to the server, maybe
around midnight.
I saw something about the ping_pong -rw test,
something like below:
ping_pong -rw test.dat N
Do you need this test result as well?
I just wonder if we mounted the ocfs2 wrong?
Is there a way to mount ocfs2 via user space?
It is not an issue of mounting ocfs2 incorrectly, I believe.
Rather, you have to have their user-space posix locking daemon running.
I think I see a possible answer, does this mean that you would get
two locking daemons, the CTDB one AND the OCFS2 one ??
Assuming ocfs2 really requires a daemon to do fcntl locking,
then yes, there will be two daemons. ctdb will rely on
ocfs2's daemon for the reclockfile. Both serve different
purposes.

Volker
--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:***@sernet.de
Ralph Böhme
2014-11-24 15:43:29 UTC
Post by Rowland Penny
Post by Richard Sharpe
Post by Min Wai Chan
Dear Steve,
OK, I will send it over when no one is logged in to the server, maybe
around midnight.
I saw something about the ping_pong -rw test,
something like below:
ping_pong -rw test.dat N
Do you need this test result as well?
I just wonder if we mounted the ocfs2 wrong?
Is there a way to mount ocfs2 via user space?
It is not an issue of mounting ocfs2 incorrectly, I believe.
Rather, you have to have their user-space posix locking daemon running.
I think I see a possible answer, does this mean that you would get
two locking daemons, the CTDB one AND the OCFS2 one ??
Probably not. CTDB is not a locking daemon. IIRC the locking facility
is referred to as DLM (Distributed Lock Manager) in OCFS2 lingo, cf.
<http://doc.opensuse.org/products/draft/SLE-HA/SLE-ha-guide_sd_draft/cha.ha.ocfs2.html>

Cheerio!
-Ralph
--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:***@sernet.de
Rowland Penny
2014-11-24 15:43:37 UTC
Post by Volker Lendecke
Post by Rowland Penny
Post by Richard Sharpe
Post by Min Wai Chan
Dear Steve,
OK, I will send it over when no one is logged in to the server, maybe
around midnight.
I saw something about the ping_pong -rw test,
something like below:
ping_pong -rw test.dat N
Do you need this test result as well?
I just wonder if we mounted the ocfs2 wrong?
Is there a way to mount ocfs2 via user space?
It is not an issue of mounting ocfs2 incorrectly, I believe.
Rather, you have to have their user-space posix locking daemon running.
I think I see a possible answer, does this mean that you would get
two locking daemons, the CTDB one AND the OCFS2 one ??
Assuming ocfs2 really requires a daemon to do fcntl locking,
then yes, there will be two daemons. ctdb will rely on
ocfs2's daemon for the reclockfile. Both serve different
purposes.
Volker
OK, but would/could they work together, or could they interfere with
each other and cause problems with the smbd daemon, i.e. stop it from starting?

Rowland
Richard Sharpe
2014-11-24 17:49:34 UTC
Post by Volker Lendecke
Post by Rowland Penny
Post by Richard Sharpe
Post by Min Wai Chan
Dear Steve,
OK, I will send it over when no one is logged in to the server, maybe
around midnight.
I saw something about the ping_pong -rw test,
something like below:
ping_pong -rw test.dat N
Do you need this test result as well?
I just wonder if we mounted the ocfs2 wrong?
Is there a way to mount ocfs2 via user space?
It is not an issue of mounting ocfs2 incorrectly, I believe.
Rather, you have to have their user-space posix locking daemon running.
I think I see a possible answer, does this mean that you would get
two locking daemons, the CTDB one AND the OCFS2 one ??
Assuming ocfs2 really requires a daemon to do fcntl locking,
then yes, there will be two daemons. ctdb will rely on
ocfs2's daemon for the reclockfile. Both serve different
purposes.
Volker
OK, but would/could they work together, or could they interfere with each
other and cause problems with the smbd daemon i.e stop it starting.
Firstly, Ralph is correct. ctdb does not use a locking daemon.

What happens is that the ctdb recovery daemon tries to take out an
fcntl lock on an ocfs2 file in the cluster. This is sent to their
user-space lock daemon that uses Corosync Closed Process Groups to
implement their distributed lock manager. If their DLM is not running
or not properly configured you will have problems and ctdb will be
unable to come up correctly.

They are two separate daemons performing separate functions.

I do not have an OCFS2 setup but it would be good to get to the bottom of this.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Richard Sharpe
2014-11-25 20:09:38 UTC
Dear All,
What fails:
Both CTDB nodes will go into non-stop recovery.
When there is only one node, it still works,
but not with both nodes.
This appears to be the problem:

2014/11/26 02:18:26.363173 [recoverd: 9883]: ctdb_control error:
'managed to lock reclock file from inside daemon'
2014/11/26 02:18:26.363257 [recoverd: 9883]: ctdb_control error:
'managed to lock reclock file from inside daemon'
2014/11/26 02:18:26.363292 [recoverd: 9883]: Async operation failed
with ret=-1 res=-1 opcode=16
2014/11/26 02:18:26.363315 [recoverd: 9883]: Async wait failed - fail_count=1
2014/11/26 02:18:26.363334 [recoverd: 9883]:
server/ctdb_recoverd.c:393 Unable to set recovery mode. Recovery
failed.

Something is going wrong with locking.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Rowland Penny
2014-11-25 20:25:09 UTC
Permalink
Post by Richard Sharpe
Dear All,
What fail...
Both CTDB will start non-stop recovery...
When there is only one node, it is still working
but not on both node...
'managed to lock reclock file from inside daemon'
'managed to lock reclock file from inside daemon'
2014/11/26 02:18:26.363292 [recoverd: 9883]: Async operation failed
with ret=-1 res=-1 opcode=16
2014/11/26 02:18:26.363315 [recoverd: 9883]: Async wait failed - fail_count=1
server/ctdb_recoverd.c:393 Unable to set recovery mode. Recovery
failed.
Something is going wrong with locking.
Do you think it could have anything to do with posix file locking ??
Richard Sharpe
2014-11-25 20:53:08 UTC
Permalink
Post by Rowland Penny
Post by Richard Sharpe
Dear All,
What fail...
Both CTDB will start non-stop recovery...
When there is only one node, it is still working
but not on both node...
'managed to lock reclock file from inside daemon'
'managed to lock reclock file from inside daemon'
2014/11/26 02:18:26.363292 [recoverd: 9883]: Async operation failed
with ret=-1 res=-1 opcode=16
2014/11/26 02:18:26.363315 [recoverd: 9883]: Async wait failed -
fail_count=1
server/ctdb_recoverd.c:393 Unable to set recovery mode. Recovery
failed.
Something is going wrong with locking.
Do you think it could have anything to do with posix file locking ??
Hmmm, here is where the error is coming from:

samba-v.x.y.z/ctdb/server/ctdb_recoverd.c

	/* read the childs status when trying to lock the reclock file.
	   child wrote 0 if everything is fine and 1 if it did manage
	   to lock the file, which would be a problem since that means
	   we got a request to exit from recovery but we could still lock
	   the file which at this time SHOULD be locked by the recovery
	   daemon on the recmaster
	*/
	ret = sys_read(state->fd[0], &c, 1);
	if (ret != 1 || c != 0) {
		ctdb_request_control_reply(state->ctdb, state->c, NULL, -1,
			"managed to lock reclock file from inside daemon");
		talloc_free(state);
		return;
	}

What version of Samba is Chan Min Wai running? There are some missing
log messages that are in Master but not in the log above, so I suspect
he/she is running a different version to the code I currently have
available.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Min Wai Chan
2014-11-26 00:54:18 UTC
Permalink
Dear Richard

Samba 4.1.12

But since I'm on Gentoo, I think I'd better specify the build options,
to be safe.



smbd -b
Build environment:
Built by: ***@amtbsrv01
Built on: Mon Sep 29 01:32:15 MYT 2014
Built using: x86_64-pc-linux-gnu-gcc
Build host: Linux amtbsrv01 3.14.14-gentoo #1 SMP Sun Aug 17 23:21:08 MYT 2014 x86_64 Intel(R) Xeon(R) CPU E5-2407 0 @ 2.20GHz GenuineIntel GNU/Linux
SRCDIR:
/var/tmp/portage/net-fs/samba-4.1.12/work/samba-4.1.12/source3
BUILDDIR:
/var/tmp/portage/net-fs/samba-4.1.12/work/samba-4.1.12/source3

Paths:
SBINDIR: /usr/sbin
BINDIR: /usr/bin
CONFIGFILE: /etc/samba/smb.conf
LOGFILEBASE: /var/log/samba
LMHOSTSFILE: /etc/samba/lmhosts
LIBDIR: /usr/lib64
MODULESDIR: /usr/lib64/samba
SHLIBEXT: so
LOCKDIR: /var/lock/samba
STATEDIR: /var/lib/samba
CACHEDIR: /var/cache/samba
PIDDIR: /var/run/samba
SMB_PASSWD_FILE: /var/lib/samba/private/smbpasswd
PRIVATE_DIR: /var/lib/samba/private

System Headers:
HAVE_SYS_ACL_H
HAVE_SYS_CAPABILITY_H
HAVE_SYS_CDEFS_H
HAVE_SYS_DIR_H
HAVE_SYS_EPOLL_H
HAVE_SYS_EVENTFD_H
HAVE_SYS_FCNTL_H
HAVE_SYS_FILE_H
HAVE_SYS_INOTIFY_H
HAVE_SYS_IOCTL_H
HAVE_SYS_IPC_H
HAVE_SYS_KERNEL_PROC_CORE_PATTERN
HAVE_SYS_MMAN_H
HAVE_SYS_MOUNT_H
HAVE_SYS_PARAM_H
HAVE_SYS_PRCTL_H
HAVE_SYS_QUOTAS
HAVE_SYS_QUOTA_H
HAVE_SYS_RESOURCE_H
HAVE_SYS_SELECT_H
HAVE_SYS_SENDFILE_H
HAVE_SYS_SHM_H
HAVE_SYS_SOCKET_H
HAVE_SYS_STATFS_H
HAVE_SYS_STATVFS_H
HAVE_SYS_STAT_H
HAVE_SYS_STROPTS_H
HAVE_SYS_SYSCALL_H
HAVE_SYS_SYSCTL_H
HAVE_SYS_SYSLOG_H
HAVE_SYS_SYSMACROS_H
HAVE_SYS_TERMIOS_H
HAVE_SYS_TIMEB_H
HAVE_SYS_TIMES_H
HAVE_SYS_TIME_H
HAVE_SYS_TYPES_H
HAVE_SYS_UCONTEXT_H
HAVE_SYS_UIO_H
HAVE_SYS_UNISTD_H
HAVE_SYS_UN_H
HAVE_SYS_UTSNAME_H
HAVE_SYS_VFS_H
HAVE_SYS_WAIT_H
HAVE_SYS_XATTR_H

Headers:
HAVE_ACL_LIBACL_H
HAVE_AIO_H
HAVE_ALLOCA_H
HAVE_ARPA_INET_H
HAVE_ARPA_NAMESER_H
HAVE_ASM_TYPES_H
HAVE_ASM_UNISTD_H
HAVE_ASN1_ERR_H
HAVE_ASSERT_H
HAVE_ATTR_ATTRIBUTES_H
HAVE_ATTR_XATTR_H
HAVE_BYTESWAP_H
HAVE_COM_ERR_H
HAVE_CONFIG_H
HAVE_CRYPT_H
HAVE_CTDB_H
HAVE_CTDB_PRIVATE_H
HAVE_CTDB_PROTOCOL_H
HAVE_CTYPE_H
HAVE_CUPS_CUPS_H
HAVE_CUPS_LANGUAGE_H
HAVE_CURSES_H
HAVE_DIRENT_H
HAVE_DLFCN_H
HAVE_ENDIAN_H
HAVE_ERRNO_H
HAVE_ERR_H
HAVE_EXECINFO_H
HAVE_FAM_H
HAVE_FCNTL_H
HAVE_FLOAT_H
HAVE_FNMATCH_H
HAVE_FORM_H
HAVE_GCRYPT_H
HAVE_GETOPT_H
HAVE_GLOB_H
HAVE_GNUTLS_GNUTLS_H
HAVE_GNUTLS_X509_H
HAVE_GRP_H
HAVE_GSSAPI_GSSAPI_H
HAVE_GSSAPI_GSSAPI_KRB5_H
HAVE_GSSAPI_GSSAPI_SPNEGO_H
HAVE_GSSAPI_H
HAVE_HCRYPTO_MD4_H
HAVE_HDB_H
HAVE_HEIMBASE_H
HAVE_HEIMNTLM_H
HAVE_HX509_H
HAVE_ICONV_H
HAVE_IFADDRS_H
HAVE_INIPARSER_H
HAVE_INTTYPES_H
HAVE_KDC_H
HAVE_KRB5_H
HAVE_KRB5_LOCATE_PLUGIN_H
HAVE_LANGINFO_H
HAVE_LASTLOG_H
HAVE_LBER_H
HAVE_LDAP_H
HAVE_LIBAIO_H
HAVE_LIBINTL_H
HAVE_LIMITS_H
HAVE_LINUX_FALLOC_H
HAVE_LINUX_FCNTL_H
HAVE_LINUX_IOCTL_H
HAVE_LINUX_NETLINK_H
HAVE_LINUX_RTNETLINK_H
HAVE_LINUX_TYPES_H
HAVE_LOCALE_H
HAVE_MALLOC_H
HAVE_MEMORY_H
HAVE_MENU_H
HAVE_MNTENT_H
HAVE_NCURSES_H
HAVE_NETDB_H
HAVE_NETINET_IN_H
HAVE_NETINET_IN_SYSTM_H
HAVE_NETINET_IP_H
HAVE_NETINET_TCP_H
HAVE_NET_IF_H
HAVE_NSS_H
HAVE_PANEL_H
HAVE_POLL_H
HAVE_POPT_H
HAVE_PTHREAD_H
HAVE_PTY_H
HAVE_PWD_H
HAVE_PYTHON_H
HAVE_READLINE_HISTORY_H
HAVE_READLINE_READLINE_H
HAVE_RESOLV_H
HAVE_ROKEN_H
HAVE_RPCSVC_NIS_H
HAVE_RPCSVC_RQUOTA_H
HAVE_RPCSVC_YPCLNT_H
HAVE_RPCSVC_YP_PROT_H
HAVE_RPC_RPC_H
HAVE_SASL_SASL_H
HAVE_SECURITY_PAM_APPL_H
HAVE_SECURITY_PAM_EXT_H
HAVE_SECURITY_PAM_MODULES_H
HAVE_SECURITY__PAM_MACROS_H
HAVE_SETJMP_H
HAVE_SHADOW_H
HAVE_SIGNAL_H
HAVE_STDARG_H
HAVE_STDBOOL_H
HAVE_STDDEF_H
HAVE_STDINT_H
HAVE_STDIO_H
HAVE_STDLIB_H
HAVE_STRINGS_H
HAVE_STRING_H
HAVE_STROPTS_H
HAVE_SYSCALL_H
HAVE_SYSLOG_H
HAVE_TERMCAP_H
HAVE_TERMIOS_H
HAVE_TERMIO_H
HAVE_TERM_H
HAVE_TIME_H
HAVE_UNISTD_H
HAVE_UTIME_H
HAVE_WIND_H
HAVE_XFS_DMAPI_H
HAVE_XFS_XQM_H
HAVE_ZLIB_H

UTMP Options:
HAVE_GETUTMPX
HAVE_UTMPX_H
HAVE_UTMP_H
HAVE_UT_UT_EXIT
HAVE_UT_UT_HOST
HAVE_UT_UT_ID
HAVE_UT_UT_NAME
HAVE_UT_UT_PID
HAVE_UT_UT_TIME
HAVE_UT_UT_TV
HAVE_UT_UT_TYPE
HAVE_UT_UT_USER
PUTUTLINE_RETURNS_UTMP
SIZEOF_UTMP_UT_LINE
WITH_UTMP

HAVE_* Defines:
HAVE_ACL_GET_FILE
HAVE_ADDR_TYPE_IN_KRB5_ADDRESS
HAVE_AIO
HAVE_AIO_CANCEL
HAVE_AIO_ERROR
HAVE_AIO_FSYNC
HAVE_AIO_READ
HAVE_AIO_RETURN
HAVE_AIO_SUSPEND
HAVE_AIO_WRITE
HAVE_AP_OPTS_USE_SUBKEY
HAVE_ASPRINTF
HAVE_ATEXIT
HAVE_ATTRIBUTE_COLD
HAVE_ATTRIBUTE_CONST
HAVE_ATTRIBUTE_NORETURN
HAVE_ATTRIBUTE_PRINTF
HAVE_ATTRIBUTE_UNUSED
HAVE_ATTRIBUTE_USED
HAVE_BACKTRACE
HAVE_BACKTRACE_SYMBOLS
HAVE_BER_SCANF
HAVE_BER_SOCKBUF_ADD_IO
HAVE_BER_TAG_T
HAVE_BINDTEXTDOMAIN
HAVE_BIND_TEXTDOMAIN_CODESET
HAVE_BLKCNT_T
HAVE_BLKSIZE_T
HAVE_BOOL
HAVE_BSWAP_64
HAVE_BUILTIN_CHOOSE_EXPR
HAVE_BUILTIN_CLZ
HAVE_BUILTIN_CLZL
HAVE_BUILTIN_CLZLL
HAVE_BUILTIN_CONSTANT_P
HAVE_BUILTIN_EXPECT
HAVE_BUILTIN_POPCOUNTL
HAVE_BUILTIN_TYPES_COMPATIBLE_P
HAVE_BZERO
HAVE_C99_VSNPRINTF
HAVE_CAP_GET_PROC
HAVE_CCAN
HAVE_CHARSET_CP850
HAVE_CHARSET_UTF_8
HAVE_CHECKSUM_IN_KRB5_CHECKSUM
HAVE_CHMOD
HAVE_CHOWN
HAVE_CHROOT
HAVE_CLOCK_GETTIME
HAVE_CLOCK_MONOTONIC
HAVE_CLOCK_PROCESS_CPUTIME_ID
HAVE_CLOCK_REALTIME
HAVE_COMPARISON_FN_T
HAVE_COMPILER_WILL_OPTIMIZE_OUT_FNS
HAVE_COMPOUND_LITERALS
HAVE_COM_ERR
HAVE_COM_RIGHT_R
HAVE_CONNECT
HAVE_CPPFUNCTION
HAVE_CRYPT
HAVE_CTDB_CONTROL_CHECK_SRVIDS_DECL
HAVE_CTDB_CONTROL_SCHEDULE_FOR_DELETION_DECL
HAVE_CTDB_CONTROL_TRANS3_COMMIT_DECL
HAVE_CTDB_WANT_READONLY_DECL
HAVE_CUPS
HAVE_DECL_ASPRINTF
HAVE_DECL_BINDTEXTDOMAIN
HAVE_DECL_BIND_TEXTDOMAIN_CODESET
HAVE_DECL_DGETTEXT
HAVE_DECL_DLOPEN
HAVE_DECL_FDATASYNC
HAVE_DECL_GETGRENT_R
HAVE_DECL_GETPWENT_R
HAVE_DECL_GETTEXT
HAVE_DECL_H_ERRNO
HAVE_DECL_KRB5_AUTH_CON_SET_REQ_CKSUMTYPE
HAVE_DECL_KRB5_GET_CREDENTIALS_FOR_USER
HAVE_DECL_READAHEAD
HAVE_DECL_RL_EVENT_HOOK
HAVE_DECL_SNPRINTF
HAVE_DECL_STRPTIME
HAVE_DECL_TEXTDOMAIN
HAVE_DECL_VASPRINTF
HAVE_DECL_VSNPRINTF
HAVE_DECL__RES
HAVE_DEVICE_MAJOR_FN
HAVE_DEVICE_MINOR_FN
HAVE_DGETTEXT
HAVE_DIRENT_D_OFF
HAVE_DIRFD
HAVE_DIRFD_DECL
HAVE_DLCLOSE
HAVE_DLERROR
HAVE_DLOPEN
HAVE_DLSYM
HAVE_DM_GET_EVENTLIST
HAVE_DN_EXPAND
HAVE_DPRINTF
HAVE_DUP2
HAVE_ENCTYPE_AES128_CTS_HMAC_SHA1_96
HAVE_ENCTYPE_AES256_CTS_HMAC_SHA1_96
HAVE_ENCTYPE_ARCFOUR_HMAC
HAVE_ENCTYPE_ARCFOUR_HMAC_MD5
HAVE_ENCTYPE_ARCFOUR_HMAC_MD5_56
HAVE_ENDHOSTENT
HAVE_ENDMNTENT
HAVE_ENDNETGRENT
HAVE_ENDNETGRENT_PROTOTYPE
HAVE_ENVIRON_DECL
HAVE_EPOLL
HAVE_EPOLL_CREATE
HAVE_ERR
HAVE_ERRNO_DECL
HAVE_ERRX
HAVE_ETYPE_IN_ENCRYPTEDDATA
HAVE_EXECL
HAVE_E_DATA_POINTER_IN_KRB5_ERROR
HAVE_FALLOCATE
HAVE_FAMNOEXISTS
HAVE_FAMOPEN2
HAVE_FAM_H_FAMCODES_TYPEDEF
HAVE_FCHMOD
HAVE_FCHOWN
HAVE_FCNTL_LOCK
HAVE_FCVT
HAVE_FDATASYNC
HAVE_FDATASYNC_DECL
HAVE_FDOPENDIR
HAVE_FGETXATTR
HAVE_FLAGS_IN_KRB5_CREDS
HAVE_FLEXIBLE_ARRAY_MEMBER
HAVE_FLISTXATTR
HAVE_FLOCK
HAVE_FREEADDRINFO
HAVE_FREEIFADDRS
HAVE_FREE_CHECKSUM
HAVE_FREMOVEXATTR
HAVE_FRSIZE
HAVE_FSEEKO
HAVE_FSETXATTR
HAVE_FSID_INT
HAVE_FSTATAT
HAVE_FSYNC
HAVE_FTRUNCATE
HAVE_FTRUNCATE_EXTEND
HAVE_FUNCTION_ATTRIBUTE_DESTRUCTOR
HAVE_FUNCTION_MACRO
HAVE_FUTIMENS
HAVE_FUTIMES
HAVE_F_SETLEASE_DECL
HAVE_GAI_STRERROR
HAVE_GCRY_CONTROL
HAVE_GETADDRINFO
HAVE_GETCWD
HAVE_GETDIRENTRIES
HAVE_GETGRENT
HAVE_GETGRENT_R
HAVE_GETGRENT_R_DECL
HAVE_GETGRGID_R
HAVE_GETGRNAM
HAVE_GETGRNAM_R
HAVE_GETGROUPLIST
HAVE_GETHOSTBYADDR
HAVE_GETHOSTBYNAME
HAVE_GETHOSTBYNAME_R
HAVE_GETHOSTENT
HAVE_GETHOSTENT_R
HAVE_GETHOSTNAME
HAVE_GETIFADDRS
HAVE_GETMNTENT
HAVE_GETNAMEINFO
HAVE_GETNETGRENT
HAVE_GETNETGRENT_PROTOTYPE
HAVE_GETPAGESIZE
HAVE_GETPGRP
HAVE_GETPWENT_R
HAVE_GETPWENT_R_DECL
HAVE_GETPWNAM
HAVE_GETPWNAM_R
HAVE_GETPWUID_R
HAVE_GETQUOTA_RSLT_GETQUOTA_RSLT_U
HAVE_GETRLIMIT
HAVE_GETSPNAM
HAVE_GETTEXT
HAVE_GETTIMEOFDAY_TZ
HAVE_GETUTXENT
HAVE_GETXATTR
HAVE_GET_CURRENT_DIR_NAME
HAVE_GLOB
HAVE_GNUTLS
HAVE_GNUTLS_DATUM
HAVE_GNUTLS_DATUM_T
HAVE_GNUTLS_GLOBAL_INIT
HAVE_GNUTLS_X509_CRT_SET_SUBJECT_KEY_ID
HAVE_GNUTLS_X509_CRT_SET_VERSION
HAVE_GPG_ERR_CODE_FROM_ERRNO
HAVE_GRANTPT
HAVE_GSSAPI
HAVE_GSSKRB5_EXTRACT_AUTHZ_DATA_FROM_SEC_CONTEXT
HAVE_GSSKRB5_GET_SUBKEY
HAVE_GSS_DISPLAY_STATUS
HAVE_GSS_EXPORT_CRED
HAVE_GSS_IMPORT_CRED
HAVE_GSS_INQUIRE_SEC_CONTEXT_BY_OID
HAVE_GSS_KRB5_EXPORT_LUCID_SEC_CONTEXT
HAVE_GSS_KRB5_IMPORT_CRED
HAVE_GSS_OID_EQUAL
HAVE_GSS_OID_TO_NAME
HAVE_GSS_WRAP_IOV
HAVE_HDB_DB_DIR
HAVE_HEIM_CMP
HAVE_HEIM_NTLM_NTLMV2_KEY
HAVE_HISTORY_LIST
HAVE_HSTRERROR
HAVE_HTTPCONNECT
HAVE_HTTPCONNECTENCRYPT
HAVE_HX509_BITSTRING_PRINT
HAVE_H_ERRNO
HAVE_ICONV
HAVE_ICONV_OPEN
HAVE_IFACE_GETIFADDRS
HAVE_IF_NAMETOINDEX
HAVE_IMMEDIATE_STRUCTURES
HAVE_INET_ATON
HAVE_INET_NTOA
HAVE_INET_NTOP
HAVE_INET_PTON
HAVE_INIPARSER_LOAD
HAVE_INITGROUPS
HAVE_INITIALIZE_ASN1_ERROR_TABLE
HAVE_INITIALIZE_KRB5_ERROR_TABLE
HAVE_INITSCR
HAVE_INNETGR
HAVE_INOTIFY
HAVE_INO_T
HAVE_INT16_T
HAVE_INT32_T
HAVE_INT64_T
HAVE_INT8_T
HAVE_INTPTR_T
HAVE_IO_SUBMIT
HAVE_IPRINT
HAVE_IPV6
HAVE_IPV6_V6ONLY
HAVE_IRUSEROK
HAVE_ISATTY
HAVE_ISBLANK
HAVE_ITEM_COUNT
HAVE_KDC_LOG
HAVE_KERNEL_CHANGE_NOTIFY
HAVE_KERNEL_OPLOCKS_LINUX
HAVE_KERNEL_SHARE_MODES
HAVE_KRB5
HAVE_KRB5_ADDRESSES
HAVE_KRB5_ANYADDR
HAVE_KRB5_AUTH_CON_SETKEY
HAVE_KRB5_CC_GET_LIFETIME
HAVE_KRB5_CREATE_CHECKSUM
HAVE_KRB5_CRYPTO
HAVE_KRB5_CRYPTO_DESTROY
HAVE_KRB5_CRYPTO_INIT
HAVE_KRB5_C_VERIFY_CHECKSUM
HAVE_KRB5_ENCTYPE_TO_STRING
HAVE_KRB5_ENCTYPE_TO_STRING_WITH_KRB5_CONTEXT_ARG
HAVE_KRB5_FREE_ERROR_CONTENTS
HAVE_KRB5_FREE_HOST_REALM
HAVE_KRB5_FREE_UNPARSED_NAME
HAVE_KRB5_FWD_TGT_CREDS
HAVE_KRB5_GET_CREDS
HAVE_KRB5_GET_CREDS_OPT_ALLOC
HAVE_KRB5_GET_CREDS_OPT_SET_IMPERSONATE
HAVE_KRB5_GET_DEFAULT_IN_TKT_ETYPES
HAVE_KRB5_GET_HOST_REALM
HAVE_KRB5_GET_INIT_CREDS_KEYBLOCK
HAVE_KRB5_GET_INIT_CREDS_OPT_ALLOC
HAVE_KRB5_GET_INIT_CREDS_OPT_FREE
HAVE_KRB5_GET_INIT_CREDS_OPT_GET_ERROR
HAVE_KRB5_GET_INIT_CREDS_OPT_SET_PAC_REQUEST
HAVE_KRB5_GET_PW_SALT
HAVE_KRB5_GET_RENEWED_CREDS
HAVE_KRB5_KEYBLOCK_INIT
HAVE_KRB5_KEYBLOCK_KEYVALUE
HAVE_KRB5_KEYTAB_ENTRY_KEYBLOCK
HAVE_KRB5_KRBHST_GET_ADDRINFO
HAVE_KRB5_KRBHST_INIT
HAVE_KRB5_KT_COMPARE
HAVE_KRB5_KT_FREE_ENTRY
HAVE_KRB5_KU_OTHER_CKSUM
HAVE_KRB5_MAKE_PRINCIPAL
HAVE_KRB5_MK_REQ_EXTENDED
HAVE_KRB5_PDU_NONE_DECL
HAVE_KRB5_PRINCIPAL_COMPARE_ANY_REALM
HAVE_KRB5_PRINCIPAL_GET_COMP_STRING
HAVE_KRB5_PRINCIPAL_GET_NUM_COMP
HAVE_KRB5_PRINCIPAL_GET_REALM
HAVE_KRB5_REALM_TYPE
HAVE_KRB5_SET_DEFAULT_IN_TKT_ETYPES
HAVE_KRB5_SET_REAL_TIME
HAVE_KRB5_STRING_TO_KEY
HAVE_KRB5_STRING_TO_KEY_SALT
HAVE_KRB_STRUCT_WINSIZE
HAVE_LARGEFILE
HAVE_LBER_LOG_PRINT_FN
HAVE_LCHOWN
HAVE_LDAP
HAVE_LDAP_ADD_RESULT_ENTRY
HAVE_LDAP_INIT
HAVE_LDAP_INITIALIZE
HAVE_LDAP_INIT_FD
HAVE_LDAP_OPT_SOCKBUF
HAVE_LDAP_SASL_WRAPPING
HAVE_LDAP_SET_REBIND_PROC
HAVE_LDB
HAVE_LIBACL
HAVE_LIBAIO
HAVE_LIBASN1
HAVE_LIBATTR
HAVE_LIBCAP
HAVE_LIBCOM_ERR
HAVE_LIBCRYPT
HAVE_LIBCUPS
HAVE_LIBDL
HAVE_LIBDM
HAVE_LIBFAM
HAVE_LIBFORM
HAVE_LIBGCRYPT
HAVE_LIBGNUTLS
HAVE_LIBGPG_ERROR
HAVE_LIBGSSAPI
HAVE_LIBHCRYPTO
HAVE_LIBHDB
HAVE_LIBHEIMBASE
HAVE_LIBHEIMNTLM
HAVE_LIBHX509
HAVE_LIBINIPARSER
HAVE_LIBKDC
HAVE_LIBKRB5
HAVE_LIBLBER
HAVE_LIBLDAP
HAVE_LIBMENU
HAVE_LIBNCURSES
HAVE_LIBNSL
HAVE_LIBPAM
HAVE_LIBPANEL
HAVE_LIBPOPT
HAVE_LIBPTHREAD
HAVE_LIBREADLINE
HAVE_LIBREPLACE
HAVE_LIBRESOLV
HAVE_LIBROKEN
HAVE_LIBRT
HAVE_LIBSASL2
HAVE_LIBUTIL
HAVE_LIBWIND
HAVE_LIBZ
HAVE_LINK
HAVE_LINUX_FALLOCATE
HAVE_LINUX_INOTIFY
HAVE_LINUX_IOCTL
HAVE_LINUX_KERNEL_AIO
HAVE_LINUX_READAHEAD
HAVE_LINUX_SPLICE
HAVE_LISTXATTR
HAVE_LITTLE_ENDIAN
HAVE_LLSEEK
HAVE_LOFF_T
HAVE_LONGLONG
HAVE_LONG_LONG
HAVE_LSTAT
HAVE_LUTIMES
HAVE_MAKEDEV
HAVE_MD4_INIT
HAVE_MEMALIGN
HAVE_MEMCPY
HAVE_MEMMEM
HAVE_MEMMOVE
HAVE_MEMSET
HAVE_MKDIR_MODE
HAVE_MKDTEMP
HAVE_MKNOD
HAVE_MKTIME
HAVE_MLOCK
HAVE_MLOCKALL
HAVE_MMAP
HAVE_MREMAP
HAVE_MSGHDR_MSG_CONTROL
HAVE_MUNLOCK
HAVE_MUNLOCKALL
HAVE_NANOSLEEP
HAVE_NATIVE_ICONV
HAVE_NCURSES
HAVE_NEW_FIELD
HAVE_NEW_FORM
HAVE_NEW_LIBREADLINE
HAVE_NEW_PANEL
HAVE_NFS_QUOTAS
HAVE_NTDB
HAVE_OPENAT
HAVE_OPENPTY
HAVE_OPEN_O_DIRECT
HAVE_PAM_GET_DATA
HAVE_PAM_RADIO_TYPE
HAVE_PAM_RHOST
HAVE_PAM_START
HAVE_PAM_TTY
HAVE_PAM_VSYSLOG
HAVE_PATHCONF
HAVE_PEERCRED
HAVE_PERL_MAKEMAKER
HAVE_PIPE
HAVE_POLL
HAVE_POPT
HAVE_POPTGETCONTEXT
HAVE_POSIX_ACLS
HAVE_POSIX_CAPABILITIES
HAVE_POSIX_FADVISE
HAVE_POSIX_FALLOCATE
HAVE_POSIX_MEMALIGN
HAVE_POSIX_OPENPT
HAVE_PRCTL
HAVE_PREAD
HAVE_PREAD_DECL
HAVE_PRINTF
HAVE_PTHREAD
HAVE_PTHREAD_ATTR_INIT
HAVE_PTHREAD_CREATE
HAVE_PTRDIFF_T
HAVE_PUTENV
HAVE_PUTUTLINE
HAVE_PUTUTXLINE
HAVE_PWRITE
HAVE_PWRITE_DECL
HAVE_PYLDB_UTIL
HAVE_PYTALLOC_UTIL
HAVE_QUOTACTL_LINUX
HAVE_RAND
HAVE_RANDOM
HAVE_RCMD
HAVE_READAHEAD_DECL
HAVE_READLINK
HAVE_READV
HAVE_REALPATH
HAVE_REMOVEXATTR
HAVE_RENAME
HAVE_RES_NSEARCH
HAVE_RES_SEARCH
HAVE_RK_SOCKET_SET_REUSEADDR
HAVE_RL_COMPLETION_MATCHES
HAVE_SASL
HAVE_SASL_CLIENT_INIT
HAVE_SA_FAMILY_T
HAVE_SA_SIGINFO_DECL
HAVE_SECURE_MKSTEMP
HAVE_SELECT
HAVE_SENDFILE
HAVE_SENDMSG
HAVE_SETBUFFER
HAVE_SETEGID
HAVE_SETENV
HAVE_SETENV_DECL
HAVE_SETEUID
HAVE_SETGID
HAVE_SETGROUPS
HAVE_SETHOSTENT
HAVE_SETITIMER
HAVE_SETLINEBUF
HAVE_SETLOCALE
HAVE_SETMNTENT
HAVE_SETNETGRENT
HAVE_SETNETGRENT_PROTOTYPE
HAVE_SETPGID
HAVE_SETREGID
HAVE_SETRESGID
HAVE_SETRESGID_DECL
HAVE_SETRESUID
HAVE_SETRESUID_DECL
HAVE_SETREUID
HAVE_SETSID
HAVE_SETUID
HAVE_SETXATTR
HAVE_SET_MENU_ITEMS
HAVE_SHARED_MMAP
HAVE_SHMGET
HAVE_SHM_OPEN
HAVE_SHOW_PANEL
HAVE_SIGACTION
HAVE_SIGBLOCK
HAVE_SIGPROCMASK
HAVE_SIGSET
HAVE_SIG_ATOMIC_T_TYPE
HAVE_SIMPLE_C_PROG
HAVE_SIZE_T
HAVE_SNPRINTF
HAVE_SOCKET
HAVE_SOCKETPAIR
HAVE_SOCKLEN_T
HAVE_SPLICE_DECL
HAVE_SRAND
HAVE_SRANDOM
HAVE_SSIZE_T
HAVE_SS_FAMILY
HAVE_STATFS_F_FSID
HAVE_STATVFS
HAVE_STATVFS_F_FLAG
HAVE_STAT_HIRES_TIMESTAMPS
HAVE_STAT_ST_BLKSIZE
HAVE_STAT_ST_BLOCKS
HAVE_STAT_TV_NSEC
HAVE_STRCASECMP
HAVE_STRCASESTR
HAVE_STRCHR
HAVE_STRCPY
HAVE_STRDUP
HAVE_STRERROR
HAVE_STRERROR_R
HAVE_STRFTIME
HAVE_STRNCASECMP
HAVE_STRNCPY
HAVE_STRNDUP
HAVE_STRNLEN
HAVE_STRPBRK
HAVE_STRPTIME
HAVE_STRSEP
HAVE_STRSIGNAL
HAVE_STRTOK_R
HAVE_STRTOL
HAVE_STRTOLL
HAVE_STRTOQ
HAVE_STRTOULL
HAVE_STRTOUQ
HAVE_STRUCT_ADDRINFO
HAVE_STRUCT_CTDB_CONTROL_TCP
HAVE_STRUCT_CTDB_CONTROL_TCP_ADDR
HAVE_STRUCT_IFADDRS
HAVE_STRUCT_SIGEVENT
HAVE_STRUCT_SIGEVENT_SIGEV_VALUE_SIVAL_PTR
HAVE_STRUCT_SOCKADDR
HAVE_STRUCT_SOCKADDR_IN6
HAVE_STRUCT_SOCKADDR_STORAGE
HAVE_STRUCT_STAT_ST_MTIM_TV_NSEC
HAVE_STRUCT_STAT_ST_RDEV
HAVE_STRUCT_TIMESPEC
HAVE_STRUCT_WINSIZE
HAVE_ST_RDEV
HAVE_SUBUNIT
HAVE_SWAB
HAVE_SYMLINK
HAVE_SYSCALL
HAVE_SYSCONF
HAVE_SYSCTL
HAVE_SYSLOG
HAVE_TALLOC
HAVE_TDB
HAVE_TEVENT
HAVE_TEXTDOMAIN
HAVE_TGETENT
HAVE_TIMEGM
HAVE_TYPEOF
HAVE_UCONTEXT_T
HAVE_UINT16_T
HAVE_UINT32_T
HAVE_UINT64_T
HAVE_UINT8_T
HAVE_UINTPTR_T
HAVE_UMASK
HAVE_UNAME
HAVE_UNIXSOCKET
HAVE_UNSETENV
HAVE_UPDWTMP
HAVE_UPDWTMPX
HAVE_USLEEP
HAVE_UTIMBUF
HAVE_UTIME
HAVE_UTIMENSAT
HAVE_UTIMES
HAVE_U_CHAR
HAVE_U_INT32_T
HAVE_VASPRINTF
HAVE_VA_COPY
HAVE_VDPRINTF
HAVE_VISIBILITY_ATTR
HAVE_VOLATILE
HAVE_VSNPRINTF
HAVE_VSYSLOG
HAVE_WAIT4
HAVE_WAITPID
HAVE_WARN
HAVE_WARNX
HAVE_WARN_UNUSED_RESULT
HAVE_WIND_STRINGPREP
HAVE_WORKING_STRPTIME
HAVE_WRITEV
HAVE_WS_XPIXEL
HAVE_WS_YPIXEL
HAVE_XATTR_SUPPORT
HAVE_XFS_QUOTAS
HAVE_YP_GET_DEFAULT_DOMAIN
HAVE_ZLIB
HAVE_ZLIBVERSION
HAVE__Bool
HAVE__RES
HAVE__VA_ARGS__MACRO
HAVE___CLOSE
HAVE___DN_EXPAND
HAVE___DUP2
HAVE___FCNTL
HAVE___FORK
HAVE___FSTAT
HAVE___FXSTAT
HAVE___LSEEK
HAVE___LSTAT
HAVE___LXSTAT
HAVE___OPEN
HAVE___READ
HAVE___STAT
HAVE___SYNC_FETCH_AND_ADD
HAVE___WRITE
HAVE___XSTAT

--with Options:
WITH_ADS
WITH_AUTOMOUNT
WITH_DNS_UPDATES
WITH_PAM
WITH_PAM_MODULES
WITH_PTHREADPOOL
WITH_QUOTAS
WITH_SENDFILE
WITH_SYSLOG
WITH_WINBIND

Build Options:
AD_DC_BUILD_IS_ENABLED
BROKEN_NISPLUS_INCLUDE_FILES
BUILD_SYSTEM
CLUSTER_SUPPORT
COMPILER_SUPPORTS_LL
CONFIG_H_IS_FROM_SAMBA
DEFAULT_DOS_CHARSET
DEFAULT_UNIX_CHARSET
ENABLE_GNUTLS
GETCWD_TAKES_NULL
INLINE_MACRO
KRB5_CREDS_OPT_FREE_REQUIRES_CONTEXT
KRB5_PRINC_REALM_RETURNS_REALM
LDAP_DEPRECATED
LDAP_SET_REBIND_PROC_ARGS
LIBREPLACE_NETWORK_CHECKS
LINUX
LINUX_SENDFILE_API
REALPATH_TAKES_NULL
RETSIGTYPE
SAMBA4_USES_HEIMDAL
SAMBA_FAM_LIBS
SEEKDIR_RETURNS_VOID
SHLIBEXT
SIZEOF_BLKCNT_T_8
SIZEOF_BOOL
SIZEOF_CHAR
SIZEOF_DEV_T
SIZEOF_INO_T
SIZEOF_INT
SIZEOF_INT16_T
SIZEOF_INT32_T
SIZEOF_INT64_T
SIZEOF_INT8_T
SIZEOF_LONG
SIZEOF_LONG_LONG
SIZEOF_OFF_T
SIZEOF_SHORT
SIZEOF_SIZE_T
SIZEOF_SSIZE_T
SIZEOF_TIME_T
SIZEOF_UINT16_T
SIZEOF_UINT32_T
SIZEOF_UINT64_T
SIZEOF_UINT8_T
SIZEOF_VOID_P
STAT_STATVFS
STAT_ST_BLOCKSIZE
STDC_HEADERS
STRING_STATIC_MODULES
SUMMARY_PASSES
SYSCONF_SC_NGROUPS_MAX
SYSCONF_SC_NPROCESSORS_ONLN
SYSCONF_SC_PAGESIZE
SYSTEM_UNAME_MACHINE
SYSTEM_UNAME_RELEASE
SYSTEM_UNAME_SYSNAME
SYSTEM_UNAME_VERSION
TIME_T_MAX
TIME_WITH_SYS_TIME
USEABLE_DMAPI_LIBRARY
USE_DMAPI
USE_LINUX_THREAD_CREDENTIALS
USING_SYSTEM_ASN1
USING_SYSTEM_COMPILE_ET
USING_SYSTEM_COM_ERR
USING_SYSTEM_GSSAPI
USING_SYSTEM_HCRYPTO
USING_SYSTEM_HDB
USING_SYSTEM_HEIMBASE
USING_SYSTEM_HEIMNTLM
USING_SYSTEM_HX509
USING_SYSTEM_INIPARSER
USING_SYSTEM_KDC
USING_SYSTEM_KRB5
USING_SYSTEM_LDB
USING_SYSTEM_NTDB
USING_SYSTEM_PARSE_YAPP_DRIVER
USING_SYSTEM_POPT
USING_SYSTEM_PYLDB_UTIL
USING_SYSTEM_PYNTDB
USING_SYSTEM_PYTALLOC_UTIL
USING_SYSTEM_PYTDB
USING_SYSTEM_PYTEVENT
USING_SYSTEM_ROKEN
USING_SYSTEM_SUBUNIT
USING_SYSTEM_TALLOC
USING_SYSTEM_TDB
USING_SYSTEM_TEVENT
USING_SYSTEM_WIND
VALUEOF_NSIG
VALUEOF_SIGRTMAX
VALUEOF_SIGRTMIN
VALUEOF__NSIG
VOID_RETSIGTYPE
WORKING_GETCONF_LFS_CFLAGS
XSLTPROC_MANPAGES
_GNU_SOURCE
_HAVE_SENDFILE
_HAVE_UNBROKEN_POSIX_FALLOCATE
_SAMBA_BUILD_
_XOPEN_SOURCE_EXTENDED
__TIME_T_MAX
auth_script_init
idmap_ad_init
idmap_autorid_init
idmap_hash_init
idmap_rfc2307_init
idmap_rid_init
idmap_tdb2_init
offset_t
static_decl_auth
static_decl_charset
static_decl_gpext
static_decl_idmap
static_decl_nss_info
static_decl_pdb
static_decl_perfcount
static_decl_vfs
static_init_auth
static_init_charset
static_init_gpext
static_init_idmap
static_init_nss_info
static_init_pdb
static_init_perfcount
static_init_vfs
uint_t
vfs_acl_tdb_init
vfs_acl_xattr_init
vfs_aio_fork_init
vfs_aio_linux_init
vfs_aio_posix_init
vfs_aio_pthread_init
vfs_audit_init
vfs_btrfs_init
vfs_cap_init
vfs_catia_init
vfs_commit_init
vfs_crossrename_init
vfs_default_quota_init
vfs_dirsort_init
vfs_expand_msdfs_init
vfs_extd_audit_init
vfs_fake_perms_init
vfs_fileid_init
vfs_full_audit_init
vfs_linux_xfs_sgid_init
vfs_media_harmony_init
vfs_netatalk_init
vfs_notify_fam_init
vfs_posix_eadb_init
vfs_preopen_init
vfs_readahead_init
vfs_readonly_init
vfs_recycle_init
vfs_scannedonly_init
vfs_shadow_copy2_init
vfs_shadow_copy_init
vfs_smb_traffic_analyzer_init
vfs_streams_depot_init
vfs_streams_xattr_init
vfs_syncops_init
vfs_time_audit_init
vfs_xattr_tdb_init

Type sizes:
sizeof(char): 1
sizeof(int): 4
sizeof(long): 8
sizeof(long long): 8
sizeof(uint8): 1
sizeof(uint16): 2
sizeof(uint32): 4
sizeof(short): 2
sizeof(void*): 8
sizeof(size_t): 8
sizeof(off_t): 8
sizeof(ino_t): 8
sizeof(dev_t): 8

Builtin modules:
vfs_posixacl pdb_smbpasswd pdb_tdbsam pdb_wbc_sam auth_sam auth_unix
auth_winbind auth_wbc auth_domain auth_builtin vfs_default
nss_info_template idmap_tdb idmap_passdb idmap_nss pdb_samba_dsdb
auth_samba4 vfs_dfs_samba4 pdb_ldapsam idmap_ldap
Post by Richard Sharpe
Post by Rowland Penny
Post by Richard Sharpe
Dear All,
What fail...
Both CTDB will start non-stop recovery...
When there is only one node, it is still working
but not on both node...
'managed to lock reclock file from inside daemon'
'managed to lock reclock file from inside daemon'
2014/11/26 02:18:26.363292 [recoverd: 9883]: Async operation failed
with ret=-1 res=-1 opcode=16
2014/11/26 02:18:26.363315 [recoverd: 9883]: Async wait failed -
fail_count=1
server/ctdb_recoverd.c:393 Unable to set recovery mode. Recovery
failed.
Something is going wrong with locking.
Do you think it could have anything to do with posix file locking ??
samba-v.x.y.z/ctdb/server/ctdb_recoverd.c
/* read the childs status when trying to lock the reclock file.
child wrote 0 if everything is fine and 1 if it did manage
to lock the file, which would be a problem since that means
we got a request to exit from recovery but we could still lock
the file which at this time SHOULD be locked by the recovery
daemon on the recmaster
*/
ret = sys_read(state->fd[0], &c, 1);
if (ret != 1 || c != 0) {
ctdb_request_control_reply(state->ctdb, state->c,
NULL, -1, "managed to lock reclock file from inside daemon");
talloc_free(state);
return;
}
What version of Samba is Chan Min Wai running? There are some missing
log messages that are in Master but not in the log above, so I suspect
he/she is running a different version to the code I currently have
available.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Richard Sharpe
2014-11-26 20:31:15 UTC
Permalink
Dear Richard,
Added back original recipients ...
The CTDB version is 2.5.4.
The latest.
I wrote this
http://wiki.gentoo.org/wiki/Samba_Cluster
You can still use the version on poly-c temporary portage.
We are still looking for a better way to start and stop the process.
OK, this appears to be more complex than I first thought.

1. At least one recovery has occurred.
 
2. During a second or subsequent recovery attempt, it appears that
the posix lock on the reclock file was lost.

Was there anything in the OCFS2 logs indicating that a daemon died
around 02:18:25?
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Richard Sharpe
2014-11-27 00:08:46 UTC
Permalink
On Wed, Nov 26, 2014 at 12:31 PM, Richard Sharpe
Post by Richard Sharpe
Dear Richard,
Added back original recipients ...
The CTDB version is 2.5.4.
The latest.
I wrote this
http://wiki.gentoo.org/wiki/Samba_Cluster
You can still use the version on poly-c temporary portage.
We are still looking for a better way to start and stop the process.
OK, this appears to be more complex than I first thought.
1. At least one recovery has occurred,
2. During a second, or subsequent recovery, attempt, it appears that
the posix lock on the reclock file was lost.
Was there anything in the OCFS2 logs indicating that a daemon died
around 02:18:25?
Interestingly, it seems that OCFS2 and GFS use the same
cluster-enabled DLM. I am going to see if I can bring up an OCFS-2
cluster over the Thanksgiving break to see if I can figure this
problem out.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Min Wai Chan
2014-11-27 17:17:50 UTC
Permalink
Dear Richard,
 
I've tried it again.
 
There are no OCFS2 error messages in the kernel log...
 
and the files and directories seem to be working fine...
Post by Richard Sharpe
On Wed, Nov 26, 2014 at 12:31 PM, Richard Sharpe
Post by Richard Sharpe
Dear Richard,
Added back original recipients ...
The CTDB version is 2.5.4.
The latest.
I wrote this
http://wiki.gentoo.org/wiki/Samba_Cluster
You can still use the version on poly-c temporary portage.
We are still looking for a better way to start and stop the process.
OK, this appears to be more complex than I first thought.
1. At least one recovery has occurred,
2. During a second, or subsequent recovery, attempt, it appears that
the posix lock on the reclock file was lost.
Was there anything in the OCFS2 logs indicating that a daemon died
around 02:18:25?
Interestingly, it seems that OCFS2 and GFS use the same
cluster-enabled DLM. I am going to see if I can bring up an OCFS-2
cluster over the Thanksgiving break to see if I can figure this
problem out.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Richard Sharpe
2014-11-27 17:58:32 UTC
Permalink
Post by Min Wai Chan
Deal Richard,
I've try it again.
There is no OCFS error message on kernel...
and the files directory seem to be working fine...
OK.

I have a pair of CentOS 6.6 VMs with OCFS2, the Oracle
kernel and the dlm-pcmk component installed.

Now I just have to fight with the cluster config to get it all working
and then build and install Samba 4.1.12 and CTDB 2.5.4.
Post by Min Wai Chan
On Thu, Nov 27, 2014 at 8:08 AM, Richard Sharpe
Post by Richard Sharpe
On Wed, Nov 26, 2014 at 12:31 PM, Richard Sharpe
Post by Richard Sharpe
Dear Richard,
Added back original recipients ...
The CTDB version is 2.5.4.
The latest.
I wrote this
http://wiki.gentoo.org/wiki/Samba_Cluster
You can still use the version on poly-c temporary portage.
We are still looking for a better way to start and stop the process.
OK, this appears to be more complex than I first thought.
1. At least one recovery has occurred,
2. During a second, or subsequent recovery, attempt, it appears that
the posix lock on the reclock file was lost.
Was there anything in the OCFS2 logs indicating that a daemon died
around 02:18:25?
Interestingly, it seems that OCFS2 and GFS use the same
cluster-enabled DLM. I am going to see if I can bring up an OCFS-2
cluster over the Thanksgiving break to see if I can figure this
problem out.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
steve
2014-11-27 18:27:37 UTC
Permalink
Post by Richard Sharpe
I have a pair of CentOS 6.6 VMs installed with OCFS2 and the Oracle
kernel and the dlm-pcmk component installed.
Now I just have to fight with the cluster config to get it all working
and then build and install Samba 4.1.12 and CTDB 2.5.4.
Phew. Good job you've got that documentation sorted. Best of luck from
all of us here in Alicante:)
steve
2014-12-01 21:14:02 UTC
Permalink
CTDB_SOCKET=/var/lib/ctdb/ctdb.socke
We have:
ctdb.socket
Min Wai Chan
2014-12-02 02:56:35 UTC
Permalink
Dear Steve,

I think that is a typo...

Will correct that :)
CTDB_SOCKET=/var/lib/ctdb/ctdb.socke
ctdb.socket
Richard Sharpe
2014-12-02 20:45:47 UTC
Permalink
On Thu, Nov 27, 2014 at 9:58 AM, Richard Sharpe
Post by Richard Sharpe
Post by Min Wai Chan
Deal Richard,
I've try it again.
There is no OCFS error message on kernel...
and the files directory seem to be working fine...
OK.
I have a pair of CentOS 6.6 VMs installed with OCFS2 and the Oracle
kernel and the dlm-pcmk component installed.
Now I just have to fight with the cluster config to get it all working
and then build and install Samba 4.1.12 and CTDB 2.5.4.
Just an update on this. After an almost Herculean struggle (somewhat
like the Augean Stables) with OCFS2 and CentOS 6.6 and CMAN/PACEMAKER
I now have a 2-node cluster set up and sharing a file system via
VirtualBox.

The problem was that the latest versions of ocfs2-tools do not ship
ocfs2_controld.cman, but if you build from source you get one,
even though the build actually fails later on. And,
importantly, things work, at least so far.

Now to build ctdb and Samba.

However, I suspect people's problem is not having the correct setup.
We will see.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Richard Sharpe
2014-12-03 17:26:54 UTC
Permalink
On Tue, Dec 2, 2014 at 12:45 PM, Richard Sharpe
Post by Richard Sharpe
On Thu, Nov 27, 2014 at 9:58 AM, Richard Sharpe
Post by Richard Sharpe
Post by Min Wai Chan
Deal Richard,
I've try it again.
There is no OCFS error message on kernel...
and the files directory seem to be working fine...
OK.
I have a pair of CentOS 6.6 VMs installed with OCFS2 and the Oracle
kernel and the dlm-pcmk component installed.
Now I just have to fight with the cluster config to get it all working
and then build and install Samba 4.1.12 and CTDB 2.5.4.
Just an update on this. After an almost Herculean struggle (somewhat
like the Augean Stables) with OCFS2 and CentOS 6.6 and CMAN/PACEMAKER
I now have a 2-node cluster set up and sharing a file system via
VirtualBox.
The problem was that the latest versions of ocfs2-tools does not ship
ocfs2_controld.cman but if you build from sources you get one of
those, even though the build actually fails later on. And,
importantly, things work, at least so far.
Now to build ctdb and Samba.
However, I suspect people's problem is not having the correct setup.
We will see.
In case people haven't seen my other posting about this, I have
succeeded in getting CTDB working with OCFS2 on CentOS 6.6.

There were some extra steps I had to take and the key seems to be to
ensure that the Corosync-based lock manager is working.

However, it works!
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
steve
2014-12-03 20:46:01 UTC
Permalink
Post by Richard Sharpe
On Tue, Dec 2, 2014 at 12:45 PM, Richard Sharpe
Post by Richard Sharpe
On Thu, Nov 27, 2014 at 9:58 AM, Richard Sharpe
Post by Richard Sharpe
Post by Min Wai Chan
Deal Richard,
I've try it again.
There is no OCFS error message on kernel...
and the files directory seem to be working fine...
OK.
I have a pair of CentOS 6.6 VMs installed with OCFS2 and the Oracle
kernel and the dlm-pcmk component installed.
Now I just have to fight with the cluster config to get it all working
and then build and install Samba 4.1.12 and CTDB 2.5.4.
Just an update on this. After an almost Herculean struggle (somewhat
like the Augean Stables) with OCFS2 and CentOS 6.6 and CMAN/PACEMAKER
I now have a 2-node cluster set up and sharing a file system via
VirtualBox.
The problem was that the latest versions of ocfs2-tools does not ship
ocfs2_controld.cman but if you build from sources you get one of
those, even though the build actually fails later on. And,
importantly, things work, at least so far.
Now to build ctdb and Samba.
However, I suspect people's problem is not having the correct setup.
We will see.
In case people haven't seen my other posting about this, I have
succeeded in getting CTDB working with OCFS2 on CentOS 6.6.
There were some extra steps I had to take and the key seems to be to
ensure that the Corosync-based lock manager is working.
However, it works!
It also works with drbd taking care of the lock. Does it work with ctdb
using a lockfile on shared storage?
Cheers and well done. Easy isn't it?
Steve
Richard Sharpe
2014-12-03 20:54:50 UTC
Permalink
Post by steve
Post by Richard Sharpe
On Tue, Dec 2, 2014 at 12:45 PM, Richard Sharpe
Post by Richard Sharpe
On Thu, Nov 27, 2014 at 9:58 AM, Richard Sharpe
Post by Richard Sharpe
Post by Min Wai Chan
Deal Richard,
I've try it again.
There is no OCFS error message on kernel...
and the files directory seem to be working fine...
OK.
I have a pair of CentOS 6.6 VMs installed with OCFS2 and the Oracle
kernel and the dlm-pcmk component installed.
Now I just have to fight with the cluster config to get it all working
and then build and install Samba 4.1.12 and CTDB 2.5.4.
Just an update on this. After an almost Herculean struggle (somewhat
like the Augean Stables) with OCFS2 and CentOS 6.6 and CMAN/PACEMAKER
I now have a 2-node cluster set up and sharing a file system via
VirtualBox.
The problem was that the latest versions of ocfs2-tools does not ship
ocfs2_controld.cman but if you build from sources you get one of
those, even though the build actually fails later on. And,
importantly, things work, at least so far.
Now to build ctdb and Samba.
However, I suspect people's problem is not having the correct setup.
We will see.
In case people haven't seen my other posting about this, I have
succeeded in getting CTDB working with OCFS2 on CentOS 6.6.
There were some extra steps I had to take and the key seems to be to
ensure that the Corosync-based lock manager is working.
However, it works!
It also works with drbd taking care of the lock.
What works with drbd? Since a local file system does not store FCNTL
lock state on disk, I cannot imagine that you are talking about CTDB.
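Richard's point about FCNTL state is the crux: POSIX record locks live in each node's kernel, not in the file data, so replicating the block device (as drbd does) replicates nothing about the locks. A minimal single-node sketch of that kernel-side behaviour (an illustration, not taken from anyone's setup in this thread):

```python
# POSIX fcntl locks are kernel state, not file contents: the child
# process below sees a conflict only because this node's kernel is
# tracking the parent's lock. A cluster file system needs a lock
# manager to propagate that state between nodes; a drbd-replicated
# block device by itself cannot.
import fcntl
import os
import subprocess
import sys
import tempfile

path = os.path.join(tempfile.mkdtemp(), "reclock")
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # exclusive lock, whole file

# A second process is refused the same lock by the kernel.
child = subprocess.run(
    [sys.executable, "-c",
     "import fcntl, sys\n"
     "f = open(sys.argv[1], 'r+b')\n"
     "try:\n"
     "    fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)\n"
     "    print('acquired')\n"
     "except OSError:\n"
     "    print('conflict')\n",
     path],
    capture_output=True, text=True)
print(child.stdout.strip())  # → conflict

# Nothing about the lock was ever written into the file itself.
assert os.path.getsize(path) == 0
```

This is why CTDB's recovery lock has to live on a file system whose lock manager is genuinely cluster-wide, rather than on any locally mounted replica of a shared block device.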
Post by steve
Does it work with ctdb
using a lockfile on shared storage?
Well, yes, at least with GPFS and OCFS2 (as I have now proven). I am
led to believe that it works with GFS2 as well.
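For anyone reproducing this, the "lockfile on shared storage" being discussed is CTDB's recovery lock pointing at a file on the cluster file system. A hedged sketch for the CTDB 2.5.x generation used in this thread; the mount point and paths here are illustrative assumptions, not Richard's actual configuration:

```
# /etc/sysconfig/ctdb (location varies by distro)
CTDB_RECOVERY_LOCK=/mnt/ocfs2/.ctdb/reclock   # must be on the shared FS
CTDB_NODES=/etc/ctdb/nodes                    # one private node IP per line
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
CTDB_MANAGES_SAMBA=yes
```

CTDB takes an FCNTL lock on the recovery lock file during recovery, which is exactly why the underlying file system must support cluster-coherent FCNTL locks.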
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
steve
2014-12-03 21:42:22 UTC
Post by Richard Sharpe
What works with drbd?
Hi
ctdb. And you can have more than one node up at the same time. Is that
what your setup does too?
Richard Sharpe
2014-12-03 21:46:00 UTC
Post by Richard Sharpe
What works with drbd?
Hi
ctdb.
OK, you had better explain your config, because as far as I am aware,
drbd is a block-layer driver, and, as I said, FCNTL lock state is not
saved to the file system.
And you can have more than one node up at the same time. Is that what
your setup does too?
Indeed. Both nodes are up at this very instant and I can see the same
files from both nodes and can use things like net conf list and see
the same registry-based config on both nodes.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
steve
2014-12-03 22:03:49 UTC
Post by Richard Sharpe
Post by Richard Sharpe
What works with drbd?
Hi
ctdb.
OK, you had better explain your config,
You already have ours. It's what the OP used. But he wants only ctdb and
ocfs2. We think that it can't be done. We use drbd. You use
corosomething. Maybe you could explain your config too.
Thanks,
Steve
steve
2014-12-03 22:09:50 UTC
Post by Richard Sharpe
And you can have more than one node up at the same time. Is that what
your setup does too?
Indeed. Both nodes are up at this very instant
Is it better to have both nodes up at the same time? It doesn't seem to
make much difference here. Unless we have a big jpg.
Richard Sharpe
2014-12-03 22:22:34 UTC
Post by Richard Sharpe
And you can have more than one node up at the same time. Is that what
your setup does too?
Indeed. Both nodes are up at this very instant
Is it better to have both nodes up at the same time? It doesn't seem to make
much difference here. Unless we have a big jpg.
The whole point of CTDB is so that more than one node can be up at the
same time and the load is shared across those multiple nodes.

Do try to keep up.

A big jpg should not make any difference because a single client only
connects to a single member of the cluster.

The difference it makes is that a cluster of nodes, all of which are
up at the same time, can handle a larger number of clients than a
single node can and there is some evidence that CTDB scales reasonably
well with respect to throughput as well.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
steve
2014-12-04 06:42:57 UTC
Post by Richard Sharpe
Post by Richard Sharpe
And you can have more than one node up at the same time. Is that what
your setup does too?
Indeed. Both nodes are up at this very instant
Is it better to have both nodes up at the same time? It doesn't seem to make
much difference here. Unless we have a big jpg.
The whole point of CTDB is so that more than one node can be up at the
same time and the load is shared across those multiple nodes.
You're almost there. A few more digs at the documentation and you'll
have made it. ctdb provides fail-over. Important, but not HA as you
describe it.
Post by Richard Sharpe
Do try to keep up.
You too.
ronnie sahlberg
2014-12-04 14:28:21 UTC
Seriously.

Richard Sharpe has just invested a lot of his time to try to help you with
OCFS2 and has demonstrated how/what you need to do to get OCFS2 working
with CTDB.
And this is how you reward him?

Your email is very disrespectful.
Post by Richard Sharpe
Post by steve
And you can have more than one node up at the same time. Is that what
Post by steve
your setup does too?
Indeed. Both nodes are up at this very instant
Is it better to have both nodes up at the same time? It doesn't seem to
make
much difference here. Unless we have a big jpg.
The whole point of CTDB is so that more than one node can be up at the
same time and the load is shared across those multiple nodes.
You're almost there. A few more digs at the documentation and you'll have
made it. ctdb provides fail-over. Important for but not HA. As you describe
it.
Post by Richard Sharpe
Do try to keep up.
You too.
Rowland Penny
2014-12-04 14:55:06 UTC
Post by ronnie sahlberg
Seriously.
Richard Sharpe has just invested a lot of his time to try to help you with
OCFS2 and have demonstrated how/what you need to do to get OCFS2 working
with CTDB.
And this is how you reward him?
Your email is very disrespectful.
Post by Richard Sharpe
Post by steve
And you can have more than one node up at the same time. Is that what
Post by steve
your setup does too?
Indeed. Both nodes are up at this very instant
Is it better to have both nodes up at the same time? It doesn't seem to
make
much difference here. Unless we have a big jpg.
The whole point of CTDB is so that more than one node can be up at the
same time and the load is shared across those multiple nodes.
You're almost there. A few more digs at the documentation and you'll have
made it. ctdb provides fail-over. Important for but not HA. As you describe
it.
Post by Richard Sharpe
Do try to keep up.
You too.
Steve's replies could have been worded better and Richard has spent a
considerable time showing that OCFS2 will work with CTDB. The only
problem that I can see is that, whilst Richard used CentOS and the Oracle
kernel, neither of the two people who are having/have had problems with
CTDB uses this distro or kernel. If CTDB will only work with CentOS and
the Oracle kernel, then in my opinion, it has problems. The only way to
prove this one way or the other is for Richard to post his setup and
for someone to try to set it up on Debian (other distros are available
;-) ).

Rowland
Richard Sharpe
2014-12-04 15:11:28 UTC
Post by Rowland Penny
Post by ronnie sahlberg
Seriously.
Richard Sharpe has just invested a lot of his time to try to help you with
OCFS2 and have demonstrated how/what you need to do to get OCFS2 working
with CTDB.
And this is how you reward him?
Your email is very disrespectful.
Post by Richard Sharpe
Post by steve
Post by steve
And you can have more than one node up at the same time. Is that
what
Post by steve
your setup does too?
Indeed. Both nodes are up at this very instant
Is it better to have both nodes up at the same time? It doesn't seem to
make
much difference here. Unless we have a big jpg.
The whole point of CTDB is so that more than one node can be up at the
same time and the load is shared across those multiple nodes.
You're almost there. A few more digs at the documentation and you'll have
made it. ctdb provides fail-over. Important for but not HA. As you
describe
it.
Post by Richard Sharpe
Do try to keep up.
You too.
Steve's replies could have been worded better and Richard has spent a
considerable time showing that OCFS2 will work with CTDB. The only problem
that I can see is, whilst Richard used Centos and the Oracle kernel, neither
of the two people who are having/have had problems with CTDB use this distro
or kernel. If CTDB will only work with Centos and the Oracle kernel, then in
my opinion, it has problems. The only way to prove this one way or the
other, is for Richard to post his setup and for someone to try and set it up
on Debian (other distros are available ;-) ).
What part of my setup do you need?

As to UEK, that was a convenience to get the prebuilt ocfs2 modules etc.

There are problems, of course, because I had to build the ocfs2-tools
package from scratch and ignore the failure ...

If I find the time soon, I will perhaps post a step-by-step or even
try to eliminate the UEK step; however, often all that is needed is a
demonstration that something is indeed possible.

Also, I am more kindly disposed towards some people than others ...
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Rowland Penny
2014-12-04 15:25:55 UTC
Post by Richard Sharpe
Post by Rowland Penny
Post by ronnie sahlberg
Seriously.
Richard Sharpe has just invested a lot of his time to try to help you with
OCFS2 and have demonstrated how/what you need to do to get OCFS2 working
with CTDB.
And this is how you reward him?
Your email is very disrespectful.
Post by Richard Sharpe
Post by steve
Post by steve
And you can have more than one node up at the same time. Is that
what
Post by steve
your setup does too?
Indeed. Both nodes are up at this very instant
Is it better to have both nodes up at the same time? It doesn't seem to
make
much difference here. Unless we have a big jpg.
The whole point of CTDB is so that more than one node can be up at the
same time and the load is shared across those multiple nodes.
You're almost there. A few more digs at the documentation and you'll have
made it. ctdb provides fail-over. Important for but not HA. As you
describe
it.
Post by Richard Sharpe
Do try to keep up.
You too.
Steve's replies could have been worded better and Richard has spent a
considerable time showing that OCFS2 will work with CTDB. The only problem
that I can see is, whilst Richard used Centos and the Oracle kernel, neither
of the two people who are having/have had problems with CTDB use this distro
or kernel. If CTDB will only work with Centos and the Oracle kernel, then in
my opinion, it has problems. The only way to prove this one way or the
other, is for Richard to post his setup and for someone to try and set it up
on Debian (other distros are available ;-) ).
What part of my setup do you need?
As to UEK, that was a convenience to get the prebuilt ofcs2 modules etc.
There are problems, of course, because I had to build the ocfs2-tools
package from scratch and ignore the failure ...
If I find the time soon, I will perhaps post a step-by-step or even
try to eliminate the UEK step, however, often all that is needed is a
demonstration that something is indeed possible.
Also, I am more kindly disposed towards some people than others ...
I am quite prepared to try to set this up in VM's but as I have never
even attempted to use CTDB, I do not know where to start (whether this
is an advantage or disadvantage, I am not sure).

If you could just post a small 'this is how I did it on Centos' howto, I
will then attempt it on Debian. One thing I have found out is that
OCFS2 is supposed to be built into the Linux kernel.

Rowland
Scott Lovenberg
2014-12-04 15:27:06 UTC
Post by Rowland Penny
Post by ronnie sahlberg
Seriously.
Richard Sharpe has just invested a lot of his time to try to help you with
OCFS2 and have demonstrated how/what you need to do to get OCFS2 working
with CTDB.
And this is how you reward him?
Your email is very disrespectful.
Post by steve
Post by Richard Sharpe
Post by steve
And you can have more than one node up at the same time. Is that
Post by Richard Sharpe
what
Post by steve
your setup does too?
Indeed. Both nodes are up at this very instant
Is it better to have both nodes up at the same time? It doesn't seem to
make
much difference here. Unless we have a big jpg.
The whole point of CTDB is so that more than one node can be up at the
same time and the load is shared across those multiple nodes.
You're almost there. A few more digs at the documentation and you'll
have
made it. ctdb provides fail-over. Important for but not HA. As you
describe
it.
Do try to keep up.
Post by Richard Sharpe
You too.
Steve's replies could have been worded better and Richard has spent a
considerable time showing that OCFS2 will work with CTDB. The only problem
that I can see is, whilst Richard used Centos and the Oracle kernel,
neither of the two people who are having/have had problems with CTDB use
this distro or kernel. If CTDB will only work with Centos and the Oracle
kernel, then in my opinion, it has problems. The only way to prove this one
way or the other, is for Richard to post his setup and for someone to try
and set it up on Debian (other distros are available ;-) ).
Rowland
I'll skip past the manners and etiquette discussion for fear of turning
this into a bike shed thread; for the most part we're all usually pretty
polite here. Having just moved to the Twin Cities (read: southern Canada),
I now suspect we have a lot of Canadian developers. :) I can point to a
few other open source communities for comparison if anyone thinks we're
uncivil at times. Having gotten that out of the way...

I think Richard's test probably more closely models what you'd normally see
in this kind of setup. RHEL or SLED are most likely the majority of Linux
server distributions used in a company where you'd be setting up
clustering. That being said, my experience with Red Hat's cluster suite has
taught me that it's a _very_ difficult environment to get set up correctly
even with Red Hat's documentation and plenty of forum help. Especially if
you try to be clever and customize your environment at all. Red Hat also
has so many upstream, downstream and backported patches mixed together in
any package release that those packages might as well be considered another
product altogether. Nothing they release is even close to the original
upstream 'vanilla' release.

However, having a clean working virtual machine base build as a starting
point for people trying to get this up and running on other distributions
would probably be very helpful for them and lower the mailing list volume.
--
Peace and Blessings,
-Scott.
Jeremy Allison
2014-12-04 17:13:48 UTC
Post by ronnie sahlberg
Seriously.
Richard Sharpe has just invested a lot of his time to try to help you with
OCFS2 and have demonstrated how/what you need to do to get OCFS2 working
with CTDB.
And this is how you reward him?
Your email is very disrespectful.
Indeed. steve, you stepped over the line with
this email.

We now have to decide if your continued participation
on this list is worth the abuse you are giving
to people trying to help.

One thing to note, as there are many people
using the 'steve' alias to contribute to
this list, one bad apple in your collective
can spoil things for everyone on the steve
alias.

You seriously might now want to think about
splitting your alias into individual people,
most of whom I'm sure are friendly and collaborative
individuals.

But right now if we see one more email like
this we will ban the 'steve' alias from our
lists.

The Samba lists need to be open to all without
fear of abuse.

Jeremy.
Richard Sharpe
2014-12-04 18:16:41 UTC
I am quite prepared to try to set this up in VM's but as I have never even
attempted to use CTDB, I do not know where to start (whether this is an
advantage or disadvantage, I am not sure).
If you could just post a small 'this is how I did it on Centos' howto, I
will then attempt it on Debian. One thing I have found out, is that OCFS2 is
supposed to be built into the Linux kernel.
I have posted a lengthy email that details pretty much what I did.

I have limited time to help with this but may be able to give hints if
you get into trouble.
--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Rowland Penny
2014-12-04 18:21:54 UTC
Post by Richard Sharpe
I am quite prepared to try to set this up in VM's but as I have never even
attempted to use CTDB, I do not know where to start (whether this is an
advantage or disadvantage, I am not sure).
If you could just post a small 'this is how I did it on Centos' howto, I
will then attempt it on Debian. One thing I have found out, is that OCFS2 is
supposed to be built into the Linux kernel.
I have posted a lengthy email that details pretty much what I did.
I have limited time to help with this but may be able to give hints if
you get into trouble.
OK, seen it, thanks. I will give it a go tomorrow :-)

Rowland
steve
2014-12-04 18:23:47 UTC
Post by Jeremy Allison
Post by ronnie sahlberg
Seriously.
Richard Sharpe has just invested a lot of his time to try to help you with
OCFS2 and have demonstrated how/what you need to do to get OCFS2 working
with CTDB.
And this is how you reward him?
Your email is very disrespectful.
Indeed. steve, you stepped over the line with
this email.
We now have to decide if your continued participation
on this list is worth the abuse you are giving
to people trying to help.
One thing to note, as there are many people
using the 'steve' alias to contribute to
this list, one bad apple in your collective
can spoil things for everyone on the steve
alias.
You seriously might now want to think about
splitting your alias into individual people,
most of whom I'm sure are friendly and collaborative
individuals.
As it would involve censoring many active and intelligent teenagers,
most of whom think we're past our sell-by date, I'd rather not. It
shouldn't be too long before we have fibre and go cloud anyway.
Post by Jeremy Allison
But right now if we see one more email like
this we will ban the 'steve' alias from our
lists.
If you do, they'll simply join themselves.
Post by Jeremy Allison
The Samba lists need to be open to all without
fear of abuse.
Jeremy.
Jeremy Allison
2014-12-04 18:59:10 UTC
Post by steve
As it would involve censoring many active and intelligent teenagers,
most of whom think we're passed our sell by date, I'd rather not. It
shouldn't be too long before we have fibre and go cloud anyway.
Post by Jeremy Allison
But right now if we see one more email like
this we will ban the 'steve' alias from our
lists.
If you do, they'll simply join themselves.
That is a great incentive to ban 'steve', you
realize ? We'd much rather have individuals
on a list than a collective.
steve
2014-12-04 19:22:10 UTC
Post by Jeremy Allison
Post by steve
As it would involve censoring many active and intelligent teenagers,
most of whom think we're passed our sell by date, I'd rather not. It
shouldn't be too long before we have fibre and go cloud anyway.
Post by Jeremy Allison
But right now if we see one more email like
this we will ban the 'steve' alias from our
lists.
If you do, they'll simply join themselves.
That is a great incentive to ban 'steve', you
realize ? We'd much rather have individuals
on a list than a collective.
No idea. Is it your decision? Are you in charge of the lists? I wouldn't
want many of our individuals on any list. We offer painstaking help and
advice on the users list and have solved many a problem thereon. Put us
on your personal ignore list, and ban us from samba-technical. It's the
developers that the kids have no time for.
Rowland Penny
2014-12-04 19:42:21 UTC
Post by steve
Post by Jeremy Allison
Post by steve
As it would involve censoring many active and intelligent teenagers,
most of whom think we're passed our sell by date, I'd rather not. It
shouldn't be too long before we have fibre and go cloud anyway.
Post by Jeremy Allison
But right now if we see one more email like
this we will ban the 'steve' alias from our
lists.
If you do, they'll simply join themselves.
That is a great incentive to ban 'steve', you
realize ? We'd much rather have individuals
on a list than a collective.
No idea. Is it your decision? Are you in charge of the lists? I
wouldn't want many of our individuals on any list. We offer
painstaking help and advice on the users list and have solved many a
problem thereon. Put us on your personal ignore list, and ban us from
samba-technical. It's the developers that the kids have no time for.
Good grief Steve, are you going out of your way to upset Jeremy ?? As
for Jeremy having the power to ban you, probably not by himself, but I
wouldn't lay money on it, if he was to suggest banning you, I am very
sure the rest of the devs would support him.

I suggest that you apologise (even if you think you shouldn't) and stop
allowing everybody and anybody to post as 'Steve'. It gets very
confusing when 'Steve' posts and it turns out not to actually be
'Steve'. If you do get banned, they will probably ban you from the samba
list as well and it wouldn't be the same without you.

Rowland
Volker Lendecke
2014-12-04 19:52:01 UTC
stop allowing everybody and anybody to post as 'Steve'. It gets very
confusing when 'Steve' posts and it turns out not to actually be
'Steve'. If you do get banned, they will probably ban you from the
samba list as well and it wouldn't be the same without you.
The Samba Team maintains the lists. Even if Jeremy does not have root
on the listserver himself, others do.

Jeremy is a key member of the Samba Team, and we have discussed this
before. Jeremy's mail did not come out of the blue for the other Samba
Team people. If we decide to ban ***@steve-ss.com, it will be on
***@samba.org and on samba-***@samba.org.

Volker
--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:***@sernet.de
steve
2014-12-04 20:21:06 UTC
Post by Rowland Penny
Post by steve
Post by Jeremy Allison
Post by steve
As it would involve censoring many active and intelligent teenagers,
most of whom think we're passed our sell by date, I'd rather not. It
shouldn't be too long before we have fibre and go cloud anyway.
Post by Jeremy Allison
But right now if we see one more email like
this we will ban the 'steve' alias from our
lists.
If you do, they'll simply join themselves.
That is a great incentive to ban 'steve', you
realize ? We'd much rather have individuals
on a list than a collective.
No idea. Is it your decision? Are you in charge of the lists? I
wouldn't want many of our individuals on any list. We offer
painstaking help and advice on the users list and have solved many a
problem thereon. Put us on your personal ignore list, and ban us from
samba-technical. It's the developers that the kids have no time for.
Good grief Steve, are you going out of your way to upset Jeremy ?? As
for Jeremy having the power to ban you, probably not by himself, but I
wouldn't lay money on it, if he was to suggest banning you, I am very
sure the rest of the devs would support him.
I suggest that you apologise (even if you think you shouldn't) and stop
allowing everybody and anybody to post as 'Steve'. It gets very
confusing when 'Steve' posts and it turns out not to actually be
'Steve'. If you do get banned, they will probably ban you from the samba
list as well and it wouldn't be the same without you.
Rowland
I can't apologise, much though I am ashamed of what has happened. As
translator, it is a difficult situation to be in. I am not allowed to
join myself and can only do what I am ordered to do as none of the
domain is mine. I would like to continue to contribute if possible.
Volker Lendecke
2014-12-04 20:29:50 UTC
Post by steve
I can't apologise, much though I am ashamed of what has happened. As
translator, it is a difficult situation to be in. I am not allowed
to join myself and can only do what I am ordered to do as none of
the domain is mine. I would like to continue to contribute if
possible.
You don't have an email address of you own? Just register
with a free email provider and you're in. Or is this
prohibited where you are?

Volker
--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:***@sernet.de
Rowland Penny
2014-12-04 20:38:49 UTC
Post by steve
Post by Rowland Penny
Post by steve
Post by Jeremy Allison
Post by steve
As it would involve censoring many active and intelligent teenagers,
most of whom think we're passed our sell by date, I'd rather not. It
shouldn't be too long before we have fibre and go cloud anyway.
Post by Jeremy Allison
But right now if we see one more email like
this we will ban the 'steve' alias from our
lists.
If you do, they'll simply join themselves.
That is a great incentive to ban 'steve', you
realize ? We'd much rather have individuals
on a list than a collective.
No idea. Is it your decision? Are you in charge of the lists? I
wouldn't want many of our individuals on any list. We offer
painstaking help and advice on the users list and have solved many a
problem thereon. Put us on your personal ignore list, and ban us from
samba-technical. It's the developers that the kids have no time for.
Good grief Steve, are you going out of your way to upset Jeremy ?? As
for Jeremy having the power to ban you, probably not by himself, but I
wouldn't lay money on it, if he was to suggest banning you, I am very
sure the rest of the devs would support him.
I suggest that you apologise (even if you think you shouldn't) and stop
allowing everybody and anybody to post as 'Steve'. It gets very
confusing when 'Steve' posts and it turns out not to actually be
'Steve'. If you do get banned, they will probably ban you from the samba
list as well and it wouldn't be the same without you.
Rowland
I can't apologise, much though I am ashamed of what has happened. As
translator, it is a difficult situation to be in. I am not allowed to
join myself and can only do what I am ordered to do as none of the
domain is mine. I would like to continue to contribute if possible.
What!! Are you a slave?? Just tell whoever is giving you orders that
what has been going on must stop; register your own private email
address and use that.

Rowland
steve
2014-12-04 20:51:58 UTC
Post by Rowland Penny
Post by steve
Post by Rowland Penny
Post by steve
Post by Jeremy Allison
Post by steve
As it would involve censoring many active and intelligent teenagers,
most of whom think we're passed our sell by date, I'd rather not. It
shouldn't be too long before we have fibre and go cloud anyway.
Post by Jeremy Allison
But right now if we see one more email like
this we will ban the 'steve' alias from our
lists.
If you do, they'll simply join themselves.
That is a great incentive to ban 'steve', you
realize ? We'd much rather have individuals
on a list than a collective.
No idea. Is it your decision? Are you in charge of the lists? I
wouldn't want many of our individuals on any list. We offer
painstaking help and advice on the users list and have solved many a
problem thereon. Put us on your personal ignore list, and ban us from
samba-technical. It's the developers that the kids have no time for.
Good grief Steve, are you going out of your way to upset Jeremy ?? As
for Jeremy having the power to ban you, probably not by himself, but I
wouldn't lay money on it, if he was to suggest banning you, I am very
sure the rest of the devs would support him.
I suggest that you apologise (even if you think you shouldn't) and stop
allowing everybody and anybody to post as 'Steve'. It gets very
confusing when 'Steve' posts and it turns out not to actually be
'Steve'. If you do get banned, they will probably ban you from the samba
list as well and it wouldn't be the same without you.
Rowland
I can't apologise, much though I am ashamed of what has happened. As
translator, it is a difficult situation to be in. I am not allowed to
join myself and can only do what I am ordered to do as none of the
domain is mine. I would like to continue to contribute if possible.
What!! are you a slave ?? just tell whoever is giving you orders, that
what has been going on, must stop, register your own private email
address and use that.
Rowland
You can't use private e-mail addresses at places of work. How would I
paste log.smbd from a 'phone or test anything? Thanks for your support
and advice anyway.
Rowland Penny
2014-12-04 21:06:16 UTC
Post by steve
Post by Rowland Penny
Post by steve
Post by Rowland Penny
Post by steve
Post by Jeremy Allison
Post by steve
As it would involve censoring many active and intelligent
teenagers,
most of whom think we're passed our sell by date, I'd rather
not. It
shouldn't be too long before we have fibre and go cloud anyway.
Post by Jeremy Allison
But right now if we see one more email like
this we will ban the 'steve' alias from our
lists.
If you do, they'll simply join themselves.
That is a great incentive to ban 'steve', you
realize ? We'd much rather have individuals
on a list than a collective.
No idea. Is it your decision? Are you in charge of the lists? I
wouldn't want many of our individuals on any list. We offer
painstaking help and advice on the users list and have solved many a
problem thereon. Put us on your personal ignore list, and ban us from
samba-technical. It's the developers that the kids have no time for.
Good grief Steve, are you going out of your way to upset Jeremy ?? As
for Jeremy having the power to ban you, probably not by himself, but I
wouldn't lay money on it, if he was to suggest banning you, I am very
sure the rest of the devs would support him.
I suggest that you apologise (even if you think you shouldn't) and
stop
allowing everybody and anybody to post as 'Steve'. It gets very
confusing when 'Steve' posts and it turns out not to actually be
'Steve'. If you do get banned, they will probably ban you from the
samba
list as well and it wouldn't be the same without you.
Rowland
I can't apologise, much though I am ashamed of what has happened. As
translator, it is a difficult situation to be in. I am not allowed to
join myself and can only do what I am ordered to do as none of the
domain is mine. I would like to continue to contribute if possible.
What!! are you a slave ?? just tell whoever is giving you orders, that
what has been going on, must stop, register your own private email
address and use that.
Rowland
You can't use private e-mail addresses at places of work. How would I
paste log.smbd from a 'phone or test anything? Thanks for your support
and advice anyway.
Why not, I did.

Rowland
steve
2014-12-04 21:34:35 UTC
Post by Rowland Penny
Post by steve
Post by Rowland Penny
Post by steve
Post by Rowland Penny
Post by steve
Post by Jeremy Allison
Post by steve
As it would involve censoring many active and intelligent
teenagers,
most of whom think we're passed our sell by date, I'd rather
not. It
shouldn't be too long before we have fibre and go cloud anyway.
Post by Jeremy Allison
But right now if we see one more email like
this we will ban the 'steve' alias from our
lists.
If you do, they'll simply join themselves.
That is a great incentive to ban 'steve', you
realize ? We'd much rather have individuals
on a list than a collective.
No idea. Is it your decision? Are you in charge of the lists? I
wouldn't want many of our individuals on any list. We offer
painstaking help and advice on the users list and have solved many a
problem thereon. Put us on your personal ignore list, and ban us from
samba-technical. It's the developers that the kids have no time for.
Good grief Steve, are you going out of your way to upset Jeremy ?? As
for Jeremy having the power to ban you, probably not by himself, but I
wouldn't lay money on it, if he was to suggest banning you, I am very
sure the rest of the devs would support him.
I suggest that you apologise (even if you think you shouldn't) and
stop
allowing everybody and anybody to post as 'Steve'. It gets very
confusing when 'Steve' posts and it turns out not to actually be
'Steve'. If you do get banned, they will probably ban you from the
samba
list as well and it wouldn't be the same without you.
Rowland
I can't apologise, much though I am ashamed of what has happened. As
translator, it is a difficult situation to be in. I am not allowed to
join myself and can only do what I am ordered to do as none of the
domain is mine. I would like to continue to contribute if possible.
What!! are you a slave ?? just tell whoever is giving you orders, that
what has been going on, must stop, register your own private email
address and use that.
Rowland
You can't use private e-mail addresses at places of work. How would I
paste log.smbd from a 'phone or test anything? Thanks for your support
and advice anyway.
Why not, I did.
Rowland
Nothing is hidden here. I suppose that when we are banned we can still
get the list archive. Just not contribute. We'll be going google or
dropbox or something soon anyway. I suppose that that represents some
sort of progress. A weekend off? Home before midnight? Only joking...
Andrew Bartlett
2014-12-05 10:17:37 UTC
Permalink
Post by steve
Post by Jeremy Allison
Post by steve
As it would involve censoring many active and intelligent teenagers,
most of whom think we're passed our sell by date, I'd rather not. It
shouldn't be too long before we have fibre and go cloud anyway.
Post by Jeremy Allison
But right now if we see one more email like
this we will ban the 'steve' alias from our
lists.
If you do, they'll simply join themselves.
That is a great incentive to ban 'steve', you
realize ? We'd much rather have individuals
on a list than a collective.
No idea. Is it your decision? Are you in charge of the lists?
Yes, Jeremy acts with the authority of the Samba Team in this matter,
and the Samba Team is in charge of these mailing lists.

We (the Samba Team, and many others in the Samba community no doubt) are
sick and tired of the disruption.

Andrew Bartlett
--
Andrew Bartlett http://samba.org/~abartlet/
Authentication Developer, Samba Team http://samba.org
Samba Developer, Catalyst IT http://catalyst.net.nz/services/samba
Martin Schwenke
2014-12-06 04:55:28 UTC
Permalink
On Thu, 04 Dec 2014 14:55:06 +0000, Rowland Penny
Post by Rowland Penny
Steve's replies could have been worded better and Richard has spent a
considerable time showing that OCFS2 will work with CTDB. The only
problem that I can see is, whilst Richard used Centos and the Oracle
kernel, neither of the two people who are having/have had problems with
CTDB use this distro or kernel. If CTDB will only work with Centos and
the Oracle kernel, then in my opinion, it has problems. The only way to
prove this one way or the other, is for Richard to post his setup and
for someone to try and set it up on Debian (other distros are available
;-) ).
We try to make CTDB as filesystem and distribution agnostic as we can.
Distribution-wise there simply aren't any hard dependencies, though
there may be minor bugs. The daemon has been tested on Linux, FreeBSD
and AIX. Unfortunately, the eventscripts contain a lot of
Linux-specific code so they're not as portable.

As various people have mentioned, the default filesystem requirement is
fcntl(2) locking support for the CTDB recovery lock. There's an assumption
that if the ping_pong test succeeds then CTDB's recovery lock will
work. If that's not true then we need to create a new test. However,
I don't believe anyone has conclusively shown ping_pong not to be a
reliable test.
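
The mechanism ping_pong exercises can be sketched in a few lines. This is only an illustrative, simplified reimplementation of the idea (the file name, ring size and duration are arbitrary), not the actual tool from the ctdb tree, which must be run concurrently from several cluster nodes against a file on the shared filesystem:

```python
# A minimal single-process sketch of the ping_pong idea: cycle POSIX
# byte-range locks (fcntl) around a small ring of bytes and report the
# lock rate.  Illustration only -- not the real ctdb ping_pong binary.
import fcntl
import os
import tempfile
import time

def lock_byte(fd, off):
    # Blocking exclusive POSIX lock on the single byte at offset `off`.
    fcntl.lockf(fd, fcntl.LOCK_EX, 1, off)

def unlock_byte(fd, off):
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, off)

def lock_rate(path, nbytes=3, duration=0.5):
    """Take/release byte locks in a ring for `duration` seconds."""
    fd = os.open(path, os.O_RDWR)
    try:
        i, count = 0, 0
        deadline = time.monotonic() + duration
        lock_byte(fd, i)
        while time.monotonic() < deadline:
            lock_byte(fd, (i + 1) % nbytes)   # grab the next byte...
            unlock_byte(fd, i)                # ...then drop the previous one
            i = (i + 1) % nbytes
            count += 1
        unlock_byte(fd, i)
        return count / duration
    finally:
        os.close(fd)

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"\0" * 3)
        f.flush()
        print("%.0f locks/sec" % lock_rate(f.name))
```

On a local filesystem a single process will see a very high rate; the point of the real test is what happens to that rate when a second *node* joins in.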

We do a lot of testing on RHEL because that has been the platform that
quite a few developers have been paid to work on.

However, my laptop runs Debian and I semi-regularly run a subset of the
CTDB test suite on my laptop. This subset only tests a subset of
functionality though, not including anything to do with a cluster
filesystem. So I haven't tried to test anything to do with OCFS2 on
Debian...

To support testing on RHEL we have autocluster
(git://git.samba.org/autocluster.git), which Tridge started back in
2008 and I have been hacking on since. It generates
virtual clusters of RHEL nodes running clustered Samba. I
have previously (though not recently) tested it with
CentOS. Unfortunately we've never got around to creating a web site or
writing extensive documentation. Earlier this year I did quite a lot of
modularisation work on autocluster. A couple of weeks ago, when this
thread started, Amitay and I spent a couple of hours trying to add
OCFS2 support to autocluster. It was quite trivial to add something
that we think will work but we haven't been able to test it because we
simply couldn't make OCFS2 work with RHEL6. With Richard's
instructions, posted elsewhere, we'll try to finish this effort when we
get time. Perhaps that will then encourage others to provide scripts to
support other cluster filesystems.

Can autocluster work with other (e.g. Debian-based) distros? Sorry, not
really. There are a lot of assumptions about kickstart, network
configuration and location of configuration files, along with copious
use of yum. :-(

However, the modularisation work I did should allow much of the
post-boot package installation and configuration to be replaced by
something like Chef. Someone would need a
non-trivial slab of time to make that change. When we've done that it
should be a little easier for autocluster to be distribution
agnostic... and I'm guessing that the part that people are really
interested in is the post-boot configuration, since that doesn't
depend on the cluster being virtual.

It all takes time...

peace & happiness,
martin
Scott Lovenberg
2014-12-06 05:57:08 UTC
Permalink
Post by Martin Schwenke
On Thu, 04 Dec 2014 14:55:06 +0000, Rowland Penny
Post by Rowland Penny
Steve's replies could have been worded better and Richard has spent a
considerable time showing that OCFS2 will work with CTDB. The only
problem that I can see is, whilst Richard used Centos and the Oracle
kernel, neither of the two people who are having/have had problems with
CTDB use this distro or kernel. If CTDB will only work with Centos and
the Oracle kernel, then in my opinion, it has problems. The only way to
prove this one way or the other, is for Richard to post his setup and
for someone to try and set it up on Debian (other distros are available
;-) ).
We try to make CTDB as filesystem and distribution agnostic as we can.
Distribution-wise there simply aren't any hard dependencies, though
there may be minor bugs. The daemon has been tested on Linux, FreeBSD
and AIX. Unfortunately, the eventscripts contain a lot of
Linux-specific code so they're not as portable.
As various people have mentioned, the default filesystem requirement is fcntl(2)
locking support to support the CTDB recovery lock. There's an assumption
that if the ping_pong test succeeds then CTDB's recovery lock will
work. If that's not true then we need to create a new test. However,
I don't believe anyone has conclusively shown ping_pong not to be a
reliable test.
To support testing on RHEL we have autocluster
(git://git.samba.org/autocluster.git), which Tridge started back in
2008 and I have been hacking on since. It generates
virtual clusters of RHEL nodes running clustered Samba. I
have previously (though not recently) tested it with
CentOS. Unfortunately we've never got around to creating a web site or
writing extensive documentation. Earlier this year I did quite a lot of
modularisation work on autocluster. A couple of weeks ago, when this
thread started, Amitay and I spent a couple of hours trying to add
OCFS2 support to autocluster. It was quite trivial to add something
that we think will work but we haven't been able to test it because we
simply couldn't make OCFS2 work with RHEL6. With Richard's
instructions, posted elsewhere, we'll try to finish this effort when we
get time. Perhaps that will then encourage others to provide scripts to
support other cluster filesystems.
Can autocluster work with other (e.g. Debian-based) distros? Sorry, not
really. There are a lot of assumptions about kickstart, network
configuration and location of configuration files, along with copious
use of yum. :-(
However, the modularisation work I did should allow much of the
post-boot package installation and configuration to be replaced by
something like Chef (from OpenStack). Someone would need a
non-trivial slab of time to make that change. When we've done that it
should be a little easier for autocluster to be distribution
agnostic... and I'm guessing that the part that people are really
interested in is the post-boot configuration, since that doesn't
depend on the cluster being virtual.
It all takes time...
peace & happiness,
martin
As luck would have it, at work I wrote a chef cookbook for clustered
MySQL (I'm getting around to fixing some serious problems with it
before I put it on Github), so I'd like to give some advice. RUN!
Don't look back, just run!

Now that I have that out of the way... My cookbook was only really for
RHEL. Chef does provide a few nice abstractions for dealing with
stuff like package management and such, but at some point you're going
to have to cut some Ruby libs if you want to stay sane. If you're
into Ruby this might not be a huge deal, but I stumbled through stuff
like patching cookbooks I was dependent on. You will be relying on
other cookbooks if you use the "wrapper cookbook" method, which I
would recommend.

After fixing glaring bugs in other cookbooks I had really mixed
results upstreaming my patches: some authors/groups were great to deal
with (mostly those who work at companies that fund the development),
while other pull requests just sit, and finally there's the "here's a
bug fix for a glaring piece of code that never worked" to which the
response will be "yeah, this isn't really the {chef, ruby, berkshelf,
etc} way to fix that." You'll spin up something that works and they
won't like that either. At this point you resign yourself to the fact
that they aren't going to fix the problem where they reference a
variable that doesn't exist that you need in a template, so now you're
maintaining a fork of another cookbook. Yes, that did happen and
nearly three months later it hasn't been fixed. Yes, the cookbook
will still give you an invalid config for Apache and Apache will never
start on certain enterprise distros that rhyme with FELL.

Now that you've got a working cookbook, there's stateful data that you
have to keep track of, and Chef doesn't deal with race conditions (for
instance, you want to create a bunch of nodes at once and they need
config info from other nodes that are still bootstrapping Chef);
that's your problem. For this I'd recommend just creating a
ZooKeeper server from the start and writing a lib for your cookbook to
interface with it. You'll try data bags and vaults, but eventually
you'll end up realizing that they're not the correct tool for the job,
regardless of how hard you try to pound that square peg into a round hole.

I truly don't want to discourage anyone from getting this working on
Chef, but you should know ahead of time that there are some hurdles
that you don't even know you'll deal with until you are confronted by
them. I could go into more detail and provide code (I've still got
about a dozen patches for the zookeeper-bridge library to make it
functional that I haven't upstreamed yet because they're hackish) for
anyone that is curious, but I'm sure I lost almost all of you by this
point in the post :). I spent the better part of three months dealing
with this as a side project at work, so I might be able to save you a
bit of time and hassle. If anyone wants to take this on, feel free to
ping me.
--
Peace and Blessings,
-Scott.
Martin Schwenke
2014-12-08 05:26:49 UTC
Permalink
Post by Martin Schwenke
As various people have mentioned, the default filesystem requirement
is fcntl(2) locking support to support the CTDB recovery lock.
There's an assumption that if the ping_pong test succeeds then CTDB's
recovery lock will work. If that's not true then we need to create a
new test. However, I don't believe anyone has conclusively shown
ping_pong not to be a reliable test.
I rushed through some of this late on Saturday night and was actually
fooled. I thought the ping_pong test had passed and then CTDB failed
in the same way that some people have described on this mailing list.

I have since updated the ping_pong wiki page at:

https://wiki.samba.org/index.php/Ping_pong

by adding some bold and an extra section.

The summary is that you can't race through and simply confirm that the
test prints the correct data_increment value when running with -rw.

For the recovery lock to work you need to run the non -rw version and
actually confirm that *the locking rate drops dramatically*. If it
doesn't then it is *not* working!

peace & happiness,
martin
steve
2014-12-08 11:11:03 UTC
Permalink
Post by Martin Schwenke
Post by Martin Schwenke
As various people have mentioned, the default filesystem requirement
is fcntl(2) locking support to support the CTDB recovery lock.
There's an assumption that if the ping_pong test succeeds then CTDB's
recovery lock will work. If that's not true then we need to create a
new test. However, I don't believe anyone has conclusively shown
ping_pong not to be a reliable test.
I rushed through some of this late on Saturday night and was actually
fooled. I thought the ping_pong test had passed and then CTDB failed
in the same way that some people have described on this mailing list.
https://wiki.samba.org/index.php/Ping_pong
by adding some bold and an extra section.
The summary is that you can't race through and simply confirm that the
test prints the correct data_increment value when running with -rw.
For the recovery lock to work you need to run the non -rw version and
actually confirm that *the locking rate drops dramatically*. If it
doesn't then it is *not* working!
peace & happiness,
martin
Hi
Thanks.
The OP has confirmed that the test works but the lock doesn't.
Martin Schwenke
2014-12-08 11:39:40 UTC
Permalink
Post by Martin Schwenke
On Sat, 6 Dec 2014 15:55:28 +1100, Martin Schwenke
Post by Martin Schwenke
As various people have mentioned, the default filesystem requirement
is fcntl(2) locking support to support the CTDB recovery lock.
There's an assumption that if the ping_pong test succeeds then
CTDB's
Post by Martin Schwenke
recovery lock will work. If that's not true then we need to create
a
Post by Martin Schwenke
new test. However, I don't believe anyone has conclusively shown
ping_pong not to be a reliable test.
I rushed through some of this late on Saturday night and was actually
fooled. I thought the ping_pong test had passed and then CTDB failed
in the same way that some people have described on this mailing list.
https://wiki.samba.org/index.php/Ping_pong
by adding some bold and an extra section.
The summary is that you can't race through and simply confirm that
the
test prints the correct data_increment value when running with -rw.
For the recovery lock to work you need to run the non -rw version and
actually confirm that *the locking rate drops dramatically*. If it
doesn't then it is *not* working!
Thanks.
The OP has confirmed that the test works but the lock doesn't.
Can the OP please describe the steps they took to run the test and the output they saw at each stage?

Thanks...

peace & happiness,
martin
Michael Adam
2014-12-08 16:56:53 UTC
Permalink
Post by Martin Schwenke
Post by Martin Schwenke
As various people have mentioned, the default filesystem requirement
is fcntl(2) locking support to support the CTDB recovery lock.
There's an assumption that if the ping_pong test succeeds then CTDB's
recovery lock will work. If that's not true then we need to create a
new test. However, I don't believe anyone has conclusively shown
ping_pong not to be a reliable test.
I rushed through some of this late on Saturday night and was actually
fooled. I thought the ping_pong test had passed and then CTDB failed
in the same way that some people have described on this mailing list.
https://wiki.samba.org/index.php/Ping_pong
by adding some bold and an extra section.
The summary is that you can't race through and simply confirm that the
test prints the correct data_increment value when running with -rw.
For the recovery lock to work you need to run the non -rw version and
actually confirm that *the locking rate drops dramatically*. If it
doesn't then it is *not* working!
This is not necessarily true!

For instance I remember that a few years ago, for GFS2 with the default
configuration, I observed a constant lock rate until I reached 5
nodes or so. This was due to the fact that GFS's lock manager by
default restricted locks to 100/second. Only if you removed that
limit could you see that dramatic drop.

Also the drop will not be as dramatic with every file system,
since file systems seem to have different levels of optimization
when only one node is involved.

I also remember (I think also with GFS) that the initial lock rate
was pretty high for 1 node (with custom config), and dropped
drastically when I added a node. But when I removed the second
node again, the rate did not rise as drastically as it had
initially dropped, i.e. it did not return to the original high
lock rate. The explanation was that the lock manager stayed in
the special mode for a single locking node only until a second
locking node was added, but did not revert back to that scheme
after the other lockers had left (presumably based on a heuristic
that more lockers would probably come back later).

So I'd say that ping_pong without -rw is generally good for
seeing possible lock rates, but if you want to verify real
behaviour, then you should test with -rw (of course only if
the file system implements coherence of data operations under
locks, which hopefully all file systems that we can seriously
take into account do...). :-)

Cheers - Michael
Ralph Böhme
2014-12-09 21:17:21 UTC
Permalink
Hi all,
Post by Scott Lovenberg
As luck would have it, at work I wrote a chef cookbook for clustered
MySQL (I'm getting around to fixing some serious problems with it
before I put it on Github), so I'd like to give some advice. RUN!
Don't look back, just run!
that's what I did! :)

Vagrant+Puppet+GPFS+Samba -> voila, Samba cluster

Still work in progress, Samba is not yet configured, but `vagrant up`
brings up a nice two node GPFS cluster:

<https://github.com/slowfranklin/samba-cluster>
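
For anyone curious what such a setup looks like, a hypothetical two-node Vagrantfile sketch in this spirit might be the following; the box name, IPs and Puppet manifest layout are placeholders, not taken from the actual repo:

```ruby
# Hypothetical two-node cluster definition; all names are illustrative.
Vagrant.configure("2") do |config|
  config.vm.box = "centos65"            # placeholder box
  (1..2).each do |i|
    config.vm.define "node#{i}" do |node|
      node.vm.hostname = "node#{i}"
      node.vm.network "private_network", ip: "192.168.50.#{10 + i}"
      node.vm.provision "puppet" do |puppet|
        puppet.manifests_path = "manifests"   # placeholder layout
        puppet.manifest_file  = "cluster.pp"
      end
    end
  end
end
```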

Cheerio!
-slow
--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:***@sernet.de
Martin Schwenke
2014-12-10 08:19:35 UTC
Permalink
Post by Michael Adam
Post by Martin Schwenke
The summary is that you can't race through and simply confirm that the
test prints the correct data_increment value when running with -rw.
For the recovery lock to work you need to run the non -rw version and
actually confirm that *the locking rate drops dramatically*. If it
doesn't then it is *not* working!
This is not necessarily true!
Then we need a better test and/or better documentation... and also more
hours in each day to make that happen. ;-)
Post by Michael Adam
For instance I remember that a few years ago, for GFS2 with the default
configuration, I observed a constant lock rate until I reached 5
nodes or so. This was due to the fact, that GFS' lock manager by
default restricted locks to 100/second. Only if you removed that
limit, you could see that dramatic drop.
Also the drop will not be as dramatic with every file system,
since file systems seem to have different levels of optimization
when only one node is involed.
I also remember (I think also with GFS), that initial lock rate
was pretty high for 1 node (with custom config), and dropped
drastically when I added a node. But when I removed the but-last
node, the rate did not raise as drastically as it initially
dropped, i.e. not to the orignal high lock rate.
The explanation was that the lock manager stayed in the special
mode for a single locking node only until a second locking node
was added, but it did not revert back to the special scheme
after the last had left (presumably based on a heuristic that
probably more lockers would come back later).
So I'd say that ping_pong without -rw is generally good for
seing possible lock rates, but if you want to verify real
behaviour, then you should test with -rw (of course only if
the file system implements coherence of data operations under
locks, which hopefully all file systems that we can seriously
take into account do...). :-)
Ah, but in the OCFS2 case the -rw test works, while the "without -rw"
test does not work! ;-)

People definitely need to run both. It seems that just running with
-rw is not good enough.

peace & happiness,
martin
Martin Schwenke
2014-12-10 08:23:15 UTC
Permalink
On Fri, 5 Dec 2014 23:57:08 -0600, Scott Lovenberg
Post by Scott Lovenberg
Post by Martin Schwenke
Can autocluster work with other (e.g. Debian-based) distros? Sorry, not
really. There are a lot of assumptions about kickstart, network
configuration and location of configuration files, along with copious
use of yum. :-(
However, the modularisation work I did should allow much of the
post-boot package installation and configuration to be replaced by
something like Chef (from OpenStack). Someone would need a
non-trivial slab of time to make that change. When we've done that it
should be a little easier for autocluster to be distribution
agnostic... and I'm guessing that the part that people are really
interested in is the post-boot configuration, since that doesn't
depend on the cluster being virtual.
As luck would have it, at work I wrote a chef cookbook for clustered
MySQL (I'm getting around to fixing some serious problems with it
before I put it on Github), so I'd like to give some advice. RUN!
Don't look back, just run!
:-)
Post by Scott Lovenberg
[...]
I truly don't want to discourage anyone from getting this working on
Chef, but you should know ahead of time that there are some hurdles
that you don't even know you'll deal with until you are confronted by
them. [...]
Yep, it is probably a lot of work, which will take a lot of time. I
certainly don't think I have that amount of time in the foreseeable
future... :-(

peace & happiness,
martin
Michael Adam
2014-12-10 08:43:12 UTC
Permalink
Post by Martin Schwenke
Post by Michael Adam
Post by Martin Schwenke
The summary is that you can't race through and simply confirm that the
test prints the correct data_increment value when running with -rw.
For the recovery lock to work you need to run the non -rw version and
actually confirm that *the locking rate drops dramatically*. If it
doesn't then it is *not* working!
This is not necessarily true!
Then we need a better test and/or better documentation... and also more
hours in each day to make that happen. ;-)
:-D
Post by Martin Schwenke
Post by Michael Adam
For instance I remember that a few years ago, for GFS2 with the default
configuration, I observed a constant lock rate until I reached 5
nodes or so. This was due to the fact, that GFS' lock manager by
default restricted locks to 100/second. Only if you removed that
limit, you could see that dramatic drop.
Also the drop will not be as dramatic with every file system,
since file systems seem to have different levels of optimization
when only one node is involed.
I also remember (I think also with GFS), that initial lock rate
was pretty high for 1 node (with custom config), and dropped
drastically when I added a node. But when I removed the but-last
node, the rate did not raise as drastically as it initially
dropped, i.e. not to the orignal high lock rate.
The explanation was that the lock manager stayed in the special
mode for a single locking node only until a second locking node
was added, but it did not revert back to the special scheme
after the last had left (presumably based on a heuristic that
probably more lockers would come back later).
So I'd say that ping_pong without -rw is generally good for
seing possible lock rates, but if you want to verify real
behaviour, then you should test with -rw (of course only if
the file system implements coherence of data operations under
locks, which hopefully all file systems that we can seriously
take into account do...). :-)
Ah, but in the OCFS2 case the -rw test works, while the "without -rw"
test does not work! ;-)
That is really really strange.

But the major problem seems to be:
How can we reliably tell whether the "without-rw" test succeeds?
From what I wrote above, the pure lock rate does not always seem
to give enough information.

How did _you_ tell that the without-rw test failed?

Michael
Post by Martin Schwenke
People definitely need to run both. It seems that just running with
-rw is not good enough.
Martin Schwenke
2014-12-10 09:39:54 UTC
Permalink
Post by Michael Adam
Post by Martin Schwenke
Ah, but in the OCFS2 case the -rw test works, while the "without -rw"
test does not work! ;-)
That is really really strange.
How can we reliably tell whether the "without-rw" test succeeds?
From what I wrote above, the pure lock rate does not always seem
to give enough information.
How did _you_ tell that the without-rw test failed?
Well, I missed it the first time because I was very quick to jump to
the with-rw test. Then Amitay and I sat down, took our time and
made sure we understood everything... ;-)

Running ping_pong on 1 node gave an astronomical lock rate, like:

1951653 locks/sec

Running on a 2nd node gave the same rate, with no reduction on the 1st.

Running 2 ping_pongs on the same node resulted in a significant drop
in the lock rate.

So this confirmed that OCFS2 has a lock coherence problem across
nodes.
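
The same-node contention check above can be approximated locally with a rough, hypothetical harness (all names and parameters are illustrative): on a filesystem with coherent POSIX locks, two processes chasing each other around the byte ring contend, so the per-process rate drops well below a solo run.

```python
# Rough local harness mirroring the "2 ping_pongs on the same node"
# check: run a byte-ring locking loop in two processes against one file
# and report each process's lock count.  Illustration only.
import fcntl
import multiprocessing
import os
import tempfile
import time

def ping_pong(path, start, nbytes, duration, counts):
    fd = os.open(path, os.O_RDWR)
    try:
        i, n = start, 0
        deadline = time.monotonic() + duration
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, i)
        while time.monotonic() < deadline:
            fcntl.lockf(fd, fcntl.LOCK_EX, 1, (i + 1) % nbytes)  # next byte
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, i)                 # previous
            i = (i + 1) % nbytes
            n += 1
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, i)
        counts.append(n)
    finally:
        os.close(fd)

def contended_counts(path, duration=0.3):
    # 2 processes on a ring of 3 bytes: one byte is always free, so the
    # loop makes progress but every acquisition may block on the peer.
    ctx = multiprocessing.get_context("fork")   # Unix only
    counts = ctx.Manager().list()
    procs = [ctx.Process(target=ping_pong,
                         args=(path, s, 3, duration, counts))
             for s in (0, 1)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return list(counts)

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"\0" * 3)
        f.flush()
        print("per-process lock counts:", contended_counts(f.name))
```

On OCFS2 without a working DLM, the failure mode described above would show up as each node spinning at its solo rate, never noticing the other's locks.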

Note that I haven't tried anything to resolve this situation. I
have no DLM running or anything so I expected this problem, given that
Richard had to do a lot more in his tests to resolve this issue.

peace & happiness,
martin
