3506 – race condition in trylock/testlock

Bug 3506 - race condition in trylock/testlock

Summary: race condition in trylock/testlock

Status:	RESOLVED FIXED

Alias:	None

Product:	Mercurial
Classification:	Unclassified
Component:	Mercurial (show other bugs)
Version:	unspecified
Hardware:	All Linux

Importance:	normal bug
Assignee:	Bugzilla

URL:
Keywords:

Depends on:
Blocks:

Reported:	2012-06-20 08:22 UTC by Andrew Suffield
Modified:	2012-10-19 14:25 UTC (History)
CC List:	4 users (show)

See Also:
Python Version:	---

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Andrew Suffield 2012-06-20 08:22 UTC

mercurial.lock.trylock can follow this sequence of events:

call util.makelock
get EEXIST because somebody else has it locked
the other process that has the lock releases it at this point, deleting the symlink
call self.testlock
-> call util.readlock
   -> call os.readlink
   -> get ENOENT
   -> raise, because it's not EINVAL or ENOSYS
we're not inside any exception handler here, so OSError is propagated up the call stack, eventually aborting the whole operation

The correct behavior is of course to swallow ENOENT in trylock and go back around the loop - the lock was just released so this process should try to take it.

Comment 1 Bryan O'Sullivan 2012-09-10 20:29 UTC

Confirmed locally.

Comment 2 HG Bot 2012-09-29 16:11 UTC

Fixed by http://selenic.com/repo/hg/rev/829919ef894a
Tomasz Kleczek <tomasz.kleczek@fb.com>
lock: fixed race condition in trylock/testlock (issue3506)

Suppose the following scenario:

1. Process A takes the lock (e.g. on commit).
2. Process B wants to grab the lock. Since lock file exists
   the exception is raised. In the catch block the testlock
   function is called.
3. Process A releases the lock.
4. Process B tries to read the lock file as a part of testlock
   function. This results in OSError (ENOENT) and since we're
   not inside the exception handler function this is propagated
   and aborts the whole operation.

To fix this we now check in testlock function whether lock file
actually exists and if not (i.e. if readlock fails) we just return.

(please test the fix)