mercurial.lock.trylock can follow this sequence of events: call util.makelock get EEXIST because somebody else has it locked the other process that has the lock releases it at this point, deleting the symlink call self.testlock -> call util.readlock -> call os.readlink -> get ENOENT -> raise, because it's not EINVAL or ENOSYS we're not inside any exception handler here, so OSError is propagated up the call stack, eventually aborting the whole operation The correct behavior is of course to swallow ENOENT in trylock and go back around the loop - the lock was just released so this process should try to take it.
Confirmed locally.
Fixed by http://selenic.com/repo/hg/rev/829919ef894a Tomasz Kleczek <tomasz.kleczek@fb.com> lock: fixed race condition in trylock/testlock (issue3506) Suppose the following scenario: 1. Process A takes the lock (e.g. on commit). 2. Process B wants to grab the lock. Since lock file exists the exception is raised. In the catch block the testlock function is called. 3. Process A releases the lock. 4. Process B tries to read the lock file as a part of testlock function. This results in OSError (ENOENT) and since we're not inside the exception handler function this is propagated and aborts the whole operation. To fix this we now check in testlock function whether lock file actually exists and if not (i.e. if readlock fails) we just return. (please test the fix)