master-segfault


Project maintained by tshikaboom Hosted on GitHub Pages — Theme by mattgraham

Concurrency is hard

L’accident du Therac-25

1985-87 Machine therapie a rayonnement

Ariane 501

Plusieurs calculateurs de vol communiquant sur un bus Un processus de verification de fond plante

$500m lost.

Sequential Stack in a concurrent world

pop ( ) {
    t = Top;
    if (t != nil)
        Top = t->tl;
    return t;
}

push (b) {
    b->tl = Top;
    Top = b;
    return true;
}

Imagine that two threads invoke pop() concurrently. They might pop the same entry!

para-stack

Solutions

CAS - Compare and Swap

Solution for concurrency is sequential world. Hardware abstraction, in software, usually within while(true)

What does it do

The CAS(addr, expected, new) tests if the expected is located at addr. If it is, it replaces it with new and returns true. Otherwise it returns false without any action.

Code

bool CAS(val_t *addr, val_t expected, val_t new_val) {
    atomic {
        if (*addr == exp) then {
            *addr = new;
            return true;
        }
        else return false;
    }
}

Example

Considering the followint pop() code:

pop() {
    while (true) {
        t = Top;
        if (t == nil) break;
        n = t->tl;
        if CAS(&Top,t,n) break;
    }
    return t;
}

and two processes p1 and p2. Two concurrent pop() now work fine. para-conc-pop-1 para-conc-pop-2 para-conc-pop-3 CAS of th1 (red) fails.

TAS - Test and Set / Get and Set

Usually called TAS, should be called GAS (Get And Set). Hardware instruction but doesn’t test (like CAS).

class AtomicBoolean {
    private boolean b;
    public synchronized boolean getAndSet() {
        boolean tmp = b;
        b = true;
        return tmp;
    }
}

class AtomicReference<T> {
    private T ref;
    public T getAndSet() { }
    public void set(T r) { }
    public boolean synchronized compareAndSet(T expectedRef, T newRef) {
        if (ref == expectedRef) {
            ref = newRef;
            return true;
        } else
            return false;
    }
}

This mechanism is used essentially to implement low level locks.

The ABA problem

The ABA problem occurs during synchronization, when a location is read twice, has the same value for both reads, and “value is the same” is used to indicate “nothing has changed”. However, another thread can execute between the two reads and change the value, do other work, then change the value back, thus fooling the first thread into thinking “nothing has changed” even though the second thread did work that violates that assumption. Free, allocate another thing at the same @, CAS passes through, invariant violated.

ABA Example

Process p1 reads value A from shared memory,

Although p1 can continue executing, it is possible that the behavior will not be correct due to the “hidden” modification in shared memory.

ELI5

John is waiting in his car at a red traffic light with his children. His children start fighting with each other while waiting, and he leans back to scold them. Once their fighting stops, John checks the light again and notices that it’s still red. However, while he was focusing on his children, the light had changed to green, and then back again. John doesn’t think the light ever changed, but the people waiting behind him are very mad and honking their horns now.

Test and test and set

Wait until lock looks free by spinning on local cache. No bus use while lock busy. Problem? When lock is released - invalidation storm.

Exponential Backoff

Easy to implement, same thing as in a CSMA/CA. On spinlock, back off, wait for exponentially inreasing period of time (usually bounded). [+] Easy to implement [+] Better performance than TTAS [-] ‘Pifologique’ fine-tuning [-] Not portable across architectures/platforms

spin-compare

Anderson Queue Lock

Idea

Avoids useless invalidations by keeping a queue of threads. Each thread notifies next in line withought bothering the others. Fairness is ensured by using FIFO queue mechanism.

Anderson ELI5

Relay race where the athlete passes on the baton to the next athlete in queue which ensures that the only one athlete acquires the baton.

anderson

anderson2

Hazard pointers - Michael’s algorithm

Michael adds to the previous algorithm a global array H of hazard pointers:

The algorithm is then modified:

So,

Michael’s pop() and push()

pop ( ) {
    while (true) {
        atomic { t = Top;
        H[tid] = t; };
        if (t == nil) break;
            n = t->tl;
        if CAS(&Top,t,n) break;
    }
    H[tid] = nil;
    return t;
}

push (b) {
    for (n = 0; n < no_threads, n++)
        if (H[n] == b) return false;
    while (true) {
        t = Top;
        b->tl = t;
        if CAS(&Top,t,b) break;
    }
    return true;
}

Similar to CAS but very hard to implement Supervise addresses, if they change, notify.

Load-Link(x) supervises x address Store-Conditional (x, y)

LL/SC Invariants

Logique de Hoare

Logique qui permet de raisonner sur les programmes sequentielles.

The Hoare triple

[p] C [q]

Primitives

{P} SKIP {P}

Hoare Example

{X=1} X:= X+1 {X=2}

Linearisability

Point de linearisation -> fin d’execution

                       q.enq(x)
------------------|--------------|------------------------
                                 ^ here

Java

volatile means that access to this var are gonna be serializable

TM - Transactional Memory - Software TM - Hardware TM

Permet atomicite sur tout.

Le Hard/Software checks allow atomicity for all.

Peterson

The smallest possible synchro algo for two threads (verified with a model checked). The number of registers needed to implement it is exponential to the number of threads to synchronize.

Architectures

MIMD - Multiple Instruction Multiple Data

MIMD instructions are used in large scale systems; this means that at any time, multiple processors may be working on multiple data sets. This is in contrast with SIMD instructions, where a single instruction is operating on multiple datasets: the processor may be using its ALU(s) and FPU(s) etc. at the same time but there is no inherent concurrency in execution, since there is only one instruction to execute from the CPU’s point of view.

MIMD processors usually make use of a shared bus to access the central memory shared between the CPUs.

Shared bus