# Stalking the Lost Write: Memory Visibility in Concurrent Java

Jeff Berkowitz, New Relic QCon San Francisco, November 2014

# The Computer We Imagine



```
statement-1;
statement-2;
if (b) statement-3;
while (cond) {
   statement-4;
```

### Memory

### **The Compiler We Imagine** Typical assembly language - no particular CPU

<u>Java</u>

x++;

### Assembly Language

mov mem.x, reg1
incr reg1
mov reg1, mem.x

```
mov mem.y, reg1
incr reg1
mov reg1, mem.y
```

### **The Compiler We Imagine** Typical assembly language - no particular CPU

<u>Java</u>

**x++**;

### <u>Assembly Language\*</u>

mov mem.x, reg1
incr reg1
mov reg1, mem.x

mov mem.y, reg1
incr reg1
mov reg1, mem.y

# The Compiler We Get

### <u>Java</u>

### Assembly Language

mov mem.x, reg1

mov mem.y, reg2

incr reg1
mov reg1, mem.x

incr reg2
mov reg2, mem.y

# The Compiler We Get

### <u>Java</u>

### **x++**;

### Assembly Language

mov mem.x, reg1
mov mem.y, reg2

incr reg1
mov reg1, mem.x

incr reg2
mov reg2, mem.y

# The End ResultTypical micro operations - no particular CPUJavaAssembly LanguageHardware Level

| mov | mem | х, | 1 |
|-----|-----|----|---|
|     |     |    | _ |

| mov | mem. | У | / |  |
|-----|------|---|---|--|
|     |      |   |   |  |

| incr | reg1  |    |
|------|-------|----|
| mov  | req1, | me |

| y++; | incr | reg2 |  |
|------|------|------|--|
|      |      |      |  |

**x++**;

reg1 reg2

em.x

mov reg2, mem.y

rd.issue(x)
rd.issue(y)

resp.mov(r1)
resp.mov(r2)
incr r1

wr.async(r1, x)

incr r2
wr.async(r2, y)

### The End Result **Typical micro operations - no particular CPU** <u>Assembly Language</u> <u>Hardware Level</u> Java rd.issue(x) mov mem.x, regl rd.issue(y) mov mem.y, reg2 x++; resp.mov(r1) incr reg1 resp.mov(r2) mov reg1, mem.x incr rl wr.async(r1, x) incr reg2 mov reg2, mem.y incr r2 wr.async(r2, y)

- **y++**;

### The Multiprocessor We Imagine

There are no caches or memory buffering here





### Code Example 1

int x, y, a, b; // all zero

### <u>CPU 1</u>

- void m1() { void m2() { y = a;b = 1;
- }

### Possible outcomes for x and y?

### <u>CPU 2</u>

 $\mathbf{x} = \mathbf{b};$ a = 2;}



### **Possible Trace 1**







### **Possible Trace 3**







### **Possible Trace 5**



- It looks like x or y must be 0 in the result
  - Makes sense: the first statement of ml() grabs a 0, and so does the first statement of m2()
- Is our reasoning correct?
  - int x, y, a, b; // all zero void m1() { void m2() {  $\mathbf{x} = \mathbf{b};$ y = a;b = 1;a = 2;

### Is That It?

### **Surprisingly, No** *Counterintuitively, the compiler can reverse the order*



→ mov #1, mem.b → mov mem.a, mem.y

→ mov #2, mem.a → mov mem.b, mem.x





# And It Gets Worse ...





- Widely-known cache coordination protocol
- Acronym for cache line states:
  - Modified Exclusive Shared Invalid
- Transfers cache-line "messages" between processor caches
- Typically coordinated by parallel signaling "bus" within chip or single board

### **MESI** Protocol

















### MESI Example 2-1 Credit: <u>http://bit.ly/pjug2013-mckenney-parallel</u>

- CPU 1
- void m1() { A = 1;B = 1;

int A, B; // both zero <u>CPU 2</u> void m2() { while (B == 0)assert(A == 1);


























































#### Where Are We?

### Some concurrent traces ("Good" traces) seem much more intuitive than others

void m1() {
 y = a;
 b = 1;
}



#### Others Not So Much

## • "Bad" traces don't correspond to any possible sequential execution of the original statements

void m1() {
 y = a;
 b = 1;
}



- "Good" traces correspond to some sequential execution of the original language statements
- The concept of some sequential execution can be formalized as sequential consistency or SC.
- "Bad" traces can be prevented by specifying rules allowing programmers to ensure their code is SC.

#### Sequential Consistency

#### Java Memory Model (JMM)

- Early Java was broken
- JMM introduced in Java 1.5 (2004)
- Now section 17.4 and 17.5 of JLS
- Based on the concept of a partial order
  - Most memory operations are unordered
- Abstract Happens-before operator defines ordering of specific memory operations

## Typical Rules from JMM

by the same thread in program order."

"All memory operations prior to writing a a read of the same volatile from another thread."

- "Every memory operation on a given thread happens-before the next memory operation
- volatile variable on one thread happen-before

#### Modified Example 1

volatile int a, b, x, y;

#### CPU 1

void m1() { = a; b = 1;

y == a must be visible to any thread that can observe b == 1

#### <u>CPU 2</u>

void m2()  $\mathbf{x} = \mathbf{b};$ a = 2;

x == b must be visible to any thread that can observe a == 2.



The two happens**before** operations mean that if CPU 2 can observe b = 1, it must also observe y = a.

The compiler and runtime cooperate to prevent the non-SC trace from occurring.

#### Result

#### Surprising Trace Prevented



### **Rights and Responsibilities**

- Programmer is responsible for ensuring the presence of a happens-before between every pair of references to a given datum.
- In exchange, JMM guarantees that program behavior will be **SC**
- Terminology: a missing happens-before is called a data race.

### **Ensuring Happens-Before**

- Single-threaded code naturally has H-Bs
- To ensure H-Bs in concurrent code, use:
  - immutability (final) with safe publication
  - primitives (volatile, mutex, atomics)
  - concurrency-safe library classes
  - concurrency-safe frameworks and programming models, e.g. Akka

#### MESI Example 3-1 MESI Example 2 but modified with volatile

#### CPU 1

void m1() { A = 1;B = 1;

}

#### volatile int a, b; // both zero **CPU 2** void m2() { while (B == 0)assert(A == 1);





































































#### Where Are We?

- Volatile is one way to express happensbefore relationships
- Prevents reordering in the compilers
- At runtime, JIT generates architecturespecific opcodes to
  - prevent memory op reordering in hardware
  - prevent deferred processing in hardware

#### Generalizing on the JMM...

- Go
  - New language from Google • Memory model expressed in terms of "happens-before" as in JMM.
- Akka

  - Async framework for Java, Scala, ... Spec makes reference to JMM

#### Other Languages

- C
  - Explicit (compiler directives, asms)
- C++
- Objective-C

Interesting memory model in C++ 2011

• Also low level, language-specific features

#### And More Languages

- C#
  - Similar to Java
- Rust
  - Concurrent task abstraction (a lá Occam?);
     No shared memory in "safe" code
- Dalvik (Android virtual machine)
  - Historically broken (Stackoverflow post)

### Explicit Control in C

- Compiler directives/annotations/asms to prevent aggressive compiler reordering
- Linux kernel: macros expand to explicit memory barrier instructions
  - void m1(void) { stmt-1; smp mb();

- stmt-2;

#### Summary

- These issues affect all languages that support programming with threads
- Java community was ahead of the curve in addressing them
- Awareness wins you may not program against the JMM, but understanding it is powerful.
- Keep learning avoid "DIY" and use the highest level tools you can.

#### References

#### http://bitly.com/bundles/pdxjjb/2

# Contains all the "bit.ly" links from this presentation

#### THANK YOU

- and providing feedback...
- version or another of this talk.

 Java Agent team and so many others at New Relic for attending my practice talks

And everyone who has attended one



#### Followed By

Lunch