State in scalable architectures

Created by Felipe Fernández / @felipefzdz

About Me

Felipe Fernández

  • Work as Software Craftsman for Codurance
  • Blog: http://codurance.com/blog/author/felipe-fern%C3%A1ndez
  • Twitter: @felipefzdz

About the talk

A story about different solutions around state,
driven by increasing scalability needs.

Disclaimer

Talk based on:

  • "Making sense of stream processing"
    by Martin Kleppman
  • "I heart logs" by Jay Kreps
  • "Jepsen" by Aphyr
  • My professional experience :)

1
State in a
monolith database

1. State in a monolith database

God database

1. State in a monolith database

God database

  • Database as a single source of truth
  • Table as original data
  • Materialised views, secondary indexes, replicas
1. State in a monolith database

Materialised Views

1. State in a monolith database

Replication

1. State in a monolith database

Secondary indexes

1. State in a monolith database

Replicated log

  • Database holds ACID properties for you

2
Data stores
explosion

2. Data stores explosion

Best tool for the job

2. Data stores explosion

Best tool for the job

  • Now we need to care about synchronisation
  • Network is evil: drops, delays, reorder, duplication

3
ACID properties

3. ACID properties

ACID properties

3. ACID properties

Atomicity

3. ACID properties

Isolation

3. ACID properties

Durability

3. ACID properties

Consistency

4
Data stores
integration

4. Data stores integration

Point to point integration

4. Data stores integration

Point to point integration

  • Stored procedures and triggers are hard to mantain
  • Data store interfaces are usually not general
  • Datastax, Cloudera: integral solutions
4. Data stores integration

Datastax Enterprise

5
Dual
Writes

5. Dual Writes

Do-it-yourself synchronisation

  • Atomicity: crash recovery
  • Isolation: race conditions
  • Consistency: eventual consistency
  • Durability: network partitions

6
The log

6. The log

Kafka to the rescue

6. The log

Kafka to the rescue

6. The log

Log guarantees

  • Offsets to achieve atomicity
  • Order to achieve isolation
  • History to replay consistency
  • Replication to achieve durability
6. The log

Order / Isolation

6. The log

Offsets / Atomicity

6. The log

History / Consistency

6. The log

Replication / Durability

6. The log

State in scalable architectures

  • Log as buffer. No need of back pressure, massive scalability
  • Data Stores and Processors are uncoupled

7
Conclusion

7. Conclusion

Turning the database inside out

7. Conclusion

Turning the database inside out

Thank you

Any questions?