System design is a series of decisions optimising for:
Profit: Functionality, Scalability, Speed
Loss: Security, Robustness, Observability, Cost
[Functionality] Does the system work?
[Compliance] Is the system compliant?
[Security] Is the system secure?
[Robustness] How does the system handle failures?
[Scalability] Does the system scale?
[Speed] What are the bottlenecks?
[Observabilty] Is the system observable?
[Cost] Is there anything we can simplify do decrease cost?
1.2. Functionality
1.2.1. Types of Development
Web
Frontend
Backend
Mobile
Game
Desktop
Embedded
DevOps
Data
ML / AI
Security
1.2.2. CAP Theorem
Characteristic
Description
Use Case
Consistency
All nodes in the system see the same data at the same time
Usually preferred for financial systems
Availability
System remains operational even if some nodes fail
Usually preferred for social media / streaming apps
Partition Tolerance
System remains operational even if network communication with some nodes fail
Non-optional because networks are not reliable, so the tradeoff is usually between C and A.
1.2.3. Duplexity: Initiation of communication + Sending of data + Concurrency
Duplexity
Who can initiate communication?
Who can send data?
Can both send at the same time?
Example
Use Case
Simplex
One side only
One side only
N/A
Webhooks
Event notifications
Half-duplex
Typically one side at a time (often client-first)
Both sides
No
HTTP
APIs
Full-duplex
Both sides
Both sides
Yes
WebSocket
Chat, collaboration
1.2.4. Message Distribution / Fanout Patterns
Key question: For one event,
who should receive it,
this determines fanout appraoch
how many recipients at peak
if >1000, avoid broadcast and look into group fanout strategies
Fanout Patterns
Description
Use Case
Typical Implementation
Example Schema
1:1 (simple fanout/targeted routing)
One sender → one recipient
Notifications, replies
Connection registry + routing
1:many (group/complex)
One sender → defined subset
Chat rooms, teams
Group registry or pub/sub topics
1:all (simple fanout/broadcast)
One sender → everyone
Live feeds, system events
Pub/sub broadcast
Group Fanout Patterns
Description
Example KV Shape
Use Case
Typical Implementation
Registry Shape
Conditional
Send to subset matching a condition
All users with role=admin
Admin alerts
Attribute filters
attr → [connectionId]
Hierarchical
Topics arranged in a tree
region.eu.fr.paris
Geo updates
Topic hierarchy
topicPath → [connectionId]
Partitioned
Split audience for scale/performance
Shard users across partitions
Massive scale
Hashing / partitions
shardId → [connectionId]
Fanout Implementation
Advantages
Disadvantages
Use Case
In-memory
Fast, simple
Single node only
MVPs, single-node WS
Registry
Scales, targeted delivery
Registry complexity
Chat, notifications
Pubsub Broker
Massive fanout, decoupled
Wasteful for 1:1, infra cost
Feeds, live updates
1.3. Compliance
1.3.1. MIFID
Markets in Financial Instruments Directive (MIFID) is a EU compliance standard for financial systems
Key Requirements:
Be synchronised to UTC
Stay within strict accuracy limits (e.g. milliseconds or microseconds depending on system)
Common implementations: NTP = Network Time Protocol
It keeps a machine’s clock accurate and synced with official time servers
Prevents clock drift (when a system slowly becomes a few seconds/minutes off)
Used everywhere in infrastructure (servers, containers, databases)
1.3.2. PCI DSS
PCI DSS (Payment Card Industry Data Security Standard) is a security compliance standard governed by major card brands (Visa, Mastercard, Amex) relevant to credit/debit card data
Prefer using payment providers (e.g. Stripe) to avoid handling card data
If unavoidable:
Never store sensitive authentication data (CVV, PIN, track data)
Isolate the Card Data Environment (CDE) via network segmentation
Encrypt cardholder data in transit and at rest
Strict access control + audit logging for in-scope systems
1.4. Security
1.5. Robustness
1.6. Scalability
1.7. Speed
1.8. Observabilty
1.9. Cost
1.9.1. Infrastructure as a Service (IaaS) vs Platform as a Service (PaaS)
Approach
Use Case
Adv.
Disadv.
IaaS
Large-scale / custom apps
Flexibility, Pay-as-you-go
Setup & maintenance, steeper learning curve
PaaS
MVPs
Faster dev + CI/CD + easy deployment & scaling + security out of the box
Vendor lock-in, less flexibility
1.9.2. Tenancy
A tenant is a customer/organisation space with its own users, data, config
Single-tenant
Multi-tenant
Definition
One tenant per isolated stack
Multiple tenants per stack
Isolation
Strong
Weak
Per-tenant customisation
Easy
Harder
OpEx
Higher
Lower
Scale
Worse (under-utilised)
Better (pooling)
Compliance / Data residency
Easier
Harder (needs partitioning)
Onboarding Speed
Slower
Faster
2. C2: Container Design
2.1. Functionality
2.1.1. Typical Cloud Infrastructure
Layer
Component
Description
Use Case
E.g.
Edge
DNS
Resolves domain name
AWS Route53, GCP DNS
Gateway
Gateway
Routing to different services, security
Typically used to route to serverless, not usually needed for servers + ALB
AWS API Gateway
Protocol Translation (HTTP to gRPC, REST to GraphQL)
More setup (certificates, servers), less integrated with shell workflows
Cloud uploads/downloads, APIs, large-scale file distribution
Peer-to-Peer
Parallel downloads from many peers, scales with number of participants
More complex coordination, uncommon for private 1-to-1 transfers
Large-scale distributed file sharing
H2H Protocol
Transport
Resume Support
Adv
Disadv
Typical Use Case
SFTP
SSH over TCP
Requires setup
Simple setup, single port, strong authentication model
Slight protocol overhead
Secure server-to-server or dev-to-server transfers
SCP
SSH over TCP
No
Very simple, minimal setup
No resume support, limited features
Quick ad-hoc file copy
rsync (over SSH)
SSH over TCP
Built-in
Efficient delta transfer, resume support
Slightly more complex usage
Large file sync, repeated transfers
Client-server Protocol
Transport
Resume Support
Adv
Disadv
Typical Use Case
FTP / FTPS
TCP / with TLS
Requires setup
Widely supported, simple client/server model
Legacy design, complex firewall/NAT handling
Legacy enterprise integrations
HTTP / HTTPS
TCP / with TLS
Requires setup
Firewall-friendly, scalable, cloud-native
Stateless, whole-object transfers
Cloud uploads/downloads (e.g. object storage)
P2P Protocol
Transport
Resume Support
Adv
Disadv
Typical Use Case
BitTorrent
TCP / µTP over UDP
Built-in
Parallel P2P transfer, scales with peers
More complex setup, uncommon for private transfers
Large-scale distributed file sharing
2.1.5. Operating Systems
Stack Size
Linux: 8MB
macOS: 8MB
Windows: 1MB
2.1.6. Transport Models
Model
TCP
UDP
Quick UDP Internet Connections (QUIC)
2.1.7. API Architectural Styles
What API architectural style is optimal for functionality (speed) and cost (DevX, maintenance, opex)?
Style
Description
Use Case
Adv
Disadv
REST
Perform HTTP verbs on resources. Entity based, e.g. POST /users
Most common
Universally understood + docgen tools e.g. Swagger, OpenAPI
Slowest - One request for each entity unlike GraphQL + less space efficient than RPC
GraphQL
Query or mutate entities. Entity based, e.g. mutation CreateUser() {...}
APIs for FE
Faster - One request for multiple entities
More setup e.g. defining the schema, resolvers + less standardised docgen e.g. GraphiQL
RPC
Call functions remotely. Action based, e.g. await client.createUser()
Internal APIs
Fastest and most space efficient because it uses binary instead of text payloads
Only for internal use, requires HTTP2^
tRPC
Type-safe RPC framework that auto-generates client and server
Typescript Apps
Can run on HTTP1 because of text payload
Tied to Typescript ecosystem + limited language interoperability + difficult to debug
2.1.8. Transport Protocols
What transport protocol is optimal for functionality (user experience) and cost (DevX, maintenance, opex)?
Transport Protocols
Adv
Disadv
Use Case
HTTP
gRPC
WS
What common combinations are there?
Edge
Core
Use Case
HTTP
HTTP
Traditional API
WS
HTTP
WS
RPC
WS
MQTT
MQTT
MQTT
2.1.9. Scheduler
Scheduler
Does
Assign task to node
Does not
Start or manage the workload
Orchestrator
Does
Scheduler
Provisioning and starting workloads on nodes
Scaling workloads up/down based on demand
Health monitoring and self-healing
Rolling updates and rollback management
Managing networking, storage and service discovery
2.1.10. WebRTC
If asked: “How would you design WhatsApp voice calls?”
Signaling: WebSockets (or SIP for enterprise VoIP).
Transport: RTP/SRTP for media.
NAT traversal: STUN + TURN fallback.
Encryption: SRTP end-to-end.
QoS handling: Adaptive bitrate, jitter buffer.
If asked: “How does WebRTC work?”
WebRTC = framework, uses:
Signaling (custom, often WebSocket)
RTP/SRTP for audio/video streams
STUN/TURN for NAT traversal
DTLS/SRTP for security
Adaptive bitrate + codec negotiation.
2.1.11. Websockets
2.1.11.1. Single-Node
At high level design, a single-node WebSocket system can often handle up to ~10k concurrent connections, but to maintain a margin of safety, it’s reasonable to start thinking about distributed WebSocket systems above ~1k connections. At that point, distributed systems also bring benefits like better fault tolerance and operational robustness. When calculating costs
At a lower level, websocket soak test tools can be used to validate these assumptions by observing system behaviour over time (CPU/memory usage, message latency, connection health (success/lifetime/dropped), network egress), identifying which part of the system becomes a bottleneck and needs to be scaled. The goal at this stage is typically to meet some kind of SLO, e.g.:
99.9% of WebSocket messages delivered within 200ms
99% of API requests complete under 500ms
< 0.1% connection drops per hour
2.1.12. Pubsub
What types of pubsub brokers are there?
Pub/Sub Message Delivery Characteristic
Description
Advantages
Disadvantages
Typical Use Case
At-most-once
Message is delivered zero or one time; no retries
Very low latency; simple; minimal overhead
Messages can be lost silently
Metrics, logs, realtime notifications where loss is acceptable
At-least-once
Message is delivered one or more times; retries on failure
Higher reliability; simple retry model
Duplicate messages; consumers must be idempotent
Event propagation, cache invalidation, background jobs
Exactly-once (logical)
System ensures message effects occur exactly once (often via deduplication)
Creates an Elastic Network Interface (ENI) in your subnet with a private IP
SSM, Secrets Manager, CloudWatch
Gateway
Route table entries that direct traffic to S3 / DynamoDB
Port Forwarding: Forwarding of information from a router's port to a port on a device on its subnet
Tailscale Protocol: Connecting directly to a device's port when it is already on a subnet
2.3.4. Firewalls
Type
E.g.
Layer
Found In
Checks
Use Case
Web-Application (WAF)
AWS WAF
Application
CDNs, gateways, load balancer
Examines HTTP payload for attack detection
Web app / API protection against SQLi, XSS, bots, malicious patterns
Proxy
Nginx reverse proxy
Application
Proxy servers, gateways
Examines payload for access control and anonimisation
Packet Filtering
iptables (basic rules)
Network & Transport
Routers
Examines packets based on source/destination IP, port, protocol
Simple allow/deny rules, port blocking
Host-Based
Windows Firewall, iptables
Network & Transport
Individual servers / VMs
Examines traffic per host
Protects single servers, last line of defense
2.4. Robustness
If asked: “How does VoLTE differ from WhatsApp?” • VoLTE → Managed SIP + RTP inside carrier network, guaranteed QoS, low jitter. • WhatsApp → WebRTC over the public Internet, no QoS guarantees.
2.4.1. Typical Cloud Infrastructure
Layer
Component
Description
Use Case
E.g.
2.4.2. How do we ensure reliable message processing and delivery over time?
Strategy
Description
Advantages
Disadvantages
Typical Use Case
Queue / Stream keyed by client
Messages placed in per-client or keyed queue, instance owning WS consumes and delivers
Durable, pull-based backpressure, supports retries/replays/offline delivery (because messages stay in log while client is offline)
Higher latency enqueueing/dequeueing than simple push with pubsub, ownership/rebalancing complexity, not true push (message is delivered when consumer polls, not at production time)
Systems needing durability, offline delivery, or replay
Client pull / reconnect catch-up
Client fetches pending messages from shared store on poll or reconnect
Extremely resilient; minimal server coupling
Higher latency; weaker real-time guarantees
Notifications, feeds, async workflows
2.4.3. Websockets
Connection Type
Timeout
Client x API Gateway Websocket Connection
2hrs
API Gateway x Lambda Integration
29s
WebSocket Issues
Scenario
Mitigation
Reconnect Storms
Backoff + jitter in the client
Gateway crash
Client reconnect typically handled by client ws library
Stale connection registry
TTL + Heartbeat allows stale data to be cleared from the registry
Message loss
Memory leaks
Slow clients
2.4.4. Caching
Cache Stampede Management Strategies
Description
Adv.
Disadv.
Use Case
Warmup
Prefill
2.4.5. Distributed Websockets
2.4.5.1. Issues and Mitigations
Issue
Scenario
Mitigation
Reconnect storms
Many clients reconnect after outage
Exponential backoff + jitter
Gateway crash
All connections on node drop
Client reconnect + registry TTL
Stale registry
Registry points to dead gateway
Heartbeats + expiry
Message duplication
Retries cause duplicates
Idempotency
Backpressure
Gateways send messages faster than client can read, causing buffers/memory to explode
Stop sending temporarily / drop messages / disconnect clients
Connection exhaustion
Too many open sockets
Connection limits
Message loss
Crash during send
ACKs, retries, durable queues
2.4.6. Adaptive Performance Strategies
Strategy
Description
Layer
Use Cases
Jitter Buffer
Temporary storage in receiver's app that smooths out variations in packet arrival times before playback
Application
Jitter
Bitrate
Bitrate Reduction + ...
Bitrate Reduction
Reducing the encoding and sending of data
Application
Packet Loss
2.5. Scalability
2.5.1. Overview
Type
Principle
Use Case
Adv
Disadv
Vertical
Upgrading CPU/RAM/Storage
Small to medium apps, monolithic systems, startups
No code change + lower latency
Limited by hardware ceilings + expensive at scale + SPOF
Horizontal
Adding more servers
Distributed systems
Fault tolerance via redundancy + Infinite scalability
Network latency + Higher complexity
Types of horizontal scaling:
Database Horizontal Scaling
Compute Horizontal Scaling
Database Horizontal Scaling, i.e. sharding
Type
Principle
Use Case
Adv
Disadv
Directory/Lookup-based
Shard where data belongs depends on manually maintained directory
Frequently changing shards / manual control
Easy to add / remove shards
Directory is a SPOF, lookup adds latency
Range-based
Shard where data belongs depends on which contiguous key ranges (e.g. A-F, G-L, ...)
GET /dashboard instead of GET /users + GET /orders + GET /recommendations
3.1. Functionality
3.1.1. Choosing a language for mobile app development
3.1.2. Choosing a language for frontend web development
Language
Use Case
Adv.
Disadv.
JS
Default
Natively supported - browsers come with JS engine
Single-threaded by default
Dart (compiled to JS)
Cross-platform
No UI interactivity
C/C++/Rust (through WASM)
3D graphics, gaming, video editing (e.g. Figma, Canva, AutoCAD Web)
High performance
No UI interactivity
Python (through WASM)
AI/ML in the browser
High performance, mature AI/ML ecosystem library
No UI interactivity
C (through Blazor WASM)
Existing .NET implementation
UI interactivity
Young ecosystem, large initial payload (downloads 6MB .NET runtime)
JS is the default choice as it is the only language that has direct access to the DOM to render UI.
3.1.3. Choosing a language / framework for backend web development
The choice of language for backend web development is tightly coupled to the language's runtime, libraries and frameworks as they provide key tradeoffs.
Language
Use Case
Adv.
Disadv.
Javascript
Real-time apps, typically preferred over php these days
Mature ecosystem, same language for FE and BE, great for concurrency (<10k users)
Not typed
PHP
Wordpress, CMS, e-commerce
Huge CMS ecosystem, powers wordpress
Process-per-request model limits real-time apps without extra tooling, js is typically preferred
'''
1. Recursively
- Adv.: Clean and intuitive
- Disadv.: Limited by recursion depth, stack overflow risk
'''defrecursive(root): iot(root) preOT(root) postOT(root)defiot(node):if node isNone:return iot(node.left) process(node) iot(node.right)defpreOT(node):if node isNone:return process(node) preOT(node.left) preOT(node.right)defpostOT(node):if node isNone:return preOT(node.left) preOT(node.right) process(node)'''
2. Iteratively
- Adv.: Robust for large or unbounded inputs
- Disadv.: Less intuitive and readable
'''defiot(root):if root isNone:return stack =[] node = root
while stack or node: go left as far as possible
while node: stack.append(node) node = node.left
node = stack.pop() process(node) stack.append(node.right)defpreOT(root):if root isNone:return stack =[root] switching this to a queue changes the DFS to BFS
while stack: node = stack.pop() process(node) push right first so left is processed first
if node.right: stack.append(node.right)if node.left: stack.append(node.left)defpostOT(root):if root isNone:return stack =[] lastNode =None node = root
while stack or node: go left as far as possible
if node: stack.append(node) node = node.left
continue at leftmost node,if candidate has right andisnot the last visited node, check right subtree
at
candidateNode = stack[-1]if candidateNode.right and lastNode != candidateNode.right: node = candidateNode.right
continue node = stack.pop() process(node) lastNode = node
node =None do not process node again
BFS
There are two ways to perform traversal:
Flat Traversal (FT)
Level-Order Traversal (LOT)
BFS is primarily done iteratively - it can be implemented recursively but there is no practical benefit.
defft(root):if root isNone:return queue = deque([root])while queue: node = queue.popleft() process(node)if node.left isnotNone: queue.append(node.left)if node.right isnotNone: queue.append(node.right)deflot(root):if root isNone:return queue = deque([root])while queue:for LOT, we just need to wrap the flat traversal logic in a for loop with levelSize iterations
levelSize =len(queue)for _ inrange(0,levelSize): same as flat traversal
Note:
You can also add metadata for each node by appending tuples (node, metadata) to the queue instead of just nodes
4.1.1.5. Array
How many times can I slide a window over an array?
Intuition
Start from the base case - window size 1
How many times can you slide it?
Increase window size
Formula
len(array) - windowSize + 1
4.1.1.6. Bitwise Operations
Operation
Application
Example
AND &
Get carry for binary addition of two numbers
1 & 1 = 1
AND &
Get last bit
10 & 1 = 0, 11 & 1 = 1
XOR ^
Get sum without carry for binary addition of two numbers
1 ^ 1 = 0
0 ^ 1 = 1
1 ^ 0 = 1
XOR ^
Find differences between two bit patterns
0110 ^ 1010 = 1100, i.e. different in first two bits
Bit Shift
Multiply/divide by 2
x = 2, x << 1 = 4, x >> 1 = 1
4.1.1.7. Dynamic Programming
Caching results for fibonacci-style recurrence
4.1.1.8. Binomial Theorem
Theory
The Binomial Theorem describes how to expand binomial expressions without brute force
Binomial Expression:
An expression formed from two terms,
e.g. (a+b)
Binomial Theorem Formula:
(x+y)n=∑k=0n(kn)xn−kyk
where (kn)≡nCk is the binomial coefficient a.k.a. combinations
Applications
The binomial coefficient can be used to describe symmetric number sequences, e.g. 1 4 6 4 1
4.1.1.9. Describing Symmetry
Linear Symmetry
Combinations / Binomial Coefficient
Modulus
Even Functions
Cosine
Rotational Symmetry
Odd Functions
Sine
4.2. Compliance
4.3. Security
4.3.1. Request/Response Flags
Flag
Purpose
Use Case
HttpOnly
Prevents JS from reading cookies
Protect tokens from XSS
Secure
Cookie only sent over HTTPS
Protect plaintext cookies from being leaked
SameSite
Controls if cookies are sent on cross-site requests (Strict/Lax/none)
CSRF protection / cross-site marketing
Cache-Control
Controls caching of resposne data (no-store, max-age etc.)
Ensure sensitive data isn't cached
CORS headers
Control which domains can make cross-origin requests
APIs that need controlled access
4.3.1.1. Authentication
Transporting Passwords
Use HTTPS for password submissions
Avoid logging raw credentials
4.3.1.2. Authentication Methods
Method
Use Case
Username + Password
Username + Password + 2FA
SSO
Custom-built SSO
Securing Passwords
Hashing
Passwords should be stored as irreversible cryptographic hashes
Salting
A random, user-specific unique value (salt) is added to the plain-text password before hashing, which is stored in plaintext in the database
Prevents
two users with the same passwords from getting the same hash
hackers using rainbow tables (precomputed mappings of common passwords -> hashes)
Peppering
A random, global value (pepper) is added to the plain-text password before hashing, which is stored as an env variable on the server
An additional layer of security on top of salting
4.4. Robustness
4.4.1. Recursion Depth Limits
C++: 100,000
Depends on frame size + OS stack size
Dart: 10,000
Set by default
JS: 10,000
(V8 engine/chrome)
Depends on
Java: 1,000
Depends on frame size + OS stack size
Python: 1,000
Set by default
4.5. Scalability
4.6. Speed
4.6.1. CPU Optimisations
Branch Prediction
Variable reassignment
CPU Pipelining
CPU Preloading
CPU Prefetching
Cache Locality
Memory Access Patterns
4.6.2. Language Optimisations
Peephole Optimisations
Inline
Unroll
4.6.3. Caching
Cache Read Strategies
Description
Adv.
Disadv.
Use Case
Read-thru
App reads cache -> on miss, cache reads from DB
Simplifies app logic
Stampede risk on hot keys + Tight coupling between cache and data store + Limited flexibility for custom fetch logic
Simple KV access
Cache Write Strategies
Description
Adv.
Disadv.
Use Case
Write-thru
App writes to cache -> cache writes to DB sync
Cache is consistent + Reads are fast after writes
Higher write latency + Cache outage blocks writes
Strong consistency / configuration data
Write-behind / back
App writes to cache -> cache writes to DB async
Very fast writes
Risk of data loss without durable buffering (queue / WAL required) + eventual consistency
High-throughput / analytics / logging / non-critical data
Cache Read/Write Strategies
Description
Adv.
Disadv.
Use Case
Cache-aside
App checks cache -> on miss, app reads from DB -> app writes to cache
Simple + cache only stores what is used
Stampede risk on hot keys + Harder to guarantee consistency under concurrent writes
Default choice for most BE systems / Read-heavy systems / microservices / web APIs
Cache-thru
Read-thru + Write-thru
Centralised data access
Cache is SPOF + Reduced observability + Harder debugging
Rare / legacy / strict data access boundaries
Cache Invalidation Strategies
Description
Adv.
Disadv.
Use Case
TTL-based
Cached entries expires after time
Simple invalidation + Prevents stale data buildup
Stampede risk on expiry + Hard to pick optimal TTL
Often combined with cache-aside / CDN caching
Event-based
Cache entries invalidated on data change events
Very fresh data + No guess work with TTL
Event loss or ordering issues can cause permanently stale cache + More moving parts
Event-driven / CQRS systems
4.7. Observability
4.7.1. Logging
Avoid auto logging POST bodies and GET parameters
If the auto logging runs on auth endpoints, passwords could be written in plaintext to logs
OS removes splash once first UIViewController is ready
OS removes splash once Activity content is ready
Native splash removed after JS bundle + RN root view are mounted
Native splash removed after Flutter engine renders first frame
First Frame Rendered
First frame is rendered
Same
Same
Same
5.4. Databases
5.4.1. SQL Databases
Question
Answer
What is atomicity?
Atomicity guarantees that all statements are committed within a transaction, or none are, i.e. all or nothing
What is consistency?
Consistency guarantees that a transaction brings the database from one valid state to another, preserving all defined constraints
What is isolation?
Isolation guarantees that transactions running in parallel do not interact in unsafe ways (subject to change in isolation levels)
What is durability?
Durability guarantees that once a transaction has been committed, there will be no data loss in the event of a crash
What is SQL?
Structured Query Language is a language used to interact with data in relational DBs
What subtypes of SQL are there?
DQL, DML, DDL, DCL
What is DQL?
Data Query Language is used to read data from a DB, e.g. SELECT
What is DML?
Data Modification Language is used to modify data in the DB, e.g. INSERT, UPDATE, DELETE
What is DDL?
Data Definition Language is used to define the structure of data in the DB, e.g. CREATE
What is DCL?
Data Control Language is used to control access to data in the DB, e.g. GRANT, REVOKE
What is a command/statement?
An instruction executed by the database, e.g. SELECT * FROM fooTable;
What is a clause?
A part within a statement
What is a read/query?
A statement that reads data, e.g. SELECT * FROM fooTable;
What is a write/update?
A statement that writes data, e.g. DELETE FROM fooTable WHERE ...;
What is a read-then-write?
A pattern where data is read before a write (the write may use the data / update the same data)
What is a read-for-update?
A pattern that reads data with the intention of updating the same data later, e.g. BEGIN; SELECT ... FOR UPDATE; -- biz logic on that data ; UPDATE ...; COMMIT;
What is a result set?
Data returned from a query
What is an update acknowledgement?
Confirmation returned from an update
What is a transaction?
A group of statements executed as a single unit of work e.g. BEGIN; SELECT ...; UPDATE ...; COMMIT; ROLLBACK;
What is a single unit of work?
A set of operations that guarantees atomicity, i.e. either all statements commit, or rollback
What is a lost update?
Two transactions read the same row, both modify it, one overwrites the other
What is a dirty read?
One transaction reads data written by another transaction that has not committed yet
What is a non-repeatable read?
A situation where a transaction reads data twice and gets different values because another transaction committed inbetween
What is a phantom read?
A situation where the set of rows matching a condition changes during a transaction
What is a deadlock?
A deadlock is when 2 or more transactions are waiting for each other's locks, preventing any from proceeding
What is autocommit?
Autocommit is a DB configuration that automatically commits individual SQL statements as its own transaction
What is an object?
Anything defined in a DB, e.g. Tables, Views, Indices, Stored Procedures, Triggers, Functions
What is a schema?
Logical grouping of DB objects
What is an execution plan?
The strategy the DB optimiser chooses to execute reads/writes efficiently, e.g. index scan vs full scan, hash join
What is a lock?
Mechanism used to control concurrent access to data, e.g. through blocking
What is a lock mode?
The type of lock that defines what operations are allowed/blocked (e.g. S, U, X, IS, IX)
What is lock granularity?
The level of the object at which a lock is applied (e.g. row, page, table)
What is a data lock mode?
Locks placed on actual data to control read/write access (e.g. S, U, X)
What is an intent lock mode?
Locks placed on coarser-granularity objects by transactions, signalling their intentions on finer-granularity objects (e.g. IS, IX)
What is the lock lifecycle?
Acquire intent lock when higher-level object is accessed during execution -> acquire data lock when lower-level object is accessed during execution -> do work -> release lock depending on lock mode
When are read locks released?
Depends on the isolation level
When are write locks released?
At transaction end / commit
What does blocking mean in locks?
A transaction is forced to wait at a statement that is trying to access a locked resource, because it cannot acquire the required lock
How many S locks can exist on a resource at any time?
Multiple
How many U locks can exist on a resource at any time?
One
How many X locks can exist on a resource at any time?
One
Why does S block U
Because U locks is intended to be promoted to X locks, which gets blocked by S, resulting in a deadlock
Why doesn't U block S
Because DBs allows concurrent reads while determining if an update is necessary, reducing total lock duration and improving performance (design choice)
Thread
1
2
3
4
5
6
A
request received
read old data under U lock allowed
response sent with old data
B
request received
read blocked
read new data
resposnse sent with new data
C
request received
U lock acquired
read
write
other processing
5.4.1.1. Deadlock Problem
Step
Transaction A
Transaction B
1
SELECT → acquires S lock
2
SELECT → acquires S lock
3
UPDATE → tries to acquire X, waits
4
UPDATE → tries to acquire X, waits
5
❌ waiting for B to release S
❌ waiting for A to release S
6
💥 deadlock
💥 deadlock
5.4.2. ORMs
Question
Answer
What is a session?
A session represents an unit of work and manages the persistence context and communication with the DB
What is the difference between an ORM session and a DB transaction?
ORM session represents an application-level UoW while DB transaction represents a DB-level UoW + a session may span multiple transactions + a transaction may be managed within a session
What is persistence context?
A cache that tracks entities and their changes during a session, ensuring consistency between in-memory objects and the DB
5.5. Networking Model
There are two main models that are used in the industry today:
Open Systems Intercommunication (OSI) model
Abstract: Typically used to discuss concepts
TCP/IP model
Concrete: This is what is used in the internet today
Can span multiple connections, until either peer terminates the session
Signaling: Session Management Signaling is the process of setting up, managing, and tearing down a communication session before real-time data flows. Signaling encompasses multiple processes:
Session Setup
Codec Negotiation
Process where two peers agree on a common codec for audio/video during signaling
NAT Traversal
Techniques + Protocols that allow devices behind NAT to communicate directly
There are three main techniques
Session Traversal Utilities for NAT (STUN)
Device asks STUN server "What's my public IP:port?"
Device shares info with other peer (P2P)
Works only if NAT keeps mappings stable
Traversal Using Relays around NAT (TURN)
Both devices send media to a TURN server
Used as fallback if direct P2P fails
Higher latency + server bandwith cost
Interactive Connectivity Establishment (ICE)
Gathers candidates
Private IP:port
Public IP:port from STUN
Relay addresses from TURN
Tries all possible paths
Picks the fastest, lowest-latency route
Encryption keys exchange
Exchange session metadata
5.7. Web Identifiers
Term
Definition
E.g.
TCP connection
Source IP : Source Port -> Destination IP : Destination Port
192.168.1.10 : 52341 → 34.120.10.5 : 443
Socket
OS-managed object that includes TCP connection + send/receive buffer