Software Development Engineering Cheat Sheet


Cheat sheets for software development and engineering. If software were dishes, developers would be the chefs who prepare them and engineers would be the architects who design the kitchen.

1. Engineering

How to design software that is robust, scalable, and efficient.

1.1. Data Structures & Algorithms

How to solve problems with code.

1.1.1. Methods to Reinterpret Problems

  • Write the problem as a formula and check whether rearranging the variables simplifies the solution
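As an illustration, take a hypothetical problem that is not from the original list: find two numbers in a list that sum to a target. Writing it as a + b = target and rearranging to b = target - a turns a nested-loop search into a single pass with a dictionary lookup. This is a sketch, and `two_sum` is a name chosen only for this example.

```python
from typing import List, Optional, Tuple


def two_sum(nums: List[int], target: int) -> Optional[Tuple[int, int]]:
    """Return indices (i, j) with nums[i] + nums[j] == target, or None."""
    # a + b = target  ->  b = target - a, so each step is a dictionary lookup
    seen = {}  # value -> index where it was seen
    for j, a in enumerate(nums):
        b = target - a
        if b in seen:
            return seen[b], j
        seen[a] = j
    return None
```

For example, `two_sum([2, 7, 11, 15], 9)` returns `(0, 1)`.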

1.1.2. Modulo

| Application | Modulo by | Example |
|---|---|---|
| Get n trailing digits | 10^n | 1234 % 100 = 34 |
| Check even/odd | 2 | isEven = x % 2 == 0 |
| Get value of bit after addition | 2 | (1 + 1) % 2 = 0, (0 + 1) % 2 = 1, (0 + 0) % 2 = 0 |
| Check divisible by n | n | isXDivisibleByN = x % n == 0 |
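The modulo applications above can be written as small helpers. This is a Python sketch with hypothetical function names. The odd check compares with `!= 0` because Python's `%` follows the sign of the divisor, while C, Java and JS follow the sign of the dividend, so `-3 % 2` is `-1` in those languages.

```python
def last_n_digits(x: int, n: int) -> int:
    """Keep the n trailing digits of a non-negative integer."""
    return x % 10 ** n


def is_even(x: int) -> bool:
    return x % 2 == 0


def is_odd(x: int) -> bool:
    # != 0 rather than == 1 stays correct for negative x in every language
    return x % 2 != 0


def is_divisible(x: int, n: int) -> bool:
    return x % n == 0
```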

1.1.3. Floor Division

| Application | Denominator | Example |
|---|---|---|
| Remove n trailing digits | 10^n | 12345 // 100 = 123 |
| Get carry-over bit after addition | 2 | (1 + 1) // 2 = 1, (0 + 1) // 2 = 0, (0 + 0) // 2 = 0 |
| Get midpoint index of an array (e.g. [0,1,2] or [0,1,2,3]) | 2 | midpoint = len(arr) // 2 (the upper middle for even lengths) |
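The modulo and floor-division rows for binary addition work together. At each position, `% 2` gives the output bit and `// 2` gives the carry. Below is a sketch that adds two binary strings digit by digit. `add_binary` is a hypothetical name, and the inputs are assumed to contain only '0' and '1'.

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary strings, right to left, using % 2 and // 2."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += int(a[i])
            i -= 1
        if j >= 0:
            total += int(b[j])
            j -= 1
        result.append(str(total % 2))  # value of this bit after addition
        carry = total // 2             # carry-over bit into the next position
    return "".join(reversed(result))
```

`add_binary("11", "1")` returns `"100"`. Python's `divmod(total, 2)` returns both values in one call.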

1.1.4. Binary Trees

Sizes

  • no. of nodes: $n$
  • height of a balanced tree: $\approx \log_x n$
    • where $x$ is the branching factor of an $x$-ary tree (2 for a binary tree)
  • maximum width of a binary tree at level $d$: $2^d$
    • where $d$ is the level whose width you want (root at level 0)

How to navigate a Tree

There are two methods of navigating a tree: Depth-First Search (DFS) and Breadth-First Search (BFS)

DFS

There are three ways to perform traversal:

  1. In-Order Traversal (IOT) -> left, node, right
  2. Pre-Order Traversal (PreOT) -> node, left, right
  3. Post-Order Traversal (PostOT) -> left, right, node

There are two ways to implement DFS:

'''
1. Recursively
    - Adv.: Clean and intuitive
    - Disadv.: Limited by recursion depth, stack overflow risk
'''

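# Hypothetical placeholder hook: replace with the work to do at each node
def process(node):
    print(node.val)

def recursive(root):
    iot(root)
    preOT(root)
    postOT(root)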

def iot(node):
    if node is None:
        return

    iot(node.left)
    process(node)
    iot(node.right)

def preOT(node):
    if node is None:
        return

    process(node)
    preOT(node.left)
    preOT(node.right)

def postOT(node):
    if node is None:
        return

    postOT(node.left)
    postOT(node.right)
    process(node)

'''
2. Iteratively
    - Adv.: Robust for large or unbounded inputs
    - Disadv.: Less intuitive and readable
'''

def iot(root):
    if root is None:
        return

    stack = []
    node = root

    while stack or node:
        # go left as far as possible
        while node:
            stack.append(node)
            node = node.left

        node = stack.pop()
        process(node)
        node = node.right  # then traverse the right subtree

def preOT(root):
    if root is None:
        return

    stack = [root]  # switching this to a FIFO queue changes the DFS to BFS
    while stack:
        node = stack.pop()

        process(node)

        # push right first so left is processed first
        if node.right:
            stack.append(node.right)
        if node.left:
            stack.append(node.left)


def postOT(root):
    if root is None:
        return

    stack = []
    lastNode = None
    node = root

    while stack or node:
        # go left as far as possible
        if node:
            stack.append(node)
            node = node.left
            continue

        # at the leftmost node: if the candidate has a right child that was not
        # the last visited node, traverse that right subtree first
        candidateNode = stack[-1]
        if candidateNode.right and lastNode is not candidateNode.right:
            node = candidateNode.right
            continue

        node = stack.pop()
        process(node)
        lastNode = node
        node = None  # do not process node again
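To see what the three DFS orders produce on a concrete tree, here is a self-contained sketch. The `TreeNode` dataclass and the list-returning helpers are assumptions made for this example. The functions above call a `process` hook instead, and list concatenation is chosen for readability, not speed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TreeNode:
    val: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def in_order(node: Optional[TreeNode]) -> List[int]:
    if node is None:
        return []
    return in_order(node.left) + [node.val] + in_order(node.right)


def pre_order(node: Optional[TreeNode]) -> List[int]:
    if node is None:
        return []
    return [node.val] + pre_order(node.left) + pre_order(node.right)


def post_order(node: Optional[TreeNode]) -> List[int]:
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.val]


#       4
#      / \
#     2   6
#    / \
#   1   3
root = TreeNode(4, TreeNode(2, TreeNode(1), TreeNode(3)), TreeNode(6))
print(in_order(root))    # [1, 2, 3, 4, 6]
print(pre_order(root))   # [4, 2, 1, 3, 6]
print(post_order(root))  # [1, 3, 2, 6, 4]
```

This example tree is a binary search tree, so in-order traversal visits its values in sorted order.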

BFS

There are two ways to perform traversal:

  1. Flat Traversal (FT)
  2. Level-Order Traversal (LOT)

BFS is primarily done iteratively - it can be implemented recursively but there is no practical benefit.


from collections import deque

def ft(root):
    if root is None:
        return

    queue = deque([root])

    while queue:
        node = queue.popleft()

        process(node)

        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)

def lot(root):
    if root is None:
        return

    queue = deque([root])

    while queue:
        # for LOT, wrap the flat traversal logic in a for loop with levelSize iterations
        levelSize = len(queue)
        for _ in range(levelSize):
            # same as flat traversal; every node popped here is on the current level
            node = queue.popleft()

            process(node)

            if node.left is not None:
                queue.append(node.left)
            if node.right is not None:
                queue.append(node.right)

Note:

  • You can also add metadata for each node by appending tuples (node, metadata) to the queue instead of just nodes
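The note above can be sketched as a flat BFS that enqueues `(node, depth)` tuples, so each node carries its depth without a separate level loop. The `TreeNode` class and the `values_with_depth` name are assumptions for this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TreeNode:
    val: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def values_with_depth(root: Optional[TreeNode]) -> List[Tuple[int, int]]:
    """Flat BFS where each queue entry is (node, depth) instead of just node."""
    if root is None:
        return []
    out = []
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        out.append((node.val, depth))
        if node.left is not None:
            queue.append((node.left, depth + 1))
        if node.right is not None:
            queue.append((node.right, depth + 1))
    return out
```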

1.1.5. Array

How many positions can a window take as it slides over an array?

  • Intuition
    • Start from the base case: window size 1
      • It fits at every index, i.e. len(array) positions
    • Each time the window size grows by 1, it loses one position at the end
  • Formula
    • len(array) - windowSize + 1
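A fixed-size sliding window visits exactly `len(array) - windowSize + 1` positions. The sketch below computes every window sum in one pass: each slide adds the entering element and subtracts the leaving one. `window_sums` is a hypothetical name.

```python
from typing import List


def window_sums(arr: List[int], k: int) -> List[int]:
    """Sum of every contiguous window of size k: len(arr) - k + 1 results."""
    if k <= 0 or k > len(arr):
        return []
    current = sum(arr[:k])
    sums = [current]
    for i in range(k, len(arr)):
        current += arr[i] - arr[i - k]  # slide: add entering item, drop leaving item
        sums.append(current)
    return sums
```

`window_sums([1, 2, 3, 4, 5], 2)` returns `[3, 5, 7, 9]`, i.e. 5 - 2 + 1 = 4 windows.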

1.1.6. Bitwise Operations

| Operation | Application | Example |
|---|---|---|
| AND & | Get carry for binary addition of two numbers | 1 & 1 = 1 |
| AND & | Get last bit | 10 & 1 = 0, 11 & 1 = 1 |
| XOR ^ | Get sum without carry for binary addition of two numbers | 1 ^ 1 = 0, 0 ^ 1 = 1, 1 ^ 0 = 1 |
| XOR ^ | Find differences between two bit patterns | 0110 ^ 1010 = 1100, i.e. they differ in the first two bits |
| Bit Shift << >> | Multiply/floor-divide by 2 | x = 2, x << 1 = 4, x >> 1 = 1 |
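The two addition rows combine into addition without `+`: XOR gives the per-bit sum, and AND shifted left gives the carries. The loop repeats until no carry is left. This sketch assumes non-negative inputs, because Python integers are unbounded and a negative operand would make the loop run forever. The helper names are hypothetical.

```python
def add_without_plus(a: int, b: int) -> int:
    """Add two non-negative ints with XOR (sum without carry) and AND (carry)."""
    while b:
        carry = (a & b) << 1  # positions where both bits are 1 carry into the next bit
        a = a ^ b             # sum of each bit position, ignoring carries
        b = carry
    return a


def last_bit(x: int) -> int:
    return x & 1


def differing_bits(x: int, y: int) -> int:
    """Count the bit positions where x and y differ."""
    return bin(x ^ y).count("1")
```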

1.1.7. Dynamic Programming

  • Caching the results of overlapping subproblems, e.g. a Fibonacci-style recurrence
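Here are two common ways to cache a Fibonacci-style recurrence, as a sketch. The top-down version memoises with `functools.lru_cache`. The bottom-up version keeps only the last two results.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    """Top-down: each fib(k) is computed once, then served from the cache."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


def fib_table(n: int) -> int:
    """Bottom-up: O(n) time, O(1) space."""
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev
```

The top-down version is still bounded by the recursion depth limit, which is one reason to prefer the bottom-up form for large `n`.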

1.1.8. Binomial Theorem

Theory

  • The Binomial Theorem describes how to expand binomial expressions without multiplying out every factor by brute force
    • Binomial Expression:
      • An expression formed from two terms,
      • e.g. $(a + b)$
    • Binomial Theorem Formula:
      • $(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^{k}$
        • where $\binom{n}{k} \equiv {}^{n}C_k = \frac{n!}{k!\,(n-k)!}$ is the binomial coefficient a.k.a. combinations

Applications

  • The binomial coefficients describe symmetric number sequences, e.g. 1 4 6 4 1 (the coefficients of $(x+y)^4$), which are symmetric because $\binom{n}{k} = \binom{n}{n-k}$
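A quick way to generate these symmetric sequences is to compute one row of binomial coefficients with `math.comb` (Python 3.8+). `binomial_row` is a hypothetical name.

```python
from math import comb
from typing import List


def binomial_row(n: int) -> List[int]:
    """Coefficients of (x + y)^n: C(n, 0), C(n, 1), ..., C(n, n)."""
    return [comb(n, k) for k in range(n + 1)]


row = binomial_row(4)
print(row)               # [1, 4, 6, 4, 1]
print(row == row[::-1])  # True, because C(n, k) == C(n, n - k)
```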

1.1.9. Describing Symmetry

  • Linear (mirror) Symmetry
    • Combinations / Binomial Coefficient
    • Modulus (absolute value, |x|)
    • Even Functions, i.e. f(-x) = f(x)
    • Cosine
  • Rotational Symmetry (about the origin)
    • Odd Functions, i.e. f(-x) = -f(x)
    • Sine

1.2. System Design

How to design scalable and efficient systems.

1.2.1. Encryption / Decryption with Keys

There are two types of encryption/decryption patterns:

| Key Type | Description | E.g. | Adv | Disadv | Use Case |
|---|---|---|---|---|---|
| Symmetric | One shared secret key, i.e. the same key for both encryption and decryption | AES | Computationally faster | Key is hard to distribute securely | Bulk data encryption (disks, HTTPS session data, VPNs) |
| Asymmetric | A public/private key pair, i.e. two keys | RSA, ECDSA | Easier to distribute (the public key can be shared openly) | Computationally slower | Key exchange, digital signatures, SSL/TLS handshake, email encryption |

Public and private keys are used for two main purposes:

| Key Use Case | Private Key | Public Key / Shared Private Key |
|---|---|---|
| Message Authentication and Integrity (Digital Signatures) | Sign message | Verify the message came from the sender (authentication) + ensure it wasn't modified in transit (integrity) |
| Message Confidentiality | Decrypt message | Encrypt message |
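The "Shared Private Key" column can be illustrated with the standard library. HMAC-SHA256 gives authentication and integrity with one shared key, but it does not provide confidentiality. Public-key signatures (RSA, ECDSA) need a third-party package such as `cryptography` and are not shown. The `sign` and `verify` helpers are hypothetical names in this sketch.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, message: bytes) -> bytes:
    """Produce an HMAC-SHA256 tag over the message using the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag; a constant-time compare avoids timing leaks."""
    return hmac.compare_digest(sign(key, message), tag)


shared_key = secrets.token_bytes(32)  # both parties hold this key
tag = sign(shared_key, b"transfer 100 to bob")
print(verify(shared_key, b"transfer 100 to bob", tag))  # True
print(verify(shared_key, b"transfer 900 to bob", tag))  # False: modified in transit
```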

1.2.2. API Architectural Styles

The three main types of API architectural style are REST APIs, RPC APIs, and GraphQL APIs.

| Style | Description | Use Case | Adv | Disadv |
|---|---|---|---|---|
| REST, e.g. Express.js, Spring Boot, Flask, Fast | Perform HTTP verbs on resources. Entity based, e.g. POST /users | Most common | Universally understood + docgen tools, e.g. Swagger, OpenAPI | Slowest: typically one request per entity, unlike GraphQL + less space-efficient than RPC |
| GraphQL, e.g. Apollo | Query or mutate entities. Entity based, e.g. mutation CreateUser() {...} | APIs for the frontend | Faster: one request can fetch multiple entities | More setup, e.g. defining the schema and resolvers + less standardised docgen, e.g. GraphiQL |
| RPC, e.g. gRPC + Protobuf | Call functions remotely. Action based, e.g. await client.createUser() | Internal APIs | Fastest and most space-efficient, because it uses binary instead of text payloads | Mostly for internal use, unless public clients use the same setup |

1.2.3. Databases

| Paradigm | Examples | Use Case | Adv | Disadv |
|---|---|---|---|---|
| SQL | PostgreSQL, MySQL, MSSQL | Structured relationships + strong consistency, e.g. financial data | Powerful querying + ACID | Slower writes due to B-Trees, slower reads/writes due to stronger consistency/locks |
| Key-Value | RocksDB, DynamoDB, Cassandra | High-throughput writes, caching | Extremely fast writes + BASE | Slower reads due to LSMT |
| Document | MongoDB, Firestore | Semi-structured JSON-like data, e.g. mobile/web apps | Flexible schema + BASE | Slower reads if the storage engine uses an LSMT |
| Columnar | Cassandra | Time series data, e.g. analytics, event logging | Fast on columnar queries, aggregations | Slower point reads due to LSMT |
| Graph | Neo4j | Social graphs, recommendation engines | Optimised for graph traversal and relationship modelling | Limited for heavy aggregations |
| Graph | TypeDB | Complex knowledge graphs | Strongly typed and structured relationships | Small ecosystem |
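The ACID guarantee in the SQL row can be demonstrated with the standard-library `sqlite3` module. The sketch below shows atomicity: a transfer whose second step fails is rolled back as a whole. The `accounts` table and the `transfer` function are assumptions for this example. `with conn:` commits on success and rolls back if an exception escapes.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move money atomically: both updates happen or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # undoes the debit above
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

try:
    transfer(conn, "alice", "bob", 150)
except ValueError:
    pass
print(dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id")))  # {'alice': 100, 'bob': 0}
```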

1.2.4. Scaling

| Type | Principle | Use Case | Adv | Disadv |
|---|---|---|---|---|
| Vertical | Upgrading CPU/RAM/storage | Small to medium apps, monolithic systems, startups | No code changes + lower latency | Limited by hardware ceilings + expensive at scale + single point of failure (SPOF) |
| Horizontal | Adding more servers | Distributed systems | Fault tolerance via redundancy + practically unbounded scalability | Network latency + higher complexity |

Types of horizontal scaling:

  1. Database Horizontal Scaling
  2. Compute Horizontal Scaling

Database Horizontal Scaling, i.e. sharding

| Type | Principle | Use Case | Adv | Disadv |
|---|---|---|---|---|
| Directory/Lookup-based | A manually maintained directory maps each key to its shard | Frequently changing shards / manual control | Easy to add/remove shards | Directory is a SPOF, lookup adds latency |
| Range-based | Each shard owns a contiguous key range (e.g. A-F, G-L, ...) | Time-series data, ordered data, range queries | Efficient range queries + simple to implement | Data skew and hotspots possible |
| Hash-based | The shard is chosen by hashing the key | High-write, evenly distributed workloads | Good load balancing, no ranges to manage | Range queries inefficient, rebalancing expensive |
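The range-based and hash-based rows can be sketched as two shard-selection functions. The range boundaries (A-F, G-L, M-R, S-Z) and the function names are hypothetical. `hashlib` is used instead of the built-in `hash()` because Python randomises string hashes per process.

```python
import bisect
import hashlib

# Hypothetical lower bounds: shard 0 = A-F, 1 = G-L, 2 = M-R, 3 = S-Z
RANGE_LOWER_BOUNDS = ["G", "M", "S"]


def range_shard(key: str) -> int:
    """Range-based: the key's first letter decides which contiguous range owns it."""
    return bisect.bisect_right(RANGE_LOWER_BOUNDS, key[0].upper())


def hash_shard(key: str, num_shards: int) -> int:
    """Hash-based: a stable hash spreads keys evenly but loses key order."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Changing `num_shards` from 4 to 5 in `hash_shard` remaps about 80% of keys. This is the "rebalancing expensive" disadvantage in the table.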

Compute Horizontal Scaling

| Type | Principle | Use Case | Adv | Disadv |
|---|---|---|---|---|
| Centralised Load Balancing / Orchestrator-based Scheduling | Requests are routed by a load balancer or scheduler | Heterogeneous workloads, unpredictable resource usage, fine-grained control over task placement | Assigns requests based on compute needs + easy to add/remove nodes + supports complex scheduling policies | Orchestrator/scheduler is a SPOF + can become a bottleneck |
| Static Partitioning | Requests are routed by predefined ranges or affinity rules, e.g. ID range, location | Tasks are grouped logically | Low latency as no lookup is needed | Hotspots + manual rebalancing + difficult to add/remove nodes |
| Consistent Hashing | Requests are routed by the hash of a request key | Stateless workloads, e.g. microservices, serverless, API gateways | Automatic load distribution + no central load balancer needed | Range-based tasks are difficult + some keys must move when nodes are added/removed (on average only about 1/N of them) |
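Below is a minimal consistent-hash ring. Each server gets several virtual nodes on a sorted ring, and a key belongs to the first virtual node clockwise from the key's hash. The class name, the SHA-256-based hash and the default of 100 virtual nodes are assumptions for this sketch.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Keys map to the first virtual node clockwise from the key's hash."""

    def __init__(self, nodes: List[str], vnodes: int = 100):
        self.vnodes = vnodes              # virtual nodes per server smooth the distribution
        self._ring: List[int] = []        # sorted virtual-node hashes
        self._owner: Dict[int, str] = {}  # virtual-node hash -> server
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            bisect.insort(self._ring, h)
            self._owner[h] = node

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            self._ring.remove(h)
            del self._owner[h]

    def get(self, key: str) -> str:
        idx = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)  # wrap around
        return self._owner[self._ring[idx]]
```

Adding a fourth server moves roughly a quarter of the keys, and all of them move to the new server. With plain modulo hashing, going from 3 to 4 servers would remap about 75% of keys.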

1.2.5. CAP Theorem

  • Consistency: All nodes in the system see the same data at the same time

  • Availability: Every request receives a response, even if some nodes fail

  • Partition Tolerance: System remains operational even if network communication between some nodes fails

  • In a distributed system, you can guarantee at most two of the three properties.

  • P isn't optional because networks are unreliable, so when a partition happens the trade-off is between C and A.

  • C is usually preferred for financial systems

  • A is usually preferred for social media / streaming apps

1.2.6. Authentication

  • There is a trade-off between safety and convenience
  • Best practice is to use a well-tested library, but understanding the principles helps in system design
  • Authentication: verifying identity
  • Authorisation: checking permissions

1.2.6.1. Authentication

Transporting Passwords

  • Use HTTPS for password submissions
  • Avoid logging raw credentials

1.2.6.2. Authentication Methods

| Method | Use Case |
|---|---|
| Username + Password | |
| Username + Password + 2FA | |
| SSO | |
| Custom-built SSO | |

Securing Passwords
  • Hashing
    • Passwords should be stored as irreversible hashes from a deliberately slow password-hashing function (e.g. Argon2, bcrypt, scrypt, PBKDF2), not a fast general-purpose hash such as SHA-256
  • Salting
    • A random, user-specific value (salt) is added to the plain-text password before hashing; the salt itself is stored in plaintext in the database next to the hash
    • Prevents
      • two users with the same password from getting the same hash
      • attackers from using rainbow tables (precomputed mappings of common passwords -> hashes)
  • Peppering
    • A random, global value (pepper) is added to the plain-text password before hashing; the pepper is kept out of the database, e.g. as an environment variable on the server
    • An additional layer of security on top of salting: a leaked database alone is not enough to crack the hashes
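Hashing, salting and peppering can be combined with standard-library PBKDF2 (`hashlib.pbkdf2_hmac`). The original does not name an algorithm. PBKDF2 is used here only because it ships with Python, while Argon2 and bcrypt need third-party packages. The iteration count and the `PASSWORD_PEPPER` environment variable name are assumptions for this sketch.

```python
import hashlib
import hmac
import os
import secrets
from typing import Tuple

ITERATIONS = 600_000  # assumed PBKDF2-HMAC-SHA256 work factor; tune for your hardware
# Hypothetical env var name: the pepper lives outside the database
PEPPER = os.environ.get("PASSWORD_PEPPER", "dev-only-pepper").encode()


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, hash). Store both in the user's row; the salt is not secret."""
    salt = secrets.token_bytes(16)  # unique per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode() + PEPPER, salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode() + PEPPER, salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison
```

Hashing the same password twice gives different results because each call draws a fresh salt.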

1.2.6.3. Proof of Authentication a.k.a. Access Tokens

  • After a user is authenticated, a token needs to be stored on the client
  • There are two main types of tokens used: session tokens and JWTs

| | Session Token | JWT |
|---|---|---|
| Structure | Random opaque string, e.g. b8c9d7f1e6a24f38b1d80b7d849d3e4e | Three base64url-encoded segments, <header>.<payload>.<signature>, where header and payload are JSON objects |
| Data access | Client cannot read it; server must retrieve the data for the client | Client can decode the payload easily (it is encoded, not encrypted), e.g. { "email" : "...", "iat": 1665385660, "roles": ["admin"] } |
| Where data lives | In the backend (server/db/cache) alongside the token | Inside the token |
| Generation | Server uses a cryptographically secure RNG | Server builds the JSON payload and signs it |
| Verification | Server checks that the client's token matches a stored token | Server verifies the signature with the public key (asymmetric algorithms, e.g. RS256) or the shared secret (HMAC, e.g. HS256) |
| Revocation | Easy: delete from the backend (server/db/cache) | Hard: blacklist / short expiry |
| Transport | Cookie (HttpOnly + Secure + SameSite=Strict) or Authorization header | Cookie (HttpOnly + Secure + SameSite=Strict) or Authorization header |
| Client-side Storage | Cookies | Cookies |
| Server-side Storage | In the backend (server/db/cache) | n/a |
| Use Case | Monolithic app | Distributed services, OAuth |

1.2.6.4. Refresh Tokens

  • Clients can be given a refresh token that is used to obtain new access tokens
  • Access tokens should be short-lived (minutes)
  • Refresh tokens can be long-lived (hours/days/weeks)
  • Adv
    • Reduced exposure: a leaked access token is only valid for a few minutes
    • Centralised control if access tokens are JWTs and refresh tokens are session tokens: revoking the refresh token stops new access tokens from being issued

1.2.7. Authorisation

| Access Control Approach | Principle | Use Case |
|---|---|---|
| Role-Based (RBAC) | Users -> Roles -> Permissions | Easiest to implement / reason about |
| Attribute-Based (ABAC) | Permissions based on attributes of the user, resource and context, e.g. user.department == doc.department and time < 18:00 | Highly customisable |
| Relationship-Based (ReBAC) | Permissions via graph relations, e.g. editor of project X | Collaboration apps |
| Scope-Based (SBAC) | Users -> Scopes -> Permissions, e.g. contacts.read | OAuth |
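The RBAC and ABAC rows can be sketched as two permission checks. The role names, permission strings and the 18:00 cut-off (taken from the ABAC example) are hypothetical.

```python
from dataclasses import dataclass
from datetime import time
from typing import Dict, Iterable, Set

# Hypothetical role -> permission mapping for RBAC
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"doc:read"},
    "editor": {"doc:read", "doc:write"},
    "admin": {"doc:read", "doc:write", "doc:delete"},
}


def rbac_allows(roles: Iterable[str], permission: str) -> bool:
    """RBAC: user -> roles -> permissions."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


@dataclass
class User:
    department: str


@dataclass
class Document:
    department: str


def abac_allows(user: User, doc: Document, now: time) -> bool:
    """ABAC: decide from attributes of the user, the resource and the context."""
    return user.department == doc.department and now < time(18, 0)
```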

1.2.8. Where to authenticate and authorise

| Authenticate token in | Adv | Disadv | Use Case |
|---|---|---|---|
| App | Most flexible (custom logic, fine-grained checks) | Adds latency per request | Authorisation |
| Gateway | Offloads auth early, blocks bad traffic before it reaches the app | Less flexible | Basic checks |
| Load Balancer | Centralised | Limited to basic checks (signature, expiration) | Basic checks |

1.2.9. Tenancy

A tenant is a customer/organisation space with its own users, data and config.

| | Single-tenant | Multi-tenant |
|---|---|---|
| Definition | One tenant per isolated stack | Multiple tenants per stack |
| Isolation | Strong | Weak |
| Per-tenant customisation | Easy | Harder |
| OpEx | Higher | Lower |
| Scale | Worse (under-utilised) | Better (pooling) |
| Compliance / Data residency | Easier | Harder (needs partitioning) |
| Onboarding Speed | Slower | Faster |

1.2.10. Logging

  • Avoid automatically logging POST bodies and GET parameters
    • If automatic request logging runs on auth endpoints, passwords can end up in plaintext in the logs
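If request data must be logged, one option is to redact sensitive fields before they reach the logger. The field list below is a hypothetical deny-list. Logging only an explicit allow-list of fields is the stricter choice, because a deny-list misses new field names.

```python
import logging
from typing import Any, Dict

# Hypothetical list of field names that must never reach the logs
SENSITIVE_FIELDS = {"password", "token", "secret", "otp"}


def redact(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a request body / query-param dict, masking sensitive values."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_FIELDS else value
        for key, value in fields.items()
    }


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("auth")

body = {"username": "alice", "password": "hunter2"}
logger.info("login attempt: %s", redact(body))  # the password is never written in plaintext
```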

1.2.11. Sandboxing

1.2.12. Encoding

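Encoding is used to serialise user-facing data (text/image/audio/video) for storage or transport over the network.

| Type | Description | Use Case | E.g. |
|---|---|---|---|
| Base32 | 32-character alphabet (A-Z, 2-7) | QR codes, OTP secrets | JBSWY3DPEBLW64TMMQ====== |
| Base64 | Represents binary data in ASCII (64-character alphabet) | Images, API keys, JWT segments | SGVsbG8gd29ybGQ= |
| Base85 | Represents binary data in ASCII, more compactly than Base64 (4 bytes -> 5 chars vs 3 bytes -> 4 chars) | PDF | <~87cURD_*#TDfTZ)+T~> |
| URL (percent-encoding) | Makes data safe for URLs | URLs | %20 -> space |
| Hex | Represents binary as hex strings | | 0x12ab |
| ASCII / UTF-8 | Maps characters to numeric codes (UTF-8 is a variable-length Unicode encoding whose first 128 codes match ASCII) | Text | 65 -> "A" |
| Unicode (UTF-16, UTF-32) | Maps characters to Unicode code points | Text (International) | U+4F60 -> "你" |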
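Most rows of the table can be reproduced with Python's standard library, which makes it easy to compare the encodings on the same input. The sample string is arbitrary. `base64.a85encode` produces Ascii85 without the `<~ ~>` delimiters unless `adobe=True` is passed.

```python
import base64
from urllib.parse import quote, unquote

data = b"Hello world"

encoded = {
    "base32": base64.b32encode(data).decode(),
    "base64": base64.b64encode(data).decode(),
    "base85": base64.a85encode(data).decode(),  # Ascii85, the variant used in PDF/PostScript
    "url": quote("hello world/你"),
    "hex": data.hex(),
}
for name, value in encoded.items():
    print(f"{name:>6}: {value}")

# Text encodings map characters to numeric code points, then to bytes
print(ord("A"))                  # 65
print(hex(ord("你")))            # 0x4f60, i.e. U+4F60
print("你".encode("utf-8"))      # b'\xe4\xbd\xa0' (3 bytes in UTF-8)
print("你".encode("utf-16-be"))  # b'O`' (the 2 bytes 0x4F 0x60)
```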

1.3. Concrete Knowledge

  • BFF: Backend for Frontend, i.e. a backend tailored to one frontend that aggregates calls for it
    • e.g. a single GET /dashboard instead of GET /users + GET /orders + GET /recommendations

1.3.1. JavaScript

Engines

  • V8 (Chrome)
  • SpiderMonkey (Firefox)
  • JavaScriptCore (Safari)
  • Hermes (React Native)

Runtimes

  • Node
    • V8 engine
    • Adv.
      • Mature ecosystem
      • Safest bet
    • Disadv.
      • Slower
      • Security via containers/OS policies
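  • Deno
    • V8 engine
    • Adv.
      • Like Node but faster
      • Sandboxed: secure by default, with permissions granted via flags (e.g. --allow-net)
    • Disadv.
      • Only mostly compatible with Node modules
  • Bun
    • JavaScriptCore engine
    • Adv.
      • Fast startup and built-in tooling (bundler, test runner, package manager)
    • Disadv.
      • Least compatible with Node modules
      • Security via containers/OS policies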

1.3.2. CPU Optimisations

  • Branch Prediction
  • Variable reassignment
  • CPU Pipelining
  • CPU Preloading
  • CPU Prefetching
  • Cache Locality
  • Memory Access Patterns

1.3.3. Language Optimisations

  • Peephole Optimisations
  • Inlining
  • Loop Unrolling

1.3.4. Operating Systems

  • Default main-thread stack size
    • Linux: 8MB
    • macOS: 8MB
    • Windows: 1MB

1.3.5. Recursion Depth Limits

  • C++: 100,000
    • Depends on frame size + OS stack size
  • Dart: 10,000
    • Set by default
  • JS: 10,000
    • (V8 engine/chrome)
    • Depends on
  • Java: 1,000
    • Depends on frame size + OS stack size
  • Python: 1,000
    • Set by default
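The Python limit is easy to observe. `sys.getrecursionlimit()` returns 1000 by default in CPython, and going deeper raises `RecursionError`. The sketch below compares a recursive list sum with a loop that uses constant stack space. The function names are hypothetical.

```python
import sys
from typing import List


def sum_recursive(values: List[int], i: int = 0) -> int:
    """One stack frame per element, so long lists exceed the recursion limit."""
    if i == len(values):
        return 0
    return values[i] + sum_recursive(values, i + 1)


def sum_iterative(values: List[int]) -> int:
    """Same result with a loop; stack usage stays constant."""
    total = 0
    for value in values:
        total += value
    return total


big = [1] * (sys.getrecursionlimit() + 100)
try:
    sum_recursive(big)
except RecursionError:
    print("recursive version failed; iterative result:", sum_iterative(big))
```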

1.3.6. Typical Cloud Infrastructure

| Layer | Component | Description | Use Case | E.g. |
|---|---|---|---|---|
| Edge | DNS | Resolves domain names | | AWS Route 53, GCP DNS |
| Edge | CDN | Caches static content for low latency | | AWS CloudFront |
| Edge | WAF / DDoS Protection | Protects against malicious traffic | | AWS WAF |
| Edge | Application Load Balancer | Distributes traffic to apps using HTTP info (L7) | | AWS ALB |
| Edge | Network Load Balancer | Distributes traffic to apps using TCP/UDP info (L4) | | AWS NLB |
| Edge | Gateway Load Balancer | Distributes traffic to third-party security/network appliances (operates at the IP layer, L3) | | AWS GWLB |
| Edge | Global Load Balancer | Distributes traffic geographically | | AWS Global Accelerator |
| Gateway | Gateway | Routing to different services, security; protocol translation (HTTP to gRPC, REST to GraphQL); aggregation (compose multiple backend calls into one) | | AWS API Gateway |
| App | App Servers | | Steady high throughput, long-lived connections, heavy local state, custom networking, predictable workloads, higher memory/CPU/GPU needs, strict latency floors | AWS ECS, AWS EKS |
| App | Serverless | | Spiky demand, low ops overhead, pay-per-use | AWS Lambda |
| Data Proxy | Database Proxy | Manages a pool of persistent connections to the DB | >10k client connections | AWS RDS Proxy |
| Data | Relational DBs | | | AWS RDS, AWS Aurora (incl. Aurora Serverless) |
| Data | Document DBs | | | DynamoDB |
| Data | KV DBs / Caches | | | AWS ElastiCache (Redis) |
| Data | Object Storage | | | AWS S3 |
| Data | DSQL | Distributed SQL query engine | Query large-scale data across object storage / data lakes with SQL | AWS Athena |
| Data | Data Lake | Centralised storage for raw data | Analytics, ML workloads, batch processing | AWS Lake Formation, Iceberg on S3 |
| Networking | VPC | Isolated virtual network for cloud resources | Define public/private subnets, control routing, isolation, multi-tier deployments | AWS VPC |
| Networking | Subnets | Segments inside a VPC | Control traffic flow and exposure of resources (e.g. public ALB, private DB) | AWS Subnets |
| Networking | Security Groups | Virtual firewalls attached to resources | Control traffic at instance/service level | AWS Security Groups |
| Observability | Logging | Collect, aggregate and index logs from all services | | AWS CloudWatch |
| Observability | Monitoring / Metrics | Monitor resource usage, uptime, etc. | | AWS CloudWatch |
| Observability | Tracing | Traces request flow across services | | AWS X-Ray |
| DevOps | CI/CD | | | AWS CodeBuild, GitHub Actions |

1.3.7. Networking Model

There are two main models used in the industry today:

  1. Open Systems Interconnection (OSI) model
    1. Abstract: typically used to discuss concepts
  2. TCP/IP model
    1. Concrete: this is what the internet uses today

| OSI Layer | Name | Purpose | TCP/IP Layer | Data Unit | Examples |
|---|---|---|---|---|---|
| 7 | Application | User apps | Application | Data | Zoom, WhatsApp, Teams |
| 7 | Application | App protocols | Application | Data | HTTP, WebSockets, WebRTC (API, signaling), SIP, DNS, gRPC, RTP/SRTP |
| 6 | Presentation | Data formatting | Application | Data | JSON, XML, Protobuf |
| 6 | Presentation | Encoding & compression | Application | Data | JPEG, MP3, H.264, gzip |
| 6 | Presentation | Encryption | Application | Data | TLS, DTLS, SSL, SRTP |
| 5 | Session | Manage session lifecycle | Application | Data | NetBIOS, RPC, WebRTC session setup |
| 4 | Transport | Reliable/unreliable delivery, multiplexing, connection management | Transport | Segment (TCP) / Datagram (UDP) | TCP, UDP, QUIC |
| 3 | Network | Routing, addressing | Internet | Packet | IP, ICMP, BGP |
| 2 | Data Link | Framing, error detection | Link | Frame | Ethernet, Wi-Fi MAC, PPP, 5G NR |
| 1 | Physical | Raw bits over a medium | Link | Bits | Fiber, RF, copper, modulation |

1.3.8. Sessions and connections

Definition

| | Connection | Session |
|---|---|---|
| Layer | Transport | Application |
| Definition | A channel between two peers | A context between two peers |
| Lifespan | Exists only while the transport connection is open | Can span multiple connections, until either peer terminates the session |

Signaling: Session Management

Signaling is the process of setting up, managing, and tearing down a communication session before real-time data flows. Signaling encompasses multiple processes:

  • Session Setup
  • Codec Negotiation
    • Process where two peers agree on a common codec for audio/video during signaling
  • NAT Traversal
    • Techniques + protocols that allow devices behind NAT to communicate directly
    • There are three main techniques
      1. Session Traversal Utilities for NAT (STUN)
        • Device asks STUN server "What's my public IP:port?"
        • Device shares this info with the other peer (P2P)
        • Works only if the NAT keeps mappings stable
      2. Traversal Using Relays around NAT (TURN)
        • Both devices send media to a TURN server
        • Used as a fallback if direct P2P fails
        • Higher latency + server bandwidth cost
      3. Interactive Connectivity Establishment (ICE)
        • Gathers candidates
          • Private IP:port
          • Public IP:port from STUN
          • Relay addresses from TURN
        • Runs connectivity checks on candidate pairs
        • Picks the best working path, preferring direct routes over relayed ones
  • Encryption key exchange
  • Exchange session metadata
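ICE decides which candidates to try first by giving each one a priority. The sketch below uses the candidate priority formula and recommended type preferences from RFC 8445 (host 126, peer-reflexive 110, server-reflexive 100, relayed 0). It only ranks candidates. Real ICE also pairs local and remote candidates and runs connectivity checks. The addresses are hypothetical examples.

```python
from typing import List, Tuple

# RFC 8445 recommended type preferences: direct paths beat relayed ones
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """ICE candidate priority: type preference dominates, then local preference."""
    return (2**24) * TYPE_PREFERENCE[cand_type] + (2**8) * local_pref + (256 - component_id)


# Hypothetical candidates gathered by one peer: (type, address)
candidates: List[Tuple[str, str]] = [
    ("relay", "198.51.100.20:3478"),  # relay address from TURN
    ("host", "192.168.0.10:52301"),   # private IP:port
    ("srflx", "203.0.113.7:40001"),   # public IP:port learned via STUN
]
ranked = sorted(candidates, key=lambda c: candidate_priority(c[0]), reverse=True)
print([c[0] for c in ranked])  # ['host', 'srflx', 'relay']
```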

1.3.9. HTTP

There are 3 main versions of HTTP in use:

| Version | Description | Adv | Disadv | Use Case |
|---|---|---|---|---|
| 1.1 | Most widely supported | Simple, easy to debug, universally compatible | Only one in-flight request per connection -> head-of-line blocking -> higher latency; more open connections = higher infra cost | Legacy, IoT |
| 2 | Multiplexed streams over one TCP connection | Big improvements in latency and throughput over HTTP/1.1, fewer connections per client, required for gRPC | TCP-level head-of-line blocking when packets are lost, more complex load balancing | gRPC |
| 3 | Runs over QUIC (UDP) | Lowest latency | Less mature, harder debugging, firewalls may block UDP | Mobile / unstable networks |

  • Modern clients negotiate the protocol via Application-Layer Protocol Negotiation (ALPN) during the TLS handshake
    • client says “I support h2, http/1.1”, server picks one
    • HTTP/3 support is typically advertised separately (e.g. via the Alt-Svc response header), after which the client can switch to QUIC

1.3.10. Transmission Control Protocol (TCP)

  • Reliable (lossless): lost segments are retransmitted and data arrives in order

1.3.11. User Datagram Protocol (UDP)

  • Unreliable (lossy): no retransmission or ordering guarantees

1.3.12. Quick UDP Internet Connections (QUIC)

  • UDP at Transport Layer + Reliability at App Layer

1.3.13. Which transport protocol

1.3.14. WebRTC

  • Frameworks

    • Web Real-Time Communication (WebRTC)
      • Open source framework for P2P RTC
      • Components
        • Signaling
        • Media Capture
        • Media Transport
        • Encryption
        • NAT Traversal
        • Adaptive Quality
        • Data Channels
  • Signaling Protocols

    • Session Initiation Protocol (SIP)
      • Set up, modify, tear down real-time sessions for voice/video/messaging
  • Monitoring Protocols

    • Real-time Transport Control Protocol (RTCP)
      • Measures network performance metrics for RTP
  • Security Protocols

    • Transport Layer Security (TLS)
      • Secures TCP
    • Datagram Transport Layer Security (DTLS)
      • Secures UDP
      • i.e. TLS for UDP
  • Transport Protocols

    • Real-time Transport Protocol (RTP)
      • Transports real-time media (audio/video)
      • Rides on UDP, sometimes TCP
    • Secure Real-time Transport Protocol (SRTP)
      • Encrypted RTP
      • Uses DTLS for key exchange
    • RTCP
  • Network Address Translation (NAT)

    • NAT Devices
      • Home Routers
      • Corporate Firewalls
    • Vanilla NAT
      • 1:1 mapping between private and public IPs (e.g. 192.168.0.1 (private) <-> 203.0.113.1 (public))
      • Provides control over private IP ranges
      • Single source of truth for configuring public/private IP mappings (e.g. when the ISP changes IP allocations)
    • Port Address Translation (PAT) a.k.a. NAT Overload
      • Many:1 mapping: many private IPs share one public IP, told apart by port
        • e.g.
          • 192.168.0.10:52301 -> 203.0.113.7:40001
          • 192.168.0.11:52301 -> 203.0.113.7:40002
      • Workaround for IPv4's small address space; generally not needed in IPv6, where every device can have its own public address
  • Firewall

    • Decides which packets are allowed/blocked
    • Lives between private network and public internet
    • Typically blocks incoming connections, not outgoing
    • Corporate networks often block UDP entirely, because without handshakes it is hard for firewalls to track session state

If asked: “How would you design WhatsApp voice calls?”
  • Signaling: WebSockets (or SIP for enterprise VoIP)
  • Transport: RTP/SRTP for media
  • NAT traversal: STUN + TURN fallback
  • Encryption: SRTP end-to-end
  • QoS handling: Adaptive bitrate, jitter buffer

If asked: “How does WebRTC work?”
  • WebRTC = framework, uses:
    • Signaling (custom, often WebSocket)
    • RTP/SRTP for audio/video streams
    • STUN/TURN for NAT traversal
    • DTLS/SRTP for security
    • Adaptive bitrate + codec negotiation

If asked: “How does VoLTE differ from WhatsApp?”
  • VoLTE → Managed SIP + RTP inside the carrier network, guaranteed QoS, low jitter
  • WhatsApp → WebRTC over the public Internet, no QoS guarantees
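The PAT mapping described in the NAT section above can be sketched as a two-way lookup table that hands out public ports on first use. The example reuses the section's addresses. This is a toy model with hypothetical names. Real NATs also track the remote endpoint, expire idle mappings and differ in how stable their mappings are, which is why STUN only works with some NATs.

```python
import itertools
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP, port)


class PatTable:
    """Toy PAT: many private endpoints share one public IP, told apart by port."""

    def __init__(self, public_ip: str, first_port: int = 40001):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self._out: Dict[Endpoint, Endpoint] = {}  # private -> public
        self._in: Dict[Endpoint, Endpoint] = {}   # public -> private

    def outbound(self, private: Endpoint) -> Endpoint:
        """Rewrite the source of an outgoing packet, creating a mapping on first use."""
        if private not in self._out:
            public = (self.public_ip, next(self._ports))
            self._out[private] = public
            self._in[public] = private
        return self._out[private]

    def inbound(self, public: Endpoint) -> Optional[Endpoint]:
        """Route a reply back; unsolicited traffic to an unmapped port is dropped."""
        return self._in.get(public)


nat = PatTable("203.0.113.7")
print(nat.outbound(("192.168.0.10", 52301)))  # ('203.0.113.7', 40001)
print(nat.outbound(("192.168.0.11", 52301)))  # ('203.0.113.7', 40002)
print(nat.inbound(("203.0.113.7", 40002)))    # ('192.168.0.11', 52301)
```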

1.3.15. Performance Metrics

| Metric | Description | Layer | Units | E.g. |
|---|---|---|---|---|
| Bitrate | Rate at which the app encodes and sends data | Application | bits/s | Voice: 10 kbps 2G, 64 kbps 3G, 64 kbps LTE, 12-64 kbps VoLTE, 128 kbps Vo5G; Video: 1 Mbps (360p), 2 Mbps (720p), 5 Mbps (1080p), 15 Mbps (4K) |
| Throughput | Rate at which data is actually delivered over the network | Network | bits/s | Zoom bitrate 2 Mbps, but network throughput only 1.5 Mbps due to packet loss |
| Available Bandwidth | Rate at which a network link can support data transfer | Network | bits/s | Wi-Fi: 5 Mbps |
| Latency / Round Trip Time (RTT) | Time taken for a packet to go to the peer and back | | ms | <150 ms before humans detect delay |
| Packet Loss | % of packets dropped between nodes in one direction | | % | <1% before choppy/freezing video/audio |
| Jitter | Variability in packet arrival time in one direction | | ms | <30 ms before video stutters / audio cracks |
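Jitter and packet loss can be computed from per-packet timestamps and counts. The sketch follows the RFC 3550 interarrival jitter estimator: D is the change in one-way transit time between consecutive packets, smoothed with a gain of 1/16. A constant clock offset between sender and receiver cancels out in D. The function names and the use of milliseconds are assumptions.

```python
from typing import List


def interarrival_jitter(send_ts: List[float], recv_ts: List[float]) -> float:
    """RFC 3550-style running jitter estimate (same time unit as the inputs)."""
    jitter = 0.0
    for i in range(1, len(send_ts)):
        # D = difference in one-way transit time between consecutive packets
        d = (recv_ts[i] - send_ts[i]) - (recv_ts[i - 1] - send_ts[i - 1])
        jitter += (abs(d) - jitter) / 16  # exponential smoothing with gain 1/16
    return jitter


def packet_loss_pct(expected: int, received: int) -> float:
    """Share of packets that never arrived, in percent."""
    return 100.0 * (expected - received) / expected
```

A stream with constant delay has zero jitter, however long the delay is.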

1.3.16. Adaptive Performance Strategies

| Strategy | Description | Layer | Use Cases |
|---|---|---|---|
| Jitter Buffer | Temporary storage in the receiver's app that smooths out variations in packet arrival times before playback | Application | Jitter |
| Bitrate | Bitrate Reduction + ... | | |
| Bitrate Reduction | Reducing the encoding and sending bitrate | Application | Packet Loss |
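A jitter buffer trades a little latency for smooth, in-order playback. This toy version holds back a fixed number of packets, releases them in sequence order and drops packets that arrive after their slot has already been played. Real jitter buffers are time-based (a target playout delay, often adaptive), so the packet-count `depth` here is a simplifying assumption.

```python
import heapq
from typing import List, Tuple

Packet = Tuple[int, bytes]  # (sequence number, payload)


class JitterBuffer:
    """Hold back `depth` packets and release them in sequence-number order."""

    def __init__(self, depth: int = 2):
        self.depth = depth
        self._heap: List[Packet] = []
        self._last_released = -1

    def push(self, seq: int, payload: bytes) -> bool:
        if seq <= self._last_released:
            return False  # arrived too late to be played in order: drop it
        heapq.heappush(self._heap, (seq, payload))
        return True

    def pop_ready(self) -> List[Packet]:
        """Release the oldest packets once more than `depth` are buffered."""
        ready = []
        while len(self._heap) > self.depth:
            packet = heapq.heappop(self._heap)
            self._last_released = packet[0]
            ready.append(packet)
        return ready
```

A larger `depth` absorbs more jitter but adds playback delay.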

1.3.17. Network Protocols

Application Layer

Signaling Layer

  • Voice over Public Switched Telephone Network (PSTN)
    • Dedicated end-to-end path between landlines/mobile phones using circuit switching
    • Transmits uncompressed voice using Pulse Code Modulation (PCM) at 64 kbps per call
    • Used by landlines, and by mobile phones on pre-4G connections
    • 4G and above
    • Carrier provides QoS guarantees
  • Voice over IP
    • Transmits voice using IP
    • No QoS guarantees over the public internet; call quality depends on the network connection
  • Video over IP
    • Transmits video using IP

1.3.18. Wireless Systems

  • Application
  • Transport / IP
  • Radio Resource Control (RRC): Manages radio resources and connection states between base station and user device
    • Types of radio resources:
      • Time
      • Frequency
      • Power
      • Modulation & Coding
      • Bearer
      • Control
      • Random access
      • Beamforming
    • Types of connection states:
      • RRC_IDLE
      • RRC_INACTIVE (5G)
      • RRC_CONNECTED
  • Packet Data Convergence Protocol (PDCP)
  • Radio Link Control (RLC)
  • Medium Access Control (MAC) Layer: Decides who gets to transmit, when, and how much bandwidth
  • Physical (PHY) Layer: Deals with actual signal transmission over radio waves (modulation, power levels etc.)

1.3.19. Telco 101

  • Cell Tower
    • Software Components i.e. Base Station Software Stack
      • Radio Access Network (RAN) Software
        • Handles communication between mobile devices and cell tower, e.g.
          • Handover Control: Deciding when phone switches from one tower to another
          • Radio Resource Control (RRC): managing spectrum and assigning frequencies to devices
          • MAC & PHY Scheduling: Deciding which user gets how much bandwidth every millisecond
          • Security & Authentication: Encrypting radio traffic before it hits the core
          • Quality of Service: Prioritising latency-sensitive traffic like voice and video
      • Cell Tower OS
        • Manages hardware scheduling, memory and task prioritisation
      • Management Software
        • For engineers to monitor and configure the cell tower
    • Hardware Components
      • Antennas: Send/receive radio signals
      • Remote Radio Unit (RRU): Converts radio waves to/from digital data
      • Baseband Unit (BBU): Runs the base station software stack
        • In many 5G deployments (Centralised/Cloud RAN), BBUs are
          • centralised in regional data centers
          • shared across dozens of towers
          • not located on the cell tower itself
      • Backhaul: Connection to core network via
        • Fiber (Most common)
        • Microwave (rural areas)
        • Satellite (remote locations)

1.3.20. Scheduler

Scheduler

  • Does
    • Assign task to node
  • Does not
    • Start or manage the workload

Orchestrator

  • Does
    • Scheduling (everything a scheduler does)
    • Provisioning and starting workloads on nodes
    • Scaling workloads up/down based on demand
    • Health monitoring and self-healing
    • Rolling updates and rollback management
    • Managing networking, storage and service discovery

1.3.21. Firewalls

| Type | E.g. | Layer | Found In | Checks | Use Case |
|---|---|---|---|---|---|
| Web Application Firewall (WAF) | AWS WAF | Application | CDNs, gateways, load balancers | Examines HTTP payloads to detect attacks | Web app / API protection against SQLi, XSS, bots, malicious patterns |
| Proxy | Nginx reverse proxy | Application | Proxy servers, gateways | Examines payloads for access control and anonymisation | |
| Packet Filtering | iptables (basic rules) | Network & Transport | Routers | Examines packets based on source/destination IP, port, protocol | Simple allow/deny rules, port blocking |
| Host-Based | Windows Firewall, iptables | Network & Transport | Individual servers / VMs | Examines traffic per host | Protects single servers, last line of defense |
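A packet-filtering firewall evaluates an ordered rule list against each packet's source IP, port and protocol. The first matching rule wins, and anything unmatched falls through to a default deny. The `Rule` shape and the example ranges below are hypothetical. `ipaddress` is the standard-library module for CIDR matching.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    action: str              # "allow" or "deny"
    src: str                 # source network in CIDR notation, e.g. "10.0.0.0/8"
    dst_port: Optional[int]  # None matches any port
    protocol: str            # "tcp", "udp" or "any"


def decide(rules: List[Rule], src_ip: str, dst_port: int, protocol: str, default: str = "deny") -> str:
    """First matching rule wins; anything unmatched gets the default action."""
    for rule in rules:
        if ip_address(src_ip) not in ip_network(rule.src):
            continue
        if rule.dst_port is not None and rule.dst_port != dst_port:
            continue
        if rule.protocol not in ("any", protocol):
            continue
        return rule.action
    return default


rules = [
    Rule("deny", "203.0.113.0/24", None, "any"),  # block a known-bad range
    Rule("allow", "0.0.0.0/0", 443, "tcp"),       # allow HTTPS from anywhere else
]
print(decide(rules, "198.51.100.5", 443, "tcp"))  # allow
print(decide(rules, "198.51.100.5", 22, "tcp"))   # deny (default)
```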

1.3.22. Optimising for reads/writes

Read Optimisation Strategy
  • CDN caching

The disadvantages in general are:

  1. Higher storage
  2. Stale data
  3. Additional complexity with invalidation strategy
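The read-side trade-offs above (stale data and invalidation complexity) apply to any cache, not just a CDN. Below is an application-level analogue as a sketch: a cache-aside store with a time-to-live. The class name and the injectable clock are assumptions, and the clock is injected only so the behaviour can be tested deterministically.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache-aside sketch: serve a cached value until it is `ttl` seconds old."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so tests can control time
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable, load: Callable[[Hashable], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # cache hit: fast, but possibly stale
        value = load(key)    # miss or expired: go to the source of truth
        self._store[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Explicit invalidation, e.g. after a write to the source."""
        self._store.pop(key, None)
```

A longer `ttl` takes more load off the source but serves stale values for longer. Explicit `invalidate` calls shorten that window at the cost of extra coordination on every write.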
Write Optimisation Strategy

The disadvantages in general are:

  1. More complex read paths
  2. Additional complexity with background preprocessors
Balanced Approach
  • CQRS + messaging
  • Per-endpoint SLAs with targeted caching
  • Tiered storage (hot cache -> primary DB -> data lake)

2. Development

Software development revolves around turning ideas into working software. Developers are the chefs who prepare the dishes and organise the kitchen, aiming to get the best-tasting food to customers while wasting as little time and as few ingredients as possible.

2.1. Delivery

Delivery is about delivering value to users with minimal waste using business processes.

2.1.1. Tickets

Tickets are the backbone of software delivery. They help track work, manage priorities, and ensure that the team is aligned on what needs to be done. This is similar to how chefs use order tickets in a restaurant to manage customer orders.

2.1.2. Typical stages a ticket flows through

Tickets go through multiple stages, just as an order ticket in a restaurant does: the waiter takes the order from the customer and sends it to the kitchen, the chefs prepare the different components of the dish, the head chef does the final check, and the waiter brings the dish to the customer.

  1. Epic Refinement
    • Functional Design
      • BPMN Diagrams
      • High-Level User Stories
      • e.g. "I want spagbol"
    • Technical Design
      • High-level answer to the question "What do we need to do?"
        • e.g. "We need to buy tomatoes, mince, etc."
      • Avoid diving too deep into "How do we need to do it?"
        • e.g. "We need to cook the tomatoes for x mins"
  2. Ticket Refinement
    • Business Refinement
      • Validation Steps
        • e.g. "There should be mirepoix, arrabbiata, browned mince, cooked spaghetti..."
    • Technical Refinement
      • Tech Steps
        • e.g. "We need to chop the tomatoes, celery and onions into squares, then cook them for x mins"
  3. Delivery
    • Development
      • Do the tech steps
      • e.g. Chefs carrying out the recipe steps
    • Code Review
    • Functional Review
      • e.g. Head chef checking the food
    • Validation
      • e.g. Waiter asks the customer "How's the food?"

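2.1.3. Poka-Yoke

Poka-yoke (Japanese for "mistake-proofing") means designing the process so that mistakes are prevented or caught early. After an issue, ask:
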
  1. What was the root cause of the issue?
  2. How could we have detected this issue earlier?
  3. How can we prevent this issue from happening again?

2.2. Maintainability

How to deliver value to users with minimal waste using code.

  • Single Layer of Abstraction Principle (SLAP)
  • Dependency Injection
  • Clean Conditionals
  • Conventional Commits
  • Early Returns / Continues
  • Prefer for loops over while
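Two of the principles above, early returns and dependency injection, fit in a few lines. A minimal sketch; `User`, `Mailer`, `FakeMailer` and `send_welcome` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class User:
    name: str
    email: Optional[str]
    is_active: bool


class Mailer(Protocol):
    def send(self, to: str, body: str) -> None: ...


class FakeMailer:
    """Test double that records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))


def send_welcome(user: User, mailer: Mailer) -> bool:
    # Early returns: reject the edge cases first so the happy path stays unindented.
    if not user.is_active:
        return False
    if not user.email:
        return False
    # Dependency injection: the mailer is passed in rather than created here,
    # so production code can pass a real client and tests can pass FakeMailer.
    mailer.send(user.email, f"Welcome, {user.name}!")
    return True


mailer = FakeMailer()
print(send_welcome(User("Ada", "ada@example.com", True), mailer))  # True
print(send_welcome(User("Bob", None, True), mailer))               # False
print(mailer.sent)  # [('ada@example.com', 'Welcome, Ada!')]
```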

2.3. Testing

  • E2E
    • Main user stories, happy paths
  • Integration
    • Edge cases not caught by E2E
  • Unit
    • Small functions
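A minimal unit-test sketch using Python's built-in `unittest`, for a small function such as the "Get n trailing digits" modulo trick from section 1.1.2. `trailing_digits` is a hypothetical helper name.

```python
import unittest


def trailing_digits(x: int, n: int) -> int:
    """Return the last n digits of a non-negative integer x."""
    return x % 10**n


class TrailingDigitsTest(unittest.TestCase):
    # Each unit test checks one small behaviour in isolation.
    def test_takes_last_two_digits(self) -> None:
        self.assertEqual(trailing_digits(1234, 2), 34)

    def test_number_shorter_than_n(self) -> None:
        self.assertEqual(trailing_digits(7, 3), 7)

    def test_zero_digits(self) -> None:
        self.assertEqual(trailing_digits(1234, 0), 0)
```

Saved in a module (e.g. a hypothetical `test_digits.py`), this runs with `python -m unittest test_digits`.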

2.4. Concrete Knowledge

2.4.1. Types of Development

  • Web
    • Frontend
    • Backend
  • Mobile
  • Game
  • Desktop
  • Embedded
  • DevOps
  • Data
  • ML / AI
  • Security

2.4.2. Choosing a language for mobile app development

2.4.3. Choosing a language for frontend web development

| Language | Use Case | Adv. | Disadv. |
|---|---|---|---|
| JS | Default | Natively supported: browsers ship with a JS engine | Single-threaded by default |
| Dart (compiled to JS) | Cross-platform | | No UI interactivity |
| C/C++/Rust (through WASM) | 3D graphics, gaming, video editing (e.g. Figma, Canva, AutoCAD Web) | High performance | No direct DOM access; UI updates go through JS interop |
| Python (through WASM) | AI/ML in the browser | High performance, mature AI/ML library ecosystem | No direct DOM access; UI updates go through JS interop |
| C# (through Blazor WASM) | Reusing existing .NET code | UI interactivity | Young ecosystem, large initial payload (downloads a ~6MB .NET runtime) |

JS is the default choice as it is the only language that has direct access to the DOM to render UI.

2.4.4. Choosing a language / framework for backend web development

The choice of language for backend web development is tightly coupled to the language's runtime, libraries and frameworks, because they determine the key trade-offs.

| Language | Use Case | Adv. | Disadv. |
|---|---|---|---|
| JavaScript | Real-time apps; typically preferred over PHP these days | Mature ecosystem, same language for FE and BE, great for concurrency (<10k users) | Not statically typed (TypeScript adds static types) |
| PHP | WordPress, CMS, e-commerce | Huge CMS ecosystem, powers WordPress | Process-per-request model limits real-time apps without extra tooling; JS is typically preferred |
| Python | ML / AI | Huge AI/ML ecosystem | |
| Java | Enterprise, finance | Strict typing, battle-tested | Heavier setup |
| C# | Enterprise with Microsoft ecosystem | Great integrations with Microsoft / Azure | Tied to Microsoft ecosystem |
| Go | Microservices, cloud-native, high-concurrency APIs | Extremely fast, great concurrency with goroutines | Less suited for CMS, e-commerce |
| Rust | High-performance APIs | | |
| Ruby | Replaced by JS | - | Declining in popularity due to memory usage, scaling and concurrency limitations |

2.4.5. Choosing an Infrastructure as Code (IaC) framework for cloud

| Framework | Description | Use Case | Adv. | Disadv. |
|---|---|---|---|---|
| AWS | | | | |
| SST (Serverless Stack) | Third-party abstraction on top of CDK | Small projects | Ultra-fast local Lambdas with hot reload, DevX | Less flexible than CDK, third-party solution, risk of breaking changes |
| CDK | AWS high-level, code-first framework built on CloudFormation | Best all-round choice for AWS | Common programming languages supported | Steep learning curve, no local emulators for Lambdas and API Gateway |
| SAM (Serverless Application Model) | AWS high-level, serverless-first legacy framework built on CloudFormation | Prefer CDK | DevX with emulators for local Lambdas / API Gateway | YAML config, serverless projects only |
| CloudFormation | AWS low-level framework | Low-level control | Access to L1 constructs for high customisability | JSON/YAML config, verbose |
| Azure | | | | |
| Bicep | | | | |
| ARM Templates | JSON config | | | |
| GCP | | | | |
| Deployment Manager | YAML config | | | |
| Multi-vendor | | | | |
| Terraform | | | | |
| Pulumi | | | | |
| Serverless Framework | Legacy vendor-agnostic framework | Do not use, it is dead | Supports AWS, Azure, GCP | YAML config, mocking AWS locally required |

2.4.6. Choosing a library for local dev of cloud resources

AWS

| Library / Tool | Description | Use Case | Adv. | Disadv. |
|---|---|---|---|---|
| LocalStack | Full AWS service emulator in Docker | Best tool to start with, before adding service-specific emulators | Broad AWS coverage, runs in one container | Slower than service-specific emulators, partial coverage of some services |
| MinIO | S3-compatible object store | Local S3 | Fast | S3 only, some S3 features differ |
| ElasticMQ | SQS emulator | Local SQS | Fast | SQS only |
| DynamoDB Local | DynamoDB emulator | Local KV | Fast | DynamoDB only |
| SAM CLI | Lambda / API Gateway emulator | Local Lambdas / API Gateway | Fast | Serverless services only |
| SST | Lambda emulator with hot reload | Extremely fast local Lambda dev | Extremely fast | Requires SST |

2.4.7. Browser Storage

| Storage Type | Description | Set by | Access via | Lifetime | Access scope | Capacity | Use Cases | Security Notes |
|---|---|---|---|---|---|---|---|---|
| Cookies | KV pairs | Responses (Set-Cookie) + JS (document.cookie) | Requests (auto-sent) + JS | Until the session ends, or until a configured expiry datetime | Browser + domain | 4KB each, 50 per domain | Auth, prefs | Use HttpOnly, Secure, SameSite flags |
| Session Storage | KV pairs | JS | JS | Cleared on tab close | Tab / session | ~5MB | Temporary UI state, multi-tab separation | Accessible to JS -> XSS risk |
| Local Storage | KV pairs | JS | JS | Persistent until cleared | Browser + origin | ~5MB (varies by browser) | App state, non-sensitive prefs | Accessible to JS -> XSS risk |
| Extension Storage | KV pairs | JS (extensions only) | JS (extensions only) | Persistent until cleared | Extension | ~100KB (sync), 10MB (local) | Extension settings, sync across devices | |
| IndexedDB | NoSQL DB | JS | JS | Persistent until cleared | Browser + origin | Several GB, depending on disk space | PWAs, offline apps, large structured data | Origin-scoped, but XSS risk |

2.4.8. Request/Response Flags

| Flag | Purpose | Use Case |
|---|---|---|
| HttpOnly | Prevents JS from reading the cookie | Protect tokens from XSS |
| Secure | Cookie only sent over HTTPS | Prevent cookies from leaking over plaintext HTTP |
| SameSite | Controls whether cookies are sent on cross-site requests (Strict / Lax / None) | CSRF protection / cross-site marketing |
| Cache-Control | Controls caching of response data (no-store, max-age etc.) | Ensure sensitive data isn't cached |
| CORS headers | Control which origins can make cross-origin requests | APIs that need controlled access |
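The cookie flags above can be set with Python's standard-library `http.cookies` module (the `samesite` attribute needs Python 3.8+). `session_cookie_header` and `auth_response_headers` are hypothetical helpers; how the headers get attached to a response depends on the web framework in use.

```python
from http.cookies import SimpleCookie


def session_cookie_header(token: str, max_age: int = 3600) -> str:
    """Build a Set-Cookie value for a session token with the protective flags."""
    cookie = SimpleCookie()
    cookie["session"] = token
    morsel = cookie["session"]
    morsel["httponly"] = True      # HttpOnly: not readable via document.cookie
    morsel["secure"] = True        # Secure: only sent over HTTPS
    morsel["samesite"] = "Strict"  # SameSite: not sent on cross-site requests
    morsel["path"] = "/"
    morsel["max-age"] = max_age    # expire after max_age seconds
    return morsel.OutputString()


def auth_response_headers(token: str) -> list[tuple[str, str]]:
    """Headers for a login response: the session cookie plus Cache-Control: no-store."""
    return [
        ("Set-Cookie", session_cookie_header(token)),
        ("Cache-Control", "no-store"),  # don't let caches keep the authenticated response
    ]


for name, value in auth_response_headers("abc123"):
    print(f"{name}: {value}")
```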

2.4.9. Response Codes

| Code | Meaning | When to use | Benefit of using |
|---|---|---|---|
| 1xx Informational | Request received, continuing process | Rare in practice, mostly for protocol-level interactions | |
| 100 | Continue | Client should continue sending the request body (after the headers are accepted) | Saves bandwidth if the request is rejected early |
| 101 | Switching Protocols | HTTP to WebSocket upgrade, or HTTP/1.1 to HTTP/2 upgrade | Needed to start persistent connections |
| 2xx Success | Request succeeded | | |
| 200 | OK | Standard response for a successful request (e.g. GET, or POST when no resource is created) | |
| 201 | Created | New resource created successfully (e.g. POST /users) | |
| 202 | Accepted | Request accepted for async processing but not finished yet | |
| 204 | No Content | Success, but no response body (e.g. DELETE) | |
| 3xx Redirection | Further action needed | | |
| 301 | Moved Permanently | Resource permanently moved | Tells crawlers to update their search index, better SEO |
| 302 | Found (Moved Temporarily) | Temporary redirect (historically used like 303) | |
| 303 | See Other | Redirect after POST -> GET (Post/Redirect/Get pattern for web forms) | Refreshing the result page doesn't resubmit the form |
| 304 | Not Modified | Used with caching (conditional requests) | Client reuses its cached copy; no body is sent, saving bandwidth and latency |
| 4xx Client Error | Problem with request | | |
| 400 | Bad Request | Malformed syntax, invalid patterns | |
| 401 | Unauthorized | Missing/invalid authentication | |
| 403 | Forbidden | Authenticated but not authorised | |
| 404 | Not Found | Resource doesn't exist, or to hide an endpoint's existence from unauthenticated/unauthorised callers | Security through obscurity + clear feedback |
| 409 | Conflict | Resource conflict (e.g. duplicate unique field) | |
| 429 | Too Many Requests | Rate limiting / throttling | |
| 5xx Server Error | Problem on server side | | |
| 500 | Internal Server Error | Generic server crash/error | |
| 502 | Bad Gateway | Upstream server error (e.g. reverse proxy can't reach backend) | |
| 503 | Service Unavailable | Server overloaded or down for maintenance | |
| 504 | Gateway Timeout | Upstream service didn't respond in time | |
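Python's standard library exposes these codes as `http.HTTPStatus`. A small sketch that reuses the floor-division trick from section 1.1.3 to find a code's class (`code // 100`); `describe` and `STATUS_CLASSES` are hypothetical names.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Format a status code with its reason phrase and class."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    return f"{code} {phrase} ({STATUS_CLASSES[code // 100]})"


for code in (101, 201, 304, 429, 503):
    print(describe(code))
```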

2.4.10. Web Identifiers

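| Term | Definition | E.g. |
|---|---|---|
| Domain | Registrable name of a website / portion of a host | example.com |
| Host | Network address (domain name / IP) in a request | example.com, shop.example.com |
| Scheme | Protocol used to access the resource | http://, ws:// |
| Port | Network port on the host; defaults to 80 (http) or 443 (https) when omitted | 443 |
| Origin | Scheme + Host + Port | https://example.com:443 |
| Fragment | Part after `#` identifying a section within the resource; handled by the client and not sent to the server | #reviews |
| Uniform Resource Name (URN) | Name of a resource, not how to locate it | urn:isbn:0451450523 (book ISBN), urn:uuid:6fa459ea-ee8a-3ca4-894e-db77e160355e (UUID) |
| Uniform Resource Locator (URL) | How to locate a resource | https://shop.example.com:443/products?id=10#reviews |
| Uniform Resource Identifier (URI) | Either a URL or a URN | - |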
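The parts of the example URL can be pulled apart with `urllib.parse.urlsplit` from Python's standard library. `origin` is a hypothetical helper that fills in the default port when the URL omits it, matching the Scheme + Host + Port definition above; it assumes http/https/ws/wss URLs and ignores IPv6 bracket handling.

```python
from urllib.parse import urlsplit

# Assumed default ports per scheme, used when the URL doesn't state one.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}


def origin(url: str) -> str:
    parts = urlsplit(url)
    # .port is None when the URL omits it, so fall back to the scheme's default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return f"{parts.scheme}://{parts.hostname}:{port}"


parts = urlsplit("https://shop.example.com:443/products?id=10#reviews")
print(parts.scheme, parts.hostname, parts.port, parts.path, parts.query, parts.fragment)
# https shop.example.com 443 /products id=10 reviews
print(origin("https://shop.example.com/products"))  # https://shop.example.com:443
```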

2.4.11. React

  • Avoid useEffect when there is no external system to synchronise with (source)

2.4.12. Database Terminology

  • Clause
    • A building block of a statement
    • e.g. SELECT, FROM, WHERE, SET
  • Statement
    • A single complete command, usually terminated by a semicolon
  • Read / Query / Data Query Language (DQL)
    • A statement that reads data
    • e.g. SELECT * FROM fooTable;
  • Write / Update / Data Manipulation Language (DML)
    • A statement that modifies data
    • e.g. UPDATE fooTable SET colName = x;
  • Read Result Set
    • Data returned from a query
  • Update Acknowledgement
    • Confirmation returned from a write
    • e.g. x rows inserted
  • Transaction
    • A group of statements executed as a single atomic unit
    • e.g. BEGIN / START TRANSACTION -> COMMIT / ROLLBACK
  • Session
    • A client's connection to the DB
  • Database Object
    • Anything defined in a DB
    • e.g. Tables, Views, Indices, Stored Procedures, Triggers, Functions
  • Schema
    • Logical grouping of DB objects
  • Execution Plan
    • The strategy the DB optimiser chooses to run your query
    • e.g. index scan vs full scan, hash join
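Most of these terms can be seen with Python's built-in `sqlite3` module and an in-memory SQLite database. A minimal sketch: `demo_terms` is a hypothetical function name, and the execution-plan text is SQLite-specific and varies by version.

```python
import sqlite3


def demo_terms():
    # Session: one client connection (an in-memory SQLite DB here).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fooTable (id INTEGER PRIMARY KEY, colName TEXT);")

    # Transaction that commits: `with conn` commits when the block succeeds.
    with conn:
        cur = conn.execute("INSERT INTO fooTable (colName) VALUES (?), (?);", ("a", "b"))
        ack = cur.rowcount  # update acknowledgement: rows affected by the write

    # Transaction that rolls back: an exception inside `with conn` undoes the write.
    try:
        with conn:
            conn.execute("UPDATE fooTable SET colName = 'x';")
            raise RuntimeError("simulated failure mid-transaction")
    except RuntimeError:
        pass

    # Query (DQL): returns a result set, unaffected by the rolled-back UPDATE.
    rows = conn.execute("SELECT colName FROM fooTable ORDER BY id;").fetchall()

    # Execution plan: how SQLite's optimiser will run a query (wording varies by version).
    plan = [row[-1] for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM fooTable WHERE id = 1;")]

    conn.close()
    return ack, rows, plan


print(demo_terms())
```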

2.4.13. Database Data Persistence

Data in Tables (Persistent)

  1. Base/Regular Table
    • Data stored on disk
    • Data is persistent across sessions
  2. Temporary Table
    • Data stored on disk
    • Data exists only in session
    • Data can exist across sessions if cached

Data in Queries (In Memory)

  1. Result Set
    • Data stored in memory
    • Data exists onl
  2. Derived Table / Subquery, e.g. in a FROM clause
    • Data stored in memory
    • Data exists only for the duration of the query
  3. Common Table Expression (CTE), e.g. WITH
    • Same as a subquery, but named so it can be referenced (and reused) within the query

Named Queries

  1. View/Virtual
    • Query definition stored on disk
    • Data only stored
  2. Materialised View
    • Data stored on disk
    • Manual/scheduled refresh
  3. Stored Procedure
    • Procedure definition (code) stored on disk
    • ???
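Several of these can be tried in SQLite through Python's built-in `sqlite3` module. SQLite has no materialised views or stored procedures, so this sketch only covers base and temporary tables, a view, a derived table and a CTE. The `orders` table and `demo_persistence` function are hypothetical.

```python
import sqlite3


def demo_persistence():
    conn = sqlite3.connect(":memory:")  # in-memory DB for the demo
    # Base table: on a file-backed DB its rows persist across sessions.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("ann", 10.0), ("ann", 30.0), ("bob", 5.0)],
    )
    conn.commit()

    # Temporary table: only visible to this connection and dropped when it closes.
    conn.execute("CREATE TEMP TABLE big_orders AS SELECT * FROM orders WHERE amount > 8")
    temp_count = conn.execute("SELECT COUNT(*) FROM big_orders").fetchone()[0]

    # View: only the query definition is stored; rows are computed on each read.
    conn.execute(
        "CREATE VIEW totals AS "
        "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"
    )

    # Derived table: a subquery in FROM that exists only while this query runs.
    max_total = conn.execute(
        "SELECT MAX(total) FROM "
        "(SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer)"
    ).fetchone()[0]

    # CTE: the same derived table, named so it can be referenced twice.
    top_customers = conn.execute(
        "WITH t AS (SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer) "
        "SELECT customer FROM t WHERE total = (SELECT MAX(total) FROM t)"
    ).fetchall()

    # A new base-table row shows up in the view without any refresh.
    conn.execute("INSERT INTO orders (customer, amount) VALUES ('bob', 20.0)")
    conn.commit()
    view_rows = conn.execute("SELECT customer, total FROM totals ORDER BY customer").fetchall()

    conn.close()
    return temp_count, max_total, top_customers, view_rows


print(demo_persistence())
```

A materialised view would differ in the last step: its stored rows would keep showing bob's old total until the next refresh.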

2.4.14. Database Isolation Levels

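| Isolation Level | Dirty Reads | Non-Repeatable Reads | Phantom Reads |
|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible |
| Read Committed | Prevented | Possible | Possible |
| Repeatable Read | Prevented | Prevented | Possible |
| Serializable | Prevented | Prevented | Prevented |

These are the guarantees required by the SQL standard; implementations can be stricter (e.g. PostgreSQL's Repeatable Read also prevents phantom reads).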

2.4.15. Testing Frameworks

Frontend

  • Web
    • Playwright (purpose built from the ground up)
    • Cypress (multiple packages patched together)
  • Cross Platform
    • integration_test (Flutter)
  • Mobile
    • Maestro (YAML flows, optional JS scripting)
      • Supports OS level interaction, e.g. going to system settings

2.4.16. Cold/Warm/Hot Starts on Mobile

  1. Cold Start
    • binary not in memory
    • e.g. launching app after killing it
  2. Warm Start
    • binary in memory, app process in background
    • e.g. when switching between apps
  3. Hot Start
    • binary in memory, app process in foreground
    • e.g. when locking and unlocking the screen momentarily, or switching between apps briefly
      • This occurs because Android and iOS give apps a grace period (~2s) before backgrounding them
    • App still has GPU and CPU priority

2.4.17. Splash Screen

Splash screens are only shown on a cold start.

| Phase | Native iOS | Native Android | React Native | Flutter |
|---|---|---|---|---|
| Process Startup | OS launches app process | Same | Same | Same |
| Show OS-level Splash | Launch splash | Same | Same | Same |
| Runtime Init + Framework Bootstrap | Initializes iOS runtime + UIKit, sets up main run loop, prepares initial UIViewController | Inits Android Runtime + base Activity, inflates first layout | Native layer starts JS engine, loads JS bundle, sets up React tree & JS-native bridge | Native layer starts Flutter engine, loads Dart VM, initializes widget tree & renderer (Skia or Impeller) |
| App Init | Set up SDKs, DB, config etc. | Same | Same | Same |
| Remove Splash | OS removes splash once first UIViewController is ready | OS removes splash once Activity content is ready | Native splash removed after JS bundle + RN root view are mounted | Native splash removed after Flutter engine renders first frame |
| First Frame Rendered | First frame is rendered | Same | Same | Same |

3. Resources

4. TODO

  • Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
  • Cheatsheet
  • 2's complement
  • Hypergeometric/Binomial: Given 100 faulty in sample size of 1000, what is the probability of getting