Distributed System: Multiple computers that communicate over a network and appear to users as a single coherent system.
Challenges:
- Consistency (C): Every read returns the latest write or an error.
- Availability (A): Every request receives a response (not an error).
- Partition Tolerance (P): The system keeps working even when the network is partitioned.
In practice, P is mandatory because network partitions do happen in real systems. So the real choice during a partition is C vs A.
| System | Choice | Example / Note |
|--------|--------|----------------|
| Traditional RDBMS | CA | MySQL, PostgreSQL |
| MongoDB, HBase | CP | Strong consistency |
| DynamoDB, Cassandra | AP | Eventual consistency |
| Zookeeper | CP | Leader election |
States: Leader | Follower | Candidate
1. Leader Election:
- Timeout → Follower becomes Candidate
- Candidate requests votes from all
- Majority votes → becomes Leader
- Leader sends heartbeats
2. Log Replication:
- Client → Leader
- Leader appends to log, sends AppendEntries to followers
- Majority acknowledge → commit
- Leader notifies followers to commit
Terms: monotonically increasing, help detect stale leaders
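The election steps above can be sketched in a few lines of Python. This is a simplified illustration, not a full Raft implementation: the `Node` class, `handle_vote_request`, and `run_election` are hypothetical names, there is no networking or timer, and the real protocol's check that the candidate's log is up to date is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None

    def handle_vote_request(self, term: int, candidate_id: str) -> bool:
        """Grant at most one vote per term; reject requests from stale terms."""
        if term < self.current_term:
            return False  # stale candidate
        if term > self.current_term:
            # Newer term seen: adopt it and forget any earlier vote.
            self.current_term = term
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate: Node, peers: list) -> bool:
    """Candidate bumps its term, votes for itself, and asks every peer."""
    candidate.current_term += 1
    candidate.voted_for = candidate.node_id
    votes = 1  # its own vote
    for peer in peers:
        if peer.handle_vote_request(candidate.current_term, candidate.node_id):
            votes += 1
    cluster_size = len(peers) + 1
    return votes > cluster_size // 2  # strict majority


a, b, c = Node("A"), Node("B"), Node("C")
won = run_election(a, [b, c])
print(won)  # True
```

Because each node grants only one vote per term, two candidates cannot both collect a majority in the same term, which is why terms help detect stale leaders.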
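# Each node maintains a vector clock with one counter per node: [A, B, C]
node_A = [0, 0, 0]
node_B = [0, 0, 0]
# A sends an event: increment its own counter, attach a copy of the clock to the message
node_A[0] += 1  # A: [1, 0, 0]
msg_clock = list(node_A)  # clock carried by the message to B
# B receives: take the element-wise max, then increment its own counter
node_B = [max(a, b) for a, b in zip(msg_clock, node_B)]
node_B[1] += 1  # B: [1, 1, 0]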
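Vector clocks are mainly useful for deciding whether one event happened before another or whether the two are concurrent. A minimal sketch of that comparison follows; the function names `merge` and `compare` are illustrative, not from any library.

```python
def merge(local, received, own_index):
    """Receive rule: element-wise max, then tick the receiver's own counter."""
    merged = [max(l, r) for l, r in zip(local, received)]
    merged[own_index] += 1
    return merged


def compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent' for clocks a and b."""
    a_le_b = all(x <= y for x, y in zip(a, b))
    b_le_a = all(y <= x for x, y in zip(a, b))
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


clock_a = [1, 0, 0]                      # A after its send event
clock_b = merge([0, 0, 0], clock_a, 1)   # B after receiving: [1, 1, 0]
clock_c = [0, 0, 1]                      # C did a local event and saw nothing else

print(compare(clock_a, clock_b))  # before
print(compare(clock_b, clock_c))  # concurrent
```

Two clocks are concurrent when neither is less than or equal to the other in every position, which is exactly the case a single Lamport timestamp cannot detect.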
Problem: With plain modulo hashing (hash(key) % N), adding or removing a node forces almost every key to be rehashed and moved
Solution: Ring of positions 0 to 2^32 - 1
- Nodes placed on ring (hash of node ID)
- Key belongs to first node clockwise
- Add/remove node: only adjacent keys move
- Virtual nodes: better load distribution
Used by: Cassandra, Amazon DynamoDB, CDN networks
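The ring can be sketched with a sorted list of positions and a binary search. This is an illustrative implementation under stated assumptions: the `HashRing` class and its method names are made up for this example, MD5 is used only as a convenient stable hash, and 100 virtual nodes per physical node is an arbitrary choice.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position in [0, 2**32) using MD5 (any stable hash works)."""
    digest = hashlib.md5(value.encode()).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._positions = []   # sorted ring positions
        self._owner = {}       # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            pos = ring_position(f"{node}#{i}")
            if pos not in self._owner:      # skip the rare position collision
                bisect.insort(self._positions, pos)
                self._owner[pos] = node

    def remove_node(self, node):
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key):
        """First virtual node clockwise from the key's position (wrapping around)."""
        pos = ring_position(key)
        idx = bisect.bisect_left(self._positions, pos) % len(self._positions)
        return self._owner[self._positions[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Every key that moved went to the new node; all other keys stayed put.
print(len(moved), "of", len(keys), "keys moved")
```

With four roughly equal nodes, about a quarter of the keys are expected to move, compared with nearly all of them under modulo hashing.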
Leader-Follower (Primary-Replica):
All writes → Leader
Reads → Leader or Follower
Leader fails → failover to follower
Used by: MySQL replication, MongoDB replica set
Leaderless (Quorum):
Write quorum W, Read quorum R
W + R > N (total replicas) → strong consistency
Dynamo-style stores (e.g. Cassandra) use this
N=3, W=2, R=2: every read quorum overlaps every write quorum in at least one node
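The reason W + R > N matters is that any set of W replicas and any set of R replicas must then share at least one member, so a read always touches a replica that saw the latest write. A small brute-force check (the function name is illustrative) makes this concrete:

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Check every write set of size w against every read set of size r."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


print(quorums_always_overlap(3, 2, 2))  # True  (W + R = 4 > 3)
print(quorums_always_overlap(3, 1, 2))  # False (W + R = 3, not > 3)
```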
Monolith → Microservices
API Gateway (single entry point)
├── Auth Service
├── User Service ──── User DB
├── Order Service ─── Order DB
├── Payment Service ─ Payment DB
└── Notification Service
Communication:
Sync: REST/gRPC (request-response)
Async: Message Queue — Kafka, RabbitMQ
// service definition (.proto)
syntax = "proto3";
service UserService {
  rpc GetUser (UserRequest) returns (UserResponse);
  rpc ListUsers (ListRequest) returns (stream UserResponse);
}
message UserRequest { int32 user_id = 1; }
message ListRequest {}  // placeholder: the notes do not define its fields
message UserResponse {
  int32 id = 1;
  string name = 2;
  string email = 3;
}
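# gRPC server (Python)
import user_pb2        # modules generated from the .proto definition
import user_pb2_grpc
class UserServicer(user_pb2_grpc.UserServiceServicer):
    def GetUser(self, request, context):
        # db is the application's data-access layer (not shown in these notes)
        user = db.get_user(request.user_id)
        return user_pb2.UserResponse(id=user.id, name=user.name)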
Architecture:
Producer → [Topic: Partition 0] → Consumer Group A
           [Topic: Partition 1] → Consumer Group B
           [Topic: Partition 2]
Broker: Kafka server (holds partitions)
Zookeeper/KRaft: metadata, leader election
Key Properties:
- Append-only log (immutable)
- Retention: time-based or size-based
- Offset: consumer tracks position
- Replay: reprocess messages from any offset
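The offset and replay properties can be illustrated without a broker. The sketch below is a plain in-memory model, not Kafka's API: `PartitionLog` and the `committed` dictionary are hypothetical stand-ins for a partition and for per-group committed offsets.

```python
class PartitionLog:
    """Append-only list of records; an offset is just an index into it."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        self._records.append(record)
        return len(self._records) - 1   # offset of the new record

    def read_from(self, offset: int):
        return self._records[offset:]


log = PartitionLog()
for order_id in (101, 102, 103):
    log.append({"order_id": order_id})

# Each consumer group tracks its own position independently.
committed = {"payment-service": 0, "analytics": 0}

batch = log.read_from(committed["payment-service"])
committed["payment-service"] += len(batch)      # now 3: caught up

# Replay: rewind the stored offset and read the same records again.
committed["payment-service"] = 1
replayed = log.read_from(committed["payment-service"])
print([r["order_id"] for r in replayed])  # [102, 103]
```

Because reading never removes records, rewinding one group's offset does not affect any other group.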
# Kafka Producer
from kafka import KafkaProducer
import json
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    value_serializer=lambda v: json.dumps(v).encode()
)
producer.send('orders', {'order_id': 123, 'amount': 500})
producer.flush()
# Kafka Consumer
import json
from kafka import KafkaConsumer
consumer = KafkaConsumer(
    'orders',
    bootstrap_servers=['localhost:9092'],
    group_id='payment-service',
    value_deserializer=lambda m: json.loads(m.decode())
)
# process_order is the application's own handler (not defined in these notes)
for message in consumer:
    process_order(message.value)
CLOSED → (failures < threshold) → normal operation
CLOSED → (failures > threshold) → OPEN
OPEN → (timeout elapsed) → HALF-OPEN → test request
HALF-OPEN → success → CLOSED
HALF-OPEN → failure → OPEN
Libraries: Hystrix (Netflix), Resilience4j
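The state transitions above can be sketched as a small wrapper class. This is a minimal illustration, not how Hystrix or Resilience4j are implemented: the class and parameter names are made up, it counts consecutive failures, it trips as soon as the count reaches the threshold, and it is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; call rejected")
            self.state = "HALF_OPEN"   # timeout elapsed: allow one test request
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        self.state = "CLOSED"
        self.failures = 0

    def _on_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()


def flaky():
    raise ConnectionError("downstream unavailable")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=5.0)
for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass
print(breaker.state)  # OPEN
```

While the circuit is open, callers fail fast with `CircuitOpenError` instead of waiting on a downstream service that is already struggling.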
Order Saga:
1. Create Order (local commit)
2. Reserve Inventory → success
3. Charge Payment → success → Order Confirmed
Charge Payment → FAIL → Compensate:
- Release inventory
- Cancel order
Choreography: each service publishes events
Orchestration: central coordinator
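An orchestrated version of the order saga can be sketched as a coordinator that runs each step and, on failure, runs the compensations of the completed steps in reverse order. The `run_saga` function and the step names are illustrative; the actions here only record what they would do.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If an action raises, run the compensations of the already-completed
    steps in reverse order and report failure.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            print(f"{name} failed: {exc}")
            for _, undo in reversed(completed):
                undo()
            return False
        completed.append((name, compensate))
    return True


events = []


def charge_payment():
    raise RuntimeError("card declined")


order_saga = [
    ("create_order",
     lambda: events.append("order created"),
     lambda: events.append("order cancelled")),
    ("reserve_inventory",
     lambda: events.append("inventory reserved"),
     lambda: events.append("inventory released")),
    ("charge_payment",
     charge_payment,
     lambda: events.append("payment refunded")),
]

ok = run_saga(order_saga)
print(ok)      # False
print(events)  # ['order created', 'inventory reserved', 'inventory released', 'order cancelled']
```

The failed step's own compensation is not run, since its action never completed. In a choreographed saga the same undo logic would instead be triggered by failure events that each service subscribes to.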
Q: What is the problem with Two-Phase Commit (2PC)? A: If the coordinator crashes, the system can stay blocked. Likewise, if one participant hangs during the prepare phase, the other participants wait indefinitely.
Q: What is a Service Mesh? A: An infrastructure layer that handles service-to-service communication: mTLS, retry logic, circuit breaking, and observability. Istio and Linkerd are popular choices.
Q: Why is idempotency important in distributed systems? A: A network retry can cause the same request to execute again. Idempotent operations are safe: send the same request N times and the result is the same. This is critical in payment systems, where a duplicate charge must not happen.
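One common way to make a payment call idempotent is to have the client send an idempotency key and have the server store the response under that key. The sketch below assumes an in-memory dictionary and a hypothetical `PaymentService` class; a real service would need to persist the key and the charge atomically.

```python
class PaymentService:
    def __init__(self):
        self._results = {}     # idempotency key -> stored response
        self.charges = []      # side effects actually performed

    def charge(self, idempotency_key: str, amount: int) -> dict:
        # A retry with a key we have already seen returns the stored response
        # instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges.append(amount)
        response = {"status": "charged", "amount": amount, "charge_no": len(self.charges)}
        self._results[idempotency_key] = response
        return response


service = PaymentService()
first = service.charge("order-123", 500)
retry = service.charge("order-123", 500)   # e.g. the client timed out and retried
print(first == retry)        # True
print(len(service.charges))  # 1
```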
Complete Distributed Systems notes for B.Tech CS Sem 8 — CAP Theorem, Consensus algorithms, MapReduce, Microservices, Fault Tolerance, gRPC, Kafka with interview questions.
46 pages · 2.4 MB · Updated 2026-03-11
CAP theorem: Consistency, Availability, Partition Tolerance. A distributed system can guarantee only two of the three at any one time. CA (RDBMS), CP (MongoDB, Zookeeper), AP (DynamoDB, Cassandra).
Paxos and Raft are both consensus algorithms. Paxos is theoretical and complex. Raft was designed for understandability, offers the same guarantees, and has a clearly defined leader election. Raft is more widely used in production (etcd, CockroachDB).
Microservices can be deployed, scaled, and developed independently. They allow technology heterogeneity and team autonomy, and a failure in one service need not affect the others. The cost is added complexity: network calls and distributed transactions.
Eventual consistency: the system guarantees that if no new updates arrive, all replicas will eventually show the same value. Amazon and DNS use this model. It is faster than strong consistency because reads and writes do not wait for every replica to agree.
Kafka is used to decouple services through asynchronous producer-consumer communication. It offers high throughput, replay capability, and backpressure handling, and serves as the backbone of an event-driven architecture.