How web pages work
What is The Internet
The Internet is a global, decentralized network infrastructure that interconnects tens of thousands of autonomous systems through a hierarchical routing architecture based on the Internet Protocol Suite (TCP/IP). At its core, the Internet operates as a packet-switched network utilizing statistical multiplexing to optimize bandwidth utilization across heterogeneous physical media including fiber optic cables, copper wires, radio waves, and satellite links.

It is mainly used for accessing resources (and services) such as hypertext documents of the World Wide Web, electronic mail, various files, or data streams.
Technical Architecture
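The Internet's architecture is usually described with the TCP/IP model; the OSI (Open Systems Interconnection) model is a seven-layer reference model often used alongside it. The five-layer view below combines the two: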
Physical Layer: Encompasses submarine cables (carrying ~95% of intercontinental data), terrestrial fiber networks, last-mile connections (DSL, cable, fiber-to-the-home), and wireless technologies (4G/5G, Wi-Fi 6/6E/7)
Data Link Layer: Handles frame transmission with protocols like Ethernet (IEEE 802.3), using MAC addressing for local network identification; CSMA/CD collision detection applies only to legacy half-duplex Ethernet, since modern switched full-duplex links have no collisions
Network Layer: Implements IP (Internet Protocol) with IPv4 (32-bit addresses, ~4.3 billion unique addresses) and IPv6 (128-bit addresses, ~340 undecillion addresses), utilizing routing protocols like BGP (Border Gateway Protocol) for inter-AS routing and OSPF/RIP for intra-AS routing
Transport Layer: Provides end-to-end communication with TCP (reliable, connection-oriented, with congestion control algorithms like CUBIC and BBR) and UDP (unreliable, connectionless, lower latency)
Application Layer: Implements protocols like HTTP/HTTPS, FTP, SMTP, DNS, and WebSocket (the newer QUIC protocol, which carries HTTP/3, is a UDP-based transport rather than an application protocol)
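The address counts quoted for the Network Layer follow directly from the address width. A minimal sketch of that arithmetic, where `addressSpace` is a helper name introduced here for illustration:

```javascript
// Address-space size is 2 raised to the address width in bits.
// BigInt is used because 2^128 is far beyond Number.MAX_SAFE_INTEGER.
const addressSpace = (bits) => 2n ** BigInt(bits);

const ipv4Count = addressSpace(32);  // roughly 4.3 billion
const ipv6Count = addressSpace(128); // roughly 3.4 x 10^38 (340 undecillion)

console.log(ipv4Count.toString());
console.log(ipv6Count.toString());
```

Not every address is usable for hosts in practice, since both protocols reserve ranges for private, multicast, and documentation use.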
Given its complexity and global reach, the Internet has no single central authority or point of control. Its routing is decentralized as well, so if one component (a network or a computer) becomes unavailable, traffic is routed around it and the Internet as a whole doesn't collapse.
The Internet's decentralized governance model involves multiple stakeholders:
IETF (Internet Engineering Task Force): Develops and maintains protocol standards through RFCs (Request for Comments)
ICANN (Internet Corporation for Assigned Names and Numbers): Coordinates the DNS root zone and, through the IANA functions, allocates IP address blocks to the regional registries
W3C (World Wide Web Consortium): Standardizes web technologies (HTML, CSS, Web APIs)
Regional Internet Registries (RIRs): ARIN, RIPE NCC, APNIC, LACNIC, AFRINIC manage IP address allocation regionally
IEEE: Develops standards for networking hardware and protocols (802.x standards)
World Wide Web (WWW)
The World Wide Web is a distributed hypermedia information system built atop the Internet infrastructure, characterized by three fundamental technologies:
HTML (HyperText Markup Language): Semantic document structure
URI/URL (Uniform Resource Identifier/Locator): Global addressing scheme
HTTP (HyperText Transfer Protocol): Application-layer communication protocol
Technical Specifications
Document Object Model (DOM): Tree-structured representation of HTML documents, manipulable via JavaScript through standardized APIs
REST (Representational State Transfer): Architectural style emphasizing stateless communication, resource-based interactions, and uniform interfaces
Semantic Web: Extension using RDF (Resource Description Framework), OWL (Web Ontology Language), and SPARQL for machine-readable data
WebAssembly (Wasm): Binary instruction format enabling near-native performance for web applications
URL Structure and Components
URLs follow RFC 3986 specification with the structure:
scheme://[userinfo@]host[:port][/path][?query][#fragment]
Example breakdown:
https://user:pass@example.com:8080/path/to/resource?key=value#section
https → Protocol scheme
user:pass → Authentication credentials (deprecated for security)
example.com → Host (domain name or IP address)
8080 → Port number (optional, defaults: HTTP=80, HTTPS=443)
/path/to/resource → Path to resource
key=value → Query parameters
section → Fragment identifier
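The same breakdown can be reproduced programmatically. The sketch below uses the WHATWG `URL` class, which is available as a global in browsers and in Node.js; the `parts` object and its key names are chosen here only to mirror the component names used above.

```javascript
// Parse the example URL into its components.
const url = new URL(
  "https://user:pass@example.com:8080/path/to/resource?key=value#section"
);

const parts = {
  scheme: url.protocol,   // includes the trailing colon
  username: url.username,
  password: url.password,
  host: url.hostname,
  port: url.port,         // empty string when the scheme's default port is used
  path: url.pathname,
  query: url.search,      // includes the leading "?"
  fragment: url.hash,     // includes the leading "#"
};

console.log(parts);
console.log(url.searchParams.get("key"));
```

Note that the fragment is handled by the client and is not sent to the server as part of the HTTP request.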
HTTP Protocol
HTTP (HyperText Transfer Protocol) is a stateless, application-layer protocol operating typically over TCP (and now QUIC for HTTP/3), following a request-response paradigm with extensive header metadata and multiple request methods.
HTTP Evolution
HTTP/0.9 (1991)
Single-line protocol
GET method only
No headers, status codes, or content types
HTTP/1.0 (1996)
Headers introduced
Status codes
Content-Type support
Additional methods (POST, HEAD)
HTTP/1.1 (1997-1999)
Persistent connections (Keep-Alive)
Pipelining (sending several requests without waiting for each response; responses still arrive in order)
Chunked transfer encoding
Cache control mechanisms
Host header (enabling virtual hosting)
Additional methods (PUT, DELETE, OPTIONS, TRACE, CONNECT)
HTTP/2 (2015)
Binary framing layer
Multiplexing (parallel requests/responses over single connection)
Stream prioritization
Header compression (HPACK algorithm)
Server push
Flow control at stream and connection level
HTTP/2, finalized in 2015, brought significant improvements to website performance and efficiency over its predecessor, HTTP/1.1. The key advancements include multiplexing, which allows multiple requests and responses to be in flight simultaneously over a single connection, and header compression, which reduces overhead. It also introduced server push, which lets servers proactively send resources before the client requests them, although major browsers have since removed support for it. Together, multiplexing and header compression result in faster page load times, reduced latency, and better resource utilization.
HTTP/3 (2022)
Built on QUIC (UDP-based) instead of TCP
0-RTT connection resumption (1-RTT for a first connection)
Improved congestion control
Connection migration (survives IP changes)
Native encryption (TLS 1.3 mandatory)
Eliminates head-of-line blocking at transport layer
HTTP Request Structure
METHOD /path/to/resource HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Cookie: session=abc123; preferences=dark_mode
Cache-Control: max-age=0
[Optional Request Body]
HTTP Response Structure
HTTP/1.1 200 OK
Date: Thu, 23 Oct 2025 12:28:53 GMT
Server: Apache/2.4.41 (Ubuntu)
Content-Type: text/html; charset=UTF-8
Content-Length: 12345
Content-Encoding: gzip
Cache-Control: public, max-age=3600
Set-Cookie: session=xyz789; HttpOnly; Secure; SameSite=Strict
Content-Security-Policy: default-src 'self'
[Response Body]
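To make the response layout concrete, here is a minimal sketch that splits a raw HTTP/1.1 response into its status line and headers. `parseResponseHead` is a hypothetical helper written for this illustration; it ignores repeated headers and does not decode the body, so it is not a substitute for a real HTTP parser.

```javascript
// Split a raw HTTP/1.1 response into status line parts and a header map.
function parseResponseHead(raw) {
  // The head and the body are separated by an empty line (CRLF CRLF).
  const [head] = raw.split("\r\n\r\n");
  const [statusLine, ...headerLines] = head.split("\r\n");
  const [version, code, ...reason] = statusLine.split(" ");

  const headers = {};
  for (const line of headerLines) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip malformed lines
    // Header names are case-insensitive, so normalize them to lower case.
    headers[line.slice(0, idx).trim().toLowerCase()] = line.slice(idx + 1).trim();
  }
  return { version, status: Number(code), reason: reason.join(" "), headers };
}

const sample = [
  "HTTP/1.1 200 OK",
  "Content-Type: text/html; charset=UTF-8",
  "Cache-Control: public, max-age=3600",
  "",
  "<html></html>",
].join("\r\n");

const parsed = parseResponseHead(sample);
console.log(parsed.status, parsed.headers["content-type"]);
```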
HTTP Methods (Verbs) and Semantics
GET: Idempotent, safe, cacheable retrieval
POST: Non-idempotent resource creation/processing
PUT: Idempotent resource replacement
PATCH: Partial resource modification
DELETE: Idempotent resource removal
HEAD: GET without response body
OPTIONS: Discover allowed methods (CORS preflight)
CONNECT: Establish tunnel (proxies)
TRACE: Loop-back diagnostic
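The safe and idempotent properties matter in practice because they determine when a client or proxy may repeat a request after a network failure. A small sketch, with the method sets taken from the HTTP semantics specification (RFC 9110) and `canRetryAutomatically` as a hypothetical helper name:

```javascript
// Safe methods do not change server state; idempotent methods can be
// repeated with the same effect as a single request.
const SAFE = new Set(["GET", "HEAD", "OPTIONS", "TRACE"]);
const IDEMPOTENT = new Set([...SAFE, "PUT", "DELETE"]);

// Only idempotent requests can be retried without risking duplicate effects.
function canRetryAutomatically(method) {
  return IDEMPOTENT.has(method.toUpperCase());
}

console.log(canRetryAutomatically("PUT"));
console.log(canRetryAutomatically("POST"));
```

POST and PATCH are absent from both sets, which is why APIs that need safe retries for them often add an application-level idempotency key.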
Status Code Categories
1xx (Informational): 100 Continue, 101 Switching Protocols, 103 Early Hints
2xx (Success): 200 OK, 201 Created, 204 No Content, 206 Partial Content
3xx (Redirection): 301 Moved Permanently, 302 Found, 304 Not Modified, 307 Temporary Redirect
4xx (Client Error): 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 429 Too Many Requests
5xx (Server Error): 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout
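Because the category is encoded in the first digit, client code can branch on the class without listing every code. A minimal sketch, where `statusCategory` is a helper name introduced here:

```javascript
// Map a status code to its category using the leading digit.
function statusCategory(code) {
  if (!Number.isInteger(code) || code < 100 || code > 599) {
    throw new RangeError(`not a valid HTTP status code: ${code}`);
  }
  const categories = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
  };
  return categories[Math.floor(code / 100)];
}

console.log(statusCategory(204));
console.log(statusCategory(429));
```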
Security Headers
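Modern HTTP responses should include security headers. X-XSS-Protection is deprecated: the browser filter it controlled has been removed from current browsers and could itself introduce vulnerabilities, so it is set to 0 here. 'unsafe-inline' is left out of script-src because it would defeat most of the policy's protection against XSS:
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 0
Content-Security-Policy: default-src 'self'; script-src 'self'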
Referrer-Policy: strict-origin-when-cross-origin
Permissions-Policy: geolocation=(), microphone=(), camera=()
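One way to apply such headers consistently is to set them in a single place for every response. The sketch below assumes a response object with a `setHeader(name, value)` method, as provided by `http.ServerResponse` in Node's built-in `http` module; `applySecurityHeaders` and `fakeRes` are names made up for the example. It leaves out X-XSS-Protection and uses the `default-src 'self'` policy from the response example earlier.

```javascript
const SECURITY_HEADERS = {
  "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
  "X-Content-Type-Options": "nosniff",
  "X-Frame-Options": "DENY",
  "Content-Security-Policy": "default-src 'self'",
  "Referrer-Policy": "strict-origin-when-cross-origin",
  "Permissions-Policy": "geolocation=(), microphone=(), camera=()",
};

// `res` is anything exposing setHeader(name, value).
function applySecurityHeaders(res) {
  for (const [name, value] of Object.entries(SECURITY_HEADERS)) {
    res.setHeader(name, value);
  }
  return res;
}

// Stand-in response object so the sketch runs without opening a socket.
const fakeRes = {
  headers: {},
  setHeader(name, value) {
    this.headers[name] = value;
  },
};

applySecurityHeaders(fakeRes);
console.log(fakeRes.headers);
```

A real policy usually needs tuning per site, since a strict Content-Security-Policy can block legitimate third-party scripts and styles.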
Web servers, Web browsers, and DNS Servers
Web servers generate HTTP responses while Web browsers initiate HTTP requests and interpret HTTP responses.
Web Servers Architecture
Modern web servers employ sophisticated architectures for handling concurrent requests:
Process-based Models (Apache prefork MPM)
Each request handled by separate process
High memory overhead but strong isolation
Suitable for mod_php and other embedded interpreters
Thread-based Models (Apache worker MPM)
Threads share process memory space
Lower memory footprint than processes
Potential thread-safety issues with certain modules
Event-driven Models (Nginx, Node.js)
Single or few threads with event loop
Non-blocking I/O operations
Excellent for static content and reverse proxying
C10K problem solution (handling 10,000+ concurrent connections)
Hybrid Models (Apache event MPM)
Combines threading with event-driven architecture
Worker threads process active requests
Dedicated listener thread manages idle keep-alive connections
Popular Web Server Software
Nginx (~33% market share)
Asynchronous event-driven architecture
Excellent reverse proxy and load balancer
Low memory footprint
Apache HTTP Server (~31% market share)
Modular architecture (.htaccess support)
Extensive module ecosystem
Multiple processing models (MPMs)
Microsoft IIS (~7% market share)
Tight Windows integration
Native .NET support
Kernel-mode caching
LiteSpeed (~12% market share)
Drop-in Apache replacement
Built-in cache acceleration
HTTP/3 support
DNS stands for Domain Name System. It is a standardized way to name computers and other devices on the Internet or on private networks, so that they can be identified by a domain name rather than by a numeric IP address.
In practice, resolving a domain name is a multi-step process, outlined in the DNS Query Process later in this section.

The Domain Name System (DNS) is like a global phonebook for the internet, translating human-readable domain names (like google.com) into machine-readable IP addresses. Its maintenance is a shared responsibility. At the top level, the Internet Corporation for Assigned Names and Numbers (ICANN) coordinates the root zone (the root servers themselves are run by independent operators) and delegates authority for top-level domains (like .com, .org) to registries. These registries then work with registrars (companies like GoDaddy or Namecheap) who sell domain names to individuals and organizations. Finally, website owners manage the DNS records for their specific domains, specifying how their domain name maps to their web server's IP address. This distributed system provides redundancy and resilience.
DNS operates as a distributed, hierarchical database with multiple components:
DNS Hierarchy:
Root Level (.)
├── Top-Level Domains (TLDs)
│   ├── Generic TLDs (.com, .org, .net, .info)
│   ├── Country-code TLDs (.uk, .de, .jp, .io)
│   └── Sponsored TLDs (.edu, .gov, .mil)
└── Second-Level Domains (example.com)
    └── Subdomains (www.example.com, api.example.com)
DNS Query Process:
Browser Cache Check: TTL-based caching
OS Cache Check: System resolver cache
Router Cache Check: Local network cache
ISP Recursive Resolver:
Performs iterative queries on behalf of client
Implements DNSSEC validation
Caches responses based on TTL
Root Name Server Query (13 logical servers, hundreds of physical instances via anycast):
Returns TLD name server information
TLD Name Server Query:
Returns authoritative name server for domain
Authoritative Name Server Query:
Returns IP address for requested domain
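The referral chain in the steps above can be sketched with a toy, in-memory model. Everything in it is made up for illustration: the server names, the `SERVERS` table, and the address (taken from the 192.0.2.0/24 documentation range). The cache also ignores TTLs, which a real resolver would honor.

```javascript
// Each toy "server" either refers the resolver to a more specific server
// or answers authoritatively.
const SERVERS = {
  root: { referrals: { com: "tld-com" } },
  "tld-com": { referrals: { "example.com": "ns-example" } },
  "ns-example": { answers: { "www.example.com": "192.0.2.10" } },
};

const cache = new Map();

function resolve(name) {
  // Steps 1-3: a cache hit short-circuits the whole lookup.
  if (cache.has(name)) return { address: cache.get(name), path: ["cache"] };

  const path = [];
  let server = "root";
  // Follow referrals from the root down until a server answers.
  while (server) {
    path.push(server);
    const { answers = {}, referrals = {} } = SERVERS[server];
    if (Object.hasOwn(answers, name)) {
      cache.set(name, answers[name]);
      return { address: answers[name], path };
    }
    // Pick the referral whose zone is a suffix of the queried name.
    const zone = Object.keys(referrals).find(
      (z) => name === z || name.endsWith("." + z)
    );
    server = zone ? referrals[zone] : null;
  }
  return { address: null, path }; // comparable to a "no such domain" answer
}

const first = resolve("www.example.com");
const second = resolve("www.example.com");
console.log(first.path.join(" -> "), first.address);
console.log(second.path.join(" -> "), second.address);
```

The second call never leaves the cache, which is the effect that TTL-based caching has at the browser, OS, and resolver levels.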
WebSocket Protocol
WebSocket (RFC 6455) provides full-duplex, bidirectional communication over a single TCP connection:
Handshake Process:
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
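The `Sec-WebSocket-Accept` value in the response is not arbitrary: RFC 6455 defines it as the Base64-encoded SHA-1 hash of the client's key concatenated with a fixed GUID. A minimal sketch using Node's built-in `crypto` module, where `computeAccept` is a helper name chosen here:

```javascript
const { createHash } = require("node:crypto");

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Derive the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key.
function computeAccept(secWebSocketKey) {
  return createHash("sha1")
    .update(secWebSocketKey + WS_GUID)
    .digest("base64");
}

// Using the key from the handshake above reproduces the accept value shown there.
console.log(computeAccept("dGhlIHNhbXBsZSBub25jZQ=="));
```

The exchange proves that the server understood the WebSocket upgrade request; it is not an authentication or encryption mechanism.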
Frame Structure:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+
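Reading the fields out of a frame is mostly bit masking. The sketch below decodes a single short frame (payload under 126 bytes) and unmasks it; `decodeSmallFrame` is a hypothetical helper that deliberately skips extended lengths, fragmentation, and control-frame rules. The sample bytes are the masked "Hello" text frame given as an example in RFC 6455.

```javascript
// Decode one WebSocket frame whose payload length fits in the 7-bit field.
function decodeSmallFrame(bytes) {
  const fin = (bytes[0] & 0x80) !== 0;    // FIN: final fragment flag
  const opcode = bytes[0] & 0x0f;         // 0x1 = text, 0x2 = binary
  const masked = (bytes[1] & 0x80) !== 0; // MASK: set on client-to-server frames
  const length = bytes[1] & 0x7f;
  if (length > 125) throw new Error("extended payload lengths not handled");

  let offset = 2;
  let maskKey = null;
  if (masked) {
    maskKey = bytes.subarray(offset, offset + 4);
    offset += 4;
  }

  // Copy the payload so unmasking does not modify the input buffer.
  const payload = Buffer.from(bytes.subarray(offset, offset + length));
  if (maskKey) {
    for (let i = 0; i < payload.length; i++) payload[i] ^= maskKey[i % 4];
  }
  return { fin, opcode, masked, payload };
}

const frame = Buffer.from([
  0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58,
]);
const decoded = decodeSmallFrame(frame);
console.log(decoded.opcode, decoded.payload.toString("utf8"));
```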
Use Cases:
Real-time chat applications
Live sports updates
Financial trading platforms
Collaborative editing (Google Docs-style)
Online gaming
IoT device communication
Live streaming chat
WebSockets - a different protocol used in web technologies
Persistent connection: WebSockets establish a persistent, two-way connection between the client and server. Once the connection is established, both parties can send data at any time.
Stateful: WebSockets maintain state, meaning the server can keep track of the client's connection and any data that has been exchanged.
Full-duplex: Communication is full-duplex, allowing both the client and server to send data simultaneously.
Ideal for: Real-time applications like chat, online games, live updates, and collaborative tools where instant communication is crucial.
Web Pages
Web pages are documents formatted with HTML that can optionally be accompanied by styling (CSS files), behavior (JavaScript files), and multimedia (images, audio, video files).

Modern web pages are composite documents assembled from multiple resources:
Critical Rendering Path
HTML Parsing: Incremental parsing with speculative parsing for resource hints
DOM Construction: Tree structure with ~O(n) complexity
CSSOM Construction: Parallel to DOM, blocks rendering
Render Tree: DOM + CSSOM merger, excludes non-visual elements
Layout/Reflow: Geometric calculations, expensive operation
Paint: Fill pixels, create layers
Composite: GPU acceleration, layer combination
Resource Loading Optimization
Resource Hints:
<link rel="dns-prefetch" href="//example.com">
<link rel="preconnect" href="//example.com">
<link rel="prefetch" href="/next-page.html">
<link rel="preload" href="/font.woff2" as="font" crossorigin>
<link rel="prerender" href="/likely-next-page.html"> <!-- deprecated; Chromium-based browsers now use the Speculation Rules API for prerendering -->
Critical CSS Inlining:
<style>
/* Critical above-the-fold styles */
body { margin: 0; font-family: sans-serif; }
.hero { height: 100vh; background: linear-gradient(...); }
</style>
<link rel="preload" href="/styles.css" as="style" onload="this.onload=null;this.rel='stylesheet'">
JavaScript Loading Strategies:
<!-- Parser blocking -->
<script src="script.js"></script>
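<!-- Deferred: downloads in parallel, runs in document order after parsing finishes, before DOMContentLoaded -->
<script defer src="script.js"></script>
<!-- Async: downloads in parallel, runs as soon as it is available, order not guaranteed -->
<script async src="script.js"></script>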
<!-- Module scripts (deferred by default) -->
<script type="module" src="module.js"></script>
FrontEnd vs. BackEnd
FrontEnd refers to the code delivered to the browser in HTTP responses (HTML, CSS, and JavaScript), which the browser interprets and runs.
BackEnd is the code executed on the web server, which the browser cannot access. It typically generates HTTP responses, interacts with databases or other information systems, and performs other application logic.
Frontend Architecture
Component-Based Architectures:
React: Virtual DOM, unidirectional data flow, hooks-based state management
Vue: Reactive data binding, template-based, progressive framework
Angular: Full-featured component-based framework, dependency injection, TypeScript-first
Svelte: Compile-time optimization, no virtual DOM, reactive assignments
State Management Patterns:
Flux/Redux: Unidirectional data flow, single source of truth
MobX: Observable state, reactive programming
Zustand/Jotai: Lightweight, hook-based state management
XState: Finite state machines for complex state logic
Frontend Build Pipeline:
Source Code → Transpilation (Babel/TypeScript) →
Bundling (Webpack/Vite/Rollup) with Tree Shaking and Code Splitting →
Minification → Asset Optimization →
Production Build
Modern Frontend Technologies:
WebAssembly: Near-native performance for compute-intensive tasks
Progressive Web Apps (PWA): Offline capability, installable web apps
Server-Side Rendering (SSR): Initial HTML from server
Static Site Generation (SSG): Pre-rendered pages at build time
Incremental Static Regeneration (ISR): Hybrid static/dynamic approach
Edge Computing: Computation at CDN edge locations
Backend Architecture
Architectural Patterns:
Monolithic Architecture:
Single deployable unit
Shared memory space
Simple deployment but scaling challenges
Examples: Traditional Rails, Django applications
Microservices Architecture:
Independently deployable services
Service-specific databases
Inter-service communication (REST, gRPC, message queues)
Container orchestration (Kubernetes, Docker Swarm)
Service mesh (Istio, Linkerd) for communication
Serverless Architecture:
Function-as-a-Service (FaaS): AWS Lambda, Vercel Functions
Backend-as-a-Service (BaaS): Firebase, Supabase
Event-driven execution
Pay-per-use pricing model
Cold start considerations
Backend Technologies Stack:
Runtime Environments:
Node.js: V8-based JavaScript runtime, event-driven, non-blocking I/O
Python: Django (batteries-included), Flask (microframework), FastAPI (async)
Java: Spring Boot (enterprise), Micronaut (microservices)
Ruby: Rails (convention over configuration)
Go: High performance, built-in concurrency
Rust: Memory safety without garbage collection
Database Systems:
Relational (SQL):
PostgreSQL: Advanced features, JSON support, extensions
MySQL/MariaDB: High performance, replication
SQLite: Embedded, serverless
NoSQL:
Document Stores: MongoDB, CouchDB (JSON documents)
Key-Value: Redis (in-memory), DynamoDB
Column-Family: Cassandra, HBase (wide column)
Graph: Neo4j, ArangoDB (relationship-focused)
API Architectures:
REST: Resource-based, stateless, HTTP verbs
GraphQL: Query language, single endpoint, type system
gRPC: Binary protocol, Protocol Buffers, bidirectional streaming
WebSocket: Real-time, bidirectional communication
Server-Sent Events: Unidirectional server-to-client streaming
Caching Strategies:
Browser Cache: HTTP headers (Cache-Control, ETag)
CDN Cache: Geographic distribution, edge caching
Application Cache: In-memory (Redis, Memcached)
Database Cache: Query result caching
Full-Page Cache: Varnish, Nginx FastCGI cache
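The browser-cache entry relies on validators such as ETag: the client sends the tag back in `If-None-Match`, and the server answers 304 Not Modified when the content is unchanged. A simplified sketch follows; `makeETag` and `respond` are hypothetical helpers, the handler returns a plain object instead of writing to a socket, and it only compares a single exact tag (real servers also handle tag lists, `*`, and weak validators).

```javascript
const { createHash } = require("node:crypto");

// Derive a validator from the response body; any stable fingerprint works.
function makeETag(body) {
  const digest = createHash("sha256").update(body).digest("hex");
  return '"' + digest.slice(0, 16) + '"';
}

// Decide between a full response and 304 Not Modified.
function respond(requestHeaders, body) {
  const etag = makeETag(body);
  const headers = { ETag: etag, "Cache-Control": "public, max-age=3600" };
  if (requestHeaders["if-none-match"] === etag) {
    return { status: 304, headers, body: "" }; // client copy is still valid
  }
  return { status: 200, headers, body };
}

const firstVisit = respond({}, "<h1>Hello</h1>");
const revisit = respond(
  { "if-none-match": firstVisit.headers.ETag },
  "<h1>Hello</h1>"
);
console.log(firstVisit.status, revisit.status);
```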
Message Queue Systems:
RabbitMQ: AMQP protocol, reliable delivery
Apache Kafka: Distributed streaming, high throughput
Redis Pub/Sub: Simple publish-subscribe
Amazon SQS/SNS: Managed queue/notification services
DevOps and Deployment
CI/CD Pipeline:
Code Commit → Automated Tests → Build →
Security Scanning → Staging Deployment →
Integration Tests → Production Deployment →
Monitoring & Rollback
Container Technologies:
Docker: Containerization, Dockerfile, image layers
Kubernetes: Orchestration, pods, services, ingress
Helm: Kubernetes package management
Service Mesh: Istio, Linkerd for microservice communication
Infrastructure as Code:
Terraform: Cloud-agnostic infrastructure provisioning
CloudFormation: AWS-specific IaC
Ansible: Configuration management
Pulumi: Programming language-based IaC
Monitoring and Observability:
Metrics: Prometheus, Grafana, DataDog
Logging: ELK Stack (Elasticsearch, Logstash, Kibana), Splunk
Tracing: Jaeger, Zipkin, AWS X-Ray
Error Tracking: Sentry, Rollbar
Security Considerations
Web Security Threats
Common Vulnerabilities (OWASP Top 10, 2017 edition):
Injection: SQL, NoSQL, LDAP injection
Broken Authentication: Session management flaws
Sensitive Data Exposure: Inadequate encryption
XML External Entities (XXE): XML processor attacks
Broken Access Control: Privilege escalation
Security Misconfiguration: Default settings, verbose errors
Cross-Site Scripting (XSS): Reflected, Stored, DOM-based
Insecure Deserialization: Remote code execution
Using Components with Known Vulnerabilities: Outdated dependencies
Insufficient Logging & Monitoring: Delayed breach detection
Security Best Practices
Input Validation:
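// Server-side validation example
const sanitizeInput = (input) => {
  if (typeof input !== 'string') {
    throw new TypeError('input must be a string');
  }
  return input
    .replace(/[<>]/g, '') // Remove angle brackets (not a substitute for output encoding)
    .trim()
    .substring(0, 255); // Length limit
};

// Parameterized queries (SQL injection prevention)
// `db` is assumed to be a connection from a driver that supports `?` placeholders
// (for example mysql2); `userId` comes from the already-validated request.
const query = 'SELECT * FROM users WHERE id = ?';
db.execute(query, [userId]);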
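Stripping characters on input is fragile on its own, so XSS defenses usually also encode data at the point where it is written into HTML. A minimal sketch of output encoding, with `escapeHtml` as a helper name introduced here; it covers HTML text and quoted attribute values only, not JavaScript, CSS, or URL contexts.

```javascript
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace characters that are significant in HTML with entity references.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml(`<img src=x onerror="alert('xss')">`));
```

Template engines and frameworks such as React typically apply this kind of escaping by default, which is preferable to calling a hand-written helper.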
Authentication & Authorization:
Multi-factor Authentication (MFA)
OAuth 2.0 / OpenID Connect
JWT with proper validation
Session management with secure cookies
Rate limiting and account lockout
Encryption:
TLS 1.3: Modern encryption for data in transit
AES-256: Symmetric encryption for data at rest
Bcrypt/Argon2: Password hashing
Certificate Pinning: Mobile app security
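Password hashing can be sketched without third-party packages. Note the substitution: the example uses scrypt, another memory-hard password hashing function, because it ships with Node's `crypto` module, whereas bcrypt and Argon2 require external libraries. The function names and the `salt:hash` storage format are assumptions made for this example.

```javascript
const { randomBytes, scryptSync, timingSafeEqual } = require("node:crypto");

// Hash a password with a fresh random salt; store salt and hash together.
function hashPassword(password) {
  const salt = randomBytes(16);
  const key = scryptSync(password, salt, 64);
  return salt.toString("hex") + ":" + key.toString("hex");
}

// Recompute the hash with the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const [saltHex, keyHex] = stored.split(":");
  const expected = Buffer.from(keyHex, "hex");
  const actual = scryptSync(password, Buffer.from(saltHex, "hex"), expected.length);
  return timingSafeEqual(actual, expected);
}

const record = hashPassword("correct horse battery staple");
console.log(verifyPassword("correct horse battery staple", record));
console.log(verifyPassword("wrong password", record));
```

The per-password salt means two users with the same password get different stored values, and the constant-time comparison avoids leaking how many bytes matched.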
The modern web is a complex ecosystem of protocols, technologies, and architectural patterns. Understanding these fundamentals—from low-level networking protocols to high-level application architectures—is essential for building robust, scalable, and secure web applications. As the web continues to evolve with emerging technologies like WebAssembly, edge computing, and HTTP/3, maintaining a solid grasp of these core concepts provides the foundation for adapting to future innovations in web development.