How web pages work

What is the Internet

The Internet is a global, decentralized network infrastructure that interconnects tens of thousands of autonomous systems (independently operated networks) through a routing architecture based on the Internet Protocol Suite (TCP/IP). At its core, the Internet operates as a packet-switched network that uses statistical multiplexing to share bandwidth across heterogeneous physical media, including fiber optic cables, copper wires, radio waves, and satellite links.

Full Internet map of 29 June 1999. Antoniou, Pavlos & Pitsillides, Andreas. (2007)

It is mainly used for accessing resources (and services) such as hypertext documents of the World Wide Web, electronic mail, various files, or data streams.

Technical Architecture

The Internet's architecture is usually described with layered models: the seven-layer OSI (Open Systems Interconnection) reference model and the simpler TCP/IP stack. The five-layer view below combines the two:

  1. Physical Layer: Encompasses submarine cables (carrying ~95% of intercontinental data), terrestrial fiber networks, last-mile connections (DSL, cable, fiber-to-the-home), and wireless technologies (4G/5G, Wi-Fi 6/6E/7)

  2. Data Link Layer: Handles frame transmission with protocols like Ethernet (IEEE 802.3), using MAC addressing for local network identification; legacy half-duplex Ethernet relied on CSMA/CD for collision detection, which modern switched, full-duplex links no longer need

  3. Network Layer: Implements IP (Internet Protocol) with IPv4 (32-bit addresses, ~4.3 billion unique addresses) and IPv6 (128-bit addresses, ~340 undecillion addresses), utilizing routing protocols like BGP (Border Gateway Protocol) for inter-AS routing and OSPF/RIP for intra-AS routing

  4. Transport Layer: Provides end-to-end communication with TCP (reliable, connection-oriented, with congestion control algorithms like CUBIC and BBR) and UDP (unreliable, connectionless, lower latency)

  5. Application Layer: Implements protocols like HTTP/HTTPS, FTP, SMTP, DNS, and WebSocket; newer protocols such as HTTP/3 run over QUIC, which is itself a UDP-based transport protocol
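
The address-space figures quoted for the network layer above follow directly from the address widths. The sketch below works them out in JavaScript; the helper name `ipv4ToInt` is illustrative and not part of any library.

```javascript
// Address-space sizes follow from the bit widths (BigInt avoids precision loss).
const ipv4AddressCount = 2n ** 32n;  // 4,294,967,296 (~4.3 billion)
const ipv6AddressCount = 2n ** 128n; // ~3.4 x 10^38 (~340 undecillion)

// Hypothetical helper: pack a dotted-quad IPv4 address into its 32-bit integer form.
function ipv4ToInt(address) {
  const parts = address.split('.');
  const valid = parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) {
    throw new RangeError(`Not a valid IPv4 address: ${address}`);
  }
  const [a, b, c, d] = parts.map(Number);
  // ">>> 0" converts the signed 32-bit result of the bit operations to unsigned.
  return ((a << 24) | (b << 16) | (c << 8) | d) >>> 0;
}

console.log(ipv4AddressCount.toString()); // 4294967296
console.log(ipv4ToInt('192.0.2.1'));      // 3221225985
```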

Given its complexity and global reach, the Internet has no single centralized governance or point of control. Its decentralized design also means that if a certain component (network or computer) becomes unavailable, the Internet as a whole doesn't collapse.

The Internet's decentralized governance model involves multiple stakeholders:

  • IETF (Internet Engineering Task Force): Develops and maintains protocol standards through RFCs (Requests for Comments)

  • ICANN (Internet Corporation for Assigned Names and Numbers): Coordinates the DNS root zone and global IP address allocation through the IANA functions

  • W3C (World Wide Web Consortium): Standardizes web technologies (HTML, CSS, Web APIs)

  • Regional Internet Registries (RIRs): ARIN, RIPE NCC, APNIC, LACNIC, AFRINIC manage IP address allocation regionally

  • IEEE: Develops standards for networking hardware and protocols (802.x standards)

World Wide Web (WWW)

The World Wide Web is a distributed hypermedia information system built atop the Internet infrastructure, characterized by three fundamental technologies:

  1. HTML (HyperText Markup Language): Semantic document structure

  2. URI/URL (Uniform Resource Identifier/Locator): Global addressing scheme

  3. HTTP (HyperText Transfer Protocol): Application-layer communication protocol

Technical Specifications

  • Document Object Model (DOM): Tree-structured representation of HTML documents, manipulable via JavaScript through standardized APIs

  • REST (Representational State Transfer): Architectural style emphasizing stateless communication, resource-based interactions, and uniform interfaces

  • Semantic Web: Extension using RDF (Resource Description Framework), OWL (Web Ontology Language), and SPARQL for machine-readable data

  • WebAssembly (Wasm): Binary instruction format enabling near-native performance for web applications

URL Structure and Components

URLs follow the generic URI syntax defined in RFC 3986:

scheme://[userinfo@]host[:port][/path][?query][#fragment]

Example breakdown:

https://user:pass@example.com:8080/path/to/resource?key=value#section
│      │         │            │    │               │          └─ Fragment identifier
│      │         │            │    │               └─ Query parameters
│      │         │            │    └─ Path to resource
│      │         │            └─ Port number (optional, defaults: HTTP=80, HTTPS=443)
│      │         └─ Host (domain name or IP address)
│      └─ Authentication credentials (deprecated for security)
└─ Protocol scheme
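
The same breakdown can be reproduced programmatically. The sketch below uses the WHATWG `URL` class that ships as a global in Node.js and browsers; it implements the URL Standard, which differs from RFC 3986 in some edge cases, and the property-to-component mapping shown in the `parts` object is only a convenience for this example.

```javascript
// The WHATWG URL class is a global in Node.js and in browsers.
const url = new URL('https://user:pass@example.com:8080/path/to/resource?key=value#section');

const parts = {
  scheme: url.protocol,   // 'https:' (includes the trailing colon)
  username: url.username, // 'user'
  password: url.password, // 'pass'
  host: url.hostname,     // 'example.com'
  port: url.port,         // '8080' (empty string when the scheme's default port is used)
  path: url.pathname,     // '/path/to/resource'
  query: url.search,      // '?key=value'
  fragment: url.hash,     // '#section'
};

console.log(parts);
console.log(url.searchParams.get('key')); // value
```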

HTTP Protocol

HTTP (HyperText Transfer Protocol) is a stateless, application-layer protocol operating typically over TCP (and now QUIC for HTTP/3), following a request-response paradigm with extensive header metadata and multiple request methods.

HTTP Evolution

HTTP/0.9 (1991)

  • Single-line protocol

  • GET method only

  • No headers, status codes, or content types

HTTP/1.0 (1996)

  • Headers introduced

  • Status codes

  • Content-Type support

  • Additional methods (POST, HEAD)

HTTP/1.1 (1997-1999)

  • Persistent connections (Keep-Alive)

  • Pipelining (sending several requests without waiting for each response; responses still arrive in request order)

  • Chunked transfer encoding

  • Cache control mechanisms

  • Host header (enabling virtual hosting)

  • Additional methods (PUT, DELETE, OPTIONS, TRACE, CONNECT)

HTTP/2 (2015)

  • Binary framing layer

  • Multiplexing (parallel requests/responses over single connection)

  • Stream prioritization

  • Header compression (HPACK algorithm)

  • Server push (rarely used in practice and since disabled by default in Chrome)

  • Flow control at stream and connection level

HTTP/3 (2022)

  • Built on QUIC (UDP-based) instead of TCP

  • 0-RTT connection resumption (1-RTT for first-time connections)

  • Improved congestion control

  • Connection migration (survives IP changes)

  • Native encryption (TLS 1.3 mandatory)

  • Eliminates head-of-line blocking at transport layer

HTTP Request Structure

METHOD /path/to/resource HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Cookie: session=abc123; preferences=dark_mode
Cache-Control: max-age=0

[Optional Request Body]

HTTP Response Structure

HTTP/1.1 200 OK
Date: Thu, 23 Oct 2025 12:28:53 GMT
Server: Apache/2.4.41 (Ubuntu)
Content-Type: text/html; charset=UTF-8
Content-Length: 12345
Content-Encoding: gzip
Cache-Control: public, max-age=3600
Set-Cookie: session=xyz789; HttpOnly; Secure; SameSite=Strict
Content-Security-Policy: default-src 'self'

[Response Body]
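
To make the layout above concrete, here is a minimal sketch of splitting a raw HTTP/1.x response into status line, headers, and body. The function name `parseHttpResponse` is hypothetical, and the parser deliberately ignores chunked encoding, compression, and repeated headers such as Set-Cookie; real clients rely on a full HTTP library instead.

```javascript
// Hypothetical helper: parse a raw HTTP/1.x response held in a string.
function parseHttpResponse(raw) {
  // An empty line (CRLF CRLF) separates the head from the body.
  const separator = raw.indexOf('\r\n\r\n');
  const head = separator === -1 ? raw : raw.slice(0, separator);
  const body = separator === -1 ? '' : raw.slice(separator + 4);

  const [statusLine, ...headerLines] = head.split('\r\n');
  const match = /^HTTP\/(\d\.\d) (\d{3})(?: (.*))?$/.exec(statusLine);
  if (!match) {
    throw new Error(`Malformed status line: ${statusLine}`);
  }

  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(':');
    if (colon === -1) continue; // skip lines that are not "Name: value"
    // Header names are case-insensitive, so normalize them to lowercase.
    // A repeated header overwrites the earlier value in this simplified version.
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }

  return {
    version: match[1],
    status: Number(match[2]),
    reason: match[3] || '',
    headers,
    body,
  };
}

const example = parseHttpResponse(
  'HTTP/1.1 404 Not Found\r\nContent-Type: text/plain\r\n\r\nmissing'
);
console.log(example.status, example.reason); // 404 Not Found
```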

HTTP Methods (Verbs) and Semantics

  • GET: Idempotent, safe, cacheable retrieval

  • POST: Non-idempotent resource creation/processing

  • PUT: Idempotent resource replacement

  • PATCH: Partial resource modification

  • DELETE: Idempotent resource removal

  • HEAD: GET without response body

  • OPTIONS: Discover allowed methods (CORS preflight)

  • CONNECT: Establish tunnel (proxies)

  • TRACE: Loop-back diagnostic
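
The safe and idempotent properties listed above matter in practice, for example when a client decides whether a failed request may be repeated. The table below encodes the classification from RFC 9110; the retry helper `canRetryAutomatically` is a hypothetical policy for illustration, not a standard API.

```javascript
// Safe and idempotent properties of HTTP methods (RFC 9110; PATCH per RFC 5789).
const METHOD_PROPERTIES = {
  GET:     { safe: true,  idempotent: true },
  HEAD:    { safe: true,  idempotent: true },
  OPTIONS: { safe: true,  idempotent: true },
  TRACE:   { safe: true,  idempotent: true },
  PUT:     { safe: false, idempotent: true },
  DELETE:  { safe: false, idempotent: true },
  POST:    { safe: false, idempotent: false },
  PATCH:   { safe: false, idempotent: false },
  CONNECT: { safe: false, idempotent: false },
};

// Hypothetical retry policy: repeat a request automatically only when
// sending it twice has the same effect as sending it once.
function canRetryAutomatically(method) {
  const properties = METHOD_PROPERTIES[method.toUpperCase()];
  return Boolean(properties && properties.idempotent);
}

console.log(canRetryAutomatically('PUT'));  // true
console.log(canRetryAutomatically('POST')); // false
```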

Status Code Categories

  • 1xx (Informational): 100 Continue, 101 Switching Protocols, 103 Early Hints

  • 2xx (Success): 200 OK, 201 Created, 204 No Content, 206 Partial Content

  • 3xx (Redirection): 301 Moved Permanently, 302 Found, 304 Not Modified, 307 Temporary Redirect

  • 4xx (Client Error): 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 429 Too Many Requests

  • 5xx (Server Error): 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout
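
Because the category is carried by the first digit, client code can branch on the class without knowing every individual code. A minimal sketch, with `statusClass` as an illustrative helper name:

```javascript
// Category names as used in the list above, keyed by the first digit of the code.
const STATUS_CLASSES = {
  1: 'Informational',
  2: 'Success',
  3: 'Redirection',
  4: 'Client Error',
  5: 'Server Error',
};

// Hypothetical helper: map a status code to its category.
function statusClass(code) {
  if (!Number.isInteger(code) || code < 100 || code > 599) {
    throw new RangeError(`Not a valid HTTP status code: ${code}`);
  }
  return STATUS_CLASSES[Math.floor(code / 100)];
}

console.log(statusClass(204)); // Success
console.log(statusClass(429)); // Client Error
```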

Security Headers

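Modern HTTP responses should include security headers such as the following. Note that X-XSS-Protection is deprecated and ignored by current browsers, and that allowing 'unsafe-inline' in script-src considerably weakens the Content-Security-Policy: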

Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
Content-Security-Policy: default-src 'self'; script-src 'self' 'unsafe-inline'
Referrer-Policy: strict-origin-when-cross-origin
Permissions-Policy: geolocation=(), microphone=(), camera=()

Example structure of an HTTP request
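
One way to apply the security headers listed above is a small helper that sets them on every response. The sketch below assumes a Node.js `http.ServerResponse`-style object with a `setHeader` method; the helper name `applySecurityHeaders` is hypothetical. It leaves out the legacy X-XSS-Protection header and uses the stricter `default-src 'self'` policy without `'unsafe-inline'`, so adjust the values to the application's needs.

```javascript
// Header values taken from the list above, minus the deprecated X-XSS-Protection.
const SECURITY_HEADERS = {
  'Strict-Transport-Security': 'max-age=31536000; includeSubDomains',
  'X-Content-Type-Options': 'nosniff',
  'X-Frame-Options': 'DENY',
  'Content-Security-Policy': "default-src 'self'",
  'Referrer-Policy': 'strict-origin-when-cross-origin',
  'Permissions-Policy': 'geolocation=(), microphone=(), camera=()',
};

// Hypothetical helper: set every security header on a response object
// that exposes setHeader(name, value), as http.ServerResponse does.
function applySecurityHeaders(res) {
  for (const [name, value] of Object.entries(SECURITY_HEADERS)) {
    res.setHeader(name, value);
  }
  return res;
}

// Possible usage inside a Node.js server:
//   http.createServer((req, res) => {
//     applySecurityHeaders(res);
//     res.end('ok');
//   });
```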

Web servers, Web browsers, and DNS Servers

A simplified interaction between a Web browser and a Web server

Web Server Architecture

Modern web servers employ sophisticated architectures for handling concurrent requests:

Process-based Models (Apache prefork MPM)

  • Each request handled by separate process

  • High memory overhead but strong isolation

  • Suitable for mod_php and other embedded interpreters

Thread-based Models (Apache worker MPM)

  • Threads share process memory space

  • Lower memory footprint than processes

  • Potential thread-safety issues with certain modules

Event-driven Models (Nginx, Node.js)

  • Single or few threads with event loop

  • Non-blocking I/O operations

  • Excellent for static content and reverse proxying

  • C10K problem solution (handling 10,000+ concurrent connections)

Hybrid Models (Apache event MPM)

  • Combines threading with event-driven architecture

  • Dedicated listener thread manages listening sockets and idle keep-alive connections

  • Worker threads are occupied only while a request is actively being processed

Widely used web servers (market-share figures are approximate and vary by survey and date):

  1. Nginx (~33% market share)

    • Asynchronous event-driven architecture

    • Excellent reverse proxy and load balancer

    • Low memory footprint

  2. Apache HTTP Server (~31% market share)

    • Modular architecture (.htaccess support)

    • Extensive module ecosystem

    • Multiple processing models (MPMs)

  3. Microsoft IIS (~7% market share)

    • Tight Windows integration

    • Native .NET support

    • Kernel-mode caching

  4. LiteSpeed (~12% market share)

    • Drop-in Apache replacement

    • Built-in cache acceleration

    • HTTP/3 support

DNS (Domain Name System) is a naming system that maps human-readable domain names to the IP addresses of computers and other devices on the Internet or on private networks, so that they can be reached by name rather than by numeric address.

In reality, the interaction between a browser and a server is a multi-step process that starts with resolving the domain name:

DNS operates as a distributed, hierarchical database with multiple components:

DNS Hierarchy:

Root Level (.)
    └── Top-Level Domains (TLDs)
        ├── Generic TLDs (.com, .org, .net, .info)
        ├── Country-code TLDs (.uk, .de, .jp, .io)
        ├── Sponsored TLDs (.edu, .gov, .mil)
        └── Second-Level Domains, registered under a TLD (example.com)
            └── Subdomains (www.example.com, api.example.com)

DNS Query Process:

  1. Browser Cache Check: TTL-based caching

  2. OS Cache Check: System resolver cache

  3. Router Cache Check: Local network cache

  4. ISP Recursive Resolver:

    • Performs iterative queries on behalf of client

    • May perform DNSSEC validation

    • Caches responses based on TTL

  5. Root Name Server Query (13 logical servers, hundreds of physical instances via anycast):

    • Returns TLD name server information

  6. TLD Name Server Query:

    • Returns authoritative name server for domain

  7. Authoritative Name Server Query:

    • Returns IP address for requested domain
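
The resolution steps above can be sketched as a toy recursive resolver that walks root, TLD, and authoritative servers and caches the answer for its TTL. Everything here is simulated: the server names, the zone data, the 192.0.2.10 address, and the `RecursiveResolver` class are assumptions for illustration, and no real DNS traffic is involved.

```javascript
// Hypothetical zone data standing in for real name servers.
const rootServer = { com: 'a.gtld-servers.example' };
const tldServers = {
  'a.gtld-servers.example': { 'example.com': 'ns1.example.com' },
};
const authoritativeServers = {
  'ns1.example.com': { 'www.example.com': { address: '192.0.2.10', ttl: 300 } },
};

class RecursiveResolver {
  // "now" is injectable so that TTL expiry can be demonstrated without waiting.
  constructor(now = () => Date.now()) {
    this.cache = new Map();
    this.now = now;
  }

  resolve(name) {
    // Steps 1-4: answer from cache while the TTL has not expired.
    const cached = this.cache.get(name);
    if (cached && cached.expiresAt > this.now()) {
      return { address: cached.address, source: 'cache' };
    }
    const labels = name.split('.');
    const tld = labels[labels.length - 1];
    const domain = labels.slice(-2).join('.');

    const tldServer = rootServer[tld];                            // step 5
    const authServer = tldServer && tldServers[tldServer]?.[domain]; // step 6
    const record = authServer && authoritativeServers[authServer]?.[name]; // step 7
    if (!record) {
      throw new Error(`NXDOMAIN: ${name}`);
    }

    this.cache.set(name, {
      address: record.address,
      expiresAt: this.now() + record.ttl * 1000,
    });
    return { address: record.address, source: 'authoritative' };
  }
}

let fakeTime = 0;
const resolver = new RecursiveResolver(() => fakeTime);
console.log(resolver.resolve('www.example.com').source); // authoritative
console.log(resolver.resolve('www.example.com').source); // cache
fakeTime += 301 * 1000; // move past the 300-second TTL
console.log(resolver.resolve('www.example.com').source); // authoritative
```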

WebSocket Protocol

WebSocket (RFC 6455) provides full-duplex, bidirectional communication over a single TCP connection:

Handshake Process:

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
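
The Sec-WebSocket-Accept value in the response is derived from the client's Sec-WebSocket-Key: the server appends a fixed GUID defined in RFC 6455, hashes the result with SHA-1, and Base64-encodes the digest. A short sketch using Node's built-in `crypto` module (the function name `computeAcceptValue` is illustrative):

```javascript
const crypto = require('node:crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Derive the Sec-WebSocket-Accept header value from the client's key.
function computeAcceptValue(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');
}

// Key from the handshake example above.
console.log(computeAcceptValue('dGhlIHNhbXBsZSBub25jZQ==')); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The check lets the client confirm that the server actually understood the WebSocket handshake rather than replaying a cached or unrelated HTTP response; it is not an authentication mechanism.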

Frame Structure:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+
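
Reading the diagram field by field, a frame decoder can be sketched as follows. This is a simplified illustration that assumes the whole frame is already in a single Node.js `Buffer`; the function name `parseFrame` is hypothetical, and fragmentation, control-frame rules, and extensions (the RSV bits) are ignored. The sample bytes are the masked "Hello" text frame from the examples in RFC 6455.

```javascript
// Hypothetical decoder for a single, complete WebSocket frame.
function parseFrame(buf) {
  const fin = (buf[0] & 0x80) !== 0;    // FIN: final fragment of a message
  const opcode = buf[0] & 0x0f;         // 0x1 = text, 0x2 = binary, 0x8 = close, ...
  const masked = (buf[1] & 0x80) !== 0; // MASK: set on client-to-server frames
  let length = buf[1] & 0x7f;           // 7-bit payload length
  let offset = 2;

  if (length === 126) {
    length = buf.readUInt16BE(offset);  // 16-bit extended length
    offset += 2;
  } else if (length === 127) {
    length = Number(buf.readBigUInt64BE(offset)); // 64-bit extended length
    offset += 8;
  }

  let maskingKey = null;
  if (masked) {
    maskingKey = buf.subarray(offset, offset + 4);
    offset += 4;
  }

  // Copy the payload so that unmasking does not modify the input buffer.
  const payload = Buffer.from(buf.subarray(offset, offset + length));
  if (maskingKey) {
    for (let i = 0; i < payload.length; i++) {
      payload[i] ^= maskingKey[i % 4];
    }
  }
  return { fin, opcode, masked, length, payload };
}

const frame = parseFrame(
  Buffer.from([0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58])
);
console.log(frame.opcode, frame.payload.toString('utf8')); // 1 Hello
```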

Use Cases:

  • Real-time chat applications

  • Live sports updates

  • Financial trading platforms

  • Collaborative editing (Google Docs-style)

  • Online gaming

  • IoT device communication

  • Live streaming chat

Web Pages

Web pages are documents formatted with HTML that can optionally be accompanied by styling (CSS files), behavior (JavaScript files), and multimedia (images, audio, video files).

Modern web pages are composite documents assembled from multiple resources, which the browser turns into pixels through the steps described below.

Critical Rendering Path

  1. HTML Parsing: Incremental parsing, with a speculative preload scanner that discovers and fetches resources ahead of the main parser

  2. DOM Construction: Tree structure with ~O(n) complexity

  3. CSSOM Construction: Built in parallel with the DOM; rendering is blocked until it is complete

  4. Render Tree: DOM + CSSOM merger, excludes non-visual elements

  5. Layout/Reflow: Geometric calculations, expensive operation

  6. Paint: Fill pixels, create layers

  7. Composite: GPU acceleration, layer combination
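
Step 4 of the list above (merging DOM and CSSOM while dropping non-visual elements) can be illustrated with a toy model. The data structures below are drastic simplifications invented for this sketch: real browsers resolve full selector matching, inheritance, and many more display types.

```javascript
// Hypothetical, heavily simplified DOM: plain objects with a tag and children.
const dom = {
  tag: 'html',
  children: [
    { tag: 'head', children: [{ tag: 'title', children: [] }] },
    {
      tag: 'body',
      children: [
        { tag: 'h1', children: [] },
        { tag: 'div', className: 'hidden', children: [{ tag: 'p', children: [] }] },
        { tag: 'p', children: [] },
      ],
    },
  ],
};

// Stand-in for the CSSOM: display values keyed by tag name or ".class".
const cssom = { head: { display: 'none' }, '.hidden': { display: 'none' } };

function computedDisplay(node) {
  const byClass = node.className ? cssom[`.${node.className}`] : undefined;
  const byTag = cssom[node.tag];
  return (byClass || byTag || {}).display || 'block';
}

// Nodes with display: none are left out together with their whole subtree.
function buildRenderTree(node) {
  if (computedDisplay(node) === 'none') return null;
  const children = node.children.map(buildRenderTree).filter(Boolean);
  return { tag: node.tag, children };
}

// Helper for printing: list the tags of a tree in document order.
function flattenTags(tree) {
  return [tree.tag, ...tree.children.flatMap(flattenTags)];
}

console.log(flattenTags(buildRenderTree(dom))); // [ 'html', 'body', 'h1', 'p' ]
```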

Resource Loading Optimization

Resource Hints:

<link rel="dns-prefetch" href="//example.com">
<link rel="preconnect" href="//example.com">
<link rel="prefetch" href="/next-page.html">
<link rel="preload" href="/font.woff2" as="font" crossorigin>
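<!-- rel="prerender" is deprecated; Chromium-based browsers now favor the Speculation Rules API -->
<link rel="prerender" href="/likely-next-page.html">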

Critical CSS Inlining:

<style>
  /* Critical above-the-fold styles */
  body { margin: 0; font-family: sans-serif; }
  .hero { height: 100vh; background: linear-gradient(...); }
</style>
<link rel="preload" href="/styles.css" as="style" onload="this.onload=null;this.rel='stylesheet'">

JavaScript Loading Strategies:

<!-- Parser blocking -->
<script src="script.js"></script>

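<!-- Deferred: downloads in parallel, executes in document order after parsing finishes -->
<script defer src="script.js"></script>

<!-- Async: downloads in parallel, executes as soon as it is available (order not guaranteed) -->
<script async src="script.js"></script>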

<!-- Module scripts (deferred by default) -->
<script type="module" src="module.js"></script>

FrontEnd vs. BackEnd

The frontend is the code (delivered as the body of HTTP responses) that a browser can interpret and run: HTML, CSS, and JavaScript.

The backend is the code executed on the web server, which the browser cannot access directly. It typically generates HTTP responses, interacts with databases or other information systems, and performs other application logic.
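
As a rough illustration of the backend side, the sketch below uses Node's built-in `http` module to turn a request into an HTTP response. The `/users/:id` route, the in-memory `users` map standing in for a database, and the handler name `handleRequest` are all assumptions made for this example.

```javascript
const http = require('node:http');

// Hypothetical in-memory stand-in for a database table.
const users = new Map([[1, { id: 1, name: 'Ada' }]]);

// Backend logic: runs on the server and is never sent to the browser.
function handleRequest(req, res) {
  const match = /^\/users\/(\d+)$/.exec(req.url);
  if (req.method === 'GET' && match) {
    const user = users.get(Number(match[1]));
    if (user) {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(user));
      return;
    }
  }
  res.writeHead(404, { 'Content-Type': 'text/plain' });
  res.end('Not Found');
}

// Creating the server does not open a socket yet;
// call server.listen(port) to start accepting connections.
const server = http.createServer(handleRequest);
```

The browser only ever sees the JSON or text in the response body; the lookup logic and the data store stay on the server.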

Frontend Architecture

Component-Based Architectures:

  • React: Virtual DOM, unidirectional data flow, hooks-based state management

  • Vue: Reactive data binding, template-based, progressive framework

  • Angular: Full-featured component-based framework, dependency injection, TypeScript-first

  • Svelte: Compile-time optimization, no virtual DOM, reactive assignments

State Management Patterns:

  • Flux/Redux: Unidirectional data flow, single source of truth

  • MobX: Observable state, reactive programming

  • Zustand/Jotai: Lightweight, hook-based state management

  • XState: Finite state machines for complex state logic

Frontend Build Pipeline:

Source Code → Transpilation (Babel/TypeScript) →
Bundling (Webpack/Vite/Rollup) with Tree Shaking and Code Splitting →
Minification → Asset Optimization →
Production Build

Modern Frontend Technologies:

  • WebAssembly: Near-native performance for compute-intensive tasks

  • Progressive Web Apps (PWA): Offline capability, installable web apps

  • Server-Side Rendering (SSR): Initial HTML from server

  • Static Site Generation (SSG): Pre-rendered pages at build time

  • Incremental Static Regeneration (ISR): Hybrid static/dynamic approach

  • Edge Computing: Computation at CDN edge locations

Backend Architecture

Architectural Patterns:

Monolithic Architecture:

  • Single deployable unit

  • Shared memory space

  • Simple deployment but scaling challenges

  • Examples: Traditional Rails, Django applications

Microservices Architecture:

  • Independently deployable services

  • Service-specific databases

  • Inter-service communication (REST, gRPC, message queues)

  • Container orchestration (Kubernetes, Docker Swarm)

  • Service mesh (Istio, Linkerd) for communication

Serverless Architecture:

  • Function-as-a-Service (FaaS): AWS Lambda, Vercel Functions

  • Backend-as-a-Service (BaaS): Firebase, Supabase

  • Event-driven execution

  • Pay-per-use pricing model

  • Cold start considerations

Backend Technologies Stack:

Languages and Runtime Environments:

  • Node.js: V8-based JavaScript runtime, event-driven, non-blocking I/O

  • Python: Django (batteries-included), Flask (microframework), FastAPI (async)

  • Java: Spring Boot (enterprise), Micronaut (microservices)

  • Ruby: Rails (convention over configuration)

  • Go: High performance, built-in concurrency

  • Rust: Memory safety without garbage collection

Database Systems:

Relational (SQL):

  • PostgreSQL: Advanced features, JSON support, extensions

  • MySQL/MariaDB: High performance, replication

  • SQLite: Embedded, serverless

NoSQL:

  • Document Stores: MongoDB, CouchDB (JSON documents)

  • Key-Value: Redis (in-memory), DynamoDB

  • Column-Family: Cassandra, HBase (wide column)

  • Graph: Neo4j, ArangoDB (relationship-focused)

API Architectures:

  • REST: Resource-based, stateless, HTTP verbs

  • GraphQL: Query language, single endpoint, type system

  • gRPC: Binary protocol, Protocol Buffers, bidirectional streaming

  • WebSocket: Real-time, bidirectional communication

  • Server-Sent Events: Unidirectional server-to-client streaming

Caching Strategies:

  • Browser Cache: HTTP headers (Cache-Control, ETag)

  • CDN Cache: Geographic distribution, edge caching

  • Application Cache: In-memory (Redis, Memcached)

  • Database Cache: Query result caching

  • Full-Page Cache: Varnish, Nginx FastCGI cache
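
For the browser-cache entry above, the interplay of Cache-Control and ETag can be sketched as a conditional response: the server derives a validator from the content and answers 304 Not Modified when the client already holds that version. The function names and the truncated SHA-256 validator are choices made for this example; the comparison is also simplified, since a real If-None-Match header may carry several validators, weak validators, or `*`.

```javascript
const crypto = require('node:crypto');

// Derive a validator from the content; the quotes are part of the ETag syntax.
function makeEtag(body) {
  const digest = crypto.createHash('sha256').update(body).digest('hex');
  return `"${digest.slice(0, 16)}"`;
}

// requestHeaders uses lowercase names, as Node.js provides them in req.headers.
function respondWithValidation(requestHeaders, body, maxAgeSeconds = 3600) {
  const etag = makeEtag(body);
  const headers = {
    'Cache-Control': `public, max-age=${maxAgeSeconds}`,
    ETag: etag,
  };
  // Simplified check: exact match against a single validator.
  if (requestHeaders['if-none-match'] === etag) {
    return { status: 304, headers, body: '' }; // client copy is still valid
  }
  return { status: 200, headers, body };
}

const first = respondWithValidation({}, '<h1>Hello</h1>');
const second = respondWithValidation({ 'if-none-match': first.headers.ETag }, '<h1>Hello</h1>');
console.log(first.status, second.status); // 200 304
```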

Message Queue Systems:

  • RabbitMQ: AMQP protocol, reliable delivery

  • Apache Kafka: Distributed streaming, high throughput

  • Redis Pub/Sub: Simple publish-subscribe

  • Amazon SQS/SNS: Managed queue/notification services

DevOps and Deployment

CI/CD Pipeline:

Code Commit → Automated Tests → Build → 
Security Scanning → Staging Deployment → 
Integration Tests → Production Deployment → 
Monitoring & Rollback

Container Technologies:

  • Docker: Containerization, Dockerfile, image layers

  • Kubernetes: Orchestration, pods, services, ingress

  • Helm: Kubernetes package management

  • Service Mesh: Istio, Linkerd for microservice communication

Infrastructure as Code:

  • Terraform: Cloud-agnostic infrastructure provisioning

  • CloudFormation: AWS-specific IaC

  • Ansible: Configuration management

  • Pulumi: Programming language-based IaC

Monitoring and Observability:

  • Metrics: Prometheus, Grafana, DataDog

  • Logging: ELK Stack (Elasticsearch, Logstash, Kibana), Splunk

  • Tracing: Jaeger, Zipkin, AWS X-Ray

  • Error Tracking: Sentry, Rollbar

Security Considerations

Web Security Threats

Common Vulnerabilities (OWASP Top 10, 2017 edition):

  1. Injection: SQL, NoSQL, LDAP injection

  2. Broken Authentication: Session management flaws

  3. Sensitive Data Exposure: Inadequate encryption

  4. XML External Entities (XXE): XML processor attacks

  5. Broken Access Control: Privilege escalation

  6. Security Misconfiguration: Default settings, verbose errors

  7. Cross-Site Scripting (XSS): Reflected, Stored, DOM-based

  8. Insecure Deserialization: Remote code execution

  9. Using Components with Known Vulnerabilities: Outdated dependencies

  10. Insufficient Logging & Monitoring: Delayed breach detection

Security Best Practices

Input Validation:

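// Server-side validation example
const sanitizeInput = (input) => {
  // Coerce to a string so that non-string input cannot throw on .replace()
  return String(input ?? '')
    .replace(/[<>]/g, '') // Remove angle brackets (not a substitute for output encoding)
    .trim()
    .substring(0, 255); // Length limit
};

// Parameterized queries (SQL injection prevention)
// `db` is assumed to be a database driver connection that supports "?" placeholders;
// `userId` is the untrusted value and is passed separately from the SQL text
const query = 'SELECT * FROM users WHERE id = ?';
db.execute(query, [userId]);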
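
Stripping characters on input is only a partial defense against XSS; escaping at output time is generally the more robust complement. A minimal sketch of HTML escaping follows (the helper name `escapeHtml` is illustrative, and it covers HTML text and quoted attribute contexts only, not URLs, CSS, or inline scripts):

```javascript
// Characters with special meaning in HTML and their entity replacements.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Hypothetical helper: escape untrusted text before inserting it into HTML.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml('<script>alert("x")</script>'));
// &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```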

Authentication & Authorization:

  • Multi-factor Authentication (MFA)

  • OAuth 2.0 / OpenID Connect

  • JWT with proper validation

  • Session management with secure cookies

  • Rate limiting and account lockout
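
To show what "JWT with proper validation" can involve, the sketch below signs and verifies an HS256 token with Node's built-in `crypto` module. It is meant to illustrate the checks (pinned algorithm, constant-time signature comparison, expiry), not to replace a maintained JWT library; the function names, and the decision to require an `exp` claim, are assumptions of this example. The `base64url` encoding requires a reasonably recent Node.js release.

```javascript
const crypto = require('node:crypto');

const toBase64Url = (text) => Buffer.from(text).toString('base64url');

function hmacSha256(data, secret) {
  return crypto.createHmac('sha256', secret).update(data).digest();
}

// Hypothetical helper: create an HS256-signed token.
function signJwt(payload, secret) {
  const header = toBase64Url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = toBase64Url(JSON.stringify(payload));
  const signature = hmacSha256(`${header}.${body}`, secret).toString('base64url');
  return `${header}.${body}.${signature}`;
}

// Hypothetical helper: verify the token and return its payload, or throw.
function verifyJwt(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = token.split('.');
  if (parts.length !== 3) throw new Error('Malformed token');
  const [header, body, signature] = parts;

  // Pin the algorithm instead of trusting whatever the token header claims.
  const { alg } = JSON.parse(Buffer.from(header, 'base64url').toString('utf8'));
  if (alg !== 'HS256') throw new Error('Unexpected algorithm');

  // Compare signatures in constant time to avoid timing side channels.
  const expected = hmacSha256(`${header}.${body}`, secret);
  const actual = Buffer.from(signature, 'base64url');
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) {
    throw new Error('Invalid signature');
  }

  // Only read claims after the signature has been verified.
  const payload = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  if (typeof payload.exp !== 'number' || payload.exp <= nowSeconds) {
    throw new Error('Token expired');
  }
  return payload;
}

const token = signJwt({ sub: '42', exp: 2000 }, 'demo-secret');
console.log(verifyJwt(token, 'demo-secret', 1000).sub); // 42
```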

Encryption:

  • TLS 1.3: Modern encryption for data in transit

  • AES-256: Symmetric encryption for data at rest

  • Bcrypt/Argon2: Password hashing

  • Certificate Pinning: Mobile app security
