NoSQL Databases

NoSQL databases are non-tabular and store data differently than relational tables. NoSQL databases come in a variety of types based on their data model. The main types are document, key-value, wide-column, and graph. They provide flexible schemas and are typically designed to scale out across many servers to handle large amounts of data and high user loads.

Document databases (e.g. CouchDB, MongoDB, Google Firestore). Inserted data is stored in the form of free-form JSON structures or “documents,” where the data could be anything from integers to strings to freeform text. There is no inherent need to specify what fields, if any, a document will contain.
Key-value stores (e.g. Redis, Riak). Free-form values—from simple integers or strings to complex JSON documents—are accessed in the database by way of keys.
Wide column stores (e.g. HBase, Cassandra). Data is organized into rows whose columns are grouped into column families, rather than fixed by a table-wide schema as in a conventional SQL system. Each row can have its own set of columns, so many different types of data can be stored and grouped as needed for queries or data views.
Graph databases (e.g. Neo4j). Data is represented as a network or graph of entities and their relationships, with each node in the graph a free-form chunk of data.
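
To make the four data models more concrete, here is a minimal sketch that represents a user in each shape. It uses plain Python structures as stand-ins rather than real database clients, and every key name, column name, and value in it is a hypothetical chosen for illustration.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value_store = {
    "user:1": json.dumps({"name": "John Doe", "city": "New York"}),
}

# Document: a nested, self-describing record; fields may differ per document.
document_store = {
    "users": {
        "user_1": {"name": "John Doe", "city": "New York", "tags": ["admin"]},
        "user_2": {"name": "Jane Smith"},  # no city or tags, and that is fine
    }
}

# Wide-column: each row key maps to its own set of columns.
wide_column_store = {
    "user_1": {"profile:name": "John Doe", "profile:city": "New York"},
    "user_2": {"profile:name": "Jane Smith", "stats:logins": 42},
}

# Graph: nodes plus typed relationships between them.
graph_nodes = {"user_1": {"name": "John Doe"}, "user_2": {"name": "Jane Smith"}}
graph_edges = [("user_1", "FRIEND", "user_2")]


def friends_of(node_id):
    """Return names of nodes reachable over an outgoing FRIEND edge."""
    return [
        graph_nodes[dst]["name"]
        for src, rel, dst in graph_edges
        if src == node_id and rel == "FRIEND"
    ]
```

Real systems add indexing, persistence, and distribution on top of these shapes; the sketch only shows how the same information is laid out differently.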

NoSQL vs SQL

| | SQL | NoSQL |
| --- | --- | --- |
| Type | Relational | Non-relational |
| Data | Structured data stored in tables | Unstructured or semi-structured data, often stored as JSON documents; graph databases also model relationships |
| Schema | Static and rigid, bound to relationships | Dynamic and flexible |
| Scalability | Vertical | Horizontal |
| Language | SQL | No single standard; the query language varies by database |
| Joins | Helpful for designing complex queries | Typically no joins, so there is less support for composing complex queries |
| Suitable for large data sets | | Yes |
| Support | Great support | Growing community support |
| Auto elasticity | Requires downtime in most cases | Automatic; no outage required |

1. SQL Sample:

A relational (SQL) database such as MySQL or PostgreSQL uses structured tables with predefined columns. Here's an example of a "Users" table with user-related data:

Table: Users

| user_id | first_name | last_name | email | date_of_birth | city |
| --- | --- | --- | --- | --- | --- |
| 1 | John | Doe | john.doe@gmail.com | 1990-05-10 | New York |
| 2 | Jane | Smith | jane.smith@yahoo.com | 1988-07-22 | Los Angeles |
| 3 | Mark | Johnson | mark.j@outlook.com | 1995-03-15 | Chicago |

Table: Orders

| order_id | user_id | product_name | quantity | order_date |
| --- | --- | --- | --- | --- |
| 101 | 1 | Laptop | 1 | 2023-10-20 |
| 102 | 2 | Smartphone | 2 | 2023-10-22 |
| 103 | 1 | Headphones | 1 | 2023-10-23 |

Explanation:

This SQL setup has two tables: Users and Orders.
Users stores user details (ID, name, email, etc.).
Orders stores order information and uses user_id as a foreign key to reference users.

SQL databases are a good fit for structured, relational data with a defined schema and relationships between tables that are resolved with joins.
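
The two-table setup can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions rather than a reference schema: SQLite stands in for MySQL or PostgreSQL, the column types are guesses, and the user_id and quantity values are taken from the Firestore sample later on this page (the ID 3 for Mark is assumed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Users (
    user_id       INTEGER PRIMARY KEY,
    first_name    TEXT NOT NULL,
    last_name     TEXT NOT NULL,
    email         TEXT NOT NULL,
    date_of_birth TEXT,
    city          TEXT
);
CREATE TABLE Orders (
    order_id     INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES Users(user_id),
    product_name TEXT NOT NULL,
    quantity     INTEGER NOT NULL,
    order_date   TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO Users VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "John", "Doe", "john.doe@gmail.com", "1990-05-10", "New York"),
    (2, "Jane", "Smith", "jane.smith@yahoo.com", "1988-07-22", "Los Angeles"),
    (3, "Mark", "Johnson", "mark.j@outlook.com", "1995-03-15", "Chicago"),
])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", [
    (101, 1, "Laptop", 1, "2023-10-20"),
    (102, 2, "Smartphone", 2, "2023-10-22"),
    (103, 1, "Headphones", 1, "2023-10-23"),
])

# Combine the two tables through the user_id foreign key.
rows = conn.execute("""
    SELECT u.first_name, o.product_name, o.quantity
    FROM Users AS u
    JOIN Orders AS o ON o.user_id = u.user_id
    ORDER BY o.order_id
""").fetchall()
```

Because foreign key enforcement is switched on, inserting an order whose user_id does not exist in Users raises sqlite3.IntegrityError, which is the kind of guarantee a fixed relational schema provides.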

2. Firebase Firestore Sample:

Firebase Firestore is a NoSQL database where data is stored in documents and collections, and the schema can be more flexible.

Collection: Users

{
  "user_1": {
    "first_name": "John",
    "last_name": "Doe",
    "email": "john.doe@gmail.com",
    "date_of_birth": "1990-05-10",
    "city": "New York",
    "orders": [
      {
        "order_id": "101",
        "product_name": "Laptop",
        "quantity": 1,
        "order_date": "2023-10-20"
      },
      {
        "order_id": "103",
        "product_name": "Headphones",
        "quantity": 1,
        "order_date": "2023-10-23"
      }
    ]
  },
  "user_2": {
    "first_name": "Jane",
    "last_name": "Smith",
    "email": "jane.smith@yahoo.com",
    "date_of_birth": "1988-07-22",
    "city": "Los Angeles",
    "orders": [
      {
        "order_id": "102",
        "product_name": "Smartphone",
        "quantity": 2,
        "order_date": "2023-10-22"
      }
    ]
  }
}

Explanation:

Firebase Firestore organizes data in collections and documents. Here, the Users collection contains multiple documents, each representing a user.
The data for each user (like personal info and orders) is nested in one document. The orders field is an array that stores order details directly within the user document.
Firestore's schema is more flexible, allowing for varying fields between documents, if needed, without predefined table structures.
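
As a rough sketch of what this means for application code, the snippet below models the Users collection as a plain Python dictionary. It is a stand-in for illustration only and does not use the Firestore SDK; the helper name products_for, the user_3 document, and its newsletter_opt_in field are hypothetical.

```python
# Plain-dict stand-in for a "Users" collection: document ID -> document.
users = {
    "user_1": {
        "first_name": "John",
        "city": "New York",
        "orders": [
            {"order_id": "101", "product_name": "Laptop", "quantity": 1},
            {"order_id": "103", "product_name": "Headphones", "quantity": 1},
        ],
    },
    "user_2": {
        "first_name": "Jane",
        "city": "Los Angeles",
        "orders": [
            {"order_id": "102", "product_name": "Smartphone", "quantity": 2},
        ],
    },
}


def products_for(user_id):
    """Read one document and list its embedded orders; no join is involved."""
    return [order["product_name"] for order in users[user_id].get("orders", [])]


# A new document may carry different fields; nothing has to be migrated first.
users["user_3"] = {"first_name": "Mark", "newsletter_opt_in": True}
```

Reading a user's orders touches a single document, but the reading code has to tolerate missing fields, which is why products_for falls back to an empty list.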

Key Differences:

SQL: Data is normalized and split across multiple related tables. Queries often involve JOIN operations to combine related information from different tables.
Firestore: Data is often denormalized and stored in a hierarchical, document-based format, with nested structures (like arrays of objects) that eliminate the need for complex joins. This makes Firestore more suitable for dynamic or unstructured data.

These examples highlight how data would be modeled for SQL versus Firebase Firestore, based on the nature of the database systems.

A basic example of data that cannot be easily or efficiently stored in a SQL database would be highly unstructured or nested data with variable schemas, such as real-time chat messages with embedded media, metadata, and replies. NoSQL databases like Firebase Firestore, MongoDB, or Cassandra are often better suited for such data due to their flexibility.

Here's an example:

Real-Time Chat Messages Data (with nested replies, media, and reactions)

Example Data:

{
  "message_id": "msg_123",
  "sender": {
    "user_id": "user_1",
    "name": "John Doe"
  },
  "content": "Hey, check out this picture!",
  "timestamp": "2024-10-24T14:30:00Z",
  "media": {
    "type": "image",
    "url": "https://example.com/image123.jpg",
    "metadata": {
      "resolution": "1920x1080",
      "size_in_kb": 450
    }
  },
  "reactions": [
    {
      "user_id": "user_2",
      "emoji": "👍",
      "timestamp": "2024-10-24T14:32:00Z"
    },
    {
      "user_id": "user_3",
      "emoji": "😂",
      "timestamp": "2024-10-24T14:35:00Z"
    }
  ],
  "replies": [
    {
      "message_id": "msg_124",
      "sender": {
        "user_id": "user_2",
        "name": "Jane Smith"
      },
      "content": "That's a cool picture!",
      "timestamp": "2024-10-24T14:34:00Z",
      "reactions": []
    }
  ]
}

Why This is Difficult to Store in SQL:

Variable and Nested Data Structures:
- The chat messages contain nested data such as replies, reactions, and media, which can have different schemas (e.g., some messages may have media, others may not).
- SQL databases require predefined schemas (tables and columns), and handling such flexible, deeply nested data structures requires complex normalization. This would require multiple tables and joins, making it inefficient to store and query.
Arrays and Embedded Documents:
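- Fields like reactions and replies are arrays of objects. A conventional normalized SQL design has no direct equivalent for arrays or embedded documents (JSON column types in databases such as PostgreSQL and MySQL are a partial exception), so you would typically create separate tables to store each reaction or reply, which complicates both data storage and querying.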
Dynamic Data:
- Some fields, such as media, might be present for some messages but absent for others. SQL databases require a strict schema, so if some fields don't always exist (like media), you may end up with a lot of NULL values or complex table designs with conditional fields.
Real-Time and Scalability Needs:
- Real-time applications (like chat) often require databases that handle high-frequency updates and flexible data structures, which SQL databases can struggle with at large scale because they have traditionally scaled vertically. NoSQL databases like Firestore are built for this workload: documents can be sharded across servers and scaled horizontally.
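
To illustrate the normalization effort described above, the following sketch splits one nested message into rows for three tables. The table names (messages, media, reactions), the parent_id column, and the function name flatten_message are assumptions made for this example, and the sample message is a trimmed version of the one shown earlier.

```python
def flatten_message(msg, parent_id=None):
    """Split one nested chat message into rows for three hypothetical tables."""
    rows = {"messages": [], "media": [], "reactions": []}
    rows["messages"].append({
        "message_id": msg["message_id"],
        "parent_id": parent_id,  # None (NULL) for top-level messages
        "sender_id": msg["sender"]["user_id"],
        "content": msg["content"],
        "timestamp": msg["timestamp"],
    })
    media = msg.get("media")  # optional: many messages have no attachment
    if media:
        rows["media"].append({
            "message_id": msg["message_id"],
            "type": media["type"],
            "url": media["url"],
        })
    for reaction in msg.get("reactions", []):
        rows["reactions"].append({"message_id": msg["message_id"], **reaction})
    for reply in msg.get("replies", []):
        # Replies are messages too, so recurse and merge their rows.
        child = flatten_message(reply, parent_id=msg["message_id"])
        for table, table_rows in child.items():
            rows[table].extend(table_rows)
    return rows


message = {
    "message_id": "msg_123",
    "sender": {"user_id": "user_1", "name": "John Doe"},
    "content": "Hey, check out this picture!",
    "timestamp": "2024-10-24T14:30:00Z",
    "media": {"type": "image", "url": "https://example.com/image123.jpg"},
    "reactions": [
        {"user_id": "user_2", "emoji": "👍"},
        {"user_id": "user_3", "emoji": "😂"},
    ],
    "replies": [{
        "message_id": "msg_124",
        "sender": {"user_id": "user_2", "name": "Jane Smith"},
        "content": "That's a cool picture!",
        "timestamp": "2024-10-24T14:34:00Z",
        "reactions": [],
    }],
}

rows = flatten_message(message)
```

One document becomes rows spread over three tables, and reading the message back would mean joining them again on message_id and parent_id.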

How NoSQL Handles It:

In NoSQL databases like Firebase Firestore or MongoDB:

You can store this entire message as a single document in a "Messages" collection.
No need for predefined schema: fields can vary between documents.
Nested arrays (like reactions and replies) and objects (like media and sender) are handled efficiently as part of the document itself.
Data can be retrieved or updated in real time without complex joins or schema alterations.

This flexibility makes NoSQL databases a better choice for such unstructured, dynamic data.
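
The document-side handling can be sketched in a few lines. The dictionary below is only a stand-in for a "Messages" collection, not a real Firestore or MongoDB client; the helper names save_message and add_reaction and the message ID msg_125 are hypothetical.

```python
import copy

messages = {}  # stand-in for a "Messages" collection: document ID -> document


def save_message(doc):
    """Store the whole message as one document, whatever fields it has."""
    messages[doc["message_id"]] = copy.deepcopy(doc)


def add_reaction(message_id, user_id, emoji):
    """Append to the nested reactions array of a single document."""
    messages[message_id].setdefault("reactions", []).append(
        {"user_id": user_id, "emoji": emoji}
    )


save_message({
    "message_id": "msg_123",
    "content": "Hey, check out this picture!",
    "media": {"type": "image", "url": "https://example.com/image123.jpg"},
})
save_message({"message_id": "msg_125", "content": "Text only"})  # no media field

add_reaction("msg_125", "user_2", "👍")
```

The second message simply omits the media field, and the reaction is added by touching one document, with no schema change and no second table.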

Extra: Neo4j

Neo4j is a graph database where data is stored as nodes and relationships instead of tables. It's particularly good for handling connected data like social networks, recommendations, or complex hierarchies.

Key concepts:

Nodes: Represent entities (like people, products) with labels and properties
Relationships: Connect nodes, have types and properties
Properties: Key-value pairs stored on both nodes and relationships
Labels: Categories for nodes (like :Person, :Product)

Cypher is Neo4j's query language, designed to be visual and intuitive:

// Find John's friends
MATCH (p:Person {name: 'John'})-[:FRIEND]->(friend)
RETURN friend.name

// Create a new person and relationship
CREATE (john:Person {name: 'John'})-[:WORKS_AT]->(company:Company {name: 'Acme'})
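
For readers who find it easier to think in code, the property-graph concepts above can be approximated in memory. This is a toy sketch, not how Neo4j stores or queries data; the class names Node, Relationship, and TinyGraph, the match helper, the Jane node, and the since property are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set       # e.g. {"Person"}
    properties: dict  # e.g. {"name": "John"}


@dataclass
class Relationship:
    start: int     # ID of the source node
    rel_type: str  # e.g. "FRIEND"
    end: int       # ID of the target node
    properties: dict = field(default_factory=dict)


class TinyGraph:
    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def create_node(self, labels, **properties):
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(set(labels), properties)
        return node_id

    def relate(self, start, rel_type, end, **properties):
        self.relationships.append(Relationship(start, rel_type, end, properties))

    def match(self, label, rel_type, **where):
        """Rough analogue of MATCH (a:label {where})-[:rel_type]->(b) RETURN b."""
        starts = {
            node_id
            for node_id, node in self.nodes.items()
            if label in node.labels
            and all(node.properties.get(k) == v for k, v in where.items())
        }
        return [
            self.nodes[r.end]
            for r in self.relationships
            if r.rel_type == rel_type and r.start in starts
        ]


graph = TinyGraph()
john = graph.create_node(["Person"], name="John")
jane = graph.create_node(["Person"], name="Jane")
acme = graph.create_node(["Company"], name="Acme")
graph.relate(john, "FRIEND", jane, since=2020)  # relationships carry properties too
graph.relate(john, "WORKS_AT", acme)

friend_names = [n.properties["name"] for n in graph.match("Person", "FRIEND", name="John")]
```

The match helper scans every relationship, which is fine for a toy example; a graph database is built so that following relationships from a node stays cheap as the data grows.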

Neo4j is particularly popular in these key areas:

Fraud Detection & Security

Banking: Pattern detection for suspicious transactions
Cybersecurity: Network analysis, threat detection
Insurance: Claims fraud investigation by connecting entities

Recommendations

E-commerce: Product recommendations based on purchase patterns
Social Media: Friend/content suggestions (LinkedIn-style connections)
Entertainment: Content recommendations (Netflix, Spotify type systems)

Knowledge Graphs

Scientific Research: Connecting research papers, authors, and topics
Enterprise Knowledge Management: Company documentation and relationships
AI/ML: Knowledge bases for natural language processing

Identity & Access Management

Role-based access control
Complex organizational hierarchies
Access dependency tracking

The common thread is that Neo4j excels when dealing with highly connected data where relationships are as important as the data points themselves.

