NoSQL Databases
NoSQL databases are non-tabular and store data differently than relational tables. NoSQL databases come in a variety of types based on their data model. The main types are document, key-value, wide-column, and graph. They provide flexible schemas and scale easily with large amounts of data and high user loads.
Document databases (e.g. CouchDB, MongoDB, Google Firestore). Inserted data is stored in the form of free-form JSON structures or “documents,” where the data could be anything from integers to strings to freeform text. There is no inherent need to specify what fields, if any, a document will contain.
Key-value stores (e.g. Redis, Riak). Free-form values—from simple integers or strings to complex JSON documents—are accessed in the database by way of keys.
Wide-column stores (e.g. HBase, Cassandra). Data is organized into rows and column families, where each row can hold its own set of columns, rather than into rows with one fixed set of columns as in a conventional SQL system. Any number of columns (and therefore many different types of data) can be grouped or aggregated as needed for queries or data views.
Graph databases (e.g. Neo4j). Data is represented as a network or graph of entities and their relationships, with each node in the graph a free-form chunk of data.
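As a rough illustration of the four data models listed above, the sketch below uses plain Python structures as stand-ins. It does not use any real database client, and every name and value in it is hypothetical.

```python
# Illustrative only: plain Python structures standing in for each data model.

# Document: a self-describing, JSON-like record; fields may differ per document.
document = {"name": "Ada", "tags": ["admin", "beta"], "profile": {"city": "London"}}

# Key-value: a value looked up by a unique key.
key_value_store = {"session:42": "token-abc", "counter:visits": 1024}

# Wide-column: rows identified by a key, each holding its own set of columns.
wide_column_table = {
    "user:1": {"name": "Ada", "email": "ada@example.com"},
    "user:2": {"name": "Grace", "last_login": "2024-01-01"},
}

# Graph: nodes plus typed relationships between them.
nodes = {
    1: {"label": "Person", "name": "Ada"},
    2: {"label": "Person", "name": "Grace"},
}
relationships = [(1, "KNOWS", 2)]


def neighbours(node_id, rel_type):
    """Return the names of nodes reachable from node_id over rel_type edges."""
    return [
        nodes[dst]["name"]
        for src, rel, dst in relationships
        if src == node_id and rel == rel_type
    ]
```

Here `neighbours(1, "KNOWS")` follows the single relationship from Ada to Grace, while the two wide-column rows deliberately carry different columns.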
|  | SQL | NoSQL |
| --- | --- | --- |
| Type | Relational | Non-relational |
| Data | Structured data stored in tables | Semi-structured or unstructured data, often stored as JSON-like documents; graph databases also model relationships |
| Schema | Static and rigid, bound to relationships | Dynamic and flexible |
| Scalability | Vertical | Horizontal |
| Language | SQL | No single standard; each database has its own query language or API |
| Joins | Helpful for designing complex queries | Generally no joins; less powerful interface for preparing complex queries |
| Suitable for large data sets | Less suitable (limited by vertical scaling) | Yes |
| Support | Great support | Growing community support |
| Auto elasticity | Requires downtime in most cases | Automatic; no outage required |
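The "horizontal" scalability entry in the comparison refers to spreading data across several machines. One simple way to picture this is hash-based partitioning, sketched below. The node names are hypothetical, and real systems usually use more elaborate schemes (for example consistent hashing) so that adding a node moves fewer keys.

```python
import hashlib

SHARDS = ["node-a", "node-b", "node-c"]  # hypothetical server names


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard by hashing the key; the same key always maps to the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# Each key is routed to exactly one of the available nodes.
placement = {key: shard_for(key) for key in ("user:1", "user:2", "user:3", "user:4")}
```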
A relational (SQL) database such as MySQL or PostgreSQL uses structured tables with predefined columns. Here's an example of a "Users" table with user-related data:
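Table: Users

| First name | Last name | Email | Date | City |
| --- | --- | --- | --- | --- |
| John | Doe | john.doe@gmail.com | 1990-05-10 | New York |
| Jane | Smith |  | 1988-07-22 | Los Angeles |
| Mark | Johnson | mark.j@outlook.com | 1995-03-15 | Chicago |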
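Table: Orders

| Order ID | user_id | Product | Quantity | Date |
| --- | --- | --- | --- | --- |
| 101 | 1 | Laptop | 1 | 2023-10-20 |
| 102 | 2 | Smartphone | 2 | 2023-10-22 |
| 103 | 1 | Headphones | 1 | 2023-10-23 |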
Explanation:
This SQL setup has two tables: Users and Orders.
Users stores user details (ID, name, email, etc.).
Orders stores order information and uses user_id as a foreign key to reference users.
SQL is great for structured, relational data where you have a defined schema and relationships (joins) between tables.
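To make the foreign-key relationship concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The column names, the user IDs 1 to 3, and the reading of the fourth Orders column as a quantity are assumptions made for illustration. Jane's email is stored as NULL because none is given in the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL,
    email      TEXT,
    city       TEXT
);
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    product    TEXT NOT NULL,
    quantity   INTEGER NOT NULL,
    order_date TEXT NOT NULL
);
""")

conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "Doe", "john.doe@gmail.com", "New York"),
        (2, "Jane", "Smith", None, "Los Angeles"),
        (3, "Mark", "Johnson", "mark.j@outlook.com", "Chicago"),
    ],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (101, 1, "Laptop", 1, "2023-10-20"),
        (102, 2, "Smartphone", 2, "2023-10-22"),
        (103, 1, "Headphones", 1, "2023-10-23"),
    ],
)

# JOIN combines each order with the user it references through user_id.
rows = conn.execute(
    """
    SELECT u.first_name, o.product
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.user_id
    ORDER BY o.order_id
    """
).fetchall()
```

With foreign keys enabled, inserting an order whose `user_id` has no matching user is rejected, which is the kind of integrity guarantee a fixed relational schema provides.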
Firebase Firestore is a NoSQL database where data is stored in documents and collections, and the schema can be more flexible.
Collection: Users
Explanation:
Firebase Firestore organizes data in collections and documents. Here, the Users collection contains multiple documents, each representing a user.
The data for each user (like personal info and orders) is nested in one document. The orders field is an array that stores order details directly within the user document.
Firestore's schema is more flexible, allowing for varying fields between documents, if needed, without predefined table structures.
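The nesting described above could look roughly like the following. This is a plain Python dictionary standing in for a Users collection, not a call to the Firestore SDK. The document IDs and field names are assumptions for illustration.

```python
# A dict keyed by document ID, standing in for a "Users" collection.
users_collection = {
    "user_1": {
        "first_name": "John",
        "last_name": "Doe",
        "email": "john.doe@gmail.com",
        "city": "New York",
        # Orders are embedded directly in the user document.
        "orders": [
            {"order_id": 101, "product": "Laptop", "quantity": 1, "order_date": "2023-10-20"},
            {"order_id": 103, "product": "Headphones", "quantity": 1, "order_date": "2023-10-23"},
        ],
    },
    "user_2": {
        "first_name": "Jane",
        "last_name": "Smith",
        "city": "Los Angeles",  # no "email" field: documents need not share fields
        "orders": [
            {"order_id": 102, "product": "Smartphone", "quantity": 2, "order_date": "2023-10-22"},
        ],
    },
}

# Reading a user's orders is a single lookup; no join is involved.
john_products = [order["product"] for order in users_collection["user_1"]["orders"]]
```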
SQL: Data is normalized and split across multiple related tables. Queries often involve JOIN operations to combine related information from different tables.
Firestore: Data is often denormalized and stored in a hierarchical, document-based format, with nested structures (like arrays of objects) that eliminate the need for complex joins. This makes Firestore more suitable for dynamic or unstructured data.
These examples highlight how data would be modeled for SQL versus Firebase Firestore, based on the nature of the database systems.
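The difference between the two models can be sketched as a small transformation from normalized rows to nested documents. The function name `denormalize` and the field names are hypothetical. The point is only that the join happens once at write time instead of on every read.

```python
from collections import defaultdict

# Normalized form: two flat "tables" linked by user_id.
users = [
    {"user_id": 1, "name": "John Doe"},
    {"user_id": 2, "name": "Jane Smith"},
]
orders = [
    {"order_id": 101, "user_id": 1, "product": "Laptop"},
    {"order_id": 102, "user_id": 2, "product": "Smartphone"},
    {"order_id": 103, "user_id": 1, "product": "Headphones"},
]


def denormalize(users, orders):
    """Embed each user's orders inside that user's document."""
    orders_by_user = defaultdict(list)
    for order in orders:
        # The foreign key is dropped: the nesting itself expresses ownership.
        embedded = {k: v for k, v in order.items() if k != "user_id"}
        orders_by_user[order["user_id"]].append(embedded)
    return {u["user_id"]: {**u, "orders": orders_by_user[u["user_id"]]} for u in users}


documents = denormalize(users, orders)
```

The trade-off is that embedded data is duplicated if the same order details are needed elsewhere, so updates may have to touch more than one document.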
A basic example of data that cannot be easily or efficiently stored in a SQL database would be highly unstructured or nested data with variable schemas, such as real-time chat messages with embedded media, metadata, and replies. NoSQL databases like Firebase Firestore, MongoDB, or Cassandra are often better suited for such data due to their flexibility.
Here's an example:
Example Data:
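A chat message of the kind described might look roughly like this. The sketch is an illustrative stand-in written as a Python dictionary. All field names, IDs, and values are hypothetical, and the URL is a placeholder.

```python
import json

# One message with a sender, optional media, reactions, and replies.
message = {
    "message_id": "msg_001",
    "sender": {"user_id": "u_1", "display_name": "Alice"},
    "text": "Here is the photo from yesterday",
    "sent_at": "2024-01-15T10:30:00Z",
    "media": {"type": "image", "url": "https://example.com/photo.jpg"},
    "reactions": [
        {"user_id": "u_2", "emoji": "thumbs_up"},
    ],
    "replies": [
        {
            "message_id": "msg_002",
            "sender": {"user_id": "u_2", "display_name": "Bob"},
            "text": "Nice shot!",
        },
    ],
}

# A second message with no media, reactions, or replies at all.
text_only_message = {
    "message_id": "msg_003",
    "sender": {"user_id": "u_1", "display_name": "Alice"},
    "text": "Thanks!",
    "sent_at": "2024-01-15T10:32:00Z",
}

# The whole structure serializes to JSON as-is.
serialized = json.dumps(message)
```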
Variable and Nested Data Structures:
The chat messages contain nested data such as replies, reactions, and media, which can have different schemas (e.g., some messages may have media, others may not).
SQL databases require predefined schemas (tables and columns), and handling such flexible, deeply nested data structures requires complex normalization. This would require multiple tables and joins, making it inefficient to store and query.
Arrays and Embedded Documents:
Fields like reactions and replies are arrays of objects. A classic normalized relational design does not keep arrays or embedded documents inside a row (although some engines, such as PostgreSQL, offer array and JSON column types), so you would typically create separate tables to store each reaction or reply, which complicates both data storage and querying.
Dynamic Data:
Some fields, such as media, might be present for some messages but absent for others. SQL databases require a strict schema, so if some fields don't always exist (like media), you may end up with a lot of NULL values or complex table designs with conditional fields.
Real-Time and Scalability Needs:
Real-time applications (like chat) often require databases that handle high-frequency updates and flexible data structures, which can be difficult to achieve with a traditional SQL database at large scale. NoSQL databases like Firestore are designed for this through their document-based model, sharding, and horizontal scaling capabilities.
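To show what the normalization discussed above involves, the sketch below stores a chat message relationally with Python's built-in `sqlite3` module. The table layout, column names, and sample rows are assumptions chosen for illustration, not a recommended schema.

```python
import sqlite3

chat_db = sqlite3.connect(":memory:")
chat_db.executescript("""
CREATE TABLE messages (
    message_id TEXT PRIMARY KEY,
    parent_id  TEXT REFERENCES messages(message_id),  -- set only for replies
    sender_id  TEXT NOT NULL,
    body       TEXT NOT NULL,
    media_url  TEXT                                    -- NULL when there is no media
);
CREATE TABLE reactions (
    message_id TEXT NOT NULL REFERENCES messages(message_id),
    user_id    TEXT NOT NULL,
    emoji      TEXT NOT NULL
);
INSERT INTO messages VALUES
    ('msg_001', NULL, 'u_1', 'Photo from yesterday', 'https://example.com/photo.jpg'),
    ('msg_002', 'msg_001', 'u_2', 'Nice shot!', NULL);
INSERT INTO reactions VALUES ('msg_001', 'u_2', 'thumbs_up');
""")

# Reassembling one message takes several queries (or joins).
replies = chat_db.execute(
    "SELECT body FROM messages WHERE parent_id = ?", ("msg_001",)
).fetchall()
reaction_count = chat_db.execute(
    "SELECT COUNT(*) FROM reactions WHERE message_id = ?", ("msg_001",)
).fetchone()[0]
messages_without_media = chat_db.execute(
    "SELECT COUNT(*) FROM messages WHERE media_url IS NULL"
).fetchone()[0]
```

Even this small version needs two tables, a self-referencing key for replies, and a nullable column for media, which is the overhead the points above describe.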
In NoSQL databases like Firebase Firestore or MongoDB:
You can store this entire message as a single document in a "Messages" collection.
No need for predefined schema: fields can vary between documents.
Nested arrays (like reactions and replies) and objects (like media and sender) are handled efficiently as part of the document itself.
Data can be retrieved or updated in real-time without complex joins or schema alterations.
This flexibility makes NoSQL databases a better choice for such unstructured, dynamic data.
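A minimal in-memory sketch of that idea follows. The dictionary `messages` and the helpers `add_message` and `add_reaction` are hypothetical stand-ins. The real Firestore and MongoDB client APIs differ and are not shown here.

```python
messages = {}  # in-memory stand-in for a "Messages" collection, keyed by document ID


def add_message(collection, message_id, **fields):
    """Store a message document; any set of fields is accepted."""
    collection[message_id] = dict(fields)


def add_reaction(collection, message_id, user_id, emoji):
    """Append a reaction to the nested array, creating the array if it is absent."""
    reactions = collection[message_id].setdefault("reactions", [])
    reactions.append({"user_id": user_id, "emoji": emoji})


# Two documents in the same collection with different fields.
add_message(messages, "msg_001", sender={"user_id": "u_1"},
            text="Photo from yesterday", media={"type": "image"})
add_message(messages, "msg_002", sender={"user_id": "u_2"}, text="Nice shot!")

# Updating nested data touches only the one document.
add_reaction(messages, "msg_002", "u_1", "heart")
```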
Neo4j is a graph database where data is stored as nodes and relationships instead of tables. It's particularly good for handling connected data like social networks, recommendations, or complex hierarchies.
Key concepts:
Nodes: Represent entities (like people, products) with labels and properties
Relationships: Connect nodes, have types and properties
Properties: Key-value pairs stored on both nodes and relationships
Labels: Categories for nodes (like :Person, :Product)
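These four concepts can be modeled in a few lines of Python to show how they fit together. This is only a sketch of the property-graph idea, not Neo4j's storage format or API. The class names, relationship types, and sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set                                     # e.g. {"Person"}
    properties: dict = field(default_factory=dict)  # key-value pairs on the node


@dataclass
class Relationship:
    start: int                                      # node_id of the source node
    end: int                                        # node_id of the target node
    rel_type: str                                   # e.g. "FRIENDS_WITH"
    properties: dict = field(default_factory=dict)  # key-value pairs on the relationship


alice = Node(1, {"Person"}, {"name": "Alice"})
bob = Node(2, {"Person"}, {"name": "Bob"})
laptop = Node(3, {"Product"}, {"name": "Laptop"})
nodes = {n.node_id: n for n in (alice, bob, laptop)}

relationships = [
    Relationship(1, 2, "FRIENDS_WITH", {"since": 2020}),
    Relationship(2, 3, "BOUGHT", {"quantity": 1}),
]


def outgoing(node_id, rel_type):
    """Return the nodes reached from node_id by following rel_type relationships."""
    return [
        nodes[r.end]
        for r in relationships
        if r.start == node_id and r.rel_type == rel_type
    ]
```

Following `FRIENDS_WITH` from Alice and then `BOUGHT` from Bob walks from a person to a product without any join table, which is the kind of traversal graph databases are built around.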
Cypher is Neo4j's query language, designed to be visual and intuitive:
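As a small illustration, the two Cypher statements below are held as Python strings. The labels, relationship type, and property names are hypothetical. Running them requires a Neo4j server and a client driver, neither of which is shown here.

```python
# Create two Person nodes and a FRIENDS_WITH relationship between them.
CREATE_FRIENDSHIP = """
CREATE (alice:Person {name: 'Alice'})-[:FRIENDS_WITH {since: 2020}]->(bob:Person {name: 'Bob'})
"""

# Find the names of everyone a given person is friends with.
FIND_FRIENDS = """
MATCH (p:Person {name: $name})-[:FRIENDS_WITH]->(friend:Person)
RETURN friend.name AS friend_name
"""

params = {"name": "Alice"}  # value bound to the $name parameter at execution time
```

The pattern `(node)-[:TYPE]->(node)` resembles an arrow drawn between two circles, which is what makes the language read visually.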
Neo4j is particularly popular in these key areas:
Fraud Detection & Security
Banking: Pattern detection for suspicious transactions
Cybersecurity: Network analysis, threat detection
Insurance: Claims fraud investigation by connecting entities
Recommendations
E-commerce: Product recommendations based on purchase patterns
Social Media: Friend/content suggestions (LinkedIn-style connections)
Entertainment: Content recommendations (Netflix, Spotify type systems)
Knowledge Graphs
Scientific Research: Connecting research papers, authors, and topics
Enterprise Knowledge Management: Company documentation and relationships
AI/ML: Knowledge bases for natural language processing
Identity & Access Management
Role-based access control
Complex organizational hierarchies
Access dependency tracking
The common thread is that Neo4j excels when dealing with highly connected data where relationships are as important as the data points themselves.
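The recommendation use case can be sketched in plain Python as a two-hop traversal: from a customer to the products they bought, then to other customers who bought the same products, then to what else those customers bought. The data and the function name `recommend` are hypothetical, and this is a simple co-purchase count rather than how any particular product implements recommendations.

```python
from collections import Counter

purchases = {  # hypothetical customer -> purchased products
    "alice": {"laptop", "mouse"},
    "bob": {"laptop", "headphones"},
    "carol": {"laptop", "mouse", "keyboard"},
}


def recommend(customer, purchases):
    """Rank products bought by customers who share at least one purchase."""
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer or not (owned & items):
            continue  # skip the customer themselves and unrelated customers
        for item in items - owned:
            scores[item] += 1
    # Highest score first; ties are broken alphabetically for a stable order.
    return [item for item, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


suggestions = recommend("bob", purchases)
```

In a graph database the same question is expressed as a path pattern over purchase relationships, and the engine follows the links directly instead of scanning every customer.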