Databases for Backend Systems

Importance of Databases in Backend Development

Databases are one of the most frequent components used in backend systems.
Backend engineers constantly interact with databases for:
- Storing data
- Retrieving data
- Updating data
- Deleting data
Understanding database concepts is crucial for efficient backend development.

Persistence and the Need for Databases

What is Persistence?

Persistence means storing data so that it survives after the program stops running.
Data remains available across:
- Different sessions
- Different system states
- Long periods of time
- Different physical locations

Example: To-Do List Application

A user:
- Creates tasks
- Marks tasks as completed
When the app is closed and reopened:
- The previous state must remain intact.
Without persistence:
- All tasks would disappear every time the app restarts.

Importance of Persistence

Used extensively in everyday applications.
Ensures data continuity and reliability.

What is a Database?

Broad Definition

A database is any structured storage system used to persist data.
Examples of databases include:
- Phone contact lists
- Browser local storage
- Cookies
- Session storage
- Text files storing structured information

Core Operations of a Database (CRUD)

A database system provides methods to:

Create data
Read data
Update data
Delete data

These operations are commonly called CRUD operations.

Databases in Backend Systems

Disk-Based Databases

When developers refer to databases in backend systems, they typically mean:

Disk-based databases
Data stored in:
- HDD (Hard Disk Drive)
- SSD (Solid State Drive)

Why Disk Storage Instead of RAM?

RAM (Primary Memory)

Extremely fast
Limited capacity
Expensive

Typical RAM capacity:

8GB
16GB
32GB
Maximum ~128GB for most systems

Disk Storage (Secondary Memory)

Slower than RAM
Much cheaper
Much larger capacity

Typical disk storage:

512GB
1TB
2TB+

Trade-Off

Storage Type	Speed	Capacity	Cost
RAM	Very Fast	Small	Expensive
Disk	Slower	Large	Cheap

Use Case Difference

Caching systems (e.g., Redis) → store data in RAM
Databases (PostgreSQL, MongoDB) → store data on disk

Reason:

Databases prioritize large storage capacity
Caching prioritizes fast access

Database Management Systems (DBMS)

What is a DBMS?

A Database Management System (DBMS) is software that:

Stores data
Provides efficient access to data
Handles database operations

Examples:

PostgreSQL
MySQL
SQL Server
MongoDB

Responsibilities of a DBMS

1. Data Organization

Efficient storage structure.
Enables faster queries and updates.

2. Data Access

Provides methods for:

Create
Read
Update
Delete

3. Data Integrity

Ensures data is:

Accurate
Valid
Consistent

Example:

If a column stores payment amount, the DBMS ensures:
- Only numbers are inserted
- Strings or invalid data are rejected

4. Security

DBMS controls:

User access
Permissions
Roles

Prevents unauthorized access to data.

5. Scalability and Load Handling

Handles large datasets
Manages system load efficiently

Why Not Use Text Files Instead of Databases?

Before DBMS systems existed, data was often stored in plain text files.

However, this approach has major problems.

Problem 1: Parsing Overhead

Applications must manually:
- Read the file
- Parse each line
- Extract fields
- Compare values

Example workflow:

Read file
Split lines
Parse fields
Search for matching records

Issues:

Slow
Error-prone
Hard to maintain

Performance varies depending on programming language:

Rust → faster parsing
JavaScript/Python → slower compared to Rust

Problem 2: Lack of Structure

Text files have no enforced schema.

Consequences:

Any data format can be inserted.
Cannot enforce rules like:
- Numeric-only fields
- Required fields
- Data types

This leads to inconsistent data.

Problem 3: Concurrency Issues

Definition

Concurrency occurs when multiple users modify the same data simultaneously.

Example Scenario

Initial value:

amount = 40

Two users modify it simultaneously:

User A:

40 + 20 = 60

User B:

40 - 20 = 20

Possible final values:

Because updates overwrite each other unpredictably.

Problem

Text files lack concurrency control mechanisms.

Types of Database Management Systems

DBMS systems are broadly categorized into:

Relational Databases
Non-Relational Databases (NoSQL)

Relational Databases

Structure

Data is organized into:

Tables
Rows
Columns

Relationships between tables are defined using:

Foreign keys

Key Characteristics

Strict schema
Structured data
Predefined columns and types
Strong data integrity

Schema

A schema defines:

Table structure
Column names
Data types
Constraints

Data must follow this schema.

Advantages

High data integrity
Consistent data structure
Powerful relational queries

Examples

MySQL
PostgreSQL
SQL Server

Query Language

Relational databases use:

SQL (Structured Query Language)

Non-Relational Databases (NoSQL)

Structure

Data stored as:

Documents
Key-value pairs
Collections

Example (MongoDB):

Table → Collection
Row → Document

Flexible Schema

Unlike relational databases:

Each document may have different fields.
No strict schema requirement.

Advantages

High flexibility
Faster development for prototypes
Dynamic data structures

Example

MongoDB

Relational vs Non-Relational Databases

Feature	Relational DB	Non-Relational DB
Schema	Strict	Flexible
Data Format	Structured	Semi-structured
Relationships	Strong	Limited
Data Integrity	Strong	Weaker
Query Language	SQL	Varies

Example Use Cases

CRM System (Relational Database)

CRM = Customer Relationship Management

Stores:

Customer contacts
Sales information
Transaction data

Requirements:

Accurate data
Complex relationships
Strong data consistency

Best fit:

Relational database (PostgreSQL)

CMS System (Non-Relational Database)

CMS = Content Management System

Example: blogging platform.

Content may include:

Text
Images
Code blocks
Embedded videos

Because structure varies widely:

MongoDB or NoSQL databases are suitable.
In MongoDB, we will have to ensure data integrity in the Application layer.

Why PostgreSQL is a Popular Choice

1. Open Source

Free
Fully accessible source code
Can be self-hosted

2. SQL Standard Compliance

Queries follow SQL standards.
Easy to migrate between databases.

Example migration:

PostgreSQL → MySQL

Requires minimal query changes.

3. Extensibility

PostgreSQL offers:

Many built-in features
Extension support
Custom capabilities

Documentation exceeds 1400 pages.

4. Reliability and Scalability

Widely trusted in production environments.

Used by:

Startups
Large enterprises

5. Excellent JSON Support

PostgreSQL supports:

JSON
JSONB (binary JSON)

This allows handling dynamic data structures similar to NoSQL databases.

Example:

CMS content stored as JSON inside PostgreSQL.

SQL Basics

SQL Definition

SQL = Structured Query Language

Used for:

Querying data
Updating data
Managing databases

SQL runs on DBMS systems like PostgreSQL.

PostgreSQL Data Types

Auto-Increment Types

SERIAL

Integer
Automatically increments for each new row

Example sequence:

BIGSERIAL

Same as SERIAL but supports larger numeric range.

Used for production systems.

Integer Types

PostgreSQL provides multiple integer sizes:

Type	Capacity
SMALLINT	Small numbers
INTEGER	Medium numbers
BIGINT	Very large numbers

Numeric Types

DECIMAL / NUMERIC

Used for precise numeric values.

Example:

DECIMAL(10,2)

Meaning:

Total digits = 10
Decimal digits = 2

Example valid value:

12345678.90

Use Case

Financial values:

Prices
Payment amounts

Accuracy is critical.

Floating Point Types

Examples:

FLOAT
REAL
DOUBLE PRECISION

Characteristics:

Faster calculations
Less accurate

Use when:

Small precision errors are acceptable.

Example:

Scientific calculations
Measurements

Tradeoff:-

Decimal/ Numeric are more accurate than FLOAT / REAL but does Faster Calculation.

String Data Types

CHAR(n)

Fixed length
Pads extra spaces if text is shorter.

Example:

CHAR(10)
"AB"
Stored as: "AB        "

VARCHAR(n)

Variable length
Maximum size defined

Example:

VARCHAR(255)

Use CHAR for same lenght if you have like MO - monday, TO - tuesday. It will save the variable space taken by Varchar.

TEXT

Unlimited length string
Recommended in PostgreSQL.

Reason

No performance difference compared to VARCHAR
Avoids schema changes when text length increases.
TEXT has indexing as well as same efficiency in performance, contrary to the common belief of not having it.

Boolean and Time Types

Boolean

Stores:

TRUE
FALSE

Date

Stores date only.

Example:

2026-03-13

Time

Stores time only.

Example:

12:30:45

TIMESTAMP

Stores:

Date
Time

Timestamp with Time Zone - TIMESTAMPZ

Stores:

Date
Time
Time zone

UUID

UUID = Universally Unique Identifier

Example:

550e8400-e29b-41d4-a716-446655440000

Used as:

Primary keys
Unique identifiers across systems

Advantages:

Extremely low collision probability
Good for distributed systems

Ex:-
gen_random_uuid()

JSON Data Types

JSON

Stored as plain text JSON.

JSONB

Binary JSON format used internally by PostgreSQL.

Advantages:

Faster querying
Better indexing

Preferred over JSON in most cases.

Arrays

PostgreSQL allows storing arrays.

Example:

Integer arrays
JSON arrays
Date arrays

ENUM (Custom type Properties)

Only Allowed Values
Data Integrity
Documentation
When creating ENUM, we should have one default value to make things easier for us.

Other Data Types

INET (IP Address)
MACADDR
POINT (Geometrical Points)
XML

Constraints (Priority wise)

PRIMARY KEY
NOT NULL
UNIQUE
DEFAULT ( 'CURRENT_TIMESTAMP', '1')
CHECK
Foreign Key (Iput should have a record in the Remote table)
REFERENCES table

ON DELETE (Referential Integrity)

RESTRICT (Restrict the remote one if a record is found in the current Table)
CASCADE (Delete both)
SET NULL (Not if Field is set as NOT NULL)
SET DEFAULT

Database Migrations

What are Migrations?

Migrations are version-controlled database schema changes.

Instead of manually running SQL commands, developers use migration files.

Example structure:

db/
 └ migrations/
     ├ 001.sql
     ├ 002.sql
     ├ 003.sql

Each file contains SQL statements.

Migration Tool Workflow

Migration tools:

Read migration files sequentially
Execute SQL commands
Track current database version

Example tools:

dbmate
go-migrate

Types of Migrations

Up Migration

Applies new changes: 'dbmate up'

Examples:

Create table
Add column
Create index

Down Migration

Reverts previous changes.

Used for:

Rollbacks
Fixing production errors
Created in Reverse Order to migrate down

Example:

DROP TABLE users;

Benefits of Migrations

1. Version Control

Database changes are tracked in Git.

2. Rollbacks

If something breaks in production:

Database can revert to previous state.

Database Relationships

One-to-One Relationship

Each record in table A corresponds to exactly one record in table B.

Example:

users
user_profiles

Implementation:

user_profiles.user_id = users.id

One-to-Many Relationship

One record in table A corresponds to multiple records in table B. We don't keep it as Primary Key, but keep it as Foreign Key.

Example:

projects → tasks

Implementation:

tasks.project_id references projects.id

Many-to-Many Relationship

Multiple records from both tables relate to each other.

Example:

users ↔ projects

Solution:

Create linking table

Example:

project_members

Contains:

project_id
user_id

Composite primary key:

(project_id, user_id)

Indexing

What is an Index?

An index is a lookup structure that speeds up queries.

Analogy:

Similar to the index page in a book.

Instead of scanning every row, the database:

Looks up the index
Finds row location
Fetches data directly

Without Index

Database performs sequential scan.

For large datasets (millions of rows):

Very slow

With Index

Database performs direct lookup.

Much faster queries.

When to Use Indexes

Create indexes for fields used in:

WHERE clauses
JOIN conditions
ORDER BY clauses

Database Triggers

What is a Trigger?

A trigger is a database function automatically executed when certain events occur.

Example:

Automatically update updated_at column whenever a row changes.

Trigger workflow:

Row updated
Trigger fires
Timestamp updated automatically

Seeding Data

What is Seeding?

Seeding = inserting test data into a database.

'dbmate new seed_data'

Purpose:

Development testing
Simulating real user data

Typically done using migration scripts.

Backend Engineer Database Workflow

Typical backend database workflow:

Design API endpoints
Model database schema
Write migration files
Create indexes and constraints
Write SQL queries
Use parameterized queries
Return results to frontend

Parameterized Queries

Definition

Parameterized queries allow safe insertion of user inputs into SQL queries.

Purpose:

Prevent SQL Injection attacks

Example:

Instead of:

SELECT * FROM users WHERE id = " + userInput

Use:

SELECT * FROM users WHERE id = ?

User input is treated as data, not executable SQL.

Importance of Databases in Backend Development​

Persistence and the Need for Databases

What is Persistence?​

Example: To-Do List Application​

Importance of Persistence​

What is a Database?

Broad Definition​

Core Operations of a Database (CRUD)​

Databases in Backend Systems

Disk-Based Databases​

Why Disk Storage Instead of RAM?​

RAM (Primary Memory)​

Disk Storage (Secondary Memory)​

Trade-Off​

Use Case Difference​

Database Management Systems (DBMS)

What is a DBMS?​

Responsibilities of a DBMS​

1. Data Organization​

2. Data Access​

3. Data Integrity​

4. Security​

5. Scalability and Load Handling​

Why Not Use Text Files Instead of Databases?

Problem 1: Parsing Overhead​

Problem 2: Lack of Structure​

Problem 3: Concurrency Issues​

Definition​

Example Scenario​

Problem​

Types of Database Management Systems

Relational Databases

Structure​

Key Characteristics​

Schema​

Advantages​

Examples​

Query Language​

Non-Relational Databases (NoSQL)

Structure​

Flexible Schema​

Advantages​

Example​

Relational vs Non-Relational Databases

Example Use Cases

CRM System (Relational Database)​

CMS System (Non-Relational Database)​

Why PostgreSQL is a Popular Choice

1. Open Source​

2. SQL Standard Compliance​

3. Extensibility​

4. Reliability and Scalability​

5. Excellent JSON Support​

SQL Basics

SQL Definition​

PostgreSQL Data Types

Auto-Increment Types​

SERIAL​

BIGSERIAL​

Integer Types

Numeric Types

DECIMAL / NUMERIC​

Use Case​

Floating Point Types​

Tradeoff:-​

Decimal/ Numeric are more accurate than FLOAT / REAL but does Faster Calculation.​

String Data Types

CHAR(n)​

VARCHAR(n)​

Use CHAR for same lenght if you have like MO - monday, TO - tuesday. It will save the variable space taken by Varchar.​

TEXT​

Reason​

Boolean and Time Types

Boolean​

Date​

Time​

TIMESTAMP​

Timestamp with Time Zone - TIMESTAMPZ​

UUID

JSON Data Types

Importance of Databases in Backend Development

What is Persistence?

Example: To-Do List Application

Importance of Persistence

Broad Definition

Core Operations of a Database (CRUD)

Disk-Based Databases

Why Disk Storage Instead of RAM?

RAM (Primary Memory)

Disk Storage (Secondary Memory)

Trade-Off

Use Case Difference

What is a DBMS?

Responsibilities of a DBMS

1. Data Organization

2. Data Access

3. Data Integrity

4. Security

5. Scalability and Load Handling

Problem 1: Parsing Overhead

Problem 2: Lack of Structure

Problem 3: Concurrency Issues

Definition

Example Scenario

Problem

Structure

Key Characteristics

Schema

Advantages

Examples

Query Language

Structure

Flexible Schema

Advantages

Example

CRM System (Relational Database)

CMS System (Non-Relational Database)

1. Open Source

2. SQL Standard Compliance

3. Extensibility

4. Reliability and Scalability

5. Excellent JSON Support

SQL Definition

Auto-Increment Types

SERIAL

BIGSERIAL

DECIMAL / NUMERIC

Use Case

Floating Point Types

Tradeoff:-

Decimal/ Numeric are more accurate than FLOAT / REAL but does Faster Calculation.

CHAR(n)

VARCHAR(n)

Use CHAR for same lenght if you have like MO - monday, TO - tuesday. It will save the variable space taken by Varchar.

TEXT

Reason

Boolean

Date

Time

TIMESTAMP

Timestamp with Time Zone - TIMESTAMPZ

JSON

JSONB

What are Migrations?

Migration Tool Workflow

Up Migration

Down Migration

1. Version Control

2. Rollbacks

One-to-One Relationship

One-to-Many Relationship

Many-to-Many Relationship

What is an Index?

Without Index

With Index

When to Use Indexes

What is a Trigger?

What is Seeding?

Definition