Skip to main content

Databases for Backend Systems

Importance of Databases in Backend Development

  • Databases are one of the most frequent components used in backend systems.

  • Backend engineers constantly interact with databases for:

    • Storing data
    • Retrieving data
    • Updating data
    • Deleting data
  • Understanding database concepts is crucial for efficient backend development.


Persistence and the Need for Databases

What is Persistence?

  • Persistence means storing data so that it survives after the program stops running.

  • Data remains available across:

    • Different sessions
    • Different system states
    • Long periods of time
    • Different physical locations

Example: To-Do List Application

  • A user:

    • Creates tasks
    • Marks tasks as completed
  • When the app is closed and reopened:

    • The previous state must remain intact.
  • Without persistence:

    • All tasks would disappear every time the app restarts.

Importance of Persistence

  • Used extensively in everyday applications.
  • Ensures data continuity and reliability.

What is a Database?

Broad Definition

  • A database is any structured storage system used to persist data.

  • Examples of databases include:

    • Phone contact lists
    • Browser local storage
    • Cookies
    • Session storage
    • Text files storing structured information

Core Operations of a Database (CRUD)

A database system provides methods to:

  • Create data
  • Read data
  • Update data
  • Delete data

These operations are commonly called CRUD operations.


Databases in Backend Systems

Disk-Based Databases

When developers refer to databases in backend systems, they typically mean:

  • Disk-based databases

  • Data stored in:

    • HDD (Hard Disk Drive)
    • SSD (Solid State Drive)

Why Disk Storage Instead of RAM?

RAM (Primary Memory)

  • Extremely fast
  • Limited capacity
  • Expensive

Typical RAM capacity:

  • 8GB
  • 16GB
  • 32GB
  • Maximum ~128GB for most systems

Disk Storage (Secondary Memory)

  • Slower than RAM
  • Much cheaper
  • Much larger capacity

Typical disk storage:

  • 512GB
  • 1TB
  • 2TB+

Trade-Off

Storage TypeSpeedCapacityCost
RAMVery FastSmallExpensive
DiskSlowerLargeCheap

Use Case Difference

  • Caching systems (e.g., Redis) → store data in RAM
  • Databases (PostgreSQL, MongoDB) → store data on disk

Reason:

  • Databases prioritize large storage capacity
  • Caching prioritizes fast access

Database Management Systems (DBMS)

What is a DBMS?

A Database Management System (DBMS) is software that:

  • Stores data
  • Provides efficient access to data
  • Handles database operations

Examples:

  • PostgreSQL
  • MySQL
  • SQL Server
  • MongoDB

Responsibilities of a DBMS

1. Data Organization

  • Efficient storage structure.
  • Enables faster queries and updates.

2. Data Access

Provides methods for:

  • Create
  • Read
  • Update
  • Delete

3. Data Integrity

Ensures data is:

  • Accurate
  • Valid
  • Consistent

Example:

  • If a column stores payment amount, the DBMS ensures:

    • Only numbers are inserted
    • Strings or invalid data are rejected

4. Security

DBMS controls:

  • User access
  • Permissions
  • Roles

Prevents unauthorized access to data.

5. Scalability and Load Handling

  • Handles large datasets
  • Manages system load efficiently

Why Not Use Text Files Instead of Databases?

Before DBMS systems existed, data was often stored in plain text files.

However, this approach has major problems.


Problem 1: Parsing Overhead

  • Applications must manually:

    • Read the file
    • Parse each line
    • Extract fields
    • Compare values

Example workflow:

  1. Read file
  2. Split lines
  3. Parse fields
  4. Search for matching records

Issues:

  • Slow
  • Error-prone
  • Hard to maintain

Performance varies depending on programming language:

  • Rust → faster parsing
  • JavaScript/Python → slower compared to Rust

Problem 2: Lack of Structure

Text files have no enforced schema.

Consequences:

  • Any data format can be inserted.

  • Cannot enforce rules like:

    • Numeric-only fields
    • Required fields
    • Data types

This leads to inconsistent data.


Problem 3: Concurrency Issues

Definition

Concurrency occurs when multiple users modify the same data simultaneously.

Example Scenario

Initial value:

amount = 40

Two users modify it simultaneously:

User A:

40 + 20 = 60

User B:

40 - 20 = 20

Possible final values:

  • 60
  • 20

Because updates overwrite each other unpredictably.

Problem

Text files lack concurrency control mechanisms.


Types of Database Management Systems

DBMS systems are broadly categorized into:

  1. Relational Databases
  2. Non-Relational Databases (NoSQL)

Relational Databases

Structure

Data is organized into:

  • Tables
  • Rows
  • Columns

Relationships between tables are defined using:

  • Foreign keys

Key Characteristics

  • Strict schema
  • Structured data
  • Predefined columns and types
  • Strong data integrity

Schema

A schema defines:

  • Table structure
  • Column names
  • Data types
  • Constraints

Data must follow this schema.

Advantages

  • High data integrity
  • Consistent data structure
  • Powerful relational queries

Examples

  • MySQL
  • PostgreSQL
  • SQL Server

Query Language

Relational databases use:

SQL (Structured Query Language)


Non-Relational Databases (NoSQL)

Structure

Data stored as:

  • Documents
  • Key-value pairs
  • Collections

Example (MongoDB):

  • Table → Collection
  • Row → Document

Flexible Schema

Unlike relational databases:

  • Each document may have different fields.
  • No strict schema requirement.

Advantages

  • High flexibility
  • Faster development for prototypes
  • Dynamic data structures

Example

  • MongoDB

Relational vs Non-Relational Databases

FeatureRelational DBNon-Relational DB
SchemaStrictFlexible
Data FormatStructuredSemi-structured
RelationshipsStrongLimited
Data IntegrityStrongWeaker
Query LanguageSQLVaries

Example Use Cases

CRM System (Relational Database)

CRM = Customer Relationship Management

Stores:

  • Customer contacts
  • Sales information
  • Transaction data

Requirements:

  • Accurate data
  • Complex relationships
  • Strong data consistency

Best fit:

  • Relational database (PostgreSQL)

CMS System (Non-Relational Database)

CMS = Content Management System

Example: blogging platform.

Content may include:

  • Text
  • Images
  • Code blocks
  • Embedded videos

Because structure varies widely:

  • MongoDB or NoSQL databases are suitable.
  • In MongoDB, we will have to ensure data integrity in the Application layer.

Why PostgreSQL is a Popular Choice

1. Open Source

  • Free
  • Fully accessible source code
  • Can be self-hosted

2. SQL Standard Compliance

  • Queries follow SQL standards.
  • Easy to migrate between databases.

Example migration:

  • PostgreSQL → MySQL

Requires minimal query changes.

3. Extensibility

PostgreSQL offers:

  • Many built-in features
  • Extension support
  • Custom capabilities

Documentation exceeds 1400 pages.

4. Reliability and Scalability

Widely trusted in production environments.

Used by:

  • Startups
  • Large enterprises

5. Excellent JSON Support

PostgreSQL supports:

  • JSON
  • JSONB (binary JSON)

This allows handling dynamic data structures similar to NoSQL databases.

Example:

  • CMS content stored as JSON inside PostgreSQL.

SQL Basics

SQL Definition

SQL = Structured Query Language

Used for:

  • Querying data
  • Updating data
  • Managing databases

SQL runs on DBMS systems like PostgreSQL.


PostgreSQL Data Types

image

Auto-Increment Types

SERIAL

  • Integer
  • Automatically increments for each new row

Example sequence:

1
2
3
4

BIGSERIAL

Same as SERIAL but supports larger numeric range.

Used for production systems.


Integer Types

PostgreSQL provides multiple integer sizes:

TypeCapacity
SMALLINTSmall numbers
INTEGERMedium numbers
BIGINTVery large numbers

Numeric Types

DECIMAL / NUMERIC

Used for precise numeric values.

Example:

DECIMAL(10,2)

Meaning:

  • Total digits = 10
  • Decimal digits = 2

Example valid value:

12345678.90

Use Case

Financial values:

  • Prices
  • Payment amounts

Accuracy is critical.


Floating Point Types

Examples:

  • FLOAT
  • REAL
  • DOUBLE PRECISION

Characteristics:

  • Faster calculations
  • Less accurate

Use when:

  • Small precision errors are acceptable.

Example:

  • Scientific calculations
  • Measurements

Tradeoff:-

Decimal/ Numeric are more accurate than FLOAT / REAL but does Faster Calculation.

String Data Types

CHAR(n)

  • Fixed length
  • Pads extra spaces if text is shorter.

Example:

CHAR(10)
"AB"
Stored as: "AB "

VARCHAR(n)

  • Variable length
  • Maximum size defined

Example:

VARCHAR(255)

Use CHAR for same lenght if you have like MO - monday, TO - tuesday. It will save the variable space taken by Varchar.

TEXT

  • Unlimited length string
  • Recommended in PostgreSQL.

Reason

  • No performance difference compared to VARCHAR
  • Avoids schema changes when text length increases.
  • TEXT has indexing as well as same efficiency in performance, contrary to the common belief of not having it.

Boolean and Time Types

Boolean

Stores:

TRUE
FALSE

Date

Stores date only.

Example:

2026-03-13

Time

Stores time only.

Example:

12:30:45

TIMESTAMP

Stores:

  • Date
  • Time

Timestamp with Time Zone - TIMESTAMPZ

Stores:

  • Date
  • Time
  • Time zone

UUID

UUID = Universally Unique Identifier

Example:

550e8400-e29b-41d4-a716-446655440000

Used as:

  • Primary keys
  • Unique identifiers across systems

Advantages:

  • Extremely low collision probability

  • Good for distributed systems

    Ex:-

  • gen_random_uuid()


JSON Data Types

JSON

Stored as plain text JSON.


JSONB

Binary JSON format used internally by PostgreSQL.

Advantages:

  • Faster querying
  • Better indexing

Preferred over JSON in most cases.


Arrays

PostgreSQL allows storing arrays.

Example:

  • Integer arrays
  • JSON arrays
  • Date arrays

ENUM (Custom type Properties)

image
  • Only Allowed Values
  • Data Integrity
  • Documentation
  • When creating ENUM, we should have one default value to make things easier for us.

Other Data Types

  • INET (IP Address)
  • MACADDR
  • POINT (Geometrical Points)
  • XML

image

Constraints (Priority wise)

  • PRIMARY KEY

  • NOT NULL

  • UNIQUE

  • DEFAULT ( 'CURRENT_TIMESTAMP', '1')

  • CHECK

  • image
  • Foreign Key (Iput should have a record in the Remote table)

  • image
  • REFERENCES table

ON DELETE (Referential Integrity)

  • RESTRICT (Restrict the remote one if a record is found in the current Table)
  • CASCADE (Delete both)
  • SET NULL (Not if Field is set as NOT NULL)
  • SET DEFAULT

Database Migrations

What are Migrations?

image

Migrations are version-controlled database schema changes.

Instead of manually running SQL commands, developers use migration files.

Example structure:

db/
└ migrations/
├ 001.sql
├ 002.sql
├ 003.sql

Each file contains SQL statements.


Migration Tool Workflow

Migration tools:

  1. Read migration files sequentially
  2. Execute SQL commands
  3. Track current database version

Example tools:

  • dbmate
  • go-migrate

Types of Migrations

image

Up Migration

Applies new changes: 'dbmate up'

Examples:

  • Create table
  • Add column
  • Create index

Down Migration

Reverts previous changes.

Used for:

  • Rollbacks
  • Fixing production errors
  • Created in Reverse Order to migrate down

Example:

image
DROP TABLE users;

Benefits of Migrations

1. Version Control

Database changes are tracked in Git.


2. Rollbacks

If something breaks in production:

  • Database can revert to previous state.

Database Relationships

One-to-One Relationship

Each record in table A corresponds to exactly one record in table B.

Example:

users
user_profiles

Implementation:

  • user_profiles.user_id = users.id

One-to-Many Relationship

One record in table A corresponds to multiple records in table B. We don't keep it as Primary Key, but keep it as Foreign Key.

Example:

projects → tasks

Implementation:

  • tasks.project_id references projects.id

Many-to-Many Relationship

Multiple records from both tables relate to each other.

Example:

users ↔ projects

Solution:

  • Create linking table

  • image

Example:

image image
project_members

Contains:

  • project_id
  • user_id

Composite primary key:

(project_id, user_id)

Indexing

What is an Index?

An index is a lookup structure that speeds up queries.

Analogy:

  • Similar to the index page in a book.

Instead of scanning every row, the database:

  1. Looks up the index
  2. Finds row location
  3. Fetches data directly

Without Index

Database performs sequential scan.

For large datasets (millions of rows):

  • Very slow

With Index

Database performs direct lookup.

Much faster queries.


When to Use Indexes

Create indexes for fields used in:

  • WHERE clauses
  • JOIN conditions
  • ORDER BY clauses

Database Triggers

What is a Trigger?

A trigger is a database function automatically executed when certain events occur.

Example:

  • Automatically update updated_at column whenever a row changes.

Trigger workflow:

  1. Row updated
  2. Trigger fires
  3. Timestamp updated automatically

Seeding Data

What is Seeding?

Seeding = inserting test data into a database.

'dbmate new seed_data'

Purpose:

  • Development testing
  • Simulating real user data

Typically done using migration scripts.

image

Backend Engineer Database Workflow

Typical backend database workflow:

  1. Design API endpoints
  2. Model database schema
  3. Write migration files
  4. Create indexes and constraints
  5. Write SQL queries
  6. Use parameterized queries
  7. Return results to frontend

Parameterized Queries

Definition

Parameterized queries allow safe insertion of user inputs into SQL queries.

Purpose:

  • Prevent SQL Injection attacks

Example:

Instead of:

SELECT * FROM users WHERE id = " + userInput

Use:

SELECT * FROM users WHERE id = ?

User input is treated as data, not executable SQL.


Ke