Databases for Backend Systems
Importance of Databases in Backend Development
-
Databases are one of the most frequent components used in backend systems.
-
Backend engineers constantly interact with databases for:
- Storing data
- Retrieving data
- Updating data
- Deleting data
-
Understanding database concepts is crucial for efficient backend development.
Persistence and the Need for Databases
What is Persistence?
-
Persistence means storing data so that it survives after the program stops running.
-
Data remains available across:
- Different sessions
- Different system states
- Long periods of time
- Different physical locations
Example: To-Do List Application
-
A user:
- Creates tasks
- Marks tasks as completed
-
When the app is closed and reopened:
- The previous state must remain intact.
-
Without persistence:
- All tasks would disappear every time the app restarts.
Importance of Persistence
- Used extensively in everyday applications.
- Ensures data continuity and reliability.
What is a Database?
Broad Definition
-
A database is any structured storage system used to persist data.
-
Examples of databases include:
- Phone contact lists
- Browser local storage
- Cookies
- Session storage
- Text files storing structured information
Core Operations of a Database (CRUD)
A database system provides methods to:
- Create data
- Read data
- Update data
- Delete data
These operations are commonly called CRUD operations.
Databases in Backend Systems
Disk-Based Databases
When developers refer to databases in backend systems, they typically mean:
-
Disk-based databases
-
Data stored in:
- HDD (Hard Disk Drive)
- SSD (Solid State Drive)
Why Disk Storage Instead of RAM?
RAM (Primary Memory)
- Extremely fast
- Limited capacity
- Expensive
Typical RAM capacity:
- 8GB
- 16GB
- 32GB
- Maximum ~128GB for most systems
Disk Storage (Secondary Memory)
- Slower than RAM
- Much cheaper
- Much larger capacity
Typical disk storage:
- 512GB
- 1TB
- 2TB+
Trade-Off
| Storage Type | Speed | Capacity | Cost |
|---|---|---|---|
| RAM | Very Fast | Small | Expensive |
| Disk | Slower | Large | Cheap |
Use Case Difference
- Caching systems (e.g., Redis) → store data in RAM
- Databases (PostgreSQL, MongoDB) → store data on disk
Reason:
- Databases prioritize large storage capacity
- Caching prioritizes fast access
Database Management Systems (DBMS)
What is a DBMS?
A Database Management System (DBMS) is software that:
- Stores data
- Provides efficient access to data
- Handles database operations
Examples:
- PostgreSQL
- MySQL
- SQL Server
- MongoDB
Responsibilities of a DBMS
1. Data Organization
- Efficient storage structure.
- Enables faster queries and updates.
2. Data Access
Provides methods for:
- Create
- Read
- Update
- Delete
3. Data Integrity
Ensures data is:
- Accurate
- Valid
- Consistent
Example:
-
If a column stores payment amount, the DBMS ensures:
- Only numbers are inserted
- Strings or invalid data are rejected
4. Security
DBMS controls:
- User access
- Permissions
- Roles
Prevents unauthorized access to data.
5. Scalability and Load Handling
- Handles large datasets
- Manages system load efficiently
Why Not Use Text Files Instead of Databases?
Before DBMS systems existed, data was often stored in plain text files.
However, this approach has major problems.
Problem 1: Parsing Overhead
-
Applications must manually:
- Read the file
- Parse each line
- Extract fields
- Compare values
Example workflow:
- Read file
- Split lines
- Parse fields
- Search for matching records
Issues:
- Slow
- Error-prone
- Hard to maintain
Performance varies depending on programming language:
- Rust → faster parsing
- JavaScript/Python → slower compared to Rust
Problem 2: Lack of Structure
Text files have no enforced schema.
Consequences:
-
Any data format can be inserted.
-
Cannot enforce rules like:
- Numeric-only fields
- Required fields
- Data types
This leads to inconsistent data.
Problem 3: Concurrency Issues
Definition
Concurrency occurs when multiple users modify the same data simultaneously.
Example Scenario
Initial value:
amount = 40
Two users modify it simultaneously:
User A:
40 + 20 = 60
User B:
40 - 20 = 20
Possible final values:
- 60
- 20
Because updates overwrite each other unpredictably.
Problem
Text files lack concurrency control mechanisms.
Types of Database Management Systems
DBMS systems are broadly categorized into:
- Relational Databases
- Non-Relational Databases (NoSQL)
Relational Databases
Structure
Data is organized into:
- Tables
- Rows
- Columns
Relationships between tables are defined using:
- Foreign keys
Key Characteristics
- Strict schema
- Structured data
- Predefined columns and types
- Strong data integrity
Schema
A schema defines:
- Table structure
- Column names
- Data types
- Constraints
Data must follow this schema.
Advantages
- High data integrity
- Consistent data structure
- Powerful relational queries
Examples
- MySQL
- PostgreSQL
- SQL Server
Query Language
Relational databases use:
SQL (Structured Query Language)
Non-Relational Databases (NoSQL)
Structure
Data stored as:
- Documents
- Key-value pairs
- Collections
Example (MongoDB):
- Table → Collection
- Row → Document
Flexible Schema
Unlike relational databases:
- Each document may have different fields.
- No strict schema requirement.
Advantages
- High flexibility
- Faster development for prototypes
- Dynamic data structures
Example
- MongoDB
Relational vs Non-Relational Databases
| Feature | Relational DB | Non-Relational DB |
|---|---|---|
| Schema | Strict | Flexible |
| Data Format | Structured | Semi-structured |
| Relationships | Strong | Limited |
| Data Integrity | Strong | Weaker |
| Query Language | SQL | Varies |
Example Use Cases
CRM System (Relational Database)
CRM = Customer Relationship Management
Stores:
- Customer contacts
- Sales information
- Transaction data
Requirements:
- Accurate data
- Complex relationships
- Strong data consistency
Best fit:
- Relational database (PostgreSQL)
CMS System (Non-Relational Database)
CMS = Content Management System
Example: blogging platform.
Content may include:
- Text
- Images
- Code blocks
- Embedded videos
Because structure varies widely:
- MongoDB or NoSQL databases are suitable.
- In MongoDB, we will have to ensure data integrity in the Application layer.
Why PostgreSQL is a Popular Choice
1. Open Source
- Free
- Fully accessible source code
- Can be self-hosted
2. SQL Standard Compliance
- Queries follow SQL standards.
- Easy to migrate between databases.
Example migration:
- PostgreSQL → MySQL
Requires minimal query changes.
3. Extensibility
PostgreSQL offers:
- Many built-in features
- Extension support
- Custom capabilities
Documentation exceeds 1400 pages.
4. Reliability and Scalability
Widely trusted in production environments.
Used by:
- Startups
- Large enterprises
5. Excellent JSON Support
PostgreSQL supports:
- JSON
- JSONB (binary JSON)
This allows handling dynamic data structures similar to NoSQL databases.
Example:
- CMS content stored as JSON inside PostgreSQL.
SQL Basics
SQL Definition
SQL = Structured Query Language
Used for:
- Querying data
- Updating data
- Managing databases
SQL runs on DBMS systems like PostgreSQL.
PostgreSQL Data Types
Auto-Increment Types
SERIAL
- Integer
- Automatically increments for each new row
Example sequence:
1
2
3
4
BIGSERIAL
Same as SERIAL but supports larger numeric range.
Used for production systems.
Integer Types
PostgreSQL provides multiple integer sizes:
| Type | Capacity |
|---|---|
| SMALLINT | Small numbers |
| INTEGER | Medium numbers |
| BIGINT | Very large numbers |
Numeric Types
DECIMAL / NUMERIC
Used for precise numeric values.
Example:
DECIMAL(10,2)
Meaning:
- Total digits = 10
- Decimal digits = 2
Example valid value:
12345678.90
Use Case
Financial values:
- Prices
- Payment amounts
Accuracy is critical.
Floating Point Types
Examples:
- FLOAT
- REAL
- DOUBLE PRECISION
Characteristics:
- Faster calculations
- Less accurate
Use when:
- Small precision errors are acceptable.
Example:
- Scientific calculations
- Measurements
Tradeoff:-
Decimal/ Numeric are more accurate than FLOAT / REAL but does Faster Calculation.
String Data Types
CHAR(n)
- Fixed length
- Pads extra spaces if text is shorter.
Example:
CHAR(10)
"AB"
Stored as: "AB "
VARCHAR(n)
- Variable length
- Maximum size defined
Example:
VARCHAR(255)
Use CHAR for same lenght if you have like MO - monday, TO - tuesday. It will save the variable space taken by Varchar.
TEXT
- Unlimited length string
- Recommended in PostgreSQL.
Reason
- No performance difference compared to VARCHAR
- Avoids schema changes when text length increases.
- TEXT has indexing as well as same efficiency in performance, contrary to the common belief of not having it.
Boolean and Time Types
Boolean
Stores:
TRUE
FALSE
Date
Stores date only.
Example:
2026-03-13
Time
Stores time only.
Example:
12:30:45
TIMESTAMP
Stores:
- Date
- Time
Timestamp with Time Zone - TIMESTAMPZ
Stores:
- Date
- Time
- Time zone
UUID
UUID = Universally Unique Identifier
Example:
550e8400-e29b-41d4-a716-446655440000
Used as:
- Primary keys
- Unique identifiers across systems
Advantages:
-
Extremely low collision probability
-
Good for distributed systems
Ex:-
-
gen_random_uuid()
JSON Data Types
JSON
Stored as plain text JSON.
JSONB
Binary JSON format used internally by PostgreSQL.
Advantages:
- Faster querying
- Better indexing
Preferred over JSON in most cases.
Arrays
PostgreSQL allows storing arrays.
Example:
- Integer arrays
- JSON arrays
- Date arrays
ENUM (Custom type Properties)
- Only Allowed Values
- Data Integrity
- Documentation
- When creating ENUM, we should have one default value to make things easier for us.
Other Data Types
- INET (IP Address)
- MACADDR
- POINT (Geometrical Points)
- XML
Constraints (Priority wise)
-
PRIMARY KEY
-
NOT NULL
-
UNIQUE
-
DEFAULT ( 'CURRENT_TIMESTAMP', '1')
-
CHECK
-
-
Foreign Key (Iput should have a record in the Remote table)
-
-
REFERENCES table
ON DELETE (Referential Integrity)
- RESTRICT (Restrict the remote one if a record is found in the current Table)
- CASCADE (Delete both)
- SET NULL (Not if Field is set as NOT NULL)
- SET DEFAULT
Database Migrations
What are Migrations?
Migrations are version-controlled database schema changes.
Instead of manually running SQL commands, developers use migration files.
Example structure:
db/
└ migrations/
├ 001.sql
├ 002.sql
├ 003.sql
Each file contains SQL statements.
Migration Tool Workflow
Migration tools:
- Read migration files sequentially
- Execute SQL commands
- Track current database version
Example tools:
- dbmate
- go-migrate
Types of Migrations
Up Migration
Applies new changes: 'dbmate up'
Examples:
- Create table
- Add column
- Create index
Down Migration
Reverts previous changes.
Used for:
- Rollbacks
- Fixing production errors
- Created in Reverse Order to migrate down
Example:
DROP TABLE users;
Benefits of Migrations
1. Version Control
Database changes are tracked in Git.
2. Rollbacks
If something breaks in production:
- Database can revert to previous state.
Database Relationships
One-to-One Relationship
Each record in table A corresponds to exactly one record in table B.
Example:
users
user_profiles
Implementation:
- user_profiles.user_id = users.id
One-to-Many Relationship
One record in table A corresponds to multiple records in table B. We don't keep it as Primary Key, but keep it as Foreign Key.
Example:
projects → tasks
Implementation:
- tasks.project_id references projects.id
Many-to-Many Relationship
Multiple records from both tables relate to each other.
Example:
users ↔ projects
Solution:
-
Create linking table
-
Example:
project_members
Contains:
- project_id
- user_id
Composite primary key:
(project_id, user_id)
Indexing
What is an Index?
An index is a lookup structure that speeds up queries.
Analogy:
- Similar to the index page in a book.
Instead of scanning every row, the database:
- Looks up the index
- Finds row location
- Fetches data directly
Without Index
Database performs sequential scan.
For large datasets (millions of rows):
- Very slow
With Index
Database performs direct lookup.
Much faster queries.
When to Use Indexes
Create indexes for fields used in:
- WHERE clauses
- JOIN conditions
- ORDER BY clauses
Database Triggers
What is a Trigger?
A trigger is a database function automatically executed when certain events occur.
Example:
- Automatically update
updated_atcolumn whenever a row changes.
Trigger workflow:
- Row updated
- Trigger fires
- Timestamp updated automatically
Seeding Data
What is Seeding?
Seeding = inserting test data into a database.
'dbmate new seed_data'
Purpose:
- Development testing
- Simulating real user data
Typically done using migration scripts.
Backend Engineer Database Workflow
Typical backend database workflow:
- Design API endpoints
- Model database schema
- Write migration files
- Create indexes and constraints
- Write SQL queries
- Use parameterized queries
- Return results to frontend
Parameterized Queries
Definition
Parameterized queries allow safe insertion of user inputs into SQL queries.
Purpose:
- Prevent SQL Injection attacks
Example:
Instead of:
SELECT * FROM users WHERE id = " + userInput
Use:
SELECT * FROM users WHERE id = ?
User input is treated as data, not executable SQL.