<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Kiran Sabne — Engineering Notes]]></title><description><![CDATA[Databases · Backend Systems · Applied AI · Scaling Architecture]]></description><link>https://kiransabne.dev</link><generator>RSS for Node</generator><lastBuildDate>Wed, 15 Apr 2026 19:00:47 GMT</lastBuildDate><atom:link href="https://kiransabne.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[How do you write efficient pagination for millions of rows?]]></title><description><![CDATA[Efficient pagination is absolutely critical when dealing with millions of rows in PostgreSQL — especially if you care about performance, index usage, and UX for APIs or UI pages.
1. OFFSET/LIMIT Pagination (a.k.a. “Skip-Limit”)
SELECT * FROM orders
O...]]></description><link>https://kiransabne.dev/how-to-write-efficient-pagination-for-millions-of-rows</link><guid isPermaLink="true">https://kiransabne.dev/how-to-write-efficient-pagination-for-millions-of-rows</guid><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[Databases]]></category><category><![CDATA[Pagination]]></category><dc:creator><![CDATA[kiran sabne]]></dc:creator><pubDate>Fri, 09 Jan 2026 03:03:59 GMT</pubDate><content:encoded><![CDATA[<p>Efficient pagination is <strong>absolutely critical</strong> when dealing with <strong>millions of rows</strong> in PostgreSQL — especially if you care about performance, index usage, and UX for APIs or UI pages.</p>
<h2 id="heading-1-offsetlimit-pagination-aka-skip-limit">1. OFFSET/LIMIT Pagination (a.k.a. “Skip-Limit”)</h2>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> orders
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> created_at
<span class="hljs-keyword">OFFSET</span> <span class="hljs-number">10000</span>
<span class="hljs-keyword">LIMIT</span> <span class="hljs-number">50</span>;
</code></pre>
<h3 id="heading-pros">Pros:</h3>
<ul>
<li><p>Simple to implement</p>
</li>
<li><p>Page numbers are intuitive for UIs</p>
</li>
</ul>
<h3 id="heading-cons">Cons:</h3>
<ul>
<li><p><strong>Slower as OFFSET increases</strong> — PostgreSQL must <strong>scan and discard</strong> the skipped rows</p>
</li>
<li><p><strong>Non-deterministic</strong> under concurrent updates (rows can shift across pages)</p>
</li>
<li><p>Doesn’t scale well beyond ~100k skipped rows, since query time grows with the offset</p>
</li>
</ul>
<h3 id="heading-when-to-use">When to use:</h3>
<ul>
<li><p>Small datasets (or &lt;100 pages)</p>
</li>
<li><p>Admin dashboards or static data</p>
</li>
<li><p>Temporary/internal tooling</p>
</li>
</ul>
<h2 id="heading-2-keyset-pagination-aka-seek-method">2. <strong>Keyset Pagination (a.k.a. Seek Method)</strong></h2>
<h3 id="heading-description">Description:</h3>
<p>Use a <strong>cursor</strong> (like <code>created_at, id</code>) to fetch the next page.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- First page</span>
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> orders
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> created_at, <span class="hljs-keyword">id</span>
<span class="hljs-keyword">LIMIT</span> <span class="hljs-number">50</span>;

<span class="hljs-comment">-- Next page (cursor = last row)</span>
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> orders
<span class="hljs-keyword">WHERE</span> (created_at, <span class="hljs-keyword">id</span>) &gt; (<span class="hljs-string">'2024-07-01 10:00:00'</span>, <span class="hljs-string">'uuid-123'</span>)
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> created_at, <span class="hljs-keyword">id</span>
<span class="hljs-keyword">LIMIT</span> <span class="hljs-number">50</span>;
</code></pre>
<h3 id="heading-pros-1">Pros:</h3>
<ul>
<li><p><strong>Near-constant cost per page</strong>: an index seek, regardless of page depth</p>
</li>
<li><p>Safe under high concurrency</p>
</li>
<li><p>Deterministic (no skipping/overlaps)</p>
</li>
<li><p>Ideal for <strong>real-time apps</strong></p>
</li>
</ul>
<h3 id="heading-cons-1">Cons:</h3>
<ul>
<li><p>Can't jump to arbitrary page (e.g., page 10)</p>
</li>
<li><p>Cursor management required</p>
</li>
</ul>
<h3 id="heading-index-recommendation">Index recommendation:</h3>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">INDEX</span> <span class="hljs-keyword">ON</span> orders (created_at, <span class="hljs-keyword">id</span>);
</code></pre>
<h2 id="heading-3-hybrid-keyset-offset-for-jumping-to-specific-pages">3. <strong>Hybrid Keyset + Offset (for Jumping to Specific Pages)</strong></h2>
<h3 id="heading-description-1">Description:</h3>
<p>Use <strong>OFFSET</strong> for initial jump + <strong>keyset</strong> after that.</p>
<blockquote>
<p>E.g., cache the <code>(created_at, id)</code> of the first row on page 10 → use that as keyset.</p>
</blockquote>
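<p>A minimal sketch of the idea in Python (a psycopg2 connection and an <code>orders(id, created_at, ...)</code> table are assumptions here; <code>cursor_cache</code> would live in Redis or similar in production):</p>
<pre><code class="lang-python">cursor_cache = {}  # page number -&gt; (created_at, id) of that page's first row

def fetch_page(conn, page, page_size=50):
    with conn.cursor() as cur:
        if page in cursor_cache:
            # Fast path: keyset seek from the cached cursor, depth-independent
            cur.execute(
                "SELECT id, created_at FROM orders"
                " WHERE (created_at, id) &gt;= (%s, %s)"
                " ORDER BY created_at, id LIMIT %s",
                (*cursor_cache[page], page_size),
            )
        else:
            # Slow path: one-time OFFSET jump, then remember the page's cursor
            cur.execute(
                "SELECT id, created_at FROM orders"
                " ORDER BY created_at, id OFFSET %s LIMIT %s",
                ((page - 1) * page_size, page_size),
            )
        rows = cur.fetchall()
    if rows:
        cursor_cache[page] = (rows[0][1], rows[0][0])  # (created_at, id)
    return rows
</code></pre>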
<h3 id="heading-pros-2">Pros:</h3>
<ul>
<li><p>Allows page jumping</p>
</li>
<li><p>Still uses index efficiently from that point onward</p>
</li>
</ul>
<h3 id="heading-cons-2">Cons:</h3>
<ul>
<li><p>Adds complexity (requires cursor cache)</p>
</li>
<li><p>Still slow on very high page numbers if not cached</p>
</li>
</ul>
<h3 id="heading-use-case">Use case:</h3>
<ul>
<li>UIs where both page numbers and infinite scroll are required</li>
</ul>
<h2 id="heading-4-cursor-based-pagination-encoded-cursors">4. <strong>Cursor-based Pagination (Encoded Cursors)</strong></h2>
<p>This is <strong>commonly used in APIs</strong> (GraphQL, REST with cursor tokens).</p>
<h3 id="heading-description-2">Description:</h3>
<p>Return an opaque, <strong>base64-encoded cursor</strong> whose decoded payload looks like:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"next_cursor"</span>: <span class="hljs-string">"created_at=2024-07-01T10:00:00&amp;id=abc-123"</span>
}
</code></pre>
<p>In the next query:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> orders
<span class="hljs-keyword">WHERE</span> (created_at, <span class="hljs-keyword">id</span>) &gt; (:created_at, :<span class="hljs-keyword">id</span>)
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> created_at, <span class="hljs-keyword">id</span>
<span class="hljs-keyword">LIMIT</span> <span class="hljs-number">50</span>;
</code></pre>
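<p>A minimal sketch of encoding and decoding such a cursor (the payload fields mirror the keyset; this is one illustrative scheme, not the only one):</p>
<pre><code class="lang-python">import base64
import json

def encode_cursor(created_at: str, row_id: str) -&gt; str:
    # Pack the keyset values into an opaque, URL-safe token
    payload = json.dumps({"created_at": created_at, "id": row_id})
    return base64.urlsafe_b64encode(payload.encode()).decode()

def decode_cursor(token: str):
    payload = json.loads(base64.urlsafe_b64decode(token))
    return payload["created_at"], payload["id"]

token = encode_cursor("2024-07-01T10:00:00", "abc-123")
created_at, row_id = decode_cursor(token)  # feed into :created_at / :id
</code></pre>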
<h3 id="heading-pros-3">Pros:</h3>
<ul>
<li><p>API-friendly, stateless pagination</p>
</li>
<li><p>Easy to cache cursors</p>
</li>
<li><p>Works with dynamic filters</p>
</li>
</ul>
<h2 id="heading-how-to-handle-updates-concurrent-changes">How to Handle <strong>UPDATEs / Concurrent Changes</strong></h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Challenge</td><td>Problem</td></tr>
</thead>
<tbody>
<tr>
<td>Rows inserted/deleted during pagination</td><td>Causes "row shifting" on OFFSET pagination</td></tr>
<tr>
<td>Rows updated (e.g., created_at changed)</td><td>Might move between pages if paginating on that column</td></tr>
</tbody>
</table>
</div><h2 id="heading-for-large-scale-dataset-pagination">For Large Scale Dataset pagination</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Tip</td><td>Why it Matters</td></tr>
</thead>
<tbody>
<tr>
<td>Use composite indexes</td><td><code>(created_at, id)</code> is better than <code>created_at</code> alone</td></tr>
<tr>
<td>Avoid OFFSET over 10k</td><td>degrades performance fast</td></tr>
<tr>
<td>Consider materialized views or temp tables</td><td>for heavy analytical pagination</td></tr>
<tr>
<td>Expose cursor in your API layer</td><td>makes UX smooth &amp; stateless</td></tr>
<tr>
<td>Cache row cursors by page</td><td>hybrid offset-keyset model</td></tr>
<tr>
<td>Use <code>REPEATABLE READ</code> isolation</td><td>if pagination happens within a transaction to avoid data shifts</td></tr>
</tbody>
</table>
</div><h2 id="heading-when-to-use-each-method">When to Use Each Method</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Use Case</td><td>Recommended Pagination</td></tr>
</thead>
<tbody>
<tr>
<td>Admin dashboard &lt; 10k rows</td><td>OFFSET/LIMIT</td></tr>
<tr>
<td>Real-time web/mobile UI</td><td>Keyset pagination</td></tr>
<tr>
<td>Reporting with stable pages</td><td>Window function with <code>ROW_NUMBER()</code></td></tr>
<tr>
<td>APIs with infinite scroll</td><td>Cursor-based (encoded cursor tokens)</td></tr>
<tr>
<td>UI with both pages &amp; infinite scroll</td><td>Hybrid keyset + cursor caching</td></tr>
</tbody>
</table>
</div>]]></content:encoded></item><item><title><![CDATA[PostgreSQL Parameter Tuning for 100M+ Rows]]></title><description><![CDATA[When you’re running PostgreSQL with 100M+ rows, the difference between “it works” and “it flies” is all about tuning—but not the generic, copy-paste advice you get everywhere. Here’s the advanced, battle-tested checklist I’d give to serious DBAs and ...]]></description><link>https://kiransabne.dev/postgresql-parameter-tuning-for-100m-rows</link><guid isPermaLink="true">https://kiransabne.dev/postgresql-parameter-tuning-for-100m-rows</guid><category><![CDATA[PostgreSQL]]></category><category><![CDATA[Databases]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[Programming Blogs]]></category><category><![CDATA[query-optimization]]></category><dc:creator><![CDATA[kiran sabne]]></dc:creator><pubDate>Sun, 04 Jan 2026 18:30:39 GMT</pubDate><content:encoded><![CDATA[<p>When you’re running PostgreSQL with 100M+ rows, the difference between “it works” and “it flies” is all about <em>tuning</em>—but not the generic, copy-paste advice you get everywhere. Here’s <strong>the advanced, battle-tested checklist</strong> I’d give to serious DBAs and developers working at scale.</p>
<h2 id="heading-the-hidden-checklist-you-actually-need">The Hidden Checklist You Actually Need</h2>
<h3 id="heading-1-understand-your-workload-first">1. Understand Your Workload First</h3>
<p>Not a setting, but <strong>non-negotiable</strong>.<br />Ask yourself:</p>
<ul>
<li><p>OLTP or OLAP?</p>
</li>
<li><p>Read-heavy or write-heavy?</p>
</li>
<li><p>Bulk inserts? Lots of small transactions?</p>
</li>
<li><p>Real-time queries or batch processing?</p>
</li>
</ul>
<p>Every parameter depends on this.</p>
<h3 id="heading-2-sharedbuffers-but-dont-max-it-out">2. <code>shared_buffers</code> – But Don’t Max It Out</h3>
<p><strong>Common advice</strong>: set <code>shared_buffers = 25% of RAM</code><br /><strong>Better</strong>:</p>
<ol>
<li><p>If your queries are heavy on joins and large scans: go higher (30–40%)</p>
</li>
<li><p>For many connections and small transactions: stay conservative (15–20%)</p>
</li>
<li><p>If using a connection pooler: you can afford a bit more here</p>
</li>
</ol>
<p>Rule of Thumb: benchmark at 25%, then test up/down in 5% steps.</p>
<h3 id="heading-3-workmem-the-secret-weapon-for-big-joins">3. <code>work_mem</code> – The Secret Weapon for Big Joins</h3>
<p><strong>This controls sort/hash memory per operation</strong>, not per query.<br />Default is <em>tiny</em>. Too tiny.</p>
<p>Tune it dynamically:</p>
<pre><code class="lang-python">SET work_mem = <span class="hljs-string">'256MB'</span>;  -- per session
</code></pre>
<p>But for config:</p>
<pre><code class="lang-python">work_mem = <span class="hljs-number">64</span>MB   <span class="hljs-comment"># OLAP</span>
work_mem = <span class="hljs-number">4</span><span class="hljs-number">-16</span>MB <span class="hljs-comment"># OLTP</span>
</code></pre>
<p>Use <code>EXPLAIN ANALYZE</code> → look for <strong>“disk”</strong> in sort/hash ops. If you see that? Bump it.</p>
<h3 id="heading-4-maintenanceworkmem-bulk-ops-love-this">4. <code>maintenance_work_mem</code> – Bulk Ops Love This</h3>
<p>Used for <code>VACUUM</code>, <code>CREATE INDEX</code>, etc.</p>
<ol>
<li><p>Default is too low (64MB).</p>
</li>
<li><p>Crank it up <em>when running vacuums manually</em>:</p>
</li>
</ol>
<pre><code class="lang-python">maintenance_work_mem = <span class="hljs-number">1</span>GB
</code></pre>
<p>If you're indexing 100M rows, go <strong>2–4GB</strong> (if RAM allows).</p>
<h3 id="heading-5-effectivecachesize-inform-the-planner">5. <code>effective_cache_size</code> – Inform the Planner</h3>
<p><strong>Doesn't use memory</strong>, just tells the planner how much OS cache is available.<br />Set it to 50–75% of total RAM:</p>
<pre><code class="lang-python">effective_cache_size = <span class="hljs-number">24</span>GB  <span class="hljs-comment"># on a 32GB machine</span>
</code></pre>
<p>Helps avoid bad nested loop plans on big tables.</p>
<h3 id="heading-6-autovacuum-tuning-the-untold-bottleneck">6. Autovacuum Tuning – The Untold Bottleneck</h3>
<p>Massive tables <strong>require aggressive VACUUM tuning</strong>, otherwise bloat kills you.</p>
<p>In <code>postgresql.conf</code> or via <code>ALTER TABLE</code>:</p>
<pre><code class="lang-python">autovacuum_vacuum_threshold = <span class="hljs-number">1000</span>
autovacuum_vacuum_scale_factor = <span class="hljs-number">0.01</span>
autovacuum_analyze_scale_factor = <span class="hljs-number">0.005</span>
autovacuum_max_workers = <span class="hljs-number">5</span>
autovacuum_naptime = <span class="hljs-number">10</span>s
autovacuum_vacuum_cost_limit = <span class="hljs-number">2000</span>
</code></pre>
<p>For HOT update-heavy workloads, consider <strong>lower thresholds + more workers</strong>.</p>
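<p>These settings can also be applied per table, which is usually where the aggression belongs. A sketch (table name and DSN are placeholders, psycopg2 assumed):</p>
<pre><code class="lang-python">import psycopg2

# Storage parameters override postgresql.conf for this table only --
# big, hot tables typically need far more aggressive autovacuum.
with psycopg2.connect("dbname=mydb") as conn, conn.cursor() as cur:
    cur.execute("""
        ALTER TABLE big_orders SET (
            autovacuum_vacuum_scale_factor  = 0.01,
            autovacuum_analyze_scale_factor = 0.005,
            autovacuum_vacuum_cost_limit    = 2000
        )
    """)
</code></pre>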
<h3 id="heading-7-parallelism-parameters-scale-joins-aggregates">7. Parallelism Parameters – Scale Joins + Aggregates</h3>
<p>Enable Postgres to use <strong>parallel query features</strong>.</p>
<pre><code class="lang-python">max_parallel_workers = <span class="hljs-number">8</span>
max_parallel_workers_per_gather = <span class="hljs-number">4</span>
parallel_tuple_cost = <span class="hljs-number">0.1</span>
parallel_setup_cost = <span class="hljs-number">1000</span>
</code></pre>
<p>Lower <code>parallel_tuple_cost</code> and <code>parallel_setup_cost</code> to encourage parallelism.</p>
<h3 id="heading-8-randompagecost-amp-seqpagecost-for-ssd-optimization">8. <code>random_page_cost</code> &amp; <code>seq_page_cost</code> – For SSD Optimization</h3>
<p>If on SSD (you should be), reduce these to reflect reality.</p>
<pre><code class="lang-python">random_page_cost = <span class="hljs-number">1.1</span>
seq_page_cost = <span class="hljs-number">1.0</span>
</code></pre>
<p>Default assumes spinning disks. SSDs have less cost difference between random vs sequential access.</p>
<h3 id="heading-9-connection-limits-dont-overload">9. Connection Limits – Don’t Overload</h3>
<p>Too many active connections will kill performance.</p>
<p>Use a <strong>connection pooler</strong> like <strong>PgBouncer</strong>:</p>
<pre><code class="lang-python">max_connections = <span class="hljs-number">100</span>  <span class="hljs-comment"># keep it low</span>
</code></pre>
<p>Let PgBouncer manage thousands of app connections.</p>
<h3 id="heading-10-logging-for-insightful-tuning">10. Logging for Insightful Tuning</h3>
<p>Turn on query logging to find bad queries:</p>
<pre><code class="lang-python">log_min_duration_statement = <span class="hljs-number">1000</span>  <span class="hljs-comment"># ms</span>
log_checkpoints = on
log_autovacuum_min_duration = <span class="hljs-number">0</span>
log_temp_files = <span class="hljs-number">0</span>
</code></pre>
<p>Then mine the logs and use <code>pg_stat_statements</code> for real performance work.</p>
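<p>For example, a quick way to pull the top offenders (column names are for PostgreSQL 13+, where <code>total_time</code> became <code>total_exec_time</code>; psycopg2 assumed):</p>
<pre><code class="lang-python">import psycopg2

# Top 10 statements by total execution time. pg_stat_statements must be in
# shared_preload_libraries and created via CREATE EXTENSION first.
SQL = """
    SELECT query, calls, total_exec_time, mean_exec_time
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 10
"""

with psycopg2.connect("dbname=mydb") as conn, conn.cursor() as cur:
    cur.execute(SQL)
    for query, calls, total_ms, mean_ms in cur.fetchall():
        print(f"{total_ms:10.1f} ms total  {mean_ms:8.2f} ms avg  {calls:6}x  {query[:60]}")
</code></pre>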
<h2 id="heading-bonus-tableindex-level-tricks">Bonus: Table/Index-Level Tricks</h2>
<ul>
<li><p>Use <strong>BRIN indexes</strong> for append-only, timestamped data</p>
</li>
<li><p>Use <strong>partial indexes</strong> if only part of the data is queried often</p>
</li>
<li><p>Consider <strong>UNLOGGED tables</strong> for transient data (faster inserts, no WAL)</p>
</li>
<li><p>Use <code>pg_repack</code> to reclaim bloat without locking tables</p>
</li>
<li><p>Implement Partitioning &amp; Data Archival</p>
</li>
<li><p>Smart Indexing &amp; Monitoring index bloat regularly with pgstattuple</p>
</li>
<li><p>Tablespaces, Compression &amp; Storage</p>
</li>
<li><p>Logical/Native Replication to scale out</p>
</li>
<li><p>Aggressive Monitoring, Debugging &amp; Benchmarking</p>
</li>
</ul>
<h2 id="heading-tldr-the-advanced-tune-checklist">TL;DR – The Advanced Tune Checklist</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Parameter</td><td>Suggested Value</td><td>Notes</td></tr>
</thead>
<tbody>
<tr>
<td>shared_buffers</td><td>25–40% RAM</td><td>Don’t go too high</td></tr>
<tr>
<td>work_mem</td><td>8MB–256MB</td><td>Depends on queries</td></tr>
<tr>
<td>maintenance_work_mem</td><td>1–4GB</td><td>For index/vacuum</td></tr>
<tr>
<td>effective_cache_size</td><td>~75% RAM</td><td>Informs planner</td></tr>
<tr>
<td>autovacuum_*</td><td>Aggressive</td><td>Keep bloat down</td></tr>
<tr>
<td>max_connections</td><td>≤ 100</td><td>Use PgBouncer</td></tr>
<tr>
<td>parallel_workers_*</td><td>4–8</td><td>Enable parallel queries</td></tr>
<tr>
<td>random_page_cost</td><td>1.1 (SSD)</td><td>Lower for SSDs</td></tr>
</tbody>
</table>
</div><p>If you're managing 100M+ rows, this is not optional anymore. It’s <strong>engineering</strong>. And it works. Add your scaling experience in the comments below.</p>
]]></content:encoded></item><item><title><![CDATA[Understanding and Solving Cache Stampede: The Invisible Threat to Databases]]></title><description><![CDATA[Imagine this:
Your high-traffic app is humming along smoothly. Most data requests are being served instantly from Redis or another blazing-fast cache layer. But then — a cache key expires, and within milliseconds, 10,000 clients hit your backend at on...]]></description><link>https://kiransabne.dev/understanding-and-solving-cache-stampede-the-invisible-threat-to-databases</link><guid isPermaLink="true">https://kiransabne.dev/understanding-and-solving-cache-stampede-the-invisible-threat-to-databases</guid><category><![CDATA[PostgreSQL]]></category><category><![CDATA[Redis]]></category><category><![CDATA[caching]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[Databases]]></category><category><![CDATA[performance]]></category><category><![CDATA[scalability]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[kiran sabne]]></dc:creator><pubDate>Sat, 03 Jan 2026 09:25:17 GMT</pubDate><content:encoded><![CDATA[<p>Imagine this:</p>
<p>Your high-traffic app is humming along smoothly. Most data requests are being served instantly from Redis or another blazing-fast cache layer. But then — <strong>a cache key expires</strong>, and within milliseconds, <strong>10,000 clients</strong> hit your backend at once trying to fetch the same data from the <strong>database</strong>.</p>
<p><strong>Boom</strong> — your production database is slammed, response times skyrocket, and the system becomes unresponsive.</p>
<p>You’ve just been trampled by a <strong>cache stampede</strong>.</p>
<h2 id="heading-what-is-a-cache-stampede">What is a Cache Stampede?</h2>
<p>A <strong>cache stampede</strong> (also called a <strong>cache miss storm</strong>) occurs when:</p>
<ol>
<li><p>A popular cache key expires</p>
</li>
<li><p>Many clients attempt to read the same key</p>
</li>
<li><p>All of them get a <strong>cache miss</strong></p>
</li>
<li><p>All of them <strong>bypass the cache simultaneously</strong></p>
</li>
<li><p>They hit the <strong>backend/database at once</strong></p>
</li>
<li><p>Overload happens — especially in high-concurrency environments</p>
</li>
</ol>
<blockquote>
<p>Even if your cache hit rate is 99%, <strong>a stampede on 1% of traffic can collapse the system</strong>.</p>
</blockquote>
<h2 id="heading-why-cache-stampedes-happen">Why Cache Stampedes Happen</h2>
<p>Caches are typically built with a <strong>cache-aside pattern</strong>, where:</p>
<ul>
<li><p>You <strong>check the cache</strong> first</p>
</li>
<li><p>If the key is missing, you <strong>recompute or fetch from the DB</strong></p>
</li>
<li><p>Then <strong>store it back into the cache</strong></p>
</li>
</ul>
<p>This works great for individual requests… but under heavy concurrency, when <strong>many requests miss the same key</strong>, all of them go through this pattern <strong>at the same time</strong>:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_data</span>(<span class="hljs-params">key</span>):</span>
    data = redis.get(key)
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> data:
        data = query_postgres()
        redis.set(key, data, ex=<span class="hljs-number">60</span>)
    <span class="hljs-keyword">return</span> data
</code></pre>
<p>Without protection, this causes 1000s of concurrent <code>query_postgres()</code> calls.</p>
<h2 id="heading-real-world-impact">Real-world Impact</h2>
<p>A cache stampede causes:</p>
<ul>
<li><p><strong>Sudden spikes in DB traffic</strong> (often 10x to 100x)</p>
</li>
<li><p><strong>Connection pool exhaustion</strong></p>
</li>
<li><p><strong>Slow queries, timeouts, or even crashes</strong></p>
</li>
<li><p><strong>Denial of service</strong> for downstream services</p>
</li>
<li><p><strong>Resource contention across your stack</strong></p>
</li>
</ul>
<h2 id="heading-when-does-this-happen">When Does This Happen?</h2>
<ul>
<li><p>After a <strong>cache TTL expires</strong></p>
</li>
<li><p>After a <strong>cache eviction</strong> (due to memory pressure)</p>
</li>
<li><p>During <strong>cold starts</strong> or <strong>deployment rollouts</strong></p>
</li>
<li><p>When there’s <strong>only one shared cache key</strong> for a popular item (e.g., homepage data)</p>
</li>
</ul>
<h2 id="heading-how-to-mitigate-cache-stampedes">How to Mitigate Cache Stampedes</h2>
<p>There are a few ways to mitigate cache stampedes.</p>
<h3 id="heading-1-request-coalescing-single-flight-pattern">1. <strong>Request Coalescing / Single-flight Pattern</strong></h3>
<p>Let only <strong>one request rebuild the cache</strong>, while others <strong>wait for it</strong> to finish.</p>
<p><strong>Concept</strong>:</p>
<ul>
<li><p>First request takes a lock and fetches the data from DB</p>
</li>
<li><p>Others wait for that request to populate the cache</p>
</li>
</ul>
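<p>A minimal sketch with a Redis lock key (reusing <code>query_postgres</code> from the earlier example; the timeouts are illustrative):</p>
<pre><code class="lang-python">import time

def get_data(key):
    data = redis.get(key)
    if data is not None:
        return data
    # Only the first miss wins the rebuild lock (SET ... NX EX 10)
    if redis.set(f"lock:{key}", 1, nx=True, ex=10):
        try:
            data = query_postgres()
            redis.set(key, data, ex=60)
        finally:
            redis.delete(f"lock:{key}")
        return data
    # Everyone else polls briefly until the winner fills the cache
    for _ in range(50):
        time.sleep(0.1)
        data = redis.get(key)
        if data is not None:
            return data
    return query_postgres()  # last resort if the winner died
</code></pre>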
<h3 id="heading-2-add-jitter-randomized-expiry">2. <strong>Add Jitter (Randomized Expiry)</strong></h3>
<p>Avoid simultaneous cache expiry by <strong>randomizing TTLs</strong>.</p>
<pre><code class="lang-json">ttl = random.randint(<span class="hljs-number">50</span>, <span class="hljs-number">70</span>)
redis.set(key, data, ex=ttl)
</code></pre>
<p>Prevents a "herd" of keys expiring together<br />Works best in high-concurrency apps with shared TTLs</p>
<h3 id="heading-3-serve-stale-data-while-rebuilding">3. <strong>Serve Stale Data While Rebuilding</strong></h3>
<p>Don’t block the user when the cache expires.</p>
<p>Instead:</p>
<ul>
<li><p>Return the <strong>stale value</strong></p>
</li>
<li><p>Trigger a <strong>background refresh</strong></p>
</li>
</ul>
<p>This is known as <strong>“stale-while-revalidate”</strong>.</p>
<p><strong>Implementation Approach</strong>:</p>
<ul>
<li><p>Store TTL metadata separately</p>
</li>
<li><p>If TTL is expired, serve stale data and refresh in background thread</p>
</li>
</ul>
<pre><code class="lang-json">if redis.ttl(key) &lt;= <span class="hljs-number">0</span>:
    async_refresh(key)
return redis.get(key)
</code></pre>
<h3 id="heading-4-proactive-cache-warming-preload">4. <strong>Proactive Cache Warming / Preload</strong></h3>
<p>Use background jobs to <strong>preload popular cache keys</strong> before they expire.</p>
<p>Example:</p>
<pre><code class="lang-json">hot_keys = ['homepage', 'top_products', 'user_analytics']

for key in hot_keys:
    data = query_postgres()
    redis.set(key, data, ex=<span class="hljs-number">60</span>)
</code></pre>
<p>Keeps your most important data hot<br />Ideal for dashboards, trending items, etc.</p>
<p>Use schedulers like <code>cron</code>, <code>Celery beat</code>, <code>Sidekiq</code>, etc.</p>
<h3 id="heading-5-use-multi-level-cache-l1-l2">5. <strong>Use Multi-Level Cache (L1 + L2)</strong></h3>
<p>Layered cache architecture:</p>
<ul>
<li><p><strong>L1 cache</strong>: In-process or in-memory (e.g. Python <code>LRUCache</code>)</p>
</li>
<li><p><strong>L2 cache</strong>: Redis / Memcached</p>
</li>
<li><p><strong>L3</strong>: Database</p>
</li>
</ul>
<p>Each layer reduces pressure on the next.</p>
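<p>A sketch of the L1 + L2 read path (this assumes the third-party <code>cachetools</code> package for the in-process TTL cache, and a <code>query_postgres(key)</code> helper):</p>
<pre><code class="lang-python">import redis
from cachetools import TTLCache

l1 = TTLCache(maxsize=1024, ttl=5)  # in-process: absorbs hot-key bursts
l2 = redis.Redis()                  # shared across app instances

def get_data(key):
    if key in l1:                   # L1 hit: no network round trip at all
        return l1[key]
    data = l2.get(key)              # L2 hit: one Redis round trip
    if data is None:
        data = query_postgres(key)  # L3: the database, last resort
        l2.set(key, data, ex=60)
    l1[key] = data
    return data
</code></pre>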
<h3 id="heading-6-batch-writes-if-cache-miss-triggers-db-writes">6. <strong>Batch Writes (if cache miss triggers DB writes)</strong></h3>
<p>If the cache miss causes <strong>multiple writes</strong>, use <strong>queues</strong> like Kafka or Redis Streams to <strong>batch</strong> and smooth load.</p>
<h2 id="heading-postgresql-tips-for-surviving-stampedes">PostgreSQL Tips for Surviving Stampedes</h2>
<p>If it still happens:</p>
<ol>
<li><p><strong>Use PgBouncer</strong>: Reduces connection overhead</p>
</li>
<li><p><strong>Add read replicas</strong>: Distribute SELECT load</p>
</li>
<li><p><strong>Materialized Views</strong>: Precompute expensive queries</p>
</li>
<li><p><strong>Analyze slow queries</strong>: Use <code>EXPLAIN ANALYZE</code> + indexes</p>
</li>
<li><p><strong>Use rate-limiting middleware</strong>: Prevent DoS</p>
</li>
</ol>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Cache stampedes are <strong>easy to miss in staging</strong>, but devastating in production. You don’t need a DDoS to bring down your system — just a single cache key expiring under high load.</p>
<p>The fix isn’t just “increase cache TTL” — it’s about <strong>smart architecture</strong>:</p>
<ul>
<li><p>Let only one request rebuild</p>
</li>
<li><p>Serve stale when you can</p>
</li>
<li><p>Add randomness to TTL</p>
</li>
<li><p>Use background warming.</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Create a Semantic Search API with FastAPI, Sentence-BERT, and PostgreSQL pgvector]]></title><description><![CDATA[In this post, I’ll walk you through how I built a semantic similarity search API using FastAPI, Sentence-BERT (SBERT), and PostgreSQL with the pgvector extension.This project began as a proof of concept (POC) to explore how seamlessly we can integrat...]]></description><link>https://kiransabne.dev/create-a-semantic-search-api-with-fastapi-sentence-bert-and-postgresql-pgvector</link><guid isPermaLink="true">https://kiransabne.dev/create-a-semantic-search-api-with-fastapi-sentence-bert-and-postgresql-pgvector</guid><category><![CDATA[PostgreSQL]]></category><category><![CDATA[postgres]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[SQL]]></category><dc:creator><![CDATA[kiran sabne]]></dc:creator><pubDate>Tue, 14 Oct 2025 18:10:15 GMT</pubDate><content:encoded><![CDATA[<p>In this post, I’ll walk you through how I built a <strong>semantic similarity search API</strong> using <strong>FastAPI</strong>, <strong>Sentence-BERT (SBERT)</strong>, and <strong>PostgreSQL</strong> with the <strong>pgvector</strong> extension.<br />This project began as a <strong>proof of concept (POC)</strong> to explore how seamlessly we can integrate <strong>deep learning–based text embeddings</strong> into a <strong>traditional relational database</strong> for fast, contextual search — without introducing an external vector database or adding unnecessary complexity to the existing tech stack. Link to Repo - <a target="_blank" href="https://github.com/kiransabne04/fastapi-sbert-pgvector-similarity">kiransabne04/fastapi-sbert-pgvector-similarity: A FastAPI-based text similarity and semantic search API using Sentence-BERT (all-MiniLM-L6-v2) with PostgreSQL + pgvector for vector storage and similarity matching.</a></p>
<h2 id="heading-why-semantic-search">Why Semantic Search?</h2>
<p>Traditional keyword search only matches <strong>exact terms</strong>.<br />Semantic search, on the other hand, understands <strong>context</strong> — for example, the phrases:</p>
<blockquote>
<p>“How do I reset my password?”<br />and<br />“Forgot my login credentials.”</p>
</blockquote>
<p>mean the same thing, even though they use completely different words.</p>
<p>That’s what <strong>Sentence-BERT (SBERT)</strong> enables — it transforms sentences into numerical <strong>embeddings</strong> that capture semantic meaning.<br />Once we have those embeddings, we can use <strong>vector similarity</strong> (like cosine similarity) to find text with similar meaning.</p>
<h2 id="heading-high-overview">High Overview</h2>
<p>Here’s the high-level flow of the system we built:</p>
<ol>
<li><p><strong>FastAPI</strong> serves REST endpoints for inserting and searching text.</p>
</li>
<li><p><strong>Sentence-BERT</strong> generates a 384-dimensional vector for each text.</p>
</li>
<li><p><strong>PostgreSQL with pgvector</strong> stores these embeddings and performs fast similarity queries using vector math.</p>
</li>
</ol>
<h2 id="heading-tech-stack">Tech Stack</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Component</td><td>Description</td></tr>
</thead>
<tbody>
<tr>
<td><strong>FastAPI</strong></td><td>Web framework for the API</td></tr>
<tr>
<td><strong>Sentence-Transformers</strong></td><td>Generates text embeddings (SBERT)</td></tr>
<tr>
<td><strong>PostgreSQL 15 + pgvector</strong></td><td>Stores embeddings and runs similarity search</td></tr>
<tr>
<td><strong>Uvicorn</strong></td><td>ASGI server for FastAPI</td></tr>
<tr>
<td><strong>Docker Compose</strong></td><td>Spins up Postgres with vector support</td></tr>
</tbody>
</table>
</div><p>Model used:</p>
<blockquote>
<p><code>all-MiniLM-L6-v2</code> — lightweight, accurate, and great for quick experimentation.</p>
</blockquote>
<p>Here’s how the key pieces fit together.</p>
<h3 id="heading-sentence-bert-embeddings">Sentence-BERT Embeddings</h3>
<p>Using Hugging Face’s <code>sentence-transformers</code> library, each text is transformed into a 384-dimensional embedding vector:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sentence_transformers <span class="hljs-keyword">import</span> SentenceTransformer

model = SentenceTransformer(<span class="hljs-string">"all-MiniLM-L6-v2"</span>)

text = <span class="hljs-string">"A forgotten attic filled with old memories"</span>
embedding = model.encode(text)
print(len(embedding))  <span class="hljs-comment"># 384</span>
</code></pre>
<h3 id="heading-postgresql-pgvector">PostgreSQL + pgvector</h3>
<p>To store and search these embeddings efficiently, we enable the <code>pgvector</code> extension.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> EXTENSION <span class="hljs-keyword">IF</span> <span class="hljs-keyword">NOT</span> <span class="hljs-keyword">EXISTS</span> vector;

<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> items (
    <span class="hljs-keyword">id</span> <span class="hljs-built_in">SERIAL</span> PRIMARY <span class="hljs-keyword">KEY</span>,
    title <span class="hljs-built_in">TEXT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    description <span class="hljs-built_in">TEXT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    embedding VECTOR(<span class="hljs-number">384</span>),
    created_at <span class="hljs-built_in">TIMESTAMP</span> <span class="hljs-keyword">DEFAULT</span> <span class="hljs-keyword">NOW</span>()
);
</code></pre>
<p>We then create a vector index to speed up similarity queries:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">INDEX</span> items_embedding_idx
<span class="hljs-keyword">ON</span> items
<span class="hljs-keyword">USING</span> ivfflat (embedding vector_cosine_ops)
<span class="hljs-keyword">WITH</span> (lists = <span class="hljs-number">100</span>);
</code></pre>
<h3 id="heading-fastapi-endpoints">FastAPI Endpoints</h3>
<h4 id="heading-insertitem"><code>/insert_item</code></h4>
<p>Accepts multiple text items and inserts their embeddings into PostgreSQL.</p>
<pre><code class="lang-sql">@app.post('/insert_item')
def insert_items(request: ItemRequest):
    embeddings = [embedder.encode(i.description) for i in request.item_requests]
    <span class="hljs-comment"># Store in DB as vector</span>
</code></pre>
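<p>The storage step itself is small. A sketch with psycopg2 (pgvector accepts the <code>'[x,y,...]'</code> text form with a <code>::vector</code> cast):</p>
<pre><code class="lang-python">import psycopg2

def store_item(conn, title: str, description: str, embedding) -&gt; None:
    # Serialize the 384-dim embedding as '[0.1,0.2,...]' for the vector cast
    vec = "[" + ",".join(str(float(x)) for x in embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO items (title, description, embedding)"
            " VALUES (%s, %s, %s::vector)",
            (title, description, vec),
        )
    conn.commit()
</code></pre>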
<h4 id="heading-findsimilar"><code>/find_similar</code></h4>
<p>Finds top-N most similar descriptions by comparing embeddings using pgvector’s cosine similarity operator <code>&lt;=&gt;</code>. There are other similarity operators as well for you to explore.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> title, description, <span class="hljs-number">1</span> - (embedding &lt;=&gt; %s::vector) <span class="hljs-keyword">AS</span> similarity
<span class="hljs-keyword">FROM</span> items
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> similarity <span class="hljs-keyword">DESC</span>
<span class="hljs-keyword">LIMIT</span> <span class="hljs-number">5</span>;
</code></pre>
<p>The result: a list of text items with <strong>semantic similarity scores</strong>.</p>
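<p>One subtlety: the ivfflat index is used for nearest-neighbor <code>ORDER BY</code> on the raw distance, ascending. Ordering by the derived <code>similarity DESC</code> works but may fall back to a sequential scan, so in practice I issue it like this (a sketch, psycopg2 assumed):</p>
<pre><code class="lang-python">SQL = """
    SELECT title, description, 1 - (embedding &lt;=&gt; %(q)s::vector) AS similarity
    FROM items
    ORDER BY embedding &lt;=&gt; %(q)s::vector  -- ascending distance: index-friendly
    LIMIT 5
"""

def find_similar(conn, query_text: str):
    q = "[" + ",".join(str(float(x)) for x in embedder.encode(query_text)) + "]"
    with conn.cursor() as cur:
        cur.execute(SQL, {"q": q})
        return cur.fetchall()
</code></pre>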
<h3 id="heading-example-in-action">Example in Action</h3>
<h3 id="heading-input-text">Input Text</h3>
<blockquote>
<p>“The smell of aged paper and leather in a quiet bookstore.”</p>
</blockquote>
<h3 id="heading-output">Output</h3>
<pre><code class="lang-sql">{
  "similar_description": [
    {
      "title": "A vintage bookstore",
      "description_text": "The bookstore smelled of aged paper and leather...",
      "similarity": 0.86,
      "similarity_percent": 86.42
    },
    {
      "title": "A forgotten attic",
      "description_text": "The air in the attic hung heavy <span class="hljs-keyword">with</span> the scent <span class="hljs-keyword">of</span> forgotten things...<span class="hljs-string">",
      "</span>similarity<span class="hljs-string">": 0.74,
      "</span>similarity_percent<span class="hljs-string">": 74.10
    }
  ]
}</span>
</code></pre>
<p>That’s semantic similarity in action. You can further improve or optimize it based on your requirements and any preprocessing steps.</p>
<h2 id="heading-alternatives-to-sbert">Alternatives to SBERT</h2>
<p>A few other alternatives are:</p>
<ol>
<li><p>intfloat/e5-large-v2</p>
</li>
<li><p>nomic-ai/nomic-embed-text-v1</p>
</li>
<li><p>OpenAI Embeddings (<code>text-embedding-3-large</code>)</p>
</li>
<li><p>Cohere Embeddings (<code>embed-multilingual-v3.0</code>)</p>
</li>
<li><p>Sentence-T5 or Universal Sentence Encoder (USE)</p>
</li>
</ol>
<h2 id="heading-thoughts">Thoughts:</h2>
<p>This little POC, with proper architecture &amp; implementation, has worked wonders for one of our use cases. With just a few hundred lines of Python and SQL, you can build a real semantic search engine — no external AI infrastructure required.</p>
<p>If you’re exploring <strong>NLP, information retrieval, or vector databases</strong>, this is one of the best starting points you can build on.</p>
]]></content:encoded></item><item><title><![CDATA[PostgreSQL Indexing: When BRIN Is a Better Choice Than B-Tree]]></title><description><![CDATA[If you’re indexing huge, append-only tables with naturally ordered data — BRIN can give you massive performance and space benefits compared to B-Tree.
What is a BRIN Index?
BRIN = Block Range Index
A BRIN index doesn't index every row, unlike B-Tree....]]></description><link>https://kiransabne.dev/postgresql-indexing-when-brin-is-a-better-choice-than-b-tree</link><guid isPermaLink="true">https://kiransabne.dev/postgresql-indexing-when-brin-is-a-better-choice-than-b-tree</guid><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[Databases]]></category><category><![CDATA[PostgreSQL]]></category><category><![CDATA[Programming Blogs]]></category><dc:creator><![CDATA[kiran sabne]]></dc:creator><pubDate>Sat, 04 Oct 2025 09:42:21 GMT</pubDate><content:encoded><![CDATA[<p>If you’re indexing <em>huge, append-only</em> tables with <em>naturally ordered</em> data — <strong>BRIN</strong> can give you massive performance and space benefits compared to <strong>B-Tree</strong>.</p>
<h2 id="heading-what-is-a-brin-index">What is a BRIN Index?</h2>
<p><strong>BRIN</strong> = <strong>Block Range Index</strong></p>
<p>A BRIN index <strong>doesn't index every row</strong>, unlike B-Tree. Instead, it stores <strong>summary data (min, max)</strong> about <em>physical blocks</em> of rows.</p>
<p>Think of BRIN as a <strong>metadata guide</strong>: it narrows down <strong>where</strong> to search, not <strong>what</strong> to return.</p>
<h2 id="heading-how-brin-works-internally">How BRIN Works Internally</h2>
<p>When you create a <strong>BRIN index</strong> on a column, PostgreSQL organizes the index <strong>by summarizing values over physical block ranges</strong> in the table.</p>
<p>For <strong>each block range</strong>, the BRIN index stores <strong>summary metadata</strong>, including:</p>
<ul>
<li><p><strong>Minimum value</strong> in the range</p>
</li>
<li><p><strong>Maximum value</strong> in the range</p>
</li>
<li><p>And other summary data, depending on the BRIN operator class</p>
</li>
</ul>
<h3 id="heading-during-index-creation">During Index Creation</h3>
<ul>
<li><p>PostgreSQL <strong>scans the table</strong> in fixed-size block ranges.</p>
</li>
<li><p>For each range, it <strong>stores a tuple</strong> in the BRIN index with the min/max values (and possibly other stats).</p>
</li>
</ul>
<h3 id="heading-during-query-execution">During Query Execution</h3>
<ol>
<li><p>When a query has a <strong>search condition</strong> (e.g., <code>WHERE column = 42</code>):</p>
</li>
<li><p>PostgreSQL consults the BRIN index to <strong>check the min/max values for each range</strong>.</p>
</li>
<li><p>If the search value <strong>falls outside</strong> a block range’s min/max → <strong>the range is skipped</strong>.</p>
</li>
<li><p>If it <strong>falls inside</strong> the range → PostgreSQL <strong>performs a heap scan</strong> on that range (similar to a sequential scan but limited to that range).</p>
</li>
</ol>
<blockquote>
<p>BRIN does <strong>not</strong> point to individual rows—just ranges. It’s extremely space-efficient, but depends heavily on <strong>data correlation</strong> (e.g., time-ordered inserts).</p>
</blockquote>
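<p>You can check that correlation directly from the planner's statistics before choosing BRIN; values near 1.0 (or -1.0) mean the column closely tracks the physical row order. A sketch (it assumes the <code>sensor_data</code> table used in the benchmark below and a recent <code>ANALYZE</code>):</p>
<pre><code class="lang-python">import psycopg2

# Physical-order correlation of a column, straight from pg_stats
SQL = """
    SELECT attname, correlation
    FROM pg_stats
    WHERE tablename = %s AND attname = %s
"""

with psycopg2.connect("dbname=mydb") as conn, conn.cursor() as cur:
    cur.execute(SQL, ("sensor_data", "event_time"))
    print(cur.fetchone())  # e.g. ('event_time', 0.98) -&gt; great BRIN candidate
</code></pre>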
<h2 id="heading-brin-vs-b-tree-feature-comparison">BRIN vs. B-Tree: Feature Comparison</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>B-Tree</td><td>BRIN</td></tr>
</thead>
<tbody>
<tr>
<td>Index size</td><td>Large (grows with rows)</td><td>Tiny (constant per block)</td></tr>
<tr>
<td>Query performance</td><td>Fast, consistent</td><td>Depends on data distribution</td></tr>
<tr>
<td>Best for</td><td>Random access</td><td>Sequential, append-only data</td></tr>
<tr>
<td>Build time</td><td>Slow on big tables</td><td>Very fast</td></tr>
<tr>
<td>Maintenance</td><td>Higher</td><td>Minimal</td></tr>
<tr>
<td>Can be used for =, &gt;, &lt;</td><td>Yes</td><td>only helps if the data is ordered</td></tr>
</tbody>
</table>
</div><h2 id="heading-benchmark-brin-vs-b-tree-with-real-data">Benchmark: BRIN vs. B-Tree with Real Data</h2>
<h3 id="heading-dataset-100-million-records">Dataset: 100 million records</h3>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> sensor_data (
    <span class="hljs-keyword">id</span> BIGSERIAL PRIMARY <span class="hljs-keyword">KEY</span>,
    device_id <span class="hljs-built_in">INT</span>,
    temperature <span class="hljs-built_in">NUMERIC</span>,
    event_time <span class="hljs-built_in">TIMESTAMP</span>
);
</code></pre>
<p>Populate with realistic data:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> sensor_data (device_id, temperature, event_time)
<span class="hljs-keyword">SELECT</span>
  (random()*<span class="hljs-number">1000</span>)::<span class="hljs-built_in">int</span>,
  <span class="hljs-keyword">round</span>(random()*<span class="hljs-number">50</span>, <span class="hljs-number">2</span>),
  <span class="hljs-keyword">now</span>() - (random() * <span class="hljs-built_in">interval</span> <span class="hljs-string">'365 days'</span>)
<span class="hljs-keyword">FROM</span> generate_series(<span class="hljs-number">1</span>, <span class="hljs-number">100000000</span>);
</code></pre>
<h3 id="heading-create-indexes">Create Indexes</h3>
<pre><code class="lang-sql"><span class="hljs-comment">-- BRIN</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">INDEX</span> brin_sensor_event_time <span class="hljs-keyword">ON</span> sensor_data <span class="hljs-keyword">USING</span> brin (event_time);

<span class="hljs-comment">-- B-Tree</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">INDEX</span> btree_sensor_event_time <span class="hljs-keyword">ON</span> sensor_data <span class="hljs-keyword">USING</span> btree (event_time);
</code></pre>
<h3 id="heading-performance-test">Performance Test</h3>
<p>Query:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> sensor_data
<span class="hljs-keyword">WHERE</span> event_time <span class="hljs-keyword">BETWEEN</span> <span class="hljs-string">'2023-01-01'</span> <span class="hljs-keyword">AND</span> <span class="hljs-string">'2023-01-10'</span>;
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Index Type</td><td>Index Size</td><td>Query Time</td></tr>
</thead>
<tbody>
<tr>
<td>B-Tree</td><td>~15 GB</td><td>80ms</td></tr>
<tr>
<td>BRIN</td><td>~60 MB</td><td>500ms–1.5s</td></tr>
</tbody>
</table>
</div><blockquote>
<p>BRIN is <strong>~250x smaller</strong>, but <strong>slower for highly selective queries</strong></p>
</blockquote>
<h2 id="heading-when-to-use-brin-best-use-cases">When to Use BRIN (Best Use-Cases)</h2>
<ul>
<li><p><strong>Append-only</strong> or <strong>time-series data</strong></p>
</li>
<li><p>Large tables: 100M+ rows</p>
</li>
<li><p>Columns with <strong>monotonic</strong> or <strong>naturally ordered</strong> data:</p>
<ul>
<li><p>Timestamps</p>
</li>
<li><p>IDs</p>
</li>
<li><p>Dates</p>
</li>
</ul>
</li>
<li><p>Use with <strong>partitioned tables</strong> for cheap indexing</p>
</li>
<li><p>Excellent for <strong>archival</strong> or <strong>rarely queried</strong> historical data</p>
</li>
</ul>
<h2 id="heading-when-not-to-use-brin">When <strong>Not</strong> to Use BRIN</h2>
<ul>
<li><p>Highly <strong>randomized</strong> or <strong>non-sequential</strong> data</p>
</li>
<li><p>High <strong>update/delete</strong> activity (BRIN won’t self-tune)</p>
</li>
<li><p>You need <strong>fast, pinpoint lookups</strong> (use B-Tree instead)</p>
</li>
</ul>
<h2 id="heading-creating-amp-tuning-brin">Creating &amp; Tuning BRIN</h2>
<h3 id="heading-basic-syntax">Basic Syntax</h3>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">INDEX</span> idx_brin_column
<span class="hljs-keyword">ON</span> your_table <span class="hljs-keyword">USING</span> brin(your_column);
</code></pre>
<h3 id="heading-tuning-pagesperrange">Tuning <code>pages_per_range</code></h3>
<pre><code class="lang-sql"><span class="hljs-comment">-- Lower = finer granularity (but larger index)</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">INDEX</span> idx_brin_custom
<span class="hljs-keyword">ON</span> your_table <span class="hljs-keyword">USING</span> brin(your_column)
<span class="hljs-keyword">WITH</span> (pages_per_range = <span class="hljs-number">32</span>);
</code></pre>
<p>Default = 128; use lower for sparse data, higher for dense data</p>
<h2 id="heading-measuring-brin-effectiveness">Measuring BRIN Effectiveness</h2>
<h3 id="heading-check-index-size">Check index size</h3>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> pg_size_pretty(pg_relation_size(<span class="hljs-string">'idx_brin_column'</span>));
</code></pre>
<h3 id="heading-view-brin-summary-info">View BRIN summary info</h3>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> pg_brin_bloom_summary;
<span class="hljs-comment">-- Or use pg_visibility module for deeper inspection</span>
</code></pre>
<h3 id="heading-reindex-or-vacuum">REINDEX or VACUUM</h3>
<p>If data distribution changed a lot:</p>
<pre><code class="lang-sql">REINDEX INDEX idx_brin_column;
VACUUM <span class="hljs-keyword">ANALYZE</span> your_table;
</code></pre>
<h2 id="heading-pairing-brin-with-other-indexes">Pairing BRIN with Other Indexes</h2>
<p>Yes! Combine strategies:</p>
<ul>
<li><p>BRIN on <code>event_time</code></p>
</li>
<li><p>B-Tree on <code>device_id</code></p>
</li>
</ul>
<p>Allows PostgreSQL to <strong>combine filters</strong> effectively.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> sensor_data
<span class="hljs-keyword">WHERE</span> event_time &gt; <span class="hljs-keyword">now</span>() - <span class="hljs-built_in">interval</span> <span class="hljs-string">'30 days'</span>
<span class="hljs-keyword">AND</span> device_id = <span class="hljs-number">42</span>;
</code></pre>
<p>Use <strong>multicolumn BRIN</strong> only if both columns are sequentially increasing.</p>
<h2 id="heading-other-brin-index-types">Other BRIN Index Types</h2>
<p>Available operator classes:</p>
<ul>
<li><p><code>minmax</code> (default): stores the min/max value per block range</p>
</li>
<li><p><code>minmax_multi</code> (Postgres 14+): several min/max intervals per range, more robust to outliers</p>
</li>
<li><p><code>bloom</code> (Postgres 14+): Bloom-filter based, for equality searches on poorly ordered data</p>
</li>
<li><p><code>inclusion</code>: for containment types such as ranges, boxes, and <code>inet</code></p>
</li>
</ul>
<h2 id="heading-pros-and-cons">Pros and Cons</h2>
<h3 id="heading-pros">Pros</h3>
<ul>
<li><p><strong>Tiny disk footprint</strong></p>
</li>
<li><p><strong>Fast to build</strong></p>
</li>
<li><p>Great for <strong>bulk time-based ingestion</strong></p>
</li>
<li><p>Easy maintenance</p>
</li>
</ul>
<h3 id="heading-cons">Cons</h3>
<ul>
<li><p>Slower than B-Tree for <strong>pinpoint</strong> queries</p>
</li>
<li><p>Sensitive to data <strong>distribution</strong></p>
</li>
<li><p>No enforcement of uniqueness</p>
</li>
<li><p>Doesn't help much on <strong>UPDATE-heavy</strong> workloads</p>
</li>
</ul>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p><strong>BRIN indexes are an absolute gem</strong> for large-scale, time-based, or append-only tables where:</p>
<ul>
<li><p>You care about index <strong>size and creation time</strong></p>
</li>
<li><p>Queries are <strong>range-based</strong> (e.g., logs, sensor data)</p>
</li>
<li><p>You don’t need millisecond response times for every query</p>
</li>
</ul>
<p>For more details, refer to the official documentation at <a target="_blank" href="https://www.postgresql.org/docs/current/brin.html">PostgreSQL: Documentation: 18: 65.5. BRIN Indexes</a></p>
]]></content:encoded></item><item><title><![CDATA[MongoDB Locking Explained: Summary of Concurrency and Locking Strategies]]></title><description><![CDATA[When working with high-concurrency workloads in MongoDB, understanding how locking and concurrency control work is essential to maintain performance and data integrity. Here’s a quick overview of the core ideas:
Key Highlights:

Document-Level Lockin...]]></description><link>https://kiransabne.dev/mongodb-locking-explained-summary-of-concurrency-and-locking-strategies</link><guid isPermaLink="true">https://kiransabne.dev/mongodb-locking-explained-summary-of-concurrency-and-locking-strategies</guid><category><![CDATA[Software Engineering]]></category><category><![CDATA[software development]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[kiran sabne]]></dc:creator><pubDate>Sat, 19 Jul 2025 18:58:41 GMT</pubDate><content:encoded><![CDATA[<p>When working with high-concurrency workloads in MongoDB, understanding how locking and concurrency control work is essential to maintain performance and data integrity. Here’s a quick overview of the core ideas:</p>
<h3 id="heading-key-highlights">Key Highlights:</h3>
<ul>
<li><p><strong>Document-Level Locking</strong>: With the WiredTiger engine, MongoDB supports fine-grained locks at the document level.</p>
</li>
<li><p><strong>Lock Types</strong>: Internally uses Shared (S), Exclusive (X), Intent Shared (IS), and Intent Exclusive (IX) for coordination.</p>
</li>
<li><p><strong>Optimistic Locking</strong>: Uses a version field to prevent overwrites in high-read environments (see the sketch after this list).</p>
</li>
<li><p><strong>Pessimistic Locking</strong>: Simulated via a <code>lock</code> field to prevent concurrent access during critical updates.</p>
</li>
<li><p><strong>Deadlock Prevention</strong>: Implement retry logic, consistent update order, and use <code>$maxTimeMS</code> to avoid transaction stalls.</p>
</li>
<li><p><strong>Monitoring Tools</strong>: Use <code>serverStatus().locks</code> and <code>currentOp({ waitingForLock: true })</code> to track lock contention.</p>
</li>
</ul>
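<p>A minimal sketch of the optimistic-locking pattern with pymongo (collection and field names are illustrative):</p>
<pre><code class="lang-python">from pymongo import MongoClient

coll = MongoClient().shop.accounts  # hypothetical database/collection

def withdraw(account_id, amount):
    # Read the document along with its current version
    doc = coll.find_one({"_id": account_id})
    # Compare-and-swap: write back only if the version is unchanged
    result = coll.update_one(
        {"_id": account_id, "version": doc["version"]},
        {"$set": {"balance": doc["balance"] - amount},
         "$inc": {"version": 1}},
    )
    if result.modified_count == 0:
        raise RuntimeError("concurrent update detected, retry")
</code></pre>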
<h3 id="heading-use-cases-covered">Use Cases Covered:</h3>
<ul>
<li><p>Real-world optimistic &amp; pessimistic lock patterns in MongoDB</p>
</li>
<li><p>Deadlock detection and prevention strategies</p>
</li>
<li><p>Lock optimization techniques and when to use which approach</p>
</li>
</ul>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Scenario</td><td>Best Strategy</td></tr>
</thead>
<tbody>
<tr>
<td>High Reads, Low Conflict</td><td>Optimistic Locking</td></tr>
<tr>
<td>High Write Contention</td><td>Pessimistic Locking</td></tr>
<tr>
<td>Multi-doc Transactions</td><td>Retry on Deadlocks</td></tr>
</tbody>
</table>
</div><p>More details are posted <a target="_blank" href="https://databasedeveloper.dev/mastering-mongodb-locking-concurrency-and-performance-optimization-a-deep-dive">here at another post</a></p>
]]></content:encoded></item><item><title><![CDATA[Cracking the SQL Interview: Real Questions and PostgreSQL Internals You Should Know]]></title><description><![CDATA[Over the past few months, since the start of the new year, I’ve been actively interviewing for technical roles, and each conversation proved to be both challenging and rewarding. In this post, I’ll share some of the SQL questions I was asked, along with d...]]></description><link>https://kiransabne.dev/cracking-the-sql-interview-real-questions-and-postgresql-internals-you-should-know</link><guid isPermaLink="true">https://kiransabne.dev/cracking-the-sql-interview-real-questions-and-postgresql-internals-you-should-know</guid><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[PostgreSQL]]></category><category><![CDATA[Databases]]></category><category><![CDATA[data-engineering]]></category><category><![CDATA[backend]]></category><category><![CDATA[SQL]]></category><dc:creator><![CDATA[kiran sabne]]></dc:creator><pubDate>Sat, 12 Jul 2025 16:02:34 GMT</pubDate><content:encoded><![CDATA[<p>Over the past few months, since the start of the new year, I’ve been actively interviewing for technical roles, and each conversation proved to be both challenging and rewarding. In this post, I’ll share some of the <strong>SQL questions</strong> I was asked, along with deep-dive <strong>PostgreSQL internals topics</strong> that stood out in interviews.</p>
<p>If you’re preparing for data engineering, backend, or database-focused roles, this guide is for you.</p>
<h3 id="heading-real-sql-questions-i-was-asked-in-interviews">Real SQL Questions I Was Asked in Interviews</h3>
<p>Here are some real SQL questions I encountered across multiple interviews. These test your ability to work with complex queries and understand database structures:</p>
<ul>
<li><p>Recursive Hierarchy with Levels - Return employee ID, name, manager name, and depth in the org chart (Recursive CTE; see the sketch after this list)</p>
</li>
<li><p>Find Managers with 2 or more employees</p>
</li>
<li><p>Departmental Top 3 Salaries</p>
</li>
<li><p>Customers who bought all products</p>
</li>
<li><p>Rolling sums - Calculate a <strong>7-day rolling sum</strong> of sales for each product, ordered by date.</p>
</li>
<li><p>Find continuous date ranges when there was sales activity every day without gaps.</p>
</li>
</ul>
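<p>As a sample of the level expected, here is a minimal sketch of the recursive-CTE answer to the first question, wrapped in Python for runnability (the <code>employee(id, name, manager_id)</code> layout is an assumption):</p>
<pre><code class="lang-python">import psycopg2

SQL = """
WITH RECURSIVE org AS (
    SELECT id, name, manager_id, 1 AS depth
    FROM employee
    WHERE manager_id IS NULL               -- roots of the org chart
    UNION ALL
    SELECT e.id, e.name, e.manager_id, o.depth + 1
    FROM employee e
    JOIN org o ON e.manager_id = o.id      -- descend one level per iteration
)
SELECT o.id, o.name, m.name AS manager_name, o.depth
FROM org o
LEFT JOIN employee m ON o.manager_id = m.id
ORDER BY o.depth, o.name;
"""

with psycopg2.connect("dbname=hr") as conn, conn.cursor() as cur:
    cur.execute(SQL)
    for row in cur.fetchall():
        print(row)
</code></pre>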
<p>The advanced topics below came up in several interview rounds and often led to deep discussions.</p>
<h3 id="heading-1-postgresql-indexing-internals-use-cases-and-performance">1. PostgreSQL Indexing: Internals, Use Cases, and Performance</h3>
<h4 id="heading-had-low-level-discussions-on-indexes-along-with-their-use-cases">Had low level discussions on indexes along with their use cases.</h4>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Index Type</td><td>Use Case</td></tr>
</thead>
<tbody>
<tr>
<td>B-Tree</td><td>Equality, Range queries</td></tr>
<tr>
<td>Hash</td><td>Equality only</td></tr>
<tr>
<td>GIN</td><td>Full-text, JSONB, array overlap</td></tr>
<tr>
<td>GiST</td><td>Geometric search, similarity search</td></tr>
<tr>
<td>BRIN</td><td>Large, naturally sorted tables (e.g., logs)</td></tr>
<tr>
<td>Partial</td><td>Index on filtered rows</td></tr>
<tr>
<td>Expression</td><td>Index on function output (e.g., <code>lower(name)</code>)</td></tr>
</tbody>
</table>
</div><ul>
<li><p><strong>Covering Indexes</strong>: <code>CREATE INDEX ON emp (deptid) INCLUDE (salary)</code></p>
</li>
<li><p><strong>Multicolumn Indexes</strong>: Consider column order (leading column matters!)</p>
</li>
<li><p><strong>Bloom Index</strong>: Used in specialized fuzzy match cases</p>
</li>
<li><p><strong>Index Usage</strong>: How the planner chooses indexes; how ANALYZE &amp; statistics affect index usage.</p>
</li>
<li><p>Index Maintenance: need for VACUUM and/or REINDEX, index bloat, fill factor &amp; tuning.</p>
</li>
<li><p>Other: index impact on write performance, OLTP &amp; OLAP workloads, finding unused indexes, index usage metrics, indexes and concurrent writes, etc.</p>
</li>
</ul>
<h3 id="heading-2-reading-postgresql-execution-plans">2. Reading PostgreSQL Execution Plans</h3>
<p>How to interpret execution plans in order to understand query performance, and what to look for in them.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">EXPLAIN</span> (<span class="hljs-keyword">ANALYZE</span>, BUFFERS, COSTS, VERBOSE)
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> employee <span class="hljs-keyword">WHERE</span> salary &gt; <span class="hljs-number">100000</span>;
</code></pre>
<h4 id="heading-things-to-know">Things to know:</h4>
<ul>
<li><p><strong>Seq Scan</strong>: No useful index found</p>
</li>
<li><p><strong>Index Scan</strong>: Ideal for high-selectivity queries</p>
</li>
<li><p><strong>Bitmap Heap Scan</strong>: Used for broader matches, combines index results</p>
</li>
<li><p><strong>Rows Removed by Filter</strong>: Wasted work — suggests filter isn’t pushed down</p>
</li>
<li><p>Joins &amp; Join Methods (Nested Loop, Merge Join &amp; Hash Join): use-case scenarios, memory impact, how sorting affects the join, hash-table spills, etc.</p>
</li>
<li><p>Other Topics: Parallel Query, CTEs &amp; subplans, cost estimates, watch for <strong>loops &gt; 1</strong> in nested loops, monitor <strong>buffers read/hit</strong> to spot I/O inefficiencies, etc.</p>
</li>
</ul>
<h3 id="heading-3-partitioning">3. Partitioning</h3>
<blockquote>
<p>I was asked: “How would you partition a 5B-row logs table by region and time?”</p>
</blockquote>
<h4 id="heading-few-of-the-topics-discussed-during-interviews-were">Few of the topics discussed during interviews were,</h4>
<ul>
<li><p>Partitioning Methods - Range Partition, List Partition, Hash Partition, Composite Partition</p>
</li>
<li><p>How Partitioning Works - declarative partitioning vs. table inheritance, how the planner routes inserts/updates, default partitions.</p>
</li>
<li><p>Performance Implications - Partition Pruning, indexing on partitions vs global index (Postgres doesn’t support native global index)</p>
</li>
<li><p>Impact on Query Plans &amp; when pruning fails.</p>
</li>
<li><p>Maintenance - Adding / Removing partitions, Detaching / Attaching partitions, Archiving, Table bloat, constraints behavior with partitioned tables etc.</p>
</li>
</ul>
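<p>For the 5B-row logs question above, a minimal declarative sketch (the table, region values, and partition names are illustrative) combines list partitioning by region with range sub-partitioning by time:</p>
<pre><code class="lang-sql">CREATE TABLE logs (
    region     text        NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY LIST (region);

CREATE TABLE logs_eu PARTITION OF logs
    FOR VALUES IN ('eu')
    PARTITION BY RANGE (created_at);

CREATE TABLE logs_eu_2026_01 PARTITION OF logs_eu
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

-- Pruning check: this plan should touch only logs_eu_2026_01
EXPLAIN SELECT count(*) FROM logs
WHERE region = 'eu'
  AND created_at &gt;= '2026-01-15' AND created_at &lt; '2026-01-16';
</code></pre>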
<h3 id="heading-4-sharding-postgresql-horizontal-scale">4. Sharding PostgreSQL (Horizontal Scale)</h3>
<blockquote>
<p>Interviewer: “What happens when a single node isn’t enough?”</p>
</blockquote>
<p><strong>Sharding</strong> = <strong>horizontal partitioning</strong> → splitting data across multiple <em>physical</em> databases/servers, not just partitions in one DB.</p>
<ul>
<li><p>Each <strong>shard</strong> holds a subset of rows.</p>
</li>
<li><p>Common shard keys: customer ID, tenant ID, geographic region, time.</p>
</li>
<li><p>Goal: keep each shard small &amp; fast to query independently.</p>
</li>
</ul>
<h4 id="heading-manual-sharding-approaches">Manual Sharding Approaches:</h4>
<ul>
<li><p>Application-controlled sharding (based on user_id, region)</p>
</li>
<li><p>Foreign Data Wrappers (<code>postgres_fdw</code>)</p>
</li>
<li><p>Tools like <strong>Citus</strong></p>
</li>
</ul>
<h4 id="heading-example-shard-strategy-customer-db">Example Shard Strategy (Customer DB):</h4>
<ul>
<li><p>Users A–M → shard_1</p>
</li>
<li><p>Users N–Z → shard_2</p>
</li>
<li><p>Metadata service maps user → shard (sketched after this list)</p>
</li>
</ul>
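<p>A minimal sketch of that metadata mapping in SQL (the <code>shard_map</code> table and shard names are hypothetical; a real routing service adds caching and rebalancing state):</p>
<pre><code class="lang-sql">CREATE TABLE shard_map (
    user_prefix char(1) PRIMARY KEY,  -- 'A' through 'Z'
    shard_name  text NOT NULL         -- e.g. 'shard_1', 'shard_2'
);

INSERT INTO shard_map
SELECT chr(c),
       CASE WHEN chr(c) &lt;= 'M' THEN 'shard_1' ELSE 'shard_2' END
FROM generate_series(ascii('A'), ascii('Z')) AS c;

-- Route a lookup: which shard holds user 'Natalie'?
SELECT shard_name FROM shard_map
WHERE user_prefix = upper(left('Natalie', 1));
</code></pre>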
<blockquote>
<p>You must handle picking a good shard key, <strong>cross-shard joins</strong>, <strong>distributed (global) transactions</strong>, data rebalancing, consistency &amp; failover, <strong>replication lag</strong>, etc.</p>
</blockquote>
<h3 id="heading-5-change-data-capture-cdc">5. Change Data Capture (CDC)</h3>
<p><strong>Change Data Capture</strong> means continuously capturing <strong>row-level data changes</strong> (INSERT, UPDATE, DELETE) from a source database and delivering them to downstream systems.</p>
<blockquote>
<p>“How do you build a real-time pipeline to stream changes?”</p>
</blockquote>
<h4 id="heading-postgresql-cdc-techniques">✅ PostgreSQL CDC Techniques:</h4>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Technique</strong></td><td><strong>Latency</strong></td><td><strong>Setup Effort</strong></td><td><strong>Pros</strong></td><td><strong>Cons</strong></td><td><strong>When to Use</strong></td></tr>
</thead>
<tbody>
<tr>
<td>🔹 <strong>Triggers + Audit Tables</strong></td><td>Low</td><td>Easy</td><td>Simple to implement for small DBs</td><td>Performance overhead at high TPS, hard to scale</td><td>When you just need an audit trail (e.g. small B2B SaaS)</td></tr>
<tr>
<td>🔹 <strong>Logical Replication</strong></td><td>Medium–Low</td><td>Native</td><td>Supports pub/sub; only committed changes; easy failover</td><td>Cannot replicate DDL; table must have PK</td><td>Multi-region read replicas, selective replication</td></tr>
<tr>
<td>🔹 <strong>WAL Stream Decoding</strong></td><td>Low</td><td>Medium–Hard</td><td>True real-time streaming; works well with Kafka, Debezium</td><td>Harder to manage slots, risk of WAL bloat</td><td>Enterprise streaming pipelines</td></tr>
<tr>
<td>🔹 <strong>pgoutput</strong> (logical decoding plugin)</td><td>Low</td><td>Native, preferred</td><td>Default plugin for logical replication; Debezium uses it</td><td>Limited to logical replication; same WAL slot constraints</td><td>Best practice for Kafka CDC</td></tr>
</tbody>
</table>
</div><p><strong>Logical Replication (Built-in)</strong></p>
<ul>
<li><p>PostgreSQL 10+ supports <em>logical replication</em>.</p>
</li>
<li><p>Publishes <strong>row-level changes</strong> as a stream.</p>
</li>
<li><p>Use: <code>CREATE PUBLICATION</code> and <code>CREATE SUBSCRIPTION</code> (see the sketch after this list).</p>
</li>
<li><p>Works via WAL (Write-Ahead Log): changes are encoded as logical changes, not raw blocks.</p>
</li>
<li><p>Good for replica clusters or feeding Kafka connectors.</p>
</li>
</ul>
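<p>A minimal sketch (the connection details and names are illustrative, and the publisher needs <code>wal_level = logical</code>):</p>
<pre><code class="lang-sql">-- On the publisher
CREATE PUBLICATION orders_pub FOR TABLE orders;

-- On the subscriber (a separate cluster)
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=source.example.com dbname=app user=replicator'
    PUBLICATION orders_pub;
</code></pre>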
<p><strong>Logical Decoding</strong></p>
<ul>
<li><p>The foundation for logical replication.</p>
</li>
<li><p><code>pgoutput</code> is the default plugin.</p>
</li>
<li><p>Or use plugins like <code>wal2json</code> or <code>decoderbufs</code>:</p>
<ul>
<li><p><code>wal2json</code>: output changes as JSON — easy for Kafka, Debezium.</p>
</li>
<li><p><code>decoderbufs</code>: protobuf format.</p>
</li>
</ul>
</li>
<li><p>Tools read the WAL stream via a <strong>replication slot</strong> (see the sketch after this list).</p>
</li>
</ul>
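<p>A hedged sketch of consuming a slot with plain SQL, assuming <code>wal_level = logical</code> and using the <code>test_decoding</code> plugin that ships with PostgreSQL (streaming pipelines would use <code>pgoutput</code> or <code>wal2json</code> through a client such as Debezium instead):</p>
<pre><code class="lang-sql">-- Create a slot bound to a decoding plugin
SELECT * FROM pg_create_logical_replication_slot('cdc_slot', 'test_decoding');

-- Consume pending changes (returns them and advances the slot)
SELECT * FROM pg_logical_slot_get_changes('cdc_slot', NULL, NULL);

-- Drop the slot when done, or WAL will accumulate behind it
SELECT pg_drop_replication_slot('cdc_slot');
</code></pre>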
<p><strong>Triggers-Based CDC</strong></p>
<ul>
<li><p>Implemented with <code>AFTER INSERT/UPDATE/DELETE</code> triggers.</p>
</li>
<li><p>Simpler for small systems, but high overhead under heavy writes (a minimal sketch follows this list).</p>
</li>
</ul>
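<p>A minimal sketch of the trigger approach (the <code>orders</code> table and audit columns are illustrative):</p>
<pre><code class="lang-sql">CREATE TABLE orders_audit (
    audit_id   bigserial PRIMARY KEY,
    op         text NOT NULL,                        -- 'INSERT' / 'UPDATE' / 'DELETE'
    changed_at timestamptz NOT NULL DEFAULT now(),
    row_data   jsonb NOT NULL
);

CREATE OR REPLACE FUNCTION orders_audit_fn() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO orders_audit (op, row_data) VALUES (TG_OP, to_jsonb(OLD));
        RETURN OLD;
    END IF;
    INSERT INTO orders_audit (op, row_data) VALUES (TG_OP, to_jsonb(NEW));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_audit_trg
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION orders_audit_fn();
</code></pre>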
<h3 id="heading-6-other-concepts">6. Other concepts:</h3>
<p>A few other things that I think are important to go over:</p>
<ul>
<li><p>When to use <strong>BRIN</strong> over B-tree indexes</p>
</li>
<li><p>Index design tradeoffs in high-write OLTP systems</p>
</li>
<li><p>Join types and when to apply each</p>
</li>
<li><p>Timeseries partitioning and pruning validation</p>
</li>
<li><p>Postgres internals: MVCC, WAL, autovacuum</p>
</li>
<li><p>Locking, concurrency control, and deadlocks</p>
</li>
<li><p>Designing a hybrid search (full-text + semantic)</p>
</li>
<li><p>Data ingestion pipelines with Kafka and Postgres</p>
</li>
<li><p>Handling <strong>SCDs</strong> (slowly changing dimensions), surrogate vs natural keys</p>
</li>
<li><p>Real-time event guarantees: exactly-once vs at-least-once delivery</p>
</li>
<li><p>What indexing strategies do you use for analytical vs transactional workloads?</p>
</li>
</ul>
<p>If you’re preparing for roles that involve SQL, data engineering, or backend systems, mastering both query skills and PostgreSQL internals can really set you apart. The questions span from hands-on SQL to system-level architecture — and each was an opportunity to demonstrate practical depth.</p>
<p>Let me know in the comments if you'd like to add more questions or topics to this list!</p>
]]></content:encoded></item><item><title><![CDATA[Using Go's Context Package: Managing Concurrency and Lifecycle in Applications]]></title><description><![CDATA[In Go programming, the context package is widely used to manage deadlines, cancellations, and other request-scoped values across API boundaries and goroutines. It provides a standardised way to pass contextual information and control the lifecycle o...]]></description><link>https://kiransabne.dev/using-gos-context-package-managing-concurrency-and-lifecycle-in-golang-applications</link><guid isPermaLink="true">https://kiransabne.dev/using-gos-context-package-managing-concurrency-and-lifecycle-in-golang-applications</guid><category><![CDATA[Programming Blogs]]></category><category><![CDATA[software development]]></category><dc:creator><![CDATA[kiran sabne]]></dc:creator><pubDate>Fri, 22 Nov 2024 12:49:05 GMT</pubDate><content:encoded><![CDATA[<p>In Go programming, the <code>context</code> package is widely used to manage deadlines, cancellations, and other request-scoped values across API boundaries and goroutines. It provides a standardised way to pass contextual information and control the lifecycle of operations, especially when dealing with concurrency and long-running tasks.</p>
<h3 id="heading-overview-of-the-context-package"><strong>Overview of the</strong> <code>context</code> Package</h3>
<p>The <code>context</code> package helps you control the lifecycle of operations, from managing HTTP requests to orchestrating database calls. It supports:</p>
<ul>
<li><p><strong>Request cancellation</strong>: Propagating cancellation signals to stop long-running tasks.</p>
</li>
<li><p><strong>Deadlines and timeouts</strong>: Automatically cancel operations if they exceed a given timeout.</p>
</li>
<li><p><strong>Passing request-scoped values</strong>: Sharing request-scoped data (e.g., user information, tracing IDs) across different program layers.</p>
</li>
</ul>
<h3 id="heading-basic-types-in-context"><strong>Basic Types in</strong> <code>context</code></h3>
<ol>
<li><p><code>context.Context</code>:</p>
<ul>
<li>This is the main type used to carry deadlines, cancellations, and values across API boundaries. It is immutable and provides methods to access its state.</li>
</ul>
</li>
<li><p><strong>Context Creation Functions</strong>:</p>
<ul>
<li><p><code>context.Background()</code>: Returns an empty context. This is often used as a root context in long-running background tasks.</p>
</li>
<li><p><code>context.TODO()</code>: Similar to <code>Background()</code>, but used when you’re unsure what context to use. It signals "I haven’t figured out what context to use here yet."</p>
</li>
</ul>
</li>
<li><p><strong>Context with Deadline, Timeout, or Cancellation</strong>:</p>
<ul>
<li><p><code>context.WithCancel(parent)</code>: Creates a context that can be explicitly canceled.</p>
</li>
<li><p><code>context.WithDeadline(parent, deadline)</code>: Creates a context that will be canceled automatically at a specific time (deadline).</p>
</li>
<li><p><code>context.WithTimeout(parent, timeout)</code>: Creates a context that will be canceled after a specified timeout.</p>
</li>
</ul>
</li>
<li><p><strong>Value Context</strong>:</p>
<ul>
<li><code>context.WithValue(parent, key, value)</code>: Attaches key-value pairs to a context. Useful for passing request-scoped data.</li>
</ul>
</li>
</ol>
<hr />
<h3 id="heading-why-use-the-context-package"><strong>Why Use the</strong> <code>context</code> Package?</h3>
<h4 id="heading-1-cancellation-propagation"><strong>1. Cancellation Propagation</strong></h4>
<ul>
<li><p><strong>Problem</strong>: In a concurrent system, if one part of the operation fails or is no longer needed, you want to propagate the cancellation to other goroutines to avoid wasted resources.</p>
</li>
<li><p><strong>Solution</strong>: <code>context.WithCancel()</code> allows you to cancel all child goroutines when the parent context is canceled.</p>
</li>
</ul>
<h4 id="heading-2-timeouts-and-deadlines"><strong>2. Timeouts and Deadlines</strong></h4>
<ul>
<li><p><strong>Problem</strong>: Long-running operations can consume system resources if they hang or fail to return within a reasonable time.</p>
</li>
<li><p><strong>Solution</strong>: <code>context.WithTimeout()</code> and <code>context.WithDeadline()</code> help ensure that operations do not run indefinitely by enforcing timeouts.</p>
</li>
</ul>
<h4 id="heading-3-passing-request-scoped-values"><strong>3. Passing Request-Scoped Values</strong></h4>
<ul>
<li><p><strong>Problem</strong>: When processing a request, you may need to pass values (e.g., user info, tracing IDs, or other metadata) across function calls without modifying function signatures excessively.</p>
</li>
<li><p><strong>Solution</strong>: <code>context.WithValue()</code> allows you to pass values between layers of the program without cluttering function signatures.</p>
</li>
</ul>
<hr />
<h3 id="heading-common-use-cases"><strong>Common Use Cases</strong></h3>
<h4 id="heading-1-http-servers-request-cancellation-and-timeouts"><strong>1. HTTP Servers (Request Cancellation and Timeouts)</strong></h4>
<ul>
<li><p>In web servers, <code>context</code> is commonly used to manage request timeouts and propagate cancellations.</p>
</li>
<li><p>For example, in an HTTP server, if a client disconnects or a request times out, the server should stop processing the request immediately. The <code>context</code> package allows the cancellation signal to propagate to all goroutines working on that request.</p>
</li>
</ul>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">handler</span><span class="hljs-params">(w http.ResponseWriter, r *http.Request)</span></span> {
    ctx := r.Context()

    <span class="hljs-keyword">select</span> {
    <span class="hljs-keyword">case</span> &lt;-time.After(<span class="hljs-number">5</span> * time.Second): <span class="hljs-comment">// Simulating a long-running operation</span>
        fmt.Fprintln(w, <span class="hljs-string">"Finished processing"</span>)
    <span class="hljs-keyword">case</span> &lt;-ctx.Done(): <span class="hljs-comment">// If the request is cancelled or times out</span>
        fmt.Fprintln(w, <span class="hljs-string">"Request cancelled"</span>)
    }
}
</code></pre>
<h4 id="heading-2-database-operations-timeouts-and-cancellations"><strong>2. Database Operations (Timeouts and Cancellations)</strong></h4>
<ul>
<li>In database operations, contexts are used to set timeouts and ensure that if a query takes too long, it is canceled.</li>
</ul>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">queryDB</span><span class="hljs-params">(ctx context.Context, db *sql.DB)</span> <span class="hljs-title">error</span></span> {
    queryCtx, cancel := context.WithTimeout(ctx, <span class="hljs-number">2</span>*time.Second)
    <span class="hljs-keyword">defer</span> cancel()

    rows, err := db.QueryContext(queryCtx, <span class="hljs-string">"SELECT * FROM users"</span>)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err <span class="hljs-comment">// This will return if the query exceeds 2 seconds</span>
    }
    <span class="hljs-keyword">defer</span> rows.Close()

    <span class="hljs-comment">// Process rows...</span>
    <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
}
</code></pre>
<h4 id="heading-3-goroutines-and-background-tasks-graceful-shutdown"><strong>3. Goroutines and Background Tasks (Graceful Shutdown)</strong></h4>
<ul>
<li>In systems with multiple goroutines, you often need a mechanism to stop all related goroutines when a parent operation is canceled (e.g., during a graceful shutdown).</li>
</ul>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">process</span><span class="hljs-params">(ctx context.Context)</span></span> {
    <span class="hljs-keyword">for</span> {
        <span class="hljs-keyword">select</span> {
        <span class="hljs-keyword">case</span> &lt;-ctx.Done():
            fmt.Println(<span class="hljs-string">"Context cancelled, shutting down"</span>)
            <span class="hljs-keyword">return</span>
        <span class="hljs-keyword">default</span>:
            <span class="hljs-comment">// Do some work...</span>
        }
    }
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    ctx, cancel := context.WithCancel(context.Background())
    <span class="hljs-keyword">go</span> process(ctx)
    time.Sleep(<span class="hljs-number">2</span> * time.Second)
    cancel() <span class="hljs-comment">// Gracefully cancel the process</span>
}
</code></pre>
<hr />
<h3 id="heading-advanced-use-cases"><strong>Advanced Use Cases</strong></h3>
<h4 id="heading-1-distributed-tracing"><strong>1. Distributed Tracing</strong></h4>
<ul>
<li>Context is heavily used in distributed systems to propagate tracing information (e.g., trace IDs) across services. Middleware can extract tracing data from incoming requests, store it in the context, and propagate it across calls.</li>
</ul>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">traceMiddleware</span><span class="hljs-params">(next http.Handler)</span> <span class="hljs-title">http</span>.<span class="hljs-title">Handler</span></span> {
    <span class="hljs-keyword">return</span> http.HandlerFunc(<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(w http.ResponseWriter, r *http.Request)</span></span> {
        traceID := r.Header.Get(<span class="hljs-string">"X-Trace-ID"</span>)
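        <span class="hljs-comment">// note: a raw string key is used here for brevity; prefer an unexported key type to avoid collisions</span>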
        ctx := context.WithValue(r.Context(), <span class="hljs-string">"traceID"</span>, traceID)
        next.ServeHTTP(w, r.WithContext(ctx))
    })
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">handler</span><span class="hljs-params">(w http.ResponseWriter, r *http.Request)</span></span> {
    traceID := r.Context().Value(<span class="hljs-string">"traceID"</span>)
    fmt.Fprintf(w, <span class="hljs-string">"Trace ID: %v"</span>, traceID)
}
</code></pre>
<h4 id="heading-2-propagating-deadline-across-microservices"><strong>2. Propagating Deadline Across Microservices</strong></h4>
<ul>
<li>In microservices, a client making an external request can propagate the deadline using <code>context.Context</code>, so the downstream service knows when the request times out and can handle it accordingly.</li>
</ul>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">makeExternalCall</span><span class="hljs-params">(ctx context.Context)</span> <span class="hljs-title">error</span></span> {
    req, err := http.NewRequestWithContext(ctx, <span class="hljs-string">"GET"</span>, <span class="hljs-string">"http://example.com"</span>, <span class="hljs-literal">nil</span>)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err
    }

    resp, err := http.DefaultClient.Do(req)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err
    }
    <span class="hljs-keyword">defer</span> resp.Body.Close()

    <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
}
</code></pre>
<h4 id="heading-3-graceful-shutdown-in-servers"><strong>3. Graceful Shutdown in Servers</strong></h4>
<ul>
<li>Using <code>context</code> with <code>signal.NotifyContext</code> allows for graceful shutdowns in servers. This ensures that the server stops accepting new requests while allowing ongoing requests to finish.</li>
</ul>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    srv := &amp;http.Server{Addr: <span class="hljs-string">":8080"</span>, Handler: http.DefaultServeMux}

    <span class="hljs-comment">// Listen for system interrupt signals (e.g., Ctrl+C)</span>
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
    <span class="hljs-keyword">defer</span> stop()

    <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">if</span> err := srv.ListenAndServe(); err != <span class="hljs-literal">nil</span> &amp;&amp; err != http.ErrServerClosed {
            fmt.Printf(<span class="hljs-string">"Server failed: %v\n"</span>, err)
        }
    }()

    <span class="hljs-comment">// Wait for interrupt signal</span>
    &lt;-ctx.Done()

    fmt.Println(<span class="hljs-string">"Shutting down gracefully..."</span>)
    ctxShutDown, cancel := context.WithTimeout(context.Background(), <span class="hljs-number">5</span>*time.Second)
    <span class="hljs-keyword">defer</span> cancel()

    <span class="hljs-keyword">if</span> err := srv.Shutdown(ctxShutDown); err != <span class="hljs-literal">nil</span> {
        fmt.Printf(<span class="hljs-string">"Server Shutdown Failed: %v\n"</span>, err)
    }
}
</code></pre>
<hr />
<h3 id="heading-industry-standards-and-best-practices"><strong>Industry Standards and Best Practices</strong></h3>
<ol>
<li><p><strong>Always Pass Context as the First Argument</strong>:</p>
<ul>
<li>Functions that take a <code>context.Context</code> should accept it as the first argument. This is the convention in the Go standard library and ensures consistency across APIs.</li>
</ul>
</li>
</ol>
<pre><code class="lang-go">    <span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">DoSomething</span><span class="hljs-params">(ctx context.Context, arg <span class="hljs-keyword">int</span>)</span></span> {
        <span class="hljs-comment">// Implementation...</span>
    }
</code></pre>
<ol start="2">
<li><p><strong>Avoid Storing Contexts in Structs</strong>:</p>
<ul>
<li>Contexts are meant to be short-lived and should not be stored in global or long-lived variables. They are request-scoped or operation-scoped and should be passed explicitly through function calls.</li>
</ul>
</li>
<li><p><strong>Respect</strong> <code>ctx.Done()</code>:</p>
<ul>
<li>Always check <code>ctx.Done()</code> to ensure that long-running operations can gracefully terminate if the context is canceled or if a deadline is exceeded.</li>
</ul>
</li>
<li><p><strong>Use</strong> <code>context.WithValue()</code> Sparingly:</p>
<ul>
<li>While <code>context.WithValue()</code> is useful for passing request-scoped values, avoid overusing it. If you need to pass many values, it's better to create a dedicated struct to hold those values and pass that struct around instead of storing them in the context.</li>
</ul>
</li>
</ol>
<hr />
<h3 id="heading-conclusion"><strong>Conclusion</strong></h3>
<p>The <code>context</code> package is a powerful tool for controlling the lifecycle of concurrent operations, managing deadlines, propagating cancellations, and passing request-scoped values across layers of a program. It is essential for writing clean, efficient, and manageable concurrent code, especially in large-scale systems or services. By following industry best practices, you can use the <code>context</code> package to build robust, production-grade applications and services.</p>
]]></content:encoded></item><item><title><![CDATA[Utilize Table Inheritance in PostgreSQL for a More Efficient Database Design]]></title><description><![CDATA[In every application development process, we often encounter scenarios where entities share common attributes while also possessing unique characteristics. We manage these parent-child-like relationships between entities using inheritance logic in ou...]]></description><link>https://kiransabne.dev/utilize-table-inheritance-in-postgresql-for-a-more-efficient-database-design</link><guid isPermaLink="true">https://kiransabne.dev/utilize-table-inheritance-in-postgresql-for-a-more-efficient-database-design</guid><category><![CDATA[PostgreSQL]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[Databases]]></category><dc:creator><![CDATA[kiran sabne]]></dc:creator><pubDate>Mon, 29 May 2023 07:11:04 GMT</pubDate><content:encoded><![CDATA[<p>In every application development process, we often encounter scenarios where entities share common attributes while also possessing unique characteristics. We manage these parent-child-like relationships between entities using inheritance logic in our programs. Similarly, the PostgreSQL RDBMS provides a solution for addressing these cases through table inheritance. In this blog post, we will delve into the concept of table inheritance in PostgreSQL, discussing its advantages, disadvantages, and practical use cases.</p>
<h3 id="heading-introduction-amp-definition">Introduction &amp; Definition</h3>
<p>Table inheritance is a feature in PostgreSQL that allows you to create a hierarchy of tables based on a parent-child relationship. The child tables inherit the structure and attributes of the parent table (along with its CHECK and NOT NULL constraints), while also defining additional attributes of their own. This approach provides a convenient way to manage related data and simplify your database schema. To understand it, let's look at a simple and widely used scenario.</p>
<p>Suppose we have an application, such as an HRMS, in which we have to manage employee data, and the organization has different types of employees: full-time, part-time, contractors, special consultants, etc. Each of these employees has common attributes like first name, last name, email, joining date, and date of birth, but also attributes unique to their employee type. The DDL script below will help illustrate this:</p>
<pre><code class="lang-sql">
<span class="hljs-comment">-- Create the parent table called "employees" with common attributes:</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> employees (
    <span class="hljs-keyword">id</span> <span class="hljs-built_in">SERIAL</span> PRIMARY <span class="hljs-keyword">KEY</span>,
    <span class="hljs-keyword">name</span> <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">100</span>),
    email <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">100</span>),
    hire_date <span class="hljs-built_in">DATE</span>
);
<span class="hljs-comment">-- Create child tables for each type of employee, inheriting from the "employees" table:</span>

<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> full_time_employees () INHERITS (employees);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> part_time_employees () INHERITS (employees);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> contractors () INHERITS (employees);

<span class="hljs-comment">-- Add specific attributes to each child table:</span>

<span class="hljs-comment">-- Adding columns to the full-time employees table</span>
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> full_time_employees <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> salary <span class="hljs-built_in">NUMERIC</span>(<span class="hljs-number">10</span>, <span class="hljs-number">2</span>);
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> full_time_employees <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> vacation_days <span class="hljs-built_in">INTEGER</span>;

<span class="hljs-comment">-- Adding columns to the part-time employees table</span>
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> part_time_employees <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> hourly_rate <span class="hljs-built_in">NUMERIC</span>(<span class="hljs-number">10</span>, <span class="hljs-number">2</span>);
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> part_time_employees <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> hours_worked <span class="hljs-built_in">INTEGER</span>;

<span class="hljs-comment">-- Adding columns to the contractors table</span>
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> contractors <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> contract_rate <span class="hljs-built_in">NUMERIC</span>(<span class="hljs-number">10</span>, <span class="hljs-number">2</span>);
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> contractors <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> contract_duration <span class="hljs-built_in">INTEGER</span>;
</code></pre>
<p>In this example, the parent table "employees" contains common attributes shared by all employees. The child tables, such as "full_time_employees", "part_time_employees", and "contractors", inherit these common attributes and allow for the addition of specific attributes related to each employee type.</p>
<p>By using table inheritance, we have:</p>
<ul>
<li><p>Maintained a centralized employee table for common attributes and shared functionality.</p>
</li>
<li><p>Created separate child tables to hold the attributes unique to each employee type, ensuring data integrity and clarity.</p>
</li>
<li><p>Enabled retrieval and filtering of data through queries specific to each employee type, using the child tables (see the sketch after this list).</p>
</li>
</ul>
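<p>A quick sketch of what that querying looks like in practice:</p>
<pre><code class="lang-sql">-- Querying the parent includes rows from all child tables
SELECT name, email FROM employees;

-- ONLY restricts the query to rows stored directly in the parent
SELECT name, email FROM ONLY employees;

-- Querying a child table returns just that employee type,
-- with its specific columns available
SELECT name, salary, vacation_days FROM full_time_employees;
</code></pre>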
<p>This approach provides flexibility in managing different types of employees while maintaining a consistent structure and enabling specific attributes for each employee type.  </p>
<p>Similarly, it can be implemented for various product categories in an e-commerce database. Each product category has unique attributes in addition to the common ones: the parent table is a product table with common attributes like product name and price, while child tables carry the attributes specific to each product type. This design can also be used in content management systems, where content types differ.</p>
<h3 id="heading-pros">Pros:</h3>
<ul>
<li><p>Table inheritance allows you to organize your data with the parent table containing common attributes shared by all child tables, while each child table can have its specific attributes. This logical organization makes it easier to manage and query data.</p>
</li>
<li><p>By centralizing common attributes in the parent table, you avoid duplicating columns across multiple tables.</p>
</li>
<li><p>CHECK and NOT NULL constraints defined on the parent table are automatically enforced on all child tables (primary key, unique, and foreign key constraints are not inherited). This helps keep data consistent throughout the inheritance hierarchy.</p>
</li>
<li><p>With table inheritance, you can perform queries on specific child tables to retrieve data relevant to a particular entity type. This allows for efficient filtering and retrieval of data based on specific attributes.</p>
</li>
</ul>
<h3 id="heading-cons">Cons:</h3>
<ul>
<li><p>The query execution planner will have to consider the structure and constraints defined for both parent and child tables, in turn adding complexity to query planning and can result in slightly longer planning time.</p>
</li>
<li><p>PostgreSQL uses indexes defined on the specific table, if any, for querying that table. This means that you may need to create separate indexes for each child table to optimize query performance.</p>
</li>
<li><p>Table inheritance is often used for dividing or partitioning large tables into more manageable chunks, also known as data segmentation. But it adds complexity to execution planning and may involve querying multiple child tables, causing a performance drop.</p>
</li>
<li><p>When performing maintenance operations like VACUUM or ANALYZE on a parent table, PostgreSQL will also process the child tables. This can increase the time required for these operations, especially if the inherited tables contain a significant amount of data.</p>
</li>
</ul>
<h3 id="heading-summary">Summary:</h3>
<p>Table inheritance in PostgreSQL offers a robust method for organizing related data, enhancing code reusability, and preserving data integrity, which can streamline your database schema, minimize redundancy, and boost query adaptability. When devising your database, assess the entities, their shared attributes, and unique traits to establish if table inheritance is an appropriate strategy. Keep in mind the importance of meticulously designing your database schema and taking into account the particular requirements and access patterns of your application when employing table inheritance.</p>
]]></content:encoded></item><item><title><![CDATA[Handling Exceptions in Postgres]]></title><description><![CDATA[In PostgreSQL, exception handling is implemented using the PL/pgSQL procedural language, which extends the capabilities of SQL with additional programming constructs, such as variables, loops, and conditionals. PL/pgSQL provides a comprehensive excep...]]></description><link>https://kiransabne.dev/handling-exceptions-in-postgres</link><guid isPermaLink="true">https://kiransabne.dev/handling-exceptions-in-postgres</guid><category><![CDATA[PostgreSQL]]></category><category><![CDATA[software development]]></category><category><![CDATA[SQL]]></category><category><![CDATA[postgres]]></category><dc:creator><![CDATA[kiran sabne]]></dc:creator><pubDate>Tue, 11 Apr 2023 09:36:43 GMT</pubDate><content:encoded><![CDATA[<p>In PostgreSQL, exception handling is implemented using the PL/pgSQL procedural language, which extends the capabilities of SQL with additional programming constructs, such as variables, loops, and conditionals. PL/pgSQL provides a comprehensive exception-handling mechanism that enables developers to catch and handle a wide range of errors that may occur during the execution of database functions and procedures.</p>
<p>PostgreSQL stops the execution of the block and the related transaction when a block contains an error. A block is a collection of statements that are contained within a BEGIN and END block structure in PL/pgSQL. In PostgreSQL, blocks are used to define unique procedures, triggers, and functions. The beginning and end of the block are indicated by the terms BEGIN and END, respectively.</p>
<h3 id="heading-syntax">Syntax</h3>
<pre><code class="lang-sql"><span class="hljs-keyword">BEGIN</span>
    <span class="hljs-comment">-- Code goes here</span>
<span class="hljs-keyword">EXCEPTION</span>
    <span class="hljs-keyword">WHEN</span> exception_type <span class="hljs-keyword">THEN</span>
        <span class="hljs-comment">-- Exception handling code goes here</span>
<span class="hljs-keyword">END</span>;
</code></pre>
<p>Within the BEGIN and END block, the EXCEPTION keyword indicates the start of the exception-handling section, which is executed if an exception is thrown. Inside it, the WHEN clause specifies the type of exception that the handler will handle, and the THEN keyword indicates the start of the code block that handles the exception.</p>
<p>PostgreSQL has a wide range of built-in condition names such as NO_DATA_FOUND and TOO_MANY_ROWS, and PL/pgSQL exposes the current error code and message through SQLSTATE and SQLERRM. Moreover, users can also raise custom exceptions using the RAISE statement; a complete list of condition names is available in the PostgreSQL documentation. To illustrate how PostgreSQL handles exceptions, let's take the basic example of a divide-by-zero exception. The PL/pgSQL block below demonstrates how to handle it:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">do</span>
$$
<span class="hljs-keyword">declare</span> 
    <span class="hljs-keyword">result</span> <span class="hljs-built_in">int</span>;
<span class="hljs-keyword">begin</span>
    <span class="hljs-keyword">SELECT</span> <span class="hljs-number">1</span>/<span class="hljs-number">0</span> <span class="hljs-keyword">INTO</span> <span class="hljs-keyword">result</span>;

    <span class="hljs-comment">-- exception example based on SQL ERROR CODE</span>
    exception
        WHEN division_by_zero THEN
            result := NULL;
<span class="hljs-keyword">end</span>;
$$
language plpgsql;
</code></pre>
<p>In the above block, if the SELECT statement attempts to divide by zero, a division_by_zero exception is raised. The exception handling code, specified in the WHEN clause, sets the result variable to NULL in this case.</p>
<p>When an error occurs within the BEGIN...EXCEPTION block in PL/pgSQL, the execution is stopped and the control is transferred to the exception list. Then, PL/pgSQL scans the exception list to find the first match for the error that occurred. If a match is found, the statements inside the corresponding EXCEPTION block execute, and the control passes to the statement after the END keyword. If no match is found, the error propagates outwards and can be caught by the EXCEPTION clause of the enclosing block. In case there is no enclosing block with the EXCEPTION clause, PL/pgSQL aborts processing.</p>
<h3 id="heading-example-syntax-code-block-for-multiple-exceptions">Example Syntax Code block for Multiple Exceptions:</h3>
<pre><code class="lang-sql"><span class="hljs-keyword">do</span>
$$
<span class="hljs-keyword">declare</span>
    rec <span class="hljs-built_in">record</span>;
    emp_name varchar = 'abc';
<span class="hljs-keyword">begin</span>
    <span class="hljs-keyword">select</span> 
        &lt;column_names&gt;
    <span class="hljs-keyword">into</span> <span class="hljs-keyword">strict</span> rec
    <span class="hljs-keyword">from</span> &lt;table_name&gt;
    <span class="hljs-keyword">where</span> employee_name = emp_name; <span class="hljs-comment">-- example where clause.</span>

    <span class="hljs-comment">-- exception example based on SQL ERROR CODE</span>
    exception
        when sqlstate 'P0002' then
            raise exception 'employee <span class="hljs-keyword">with</span> <span class="hljs-keyword">name</span> % <span class="hljs-keyword">not</span> <span class="hljs-keyword">found</span><span class="hljs-string">', emp_name;
        when sqlstate '</span>P0003<span class="hljs-string">' then
            raise exception '</span>employee <span class="hljs-keyword">with</span> <span class="hljs-keyword">name</span> % <span class="hljs-keyword">is</span> already <span class="hljs-keyword">present</span><span class="hljs-string">', emp_name;

    -- exception example based on Expection Condition
    exception
        when too_many_rows then
            raise exception '</span><span class="hljs-keyword">Search</span> <span class="hljs-keyword">query</span> <span class="hljs-keyword">returns</span> more <span class="hljs-keyword">than</span> one <span class="hljs-keyword">rows</span><span class="hljs-string">';
end;
$$
language plpgsql;</span>
</code></pre>
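<p>As noted earlier, the RAISE statement can also raise custom exceptions with your own error code and hint. A minimal sketch (the error code and messages are illustrative):</p>
<pre><code class="lang-sql">do
$$
begin
    raise exception 'order % is already closed', 42
        using errcode = 'P9001',
              hint = 'Check the order status before retrying.';
end;
$$
language plpgsql;
</code></pre>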
<p>This exception-handling mechanism is crucial for handling errors gracefully and preventing application crashes or data corruption. It allows developers to provide better user experiences by responding appropriately to errors, rather than simply halting execution.</p>
]]></content:encoded></item><item><title><![CDATA[Traveling Salesman Problem Path Finder in SQL]]></title><description><![CDATA[I came across a situation to find the path for a salesman based on city locations and roads connecting these cities. The base version of the scenario was the same problem as Traveling Salesman Problem. I wanted to try solving the problem with SQL.
Be...]]></description><link>https://kiransabne.dev/traveling-salesman-problem-path-finder-in-sql</link><guid isPermaLink="true">https://kiransabne.dev/traveling-salesman-problem-path-finder-in-sql</guid><category><![CDATA[SQL]]></category><category><![CDATA[SQL Server]]></category><category><![CDATA[PostgreSQL]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[kiran sabne]]></dc:creator><pubDate>Tue, 04 Apr 2023 12:06:40 GMT</pubDate><content:encoded><![CDATA[<p>I came across a situation to find the path for a salesman based on city locations and roads connecting these cities. The base version of the scenario was the same problem as Traveling Salesman Problem. I wanted to try solving the problem with SQL.</p>
<p>Below is the problem statement I found on the internet for the Traveling Salesman Problem; try reading through the required inputs and outputs. I have also posted the solutions I wrote in both T-SQL and PostgreSQL variants.</p>
<p>First is the City table, containing two columns, Id and Name. This time, we're traveling between cities in Germany, and we've been given a second table called Road that has the columns CityFrom, CityTo, and Time, which contain average trip durations on a given route from one city to another. Below are the create statements and the data to populate the rows.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">create</span> <span class="hljs-keyword">schema</span> trip
<span class="hljs-keyword">create</span> <span class="hljs-keyword">table</span> trip.city(
    <span class="hljs-keyword">id</span> <span class="hljs-built_in">int</span> <span class="hljs-keyword">identity</span>(<span class="hljs-number">1</span>,<span class="hljs-number">1</span>), <span class="hljs-comment">-- in postgresl use serial datatype instead</span>
    <span class="hljs-keyword">name</span> <span class="hljs-built_in">varchar</span>(<span class="hljs-number">100</span>)
);

<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> trip.city(<span class="hljs-keyword">name</span>) <span class="hljs-keyword">values</span> (<span class="hljs-string">'Berlin'</span>), (<span class="hljs-string">'Hamburg'</span>), (<span class="hljs-string">'Muinch'</span>), (<span class="hljs-string">'Amsterdam'</span>);
<span class="hljs-keyword">select</span> * <span class="hljs-keyword">from</span> trip.city;

<span class="hljs-keyword">create</span> <span class="hljs-keyword">table</span> trip.road(
    city_from <span class="hljs-built_in">int</span>,
    city_to <span class="hljs-built_in">int</span>,
    <span class="hljs-built_in">time</span> <span class="hljs-built_in">int</span>
);
<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> trip.road(city_from, city_to, <span class="hljs-built_in">time</span>) <span class="hljs-keyword">values</span> 
(<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">180</span>), (<span class="hljs-number">1</span>, <span class="hljs-number">3</span>, <span class="hljs-number">365</span>), (<span class="hljs-number">1</span>, <span class="hljs-number">4</span>, <span class="hljs-number">390</span>), (<span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">490</span>), (<span class="hljs-number">2</span>, <span class="hljs-number">4</span>, <span class="hljs-number">270</span>), (<span class="hljs-number">3</span>, <span class="hljs-number">2</span>, <span class="hljs-number">420</span>), (<span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">370</span>), (<span class="hljs-number">4</span>, <span class="hljs-number">2</span>, <span class="hljs-number">245</span>), (<span class="hljs-number">4</span>, <span class="hljs-number">3</span>, <span class="hljs-number">385</span>);
<span class="hljs-keyword">select</span> *  <span class="hljs-keyword">from</span> trip.road;
</code></pre>
<p>The trip path should cover all cities, starting from Berlin. The main task is to return travel paths starting from Berlin and covering all cities in descending order by the total time taken.</p>
<p>The output should contain the following columns</p>
<ol>
<li><p>path – the city names, separated by N' -&gt; ',</p>
</li>
<li><p>last_city_id – the ID of the last city visited,</p>
</li>
<li><p>total_time – the total time spent driving,</p>
</li>
<li><p>places_count – the number of places visited; it should equal 4.</p>
</li>
</ol>
<p>The solution below is the T-SQL (SQL Server) variant:</p>
<pre><code class="lang-sql">;<span class="hljs-keyword">with</span> travel (<span class="hljs-keyword">path</span>, last_city_id, total_time, places_count )
<span class="hljs-keyword">as</span>
(
    <span class="hljs-keyword">select</span>
        <span class="hljs-keyword">cast</span>(<span class="hljs-keyword">name</span> <span class="hljs-keyword">as</span> <span class="hljs-keyword">nvarchar</span>(<span class="hljs-keyword">max</span>)),
        <span class="hljs-keyword">id</span>,
        <span class="hljs-number">0</span>,
        <span class="hljs-number">1</span>
    <span class="hljs-keyword">from</span> trip.city c
    <span class="hljs-keyword">where</span> c.name = <span class="hljs-string">'Berlin'</span> <span class="hljs-comment">-- since starting from berlin, then we initially return the anchor point</span>

    <span class="hljs-keyword">union</span> <span class="hljs-keyword">all</span> 

    <span class="hljs-keyword">select</span>
        t.path + N<span class="hljs-string">' -&gt; '</span> + c.name, <span class="hljs-comment">-- concate the city name with existing travel path name </span>
        c.id, 
        t.total_time + r.time,
        t.places_count + <span class="hljs-number">1</span>
    <span class="hljs-keyword">from</span> travel t
    <span class="hljs-keyword">join</span> trip.road r
        <span class="hljs-keyword">on</span> t.last_city_id= r.city_from
    <span class="hljs-keyword">join</span> trip.city c
        <span class="hljs-keyword">on</span> c.id = r.city_to
    <span class="hljs-keyword">where</span> <span class="hljs-keyword">charindex</span>(c.name, t.path) = <span class="hljs-number">0</span> <span class="hljs-comment">-- where part was to prevent revisiting the same city twice in a path</span>

)

<span class="hljs-keyword">select</span> * <span class="hljs-keyword">from</span> travel <span class="hljs-keyword">where</span> places_count = <span class="hljs-number">4</span> <span class="hljs-keyword">order</span> <span class="hljs-keyword">by</span> total_time <span class="hljs-keyword">desc</span>;
</code></pre>
<p>And below is the PostgreSQL variant:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">WITH</span> <span class="hljs-keyword">RECURSIVE</span> travel (<span class="hljs-keyword">path</span>, last_city_id, total_time, places_count) <span class="hljs-keyword">AS</span> (
  <span class="hljs-keyword">SELECT</span>
    <span class="hljs-keyword">CAST</span>(<span class="hljs-keyword">name</span> <span class="hljs-keyword">AS</span> <span class="hljs-built_in">text</span>),
    <span class="hljs-keyword">id</span>,
    <span class="hljs-number">0</span>,
    <span class="hljs-number">1</span>
  <span class="hljs-keyword">FROM</span> trip.city c
  <span class="hljs-keyword">WHERE</span> c.name = <span class="hljs-string">'Berlin'</span>

  <span class="hljs-keyword">UNION</span> <span class="hljs-keyword">ALL</span>

  <span class="hljs-keyword">SELECT</span>
    t.path || <span class="hljs-string">' -&gt; '</span> || c.name,
    c.id,
    t.total_time + r.time,
    t.places_count + <span class="hljs-number">1</span>
  <span class="hljs-keyword">FROM</span> travel t
  <span class="hljs-keyword">JOIN</span> trip.road r
    <span class="hljs-keyword">ON</span> t.last_city_id = r.city_from
  <span class="hljs-keyword">JOIN</span> trip.city c
    <span class="hljs-keyword">ON</span> c.id = r.city_to
  <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">position</span>(c.name <span class="hljs-keyword">IN</span> t.path) = <span class="hljs-number">0</span>
)

<span class="hljs-keyword">SELECT</span> *
<span class="hljs-keyword">FROM</span> travel
<span class="hljs-keyword">WHERE</span> places_count = <span class="hljs-number">4</span>
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> total_time <span class="hljs-keyword">DESC</span>;
</code></pre>
<p>In the code above, we used a recursive common table expression (CTE) to build up a list of travel paths between cities. The <code>travel</code> CTE contains four columns: <code>path</code> (the travel path so far), <code>last_city_id</code> (the ID of the last city visited), <code>total_time</code> (the total travel time so far), and <code>places_count</code> (the number of places visited so far).</p>
<p>The first part of the CTE returns the starting point, Berlin. Then the <code>union all</code> clause is used to recursively build up the travel paths: the second part of the CTE joins the <code>travel</code> CTE with the <code>road</code> and <code>city</code> tables to find the next city to visit. The <code>charindex</code> function (in T-SQL) or <code>position</code> function (in PostgreSQL) makes sure we don't revisit any city on the same path. You can also use the <code>strpos</code> function or a <code>LIKE</code> check in PostgreSQL as alternatives.</p>
<p>Finally, the <code>SELECT</code> statement is used to return all travel paths that visit all four cities and are ordered in descending order by total travel time.</p>
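<p>For reference, with the sample data above, both variants should return six complete paths ordered like this (total times computed from the road durations):</p>
<pre><code>path                                      | last_city_id | total_time | places_count
Berlin -&gt; Amsterdam -&gt; Munich -&gt; Hamburg  | 2            | 1195       | 4
Berlin -&gt; Amsterdam -&gt; Hamburg -&gt; Munich  | 3            | 1125       | 4
Berlin -&gt; Munich -&gt; Hamburg -&gt; Amsterdam  | 4            | 1055       | 4
Berlin -&gt; Hamburg -&gt; Munich -&gt; Amsterdam  | 4            | 1040       | 4
Berlin -&gt; Munich -&gt; Amsterdam -&gt; Hamburg  | 2            | 980        | 4
Berlin -&gt; Hamburg -&gt; Amsterdam -&gt; Munich  | 3            | 835        | 4
</code></pre>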
]]></content:encoded></item></channel></rss>