In the future, I will see how the performance will be go bad with more records, but I will probably using inheritance function of PostgreSQL. I wrote some of my thoughts about it, in this article.
Because I don’t have too much ticket actually (~120 from the last months what I have migrated into this instance) it is not affecting the performance badly. But if their number will increase, it can make the query and insert times. For this, I will implement inheritance in the future. In this article, I will discuss this topic. What is it? Why it is good? How it can be handled with Entity Framework? I will try to cover these questions in this article.
What is inheritance? Why is it good?
Let us say, we have a table like tickets mentioned in the previous article. Now, that table contains every single row. It means that whenever I do a query, it check every single row in that table. After a many rows, this effect performance badly.
How could it be prevent to check many of those lines? One answer can be: split the data among more tables, for example one table for every months like: tickets_2021_01, tickets_2021_01, tickets:2021_02, and so on… Idea is good and PostgreSQL has a built-in feature to do this: it is called inheritance.
We would have one parent table, called tickets. And based on this table, it is possible to create other table using this as base. Other tables are connected with the parent table, so whenever I query from tickets table, query will check other tables where too. It makes sense to make CHECK on every other table to be assumed that only tickets are stored there which are within the interval. It makes query more efficient, because query will ignore those tables where the CHECK tells that data cannot be there.
For example: let’s say, we have tickets with 12 million records from the last 6 months. When we make query to list March tickets, it will check every 12 million lines. But with this inheritance it will check that table where there can be March records.
How to use it from Entity Framework?
When I wrote this article, I do not know that EF would support it natively. From query view, there is no issue, if parent table is queried that will query the children too (where the query criteria can be applied). From inserting view, as I know, there is no built in checking function onto EF to figure out which child table should be inserted. So I can see 2 possible solution:
- Create model in EF for every child table or
- Create a trigger in PostgreSQL, when parent table got an insert, move it to child table instead
In my opinion, the 2. option is better and simpler. By that action, we can use this inheritance feature without changing anything in our EF definitions. I will drive through a sample, about this trigger implementation with a sample database and tables.
How to use INSERT trigger with inherited tables?
First, I create a parent table, then 3 child table. In this example, this will contain people’s name and their department name. Every child table have a CHECK for department name. Let’s create these tables and CHECKs:
CREATE TABLE people(id serial, name varchar(60), department varchar(60)); CREATE TABLE people_it () INHERITS (people); CREATE TABLE people_hr () INHERITS (people); CREATE TABLE people_other () INHERITS (people); ALTER TABLE people_it ADD CONSTRAINT people_department_check CHECK (department = 'it'); ALTER TABLE people_hr ADD CONSTRAINT people_department_check CHECK (department = 'hr');
First, we need to see what the problem is. In EF only people table would be defined. So EF will perform every query, insert and update action on that. Let’s execute an insert and see what happens.
INSERT INTO people (name, department) VALUES ('John', 'it'); SELECT * FROM people_it; id | name | department ----+------+------------ (0 rows) SELECT * FROM people_hr; id | name | department ----+------+------------ (0 rows) SELECT * FROM people; id | name | department ----+------+------------ 1 | John | it (1 row)
As you can see, people_it table is empty. Record has been written onto people table instead of a child table. Now, I will create a function and a trigger. Function looks like this. Note: in case of a trigger functions the NEW.* is automatically filled with the new line data which is inserted.
CREATE OR REPLACE FUNCTION people_insert_trigger_func() RETURNS trigger AS $$ BEGIN IF NEW.department = 'it' THEN INSERT INTO people_it (id, name, department) VALUES (NEW.id, NEW.name, NEW.department); ELSEIF NEW.department = 'hr' THEN INSERT INTO people_hr (id, name, department) VALUES (NEW.id, NEW.name, NEW.department); ELSE INSERT INTO people_other (id, name, department) VALUES (NEW.id, NEW.name,NEW.department); END IF; RETURN NULL; END $$ LANGUAGE 'plpgsql';
The trigger definition is the following. As it can be seen, it is executed BEFORE the insert. As the function return NULL, nothing will be inserted into people table.
CREATE TRIGGER people_insert_trigger BEFORE INSERT ON people FOR EACH ROW EXECUTE PROCEDURE people_insert_trigger_func();
And now, remove the existing record, then execute some new inserts:
INSERT INTO people (name, department) VALUES ('John', 'it'), ('Kate', 'hr'); INSERT INTO people (name, department) VALUES ('James', 'hr'); INSERT INTO people (name, department) VALUES ('Susan', 'marketing');
After execute some query, this is the result:
SELECT * FROM people_other; id | name | department ----+-------+------------ 22 | Susan | marketing (1 row) SELECT * FROM people_it; id | name | department ----+------+------------ 23 | John | it (1 row) SELECT * FROM people_hr; id | name | department ----+-------+------------ 24 | Kate | hr 25 | James | hr (2 rows) SELECT * FROM people; id | name | department ----+-------+------------ 23 | John | it 24 | Kate | hr 25 | James | hr 22 | Susan | marketing (4 rows)
Final words
By this INSERT trigger, it is possible to treat our tables from the outside (e.g.: from EF) as regular table, but inside the database, we will have separated data for faster processing.