- What is Primary Key ?
- Issues due to tables without primary key
- How do we end up creating tables without primary key by mistake ?
- Performance issues caused by tables missing primary key ?
- How can we find the tables without primary key ?
- How can we restrict creation of tables without primary key ?
- Conclusion
What is Primary Key ?
Primary keys are essential part of all RDBMS including MySQL. They serve as a unique identifier for each row in a table, allowing for efficient searching, sorting, and indexing of data. Tables without primary keys can lead to several issues which we will discussing further in this blog.
Issues due to tables without primary key
- Duplicate data or Data integrity issues: A primary key helps ensure data integrity by preventing null values and enforcing integrity checks within and between other tables. Without primary key, there is a greater risk of data inconsistency. Primary key essentially would prevent any duplicate rows addition to the table and most importantly any update/delete/select query would go for a full table scan even if they have to fetch a single record.
- performance degradation: Without a primary key foreign key referencing wouldn’t be possible, and without proper indexing MySQL may have to perform full table scans to find specific rows, which can be slow and resource-intensive
- Cause Replication lag: In the Row based replication when the source performs a massive update/delete operation in a single statement on a table without primary key and when it gets replicates to the replica each row goes for a full table scan to update/delete.
How do we end up creating tables without primary key by mistake ?
- Creating tables using statements like [ create table table_name_new select * from table_name; ]
- In GTID based environments wouldn’t allow such statements doing multiple transaction in single statement however if you are not using GTID it would get succeed and create the copy of table without any indexes
- Instead the new table should be created using below statements which would preserve the indexes while copying the structure
- create table table_name_new like table_name;
- insert into table_name_new select * from table_name;
mysql> show create table server\G
*************************** 1. row ***************************
Table: server
Create Table: CREATE TABLE `server` (
`ServerName` varchar(128) NOT NULL,
`port` int(11) NOT NULL DEFAULT '3306',
`dateAdded` datetime DEFAULT CURRENT_TIMESTAMP,
`comments` varchar(255) DEFAULT NULL,
PRIMARY KEY (`ServerName`,`port`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.01 sec)
Copying a table – Bad Practise | Copying a table – Good Practise |
---|---|
mysql> create table server_new select * from server; | mysql> create table server_new like server;
Records: 1 Duplicates: 0 Warnings: 0 |
NOTE: This way it created a new table without indexes | NOTE: This way creates new table with primary and other indexes |
Performance issues caused by tables missing primary key ?
Let’s analyse this with an example. Let’s say we have a 2-3 nodes with regular ROW based replication between the source and 2 replicas and there is a table without a primary key and it has millions of records.
- Replication lag
- Now there might be some use case of queries which would be doing bulk update/delete on thousands of records as below
- update customer set is_active=’Y’ where id in (1,2,3,4,5,6,7,8,9,10,11…………………………………………………………….100000);
- delete from customer where is_active=’N’; *** matching condition has more than 100k records ****
- On the replica nodes when the above transactions replicates it becomes 100000 individual transactions due to ROW based replication and it would go for full table scan for each of those update/delete and this would cause huge lag on the slave which becomes unpredictable to estimate how much time its going to take to catchup.
- Now there might be some use case of queries which would be doing bulk update/delete on thousands of records as below
NOTE : Select statements can be slower on tables with or without primary key or any other index for that matter because MySQL has to do a full able scan each time to find the rows matching the where condition with out proper indexing.
How can we find the tables without primary key ?
We can query information_schema tables to get the list of tables without primary key using the below.
SELECT
tbl.table_schema,
tbl.table_name,
kcu.constraint_name,
kcu.column_name
FROM
information_schema.tables tbl LEFT JOIN information_schema.key_column_usage kcu ON tbl.table_schema = kcu.table_schema
AND tbl.table_name = kcu.table_name
AND kcu.constraint_name = 'PRIMARY'
WHERE tbl.table_schema NOT IN ( 'mysql', 'information_schema','performance_schema')
AND kcu.constraint_name IS NULL AND tbl.table_type = 'BASE TABLE';
How can we restrict creation of tables without primary key ?
MySQL has added few features to in recent versions of mysql in 8.0 to address this old known issue through variables like sql_require_primary_key variable and REQUIRE_TABLE_PRIMARY_KEY_CHECK option with change replica source. Otherwise in the older versions of MySQL we should monitor regularly for non primary key tables and identify them before they really grow big enough and the performance degradation due to them become significant.
sql_require_primary_key
Oracle MySQL introduced a much needed precautionary variables sql_require_primary_key from 8.0.13 which is disabled by default however it is a dynamic variable which can be enabled globally and at session level without a DB restart. Enabling this variable avoids all sorts of transactions that could try to create a table without primary key and even it would block the transactions to drop primary keys.
Enabling sql_require_primary_key parameter globally
mysql> show global variables like '%sql_require_primary_key%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| sql_require_primary_key | OFF |
+-------------------------+-------+
1 row in set (0.01 sec)
mysql> set global sql_require_primary_key=ON;
Query OK, 0 rows affected (0.01 sec)
Creating a new table without Primary key
mysql> CREATE TABLE `audit_sales` (
-> `product_id` int(11) DEFAULT NULL,
-> `retail_price` double DEFAULT NULL,
-> `quantity` int(11) DEFAULT NULL,
-> `city` varchar(50) DEFAULT NULL,
-> `state` varchar(10) DEFAULT NULL
-> ) ENGINE=InnoDB;
ERROR 3750 (HY000): Unable to create or change a table without a primary key, when the system variable 'sql_require_primary_key' is set. Add a primary key to the table or unset this variable to avoid this message. Note that tables without a primary key can cause performance problems in row-based replication, so please consult your DBA before changing this setting.
Creating Persons_delta table along with loading data from Persons
mysql> create table Persons_delta select * from Persons;
ERROR 3750 (HY000): Unable to create or change a table without a primary key, when the system variable 'sql_require_primary_key' is set. Add a primary key to the table or unset this variable to avoid this message. Note that tables without a primary key can cause performance problems in row-based replication, so please consult your DBA before changing this setting.
Creating a primary key less table from an existing primary keyless table
mysql> create table Persons_delta1 like Persons_delta;
ERROR 3750 (HY000): Unable to create or change a table without a primary key, when the system variable 'sql_require_primary_key' is set. Add a primary key to the table or unset this variable to avoid this message. Note that tables without a primary key can cause performance problems in row-based replication, so please consult your DBA before changing this setting.
Altering Persons table to drop its primary key
mysql> alter table Persons drop primary key;
ERROR 3750 (HY000): Unable to create or change a table without a primary key, when the system variable 'sql_require_primary_key' is set. Add a primary key to the table or unset this variable to avoid this message. Note that tables without a primary key can cause performance problems in row-based replication, so please consult your DBA before changing this setting.
REQUIRE_TABLE_PRIMARY_KEY_CHECK
From mysql 8.0.23 even Change replica source or change master takes an option REQUIRE_TABLE_PRIMARY_KEY_CHECK to establish its own primary key checks policy. This option takes few values as below. The most exciting of all is GENERATE which allows replica to generate its own invisible primary key.
Value | Description |
---|---|
ON | All replicated CREATE TABLE or ALTER TABLE statement must result in a table that contains a primary key. |
OFF | The replica sets sql_require_primary_key = OFF and all CREATE TABLE or ALTER TABLE statement are bypassed and no primary key validation is done. |
STREAM | The replica uses the sql_require_primary_key settings that is replicated from the source for each transaction. which is the default value and default behavior. |
GENERATE | From MySQL 8.0.32 the replica is capable to generate an invisible primary key for any InnoDB table that lacks a primary key |
NOTE : We would probably do some testing around this but this is a really nice feature to have.
Conclusion
To summarise, primary keys are crucial for proper functioning and performance of a RDBMS database. Tables without primary keys could lead to data inconsistencies, performance issues, Slowness in querying or updating data and replication lag. It’s extremely important to ensure that each table in your database has a primary key defined. Upgrading to mysql 8.0.30 or later also would help where you can take advantages of invisible primary keys. Primary key would ensure the updation/deletion would happen efficiently based on the primary key access instead of full table scans.
Choosing the right primary key is also as important as having a primary key. Columns like an ID INT column , UUID column or an auto increment column, Phone Number , Aadhar number , PAN number , SSN are some good examples of unique columns to be primary key. Also while choosing INT date types make sure to choose them unsigned or bigint which would increase the max int values it can hold. Defining primary key columns or ID columns as unsigned int can also help us to avoid situations like maxing out int, small int limits etc.
Hope you find this information useful. Keep reading and follow us for more exclusive content for open source databases.
Trackbacks/Pingbacks