For backend developers, accessing a database is crucial.
Core user data is usually stored securely in databases like MySQL or Oracle.
Daily tasks often involve creating databases and tables to meet business needs, but tables are created much more frequently.
This article will focus on table creation since ignoring key details can lead to costly issues in maintenance after deployment.
Poor database design can also make your API respond slowly under high concurrency. The following image shows the performance test results of an API using the EchoAPI tool.
Today, let’s discuss 18 tips for creating tables in a database.
Many of the details mentioned in this article stem from my own experiences and challenges faced during work, and I hope they will be helpful to you.
When creating tables, fields, and indexes, giving them good names is incredibly important.
Names serve as the face of tables, fields, and indexes, leaving a first impression.
Good names are concise and self-descriptive, making communication and maintenance easier.
Poor names are ambiguous and confusing, leading to chaos and frustration.
Bad Examples:
Field names like abc, abc_name, name, user_name_123456789 will leave you baffled.
Good Example:
Field name as user_name.
A gentle reminder: names should also not be too long, ideally kept within 30 characters.
It’s best to use lowercase letters for names, as they are easier to read visually.
Bad Examples:
Field names like PRODUCT_NAME, PRODUCT_name are not intuitive. A mix of upper and lower case is less pleasant to read.
Good Example:
Field name as product_name looks more comfortable.
Often, names may contain multiple words for better understanding.
What separator should be used between multiple words?
Bad Examples:
Field names like productname, productName, product name, or product@name are not recommended.
Good Example:
Field name as product_name.
Using an underscore _ between words is strongly advised.
For table names, it’s recommended to use meaningful, concise names along with a business prefix.
For order-related tables, prepend the table name with order_, such as order_pay, order_pay_detail.
For product-related tables, prepend with product_, like product_spu, product_sku.
This practice helps in categorizing tables related to the same business together quickly.
Additionally, if a non-order business line also needs a table named pay, it can be named finance_pay, which distinguishes it clearly and prevents name conflicts.
Field names allow for maximum flexibility but can easily lead to confusion.
For example, using flag to denote status in one table while using status in another can create inconsistencies.
Standardizing to status for representing state is advisable.
When a table uses another table's primary key, append _id or _sys_no to the end of the field name, for example, product_spu_id or product_spu_sys_no.
Additionally, standardize creation time as create_time and modification time as update_time, with deletion status fixed as delete_status.
Other common fields should also maintain a uniform naming convention across different tables for better clarity.
In a database, there are various types of indexes, including primary keys, regular indexes, unique indexes, and composite indexes.
A table generally has a single primary key, usually named id or sys_no.
Regular and composite indexes can use the ix_ prefix, for example, ix_product_status.
Unique indexes can use the ux_ prefix, such as ux_product_code.
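As a rough sketch of how these naming rules might look together, consider the table below. It is hypothetical: the order_pay_detail columns and the index names are assumptions made for illustration, not part of any real schema. The DATETIME defaults assume MySQL 5.6.5 or later.

```sql
-- Hypothetical table illustrating the naming conventions described above.
CREATE TABLE order_pay_detail (
  id            BIGINT      NOT NULL AUTO_INCREMENT,  -- primary key named id
  order_pay_id  BIGINT      NOT NULL,                 -- key of order_pay, hence the _id suffix
  pay_code      VARCHAR(32) NOT NULL,                 -- business code, must be unique
  status        TINYINT     NOT NULL DEFAULT 0,       -- always "status", never "flag"
  delete_status TINYINT     NOT NULL DEFAULT 0,
  create_time   DATETIME    NOT NULL DEFAULT CURRENT_TIMESTAMP,
  update_time   DATETIME    NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  UNIQUE KEY ux_pay_code (pay_code),                  -- ux_ prefix: unique index
  KEY ix_order_pay_id (order_pay_id),                 -- ix_ prefix: regular index
  KEY ix_status_create_time (status, create_time)     -- ix_ prefix: composite index
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```

With a shared convention like this, the purpose of an index or column can usually be guessed from its name alone.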
When designing tables, ample freedom exists in choosing field types.
Time-formatted fields can be date, datetime, or timestamp, etc.
Character data types include varchar, char, text, etc.
Numeric types comprise int, bigint, smallint, and tinyint.
Selecting an appropriate field type is crucial.
Overestimating types (e.g., using bigint for a field that will only store values between 1 and 10) wastes space; tinyint would suffice.
Conversely, underestimating (e.g., using int for an 18-digit ID) will lead to data storage failures.
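To make the two failure modes concrete, here is a minimal sketch. The table and column names are made up for illustration; the value ranges are the standard MySQL integer limits.

```sql
-- Hypothetical table: column names are illustrative only.
CREATE TABLE type_demo (
  rating      TINYINT UNSIGNED NOT NULL,  -- 1 byte, range 0..255, plenty for values 1..10
  external_id BIGINT           NOT NULL   -- 8 bytes, signed maximum 9223372036854775807
) ENGINE=InnoDB;

-- An 18-digit ID fits into BIGINT.
INSERT INTO type_demo (rating, external_id) VALUES (5, 123456789012345678);

-- It does not fit into INT, whose signed maximum is 2147483647.
-- In strict SQL mode the following change would be rejected with an out-of-range error:
-- ALTER TABLE type_demo MODIFY external_id INT NOT NULL;
```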
Here are some principles for choosing field types:
After defining field names and selecting appropriate field types, the focus should shift to field lengths, like varchar(20) or bigint(20).
What does varchar indicate in terms of length—bytes or characters?
The answer: in MySQL, the length of varchar and char is counted in characters, not bytes. For integer types, the number in parentheses is only a display width and has no effect on storage.
For example, bigint(4) specifies a display width of 4; the storage size remains 8 bytes.
If the zerofill property is set, numbers with fewer than 4 digits are left-padded with zeros when displayed, but the underlying storage is still 8 bytes. Note that integer display widths and zerofill are deprecated as of MySQL 8.0.17.
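The difference between character length and byte length can be checked with a small experiment. This is only a sketch: the table name is hypothetical, and it assumes the client connection uses utf8mb4 so the Chinese text is transmitted intact.

```sql
-- Hypothetical table: VARCHAR(5) means at most 5 characters, not 5 bytes.
CREATE TABLE length_demo (
  title VARCHAR(5) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Five Chinese characters fit, although they occupy 15 bytes in utf8mb4.
INSERT INTO length_demo (title) VALUES ('数据库设计');

-- CHAR_LENGTH counts characters, LENGTH counts bytes.
SELECT title,
       CHAR_LENGTH(title) AS char_count,  -- 5
       LENGTH(title)      AS byte_count   -- 15
FROM length_demo;
```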
When designing a table, it’s crucial to limit the number of fields.
I’ve seen tables with dozens or even hundreds of fields, leading to large data volumes and low query efficiency.
If this situation arises, consider splitting large tables into smaller ones while retaining common primary keys.
As a rule of thumb, keep the number of fields per table below 20.
Create a primary key when setting up a table.
In InnoDB the primary key is the clustered index, so a lookup by primary key reads the row directly and needs no extra lookup back to the table.
In a single database, primary keys can use AUTO_INCREMENT for automatic growth.
For distributed databases, particularly in sharded architectures, it’s best to use external algorithms (like Snowflake) to ensure globally unique IDs.
Moreover, keep primary keys independent of business values to reduce coupling and facilitate future expansions.
However, for one-to-one relationships, such as user tables and user extension tables, it’s acceptable to directly use the primary key from the user table.
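A wide table split into a main table and an extension table that share one primary key could look roughly like this. The table and column names (user_info, user_info_ext, bio, hobby) are assumptions for illustration.

```sql
-- Hypothetical main table: frequently used columns only.
CREATE TABLE user_info (
  id        BIGINT      NOT NULL AUTO_INCREMENT,
  user_name VARCHAR(30) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Hypothetical extension table: rarely used columns, keyed by the same value as user_info.id.
CREATE TABLE user_info_ext (
  user_id BIGINT       NOT NULL COMMENT 'Same value as user_info.id',
  bio     VARCHAR(500) NOT NULL DEFAULT '',
  hobby   VARCHAR(100) NOT NULL DEFAULT '',
  PRIMARY KEY (user_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Both sides are looked up by primary key, so joining them stays cheap.
SELECT u.id, u.user_name, e.bio
FROM user_info u
LEFT JOIN user_info_ext e ON e.user_id = u.id
WHERE u.id = 1;
```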
Before MySQL 5.5, the default storage engine was MyISAM; since MySQL 5.5 it has been InnoDB.
Historically, there was much debate about which storage engine to choose.
MyISAM separates index and data storage, enhancing query performance but lacks support for transactions and foreign keys.
InnoDB, while slightly slower in queries, supports transactions and foreign keys, making it more robust.
It was previously advised to use MyISAM for read-heavy and InnoDB for write-heavy scenarios.
However, optimizations in MySQL have narrowed the performance gap, so staying with the default InnoDB storage engine is recommended unless there is a specific reason to change it.
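If you are unsure which engine your tables use, the statements below are one way to check and convert them. The table name legacy_log is a placeholder, not a table from this article.

```sql
-- Show available engines; the row with Support = DEFAULT is the server default.
SHOW ENGINES;

-- List the engine of every table in the current database.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE();

-- Convert a legacy MyISAM table (hypothetical name) to InnoDB.
ALTER TABLE legacy_log ENGINE = InnoDB;
```

Converting a large table rebuilds it, so it is worth trying on a copy before running it in production.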
When creating fields, decide whether they can be NULL.
Defining fields as NOT NULL whenever possible is advisable.
Why?
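In InnoDB, nullable columns require extra space to record which values are NULL, and conditions on nullable columns can keep the optimizer from using an index as you expect.
NULL values can only be matched with IS NULL or IS NOT NULL; a comparison with = never matches because it evaluates to NULL rather than true.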
Thus, define fields as NOT NULL wherever feasible.
However, when a field is defined as NOT NULL without a default and a value is omitted on insert, the insert fails.
This is a common situation when a new field is added and the schema script runs before the new code is deployed: the old code does not supply the field, so inserts fail unless the field has a default value.
For newly added NOT NULL fields, setting a default value is crucial:
ALTER TABLE product_sku ADD COLUMN brand_id INT(10) NOT NULL DEFAULT 0;
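The way NULL behaves in comparisons is easy to verify. The sketch below uses a throwaway table whose name and columns are invented for this example.

```sql
-- Hypothetical table with one nullable column.
CREATE TABLE null_demo (
  id     INT PRIMARY KEY,
  remark VARCHAR(20) NULL
) ENGINE=InnoDB;

INSERT INTO null_demo (id, remark) VALUES (1, NULL), (2, 'ok');

-- "=" against NULL evaluates to NULL, so no row matches.
SELECT COUNT(*) FROM null_demo WHERE remark = NULL;    -- 0

-- IS NULL is the correct test.
SELECT COUNT(*) FROM null_demo WHERE remark IS NULL;   -- 1

-- The NULL row is also skipped by "<>", which is a common source of bugs.
SELECT COUNT(*) FROM null_demo WHERE remark <> 'ok';   -- 0
```

The last query is the one that tends to surprise people: the row with a NULL remark is neither equal nor unequal to 'ok'.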
Foreign keys in MySQL serve to ensure data consistency and integrity.
For instance:
CREATE TABLE class ( id INT(10) PRIMARY KEY AUTO_INCREMENT, cname VARCHAR(15) );
This creates a class table.
Then, a student table can be constructed that references it:
CREATE TABLE student (
  id INT(10) PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(15) NOT NULL,
  gender VARCHAR(10) NOT NULL,
  cid INT,
  FOREIGN KEY (cid) REFERENCES class(id)
);
Here, cid in the student table references id in the class table.
Attempting to delete a record in class while rows in student still reference it through cid will raise a foreign key constraint error:
a foreign key constraint fails.
Thus, consistency and integrity are preserved.
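The constraint can be observed with a few statements against the class and student tables defined above. The sample values are invented; the statements marked as failing are expected to return an error when run one by one.

```sql
-- Insert a parent row and a child row that references it.
INSERT INTO class (cname) VALUES ('Class A');
SET @class_id = LAST_INSERT_ID();
INSERT INTO student (name, gender, cid) VALUES ('Tom', 'male', @class_id);

-- Fails: the parent row is still referenced by a student row (MySQL error 1451).
DELETE FROM class WHERE id = @class_id;

-- Fails as well: no class row exists with this id (MySQL error 1452).
INSERT INTO student (name, gender, cid) VALUES ('Ann', 'female', @class_id + 1000);

-- Works: remove the child rows first, then the parent.
DELETE FROM student WHERE cid = @class_id;
DELETE FROM class WHERE id = @class_id;
```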
Note that foreign keys are only usable with the InnoDB storage engine.
If only two tables are linked, it might be manageable, but with several tables, deleting a parent record requires synchronously deleting many child records, which can impact performance.
Thus, for internet systems, it is generally advised to avoid using foreign keys to prioritize performance over absolute data consistency.
In addition to foreign keys, stored procedures and triggers are also discouraged due to their performance impact.
When creating tables, beyond specifying primary keys, it’s essential to create additional indexes.
For example:
CREATE TABLE product_sku (
  id INT(10) PRIMARY KEY AUTO_INCREMENT,
  spu_id INT(10) NOT NULL,
  brand_id INT(10) NOT NULL,
  name VARCHAR(15) NOT NULL
);
This table includes spu_id (from the product group) and brand_id (from the brand table).
In situations that save IDs from other tables, a regular index can be added:
CREATE TABLE product_sku (
  id INT(10) PRIMARY KEY AUTO_INCREMENT,
  spu_id INT(10) NOT NULL,
  brand_id INT(10) NOT NULL,
  name VARCHAR(15) NOT NULL,
  KEY `ix_spu_id` (`spu_id`) USING BTREE,
  KEY `ix_brand_id` (`brand_id`) USING BTREE
);
Such indexes significantly enhance query efficiency.
However, do not create too many indexes: every index must be updated on each insert, update, and delete, and it takes additional storage, so excess indexes slow down writes.
A single table should ideally have no more than five indexes.
If the number of indexes exceeds five during table creation, consider dropping some regular indexes in favor of composite indexes.
Also, when creating composite indexes, keep the leftmost-prefix rule in mind: a query can use the index only if its conditions include the leading column(s) of the index.
For fields with high duplication rates (like status), avoid creating separate regular indexes. MySQL may skip the index and choose a full table scan instead if it’s more efficient.
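Here is a small sketch of the leftmost-prefix rule with a composite index. The table name and index name are hypothetical, and the plans that EXPLAIN reports depend on the MySQL version and the data, so treat the comments as typical behavior rather than a guarantee.

```sql
-- Hypothetical table with one composite index instead of two single-column indexes.
CREATE TABLE product_sku_index_demo (
  id       BIGINT      NOT NULL AUTO_INCREMENT,
  spu_id   BIGINT      NOT NULL,
  brand_id BIGINT      NOT NULL,
  name     VARCHAR(30) NOT NULL,
  PRIMARY KEY (id),
  KEY ix_spu_id_brand_id (spu_id, brand_id)
) ENGINE=InnoDB;

-- Uses the index: the condition covers the leading column.
EXPLAIN SELECT id, name FROM product_sku_index_demo WHERE spu_id = 1;

-- Uses the index: the conditions cover both columns, starting with the leading one.
EXPLAIN SELECT id, name FROM product_sku_index_demo WHERE spu_id = 1 AND brand_id = 2;

-- Typically cannot use the index for a lookup: the leading column spu_id is missing.
EXPLAIN SELECT id, name FROM product_sku_index_demo WHERE brand_id = 2;
```

Checking the key column of the EXPLAIN output is a quick way to confirm whether a query actually uses the index you created for it.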
I’ll address index inefficiency issues in a separate article later, so let’s hold off on that for now.
The range of types available for time fields in MySQL is fairly extensive: date, datetime, timestamp, and varchar.
Using varchar might be for API consistency where time data is represented as a string.
However, querying by time range is error-prone and often inefficient with varchar, because values are compared as strings rather than as times.
Date is intended only for dates (e.g., 2020-08-20), while datetime and timestamp are suited for complete date and time.
There are subtle differences between them.
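Timestamp: uses 4 bytes and spans from 1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC. It is also time-zone-sensitive: values are stored in UTC and converted to the session time zone on retrieval.
Datetime: occupies 8 bytes (5 bytes plus fractional-second storage since MySQL 5.6.4) with a range from 1000-01-01 00:00:00 to 9999-12-31 23:59:59, independent of time zones.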
Using datetime to save date and time is preferable for its wider range.
As a reminder, when setting default values for time fields, avoid using 0000-00-00 00:00:00, which can cause errors during queries.
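The time zone difference between the two types can be seen by changing the session time zone between writing and reading. The table below is a throwaway example with made-up names.

```sql
-- Hypothetical table holding the same moment in both types.
CREATE TABLE time_demo (
  id         INT       NOT NULL AUTO_INCREMENT,
  created_dt DATETIME  NOT NULL,
  created_ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Write both values while the session is in UTC.
SET time_zone = '+00:00';
INSERT INTO time_demo (created_dt, created_ts)
VALUES ('2024-01-01 12:00:00', '2024-01-01 12:00:00');

-- Read them back from a session in UTC+8.
SET time_zone = '+08:00';
SELECT created_dt, created_ts FROM time_demo;
-- created_dt stays 2024-01-01 12:00:00 (stored as written)
-- created_ts shows 2024-01-01 20:00:00 (converted from UTC to the session time zone)
```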
MySQL provides several types for numbers with a fractional part: float, double, decimal, etc.
Given that float and double may lose precision, it’s recommended to use decimal for monetary values.
Monetary fields are typically defined as decimal(m,n), where m is the total number of digits (integer and decimal portions combined) and n is the number of digits after the decimal point.
For example, decimal(10,2) allows for 8 digits before the decimal point and 2 digits after it.
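The precision difference can be reproduced directly in a query. In MySQL, a literal such as 0.1E0 is an approximate (double) value, while 0.1 is an exact decimal value. The table in the second half is hypothetical.

```sql
-- Approximate values: binary rounding makes the comparison fail.
SELECT 0.1E0 + 0.2E0 = 0.3E0 AS double_equal;   -- 0

-- Exact decimal values: the comparison holds.
SELECT 0.1 + 0.2 = 0.3 AS decimal_equal;        -- 1

-- Hypothetical table storing money as DECIMAL(10,2).
CREATE TABLE order_amount_demo (
  id     BIGINT        NOT NULL AUTO_INCREMENT,
  amount DECIMAL(10,2) NOT NULL DEFAULT 0.00,   -- largest value: 99999999.99
  PRIMARY KEY (id)
) ENGINE=InnoDB;

INSERT INTO order_amount_demo (amount) VALUES (99999999.99);   -- fits
-- One more integer digit is out of range and is rejected in strict SQL mode:
-- INSERT INTO order_amount_demo (amount) VALUES (100000000.00);
```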
During table structure design, you may encounter fields needing to store variable data values.
For example, in an asynchronous Excel export feature, a field in the async task table may need to save user-selected query conditions, which can vary per user.
Traditional database fields don’t handle this well.
Using MySQL’s json type enables structured data storage in JSON format for easy saving and querying.
MySQL also supports querying JSON data by field names or values.
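For the export task scenario, a JSON column might be used along these lines. The table name, column names, and the keys inside the JSON document are all assumptions for illustration; the ->> operator requires MySQL 5.7.13 or later.

```sql
-- Hypothetical async task table with a JSON column for user-selected filters.
CREATE TABLE async_export_task (
  id               BIGINT      NOT NULL AUTO_INCREMENT,
  task_name        VARCHAR(50) NOT NULL,
  query_conditions JSON        NOT NULL,
  create_time      DATETIME    NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Each user can store a different set of conditions.
INSERT INTO async_export_task (task_name, query_conditions)
VALUES ('order export', '{"status": 1, "keyword": "phone", "date_from": "2024-01-01"}');

-- ->> extracts a value by path and returns it as an unquoted string.
SELECT id, query_conditions->>'$.keyword' AS keyword
FROM async_export_task
WHERE query_conditions->>'$.status' = '1';

-- JSON_EXTRACT keeps the JSON type, so the number can be compared as a number.
SELECT id
FROM async_export_task
WHERE JSON_EXTRACT(query_conditions, '$.status') = 1;
```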
Unique indexes are frequently used in practice.
You can apply unique indexes to individual fields, like an organization’s code, or create composite unique indexes for multiple fields, like category numbers, units, specifications, etc.
Unique indexes on individual fields are straightforward, but for composite unique indexes, if any field is NULL, the uniqueness constraint no longer takes effect, because MySQL treats NULL values as distinct from each other.
Another common issue is having unique indexes while still producing duplicate data.
Due to its complexity, I’ll elaborate on unique index issues in a later article.
When creating unique indexes, ensure that none of the involved fields contain NULL values to maintain their uniqueness.
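The NULL pitfall is easy to reproduce. The table below is a made-up example with a composite unique index over one NOT NULL column and one nullable column.

```sql
-- Hypothetical table: unit_id is nullable and part of a composite unique index.
CREATE TABLE product_attr_demo (
  id          BIGINT NOT NULL AUTO_INCREMENT,
  category_id BIGINT NOT NULL,
  unit_id     BIGINT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY ux_category_id_unit_id (category_id, unit_id)
) ENGINE=InnoDB;

-- Both inserts succeed: rows containing NULL are never considered duplicates.
INSERT INTO product_attr_demo (category_id, unit_id) VALUES (1, NULL);
INSERT INTO product_attr_demo (category_id, unit_id) VALUES (1, NULL);

SELECT COUNT(*) FROM product_attr_demo
WHERE category_id = 1 AND unit_id IS NULL;   -- 2

-- With non-NULL values the index works as expected.
INSERT INTO product_attr_demo (category_id, unit_id) VALUES (1, 2);
-- Repeating the same insert fails with a duplicate entry error (MySQL error 1062):
-- INSERT INTO product_attr_demo (category_id, unit_id) VALUES (1, 2);
```

Declaring unit_id as NOT NULL with a default such as 0 is one way to keep the constraint effective.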
MySQL supports various character sets, including latin1, utf8, utf8mb4, etc.
Here’s a table summarizing MySQL character sets:
| Character Set | Description | Max Bytes per Character | Notes |
|---|---|---|---|
| latin1 | Encounters encoding issues; rarely used in real projects | 1 byte | Limited support for international characters |
| utf8 (utf8mb3) | Efficient in storage but cannot store emoji | 3 bytes | Suitable for most text but lacks emoji support |
| utf8mb4 | Supports all Unicode characters, including emoji | 4 bytes | Recommended for modern applications |
It’s advisable to set the character set to utf8mb4 during table creation to avoid potential issues.
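Setting the character set at creation time, and converting an older table, could look like this. Both table names are placeholders invented for the example.

```sql
-- Hypothetical table created with utf8mb4 from the start.
CREATE TABLE user_comment_demo (
  id      BIGINT       NOT NULL AUTO_INCREMENT,
  content VARCHAR(500) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Verify the character set that was actually applied.
SHOW CREATE TABLE user_comment_demo;

-- Convert an existing table (hypothetical name) and its text columns to utf8mb4.
ALTER TABLE legacy_comment CONVERT TO CHARACTER SET utf8mb4;
```

The client connection also needs to use utf8mb4, otherwise emoji can still be lost before they reach the table.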
When creating tables in MySQL, the COLLATE parameter can be configured.
For example:
CREATE TABLE `order` (
  `id` BIGINT NOT NULL AUTO_INCREMENT,
  `code` VARCHAR(20) COLLATE utf8mb4_bin NOT NULL,
  `name` VARCHAR(30) COLLATE utf8mb4_bin NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `un_code` (`code`),
  KEY `un_code_name` (`code`,`name`) USING BTREE,
  KEY `idx_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
The collation determines how character sorting and comparison are conducted.
A collation belongs to a character set, so collations for utf8mb4 all start with utf8mb4_. Common ones include utf8mb4_general_ci and utf8mb4_bin.
The utf8mb4_general_ci collation is case-insensitive for alphabetical characters, while utf8mb4_bin is case-sensitive.
This distinction is important. For example, if the order table contains a record with the name YOYO and you query it using lowercase yoyo under utf8mb4_general_ci, it retrieves the record. Under utf8mb4_bin, it will not.
Choose collation based on the actual business needs to avoid confusion.
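The YOYO example can be tried with a single table that stores the same value under both collations. The table and column names are invented for this sketch.

```sql
-- Hypothetical table: the same text stored with two different collations.
CREATE TABLE collation_demo (
  name_ci  VARCHAR(30) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
  name_bin VARCHAR(30) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL
) ENGINE=InnoDB;

INSERT INTO collation_demo (name_ci, name_bin) VALUES ('YOYO', 'YOYO');

-- Case-insensitive collation: lowercase input matches.
SELECT COUNT(*) FROM collation_demo WHERE name_ci = 'yoyo';    -- 1

-- Binary collation: lowercase input does not match.
SELECT COUNT(*) FROM collation_demo WHERE name_bin = 'yoyo';   -- 0
```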
Special attention is warranted for fields that consume substantial storage space, such as comments.
A user comment field might require limits, like a maximum of 500 characters.
Defining such a field as text can waste storage, so it is often better to use varchar with an appropriate length.
For much larger data types, like contracts that can take up several MB, it may be unreasonable to store directly in MySQL.
Instead, such data could be stored in MongoDB, with the MySQL business table retaining the MongoDB ID.
To enhance performance and query speed, some fields can be redundantly stored.
For example, an order table typically contains a userId to identify users.
However, many order query pages also need to display the user ID along with the user’s name.
If both tables are small, a join is feasible, but for large datasets, it can degrade performance.
In that case, creating a redundant userName field in the order table can resolve performance issues.
While this adjustment allows direct querying from the order table without joins, it requires additional storage and may lead to inconsistency if user names change.
Therefore, carefully evaluate if the redundant fields strategy fits your particular business scenario.
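A redundant user name column might be set up as follows. This is a sketch under assumptions: the table name order_info_demo, the snake_case column names, and the sample values are all hypothetical.

```sql
-- Hypothetical order table that keeps a copy of the user name.
CREATE TABLE order_info_demo (
  id        BIGINT        NOT NULL AUTO_INCREMENT,
  user_id   BIGINT        NOT NULL,
  user_name VARCHAR(30)   NOT NULL DEFAULT '' COMMENT 'Redundant copy of the user name',
  amount    DECIMAL(10,2) NOT NULL DEFAULT 0.00,
  PRIMARY KEY (id),
  KEY ix_user_id (user_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- The order list page reads everything from one table, with no join.
SELECT id, user_id, user_name, amount
FROM order_info_demo
ORDER BY id DESC
LIMIT 20;

-- The cost: when a user is renamed, the copies have to be updated as well.
UPDATE order_info_demo
SET user_name = 'new name'
WHERE user_id = 42;
```

Whether the sync update runs in the same transaction as the rename or asynchronously is a design decision that depends on how stale the displayed name is allowed to be.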
When designing tables, be sure to add clear comments to tables and their fields.
For example:
CREATE TABLE `sys_dept` (
  `id` BIGINT NOT NULL AUTO_INCREMENT COMMENT 'ID',
  `name` VARCHAR(30) NOT NULL COMMENT 'Name',
  `pid` BIGINT NOT NULL COMMENT 'Parent Department',
  `valid_status` TINYINT(1) NOT NULL DEFAULT 1 COMMENT 'Valid Status: 1=Valid, 0=Invalid',
  `create_user_id` BIGINT NOT NULL COMMENT 'Creator ID',
  `create_user_name` VARCHAR(30) NOT NULL COMMENT 'Creator Name',
  `create_time` DATETIME(3) DEFAULT NULL COMMENT 'Creation Date',
  `update_user_id` BIGINT DEFAULT NULL COMMENT 'Updater ID',
  `update_user_name` VARCHAR(30) DEFAULT NULL COMMENT 'Updater Name',
  `update_time` DATETIME(3) DEFAULT NULL COMMENT 'Update Time',
  `is_del` TINYINT(1) DEFAULT '0' COMMENT 'Is Deleted: 1=Deleted, 0=Not Deleted',
  PRIMARY KEY (`id`) USING BTREE,
  KEY `index_pid` (`pid`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COMMENT='Department';
Detailed comments clarify the purpose of tables and fields.
Particularly for fields representing statuses (like valid_status), a comment immediately conveys the intent behind the data, such as which value means valid and which means invalid.
Avoid situations where numerous status fields exist without comments, leading to confusion about what values like 1, 2, or 3 signify.
Initially, one might remember, but after a year of operation, it’s easy to forget, potentially leading to significant pitfalls.
Thus, when designing tables, meticulous commenting and regular updates of these comments are essential.
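Comments can be read back and updated with ordinary statements. The example below assumes the sys_dept table above exists; the extra status value 2=Frozen is invented purely to show how a comment would be updated.

```sql
-- List every column of sys_dept together with its comment.
SELECT COLUMN_NAME, COLUMN_TYPE, COLUMN_COMMENT
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'sys_dept'
ORDER BY ORDINAL_POSITION;

-- When a new status value is introduced (hypothetical 2=Frozen), update the comment too.
-- MODIFY COLUMN needs the full column definition, not just the comment.
ALTER TABLE sys_dept
  MODIFY COLUMN `valid_status` TINYINT(1) NOT NULL DEFAULT 1
  COMMENT 'Valid Status: 1=Valid, 0=Invalid, 2=Frozen';
```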
That wraps up the technical section of this article. If you have a different opinion, let me know.