Database design database

For the automatic creation and modification of the database scheme, the Database scripts (Flyway) is used.

We are using generated jpa metamodel by configuration, where criteria api is used.

...
// id
if (filter.getId() != null) {
  predicates.add(builder.equal(root.get(AbstractEntity_.id), filter.getId()));
}
...

Pros and cons of the use of uuid as the primary key for the database.

Pros:
- Easier data migration between the environments (e.g. cloning the pre-production to production)
- generating the identifier on the part of the client (distributed system, offline access)
- impossibility of tipping the url (the guid is not sequential)
Cons:
- double size of the indexes compared to bigint (8B vs 16B) - for numbers, see the next chapter with testing
- the uuid is unreadable at the first sight directly in the database - the uuid is saved in the data type binary(16). If the uuid was saved to be readable, this would imply more disadvantages:
  - uuid native data type for the databse in question (e.g. uuid in posqresql) would result in the need to maintain a set of entities according to the chosen database on the application level
  - string - the size of the column would be of char(36), which would make the indexes 4 times bigger compared to the use of data type bigint
- some databases always save the primary key together with the data (clustered index, MySQL/InnoDB), which may lead to data fragmentation
- illegible / unmemorable url

Listing and searching according to the string guid in PostgreSQL:

SELECT encode(i.id, 'hex')::uuid, i.* FROM idm_identity i WHERE i.id = uuid_send('5cd7ec1c-6b34-45b7-9835-e533e9374ebf'::uuid)

Generating uuid without uuid-ossp package instaled in PosgreSQL:

INSERT INTO idm_configuration(id, created, creator, name, VALUE, secured, confidential) VALUES (
	decode(md5(random()::text || clock_timestamp()::text), 'hex'),
	now(),
	'[SYSTEM]',
	'idm.sec.core.tree.defaultType', 
	(SELECT id FROM idm_tree_type WHERE default_tree_type = TRUE),
	TRUE,
	FALSE);

The aim of the testing should be to verify the steady time of data record / reading in the chart with an increasing number of records and the size of the charts and indexes. Two charts in the binding 1:N (master/detail) with the primary keys, indexes for the uniqueness of the record name (master name and detail name in one master) and for the index of the relation between the masters - detail for searching all the details of the master in question were created (todo: link to git for the structure, when it appears in develop).

How to read the individual records:

with a number of 5000 records saved in the master chart, the insert of one record into the master chart took 9ms.
with a number of 1943455 records saved in the detail chart, the size of all the indexes for the detail chart was 171MB.
with a number of 19505030 records saved in the detail chart, the PK index size for the detail chart was 417MB.

Number of records (master)	Number of records (detail)	Insert (master)	Insert (detail)	Load details by master	Index Size (master)	Index size (detail)	PK size (detail)
`bigint` as the primary key in PostgreSQL 9.4
5000	96995	9ms	10ms	1ms	0,5MB	8MB	2MB
10000	203597	11ms	11ms	2ms	1MB	18MB	4MB
50000	967475	11ms	11ms	2ms	4MB	85MB	19.5MB
100000	1943455	11ms	12ms	2ms	8MB	171MB	39MB
1000000	`19505030`	11ms	`12ms`	2.5ms	73MB	1703MB	`417MB`

Number of records (master)	Number of records (detail)	Insert (master)	Insert (detail)	Load details by master	Index Size (master)	Index size (detail)	PK size (detail)
`binary(16)` representing uuid as the primary key in PostgreSQL 9.4
5000	97652	11ms	12ms	2.5ms	1MB	16M	4.5MB
10000	194804	11.2ms	13,5ms	3ms	1.5MB	31MB	9MB
50000	973836	11.4ms	13.5ms	3ms	5MB	156MB	45.5MB
100000	1947062	11.7ms	14.5ms	3ms	10MB	313MB	95.2MB
1000000	`19380690`	30ms	`30ms`	3.5ms	68MB	2653MB	`750MB`

The testing took place on localhost (16GB RAM, 2 processors, regular HDD, default database configuration, logging turned off), so the individual times are not fully representative - it is the stability that matters the most. With the increasing number of records, the disadvantage of the non-sequential uuid PK starts to come through, when there are more accesses to the disk - work with different pages (the default set-up size of one page is 8 kB - can contain approx. 200 uuid) in which the index itself is saved.

After replacement of HDD by SSD:

`binary(16)` representing uuid as the primary key in PostgreSQL 9.4 - SSD disk

Number of records (master)	Number of records (detail)	Insert (master)	Insert (detail)	Load details by master	Index Size (master)	Index size (detail)	PK size (detail)
1000000	`19 380 690`	5ms	`5ms`	3.5ms	68MB	2653MB	`750MB`
1500000	28 732 177	5ms	5ms	4ms	134MB	4707MB	1130MB
1500000	28 732 177	5ms	5ms	4ms	134MB	4707MB	1130MB
1500000	28 732 177	5ms	5ms	4ms	134MB	4707MB	1130MB

The use of binary(16) identifiers in PostgreSQL 9.4 on the number of records tested is (~40M) stable.

Database design database

Database scripts

JPA metamodel

Primary keys (bigint vs. uuid)

Primary keys testing