Я использую постгрес 9.4.
select version();
version
---------------------------------------------------------------------------------------------------------------
PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit
У меня есть представление в моей базе данных, которое имеет два столбца, один целочисленный и один текстовый.
\d+ gff_attributes
+----------------+---------+-------------+-----------+---------------+
| Column | Type | Modifiers | Storage | Description |
|----------------+---------+-------------+-----------+---------------|
| seqfeature_id | integer | | plain | <null> |
| gff_attributes | text | | extended | <null> |
+----------------+---------+-------------+-----------+---------------+
View definition:
SELECT qv.seqfeature_id,
string_agg((t.name::text || '='::text) || qv.value, ';'::text
ORDER BY t.name) AS gff_attributes
FROM term t,
seqfeature_qualifier_value qv
WHERE qv.term_id = t.term_id
GROUP BY qv.seqfeature_id;
Это объединяет данные из двух таблиц seqfeature_qualifier_value
(~ 55 000 000 строк) и term
(~ 11 000 строк).
\d+ seqfeature_qualifier_value
Table "public.seqfeature_qualifier_value"
Column | Type | Modifiers | Storage | Description
---------------+---------+--------------------+----------+-------------
seqfeature_id | integer | not null | plain |
term_id | integer | not null | plain |
rank | integer | not null default 0 | plain |
value | text | not null | extended |
Indexes:
"seqfeature_qualifier_value_pkey" PRIMARY KEY, btree (seqfeature_id, term_id, rank)
"seqfeaturequal_sfid" btree (seqfeature_id)
"seqfeaturequal_trm" btree (term_id)
"seqfeaturequal_type_value" btree (term_id, value)
Foreign-key constraints:
"fkseqfeature_featqual" FOREIGN KEY (seqfeature_id) REFERENCES seqfeature(seqfeature_id) ON DELETE CASCADE
"fkterm_featqual" FOREIGN KEY (term_id) REFERENCES term(term_id)
Rules:
rule_seqfeature_qualifier_value_i AS
ON INSERT TO seqfeature_qualifier_value
WHERE (( SELECT seqfeature_qualifier_value.seqfeature_id
FROM seqfeature_qualifier_value
WHERE seqfeature_qualifier_value.seqfeature_id = new.seqfeature_id AND seqfeature_qualifier_value.term_id = new.term_id AND seqfeature_qualifier_value.rank = new.rank)) IS NOT NULL DO INSTEAD NOTHING
Has OIDs: no
\d+ term
Table "public.term"
Column | Type | Modifiers | Storage | Description
-------------+------------------------+---------------------------------------------------+----------+-------------
term_id | integer | not null default nextval('term_pk_seq'::regclass) | plain |
name | character varying(255) | not null | extended |
definition | text | | extended |
identifier | character varying(40) | | extended |
is_obsolete | character(1) | | extended |
ontology_id | integer | not null | plain |
Indexes:
"term_pkey" PRIMARY KEY, btree (term_id)
"term_identifier_key" UNIQUE, btree (identifier)
"term_name_ontology_id_is_obsolete_key" UNIQUE, btree (name, ontology_id, is_obsolete)
"term_ont" btree (ontology_id)
Foreign-key constraints:
"fkont_term" FOREIGN KEY (ontology_id) REFERENCES ontology(ontology_id) ON DELETE CASCADE
Rules:
rule_term_i1 AS
ON INSERT TO term
WHERE (( SELECT term.term_id
FROM term
WHERE term.identifier::text = new.identifier::text)) IS NOT NULL DO INSTEAD NOTHING
rule_term_i2 AS
ON INSERT TO term
WHERE (( SELECT term.term_id
FROM term
WHERE term.name::text = new.name::text AND term.ontology_id = new.ontology_id AND term.is_obsolete = new.is_obsolete)) IS NOT NULL DO INSTEAD NOTHING
Has OIDs: no
Теперь, если я хочу выбрать подмножество строк на основе столбца seqfeature_id
, я могу довольно быстро получить результаты, используя явное сравнение:
explain (analyze, verbose) select *
from gff_attributes
where seqfeature_id = 3596159;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| GroupAggregate (cost=337.27..734.68 rows=1 width=24) (actual time=11.690..11.690 rows=1 loops=1) |
| Output: qv.seqfeature_id, string_agg((((t.name)::text || '='::text) || qv.value), ';'::text ORDER BY t.name) |
| Group Key: qv.seqfeature_id |
| -> Hash Join (cost=337.27..733.56 rows=110 width=24) (actual time=11.600..11.628 rows=6 loops=1) |
| Output: t.name, qv.seqfeature_id, qv.value |
| Hash Cond: (qv.term_id = t.term_id) |
| -> Index Scan using seqfeaturequal_sfid on public.seqfeature_qualifier_value qv (cost=0.56..394.66 rows=110 width=17) (actual time=0.036..0.055 rows=6 loops=1) |
| Output: qv.seqfeature_id, qv.term_id, qv.rank, qv.value |
| Index Cond: (qv.seqfeature_id = 3596159) |
| -> Hash (cost=194.09..194.09 rows=11409 width=15) (actual time=11.539..11.539 rows=11413 loops=1) |
| Output: t.name, t.term_id |
| Buckets: 2048 Batches: 1 Memory Usage: 540kB |
| -> Seq Scan on public.term t (cost=0.00..194.09 rows=11409 width=15) (actual time=0.009..5.108 rows=11413 loops=1) |
| Output: t.name, t.term_id |
| Planning time: 0.455 ms |
| Execution time: 11.753 ms |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Но когда он сочетается с запросом, который возвращает много seqfeature_id
с использованием оператора IN, все значительно замедляется (~ 2 минуты).
explain (analyse, verbose)
select * from gff_attributes
where seqfeature_id in (
select seqfeature_id
from seqfeature_qualifier_value
where term_id = (select term_id
from term
where name = 'SRB_ortholog_id')
and value = '1')
;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Merge Join (cost=12911531.62..13619325.85 rows=251228 width=36) (actual time=121504.409..173449.696 rows=102 loops=1) |
| Output: qv.seqfeature_id, (string_agg((((t.name)::text || '='::text) || qv.value), ';'::text ORDER BY t.name)) |
| Merge Cond: (qv.seqfeature_id = seqfeature_qualifier_value.seqfeature_id) |
| InitPlan 1 (returns $0) |
| -> Index Scan using term_name_ontology_id_is_obsolete_key on public.term (cost=0.29..8.30 rows=1 width=4) (actual time=0.036..0.037 rows=1 loops=1) |
| Output: term.term_id |
| Index Cond: ((term.name)::text = 'SRB_ortholog_id'::text) |
| -> GroupAggregate (cost=12905524.15..13607037.46 rows=502457 width=24) (actual time=121295.372..172418.928 rows=3687424 loops=1) |
| Output: qv.seqfeature_id, string_agg((((t.name)::text || '='::text) || qv.value), ';'::text ORDER BY t.name) |
| Group Key: qv.seqfeature_id |
| -> Sort (cost=12905524.15..13044570.67 rows=55618608 width=24) (actual time=121295.315..132671.659 rows=22189814 loops=1) |
| Output: qv.seqfeature_id, t.name, qv.value |
| Sort Key: qv.seqfeature_id |
| Sort Method: external merge Disk: 1639072kB |
| -> Hash Join (cost=336.70..2328594.94 rows=55618608 width=24) (actual time=13.358..41289.820 rows=55545757 loops=1) |
| Output: qv.seqfeature_id, t.name, qv.value |
| Hash Cond: (qv.term_id = t.term_id) |
| -> Seq Scan on public.seqfeature_qualifier_value qv (cost=0.00..1215886.08 rows=55618608 width=17) (actual time=0.063..12230.988 rows=55545757 loops=1) |
| Output: qv.seqfeature_id, qv.term_id, qv.rank, qv.value |
| -> Hash (cost=194.09..194.09 rows=11409 width=15) (actual time=13.278..13.278 rows=11413 loops=1) |
| Output: t.name, t.term_id |
| Buckets: 2048 Batches: 1 Memory Usage: 540kB |
| -> Seq Scan on public.term t (cost=0.00..194.09 rows=11409 width=15) (actual time=0.011..6.207 rows=11413 loops=1) |
| Output: t.name, t.term_id |
| -> Sort (cost=5999.16..5999.20 rows=14 width=4) (actual time=0.404..0.436 rows=102 loops=1) |
| Output: seqfeature_qualifier_value.seqfeature_id |
| Sort Key: seqfeature_qualifier_value.seqfeature_id |
| Sort Method: quicksort Memory: 29kB |
| -> HashAggregate (cost=5998.76..5998.90 rows=14 width=4) (actual time=0.345..0.368 rows=102 loops=1) |
| Output: seqfeature_qualifier_value.seqfeature_id |
| Group Key: seqfeature_qualifier_value.seqfeature_id |
| -> Bitmap Heap Scan on public.seqfeature_qualifier_value (cost=88.22..5994.94 rows=1527 width=4) (actual time=0.102..0.290 rows=102 loops=1) |
| Output: seqfeature_qualifier_value.seqfeature_id, seqfeature_qualifier_value.term_id, seqfeature_qualifier_value.rank, seqfeature_qualifier_value.value |
| Recheck Cond: ((seqfeature_qualifier_value.term_id = $0) AND (seqfeature_qualifier_value.value = '1'::text)) |
| Heap Blocks: exact=102 |
| -> Bitmap Index Scan on seqfeaturequal_type_value (cost=0.00..87.83 rows=1527 width=0) (actual time=0.083..0.083 rows=102 loops=1) |
| Index Cond: ((seqfeature_qualifier_value.term_id = $0) AND (seqfeature_qualifier_value.value = '1'::text)) |
| Planning time: 1.010 ms |
| Execution time: 173942.270 ms |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Обратите внимание, что при выполнении подзапроса сам по себе он также выполняется быстро (‹1 с) и возвращает 102 строки.
explain (analyse, verbose)
select seqfeature_id
from seqfeature_qualifier_value
where term_id = (select term_id
from term where name = 'SRB_ortholog_id'
)
and value = '1'
;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------|
| Bitmap Heap Scan on public.seqfeature_qualifier_value (cost=96.52..6003.24 rows=1527 width=4) (actual time=0.104..0.319 rows=102 loops=1) |
| Output: seqfeature_qualifier_value.seqfeature_id |
| Recheck Cond: ((seqfeature_qualifier_value.term_id = $0) AND (seqfeature_qualifier_value.value = '1'::text)) |
| Heap Blocks: exact=102 |
| InitPlan 1 (returns $0) |
| -> Index Scan using term_name_ontology_id_is_obsolete_key on public.term (cost=0.29..8.30 rows=1 width=4) (actual time=0.035..0.037 rows=1 loops=1) |
| Output: term.term_id |
| Index Cond: ((term.name)::text = 'SRB_ortholog_id'::text) |
| -> Bitmap Index Scan on seqfeaturequal_type_value (cost=0.00..87.83 rows=1527 width=0) (actual time=0.083..0.083 rows=102 loops=1) |
| Index Cond: ((seqfeature_qualifier_value.term_id = $0) AND (seqfeature_qualifier_value.value = '1'::text)) |
| Planning time: 0.215 ms |
| Execution time: 0.368 ms |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
Я очень смущен тем, почему оператор IN добавляет к запросу столько времени? Есть ли способ переписать этот запрос для повышения производительности?
explain (analyze, verbose)
вместо простого выводаexplain
- person a_horse_with_no_name   schedule 30.03.2016term
иseqfeature_qualifier_value
. - person Erwin Brandstetter   schedule 30.03.2016seqfeaturequal_sfid btree (seqfeature_id), seqfeaturequal_trm btree (term_id)
составным индексом на два объединенных:some_index_name(term_id, seqfeature_id)
Этот индекс может быть УНИКАЛЬНЫМ (это первичный ключ в обратном порядке). Такой индекс должен быть ИМХО стандартная практика для таблиц толстых соединений. (может быть, аналогично для столбцаvalue
и их комбинаций) Радости EAV... Если столбецvalues
имеет низкую кардинальность (несколько тысяч различных значений), вы также можете попытаться выжать это в отдельную доменную таблицу. - person wildplasser   schedule 30.03.2016rank
? Это похоже на поле порядка табуляции, и если это так, то ИМХО оно не должно быть частью первичного ключа. (максимум: часть другого ключа-кандидата) - person wildplasser   schedule 30.03.2016