QOT

MySQL DBA’s intelligent assistant

Last week in QOT: Index/Predicate Selectivity Analysis

Last weekend I’ve reached a milestone in my work on the statistical index analysis for QOT. The code is available on Launchpad.

An example is worth a thousand words, so I start with one (once again I’m using employees-db):

$ ./qot –info –host=192.168.200.1 –analyse=possible –schema=employees –query=”select * from dept_emp where (dept_no = ‘d005′ or dept_no = ‘d006′) and (emp_no = 10010 or emp_no = 10011)”

/* Output produced by qot 0.1.6 GPL */
/*
Query: select * from dept_emp where (dept_no = ‘d005′ or dept_no = ‘d006′) and (emp_no = 10010 or emp_no = 10011)

selectivity:
all rows

used tables:
dept_emp (all rows)

ordering:
no ordering

related existing indexes:

dept_emp.PRIMARY(emp_no, dept_no): lookup
predicate statistics:

[emp_no = ? AND dept_no = ?] rows
avg/min/max/avg %: 1 / 1 / 1 / 0.000301565
25/50/75 perc: 1 / 1 / 1
stddev: 0
distinct values: 331603

[emp_no = 10010 AND dept_no = ‘d005′] rows/%: 0 / 0
[emp_no = 10011 AND dept_no = ‘d005′] rows/%: 0 / 0
[emp_no = 10010 AND dept_no = ‘d006′] rows/%: 1 / 0.000301565
[emp_no = 10011 AND dept_no = ‘d006′] rows/%: 0 / 0

dept_emp.emp_no(emp_no): lookup
predicate statistics:

[emp_no = ?] rows
avg/min/max/avg %: 1.10525 / 1 / 2 / 0.000333307
25/50/75 perc: 1 / 1 / 1
stddev: 0.306882
distinct values: 300024

[emp_no = 10010] rows/%: 2 / 0.000603131
[emp_no = 10011] rows/%: 1 / 0.000301565

dept_emp.dept_no(dept_no): lookup
predicate statistics:

[dept_no = ?] rows
avg/min/max/avg %: 36844.8 / 17346 / 85707 / 11.1111
25/50/75 perc: 20117 / 21126 / 52245
stddev: 25144.4
distinct values: 9

[dept_no = ‘d005′] rows/%: 85707 / 25.8463
[dept_no = ‘d006′] rows/%: 20117 / 6.06659
*/

In short, the tool now performs index selectivity analysis in respect to the predicates of given query. Before the analysis the WHERE expression is transformed into disjunctive normal form (DNF) and then individual predicates (conjunctions) are analysed in the context of existing relevant indexes.

In this example QOT has found 3 relevant indexes: PRIMARY, dept_no and emp_no. For every index QOT made a query to the server and calculated 2 kinds of predicate selectivity statistics: general predicate selectivity independent of used constant values and selectivity with the constants used in the query. For the general selectivity the following metrics are calculated: average/min/max number of rows (average row count value and % from the total row count), 25, 50 and 75 percentiles, standard deviation from the average value and number of distinct values. For constants the number of rows is calculated (represented as absolute value and % from the total row count).

In particular you can see that the primary key average selectivity is 1 row and standard deviation is 0, i.e. every primary key value is present exactly once in the table (using this property you can for example easily find candidate keys).

emp_no is not unique but has very high selectivity close to 1. Unlike primary key this index doesn’t include all fields used in WHERE-clause but it’s twice shorter than the primary key and thus requires less memory and less disk reads which means chances are the query will be executed faster with this index than with the primary key.

On the other hand dept_no has very low selectivity in general (only 9 distinct values for > 300 000 rows!)  and in particular for values dept_no = ‘d005′ and dept_no = ‘d006′ (25% and 6% respectively). So you usually don’t want to use dept_no index in this kind of query.

RSS 2.0 | Trackback | Comment

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>