SQL
Techniques - data manipulation language
SQL
Syntax:
Introduction:
- Although efforts were made to standardise SQL syntax via
the ANSI-92 SQL standard, most database manufacturers
either still use their own prior versions of SQL (eg.
Sybase WATCOM & TRANSACT) or have extended the ANSI
standard to provide further functionality (eg. Microsoft
"CUBE", "PIVOT")
- The following attempts to simplify how to use SQL in the
various formats using Delphi & the Borland Database
Engine
- SQL code that tends to be proprietary has been placed
under their proprietary headings in each section
- code in square brackets indicate optional coding
Data manipulation techniques:
- General syntax for
an SQL SELECT statement:
- SELECT T1.[Field name], T2.[Field name] FROM
MyTable T1, MyTable2 T2
- [WHERE (fieldname operator comparisonValue)
AND fieldname2 LIKE
"wildcardSearchValuewildcard"]
- [GROUP BY T1.[field name], ...]
- [HAVING filter_condition]
- [ORDER BY T1.fieldname2 [ASC | DESC ]]
- SELECT statements:
- these retrieve data and create a result set
- common fieldnaming syntax options in SQL:
- * selects all fields and metadata
- if using only 1 table in SQL statement
fieldname itself can be used but if
includes spaces or reserved words,
enclose in square brackets:
- SELECT Fieldname1,
Fieldname2, [Field Name3] FROM
MyTable
- if using more than 1 table, must identify
fieldname with table or table alias if
alias used in FROM clause:
- SELECT MyTable.[Field name]
FROM MyTable
- SELECT T1.[Field name],
T2.[Field name] FROM MyTable T1,
MyTable2 T2
- examples of syntax:
- Calculated fields:
- if using calculated fields which
aggregate data such as SUM,
COUNT, then MUST include ALL
non-aggregate fields in GROUP BY
clause at end.
- SQL syntax may vary here
depending on database being used:
- Common to all
SQL:
- aggregate
functions:
- Sum; Count; Avg;
Max; Min;
- these must be
followed by either:
- (*) -
computes all rows
including those with null
values - only used with
count!
- (fieldname)
- computes all rows where
fieldname value is NOT
NULL
- (DISTINCT
fieldname) -
computes all rows where
fieldname value is NOT
NULL and is unique
(doesn't count
duplicates)
- operators:
- +, -, *, /, AND, OR,
LIKE (see also WHERE
clause)
- finding out
summary data from a
dependent table using
nested query:
- SELECT
t1.PatientName, (SELECT
COUNT(*) FROM
anotherTable WHERE
anotherTable.keyFieldname
= t1.lookupfieldname) as
NumberDependents
- FROM patientTable
t1
- Sybase/SQL
Server:
- date
functions:
- DatePart(param,datetime)
returns the selected
param which may be year,
month, day, dy (day of
year), weekday, hour,
minute, second
- DateName(param,datetime)
as for datepart but
returns string instead of
an integer, thus month
=> 'July', day =>
'24'
- DateDiff(param,datetime1,datetime2)
returns an integer of
number of params between
the dates but may cause
overflow if exceeds int.
maximum
- DateAdd(param,number,datetime1)
returns a datetime of
number of params added to
the date
- GetDate()
returns current datetime
- Convert(DateTime,string)
converts a string to
datetime according to
servers date format
setting (default is mdy):
- to change server's
date format setting,
precede SELECT statement
with SET DATEFORMAT dmy
- Convert(Char(lgth),datetime,DateFormatStyleNumber)
converts datetime to
string of length lgth and
formatted according to
style number:
- 0 = Mon dd yyyy hh:mi
[AM|PM]
- 1 = mm/dd/yy
- 101 = mm/dd/yyyy
- 6 =dd Mon yy
- 106 =dd Mon yyyy
- 8 = hh:mi:ss
- 3 = dd/mm/yy
- 103 = dd/mm/
- conversion example:
get today's date at 6.30
am:
- Convert(datetime,
Convert(
Char(10),getDate(),101)+'
6.30AM') {101 means
that time in getDate is
ignored!}
- dates must be
mm/dd/yyyy if default
server data format, and
enclosed in single
quotes
- string
functions:
- CharIndex(substring,
string) returns start
position of 1st
occurrence of substring
within string
- PatIndex(pattern,string)
returns start position of
1st occurrence of pattern
within string, pattern
can use wildcards:
- % = any string of
zero or more characters
- _ = any single
character
- [W-Z] = any single
character in range W to Z
- [^W-Z] = any single
character NOT in range W
to Z
- Soundex(string)
returns a soundex value
for string which can be
used to compare with
another soundex value
- Lower(string);
Upper(String) - case
conversion
- LTrim(string);
RTrim(string) - trims
trailing/leading space
characters
- SubString(string,
start, lgth) - returns
string of length lgth
characters found in the
string starting at
position start
- Stuff(str1,start,lgth,str2)
- embeds str2 into str1
at str1 position start
replacing lgth characters
of
- Concatenation: join
two strings using +
- math
functions:
- ABS = absolute value;
ROUND; SIGN; CEILING =
smallest integer >=
argument; FLOOR = largest
integer <= argument;
- EXP; LOG; LOG10;
SQRT; POWER;
- SIN; COS; TAN; ACOS;
ASIN; ATAN; ATN2; COT;
DEGREES; RADIANS;
- PI; RAND;
- NB. all above must be
followed by (fn syntax)
where fn syntax is
usually an expression
but:
- is blank for PI, RAND
- has additional
arguments for ATN2,
ROUND(arg,precision);
POWER(arg, power);
- system
functions:
- see manual, examples
are: CURRENT_USER; USER;
USER_ID; USER_NAME;
HOST_ID;HOST_NAME;
- case
statements:
- CASE WHEN
criteria1 THEN Result1
WHEN criteria2 THEN
Result2 ELSE Result3 END
'CalcFieldName'
- NB. result can be set
to equal NULL
- MS Access:
- select
options:
- SELECT {* | ALL |
DISTINCT | DISTINCTROW |
TOP n {percent}}
- calculated
fields:
- SELECT fieldname1
& "," &
fieldname2 AS
calcFieldname; - returns
calculated field
concatenating
fieldname1.data with
comma then
fieldname2.data
-
- FROM options:
- FROM tablename {IN
externalDatabase}
- crosstab
queries:
- these are specialised
queries available only in
MS Access:
- TRANSFORM
function(fieldname1) as
FieldAliasName
- SELECT statement as
per usual
- GROUP BY statement as
per usual
- PIVOT Pivot_fieldname
{IN (comma separated
array of column
headings)}
- function may be any
MS Access SQL aggregate
function
- aggregate
functions:
- Sum, Avg, Min, Max,
Count, StDev, StDevP,
Var, VarP, First, Last
- string
functions:
- strings in double
quotes
- patterns:
- ? = matches any
single character
- * = matches zero or
more characters
- dates:
- math
functions:
- expr1 {NOT} BETWEEN
expr 2 AND expr3 nb. but
handling differs to
equiv. in ANSI-92 SQL
-
- Local SQL using
BDE (Paradox/dBase):
- can't rename a field
with the "as"
keyword as generates
compile error
- date
functions:
- EXTRACT(param FROM
datetimeField) AS
'FieldAliasForSmallIntegerField';
where param = YEAR,
MONTH, DAY, HOUR, MINUTE,
SECOND
- conversion
functions:
- CAST(fieldname AS
datatype) AS
'FieldAliasForDatatypeField',
where datatype = DATE,
TIME, TIMESTAMP,
SMALLINT,
INTEGER,NUMERIC, FLOAT,
CHARACTER(LGTH),
VARCHAR(LGTH), BOOLEAN,
BYTES but cannot be
text/blob
- string
functions:
- Concatenation:
- eg. field1 || ",
" || field2 creates
a calculated field:
"field1Value,
field2Value"
- LOWER; UPPER; case
conversions
- SUBSTRING(fieldname
FROM start [FOR lgth])
(NB. sq. bracketed part
indicates optional)
- TRIM([LEADING|TRAILING}BOTH]
trimmedCharacter FROM
fieldname)
- strings must be
enclosed with double
quotes
-
- FROM clause:
- Common SQL:
- FROM table1
[table1alias], table2
[table2alias]
- Sybase/SQL Server:
- JOIN statements:
- must use WHERE clause
to join tables and
asterix to signify type
of join - LEFT or RIGHT
OUTER joins (TRANSACT
syntax):
- FROM maintable WHERE
maintable.field1 [*= | =
| =* ]
lookupTable.keyField
- Local SQL via BDE
(Paradox/dbase/heterogeneous
joins):
- JOIN statements
(ANSI92 style):
- join via a foreign or
primary key relationship:
- SELECT fieldnames
- FROM table1 KEY JOIN
table2
- join via same named
fields:
- SELECT fieldnames
- FROM table1 NATURAL
JOIN table2
- outer joins - include
rows that don't match
- FROM table1 [LEFT |
RIGHT] OUTER JOIN table2
on table1.field1 =
table2.Keyfield
- inner joins - include
rows that match
- FROM table1 INNER
JOIN table2 on
table1.field1 =
table2.Keyfield
- multiple joins need
careful bracketing to
ensure SQL runs correctly
(if having problems
create one in Access then
view SQL):
- FROM (table 1 LEFT
JOIN table2 (LEFT JOIN
table3 on table2.field2 =
table3.keyField) on
table1.field1 =
table2.keyField)
- create new table from
merger of same
data-structured data from
multiple tables (eg. mail
merge of addresses):
- SELECT fieldnames
FROM table1
- UNION
- SELECT fieldnames
FROM table2
- UNION
- SELECT fieldnames
FROM table3
- heterogeneous joins
from different databases
in Delphi:
- SELECT
tablename1.fieldname1,
tablename2.fieldname2
- FROM
'':DatabaseAlias1:tablename1'',
'':DatabaseAlias2:tablename2''
-
- WHERE clause:
- only need to use if need to
filter data BEFORE it is Grouped
- Common SQL:
- Comparison predicates:
- >,<,>=,<=,<>,=
- field1 IS [NOT] NULL
- Sybase/SQL Server:
- see also under FROM
clause for joins
- LIKE patterns:
- % = any string of
zero or more characters
- _ = any single
character
- ESCAPE
"escapeCharacter"
= signifies that where
escapeCharacter occurs,
the following character
is taken literally &
not as a wildcard
- eg. WHERE
(PercentValue LIKE
"%10^%%" ESCAPE
"^") finds
strings containing 10%
- Comparison
predicates:
- field1 [NOT] IN
(comma separated values)
- field1 [NOT] BETWEEN
value1 AND value2
- Subquery
comparisons:
- field1
comparison_predicate ANY
| ALL
(NestedSelectStatement)
- eg. WHERE
(H."pur_price"
> ANY (SELECT
O."price" FROM
"old_sales.dbf"))
- where [NOT] EXISTS
(NestedSelectStatement)
- MS Access:
- LIKE patterns:
- * = any string of
zero or more characters
- Local SQL via BDE
(Paradox/dBase):
- LIKE patterns:
- % = any string of
zero or more characters
- _ = any single
character
- ESCAPE
"escapeCharacter"
= signifies that where
escapeCharacter occurs,
the following character
is taken literally &
not as a wildcard
- eg. WHERE
(PercentValue LIKE
"%10^%%" ESCAPE
"^") finds
strings containing 10%
- Comparison_predicates:
- field1 [NOT] IN
(comma separated values)
- field1 [NOT] BETWEEN
value1 AND value2
- Subquery
comparisons:
- field1
comparison_predicate SOME
| ANY | ALL
(NestedSelectStatement)
- eg. WHERE
(H."pur_price"
> ANY (SELECT
O."price" FROM
"old_sales.dbf"))
- where [NOT] EXISTS
(NestedSelectStatement)
- GROUP BY clause:
- only required if either:
- aggregate function used
in select clause, or,
- use the HAVING clause
- must include ALL non-aggregate
fields in the Select clause
- HAVING clause:
- allows a filter on the result set
after it is brought to client
allowing filtering on summary
data on groups of rows, but
increases network traffic!
- only use if need to filter data
AFTER it has been aggregated by a
Group
- ORDER BY clause:
- only need to use this if wish to
order data and:
- Group By clause is NOT
used, and,
- desired data order is not
the SELECT clause
order???
- Declaring variables:
- Sybase/SQL Server:
- variables must be
identified by prefixing
with the @ symbol
- variables must be
declared BEFORE the
SELECT clause, the
following example shows
how to determine how long
a SQL statement takes to
run:
- DECLARE @start_time
DATETIME
- SELECT @start_time =
GetDate()
- SELECT statement you
wish to run & time
- SELECT
DateDiff(second,
@start_time, GetDate())
'Elapsed time, sec'
- INSERT statements:
- adds new rows into table
- simple example:
- INSERT INTO tablename (fieldname1,
fieldname2)
- VALUES ('value1', 'value2')
- getting data from another table:
- INSERT INTO tablename1
- SELECT * FROM tablename2 WHERE
fieldname < xx
- UPDATE statements:
- changes data values for existing rows in a table
- example:
- UPDATE tablename
- SET fieldname = expression
- {WHERE fieldname = xx}
- {ORDER BY fieldname} -
optionally used to specify the order of
the update
- DELETE statements:
- deletes rows in table
- example:
- DELETE FROM tablename
- WHERE fieldname = xx
-
How to degrade query
performance:
- creation of cartesion products of one table against
another by incorrect JOIN clauses
- tables not being appropriately indexed
- tables not being adequately normalised resulting in
oversized tables
- not having denormalised summary tables updated regularly
to provide fast summary queries
- use of HAVING clause instead of WHERE clause, as the
filtering is then done at the client end requiring the
whole pre-HAVING query result set to be sent across
network to the client
- use of NOT, LIKE or <> operators as these force the
server to read the whole table rather than use an index,
thus instead of saying NOT "P%" try using <
"P%" OR > "P%"
- ensure WHERE clause uses fields in the beginning of a
complex index otherwise that index cannot be used
- ensure the most specific WHERE statement is placed first
thereby ensuring the smallest result set for the next
WHERE statement to act on, these needs knowledge of
likely numbers of records that will be accessed with each
statement alone
- use of out of date database optimiser statistics - these
are generated by the server based on prior queries and
are saved in the database and provide information as to
likely occurrences of a field value in a table. Although
these can be over-ridden in the SQL, these statistics
should be dropped if either:
- db is moved from a test to a production
environment
- db is redeployed to a larger group of users
- a significant amount of data has been added or
deleted from the database
- the demographics of the db user changes
singnificantly (eg. from end-user to management
staff)
- perform query at busy times of day for network traffic
- hardware problems such as inadequate memory/cache,
fragmented disks, other apps running
Advanced
SQL Techniques:
- Finding sequential days when a value rises:
- assuming a table has a date field and a value field
- SELECT v.datefld, v.valuefld
- FROM tablename v JOIN tablename a
- ON ((a.valuefld >= v.valuefld) and (a.datefld = DATEADD(dd,1,v.datefld)))
- OR ((a.valuefld <= v.valuefld) and (a.datefld = DATEADD(dd,-1,v.datefld)))
- GROUP BY v.datefld, v.valuefld //to remove duplicates
- Finding change in a value from sequential records:
- assuming a table has a date field and a value field
- this method uses subqueries to determine record number & the next
record number in a table via the ranking variable
- SELECT v.datefld as StartDate, a.datefld as EndDate, v.valuefld as
StartValue, a.valuefld as EndValue,
- Change = SUBSTRING('- +',SIGN(a.valuefld-v.valuefld)+2,1)+CAST(ABS(a.valuefld-v.valuefld)
AS varchar) //ensure result includes + or - symbol
- FROM (SELECT datefld, valuefld, ranking = (SELECT COUNT(DISTINCT
datefld) FROM tbl u WHERE u.datefld <= l.datefld) FROM tbl l) v
- LEFT OUTER JOIN
- (SELECT datefld, valuefld, ranking = (SELECT COUNT(DISTINCT
datefld) FROM tbl u WHERE u.datefld <= l.datefld) FROM tbl l) a
- ON (a.ranking = v.ranking+1)
- WHERE a.datefld IS NOT NULL
- //to sample only every 3rd value, add limiter in where clause using
the modulus symbol %:
- //could add following line to get change from first day of month to
first day of next month:
- AND DATEPART(dd,a.datefld) = 1 AND DATEPART(dd,v.datefld) = 1
- //what if no guarantee that first date for a month is actually the day
= 1 (eg. weekdays only included in table)??
- Finding change in value from first value one month to first value next
month:
- SELECT v.datefld as StartDate, a.datefld as EndDate, v.valuefld as
StartValue, a.valuefld as EndValue,
- Change = SUBSTRING('- +',SIGN(a.valuefld-v.valuefld)+2,1)+CAST(ABS(a.valuefld-v.valuefld)
AS varchar) //ensure result includes + or - symbol
- FROM tbl v, tbl a
- WHERE v.datefld = (SELECT Min(datefld) FROM tbl u WHERE DATEPART(mm,u.datefld)
>
- Statistical methods:
- Finding the mode of a set of values:
- SELECT TOP 1 WITH TIES valuefld, COUNT(*) AS NumberInstances
- FROM tbl
- GROUP BY valuefld
- ORDER BY NumberInstances DESC
- Creating a crosstab histogram of value ranges:
- eg. find sum of occurrences of patients by age group and by sex: