SQL techniques - data manipulation language

SQL Techniques - data manipulation language

SQL Syntax:

Introduction:

Although efforts were made to standardise SQL syntax via the ANSI-92 SQL standard, most database manufacturers either still use their own prior versions of SQL (eg. Sybase WATCOM & TRANSACT) or have extended the ANSI standard to provide further functionality (eg. Microsoft "CUBE", "PIVOT")
The following attempts to simplify how to use SQL in the various formats using Delphi & the Borland Database Engine
SQL code that tends to be proprietary has been placed under their proprietary headings in each section
code in square brackets indicate optional coding

Data manipulation techniques:

General syntax for an SQL SELECT statement:
- SELECT T1.[Field name], T2.[Field name] FROM MyTable T1, MyTable2 T2
- [WHERE (fieldname operator comparisonValue) AND fieldname2 LIKE "wildcardSearchValuewildcard"]
- [GROUP BY T1.[field name], ...]
- [HAVING filter_condition]
- [ORDER BY T1.fieldname2 [ASC | DESC ]]
SELECT statements:
- these retrieve data and create a result set
- common fieldnaming syntax options in SQL:
  1. * selects all fields and metadata
    - SELECT * FROM tablename
  2. if using only 1 table in SQL statement fieldname itself can be used but if includes spaces or reserved words, enclose in square brackets:
    - SELECT Fieldname1, Fieldname2, [Field Name3] FROM MyTable
  3. if using more than 1 table, must identify fieldname with table or table alias if alias used in FROM clause:
    - SELECT MyTable.[Field name] FROM MyTable
    - SELECT T1.[Field name], T2.[Field name] FROM MyTable T1, MyTable2 T2
- examples of syntax:
  - Calculated fields:
    - if using calculated fields which aggregate data such as SUM, COUNT, then MUST include ALL non-aggregate fields in GROUP BY clause at end.
    - SQL syntax may vary here depending on database being used:
      - Common to all SQL:
        
        aggregate functions:
        
        Sum; Count; Avg; Max; Min;
        
        these must be followed by either:
        
        (*) - computes all rows including those with null values - only used with count!
        
        (fieldname) - computes all rows where fieldname value is NOT NULL
        
        (DISTINCT fieldname) - computes all rows where fieldname value is NOT NULL and is unique (doesn't count duplicates)
        
        operators:
        
        +, -, *, /, AND, OR, LIKE (see also WHERE clause)
        
        finding out summary data from a dependent table using nested query:
        
        SELECT t1.PatientName, (SELECT COUNT(*) FROM anotherTable WHERE anotherTable.keyFieldname = t1.lookupfieldname) as NumberDependents
        
        FROM patientTable t1
      - Sybase/SQL Server:
        
        date functions:
        
        DatePart(param,datetime) returns the selected param which may be year, month, day, dy (day of year), weekday, hour, minute, second
        
        DateName(param,datetime) as for datepart but returns string instead of an integer, thus month => 'July', day => '24'
        
        DateDiff(param,datetime1,datetime2) returns an integer of number of params between the dates but may cause overflow if exceeds int. maximum
        
        DateAdd(param,number,datetime1) returns a datetime of number of params added to the date
        
        GetDate() returns current datetime
        
        Convert(DateTime,string) converts a string to datetime according to servers date format setting (default is mdy):
        
        to change server's date format setting, precede SELECT statement with SET DATEFORMAT dmy
        
        Convert(Char(lgth),datetime,DateFormatStyleNumber) converts datetime to string of length lgth and formatted according to style number:
        
        0 = Mon dd yyyy hh:mi [AM|PM]
        
        1 = mm/dd/yy
        
        101 = mm/dd/yyyy
        
        6 =dd Mon yy
        
        106 =dd Mon yyyy
        
        8 = hh:mi:ss
        
        3 = dd/mm/yy
        
        103 = dd/mm/
        
        conversion example: get today's date at 6.30 am:
        
        Convert(datetime, Convert( Char(10),getDate(),101)+' 6.30AM') {101 means that time in getDate is ignored!}
        
        dates must be mm/dd/yyyy if default server data format, and enclosed in single quotes
        
        string functions:
        
        CharIndex(substring, string) returns start position of 1st occurrence of substring within string
        
        PatIndex(pattern,string) returns start position of 1st occurrence of pattern within string, pattern can use wildcards:
        
        % = any string of zero or more characters
        
        _ = any single character
        
        [W-Z] = any single character in range W to Z
        
        [^W-Z] = any single character NOT in range W to Z
        
        Soundex(string) returns a soundex value for string which can be used to compare with another soundex value
        
        Lower(string); Upper(String) - case conversion
        
        LTrim(string); RTrim(string) - trims trailing/leading space characters
        
        SubString(string, start, lgth) - returns string of length lgth characters found in the string starting at position start
        
        Stuff(str1,start,lgth,str2) - embeds str2 into str1 at str1 position start replacing lgth characters of
        
        Concatenation: join two strings using +
        
        math functions:
        
        ABS = absolute value; ROUND; SIGN; CEILING = smallest integer >= argument; FLOOR = largest integer <= argument;
        
        EXP; LOG; LOG10; SQRT; POWER;
        
        SIN; COS; TAN; ACOS; ASIN; ATAN; ATN2; COT; DEGREES; RADIANS;
        
        PI; RAND;
        
        NB. all above must be followed by (fn syntax) where fn syntax is usually an expression but:
        
        is blank for PI, RAND
        
        has additional arguments for ATN2, ROUND(arg,precision); POWER(arg, power);
        
        system functions:
        
        see manual, examples are: CURRENT_USER; USER; USER_ID; USER_NAME; HOST_ID;HOST_NAME;
        
        case statements:
        
        CASE WHEN criteria1 THEN Result1 WHEN criteria2 THEN Result2 ELSE Result3 END 'CalcFieldName'
        
        NB. result can be set to equal NULL
      - MS Access:
        
        select options:
        
        SELECT {* | ALL | DISTINCT | DISTINCTROW | TOP n {percent}}
        
        calculated fields:
        
        SELECT fieldname1 & "," & fieldname2 AS calcFieldname; - returns calculated field concatenating fieldname1.data with comma then fieldname2.data
        
        FROM options:
        
        FROM tablename {IN externalDatabase}
        
        crosstab queries:
        
        these are specialised queries available only in MS Access:
        
        TRANSFORM function(fieldname1) as FieldAliasName
        
        SELECT statement as per usual
        
        GROUP BY statement as per usual
        
        PIVOT Pivot_fieldname {IN (comma separated array of column headings)}
        
        function may be any MS Access SQL aggregate function
        
        aggregate functions:
        
        Sum, Avg, Min, Max, Count, StDev, StDevP, Var, VarP, First, Last
        
        string functions:
        
        strings in double quotes
        
        patterns:
        
        ? = matches any single character
        
        * = matches zero or more characters
        
        dates:
        
        dates bounded by #
        
        math functions:
        
        expr1 {NOT} BETWEEN expr 2 AND expr3 nb. but handling differs to equiv. in ANSI-92 SQL
      - Local SQL using BDE (Paradox/dBase):
        
        can't rename a field with the "as" keyword as generates compile error
        
        date functions:
        
        EXTRACT(param FROM datetimeField) AS 'FieldAliasForSmallIntegerField'; where param = YEAR, MONTH, DAY, HOUR, MINUTE, SECOND
        
        conversion functions:
        
        CAST(fieldname AS datatype) AS 'FieldAliasForDatatypeField', where datatype = DATE, TIME, TIMESTAMP, SMALLINT, INTEGER,NUMERIC, FLOAT, CHARACTER(LGTH), VARCHAR(LGTH), BOOLEAN, BYTES but cannot be text/blob
        
        string functions:
        
        Concatenation:
        
        eg. field1 || ", " || field2 creates a calculated field: "field1Value, field2Value"
        
        LOWER; UPPER; case conversions
        
        SUBSTRING(fieldname FROM start [FOR lgth]) (NB. sq. bracketed part indicates optional)
        
        TRIM([LEADING|TRAILING}BOTH] trimmedCharacter FROM fieldname)
        
        strings must be enclosed with double quotes
  - FROM clause:
    - Common SQL:
      - FROM table1 [table1alias], table2 [table2alias]
    - Sybase/SQL Server:
      - JOIN statements:
        
        must use WHERE clause to join tables and asterix to signify type of join - LEFT or RIGHT OUTER joins (TRANSACT syntax):
        
        FROM maintable WHERE maintable.field1 [*= | = | =* ] lookupTable.keyField
    - Local SQL via BDE (Paradox/dbase/heterogeneous joins):
      - JOIN statements (ANSI92 style):
        
        join via a foreign or primary key relationship:
        
        SELECT fieldnames
        
        FROM table1 KEY JOIN table2
        
        join via same named fields:
        
        SELECT fieldnames
        
        FROM table1 NATURAL JOIN table2
        
        outer joins - include rows that don't match
        
        FROM table1 [LEFT | RIGHT] OUTER JOIN table2 on table1.field1 = table2.Keyfield
        
        inner joins - include rows that match
        
        FROM table1 INNER JOIN table2 on table1.field1 = table2.Keyfield
        
        multiple joins need careful bracketing to ensure SQL runs correctly (if having problems create one in Access then view SQL):
        
        FROM (table 1 LEFT JOIN table2 (LEFT JOIN table3 on table2.field2 = table3.keyField) on table1.field1 = table2.keyField)
        
        create new table from merger of same data-structured data from multiple tables (eg. mail merge of addresses):
        
        SELECT fieldnames FROM table1
        
        UNION
        
        SELECT fieldnames FROM table2
        
        UNION
        
        SELECT fieldnames FROM table3
        
        heterogeneous joins from different databases in Delphi:
        
        SELECT tablename1.fieldname1, tablename2.fieldname2
        
        FROM '':DatabaseAlias1:tablename1'', '':DatabaseAlias2:tablename2''
  - WHERE clause:
    - only need to use if need to filter data BEFORE it is Grouped
    - Common SQL:
      - Comparison predicates:
        
        >,<,>=,<=,<>,=
        
        field1 IS [NOT] NULL
    - Sybase/SQL Server:
      - see also under FROM clause for joins
      - LIKE patterns:
        
        % = any string of zero or more characters
        
        _ = any single character
        
        ESCAPE "escapeCharacter" = signifies that where escapeCharacter occurs, the following character is taken literally & not as a wildcard
        
        eg. WHERE (PercentValue LIKE "%10^%%" ESCAPE "^") finds strings containing 10%
      - Comparison predicates:
        
        field1 [NOT] IN (comma separated values)
        
        field1 [NOT] BETWEEN value1 AND value2
      - Subquery comparisons:
        
        field1 comparison_predicate ANY | ALL (NestedSelectStatement)
        
        eg. WHERE (H."pur_price" > ANY (SELECT O."price" FROM "old_sales.dbf"))
        
        where [NOT] EXISTS (NestedSelectStatement)
    - MS Access:
      - LIKE patterns:
        
        * = any string of zero or more characters
    - Local SQL via BDE (Paradox/dBase):
      - LIKE patterns:
        
        % = any string of zero or more characters
        
        _ = any single character
        
        ESCAPE "escapeCharacter" = signifies that where escapeCharacter occurs, the following character is taken literally & not as a wildcard
        
        eg. WHERE (PercentValue LIKE "%10^%%" ESCAPE "^") finds strings containing 10%
      - Comparison_predicates:
        
        field1 [NOT] IN (comma separated values)
        
        field1 [NOT] BETWEEN value1 AND value2
      - Subquery comparisons:
        
        field1 comparison_predicate SOME | ANY | ALL (NestedSelectStatement)
        
        eg. WHERE (H."pur_price" > ANY (SELECT O."price" FROM "old_sales.dbf"))
        
        where [NOT] EXISTS (NestedSelectStatement)
  - GROUP BY clause:
    - only required if either:
      - aggregate function used in select clause, or,
      - use the HAVING clause
    - must include ALL non-aggregate fields in the Select clause
  - HAVING clause:
    - allows a filter on the result set after it is brought to client allowing filtering on summary data on groups of rows, but increases network traffic!
    - only use if need to filter data AFTER it has been aggregated by a Group
  - ORDER BY clause:
    - only need to use this if wish to order data and:
      - Group By clause is NOT used, and,
      - desired data order is not the SELECT clause order???
  - Declaring variables:
    - Sybase/SQL Server:
      - variables must be identified by prefixing with the @ symbol
      - variables must be declared BEFORE the SELECT clause, the following example shows how to determine how long a SQL statement takes to run:
        
        DECLARE @start_time DATETIME
        
        SELECT @start_time = GetDate()
        
        SELECT statement you wish to run & time
        
        SELECT DateDiff(second, @start_time, GetDate()) 'Elapsed time, sec'
INSERT statements:
- adds new rows into table
- simple example:
  - INSERT INTO tablename (fieldname1, fieldname2)
  - VALUES ('value1', 'value2')
- getting data from another table:
  - INSERT INTO tablename1
  - SELECT * FROM tablename2 WHERE fieldname < xx
UPDATE statements:
- changes data values for existing rows in a table
- example:
  - UPDATE tablename
  - SET fieldname = expression
  - {WHERE fieldname = xx}
  - {ORDER BY fieldname} - optionally used to specify the order of the update
DELETE statements:
- deletes rows in table
- example:
  - DELETE FROM tablename
  - WHERE fieldname = xx

How to degrade query performance:

creation of cartesion products of one table against another by incorrect JOIN clauses
tables not being appropriately indexed
tables not being adequately normalised resulting in oversized tables
not having denormalised summary tables updated regularly to provide fast summary queries
use of HAVING clause instead of WHERE clause, as the filtering is then done at the client end requiring the whole pre-HAVING query result set to be sent across network to the client
use of NOT, LIKE or <> operators as these force the server to read the whole table rather than use an index, thus instead of saying NOT "P%" try using < "P%" OR > "P%"
ensure WHERE clause uses fields in the beginning of a complex index otherwise that index cannot be used
ensure the most specific WHERE statement is placed first thereby ensuring the smallest result set for the next WHERE statement to act on, these needs knowledge of likely numbers of records that will be accessed with each statement alone
use of out of date database optimiser statistics - these are generated by the server based on prior queries and are saved in the database and provide information as to likely occurrences of a field value in a table. Although these can be over-ridden in the SQL, these statistics should be dropped if either:
- db is moved from a test to a production environment
- db is redeployed to a larger group of users
- a significant amount of data has been added or deleted from the database
- the demographics of the db user changes singnificantly (eg. from end-user to management staff)
perform query at busy times of day for network traffic
hardware problems such as inadequate memory/cache, fragmented disks, other apps running

Advanced SQL Techniques:

Finding sequential days when a value rises:
- assuming a table has a date field and a value field
- SELECT v.datefld, v.valuefld
- FROM tablename v JOIN tablename a
- ON ((a.valuefld >= v.valuefld) and (a.datefld = DATEADD(dd,1,v.datefld)))
- OR ((a.valuefld <= v.valuefld) and (a.datefld = DATEADD(dd,-1,v.datefld)))
- GROUP BY v.datefld, v.valuefld //to remove duplicates
Finding change in a value from sequential records:
- assuming a table has a date field and a value field
- this method uses subqueries to determine record number & the next record number in a table via the ranking variable
- SELECT v.datefld as StartDate, a.datefld as EndDate, v.valuefld as StartValue, a.valuefld as EndValue,
- Change = SUBSTRING('- +',SIGN(a.valuefld-v.valuefld)+2,1)+CAST(ABS(a.valuefld-v.valuefld) AS varchar) //ensure result includes + or - symbol
- FROM (SELECT datefld, valuefld, ranking = (SELECT COUNT(DISTINCT datefld) FROM tbl u WHERE u.datefld <= l.datefld) FROM tbl l) v
  - LEFT OUTER JOIN
  - (SELECT datefld, valuefld, ranking = (SELECT COUNT(DISTINCT datefld) FROM tbl u WHERE u.datefld <= l.datefld) FROM tbl l) a
  - ON (a.ranking = v.ranking+1)
- WHERE a.datefld IS NOT NULL
- //to sample only every 3rd value, add limiter in where clause using the modulus symbol %:
  - AND v.ranking%3=0
- //could add following line to get change from first day of month to first day of next month:
  - AND DATEPART(dd,a.datefld) = 1 AND DATEPART(dd,v.datefld) = 1
- //what if no guarantee that first date for a month is actually the day = 1 (eg. weekdays only included in table)??
Finding change in value from first value one month to first value next month:
- SELECT v.datefld as StartDate, a.datefld as EndDate, v.valuefld as StartValue, a.valuefld as EndValue,
- Change = SUBSTRING('- +',SIGN(a.valuefld-v.valuefld)+2,1)+CAST(ABS(a.valuefld-v.valuefld) AS varchar) //ensure result includes + or - symbol
- FROM tbl v, tbl a
- WHERE v.datefld = (SELECT Min(datefld) FROM tbl u WHERE DATEPART(mm,u.datefld) >
Statistical methods:
- Finding the mode of a set of values:
  - SELECT TOP 1 WITH TIES valuefld, COUNT(*) AS NumberInstances
  - FROM tbl
  - GROUP BY valuefld
  - ORDER BY NumberInstances DESC
- Creating a crosstab histogram of value ranges:
  - eg. find sum of occurrences of patients by age group and by sex:
    - SELECT