
Parameterise an IN clause in T-SQL

Parameters are a great feature in any SQL implementation. You can protect yourself, and more likely your clients, from SQL injection by always using parameterised queries for every call to the database.

I’ve seen some implementations in e-commerce packages where SQL injection would be a breeze.

The Scenario

For a recent client I had a .Net app which needed to query records but exclude those matching a specific set of criteria. Normally this is done with either a join or an IN clause. In this instance, as I was passing the criteria by parameter, I needed an IN clause.

An IN clause is used to either include or exclude records from a result. An example would be:

Select * from TableA where TableA.name IN ('John','Frank','Dave')

This would include only those rows where the name column matched one of ‘John’, ‘Frank’ or ‘Dave’.

The Problem

Now there is currently no way to send an array of items through a parameter. So your choices are:

  • Invoke the query once for each criteria item, or
  • Use dynamic SQL

Both of these are highly inefficient, to say the least.
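To see why, here’s a rough sketch of the dynamic SQL route (with hypothetical values): the statement is built up as a string, so the server treats every distinct list as a brand new query and you are back to escaping the values by hand.

Declare @names nvarchar(max)
Declare @sql nvarchar(max)
Set @names = '''John'',''Frank'',''Dave'''
Set @sql = 'Select * from TableA where TableA.name in (' + @names + ')'
exec sp_executesql @sql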

The Solution

Now before you start pulling your hair out, there is a simple and elegant solution. In T-SQL you can use XML. Yes, I did write XML; that is not a typo.

Here’s how

In T-SQL you can have a script which looks like:

Declare @x xml
Set @x = '<L>
        <I>John</I>
        <I>Frank</I>
        <I>Dave</I>
     </L>'

Select * from TableA where TableA.name in (
     Select * from ( select T.c.value('.','varchar(50)') as source
     from @x.nodes ('/L/I') T(c) ) Idss )

The inner query converts the XML into a result set by breaking the individual <I> nodes within the <L> node out into logical rows. ‘T.c.value’ then accesses each logical row and extracts the text within. Finally this is presented as a subquery result named Idss (apparently that alias is important, otherwise you’ll get an error).
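If you run just the inner query on its own you can see the shredding in action; it returns one row per <I> node:

select T.c.value('.','varchar(50)') as source
from @x.nodes ('/L/I') T(c)

-- source
-- ------
-- John
-- Frank
-- Dave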

Now all you need to do is place your XML in the @x parameter. On the client side I chose a parameter type of string, as that was simpler to handle.
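As a sketch of how that might be wired up (the procedure name GetPeople and the parameter @namesXml are placeholders of my own), a wrapper procedure can accept the list as a string and convert it to xml internally:

Create procedure GetPeople
    @namesXml nvarchar(max)
as
begin
    Declare @x xml
    Set @x = convert(xml, @namesXml)

    Select * from TableA where TableA.name in (
         Select * from ( select T.c.value('.','varchar(50)') as source
         from @x.nodes ('/L/I') T(c) ) Idss )
end

Calling it is then just:

exec GetPeople '<L><I>John</I><I>Frank</I><I>Dave</I></L>'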

Conclusion

The XML solution is quite elegant, requiring minimal effort on the client side to pass the parameters to the server. It is also much faster than dynamic SQL, as the server caches the query plan and saves you recompiles on the server, which are always a performance killer.

You’ve seen how simple it is to specify a list of values for an IN clause as a parameter. A lot simpler than either of the alternatives!

The mighty Oracle pipeline clause

Several months ago I was working on a problem with one of my clients. It involved taking a blob of data and performing a transformation on it (what else would you do with a database, eh?). The catch, however, is that the transform is performed according to user-defined rules which are dynamic: the users have the ability to add, remove and update these rules.

A rule consisted of several columns of sparse data. The first columns described the data pattern which caused a match, and the remainder gave the split of the new data. Basically it took financial data and split it out by low-level organisation, so a company can see how each element of cost/income was split over the business.

The rules matched based on a hierarchical ordering: rules with more pattern columns containing data were matched first, and rules with fewer populated columns matched later. This was controlled by a table which was used to sort the rules. More detail below.
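To make that concrete, the sort might be driven by something like the following (the table and column names here are made up for illustration, not the client’s real schema):

-- each rule points at a row in an ordering table describing how specific
-- its pattern is; sorting on that puts the most specific rules first
select r.*
from   rules r
       join rule_ordering o on o.ordering_id = r.ordering_id
order by o.sort_order;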

So how do you go about solving this problem? The first instinct is to simply perform a join across the data table and the rules, sort it, and then use a massive CASE to remove the duplicates and select the proper rule to insert into the result table. Yes, that would work, but it would consume vast amounts of temporary space as Oracle built the join, sorted it, and then selected the proper rule and row of data. In most instances several rows of data (if not dozens) would be created for each real data row, and at most only one would pass through.

Still, it seemed the fastest way to do it. The plan didn’t look too bad, so I tested it on some real data. It was slow, very slow. And occasionally it would cause an error (OMG, Oracle has bugs!). So I had to find another way to get the job done, and fast.

I came upon the idea of loading the rules all into memory, basically into an array with the rules sorted in the proper order. I could then take a couple of columns of data (which were mandatory) and use them to reduce the number of rules to compare down to a handful. From there I could trim down to the single matching rule easily enough and return the rule id for further processing. As the rules were all in memory this should be really, really fast. Now to hook it up!

So I can load up the rules pretty fast using the BULK COLLECT facility of Oracle PL/SQL and keep them in memory for fast lookup. But how do I hook that process up with the data in the table and have it work as fast as possible? The secret here is two things: pipelining and parallel processing.
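As a rough sketch of that loading step (again with made up names, and simplified: the real collection was keyed for fast matching rather than being a flat list):

declare
   type t_rule_tab is table of rules%rowtype index by pls_integer;
   l_rules t_rule_tab;
begin
   -- one round trip pulls the whole sorted rule set into memory
   select r.*
     bulk collect into l_rules
     from rules r
          join rule_ordering o on o.ordering_id = r.ordering_id
    order by o.sort_order;
end;
/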

Pipelining is where a PL/SQL function can return results as each item is calculated, so Oracle can process them as part of the larger SQL statement as it goes. Otherwise the DBMS would have to wait for the entire set of results to be returned before continuing. It might sound like this is not needed in this situation, but it is.
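If you haven’t met pipelined functions before, a toy example (every name in it is made up purely for illustration) shows the mechanics: PIPE ROW hands each row back as soon as it is produced, and the function is queried straight from SQL via TABLE():

create or replace type num_tab as table of number;
/
create or replace function first_n(p_n in pls_integer) return num_tab
   pipelined
as
begin
   for i in 1 .. p_n loop
      pipe row(i);   -- this row is available to the caller immediately
   end loop;
   return;
end;
/
select * from table(first_n(5));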

As the results of each function call are independent of any other invocation, having the function run in parallel could boost performance significantly. In my experience it does make a difference. Obviously you’ll need Oracle to use multiple cores or processors for this to work in practice.

So, now I’ve got all the pieces and the PL/SQL looked something like this:

The SQL Statement:
insert into output( ENT_ID, VALUE_, RULE_ID )
      select /*+ parallel(data,2) */  aqsr.ent_id, data.value_*aqsr.ent_percent,
        ASM.RULE_ID
      from reslt_m data
        inner join
          table(GroupdData(CURSOR(select /*+ parallel(GD, 2) */ * from Groupd_Data GD ))) asm
          on data.cost_centre_id=asm.cost_centre_id and
           data.expense_type_id=asm.expense_type_id and data.measure=asm.measure_id and
           data.year_=asm.year_ and data.data_type_id=asm.data_type_id
        inner join rules_ratios aqsr on asm.rule_id=aqsr.UID_

and the pipelined function GroupdData:

 function GroupdData(p_inpcur in ref_basedatacur_t) return G_GrpdData_tab
       PIPELINED PARALLEL_ENABLE (Partition p_inpcur by any)
 as
       /* definitions left out */

 begin
       loop
          -- fetch the incoming cursor rows in chunks of 10000
          fetch p_inpcur bulk collect into l_groupify_rec limit 10000;
          exit when l_groupify_rec.count = 0;

          for l_idx in l_groupify_rec.first..l_groupify_rec.last
          loop
                /* tedious code to check and find rules left out ..
                   fill in rule_id to return to query              */
                l_rec.rule_id := rules(l_idx)(l_rul_idx).UID_;
                -- hand the row straight back to the calling SQL statement
                pipe row(l_rec);
          end loop;
       end loop;
       close p_inpcur;
       return;
    end;

And that’s all there is to it! Okay, so I’ve left out some of the implementation nitty-gritty, but you get the idea.

This code would appear to run slowly, but don’t be deceived: it is very, very fast. I intend to use similar techniques in the future, though only when a single SQL statement is either too slow or problematic.