Bringing Regular Expressions into T‑SQL
SQL Server never added a built‑in regular‑expression engine, so if you want to perform pattern matching inside a query you have to bring an external component into play. The most straightforward route is to tap into the VBScript.RegExp COM object that ships with Windows. By calling the object from T‑SQL you can write a tiny wrapper that accepts a pattern and a target string and tells you whether the pattern matches.
To get this working you’ll need a few prerequisites. First, the database engine must be SQL Server 2000 or newer; the sp_OA* extended procedures were introduced in that version and are the only way to drive COM objects from T‑SQL. Second, the operating system must have the Windows Scripting Host installed. On a standard Windows Server 2000 or later installation that is already there. If you are running an older edition of Windows you can download the latest Windows Script Host package from Microsoft and install it before continuing.
The next step is to make sure the SQL Server service account has permission to create and destroy COM objects. The sp_OACreate and sp_OADestroy procedures are subject to the EXECUTE AS context of the caller, so the account that runs the query must have the sp_OAxxxx permissions. In a typical environment this means granting the public role the EXECUTE privilege on those procedures, or explicitly granting it to the user that will call your regular‑expression function.
Even though you are dealing with a COM component, the overhead of the call is small compared to the cost of a full string scan, especially if you set the Global flag to False and only need a single match. The VBScript.RegExp object is very efficient at parsing a string once for a pattern, so the function we’ll write in the next section will feel like native T‑SQL in most use cases.
Because the function returns a boolean value you can use it directly in the WHERE clause of a SELECT, or you can build more elaborate expressions by chaining multiple function calls. The pattern syntax follows the standard JavaScript regular‑expression rules, which most developers are familiar with. That includes character classes, quantifiers, anchors, and alternation. By exposing the IgnoreCase flag as a parameter you can perform case‑insensitive matches without writing two separate patterns.
Another benefit of this approach is that you keep the regex logic in a single place. If you need to adjust a pattern for future business rules you just edit the UDF’s call to sp_OASetProperty @objRegExp, 'Pattern', @regexp and all queries that use the function will automatically use the new pattern. This reduces duplication and makes maintenance easier.
When you decide to use the function in a production environment, be mindful of the sp_OAxxxx execution cost. Although the function itself is lightweight, every call involves a context switch from T‑SQL to the COM subsystem and back. For large result sets you might consider applying the function in a derived table or a CROSS APPLY that limits the number of rows processed, or you can pre‑filter with a more straightforward LIKE pattern if the regex is simple.
In short, the combination of SQL Server’s OLE Automation procedures and the VBScript.RegExp object gives you a powerful, albeit slightly unconventional, way to perform regular‑expression matching inside your database. The rest of this guide walks through the exact steps you need to create a reusable UDF, how to invoke it safely, and a concrete example of how to detect special characters in a data set.
Building and Using the User‑Defined Function
Below is a compact user‑defined function that encapsulates the OLE Automation calls into a single, reusable bit‑returning function. The code is written for SQL Server 2000 and later, and it expects the caller to supply a source string, a regular‑expression pattern, and an optional ignore‑case flag.
Each step in the function follows a strict pattern. The first call creates the COM object; if that fails we return . Subsequent calls set the object’s properties: Pattern holds the regex, Global controls whether the search continues after the first match, and IgnoreCase matches the optional bit flag. Once the object is configured, the Test method runs the actual match against the supplied source string and writes the result (true or false) into the @results variable. Finally the object is destroyed and the function returns the bit value.
When you drop this UDF into your database you must grant EXECUTE permission to the roles or users that will invoke it. Because the function calls sp_OAxxxx, you also need EXECUTE rights on those procedures. A common pattern is to add the database role db_owner or a custom role that you create for data‑quality routines.
Once the function is in place you can use it in a query just like any other scalar function. For example:
This query pulls all orders and flags those whose notes contain characters outside the alphabet and space. Because the function returns a bit, you can even filter:
Notice that the function is short, clear, and encapsulates all the OLE Automation plumbing. When you need to switch from case‑sensitive to case‑insensitive matching you only change the third argument. Likewise, you can swap in a new pattern without touching the surrounding SQL.
Performance considerations: the function uses the Global property set to False, which means the COM engine stops searching after the first match. That keeps the call fast for typical validation scenarios. If you need to extract multiple matches or get the actual matched text you would have to use a different method, such as the Execute method and looping through the Matches collection. That’s beyond the scope of this quick‑start guide, but the VBScript.RegExp documentation explains how to do that.
Overall, the UDF provides a clean bridge between T‑SQL and the full power of regular expressions. Once you have the function, you can build more complex logic on top of it, such as email validation, phone‑number formatting, or even simple data cleansing before insert.
Practical Example and Testing
To illustrate how the function can be used in a real‑world scenario, let’s walk through a simple test case that looks for special characters in a text column. Suppose you have a table dbo.Messages with a Body column that may contain user‑supplied text. You want to identify rows that contain any character other than letters and spaces, which could indicate spam or malformed input.
The regular‑expression pattern for “anything that is not a letter or space” is [^a-zA-Z ]. Because the pattern uses a negated character class, the Test method will return if any such character is present. You can pass for the @ignorecase flag to ignore case differences; otherwise you can leave it as the default for case‑sensitive matching.
Here’s a query that pulls a few sample rows and shows the result of the test:
Assume the table contains the following rows:
The query would return:
Rows 2, 3, and 4 contain exclamation marks, commas, or digits - all of which trigger the ContainsSpecial flag. Row 1 and 5 are clean and return .
Now imagine you want to delete all messages that have special characters. A single UPDATE statement can do that thanks to the UDF:
Because the function is deterministic and stateless, SQL Server can evaluate it in a straight‑line fashion for each row in the table, making the DELETE operation as efficient as any other predicate. Keep in mind that large tables might benefit from indexing on the Body column, or from partitioning the table to limit the number of rows examined.
Beyond detecting special characters, the same pattern can be adapted to many other checks. For example, to find email addresses you could use [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}, or to locate dates in MM/DD/YYYY format you could use (0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/\d{4}. By swapping the pattern string you keep the same function and the same call syntax.
When you’re writing production code, it’s a good practice to wrap the UDF call in a TRY…CATCH block if you suspect the pattern might be malformed. Although the Test method returns False when the pattern is invalid, the COM object throws an error that can be caught in T‑SQL. That adds resilience, especially when patterns come from a configuration table.
Finally, remember that the UDF is not a silver bullet. For very large text fields or heavy workloads you might consider moving the regex logic to a CLR function, which can compile the pattern once and reuse it across queries. However, for most administrative and data‑cleaning tasks, the OLE Automation approach described here is quick to deploy, easy to maintain, and works on any SQL Server 2000 or newer instance without extra licensing or components.





No comments yet. Be the first to comment!