Regular Expressions
Have you ever stumbled upon a simple problem that requires you to search for a particular word or phrase inside a big chunk of text? Well, sure you have. That’s why every programming language provides searching capabilities in strings, after all!
But what happens when you need to locate a specific pattern inside a big chunk of text? Such patterns include an e-mail address or an IP address. A pattern is not made of fixed characters, yet the human eye recognizes it easily inside a text. So how can you spot one inside a text using a programming language?
That’s where regular expressions come in handy.
Regular expressions are not a programming language. They are a “searching convention”. They provide a description that lets you search for a specific pattern inside a text and return all of its instances.
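As a minimal sketch of this idea, the Python snippet below uses the standard re module to collect every substring that looks like an IP address or an e-mail address. The sample text and both patterns are assumptions made up for illustration, and they are deliberately simplified: the IP pattern, for instance, would also accept 999.999.999.999.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = (
    "Contact admin@example.com or support@example.org. "
    "The servers are at 192.168.0.1 and 10.0.0.254."
)

# Simplified patterns: they describe the shape of the data,
# not the exact characters it contains.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# findall returns every non-overlapping match as a list of strings.
ip_addresses = IP_PATTERN.findall(text)
emails = EMAIL_PATTERN.findall(text)

print(ip_addresses)  # ['192.168.0.1', '10.0.0.254']
print(emails)        # ['admin@example.com', 'support@example.org']
```

Note that neither pattern names a specific address. Each one only describes what an address looks like, which is exactly what a plain string search cannot do.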
LANGUAGES THAT HAVE BUILT-IN REGULAR EXPRESSION SUPPORT
- Delphi – Delphi does not have built-in regex support. Delphi for .NET can use the .NET framework regex support. For Win32, there are several PCRE-based VCL components available.
- Gnulib – Gnulib or the GNU Portability Library includes many modules, including a regex module. It implements both POSIX flavors, as well as these two flavors with added GNU extensions.
- Java – Java 4 and later include an excellent regular expressions library in the java.util.regex package.
- JavaScript – If you use JavaScript to validate user input on a web page at the client side, using JavaScript’s built-in regular expression support will greatly reduce the amount of code you need to write.
- .NET (dot net) – Microsoft’s new development framework includes a poorly documented, but very powerful regular expression package, that you can use in any .NET-based programming language such as C# (C sharp) or VB.NET.
- PCRE – Popular open source regular expression library written in ANSI C that you can link directly into your C and C++ applications, or use through an .so (UNIX/Linux) or a .dll (Windows).
- Perl – The text-processing language that gave regular expressions a second life, and introduced many new features. Regular expressions are an essential part of Perl.
- PHP – Popular language for creating dynamic web pages, with three sets of regex functions. Two implement POSIX ERE, while the third is based on PCRE.
- POSIX – The POSIX standard defines two regular expression flavors that are implemented in many applications, programming languages and systems.
- Python – Popular high-level scripting language with a comprehensive built-in regular expression library.
- REALbasic – Cross-platform development tool similar to Visual Basic, with a built-in RegEx class based on PCRE.
- Ruby – Another popular high-level scripting language with comprehensive regular expression support as a language feature.
- Tcl – Tcl, a popular “glue” language, offers three regex flavors. Two POSIX-compatible flavors, and an “advanced” Perl-style flavor.
- VBScript – Microsoft scripting language used in ASP (Active Server Pages) and Windows scripting, with a built-in RegExp object implementing the regex flavor defined in the JavaScript standard.
- Visual Basic 6 – Last version of Visual Basic for Win32 development. You can use the VBScript RegExp object in your VB6 applications.
- wxWidgets – Popular open source windowing toolkit. The wxRegEx class encapsulates the “Advanced Regular Expression” engine originally developed for Tcl.
- XML Schema – The W3C XML Schema standard defines its own regular expression flavor for validating simple types using pattern facets.
- XQuery and XPath – The W3C standard for XQuery 1.0 and XPath 2.0 Functions and Operators extends the XML Schema regex flavor to make it suitable for full text search.
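The JavaScript and XML Schema entries in this list mention validation, which differs slightly from searching: the whole input has to fit the pattern, not just a part of it. Here is a minimal sketch of that use with Python’s built-in re module. The function name is_valid_username and the rule it enforces (3 to 16 letters, digits or underscores) are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical rule: 3 to 16 ASCII letters, digits, or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,16}")

def is_valid_username(candidate: str) -> bool:
    """Return True only if the whole string fits the pattern."""
    # fullmatch requires the pattern to cover the entire string,
    # unlike search, which accepts a match anywhere inside it.
    return USERNAME_PATTERN.fullmatch(candidate) is not None

print(is_valid_username("regex_fan42"))  # True
print(is_valid_username("no spaces!"))   # False
```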
SUPPORT IN OTHER LANGUAGES
Let’s face it. Some widespread languages do not implement the necessary regex functions.
C++
C++ is one good example of that. With C++, you will only have limited access to search functions in strings. Many systems ship a header named “regex.h”, but that is the POSIX C interface (regcomp and regexec) rather than a C++ facility, and I did not find it practical to use. Compilers that support C++11 or later also provide a standard <regex> header.
The other alternatives are regular expressions implemented in third-party libraries. After an extensive search, I found that the most popular ones are Boost.Regex, which is part of the Boost Libraries project, and PCRE (which stands for Perl Compatible Regular Expressions). The latter can also be used with plain C, and it includes a wrapper for use with C++.
Objective-C/Cocoa (OS X)
Things are a lot easier here. You can install a regex library without having to use the Make utility in OS X. Libraries come inside a framework, and you can include this framework in any of your projects without having to install it into a system-wide location.
There are three frameworks that have great support for regex and are Cocoa-specific.
- RegexKit. I haven’t used it, but I have reasons to believe it’s the best option.
- OmniGroup’s OmniFrameworks. The OmniGroup is one of the most proficient developers for OS X. Inside the OmniGroup’s OmniFoundation framework you will find a regex library. However, although powerful, OmniGroup’s frameworks also have a steep learning curve. Nevertheless, I highly recommend it.
- AGRegex. It is simple and does the job, though it may be a little outdated: it uses PCRE 4.3, while PCRE itself is now at release 7.6. However, there is little you can’t do with it. It will cover almost every regex need.
REFERENCES
I have searched a lot for a regular expressions manual that is platform independent, and so far I can only recommend:
http://www.regular-expressions.info/
This site has a good tutorial and a good explanation of regular expressions. It won’t tell you how to add regular expression support to a language, but it will tell you how to write a good search pattern.
<<THIS DOCUMENT WILL BE REVISED WITH SUGGESTIONS FROM USERS. USERS, PLEASE POST ADDITIONAL INFORMATION HERE. INFORMATION NEEDED: HOW TO ADD REGEX SUPPORT TO MORE LANGUAGES, AND WHERE TO FIND MORE REFERENCES AND TUTORIALS ABOUT REGULAR EXPRESSIONS.>>