10. Input Validation

Validation using Regular Expressions

All computer software needs proper input to be able to produce proper output. This is why we must verify all input from non-trusted sources (such as users, other software, input files, etc.). This process is called input validation.

There is yet another reason: security. Many systems today support scripting or, for example, take user input and put it into SQL queries sent to central databases. Here it is important to protect the system from script injection and similar attacks.

How can input be validated? Some things are easy, such as numbers, which consist of the characters 0 through 9 and perhaps the odd +, -, ., e or E if we wish to allow scientific notation and real values. Other things, such as e-mail addresses and URLs, are harder to check, but there is still a pattern, even though it is complex. When dealing with human language, the patterns are so varied and complex that they cannot be expressed as a computer program - just try a grammar-checking word processor to see how stupid it is. In these cases we can at least try some validation, such as ensuring that the data does not contain any illegal characters.

All these checks can be performed using regular expressions, regexps for short, or REs for even shorter. When writing an RE, you are writing the pattern to match input against.

Before I delve any further into the subject there are a few things that I need to bring up. First, there are several RE dialects. They all work in pretty much the same way, but some details may differ. This chapter works with Qt's implementation. Second, REs can be used in many contexts, for example splitting input into nice chunks. This chapter deals only with input validation. Third, and last, this chapter covers only the very basics and merely scratches the surface. For more information, look at the recommended reading links at the end of this chapter.

The official documentation of Qt's REs is huge and great. I will try to provide a gentle introduction, but leave the details to the official documentation.

The simplest RE imaginable is just one character, for example A. This RE will match any string containing A. The A does not have to be in any particular position in the string, and there can be more than one A, but there has to be at least one A.

As a single character RE is rather limited, we can match more characters. To allow more characters, simply put a group of them between square brackets. The RE [ABC] will match any string containing A, B or C. As with the single character case, only one is needed and there can be more than one. To avoid having to write out the entire alphabet to allow all letters, or [0123456789] to allow any digit, these and some other character ranges can be written as [A-Z] (upper case letters), [a-z] (lower case letters) and [0-9] (digits).
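The "match anywhere in the string" behaviour of single characters and character groups can be tried out without Qt. The sketch below is only an illustration: it assumes the standard library's std::regex (ECMAScript dialect) as a stand-in for QRegExp, and containsMatch is a hypothetical helper name, not part of any library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `pattern` is found anywhere in `text`.
// Assumption: std::regex stands in for QRegExp; for these simple patterns
// the two dialects are expected to behave the same way.
bool containsMatch(const std::string &pattern, const std::string &text)
{
    const std::regex re(pattern);
    return std::regex_search(text, re);  // search, not a whole-string match
}
```

For example, containsMatch("[ABC]", "xyzB") is true because one B is enough, while containsMatch("[0-9]", "abc") is false because the text holds no digit at all.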

There are three more important points to grasp to be able to use REs to match most input cases.

First, by putting many single characters or character groups together, we force a match in that order. For example, [ABC][123]a matches A1a, xB3a and yvC1ax, but not x1Aay, ya1A or a. In the same way, the RE mail matches mailbox, junk mail, email, etc.

Secondly, we can match the start of a string with ^ and the end with $. This means that ^[ABC][123]$ only matches A1, A2, A3, B1, B2, B3, C1, C2 and C3. As we match the start and end of the string, nothing else is allowed before or after the RE.
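The difference between an unanchored sequence and one anchored with ^ and $ can be sketched as follows. This is an illustration only, assuming std::regex in place of QRegExp; the two function names are made up for this example.

```cpp
#include <regex>
#include <string>

// Unanchored: the sequence [ABC][123]a may appear anywhere in the text.
bool containsSequence(const std::string &text)
{
    static const std::regex re("[ABC][123]a");
    return std::regex_search(text, re);
}

// Anchored: ^ and $ leave no room for anything before or after the
// two characters, so only A1 ... C3 are accepted.
bool isExactCell(const std::string &text)
{
    static const std::regex re("^[ABC][123]$");
    return std::regex_search(text, re);
}
```

With these, containsSequence accepts A1a, xB3a and yvC1ax but rejects x1Aay, ya1A and a, while isExactCell accepts A1 and C3 but rejects xA1 and A1x.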

Third, we can control the number of appearances that we allow of each character or character group using {lower-limit, upper-limit}. So [A-Z]{0,15} matches up to 15 upper case letters, and [0-9]{1,3} matches one to three digits, that is, all values between 0 and 999.

Instead of putting {lower-limit, upper-limit} it is possible to use the version {exact-count} or to omit either the lower or upper limit (and thus have it ignored by QRegExp). It is also possible to write, for example, [0-9]+. This matches one or more digits. [0-9]* matches zero or more digits while [0-9]? matches zero or one digit.
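A few quantifiers in action, again as a hedged sketch with std::regex standing in for QRegExp. The function names and the four-digit code are invented for the example. Note one dialect difference: std::regex accepts {n}, {n,} and {n,m}, but not the form with the lower limit omitted, so that variant is left out here.

```cpp
#include <regex>
#include <string>

// {1,3}: between one and three digits, i.e. the values 0 to 999.
bool isShortNumber(const std::string &text)
{
    static const std::regex re("^[0-9]{1,3}$");
    return std::regex_search(text, re);
}

// {4}: exactly four digits (a hypothetical four-digit code).
bool isFourDigitCode(const std::string &text)
{
    static const std::regex re("^[0-9]{4}$");
    return std::regex_search(text, re);
}

// ? and +: an optional sign followed by one or more digits.
// The dash is placed last in the group so it is not read as a range.
bool isSignedInteger(const std::string &text)
{
    static const std::regex re("^[+-]?[0-9]+$");
    return std::regex_search(text, re);
}
```

The anchors matter here: without ^ and $, [0-9]{1,3} would also be found inside 1000, since the first three digits alone satisfy it.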

By putting a ^ as the first character inside a character group we negate the effect. For example [^0-9]$ ensures that the last character of the string is not a digit.
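Negated groups are also handy for the "no illegal characters" kind of check mentioned earlier. The following is a small sketch under the same assumption (std::regex instead of QRegExp), with hypothetical function names.

```cpp
#include <regex>
#include <string>

// [^0-9]$ : the last character must exist and must not be a digit.
bool endsWithNonDigit(const std::string &text)
{
    static const std::regex re("[^0-9]$");
    return std::regex_search(text, re);
}

// ^[^0-9]*$ : every character (if there are any) is a non-digit.
bool hasNoDigits(const std::string &text)
{
    static const std::regex re("^[^0-9]*$");
    return std::regex_search(text, re);
}
```

Note that endsWithNonDigit rejects the empty string, because a negated group still has to match one character.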

There are two more tricky characters to match. First, the dash, "-", must be last in a group, otherwise it will be interpreted as part of a range of characters (like a-z). Then there is the space, " ", which must be at the end of a group (but before any dash).

Let us try to develop a new pattern to match a serial number. The number is formatted like this: a single character W, S or G specifies the product series (workstation, server or gaming), a four digit number follows, then a slash and a two digit number followed by a dash and a number. W1003/01-26, S2055/99-2 and G9900/03-801 are valid examples. So, first we match the start of the string using a caret, ^, then we match the W, S or G using [SWG]. The four digit number is easy: [0-9]{4}, and so is the slash (notice that a back-slash is a special character, look at the official documentation for details). The two digit number and dash are also easy: [0-9]{2}-. The last number is put as [0-9]+. Then we simply match the end of the input using $. The final RE is ^[SWG][0-9]{4}/[0-9]{2}-[0-9]+$.
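The serial number pattern can be checked against the examples from the text with a few lines of code. This is a minimal sketch that assumes std::regex as a substitute for QRegExp (the pattern itself is copied unchanged); isValidSerial is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Returns true if `text` is a complete serial number such as W1003/01-26.
// Assumption: std::regex is used here in place of QRegExp.
bool isValidSerial(const std::string &text)
{
    // series letter, four digits, slash, two digits, dash, one or more digits
    static const std::regex re("^[SWG][0-9]{4}/[0-9]{2}-[0-9]+$");
    return std::regex_search(text, re);
}
```

The three valid examples are accepted, while strings such as X1003/01-26 (unknown series), W103/01-26 (only three digits) and W1003/01- (missing last number) are rejected.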

The RegExpExperiment

To make it easier to develop and test REs, this chapter includes a very trivial RE evaluation application called RegExp Experiment. The design and implementation are described below.

The application will consist of a dialog with two line edits, one for the text to match and one for the RE. The result will be displayed and there will be two buttons: check and close. So, let's get going.

First, create a new folder and in that folder, create a new QtDesigner project.

Add a dialog to the new project and put 2 push buttons, 2 line edits, 4 labels and 2 spacers in it as shown in figure 10-1. The figure also shows the layouts. The top 2 by 3 widgets are put in a grid layout. The buttons and the horizontal spacer are put in a horizontal layout. Finally, what's left is put in a vertical layout.

Figure 10-1 The Widgets and the Layout.

Now, alter the captions according to figure 10-2. Name the top line edit leText and the other line edit leRE. The left-most button is called bCheck and the right-most bClose. The label below the line edits is to be called lResult. Also, name the dialog fRegExpExperiment and set the caption to RegExp Experiment. Save it as fregexpexperiment.ui.

Figure 10-2 The Captions.

Now double click on the form to create fregexpexperiment.ui.h. Then add two protected slots: init() and check() (using the object explorer). Hook up the signal-slot connections as shown in figure 10-3.

Figure 10-3 The Connections.

Example 10-1 shows the implementation of the two slots. init() initializes lResult to not show any strange text while check() uses a QRegExp to try to match the RE onto the text. Make sure to include <qregexp.h> in the implementation of the dialog (using the object explorer).

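// Initializes lResult so that it does not show any strange text.
void fRegExpExperiment::init()
{
  lResult->setText( "n/a" );
}

// Tries to match the RE from leRE onto the text from leText.
void fRegExpExperiment::check()
{
  QRegExp re( leRE->text() );

  // search() gives the position of the first match, or -1 if there is none.
  lResult->setText( QString::number( re.search( leText->text() ) ) );
}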

Example 10-1

The QRegExp::search( ... ) method returns the first position where the RE matches, or -1 if the RE cannot be matched. When the RE is anchored to the beginning of the string with ^, the value is either 0 or -1. Otherwise, any value >= 0 means that the RE matched.
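The "position or -1" convention is easy to reproduce outside Qt for experimenting. The sketch below is not QRegExp itself: it assumes std::regex as a stand-in, and firstMatchPosition is a hypothetical helper that merely mimics the return value described above.

```cpp
#include <regex>
#include <string>

// Mimics the return convention described for QRegExp::search():
// the index of the first match, or -1 if the pattern does not match.
int firstMatchPosition(const std::string &pattern, const std::string &text)
{
    const std::regex re(pattern);
    std::smatch match;

    if (!std::regex_search(text, match, re))
        return -1;

    return static_cast<int>(match.position(0));
}
```

For example, firstMatchPosition("mail", "junk mail") gives 5 and firstMatchPosition("mail", "mailbox") gives 0, while the anchored firstMatchPosition("^mail", "email") gives -1 because the match would have to start at position 0.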

To complete the project, add a C++ source file and implement a trivial main() as shown in example 10-2.

#include <qapplication.h>

#include "fregexpexperiment.h"

int main( int argc, char **argv )
{
  QApplication a( argc, argv );

  fRegExpExperiment *m = new fRegExpExperiment();
  a.setMainWidget( m );
  m->show();

  return a.exec();
}

Example 10-2

Finally, run qmake && make from the prompt and it should work. Now experiment with the REs described in the first section of this chapter.

Validating User Input using QValidator

Now we know how to match REs and how to design REs. Next, we need to apply this to our great Qt project without too much fuss. This is done using QValidator's descendants such as QRegExpValidator and QDoubleValidator.

A validator can be assigned to line edits, combo boxes, spin boxes, etc. When assigned, it validates the input and classifies it as either valid, fixable or invalid. If the input is invalid, the user must fix it before leaving the widget. If it is fixable, the user can fix it, or the validator will try to if the user presses the Enter or Return key. No action is needed for valid data. Example 10-3 shows how to assign our serial number RE to a line edit using QRegExpValidator.

QRegExp re( "^[SWG][0-9]{4}/[0-9]{2}-[0-9]+$" );
QRegExpValidator *validator = new QRegExpValidator( re, 0 );

QLineEdit *leSN = new QLineEdit( parent );
leSN->setValidator( validator );

Example 10-3

The QIntValidator and QDoubleValidator are available to check that the input is a valid number and that it is inside a given range. If you cannot validate your input using any of QValidator's descendants, it is quite easy to inherit from it. This is described in detail in the official documentation for QValidator.
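To get a feeling for the three-way classification (valid, fixable, invalid) before writing a QValidator descendant, the idea can be sketched in plain C++. Everything here is an assumption made for illustration: std::regex replaces QRegExp, the InputState enum and classifySerial are hypothetical names, "fixable" is interpreted as "a prefix that could still be typed into a valid serial number", and the hand-written prefix pattern is just one way to express that.

```cpp
#include <regex>
#include <string>

// Hypothetical three-way result, loosely mirroring invalid/fixable/valid.
enum class InputState { Invalid, Intermediate, Acceptable };

InputState classifySerial(const std::string &text)
{
    // A complete serial number, e.g. W1003/01-26.
    static const std::regex complete("[SWG][0-9]{4}/[0-9]{2}-[0-9]+");

    // Anything that could still become a serial number by typing more:
    // nothing yet, the letter plus up to four digits, up to the two digits
    // after the slash, or everything up to and including the dash.
    static const std::regex prefix(
        "([SWG]([0-9]{0,4}|[0-9]{4}/[0-9]{0,2}|[0-9]{4}/[0-9]{2}-))?");

    // regex_match requires the whole string to match, so no ^ or $ is needed.
    if (std::regex_match(text, complete))
        return InputState::Acceptable;
    if (std::regex_match(text, prefix))
        return InputState::Intermediate;
    return InputState::Invalid;
}
```

With this sketch, W10 and W1003/01- are classified as Intermediate, W1003/01-26 as Acceptable, and X or W10035 as Invalid. A real validator would return the corresponding state from its validate method instead; see the official documentation for the exact interface.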

There is one more detail. When using setText to alter the text of, for example, a line edit, the validator is not used. Such input must be validated manually to ensure correctness.

Summary

The example code for this chapter can be found here.

Recommended Reading

This is a part of digitalfanatics.org.