Pages Menu
TwitterRssFacebook
Categories Menu

Posted by on Mar 15, 2018 in Mobile Technologies, What's new | 1 comment

All about regex() in xls form: when, how and examples in the humanitarian and development fields.

All about regex() in xls form: when, how and examples in the humanitarian and development fields.

Coding in xls form has become more and more documented and developed on the web. By following a few tutorials, you can start your xls form from scratch and build it up from there (http://xlsform.org/ and https://docs.opendatakit.org/form-regex/ being incredibly useful of course- you can also check out this MDC Toolkit website we have worked on with our partner Tdh as well https://www.mdc-toolkit.org). However, there are some aspects of the coding that still remain in the dark and thus very blurry in our minds. I don’t know about you guys, but the first time I saw a regex() formula, it was mysterious, mesmerising and scary at the same time. I had no idea why that ended out working in the form and how to reproduce it if needed.

Regex (or regular expression) refers to the specific, standard textual syntax for representing patterns that matching text or integers need to conform to (source: https://en.wikipedia.org/wiki/Regular_expression). As http://www.regular-expressions.info perfectly shows, regex() are rich in terms of possibilities. However, only some specific parts of it come in useful in the humanitarian and development fields in my opinion. In this blog post, we will first briefly have a look at when and how to use regex() in xls form. We will then dig into more concrete examples of regex() use in our respective fields of work.

 

Part 1 : When to use regex()

When familiar with regex() structure, you might be tempted to use it whenever you can. However, regex() can get very complicated and sometimes it is not worth the pain.  Let’s have a look at when regex() is most necessary and useful.

The most common and useful use of regex() is when you want the answer to your “text” or “integer” type of questions to have a specific format: you can do so by inputting a regex() in your question’s “constraint” column. The regex() formula will then make your answer be for instance, a 5-digit number or a word written in capital case letters etc.

Here’s an example of how a regex can look like in xls form:

EXAMPLE 1

And its result in your data collection app (ODK Collect here):

Example 2

 

Part 2 : How to use regex()

The regex() constraint formula is expressed in the constraint column as follows:

regex(${value_to_apply_regex},[regular expression])

${value_to_apply_regex} can also be expressed by “.” when in the constraint column of that same variable.

As for the regular expression [regular expression] part of the regex() it has to be expressed between apostrophes (‘…’) for any “text” types of questions and between ‘^…$’ for any « integer » type of questions or « text » questions that have been transformed into integer by adding « numbers » in the « appearance » column.

Inside the regular expression, you can use […] to express all the characters that you allow into that part of the answer and {…} to express how many characters the enumerator can input from the range of characters that you have allowed to be used. You can also use “+” in order to express the fact that the answer will contain several different parts in it. For any capital/lower case letters or numbers, there is no need to enumerate them. You can simply determine the ranges that can be used, such as:

  • A-Z: all capital case letters can be used
  • a-z: all lower case letters can be used
  • 0-9: all digits can be used

Furthermore, as you are restraining the answer’s format by using regex(), it makes sense to make the question mandatory so that the enumerator will not to be able to skip it. By doing so, input a “yes” into the “required” column.

For further information on how to code a regex() you can consult those very useful websites: http://www.regular-expressions.info and https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html. However, be cautious as most of the available regex() information on the web is expressed in XML, which is a more complex codification.

 

Part 3: Examples of regex() in surveys

1. Simple example

Let’s first start with a simple example of a regex() in order to better understand how it is constructed. Suppose that you create a “text” question requesting to enter the code of the country where the survey takes place. The regex() will be expressed as follows:

regex(.,’[A-Z]{2}’)

It implies that only capital case letters can be used [A-Z] and that only 2 of them can be entered {2} for the country code to be validated as a response.

Example 3

2. Email address

Source: http://www.regular-expressions.info/email.html

Let’s suppose that you want to ask for the interviewee’s email address and that you want to avoid any typing errors in the answer and make sure that it has a —-@—-.— format. You will have to create a “text” type of question and use the following regex() in the constraint column of your question:

regex(., ‘[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+[.][A-Za-z]{2,4}’)

Now let’s have a look at how this formula is built up:

« ., »: refers to the variable/question to be constrained ; here it is the question requesting the interviewee’s email address.

« [A-Za-z0-9._%+-] »: it refers to all the characters that you can input before the « @ » in your email address. Here, capital/lower case letters, numbers and several specific symbols (. _ % + -) are allowed. The number of characters that you can input is unlimited here.

« +@[A-Za-z0-9-] »: it refers to all the characters that you can input after the « @ » symbol in the email address. Here, Capital letters, regular letters, numbers and « – » symbols are allowed. The number of characters that you can input is unlimited here.

« +[.][A-Za-z]{2,4} »: it refers to the end of your email address. First a dot (.) is required. Then 2 to 4 characters in capital or regular letters are required (i.e. .com, .int etc.)

« ‘…’ »: as this is a text question, you can put the regular expression between apostrophes.

Example 4

3. Name and Surname

Source: https://forum.opendatakit.org/t/how-to-condition-text-entry-in-uppercase/10541/6

This is another example of a “text” type of question which format you would like to constraint. Here you are asking for the interviewee’s name and surname and want to make sure that both start with a capital case letter and that there is a space in-between:

regex(.,’[A-Z]{1}[a-z]{1,50}[ ]{1}[A-Z]{1}[a-z]{1,50}’)

The formula is composed of the following elements:

« ., »: refers to the variable/question to be constrained ; here it is the question requesting the interviewee’s name and surname.

« [A-Z]{1}[a-z]{1,50} »: this implies that the first letter of the name has to be a capital case letter and that the rest of it is written in lower case letters (up to a maximum of 50 letters). 50 is a safe bet as usually names are much shorter than that.

« [ ]{1} »: here you require a mandatory (and unique) space after the name.

« [A-Z]{1}[a-z]{1,50} »: this implies exactly the same structure as for the name, but this time for the surname.

« ‘…’ »: as this is a text question, you can put all the regular expression between apostrophes.

Example 5

4. Household/serial numbers (containing numbers only)

Source : https://forum.opendatakit.org/t/regex-constraint/1070/7 and https://www.datawinners.com/find-answers-app/xlsform-how-to-limit-the-number-of-digits/

For each « integer » type of question, ODK allows you to enter up to 9-digit integers. If you wish to limit those integers to a number containing less than 9 digits or to make the number contain more than 9 digits, you can use regex() in the constraint in order to do so. Typical examples of such “integer” questions in the humanitarian and development fields are household/serial/registration numbers or p-codes.

There are two different ways to do so, as expressed in this table:

Type of regex to be used When? How?
An  “integer“ type of question
  • For numbers with a maximum of 9 digits. The reason to this is that all “integer” questions are limited to a maximum of 9 digits in xls form.
  • For numbers that don’t start by a “0”.
  •  “integer” type of question
  • regex formula in constraint column with regular expression expressed between ‘^–$’.
  • “yes” in the “required” column
A “text” type of question
  • For numbers potentially with more than 9 digits.
  • For numbers that potentially start with a “0”.
  •  “text” type of question
  • regex formula in constraint column with regular expression expressed between ‘^–$’.
  • “numbers” into the “appearance” column.
  • “yes” in the “required” column

4.1 A regex() formula for an “integer” type of  question – Example:

regex(., ‘^[0-9]{8}$’)

This formula requires that your number has exactly 8 digits, starting by a digit from 1 to 9 and followed by digits between 0 and 9.

As it is an « integer » type of question, the regular expression is to be put between ‘^–$’, where « ^ » shows where the number has to begin and « $ » where it has to end.

example new

 4.2 A regex() formula for a “text” type of question – Examples:

regex(., ‘^[0-9]{10}$’) + “numbers” in the “appearance” column

This formula requires that your number has exactly 10 digits and it can start by any digit, “0” included.

Even though it is a “text “ type of question, by specifying “numbers” in appearance you transform it into an “integer” type of question. Thus the same rules as for the integer questions apply, meaning that your regular expression is to be put between ‘^…$’.

regex(., ‘^[0-9]{5,}$’) + “numbers” in the “appearance” column

This formula requires that your number has 5 or more digits ({5,}) and it can start by any digit, “0” included.

The regular expression is also to be put between ‘^…$’.

example new 2

5. Codes containing numbers and letters

A last example of regex() constraint that you might need to codify in your survey is a code containing both numbers and letters. Project codes and registration numbers are a few examples of those codes.

As the codes contain both numbers and letters, you will need to create a “text” type of question. Also, there is no need to mention “numbers” in appearance.

Example – “text” type of question requesting the project’s code:

regex(.,’[M][O][B][0-9]{2}’)

This code needs to start with MOB and by followed by two numbers from 0 to 9 (i.e. MOB21, MOB76). This regex() is composed of the following:

« ., »: refers to the variable/question to be constrained ; here it is the question requesting the project’s code.

« [M][O][B] »: you necessarily have to start your code with MOB (in capital case letters).

« [0-9]{2} »: your code then has to contain two digits from 0 to 9.

« ‘…’ » : as this is a text question, you can put all the regular expression between apostrophes.

example new 3

You will find all the examples from this blog post in the following xls form: “Examples of Regex formulas in xls form“. And don’t forget, testing is key to success in Mobile Data Collection, so test, test, test!”

1 Comment

Leave a Reply to Yann Cancel reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

A propos de l'auteur :

Claudia Grigore