Pages Menu
TwitterRssFacebook
Categories Menu

Posted by on Aug 11, 2015 in Mobile Technologies, What's new | 9 comments

Advanced XLS forms coding – part 1 of 2

Advanced XLS forms coding – part 1 of 2

By now, many adepts of mobile data collection know the basics about how to code XLS forms, with good online resources for support.

However, past the basics of the syntax needed to make a form work, not that much is available and it’s exactly that gap I’ll try to address here. So let’s assume you have coded a couple forms already and they work. But now you have to code a long one, a half-dozen (paper) pages, full of tiny multiple choices and tables with a complicated survey-flow pattern… how to make this faster? This will be the first part of two posts on this thematic.

Sorry, what’s your name again?

When coding a long form, a naming convention can go a long way in streamlining the process. There isn’t one “best” answer to this, however it boils down to:

“Do what you want, but do it for a reason.”

The fewer decisions you make without really having some reasoning behind it, the better your coding method will be. The following tips have all been set up with this in mind. Therefore feel free to take them whole, or adapt to needs & preferences.

Same street name, different towns

I started some time ago assigning similar names to the “list name” items (that make up the multiple choice items in the “choice” tab) and to the prompt’s name in the “survey” sheet. Some differentiators can used, such as add “L” at the end. This way, you can easily edit any given list choice without having to constantly lookup the main survey sheet (and the other way around). You then know that to edit the relationHosts prompt, you need to edit the relationHostsL list in the choice tab. If you place the different list choice in alphabetical order, it also makes it easier to find it in the choice tab (which is why you would put the differentiators at the end).

The exception to this would be very common choice lists shared by multiple prompts, such as Yes/No, Increase/Decrease/Stable, etc.

Beauty in simplicity

When making up a new variable name, I try to be both concise & descriptive. For example, let’s say we want to assign a name a prompt about vulnerable groups of internally displaced people by shelter type. There are many ways to name this:

  • vulnerable_groups_displaced_people
  • vulnerableGroupIDP
  • VULNGRP_IDP
  • Question_5_2A

Of course, the first one is the most descriptive, but isn’t concise. What if you have that question nested inside a group of questions that is related to Protection and Security, which is itself nested inside a group of vulnerability factors? If you’re being consistent, you might have named the groups “protection_security” and vulnerability_factors, so in the database you have a column name such as “vulnerability_factors/protection_security/vulnerable_groups_displaced_people”. There are limits to the length that can be assigned to those columns, and they are database-specific. Bumping into those can lead to problems further down the road.

The very last option of naming the prompt after the question number (probably issued in the paper form), though commonly used, has many shortcomings:

  • Adding/removing questions means the numbering will have to be adjusted (useless work)
  • It forces you to go back and forth between the XLS form, the paper form & the output to know what means what
  • Finally, your XLS form isn’t self-contained anymore: to really know how the form is structured, you almost need to refer to this other document (word or printed form).

From my perspective, the 3rd option will work best, especially if the expression “Vulnerable groups” comes up often: in no time VLNGRP will be synonymous to “vulnerable_group”. It strikes a balance between being descriptive enough to be recognizable at first glance, while not being too cumbersome.

Familiar faces

In the choice list, I tend to favor using numbers as values instead of letters. There are pro & cons to this, however I like the objectivity that comes with assigning numbers to each options: a 5 is a 5, whereas the choice “Religious Minorities” could have a number of “values” assigned to it, endless combinations of camel-case or underscore separators, capital letters, etc.

Another potential benefit of this (although it can certainly be possible using letters as values as well) is to use the same value for very common options that appear in many lists:

  • -88 for “Others”
  • -99 for “Do not know”
  • 0 for “No”, 1 for “Yes”

You can establish similar conventions for any common value in your survey

The upside being that when coding constraint, relevant conditions or “If other, please specify” prompts, the value to check for will always be the same. No need to go check the choice list for that, ever. Common relevant/constraint can also be copied & pasted, because the action to take if the answer is “Other” or “No” are often similar.

On the other hand, there is nothing wrong with using readable names for those values instead of numbers – as explained above, there are pros and cons to both. Depending who will be working with the output, and what your analysis set-up is, you might decide that it is better for you to have those value readable right in the output file –especially if your analysis system isn’t programed and a real person will have to manually interpret the data directly from the server output.

First things first

For a short form, it might make sense to just code everything prompt by prompt. However, for a long or complex form, the best structure to adopt or the kind of relevant / constraints / calculation that should be set might not always be obvious: this might come either through testing or through the training sessions of the end user (if that’s possible in your situation). It might be more efficient to code those advanced elements progressively, as the form matures.

I would suggest keeping it to the labels, hints, prompt name & types, and adding already a little bit of structure (grouping questions either for the presentation on the phones if you have larger screen, or because they are thematically related, etc).

Patterns, everywhere

When coding a more complex survey, some common patterns arise such as questions about rankings:

What are your top 3 three priorities in order of importance?

  • Food
  • Lodging
  • WASH
  • Clothing
  • Health
  • No other priorities

In a paper form, this would translate into small boxes with numbers 1-2-3 written next to a list of priorities. However, with ODK Collect, there isn’t necessarily just one way to approach such a prompt. For ease of analysis, I tend to prefer avoiding multiple choices when possible, as they are generally more difficult to deal with when comes the time to setup the analysis. My inclination in this case would be to code 3 select_one prompts, with the choice list being the list above for each.

Logically, if a respondent expresses not having any other needs at the 2nd prompt, you wouldn’t want the next one to show up. They also shouldn’t be able to select the same need twice.

Rankings

A ranking of 3 priorities with constraint & relevant patterns

A few things to note:

·         To code any similar group of question, I can simply copy-past all 3 prompts and then use “Find & Replace”:

find_replace

Find & Replace: a quick way to duplicate similar patterns in a form

  • Copy the full 3 rows of the men’s priority questions and past them further down.
  • Select the 3 questions you just pasted, then ctrl+F
  • In that box, simply replace the common root for the men’s prompt by the common root for the woman’s prompt & hit “Replace all”:

       Which will the yield:

rankings_2

A quickly adapted patterns of the same priority needs, but for woman.

       Pay attention to the number of replacement (Excel will tell you how many replacements have been made). You want to make sure you selected only the cells in which you want excel to find & replace: in our case, the 3 rows we just copied. If Excel tells you it made 110 replacements, you might want to ctrl + z this and retry!

The “name” column have been automatically taken care of, because they all share a common root, NEEDPRIORW, that replaced the NEEDPRIORM for men. Because we selected the relevant & constraints columns as well when we hit ctrl+F, Excel replaced those as well.

Setting up the same pattern for various different groups would take a few minutes at most, whereas coding each of those questions one by one can quickly become tedious and error-prone. Even if those ranking questions do not refer to the needPriority list of the original prompt, all that is needed then is to replace the list name as well and you can even use the find & replace method for that, too. The relevant and constraints conditions are still correctly codded, assuming you followed the previous advice of consistently using “0” as a value for “No” (or in that case, “No other needs”)

The wizardry explained

As an aside, it might be worthwhile explaining how the previous “constraint” works (the “relevant” pattern is well-documented on the net already, so we’ll skip that part). It is meant to prevent the user from selecting the same option twice, as you can’t have “Food” as both your first and second priority. The notation:

       not(.=${NEEDPRIORW_1}) and not(.=${NEEDPRIORW _2})

  • The “.” before the equal signs, in all XLS forms, means “the current prompt”
  • A CONSTRAINT expression must always evaluate to TRUE if the user is to be allowed to go to the next prompt
  • In case the current prompt (“.”) is equal to NEEDPRIORW_1, then that will evaluate to “TRUE”. Since we don’t want the user to be able to proceed to the next prompt in that case, we inverse its value by wrapping it into a not() statement. If the current prompt is different from the NEEDPRIORW_1, then the expression would yield “FALSE” and after negation it will evaluate to “TRUE”, and the user will be able to proceed.
  • For NEEDPRIORW_3, we want the value to be different from both the 1st and the 2nd prompt in the series. We therefore added the “and” statement between the 2 conditions: if any of the condition is not met (so the selection is equal to either the 1st or 2nd prompt) then the whole expression will evaluate to “FALSE” and the user will have to change his selection.

As those constructs get more complex, it might be worthwhile to test them more thoroughly, to ensure they work as intended. For example, making a mistake here might prevent the user from selecting different answers between the 3 prompts – definitely not what we want!

 

Part 2 to follow in a week’s time!

9 Comments

  1. Language::English
    Thanks a lot! Very inspiring for a beginner XLSForm coder.

    Language::Français
    Merci beaucoup! Très instructif pour un codeur XLSForm débutant.

  2. Oh CartONG! you always making my life less difficult!! :)
    This was super useful. =)
    Thanks!!!

  3. Sébastien – content de le lire!

    Ana – Happy to help. There’s a lot that can be done with XLS forms!

  4. Thanks a lot Francis. This has been very useful to me and i really got to learn a lot from this. In fact this was a difficult problem for me a week ago.

    • Great to read!

  5. I have a question with multiple sections. I am also creating hidden fields for each option. I need to create an expression in the calculation field where it searches the question field which would look like {1,2,3} with 3 selections made and then return a value if that value if present. For example. 1=TV if tv is present in original question then return value of 1 in the hidden TV field. Having issues because my current expression if(${Q18infor} = ’1′,1,0) will not work because the Q18infor field is returning multiple numbers, not just 1. Is it possible to have a If Contains Then statement? rather than just an if then statement?

    • Hi Grant,

      Thanks for your question. That’s an interesting question that highlight a subtle difference in 2 ways in which you can refer to variables in your survey.

      As you correctly pointed out, if(${variable}=3,1,0) will only work if that’s the only one of the multiple options selected, because for multiple choices many values selected will be space-separated and put together in the same field. However, selected(${variable},’3,) will be fine, like so:

      if((selected(${variable},’3′),1,0)

      The difference being that ${}=’something’ will look for an exact match with ‘something’ and will not tolerate any difference.
      selected(${},’something’) will check that ‘something’ has been selected, but won’t mind if additional selections has been made.

      I tend to used selected() structure when it refers to either select_one or select_multiple for that reason – it’s a bit more resilient. You may also one day decide that a select_one question now needs to be allowing multiple selections, but if there’s any checks on it as the above, you’ll get subtle mistakes. Or your form may be used by someone else, and someone else may one day change it, not knowing the difference and even not notice the mistake immediately!

  6. Very informative. Please Keep doing this

  7. Hey Francis,

    Thank you very much for the detail. However, I have one question. I have a repeat group under which i want users to input “Item” name. Whenever they add another group, i dont want them to add same “Item” name which they entered in the first field. I am using XLS forms. If you have any suggestions, i would appreciate it. I hope the question is clear.

Trackbacks/Pingbacks

  1. Advanced XLS forms coding: part 2 of 2 | CartoBLOG - […] This post is a follow-up on last week’s “Advanced XLS forms coding: part 1 of 2“. […]

Post a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

A propos de l'auteur :

Francis Vachon

With a background physics engineering, Francis is working as information management at CartONG... when he's not managing reforestation teams in Québec!