countingup.com

A/B Testing at Countingup

Michele Fiordispina

As a fintech company grows, the need for a data-driven decision-making process follows.

While there is a drive to provide our customers with new, exciting and useful features, we have to make sure that changes to existing ones won't lead to a detrimental experience either.

A tried and tested way to introduce new functionality while making sure it is of high quality is to use A/B tests.

In this blog post, we will look at how we improved our signup conversion by leveraging an in-house A/B testing solution.

The problem

One of the things we strive to do at Countingup is to speed up the sign-up process for our customers.

Ideally, we would like our customers to start taking care of their business as quickly as possible.

One of the things we noticed recently was a considerable number of blurry or distorted photos. This type of data (selfies, documents, etc.) is needed for compliance purposes, so this is a step that we absolutely cannot skip.

Unfortunately, these photos would be rejected later in the sign-up process, once some automated processing had run. The user would then be prompted to take a new photo. The whole process could take a couple of hours...

Wouldn't it be great if we could avoid that?

What's A/B testing?

A/B testing is a methodology that allows us to compare two or more different versions of a single variable.

In software development, this is a technique commonly used to assess the performance of new (or slightly modified) features against the currently defined user experience.

During an A/B test, the user pool is split into two groups. These are commonly referred to as Control and Variant groups.

Users in the variant group receive the new feature, while users in the control group do not. The performance of the variant group is then compared with that of the control group to measure the feature's effectiveness.

Below is a direct example of what the control and variant groups might see during an A/B test.

[Figure: A/B example]

This test has a mixture of both new and reworked features.

  • The addition of the logo is a new feature that didn't exist before
  • The Log-in button is the same as before, except for its colour

Such a methodology enables us to describe why a feature is important (using data) or, more importantly, can show us if a reworked feature yields good or bad results.

As an example, an improved sign-up experience could lead to more users:

  • We can measure the number of new users on both paths to test the effectiveness of this improvement
  • If the conversion drops, we can always disable this feature, thus minimising the loss of conversion

Measurement

No A/B test is complete without measurement. Analytics is the tool we need to use to measure the effectiveness of new features.

At Countingup, we use Amplitude to analyse all the usage data coming from the Countingup app so that we can provide a better experience for our small business owners.

One of the key parts of measurement is funnel conversion analysis.

A funnel shows where drop-off occurs within a user's journey. Comparing the funnels of the control and variant groups tells us at which steps we are increasing our conversion ratio.
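To make the idea of a funnel concrete, here is a minimal sketch of how step-by-step conversion could be computed from raw events. This is not how Amplitude works internally; the step names and the (user_id, step) event shape are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical funnel steps, in order. Real event names would come from
# the analytics setup and are not given in this post.
FUNNEL_STEPS = ["signup_started", "photo_taken", "photo_accepted", "signup_completed"]


def funnel_counts(events, steps=FUNNEL_STEPS):
    """Count how many users reached each step, given (user_id, step) events.

    A user counts for a step only if they also reached every earlier step.
    """
    reached = {}
    for user_id, step in events:
        reached.setdefault(user_id, set()).add(step)

    counts = Counter()
    for user_steps in reached.values():
        for step in steps:
            if step not in user_steps:
                break  # the user dropped off here
            counts[step] += 1
    return [counts[step] for step in steps]


def step_conversion(counts):
    """Conversion from each step to the next, as fractions.

    Returns 0.0 for a transition when nobody reached the earlier step.
    """
    return [
        (after / before) if before else 0.0
        for before, after in zip(counts, counts[1:])
    ]
```

Running funnel_counts once on the control group's events and once on the variant group's, then comparing the two step_conversion lists, shows at which step the two experiences diverge.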

How do we do it?

Let's look at how we added support for A/B testing at Countingup.

Prerequisite

A/B tests have a prerequisite: our code needs to be able to enable and disable features at runtime. This technique is known as feature flags or feature toggles. There are some cool projects out there, like LaunchDarkly or Optimizely, that offer feature flags as a service (or even A/B testing directly).

We already have a microservice that serves feature flags in our infrastructure, so part of the problem is already dealt with.

As we mentioned in a previous blog post, we deploy changes often. This includes small but also fairly large features.

We use this microservice to ease the development of such large features. While behind a flag, these features are not visible to the user, which allows us to keep deploying and testing until the feature is ready. This is made possible by the flags being set per user, rather than globally.
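As a rough illustration of per-user flags, here is a minimal in-memory sketch. The class, its method names and the flag name in the usage note are hypothetical; the real service is a microservice backed by a database and its API is not shown in this post.

```python
class FeatureFlagStore:
    """In-memory stand-in for a per-user feature flag service (illustrative only)."""

    def __init__(self):
        # Maps user_id -> set of flag names enabled for that user.
        self._flags_by_user = {}

    def enable(self, user_id, flag):
        """Turn a flag on for a single user."""
        self._flags_by_user.setdefault(user_id, set()).add(flag)

    def disable(self, user_id, flag):
        """Turn a flag off for a single user; a no-op if it was not set."""
        self._flags_by_user.get(user_id, set()).discard(flag)

    def get_user_features(self, user_id):
        """Return the flags enabled for this user, sorted for stable output."""
        return sorted(self._flags_by_user.get(user_id, set()))
```

Because flags are stored per user, enabling a hypothetical "new_invoicing" flag for a developer's account leaves every other account untouched, which is what lets unfinished work ship safely.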

What we want to achieve now is to leverage this implementation as a starting point to build our A/B testing solution.

Some implementation details

Our feature-flags service

We'll start with a quick and simplified view of what our feature flags service looked like before we made this A/B enhancement.

[Figure: Feature flags]

The Countingup app would make a request to our backend to know what feature flags are enabled for a user.

A/B testing enhancement

Let's add the concept of a test, and of users participating in a test.

[Figure: A/B test tables]

Here the configuration and details are JSON blobs. This will give us a fair amount of flexibility for extra functionality we might add in the future.

The configuration block contains a few values needed to set up an A/B test correctly, such as the flag name and analytics identifiers. A parameter worth mentioning is percentage. This is an integer in the [0, 100] range that defines the percentage of users that will fall within the variant group.

  • 0 and 100 are special values, as they make the result predictable. They are useful for testing and debugging
  • 50 is the standard value intended to be used to make A/B tests
  • Other values within the range might be used for other purposes (more on that later)

The details block describes some properties related to a user taking part in a specific test. It contains a flag that tells us whether the user is participating as part of the variant or control group.
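The two records and the percentage rule can be sketched as follows. This is a simplified model, not our actual schema: the class names, the "flag" and "variant" keys and the assign_user helper are hypothetical, and only the percentage parameter and the two JSON blobs come from the description above.

```python
import json
import random
from dataclasses import dataclass


@dataclass
class ABTest:
    test_id: str
    configuration: str  # JSON blob, e.g. {"flag": "...", "percentage": 50}


@dataclass
class ABTestUser:
    test_id: str
    user_id: str
    details: str  # JSON blob, e.g. {"variant": true}


def assign_user(test, user_id, rng=random):
    """Place a user in the variant group with probability percentage / 100."""
    config = json.loads(test.configuration)
    percentage = config["percentage"]
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be within [0, 100]")
    # randrange(100) yields 0..99, so 0 never selects the variant
    # and 100 always does, which keeps both special values predictable.
    is_variant = rng.randrange(100) < percentage
    return ABTestUser(test.test_id, user_id, json.dumps({"variant": is_variant}))
```

Passing a seeded random.Random instance as rng makes the assignment reproducible in unit tests, in addition to the deterministic 0 and 100 cases.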

When and how do we assign an A/B test?

A new Create endpoint has been developed to allow the creation of A/B tests. We use it whenever we want to start a new test.

Bear in mind, this is not the point where we assign users to tests. Doing that at this stage would mean dealing with a massive number of DB requests at once.

Instead, assignment is done when a user requests their feature flags through the GetUserFeatures endpoint.

[Figure: Feature flags with A/B test]

We now have a set of rules to apply before serving the feature flags.

  • Check if there is a new test and, if so, assign it to the user
    • Toss a coin to decide if the user will participate as part of the control or variant group
  • For each active test, identify the user as either part of the control or variant group
    • Send this information to Amplitude to keep track of this user's experience
  • Get the feature flags of the tests where the user participates as part of the variant group
    • Control group users will have the same experience as before, with no need for a specific flag
  • Merge the result with the existing feature flags

This all happens transparently to the user.
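The rules above can be sketched as a single function. This is a simplified illustration rather than our production code: the function signature, the dictionary standing in for the database table and the track callback standing in for the Amplitude call are all assumptions.

```python
import random


def get_user_features(user_id, base_flags, active_tests, assignments, track, rng=random):
    """Return the user's feature flags, assigning new A/B tests lazily.

    base_flags:   flags already enabled for this user (iterable of str)
    active_tests: list of dicts like {"id": ..., "flag": ..., "percentage": ...}
    assignments:  dict mapping (test_id, user_id) -> True for variant,
                  False for control; mutated in place (stands in for the DB)
    track:        callback(user_id, test_id, group), stands in for analytics
    """
    flags = set(base_flags)
    for test in active_tests:
        key = (test["id"], user_id)
        # Assign the test the first time this user asks for their flags.
        if key not in assignments:
            assignments[key] = rng.randrange(100) < test["percentage"]
        # Identify the group and report it so funnels can be split by group.
        is_variant = assignments[key]
        track(user_id, test["id"], "variant" if is_variant else "control")
        # Only variant users get the flag; control keeps the old experience.
        if is_variant:
            flags.add(test["flag"])
    # Merge with the existing flags and return a stable, sorted list.
    return sorted(flags)
```

Since the assignment is stored the first time it is made, repeated calls return the same group for the same user, and the cost of assigning users is spread across ordinary requests instead of landing on the database all at once.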

How did we improve signup conversion?

A simple warning

The idea we had in mind was to give the user a warning every time they take a photo (ID documents or selfies) that might have some issues.

We created a banner to show a warning whenever this occurs. Luckily, Expo provides a few routines that help us recognise:

  • A partially or totally missing face
  • Blurry output
  • Distorted picture
  • Document not in frame

Here's a screenshot of the feature. This is what was presented to our customers (both control and variant groups):

[Figure: Image quality warning on a selfie. Selfie result: no warning (left image), warning (right image)]

Running this code as an A/B test means that half of the users can potentially see this warning, while the other half get no indication of whether the pictures they just took have any issues.

What about results?

We gathered some interesting results, as it looks like the change has improved our conversion rate.

[Figure: Image quality funnel comparison]

As we can see from the funnel comparison, the variant group (green) improved the sign-up conversion.

[Figure: Image quality funnel analysis]

Statistical significance (aka When do we stop?)

An A/B test needs a minimum sample size before its results can be considered statistically significant. In our case, Amplitude lets us know when a test has reached that size and significance. There is more on that on their website.
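For intuition, here is a sketch of one textbook way to check whether two conversion rates differ significantly: a two-proportion z-test. This is not necessarily what Amplitude computes, and the function name and parameters are ours, chosen for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates.

    conv_a / n_a: conversions and users in the control group
    conv_b / n_b: conversions and users in the variant group
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups converted 0% or 100% of users: no evidence of a difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value (0.05 is a common threshold) suggests the difference is unlikely to be due to chance alone. Checking it repeatedly while the test is still running inflates false positives, which is one reason to fix a minimum sample size up front.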

Possible improvements

There is more that can be done with this tool, including things that are not strictly related to A/B tests.

A/B tests can have multiple variant groups (A/B/n tests). This is sometimes confused with multivariate testing, which instead tests combinations of several variables at once. Converting our existing solution to support multiple variants should be fairly straightforward.

Another interesting idea would be to use this tool to make something similar to a "staged release": let a portion of the users try a new feature before going live with the entire customer base. As hinted before, this could be achieved by using a test with an arbitrary value as percentage (within the (0, 100) range, of course).
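One possible way to support a staged release is sketched below. It uses deterministic hash-based bucketing instead of the stored coin toss described earlier, so it is a different technique from what we have built; the function names are hypothetical.

```python
import hashlib


def rollout_bucket(user_id, feature):
    """Map a user to a stable bucket in [0, 100) for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then reduce to 0..99.
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(user_id, feature, percentage):
    """True if the user falls inside the first `percentage` buckets."""
    return rollout_bucket(user_id, feature) < percentage
```

Because a user's bucket never changes, raising the percentage from 10 to 50 only adds users: everyone who already had the feature keeps it. Including the feature name in the hash means the same users are not always the first to receive every new feature.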

Closing

I'm eager to use this technique more often, as I feel it gives us a clear picture of "how are we doing" and how engaged the customers are with our new features. There are certainly more ideas brewing at Countingup that could use an A/B test, so stay tuned for more.

That's it for now, 👋