The UK nudge unit’s report 2013-2015: an in-depth analysis
- by helga
Do you recognise the importance of building realistic models of human behaviour before delivering public services, government communications or behaviour-change campaigns – ultimately aiming to improve people’s lives by enabling them to choose better options for themselves?
Then this post is for YOU 🙂
♦ ♦ ♦
Having worked for almost four years at the core of what is probably the best accomplishment in the UK inspired by behavioural economics (i.e. auto-enrolment), I downloaded and read with interest the report from the Behavioural Insights Team (AKA the Nudge Unit) covering their work over the last 2 years.
It’s over 50 pages with a summary of main projects, trials and interventions from 2013 to 2015. Policy areas include the Labour Market, Health & Wellbeing, Education, Home Affairs, Sustainability and International Development, among others.
Behavioural insight is definitely trending
The first insight from reading the report is how hugely popular the topic is becoming, in the UK and around the world. Whilst in 2010 BIT was apparently the only team of its kind in the world (hard to believe – I would have thought that the US would be leading on behavioural economics, but perhaps only in academia), there are now many similar teams in the UK and beyond.
In the UK, similar units sprang up across other government departments. There is now the Public Health England Behavioural Insights Team and the Department of Health Behavioural Insights Team (anyone care to explain the difference? Is the latter focused on Scotland and Wales?). Meanwhile, BIT moved to become a social purpose company under NESTA, but is still partially funded by the Cabinet Office.
There is also a growing global network in countries around the world, including Australia, Netherlands, Finland, Germany, Singapore, and similar units within the European Commission, the White House and the World Bank.
People love a new piece of jargon, don’t they? Everyone seems to be talking about behavioural science, behavioural economics, behavioural insight, and their potential to help define and implement effective public policy and social interventions… All out there trying to nudge people to make better choices for themselves, based on empirical insight into what works and what doesn’t.
One can’t help but wonder: what were these teams called before? What were their job titles, and what was their work grouped under previously? It’s hard to imagine a time when no consideration was given to how behaviour-change interventions actually fared in the real world, or what could be done to improve them.
It’s basically marketing applied to public services, with a focus on randomised trials
Setting the behavioural trend aside for a moment, BIT’s report of 150+ experiments is in fact a mixed bag of marketing best practice that has been around for a long time: market research (including qual/quant), A/B testing, segmenting audiences, targeting messages, tracking real data and analysing it, reporting on results, conducting iterative improvements and so on. So far so good.
BIT also seem to have fished out some really old-style marketing techniques from the commercial sector – for example, informing people that they had been ‘chosen’ to receive information on a government programme increased the uptake of that programme. Who hasn’t received the ‘you have been chosen’ message before? And how long until you also go ‘yeah, right’ if the government starts using the same sort of message over and over, like commercial organisations do?
Some of the experiments are about improving user interface and design – areas squarely within the usual remit of marketing and customer experience as well. The improvement achieved with changes to the layout of medical forms, an experiment Imperial College London undertook with selected hospitals, is impressive: a 3% increase in medication doses entered into the system correctly (bringing it up to 100%), a 52% improvement in prescribers’ contact numbers entered, and an 85% improvement in the frequency of medications entered correctly (!). If the NHS doesn’t already have dedicated marketing & communications teams focused on UX and design, here is a good excuse to set them up.
Yes, there seems to be an added excitement in using the behavioural insight bandwagon to inspire and guide new experiments with hard evidence and empirical proof. This is definitely a positive thing – but it sounds a bit gimmicky if you ask me. Any rigorous social marketer (and the organisations that hire them) should have been following similar processes and conducting live experiments long before the behavioural insight trend.
Two shocking absences – where is the strategic plan and the value for money?
However, two notable absences stand out in BIT’s 2013-2015 report:
* the articulation of an overall, strategic plan under which BIT selects which experiments to focus on;
* consistent information about trial costs, and some sort of quantification of the benefits if the results were applied at a larger scale.
Value for money?
Only the international experiments seem to merit a proper cost/benefit analysis (pages 44 to 48).
The behavioural tax letters in Guatemala are a great example. The best performing message in the trial would have generated additional tax revenues of more than US $750,000 in less than 3 months, 36x the cost of sending the letters. The work in partnership with the Australian government is also reported in detail and with impressive results.
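As a sanity check on those figures, the implied cost of the mailing can be backed out of the reported benefit-to-cost ratio. A rough sketch (the report states the revenue and the 36x ratio; the cost below is derived, not quoted):

```python
# Figures as reported by BIT for the Guatemala tax-letter trial:
# best message would have generated > US$750,000 in under 3 months,
# at 36x the cost of sending the letters.
additional_revenue = 750_000   # US$, lower bound over ~3 months
benefit_cost_ratio = 36        # reported revenue-to-cost multiple

# The cost of the mailing implied by those two numbers
# (not stated explicitly in the report):
implied_cost = additional_revenue / benefit_cost_ratio
print(f"Implied cost of sending the letters: ${implied_cost:,.0f}")
# → roughly $20,800
```

It is exactly this kind of simple back-of-the-envelope arithmetic that most of the domestic chapters make impossible, because one of the two numbers is missing.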
Shouldn’t there have been a much clearer and more consistent effort to measure results at scale for the domestic areas as well?
I’m not even saying that all interventions need to demonstrate a clear Return on Investment. Sometimes it’s a different type of metric that needs to be brought forward – increases in wellbeing, happiness, healthy living, are not easily translated into £££ (although it’s very common for organisations to quantify them in economic terms as well).
I mean, it’s great to know that, in general, small tweaks can have a positive outcome at low cost, but how much would it cost for exactly which improvement? Would it be worth it? There is no clear answer in most cases. And what is the real cost and potential benefit of spending time and budget testing a particular set of tweaks, as opposed to another set of tweaks? Where is the strategic drive for the various experiments across the different areas?
You know, whereas I previously said that this sounds like marketing best practice applied to public spending, in reality you wouldn’t conduct as many randomised trials in a commercial setting – not because you can’t, but because it’s not always easy to justify the money, time and effort.
Reading through the pages leaves me wondering how BIT shortlist, select and prioritise the projects they decide to work on – if at all. The model seems to be ‘public interest organisation so-and-so in Essex/Somerset/Yorkshire approached us to help out with this project’. And so the trials keep coming, covering various small-scale experiments with no central drive, from localised school projects to fairly random online experiments.
Sometimes I’m left wondering what insight was already available in similar areas of intervention, and whether anyone spent time looking for it beforehand. One salient example is the online security project described on pages 33 and 34 (choosing and recalling strong passwords for online accounts) – I’m pretty sure I have seen similar studies before. Perhaps previous studies could have been revisited here, instead of spending resources on primary research?
In all honesty, the report does say that these were small, early-days experiments, and that there is potential for much more. But it’s a shame that some of that potential was not realised through an initial strategic plan covering the areas approached, no matter how early-stage and prone to change that plan was.
Perhaps the lack of focus reflects the ownership and management changes that the team underwent over the period, and the appearance of similar units focused on particular policy areas in other government departments.
How is success defined for each trial? And can improvements always be tracked back to the changes made?
The trials run by BIT show a wide range of results, from no results or negative outcomes (more rarely) to, more often, very small positive results. It’s not always clear why a particular threshold is used to label an experiment as ‘successful’, or why a percentage increase of as little as 0.45% is considered statistically relevant and even labelled a ‘significant improvement’ (p. 23)*.
Usually there is also no reference to other factors that could influence variation in results, or to an expected variation range for each specific context – such as seasonality, the time of day an email is sent, or other factors intrinsic and extrinsic to the target audience and the communication channels used.
For example, take the trials at job centres in Essex (p. 7), where a group of job seekers was asked to complete and commit to a plan of action instead of reporting on past activity towards finding a job. The pilot showed improvements in off-flow rates from people on benefits ranging from 1.7% to 5% over the course of the trial. Could the variation over time be explained by the different amounts of training or support available to staff managing the pilots? Or perhaps by normal seasonal employment rates that had little to do with the trials in the first place? Was the trial double-blind? Is 1.7% a good improvement, compared to the usual variation? And by the way, is ‘off-flow rate from benefits’ the same thing as finding a job, i.e. were job seekers actually asked for more details on the outcome once they moved off benefits? Quite surprisingly, job centre staff were asked about their levels of happiness in the pilot vs. control group, but job seekers weren’t…
In most cases, the improvements achieved in trials tend to be really small, with a handful of noteworthy exceptions.
A note on data analytics, graphs and charts
In an interesting experiment, when BIT used images in the context of campaigns for organ donation and to stop smoking, the click-through rates and campaign results were worse. There may be a wider lesson here: if you’re communicating something in the not-for-profit world, avoid messages that visually resemble commercial ad banners!
As for how the results are visually represented in the report itself, I’d say that BIT’s data analytics experts would clearly benefit from input from other marketing specialists and marketing tools. I find most of the charts throughout the report simplistic and inconclusive.
For example, in the analysis of results for messages tested on a DVLA web page, a plain bar chart (p. 16) shows an increase in sign-ups from 0.5% to 0.9%* compared with the control group. Similarly, on p. 17, another bar chart shows tiny changes in click-through rates, from 0.1% to 2.1%, for over 20 variations in copy! There is no reference to the number of messages served, timing, unique vs repeat visitors, on-screen context, or guidance on how to interpret these results – in short, nothing else that could help with analysing the data shown.
I know that it’s best to keep graphs as simple as possible, especially if the report is aimed at a diverse audience. But one question looms after looking at these charts, no matter your professional background: so what?
A little care and love when producing infographics could have added a wealth of information and impact to the report.
Intervention areas and potential impact of nudges
There are 10 main areas of activity highlighted in the report, from Chapter 1, Economic Growth & the Labour Market, to Chapter 10, International Development (Chapter 11, covering work with other governments, could easily have been folded into the previous one).
I’m not sure how neatly BIT itself is divided into these focus areas, but there is confusion and redundancy between the different chapters. I’ve counted at least three ‘heads of’ whose scope includes education, for example. Perhaps the teams could be optimised to avoid such redundancies and potential knowledge drain, preferably in support of some sort of overarching strategic plan, as I pointed out before.
– High potential sectors
More importantly, it’s clear that some areas have much more potential than others for ‘nudges’. My favourites within domestic policy are energy & sustainability, empowering consumers and reducing tax fraud – lots of opportunities there.
Regarding energy and consumer choice, as highlighted at the beginning of the relevant section:
our response to climate change is made all the more complex by the fact that the environmental costs appear far into the future and are (in the present) relatively intangible. (p. 40)
Alongside pensions and saving for retirement, it’s one of the areas where very delayed and complex feedback mechanisms make it really hard for people to make good choices for themselves.
There are some promising projects around energy-efficiency labels, and empowering consumers to compare utility bills and other services where usage patterns are complex and hard to make sense of.
On sustainability, let’s be honest about something though – the role of regulation has to be big too. Nudges won’t be able to replace huge, desperately needed policy changes. Would governments be able to ‘nudge’ Big Oil to invest in the energy alternatives we need, before they can ‘nudge’ consumers to choose such alternatives? Talking about ‘nudges’ at this scale would be nonsense – governments simply have to regulate better. We need much bigger changes to enable consumer choice, or the honesty to stand up to lobbies when the choices currently on offer are simply not viable in the long run. If we cannot burn more fossil fuels in order to avoid drastic climate change (let’s forget for a moment that we may already be too late), surely that choice has to be removed from the equation? And until then, don’t expect consumers to stop choosing air travel for their own leisure, at the expense of future generations.
– Low potential sectors
There are other domestic policy sectors where the work of the nudge unit isn’t as clearly beneficial.
On the Home Affairs front, I encountered only one good application example – a small adjustment in a reminder email which led to a 50% increase of black and minority ethnic applicants going through to the next stage in the recruitment process to join the police (p. 28).
The rest of the work falls a bit flat in terms of scope and objectives. Increasing voluntary departures of illegal migrants, reducing theft by nudging mobile phone theft victims, or supporting the Army’s recruitment processes doesn’t sound like the best prioritisation exercise in my opinion.
Who are these interventions primarily benefiting? Do they really pass the test of ‘helping people make better choices for themselves’? Nudging victims to prevent crime from taking place (instead of focusing on crime perpetrators and main enablers), or recruiting for the Army instead of investing in peace-making strategies, would NOT get my vote.
Broader challenges of social sciences and behavioural experiments
A key challenge for behavioural insights and studies is understanding if and how they apply in different contexts. The report goes as far as acknowledging that.
It was less than 2 weeks ago that a study of 100 psychology experiments published in top-ranking journals found that 75% failed the replication test. This means that the findings originally reported weren’t the same when other scientists repeated the experiments.
In general, academics acknowledge considerable statistical weaknesses in neuroscience and cognitive studies. Similar concerns and variations stand behind experiments such as those detailed in BIT’s report as well, with serious implications for its validity, reliability and scalability.
From recommendation to action
This is a crucial point indeed.
I believe that in publicly funded projects, the strategic plan and the focus on results and value for money are of utmost importance. What we are trying to achieve is improving social marketing through behavioural insight. It makes sense to focus on best practice and on delivering results, rather than on creating proto-academic, pseudo-scientific experiments in search of universal truths. Academia is better suited to developing scientific insight.
In a typical marketing department, once you’re past the initial proof of concept – testing an upcoming marketing campaign with a subset of your target audience, for example – and unless the results are completely inconclusive or even negative, implementing next steps and showing real results would naturally take precedence over undertaking further research. That is not the case with most of the experiments described in BIT’s report. There seems to be a significant gap between recommendation and action.
Shouldn’t there be a summary of real-world achievements right upfront, in the executive summary? A list of quantified gains and achievements that will follow directly from the trials? It definitely looks like a missed opportunity. I know BIT are an ‘insights’ team (the key is in the name!) rather than an implementation team, but the report would benefit immensely from showing the impact of this work in solving big, real-world problems.
As it stands, the report is perhaps a tad too self-centred, small scale and scatter-gun to truly inspire.
* Update 13/10/2015: I’m a marketer rather than a statistician. Since writing this post, I have read more about statistical significance in the context of behavioural economics, and about regression and ANOVA methods for analysing experimental data in research. I understand that a p-value – a function of the observed sample results – below 5% usually indicates statistical significance. Nevertheless, the actual significance levels used in BIT’s studies are never referenced or explained anywhere in the report, so I’m still unsure about the percentages mentioned.
Coming up next:
- A review of David Halpern’s book Inside the Nudge Unit, which should cover other interesting aspects of BIT’s work
- A roundup of BX2015, the annual gathering of behavioural science experts from government and academia around the world, which London hosted a couple of days ago. I haven’t attended the conference, but I will do my best to summarise what went on from various sources.