
The Making of a Digital Twin

As part of our workstream exploring the potential, and the reality, of digital twins in the built environment, we’ve been writing a blog series. The series will explore digital twins from concept to reality; covering the history of the concept, how they’re made, the challenges of using them to create value, and the wider potential for how they might benefit cities. The series draws on our own research as well as practical experience of creating a digital twin of our own building, the Urban Innovation Centre in London. 


Knowing what a digital twin is is one thing; creating one is quite another. The vision of an exact digital replica of a physical asset, connected and sharing data in real time with its real-world counterpart, has been described by many innovators and technologists as offering seemingly endless benefits. Here, we want to look at how these visions are brought to reality: how to make a digital twin.

“Ideas are useless unless used. The proof of their value is in their implementation. Until then, they are in limbo.”

This is how Theodore Levitt might have felt about today’s state of digital twins. There’s a lot of literature on the potential for digital twins to change industries: the value they might provide, and the challenges around data security and governance (particularly for city- and country-scale digital twins). But how do we actually make one?

In this blog post, we’re setting out our view of the stages involved in creating a digital twin. It’s important to note that these steps don’t happen in isolation or in a perfect sequence. They all need to be considered at the same time. As an asset owner, you might define what you want to do, and find out you can’t collect that data, so you revisit your definition. Or you can, but not at the granularity you need. Or you find the perfect sensor, but it includes a microphone – and the building users aren’t happy with the idea of potentially being recorded. So you go back and redefine your data needs. 

The process is not perfectly linear, but in the interests of structuring it, we’ve set out six stages of developing a digital twin.

1. Define what you want to do

Excitement around the concept of digital twins may tempt proponents to take a data-first approach: “Data is valuable! How can I collect as much as possible?” This is not an outcomes-based approach and risks wasting resources. The start-up costs of creating and integrating new data sources can be high; data isn’t automatically valuable and must be collected, cleaned and stored. This process of collecting – whether that’s installing sensors, writing importers or exchanging back-and-forth emails with another organisation – is time-consuming. It’s also risky to collect data without a clear, defined use case, because doing so creates a lack of transparency around what the data might be used for in the future.

All in all, taking this approach risks wasting resources and alienating the people within the system. This can damage the perception of what you’re trying to do. Sensors, for example, can be and feel invasive, even when they’re collecting non-personal information; and collecting any information without communicating why (or that it’s even happening) can damage trust, especially if people feel it’s happening without their consent.

Instead, start by clearly defining the reason why you want to build a digital twin. What value do you want to get from it? What do you want to achieve? For example:

  • We want to understand how the occupancy in each meeting room affects the air quality of those spaces, so we can suggest planning longer meetings in rooms where easy measures can be taken to improve the air quality.
  • We want to understand how much energy each space within the office is using at different times of day so we can potentially re-purpose different areas to reduce overall energy usage and/or overall carbon intensity.
  • We want to know where people prefer to sit in the office, and why, so that when we re-configure the office, we assign desks to those areas.

Now you have a way of describing the value of the digital twin, explaining the reasons for collecting the data (which might be particularly important if you’re collecting personal data), and focusing your resources on collecting the data you need.

The diversity of potential use cases is why digital twins come in all shapes and sizes; not all digital twins contain the same datasets because not all are used for the same purpose. If you want to maximise energy efficiency, you need to collect very different data compared to if you need to simulate crowd movements through a large building – and your digital twin needs to have very different functionality.

To make some of this a bit clearer, we’ll take one of these use cases as an example:

We want to monitor how the air quality in meeting rooms changes during meetings and affects how productive people feel, and simulate how the air quality might be affected by meetings of different sizes, so we can recommend that people switch rooms if the air quality is bad, in order to make meetings more productive.

2. Work out what information you need to do that

Now you know what you need the digital twin for, you can work out the requirements – what data you need to collect to achieve that. For our example use case, we can split this into three parts:


We want to monitor how the air quality in meeting rooms varies during meetings and affects how healthy and productive people feel.

First, we need to monitor the air quality and feelings of productivity and wellbeing in meeting rooms, as well as the number of people in those rooms, and then act on that by recommending a switch of rooms if the air quality gets bad.


We want to simulate and estimate how the air quality might vary in each meeting

Then we need to be able to simulate how the air quality might change during future meetings, and how this might affect how productive or well the people in that meeting feel.


So we can recommend that people switch rooms if the air quality is bad, in order to make meetings more productive and less draining

Then we want a way of nudging people to switch rooms if the simulation shows the air quality’s likely to make their meeting significantly less productive or enjoyable.

For the “monitor” part, we need to know the levels of air pollutants in each room, how many people are in each room, and how productive and “well” the people in that room feel.

Then, for the “simulate” and “control” (or perhaps “recommend”) parts, we need to know where each room is located (so we don’t recommend a room that’s a long walk away), which rooms are booked for when, and how big the booked meetings are. With that, we can simulate the air quality (AQ) in each future meeting, and recommend rooms that are free and big enough for the meeting in question.
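As a sketch of what these requirements imply for the data model, the “monitor”, “simulate” and “recommend” needs above could be captured as a few simple record types. All names and fields here are illustrative assumptions, not taken from any real system:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class RoomReading:
    """One observation for the 'monitor' requirement."""
    room: str            # room name, normalised across data sources
    timestamp: datetime
    co2_ppm: float       # one example pollutant measure
    occupancy: int       # people counted in the room

@dataclass
class Booking:
    """Room-booking information needed for 'simulate' and 'recommend'."""
    room: str
    start: datetime
    end: datetime
    attendees: int

@dataclass
class FeedbackEntry:
    """Qualitative, self-reported data collected after each meeting."""
    room: str
    timestamp: datetime
    productivity_score: int  # e.g. a 1-10 self-rating
```

Writing the requirements down this concretely also surfaces gaps early – for instance, noticing that nothing here yet links a FeedbackEntry to a specific Booking.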

3. Work out how you’ll get that information

This stage might involve some data sources you already have access to, or that are openly accessible, such as building floor plans or urban outdoor air quality data. You might also need to gather new information – for example, by installing a system (or multiple systems) of sensors to monitor air quality or occupancy.

It might be that you collect the data with a single system or sensor. Or, you could be using a variety of different types of sensor, each managed by a different provider, each with a different API you can use to access the data.
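One common way of handling several providers, each with its own API, is a thin adapter per provider that translates each vendor’s response into one shared shape. A minimal sketch, in which the provider names and payload formats are invented for illustration:

```python
def from_provider_a(payload: dict) -> dict:
    """Provider A (hypothetical) reports CO2 as 'co2' and rooms as 'roomId'."""
    return {"room": payload["roomId"], "co2_ppm": float(payload["co2"])}

def from_provider_b(payload: dict) -> dict:
    """Provider B (hypothetical) nests readings under 'location' and 'readings'."""
    return {"room": payload["location"]["name"],
            "co2_ppm": float(payload["readings"]["co2_ppm"])}

# One adapter per source; adding a new provider means adding one entry.
ADAPTERS = {"provider_a": from_provider_a, "provider_b": from_provider_b}

def normalise(source: str, payload: dict) -> dict:
    """Route a raw payload through the right adapter for its source."""
    return ADAPTERS[source](payload)
```

The benefit of this pattern is that everything downstream of `normalise` only ever sees one format, however many sensor systems you end up with.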

You might also want to collect some qualitative data – asking people to rate their “feeling of wellbeing” or “productivity level” may not be as simple as a one-to-10 scale or a sensor. Collecting enough of this data to be valuable can be difficult, as it requires a lot of continued engagement with people. Even something as simple as opening an app on your phone after every meeting to rate it out of 10 requires effort and interest, which can drop off quickly once the novelty wears off.

However you do it – collecting data is unlikely to be as simple as sticking a sensor on the wall with Blu Tack and watching a graph appear. If you’re installing sensors, for example – have you got permission to stick them on the wall? How are they powered – does each one need a spare plug socket, and is there one in every room? What happens if someone unplugs it? Do they need access to the building’s wireless network, do you need to get permission for that? And even if they don’t, and use a wireless hub – how do you make sure you keep track of the hub?

Answering all these questions takes time, and requires the buy-in of other people – building users, building managers, IT teams. That’s one of the reasons why having a clear definition of the digital twin’s potential value is important – people will be much more willing to buy into the project if you can clearly explain why you need to do it, and how it’ll improve things for them.

In our example, we’ll use a variety of different data sources; some data provided by the facilities team in our building, some from our room booking system, plus two new sensor systems we’ll install and a simple phone app we’ll ask people to download.

4. Involve the people who’ll be affected

It’s easy to see a model of the air quality in a building, for example, as completely separate from the people in the building – we’re not collecting data about those people directly, and the result is intended only to improve their experience of using the building. But people will be part of the equation somewhere, and not involving those people – people using the meeting room, people installing the sensors, contractors carrying out repairs, guests using the building – can damage both the digital twin itself and the value gained from it.

Mapping how people in the space will affect and be affected by the digital twin is an important step. This is an ongoing consultation and communication process, and should be aimed at ensuring the people affected by it understand the value you’re hoping to gain from it (and how that value will be passed onto them), what will be happening to make that real and what input they can or should provide.

Here are a few questions that need to be asked as part of this process, and why, in our example case, it would have been so important to answer them:

How will you communicate what’s going on? Sensors feel invasive, and understandably people can be uncomfortable about new technologies – especially with the intent of “monitoring” – appearing in their workspaces if they haven’t had a chance to question the reasoning behind them.

We didn’t mention that air quality sensors had been installed in meeting rooms. Someone booked a room for a sensitive call and noticed a strange device on the wall. Worried it might be recording them, and unsure who to ask about it, they took it off the wall and put it outside the room. Unfortunately it got stuck to the floor, someone tripped over it, hurting themselves and breaking the sensor. We had to spend £150 on a new sensor and sort out a new risk assessment – and then the same thing happened again.

How will you prevent people from misusing or misconstruing the data? Once the initial cost of building the functionality and the mechanisms to collect the data has been sunk, the “data is valuable” mindset can reappear, tempting people to use the data for new purposes – some of which might not have been consented to or supported by the people they affect.

People were happy that the data was being used to re-route them to new meeting rooms. However, after a while, someone in a different team noticed that the data also told us who had booked each room, and decided to map who held the most and least productive meetings. People who held “unproductive” meetings were then sent tips on how to be more productive. This was understandably unpopular, and as a result everyone stopped filling in the productivity survey, undermining the original purpose.


5. Collect that information into one place

Now you need to bring all that information from its different sources into one place – whether that’s a database, or an API which allows access to all of the data – and this is where it starts to constitute a digital twin.

You probably need to do some engineering before bringing the data into the digital twin, to make the data comparable (for example, so you know which sensors relate to which rooms – rooms might be named differently by each sensor provider, and might not match the names on the floorplan) and to make the different systems interoperable. This will likely also require work specific to each data source, such as writing importers for APIs or digitising information held in hard copy.

In our example, we need to write importers and engineer the data from each of the sensor systems, and the room booking software, so the room and floor names match and we can compare different rooms. We also need to digitise the floorplans, as we only have these in hard copy form from the building drawings.
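The room-name matching described above can be as simple as a lookup table mapping each source’s identifiers onto the canonical floorplan names. A sketch, with all the names invented for illustration:

```python
# Canonical names come from the digitised floorplan; each data source
# spells them differently, so we keep one alias table per source.
ALIASES = {
    "sensor_system": {"MR-01": "Meeting Room 1", "MR-02": "Meeting Room 2"},
    "booking_system": {"Mtg Rm 1": "Meeting Room 1", "Mtg Rm 2": "Meeting Room 2"},
}

def canonical_room(source: str, name: str) -> str:
    """Map a source-specific room name to the floorplan name.
    Raises KeyError for unknown names, so mismatches surface early
    rather than silently creating duplicate rooms in the twin."""
    return ALIASES[source][name]
```

Failing loudly on unknown names is a deliberate choice here: a new sensor or renamed room should show up as an import error, not as a phantom extra room.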

6. Do things with the data

Now you’ve got access to the data, what you do next comes back to the use case – and is probably a combination of some of the following options, which you would have mapped out at the start of the process when the use case was defined.


Depending on the use case, it’s likely that you’ll want to store historic data. This is particularly important if you want to use the data for analysis that might rely on re-analysing historical data. It’s important to think about how long you need to store the data for; whether any of it constitutes personal data; and how you might build in flexibility for other potential future use cases (while being careful that those use cases are properly justified). Do you want to store the raw data, the engineered data, or the analysis? How will you justify storing personal data, and how will you do so securely?

As well as answering these questions in order to build the infrastructure of the digital twin, it’s important to make these decisions clear to the people in the system.

In our example, we might store the engineered data from the last two years in a local database. The raw data isn’t useful to us, and can always be re-accessed through the sensor providers’ APIs, but we want to store some of the engineered data in case we want to use it again for a new use case. We’re not storing any personal data, as the perceived productivity feedback and occupancy data doesn’t record any personal characteristics.
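A two-year retention rule like this one can be enforced with a small pruning step run on a schedule. A sketch, assuming each stored row carries a `timestamp` field (the field name and 730-day window are illustrative choices):

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=730)  # roughly two years

def prune(rows: list, now: datetime) -> list:
    """Keep only engineered rows newer than the retention window.
    Each row is assumed to be a dict with a 'timestamp' datetime."""
    cutoff = now - RETENTION
    return [row for row in rows if row["timestamp"] >= cutoff]
```

Making retention an explicit, visible constant like this also makes it easier to state the policy plainly to the people in the building.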


When all the information is in one place, you might want to give multiple people, and maybe external organisations, access to it. How you do this comes back to what the use case is; if you want to make the raw data available, you might create an API with different permission levels allowing certain people to access different sets of the data. If you just want to make the results of some analysis available, you might just produce reports.

We’ll make the data available to certain people within the organisation (such as the IT and facilities teams) using an API with different levels of controlled access.
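At its core, a permission level is just a table of who may read what, independent of whichever web framework eventually serves the API. A minimal sketch, with the roles and dataset names invented for illustration:

```python
# Which datasets each role may read; anything absent is denied by default.
PERMISSIONS = {
    "facilities": {"air_quality", "occupancy", "bookings"},
    "it": {"air_quality"},
}

def can_read(role: str, dataset: str) -> bool:
    """True only if the role's permission set includes the dataset."""
    return dataset in PERMISSIONS.get(role, set())
```

Denying by default (an unknown role gets an empty set) is the safer direction for a twin that may later hold more sensitive datasets.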


You might just monitor the raw data itself, to facilitate some very simple action – for example by setting up a text notification for the building security guard every time a footfall sensor counts a new person entering the building, so she knows to expect someone coming through the door.
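That kind of monitoring is just threshold-plus-callback logic over a data stream. A minimal sketch, where `notify` stands in for whatever messaging service is actually used:

```python
def watch_footfall(readings, notify):
    """Call notify() each time the cumulative footfall count increases.
    'readings' is an iterable of cumulative counts from the sensor;
    'notify' is any callable taking a message string."""
    last = 0
    for count in readings:
        if count > last:
            notify(f"{count - last} new arrival(s); total today: {count}")
            last = count

# usage: watch_footfall([0, 1, 1, 3], print)
```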


You might do some analysis of the data to inform a policy or decision. This could be an offline decision – where for example the facilities team meet to review the analysis of which meeting rooms are most frequently used, to decide which rooms can be used for other purposes.

In our example, we’ll analyse the data from past meetings based on occupancy and air quality (AQ) to estimate how the air quality in each meeting room changes with meeting size and length.
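A first version of that analysis could be a least-squares fit of CO2 rise against “person-minutes” (attendees multiplied by meeting length). This is a deliberate simplification – it ignores ventilation and room volume – and the data values shown are made up for illustration:

```python
def fit_co2_rate(history):
    """Fit CO2 rise per person-minute from past meetings.
    'history' is a list of (people, minutes, co2_rise_ppm) tuples;
    we fit co2_rise ~ rate * people * minutes through the origin,
    so rate = sum(x*y) / sum(x*x) with x = people * minutes."""
    xs = [people * minutes for people, minutes, _ in history]
    ys = [rise for _, _, rise in history]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# e.g. three past meetings (people, minutes, observed CO2 rise in ppm):
# rate = fit_co2_rate([(4, 30, 240), (8, 60, 980), (2, 45, 170)])
```

Per-room fits would be a natural refinement, since room size and ventilation vary.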


You might then use that analysis – or other mechanisms – to make predictions about some future events. How will a national holiday affect traffic moving through a city?

We’ll use the results of the analysis to simulate the AQ variation over each proposed meeting (from the room booking schedule) before it begins – based on the analysis and the real-time information about the current air quality in the room.
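Given a fitted per-person-minute rate and the room’s current reading, the per-meeting simulation can be a direct projection. A sketch, where the 1,500 ppm threshold is an illustrative assumption, not health guidance:

```python
def predict_co2(current_ppm, attendees, minutes, rate_per_person_minute):
    """Project end-of-meeting CO2 from the current reading, assuming a
    linear rise with person-minutes and no extra ventilation."""
    return current_ppm + rate_per_person_minute * attendees * minutes

def recommend_switch(current_ppm, attendees, minutes, rate, threshold_ppm=1500):
    """True if the projected level crosses the (assumed) comfort threshold."""
    return predict_co2(current_ppm, attendees, minutes, rate) > threshold_ppm
```

Starting from the live reading, rather than a fixed baseline, is what makes this a digital-twin simulation rather than a static rule of thumb.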


This analysis and simulation could then be used to influence some automatic decision: for example, analysing the average number of people in a room throughout the day, then using that to adjust the heating schedule for that room directly from the BMS.

However, this adds a layer of risk. A benefit of the digital twin is that you can work across multiple systems, for example using data from occupancy sensors to influence the BMS. Using that interoperability to automate decisions – and connecting an automated system to a usually human-operated system – raises questions about accountability. If our occupancy sensors fail, tell the BMS no one’s in the building, and that automatically shuts the BMS off for a whole day in the middle of winter, who is responsible? What if our decision somehow damages the BMS software and isn’t covered by our insurance?

Automating decisions has great potential, and has been done successfully in many cases – but questions around accountability and reliability of the data need to be addressed and clearly answered.


You might then visualise the data – or the results of the analysis or simulation – in an understandable way.

This could involve anything from a dashboard to a series of reports, to a 3D model. Although, again, there can be a temptation to visualise everything and anything that possibly can be visualised. It’s important to focus this on what you want to use it for. Is there much point in spending a long time designing a 3D visualisation of the temperature change in every room, if that visualisation doesn’t help anyone decide which room is best for them to work in? If you just want to provide behavioural “nudges”, does a simple text message work?

Our visualisation will consist of a note on the tablet outside each meeting room, which people have to interact with to “start” the meeting before they go in. If the simulation suggests their meeting is likely to result in poor air quality, we put up a note that explains this and recommends another room.

We’ll also make monthly reports for each room, for how the air quality was affected by the occupancy and meeting length. We’ll design the format of these with the facilities team, who will explain what they need to be useful for future reference.


Our example digital twin has created value for the inhabitants of our office. However, the real value for cities is when the individual digital twins – of buildings, infrastructure systems, public utilities – become interoperable, and we can use data from multiple different digital twins at once to create wider value.

What if our example digital twin contributed data to, and took in data from, a London-wide digital twin? Transport for London could understand levels of overcrowding in Farringdon station during tube delays, as well as the number of people in each office in Clerkenwell and each office’s distance to each station, and send a notification to building owners recommending that they divert people to different stations. The creation of shared networks of digital twins has the potential to help citizens and policy makers make far more informed decisions about how our cities work and how we use them.

The creation of this wider value relies on a lot of underlying infrastructure: open data standards, secure ways of storing and sharing data, reliable data collection methods, resources with which to build and maintain the individual digital twins, and buy-in and mechanisms for citizens – whether they’re building users, residents, owners or users of public infrastructure – to have a say in how that data is used, and what of their personal data they make available. We’ll be exploring the practicalities of creating this wider value later in the series.

We created the Advanced Building Information System (ABIS), a digital twin of our own building, the Urban Innovation Centre in London. To learn more about our practical experience of this, and the research that went into it, check out the video below.


Advanced Building Information System video
