Data portability is a hard problem worth solving

Published on September 27, 2022 by ilDon

Data portability can be defined as the ability to take your data with you when you switch services. Users often ask for it, but in practice it is rarely as smooth as they would like it to be.

For users, the benefit grows with the amount of data they have accumulated: the longer someone has used a service, the more they stand to lose by leaving that data behind. Data portability is also important for businesses. When users are not locked into a service, they are more likely to switch to a new service if it is better than the one they are currently using. This is especially important for start-ups, as they need to grow their user base quickly in order to be successful.

Several actors are involved in the data portability process: the user, who wants to take their data with them when they switch services; the current service provider, which usually holds the data the user wants to take; and the receiving service, which is where the user wants to take that data.

Some level of coordination between these actors is needed to make the data portability process work, and this coordination is not always easy to achieve. Several initiatives are trying to solve this coordination problem. On paper, the most notable of them is the Data Transfer Project. The project looks promising because many tech giants are involved in it (Google, Microsoft, Facebook, Twitter, etc.). However, the last official update on its website dates from 2018, when it announced that Apple had joined the group. Since then, there has been no news from the project.

The most obvious reason for this inactivity from the tech giants participating in the Data Transfer Project is that there is little to no commercial incentive for them to help users switch to competitors. On the contrary, their commercial strategies are based on keeping users locked into their services.

On top of that, there are technical challenges that need to be solved to make data portability possible. These problems are not easy to solve because they involve many technologies and protocols. In the following sections, we will discuss some of them.

Why data portability is hard

From a technical perspective, data portability means that the user needs to be able to export their data from a service provider and import it into the service that the user wants to switch to. This means not only physically moving the data from one place to another, but also transforming the data into a format that the receiving service can understand.

Technically speaking, interoperability between the data structures used by different service providers is probably one of the most complex issues involved in data portability.

At their core, data structures are the representation of a problem. Many, if not most, problems we face every day as humans can be narrowed down to a set of data. For example, enabling social interactions through an online network of people is a problem that can be represented by a set of users and their relationships. Keeping a calendar is a problem that can be represented by a set of events. And so on. How each service provider represents the data to solve such problems is what we call a data structure, and each service provider might use a different data structure to solve the same problem.

A very simple example that illustrates the problem well is how dates are written in different cultures. In some cultures, the month comes before the day, while in others, the day comes before the month. As a consequence, the numbers 11-12-2022 can have very different meanings depending on the culture. In some countries, they mean December 11th, 2022. In others, they mean November 12th, 2022. For dates and other commonly used data structures, there are standards that represent them in a way that is understandable by all. For example, the ISO 8601 format always puts the year first, then the month, then the day, so the date above is written as either 2022-12-11 or 2022-11-12, depending on which of the two days was meant.
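The ambiguity is easy to reproduce in code. The following Python sketch parses the string from the example under both conventions and prints each result in ISO 8601 form. The function names are illustrative only.

```python
from datetime import date, datetime

AMBIGUOUS = "11-12-2022"

def parse_day_first(text: str) -> date:
    """Read the string as day-month-year."""
    return datetime.strptime(text, "%d-%m-%Y").date()

def parse_month_first(text: str) -> date:
    """Read the string as month-day-year."""
    return datetime.strptime(text, "%m-%d-%Y").date()

day_first = parse_day_first(AMBIGUOUS)
month_first = parse_month_first(AMBIGUOUS)

# The same input yields two different days; the ISO form of each is unambiguous.
print(day_first.isoformat())    # 2022-12-11
print(month_first.isoformat())  # 2022-11-12
```

Once a date is stored in the ISO form, the receiving side no longer needs to know which convention the sender used.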

However, there are many data structures that do not have such standards. For example, the data structure used by a social network to represent users and their relationships is not standardized. This means that the data structure used by one social network is not necessarily understandable by another social network.
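To make this concrete, here is a minimal Python sketch of two made-up services that store the same fact, "alice follows bob", in different shapes. Service A nests the followed handles inside each user, while service B keeps numeric accounts and a separate list of edges. Both layouts, all field names, and the a_to_b function are assumptions invented for this example; they do not describe any real social network.

```python
import json

# Hypothetical export from service A: each user carries a list of followed handles.
SERVICE_A_EXPORT = json.loads("""
{
  "users": [
    {"handle": "alice", "following": ["bob"]},
    {"handle": "bob", "following": []}
  ]
}
""")

def a_to_b(export_a: dict) -> dict:
    """Convert service A's nested layout into service B's (hypothetical) edge list."""
    handles = [user["handle"] for user in export_a["users"]]
    # Service B identifies accounts by number, so assign ids in export order.
    ids = {handle: index + 1 for index, handle in enumerate(handles)}
    accounts = [{"id": ids[handle], "name": handle} for handle in handles]
    follows = [
        {"from": ids[user["handle"]], "to": ids[target]}
        for user in export_a["users"]
        for target in user["following"]
    ]
    return {"accounts": accounts, "follows": follows}

service_b_import = a_to_b(SERVICE_A_EXPORT)
print(json.dumps(service_b_import, indent=2))
```

Even in this tiny case, someone has to know both layouts in order to write the conversion, and a converter like this would be needed for every pair of services.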

This is one of the main technical reasons why data portability is such a hard problem to solve.

What open formats are and how they help

Open formats are a way to solve the interoperability problem. An open format is a format that is standardized and publicly documented. This means that anyone can use it to represent data. This is very useful for data portability because it means that the data structure used by one service provider can be understood by another service provider.

On the other hand, proprietary formats are formats that are not standardized and not publicly documented. This means that only the service provider that uses the format can use it to represent data. This is not very useful for data portability because it means that the data structure used by one service provider cannot be understood by another service provider.

An example of an open format created to solve the interoperability problem is XML. XML was first created in the late 1990s to address interoperability between different software applications. It was a very successful initiative, and it was adopted by many software applications. However, XML is quite verbose, and it is not the easiest format to use. Nowadays, most applications prefer to rely on the open standard JSON, which is a much simpler format for representing data.

Open formats such as JSON are not enough on their own to make data portability easy, because they solve only part of the interoperability problem. They make it easier to read the data produced by a service provider, but they do not in themselves define the data structure that the service provider uses: two services can both export valid JSON and still disagree on field names, nesting, and meaning. This is why open formats need to be used in conjunction with open specifications.

Open specifications are specifications that are standardized and publicly documented. This means that anyone can use them to define the data structure used by one service provider. This is very useful for data portability because it means that the data structure used by one service provider can be easily understood by another service provider.
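As a rough illustration of how a specification complements a format, the Python sketch below checks two JSON records against a tiny hand-written specification for a calendar event. Both records are valid JSON, but only one follows the specification. The field names, the EVENT_SPEC dictionary, and the conforms function are assumptions made for this example; a real project would more likely rely on an established schema language such as JSON Schema.

```python
import json

# Hypothetical specification: field name -> expected Python type after JSON parsing.
EVENT_SPEC = {"title": str, "start": str, "end": str, "attendees": list}

def conforms(record: dict, spec: dict) -> bool:
    """Return True when the record has exactly the specified fields with the right types."""
    if set(record) != set(spec):
        return False
    return all(isinstance(record[name], expected) for name, expected in spec.items())

good = json.loads(
    '{"title": "Review", "start": "2022-12-11T10:00",'
    ' "end": "2022-12-11T11:00", "attendees": ["alice"]}'
)
bad = json.loads('{"name": "Review", "when": "11-12-2022"}')

print(conforms(good, EVENT_SPEC))  # True
print(conforms(bad, EVENT_SPEC))   # False
```

The format guarantees that both records can be parsed; only the specification tells the receiving service what to expect inside them.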

How Anita tries to solve the problem of data portability

In short, Anita solves the problem of data portability by using open formats and open specifications as the foundation of how it works.

As for open formats, data in Anita can be saved in a single file, either a JSON file or a SQLite database. This means that the data can be easily exported from Anita and moved around as the user wishes, without any limitation (to learn more on this, see here). On devices that do not support saving data to files on the device (such as mobile devices), the data is stored internally in the browser, and it can be exported at any time as JSON.

As for open specifications, the first thing to note is that Anita has no predefined data structures. It is the user who defines the data structure of each project, in a way that makes sense to them. This is a very useful first step to ensure data portability because it puts the user in control from the very beginning of the data portability process.

The fact that users can define their own data structures means that they can port their data to another service at any time, as long as the new service also allows them to define their own data structures. Users, however, are in general not very keen to spend time setting up services, and most services still do not allow users to define their own data structures. This is why Anita includes the data structure of each project in the data file used to store all the user data for that particular project.

In each file in which data is saved in Anita, the data structure of the project is also saved. This is a very useful second step to ensure data portability because it means that any receiving service can use the data structure of the project to understand the data that is being sent to it. If the new service allows users to define any data structure as Anita does, then the user can simply import the data into the new service. The new service will merely need to adjust the data structure of the project to fit it to its supported features. If the new service uses a predefined data structure, then the new service can easily interpret the data structure of the project and use it to understand the data that is being imported.
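The exact layout of Anita's data files is not described in this article, so the following Python sketch uses a made-up layout to show the general idea of a file that carries its own structure. The "structure" and "data" keys, the field definitions, and the describe function are all hypothetical and are not Anita's actual format.

```python
import json

# Hypothetical self-describing project file: the structure travels with the data.
PROJECT_FILE = json.dumps({
    "structure": {
        "contacts": [
            {"field": "full_name", "type": "text"},
            {"field": "birthday", "type": "date"},
        ]
    },
    "data": {
        "contacts": [
            {"full_name": "Ada Lovelace", "birthday": "1815-12-10"},
        ]
    },
})

def describe(project_json: str) -> list:
    """List every stored value together with the type declared in the embedded structure."""
    project = json.loads(project_json)
    lines = []
    for section, fields in project["structure"].items():
        for record in project["data"].get(section, []):
            for spec in fields:
                name = spec["field"]
                lines.append(f"{section}.{name} ({spec['type']}) = {record.get(name)}")
    return lines

for line in describe(PROJECT_FILE):
    print(line)
```

A receiving service written this way needs no prior knowledge of the project: it reads the embedded structure first and uses it to interpret every record that follows.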

Conclusion

Data portability is a very important feature that is missing from most services. It is a feature that is very useful for users, but it is also a feature that is very hard to implement. This is why it is not surprising that most services do not support it.

With Anita, we are trying to solve the problem of data portability by using open formats and open specifications as the foundation of how it works. The philosophy that we follow is that users should be in control of their data. This is why we allow users to define their own data structures, and why we include the data structure of each project in the data file used to store all the user data for that particular project. Try Anita yourself to see how it works and how it can help you stay in control of your data.