Excerpt from The Cloud Report 2022
Sustainable Data Strategy
Sustainability can be understood in different ways. In the narrow sense, sustainability means climate neutrality, ecology, green IT, and so on. In a broader sense, however, sustainability has a much greater social value. Solutions that are developed and used openly, transparently, and collaboratively, that promote sovereignty and independence, are also sustainable. In software, for example, open source solutions are sustainable. Well-documented open source code prevents the same solutions from having to be developed repeatedly and increases the security and quality of the software, as in most cases the code has been reviewed by several developers.
But transparent solutions also offer sustainability and flexibility in terms of processes and hardware. Take the example of an ideal data center that is climate neutral or even generates green energy. If the underlying combination of hardware and software is communicated openly and transparently, the solution can be transferred to other data centers.
Data centers always depend on their specific environment, for example the surrounding climate or the options for using waste heat. Nevertheless, transparently communicated experience and information help to build data centers that will eventually be climate positive. Interpreted more broadly, a sustainable data strategy also requires that organizations use and support open source and develop solutions with open standards. They can then share these solutions with other organizations, which do not have to develop the same thing again on their own. This is one of the prerequisites for ultimately achieving a sustainable data strategy.
Sustainability also means that organizations are able to comply with the policies and regulations that apply to them: those they set for themselves, those required by law, and those agreed with business partners or customers. A sustainable data strategy therefore allows you to comply with very different rules, store them transparently for all parties involved, and adapt them at any time. This is particularly relevant for organizations in regulated industries.
Sustainability therefore means long-term and ecological solutions, but also openness, transparency, flexibility, being able to work in hybrid environments, or rather to work in whatever environments the user wishes. Green IT should always be sustainable in a broader sense, even if the focus is initially on the ecological side. Green IT is often about the question of the carbon footprint or, for example, about the question: How performant am I with the software when I upload 1000 files? How much electricity do I really consume to do that? Fortunately, this is easily measurable nowadays.
This is where the newly introduced ownCloud Infinite Scale solution comes in.
Thanks to it, users can upload many small files up to ten times faster. Higher speed also helps to save energy: if I upload faster and burn fewer CPU cycles to do certain things, I logically save energy at the same time.
Green IT and sustainable data strategy go hand in hand. True carbon neutrality requires close cooperation between data centers and software providers. Much is demanded of hardware providers in terms of carbon footprint, but software developments can and must be energy-efficient in the long term, so that the interaction of software and hardware can result in significantly lower energy consumption. After all, in the long run, providers of digital services will only work sustainably if carbon neutrality is not achieved through climate certificates.
One of the challenges for providers of open source software built on open standards is that their offerings can be operated virtually anywhere. Whether the software is really used in a climate-neutral way therefore always depends on the environment in which it runs. There are numerous examples of green software, but it has to be used responsibly: even with climate-friendly software you can waste energy if you use it incorrectly or operate it in an energy-inefficient environment.
There are some positive examples of open source projects and software, but still too few software vendors think and develop in this direction. The first open source software has now been awarded the Blue Angel: Okular, a PDF viewer from the KDE project. This suggests that software in general can be awarded a Blue Angel.
The underlying criteria can become a basis for software development in the long term. An important step in this direction is for organizations to be aware of which partners they want to work with and to develop criteria for how they want to achieve true carbon neutrality. Many providers claim to be carbon neutral but only achieve this by buying climate certificates, which amounts to 'greenwashing'. This is where the broader meaning of sustainability comes into play. As a user of green open source software, I have the technical freedom to run it in the data center of my choice. I am not dependent on a particular provider and, accordingly, not dependent on a particular environment.
I have the option of choosing an energy-efficient data center that meets my criteria. That is the decisive factor for Green IT. From 2030, data centers in Europe must operate in a climate-neutral manner. Such questions will therefore also become relevant for data center operators, especially if the software that is used or offered plays a decisive role. They will have to ask themselves: Which software fits into my climate concept? How can I build my overall environment to save energy sustainably? These questions will not only remain important for the climate but will also become more relevant financially, as the global energy market is likely to remain tense.
Transparently measurable
To achieve what has been described above, the overall system must be measurable, and the measurement data must then be communicated transparently. Total energy consumption is basically measurable. Individual applications are only measurable if that one application is the only one used during a certain period of time, and the result would also have to be compared with the energy consumption in standby mode. In principle, however, it will be possible to determine the consumption of individual operations such as uploading and downloading certain data sets or collaborative editing. Measurement criteria will have to be developed for this, along with standards for implementing them so that results become comparable.
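The comparison with standby consumption can be illustrated with a small calculation. The following is a minimal sketch, not an established measurement standard: it assumes that average power draw is known for the measurement window and for standby, and that only the one application ran during the window. The function name and all wattage figures are hypothetical.

```python
def app_energy_wh(total_power_w: float, standby_power_w: float, duration_s: float) -> float:
    """Estimate the energy attributable to one application, in watt-hours.

    Assumes the application was the only workload during the window, so the
    difference to standby power can be attributed to it.
    """
    if duration_s < 0:
        raise ValueError("duration_s must not be negative")
    # Power above the standby baseline; clamp at zero to absorb measurement noise.
    net_power_w = max(total_power_w - standby_power_w, 0.0)
    # Watts * seconds = joules; 3600 joules = 1 watt-hour.
    return net_power_w * duration_s / 3600.0


# Hypothetical figures: a server drawing 180 W while uploading 1000 files
# for 90 seconds, compared with 120 W in standby.
upload_wh = app_energy_wh(total_power_w=180.0, standby_power_w=120.0, duration_s=90.0)
energy_per_file_wh = upload_wh / 1000

print(f"Upload: {upload_wh} Wh in total, {energy_per_file_wh} Wh per file")
```

A faster upload shortens `duration_s`, which is how higher speed translates into lower energy in this simple model.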
A sustainable data strategy means really understanding what happens to the data and when, as well as defining exactly when to delete it. Strategic deletion of data relates to Green IT in a broader sense: whatever has been deleted no longer consumes storage space and thus no energy. This is a simple connection, but it only takes effect with a long-term data strategy. Control over one’s own data always depends on having an overview and keeping it up to date, and this cleaning up is often forgotten. Data has the characteristic that it keeps growing, while things are only rarely deleted.
This can be problematic, as many regulations actually require deletion. Almost all NDAs state that both sides have to delete the data when the business relationship is over. This is rarely checked. How often the data is really deleted in practice, including from the backup tapes, is another open question. On the other hand, there are retention obligations that have to be complied with. Data maintenance is therefore a challenge for every organization and must be built into every data strategy.
For example, the ownCloud solution can set deletion periods. Metadata can be used to define the period of time after which certain files are to be deleted or placed in an archive. Time periods can be defined for the archives, too. If documents have to be kept for 10 years, they can be marked accordingly and will then be kept for 10 years. Other archives bring up the files for review after two years, and if no one has touched the files during this period, they can be deleted. This is a bit like the “Simplify your Life” method, where once a year you put everything you don’t need in a box. If you realize after a year or two that you haven’t even opened the box, you can most likely return its contents to the circular economy, which for household items is more sustainable than throwing them away. For data, a different strategy applies: it should really be deleted.
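A metadata-driven retention rule of this kind could look roughly as follows. This is a minimal sketch under stated assumptions and not ownCloud's actual API: the `FileMeta` fields, the `decide` function and the returned labels are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FileMeta:
    """Hypothetical per-file metadata; not an ownCloud data structure."""
    name: str
    created: date
    last_accessed: date
    retain_years: Optional[int] = None  # retention obligation, e.g. 10


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 Feb to 28 Feb if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def decide(meta: FileMeta, today: date, review_after_years: int = 2) -> str:
    """Return 'retain', 'delete_candidate' or 'keep' for one file."""
    # A retention obligation always wins while it is still running.
    if meta.retain_years is not None and today < add_years(meta.created, meta.retain_years):
        return "retain"
    # Untouched for the whole review period: may be deleted.
    if today >= add_years(meta.last_accessed, review_after_years):
        return "delete_candidate"
    return "keep"


today = date(2025, 1, 1)
invoice = FileMeta("invoice.pdf", date(2020, 1, 15), date(2020, 1, 15), retain_years=10)
draft = FileMeta("draft.odt", date(2021, 3, 1), date(2021, 3, 1))
notes = FileMeta("notes.md", date(2024, 6, 1), date(2024, 11, 1))

for f in (invoice, draft, notes):
    print(f.name, decide(f, today))
```

Checking the retention obligation first reflects the point that retention duties and deletion duties have to be balanced in the same policy.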
Companies should think about data strategy from the very beginning and create a system where employees work in a cloud-based, collaborative data room that is independent of any individual. This means that data is not held separately by each employee, maintained by that one person and possibly never deleted. Instead, the different workspaces share the data room and tidy it up together, so that joint archiving strategies can be developed that apply to the entire data room. This also includes common version management, a common recycle bin, a common archiving system and metadata for data cleansing. This is one approach to a sustainable data strategy. The metadata can be used to predefine whether something should be deleted straight away or resubmitted for review after two years. If something has been deleted, there is still a certain amount of time to find and restore the files. After a defined period of time, it moves into an area where only the support team can restore it, and after a further defined period it is finally deleted.
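The staged deletion described here can be sketched as a simple function of the time since deletion. This is an illustrative sketch only: the article does not state how long each stage lasts, so the 30-day and 60-day windows, the stage labels and the function name are assumptions.

```python
from datetime import datetime, timedelta

# Assumed window lengths; the article only speaks of "defined periods".
USER_RESTORE_WINDOW = timedelta(days=30)
SUPPORT_RESTORE_WINDOW = timedelta(days=60)


def deletion_stage(deleted_at: datetime, now: datetime) -> str:
    """Classify a deleted file by who can still restore it."""
    age = now - deleted_at
    if age < USER_RESTORE_WINDOW:
        return "user_restorable"      # still in the shared recycle bin
    if age < USER_RESTORE_WINDOW + SUPPORT_RESTORE_WINDOW:
        return "support_restorable"   # only the support team can restore
    return "purged"                   # finally deleted


deleted_at = datetime(2024, 1, 1)
print(deletion_stage(deleted_at, datetime(2024, 1, 10)))
print(deletion_stage(deleted_at, datetime(2024, 2, 15)))
print(deletion_stage(deleted_at, datetime(2024, 6, 1)))
```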
Currently, ownCloud is working on making data maintenance and processing “smarter” with machine learning. This means that in the future it will be possible to automate much more than today. The system will learn what people touch frequently and what they do not touch at all, report back what has not been used and for how long and, if necessary, make recommendations for archiving.
The next step for ownCloud will be to link this metadata. The aim is to combine the various metadata and other pieces of information that are available and use them to support users in making more intelligent decisions that are not based only on simple policies. Another aspect of data maintenance is version thinning. When documents are created collaboratively, many different versions accumulate, and in most cases not all of them are needed, unless they are legal documents, for example, where the history of creation is relevant. Otherwise, once there is a version two, the different variations of version one can be deleted. Version thinning complements active data maintenance in an organization and also supports low energy needs in the long run.
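One possible reading of version thinning is sketched below. It is not ownCloud's implementation. It assumes that versions are numbered as (major, minor) pairs, that all variations of the newest major version are kept, and that only the final variation of each superseded major version survives. The function name and the `keep_full_history` flag for legal documents are hypothetical.

```python
from typing import List, Tuple

Version = Tuple[int, int]  # (major, minor), an assumed numbering scheme


def thin_versions(versions: List[Version], keep_full_history: bool = False):
    """Split versions into (keep, delete) lists."""
    # Legal documents and similar: the full creation history stays.
    if keep_full_history or not versions:
        return sorted(versions), []

    latest_major = max(major for major, _ in versions)

    # Highest minor number seen for each major version.
    last_minor = {}
    for major, minor in versions:
        last_minor[major] = max(minor, last_minor.get(major, minor))

    keep, delete = [], []
    for major, minor in sorted(versions):
        if major == latest_major or minor == last_minor[major]:
            keep.append((major, minor))
        else:
            delete.append((major, minor))
    return keep, delete


history = [(1, 0), (1, 1), (1, 2), (2, 0), (2, 1)]
keep, delete = thin_versions(history)
print("keep:", keep)
print("delete:", delete)
```

Keeping the last variation of each older major version is a conservative choice; a stricter policy could delete those as well.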