How GDPR will change Data Science and challenge companies' creativity and growth.
Now that the storm around GDPR (General Data Protection Regulation) has died down and everybody is back to business, I still spend most of my time discussing how Data Science will be affected in the coming years.
If you are not familiar with GDPR: in a nutshell, the European Parliament and the Council of the European Union implemented a regulation within EU law on data protection and privacy for all individuals within the European Union and the European Economic Area. GDPR primarily aims to give citizens and residents control over their personal data and to simplify the regulatory environment for international business by unifying the regulation within the EU.
How Data Scientists add value
The role of Data Scientists has been growing at a rapid pace for the last couple of years. With the explosion of data, cloud-based technologies and the application of machine learning and artificial intelligence, the ways that Data Scientists add value range from empowering management and officers to make better decisions, to challenging staff to use experiments to test assumptions and create a data-driven approach, all the way to identifying and refining target audiences for marketing and business purposes.
How are GDPR and Data Science related?
As a fast-growing number of companies build Data Science teams, the introduction of GDPR can have a large impact on the way Data Science can and will be used. In most cases, vast amounts of data are stored and processed to provide insights into the underlying data. But as Data Scientists build models that automate decision making, companies will be required to think about the processes they are creating and whether these violate GDPR. Several simple concepts can have a big impact on what Data Scientists can do, for example pseudonymisation.
Pseudonymisation is the process of transforming personal data in such a way that the resulting data cannot be attributed to a specific data subject without the use of additional information. For many businesses, collecting names, addresses, emails and other information was the foundation for building models that created direct value for the business: from creating cohorts to determine which group of users living in a specific area would benefit from an upcoming pizza promotion, to blocking fraudulent card transactions that originate from a specific country in combination with a non-traditional name or other personal attributes.
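To make the definition above a bit more concrete, here is a minimal sketch of one common way to pseudonymise a field: replacing it with a keyed hash (HMAC-SHA256). This is my own illustration, not a method prescribed by GDPR. The function names, the record layout and the choice of fields are all hypothetical, and the secret key plays the role of the "additional information" that has to be stored separately from the data. Keep in mind that pseudonymised data is generally still treated as personal data, so this reduces risk rather than removing the obligations.

```python
import hashlib
import hmac


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Replace a personal value with a keyed hash (HMAC-SHA256).

    The same value and key always give the same token, so records can
    still be grouped or joined, but the token cannot be linked back to
    the person without access to the key.
    """
    # Assumption: trimming and lower-casing is an acceptable
    # normalisation for the fields being pseudonymised (e.g. emails).
    normalised = value.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()


def pseudonymise_record(record: dict, identifying_fields: set, secret_key: bytes) -> dict:
    """Return a copy of the record with the identifying fields tokenised."""
    return {
        field: pseudonymise(str(value), secret_key)
        if field in identifying_fields
        else value
        for field, value in record.items()
    }


# Hypothetical example record; "email" is treated as identifying,
# "city" is left untouched so it can still be used for cohorts.
key = b"store-this-key-separately-from-the-data"
customer = {"email": "Jane.Doe@example.com", "city": "Amsterdam"}
safe_customer = pseudonymise_record(customer, {"email"}, key)
```

Because the token is deterministic for a given key, the pizza-promotion cohort from the example above could still be built on the city field while the model never sees the email address itself.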
Besides worrying about the lawful basis for processing the data, demonstrating compliance with GDPR and taking the necessary steps to protect the data by design and by default, Data Scientists will have to start building very strong cases for why they want to access and use specific user data. At the same time they need to keep in mind that they might be required to provide access to the data or erase it, which can have a big impact on the results of their algorithms.
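The effect of an erasure request on results can be shown with a toy example. The sketch below is purely illustrative: the record layout, the field names (user_token, order_value) and both helper functions are assumptions of mine, and a real system would also have to deal with backups, derived datasets and already trained models.

```python
from statistics import mean


def erase_subject(records: list, subject_token: str) -> list:
    """Return the records with every row of the given data subject removed."""
    return [r for r in records if r["user_token"] != subject_token]


def average_order_value(records: list) -> float:
    """A stand-in for any metric or model that is computed from the data."""
    if not records:
        return 0.0
    return mean(r["order_value"] for r in records)


# Hypothetical orders from three users; u3 is a heavy spender.
orders = [
    {"user_token": "u1", "order_value": 10.0},
    {"user_token": "u2", "order_value": 20.0},
    {"user_token": "u3", "order_value": 90.0},
]

before = average_order_value(orders)                      # 40.0
after = average_order_value(erase_subject(orders, "u3"))  # 15.0
```

A single erased user moves the average from 40.0 to 15.0 here, which is why it helps to think early about how a metric or model should be recomputed when data disappears.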
How Data Scientists will have to work with Data
Unlike many, I actually embrace GDPR. As a user of many digital services, I like the fact that I have a little bit more control over what happens with my data: if I decide to leave a service, I am able to delete (or request deletion of) my data.
As a Data Scientist, I also embrace the challenge. First of all, the way I think about what I want to build will be challenged from a legal perspective, which is actually a good thing. Too often Data Scientists want to “play” around with data or import data from various sources and launch a feature or product, only to end up with their CEO defending their actions in front of a group of lawmakers.
Another part I like is that GDPR will challenge more Data Scientists to interact with Development as well as DevOps. Just thinking about the architecture, dealing with where servers are located, and working out how pseudonymisation or full anonymisation needs to be automated will require Data Scientists to learn new concepts and maybe even find new ways of doing things.
What can you or your organisation do to still benefit from Data Science?
So you might be wondering: what can I do to make sure that my Data Scientists comply with GDPR but still have the freedom to develop the data-driven features that are necessary to differentiate my company from others?
Depending on the type of business you are in, appointing a Data Protection Officer is the best way to get familiar with the limitations your company will be subject to.
The next step is getting the Data Scientists, the CTO (or head of IT) and the Data Protection Officer in the same room to discuss what projects the Data Scientists are currently working on and are planning to work on in the future. By figuring out what will be required, you can start looking at the processes and the data that will be involved.
Most companies will discover that if they hired capable people, they already have most things figured out and only need to make slight alterations. As for Data Scientists, I would encourage them to embrace learning about the legal aspects so that these become part of their own thinking and decision making, and to sit a little bit closer to the Tech team and DevOps to figure out what they can do to still build the amazing products and features they have in mind.