20 Years FSFE: Interview with Vincent Lequertier on AI
In our sixth birthday publication we are interviewing Vincent Lequertier about crucial aspects of artificial intelligence, such as its transparency, its connection to Open Science, and questions of copyright. Vincent also recommends further readings and responds to 20 Years FSFE.
A PhD candidate at Claude Bernard University in Lyon who researches artificial intelligence for healthcare, Vincent supports software freedom and volunteers for the FSFE in his free time. He has been part of the System Hackers, the team responsible for the FSFE's technical infrastructure, for many years, and his contributions helped lay the foundation for the good state the team is in today. Vincent is also a member of the FSFE's General Assembly and participates in the 'Public Money? Public Code!' campaign. In our interview, Vincent shares his thoughts on the current state of AI and its future implications.
Interview with Vincent Lequertier
FSFE: You are deeply involved in the field of artificial intelligence. How would you explain to a 10-year-old what AI is?
Vincent Lequertier: A few years ago I was a speaker at a local radio station, and sometimes I was responsible for mixing the audio. At the station, there were several inputs: the mics of the radio speakers (mine included), the music, the jingles, and so on. And then there was the output broadcast to the radio listeners. Between the inputs and the output there was the mixing table, with its countless knobs and sliders. I needed to adjust the knobs and sliders so that the inputs were well mixed together, producing an output that sounded nice to the listeners. At the time of writing, an AI works just like that. It automatically adjusts the numerous parameters of a digital, virtual mixing table. Once put through it, the inputs produce a satisfying output according to a predefined definition of success (in this analogy, that the sound is pleasant).
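To make the mixing-table analogy concrete, here is a minimal sketch, assuming a toy model with only two "knobs" (a weight `w` and a bias `b`) and a hypothetical target output of `y = 2*x + 1`. The "definition of success" is the mean squared error, and gradient descent turns each knob a little at every step in the direction that reduces that error. Real AI models work on the same principle but with millions or billions of knobs.

```python
def train(xs, ys, lr=0.05, steps=2000):
    """Adjust two parameters ("knobs") by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # start with every knob at zero
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to each knob.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Nudge each knob slightly in the direction that lowers the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical "inputs" and the "nice-sounding output" we want for each.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b = train(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

Nobody sets `w` and `b` by hand: the training loop finds them from examples of inputs and desired outputs, which is what "automatically adjusts the parameters" means here.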
You are advocating for accessible and transparent AI. According to your research, what would you say are the necessary requirements to make sure that programs using artificial intelligence are accessible and transparent?
Reusing AIs makes sense because they are costly to develop and train, both in terms of human and computer resources. Additionally, training AI models demands a lot of data, which are particularly hard to obtain and work with. Being able to reuse an AI is therefore important, as it saves time and potentially scarce resources. Moreover, making an AI available to others fosters innovation by facilitating collaboration. I think a fundamental requirement for AI accessibility is Free Software, because AIs licensed as Free Software (also known as Open Source) are inherently accessible. Other requirements can be Open Standards and Open Data. AI models should therefore be published and freely accessible.
Transparency in AI is the ability to understand and interpret its output. Given the complexity of today's AI systems, transparency can be hard to achieve, but it is an important characteristic because it fosters trust. Being able to understand why a given output was produced, and which parts of the input contributed the most to it, increases confidence in the model and makes it easier to debug. Moreover, understanding the role played by each input can support data-driven policy making. For example, in healthcare, understanding the most important factors impacting the quality of patients' care for a disease can validate or change healthcare practices. Free Software is a key part of transparency because it allows everyone to use the AI and analyse its predictions to better understand them.
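One common way to ask "which input contributed the most?" is permutation importance: shuffle one input column, and measure how much worse the model's predictions become. The sketch below is illustrative only and is not presented as the method used in Vincent's research; the model, which relies on feature 0 and ignores feature 1, and the toy data are hypothetical.

```python
import random


def model(row):
    # Hypothetical trained model: depends on feature 0, ignores feature 1.
    return 3.0 * row[0] + 0.0 * row[1]


def mse(rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(rows, targets, n_repeats=5, seed=0):
    """Average increase in error when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(rows, targets)
    importances = []
    for j in range(len(rows[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild rows with column j shuffled, breaking its link to the target.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            total += mse(shuffled, targets) - baseline
        importances.append(total / n_repeats)
    return importances


rows = [[float(i), float(10 - i)] for i in range(8)]
targets = [model(r) for r in rows]
print(permutation_importance(rows, targets))
```

Shuffling feature 1 leaves the error unchanged, so its importance is zero, while shuffling feature 0 degrades the predictions. Because this only needs the ability to run the model, anyone with access to a Free Software AI can perform this kind of analysis.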
How can we make sure that inequalities in our current societies do not pass on to AI data training? How can we assure that AI results are fair?
As AI is really good at magnifying existing inequalities found in the data used for its training, fairness issues will creep into AI. Detecting those issues in the dataset and in the AI's output is therefore critical. However, simply removing data that might be a source of unfairness (e.g. a training dataset variable that is not representative of the data used once the model is put in production) may not always work, because these data might be correlated with other attributes in the dataset, which would need to be removed as well. Completely removing any potential inequality may therefore remove a lot of data from the dataset, potentially limiting the ability of the AI to properly address the problem it has been designed to solve. In short, inequalities come from badly constructed datasets, and advanced methods are required to circumvent them.
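The problem of correlated attributes can be illustrated with a minimal sketch: even after a sensitive attribute is dropped, another column may still encode it. The column names (`group`, `region`, `age_band`), the values, and the 0.5 threshold below are all hypothetical, and Pearson correlation is just one simple way to spot such proxies; it only catches linear relationships.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def find_proxies(columns, sensitive, threshold=0.5):
    """Return columns whose |correlation| with the sensitive one is >= threshold."""
    target = columns[sensitive]
    proxies = {}
    for name, values in columns.items():
        if name == sensitive:
            continue
        r = pearson(values, target)
        if abs(r) >= threshold:
            proxies[name] = round(r, 3)
    return proxies


# Hypothetical dataset: "region" closely tracks "group", "age_band" does not.
columns = {
    "group":    [0, 0, 0, 0, 1, 1, 1, 1],
    "region":   [0, 0, 0, 1, 1, 1, 1, 1],
    "age_band": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(find_proxies(columns, "group"))
```

Dropping `group` alone would leave `region` in the dataset, still carrying most of the same information; removing it too shrinks the data the AI can learn from, which is the trade-off described above.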
To detect fairness issues, a definition of fairness must be decided upon. For example, fairness may be defined as whether pairs of similar individuals get similar predictions (individual fairness), or as whether predictions are similar across a majority and a minority group defined by some characteristics (group fairness). This fairness measure may be computed after the AI has been trained, to identify potential unfairness, or during training, so that the AI can take fairness into account when it adjusts its parameters.
Free Software is also important here, as it allows everyone to check for fairness issues, whether by inspecting the source code or by running the AI directly and analysing its predictions.
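The two definitions above can be sketched as simple checks on a model's predictions. This is a minimal illustration under stated assumptions: group fairness is measured here as the demographic parity difference (the gap in positive-prediction rates between groups), and individual fairness as a list of pairs whose features are within a hypothetical distance `eps` but whose scores differ by more than a hypothetical tolerance `delta`. Both thresholds and all data are made up for the example.

```python
import math


def demographic_parity_difference(preds, groups):
    """Group fairness: gap between the highest and lowest positive-prediction rate."""
    rates = {}
    for g in set(groups):
        members = [p for p, gg in zip(preds, groups) if gg == g]
        rates[g] = sum(members) / len(members)
    return max(rates.values()) - min(rates.values())


def individual_fairness_violations(features, scores, eps=1.0, delta=0.1):
    """Individual fairness: similar pairs (distance <= eps) with scores differing by > delta."""
    violations = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            close = math.dist(features[i], features[j]) <= eps
            if close and abs(scores[i] - scores[j]) > delta:
                violations.append((i, j))
    return violations


# Hypothetical binary predictions for members of groups "a" and "b".
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_difference(preds, groups))

# Hypothetical individuals (feature vectors) and their predicted scores.
features = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
scores = [0.9, 0.4, 0.1]
print(individual_fairness_violations(features, scores))
```

Either measure can be computed on a trained model's outputs, as here, or added as a penalty term to the training objective so that the parameters are adjusted with fairness in mind.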
Your research focuses on healthcare, a field that has universally raised the question of supporting Open Science. To what extent are health metrics and biometrics open? Is artificial intelligence for healthcare a big and globally collaborative aim or independent and competitive?
Well, it depends! Because of security and privacy concerns, access to individualized healthcare metrics is often restricted, and each study using them must be approved by an ethics committee. However, aggregated statistics may be widely available. For example, the website data.gouv.fr has a section dedicated to healthcare. Also, the data related to COVID-19 are public, and the most popular website to visualize these data as well as other tools are Free Software.
However, it should be noted that data without enough granularity can reduce the AI's performance in healthcare because, just like humans, an AI application needs detailed information, especially if its goal is to make predictions at the individual level. Because healthcare outcomes are so dependent on context, prediction abilities depend on specific healthcare situations. More open data and more Free Software (i.e. Open Science) make it easier to collaborate. A shared dataset released under a Free Software licence creates a "playground" where AI models can be easily compared and where we can create benchmarking tasks, such as hospital length of stay prediction. Without a proper benchmarking task, finding methodological improvements is harder. An example of an open dataset for healthcare is MIMIC. Also, many papers about AI research are freely available on arxiv.org. I think the openness and collaborative aspects of research on AI will improve, partly because scientific journals encourage researchers to share all their research materials, including source code, and partly because funding institutions can ask them to do so as well. For example, the Horizon 2020 program of the European Union values Open Science.
I also think that the line of reasoning around our "Public Money? Public Code!" campaign applies for AI research.
A common expectation for the future of AI is that it can have abrupt economical and societal impact by making many job positions redundant. Do you see this as a possibility for the upcoming years? If so, is there any practice that could alleviate these consequences? Would Free Software be one?
I think AI has come a long way in the last ten years. It is more and more able to organize and structure information. The fields which have made the most impressive progress are natural language processing (i.e. tasks involving text, such as sentiment analysis) and computer vision (i.e. tasks involving images, such as image classification). In natural language processing, deep learning models can semantically understand words and documents as well as the relationships between them. So I think the jobs where AI will be able to assist us (I consider things only from a technical point of view here) are the jobs dealing with a lot of structured information that needs to be understood, processed, and memorized, as AI is becoming better at this than us. For example, AI-based software has shown good results in assisting in radiology, legal document analysis, and programming (see next question). So it's possible that AI makes people more efficient, which would reduce the amount of human work required. However, the remaining work would require skills where AI does not perform well at the time of writing, such as creativity or empathetic and thoughtful communication.
If AI is bound to get better, and will at some point have the capacity to completely automate some work, transparency and fairness can only become more and more important. Although not sufficient on its own, Free Software is a big part of what helps put strong safeguards in place.
However, I don't think it's up to the scientific community to design policies around employment. Putting together a proof of concept or finding a novel theory that could automate some work is not a reason for implementing it in everyday life. In recent years, the EU has already had to deal with AI applications that are technically impressive but raise ethical concerns. For example, the Clearview AI facial recognition platform has been judged illegal in some EU countries, and citizens have the right to opt out of this technology. The next few years will be important with regard to ethical concerns around AI, and the upcoming EU Artificial Intelligence Act might play a big role in them.
And finally, although I'm not a historian, I think that over the last centuries we have made tremendous technological progress and society has always evolved along with it. Looking at the challenges raised by past technological advances would help us understand whether things will be different this time around, and how to deal with them as best we can.
What legal issues do you think will be raised regarding AI in the next ten years? Would they be issues of ownership or responsibility? For example, we are already seeing ethical and technical aspects of AI ownership in GitHub's Copilot. We are interested to know what the upcoming crucial questions are, according to you.
Issues around ownership and responsibility will be very important, and Copilot is a prime example of that. The fundamental question is whether AI creations can be considered novel ideas and, if they are, whether they are copyrightable on their own. Specifically regarding Copilot, the fact that a code completion tool may yield straight copies of licensed work can be problematic because, at the time of writing, the AI does not know the licence under which the source of the autocompleted code is released, nor how that licence should be respected. For example, to the best of my knowledge, it is not clear whether code autocompleted by Copilot that was originally released under the GNU General Public License makes the rest of the project a derivative work. Being able to freely use source code often comes with obligations that need to be fulfilled, regardless of whether it is accessed by an AI or a human being. Our REUSE project, which aims to make it easier to programmatically understand how a project and its diverse components are licensed, may help build licensing-aware programming tools. The same legal troubles apply to other models able to generate content, in domains such as painting or music production.
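As an illustration of how machine-readable licensing information could feed a licensing-aware tool, here is a minimal sketch that reads the `SPDX-License-Identifier:` tags that REUSE-compliant files carry in their headers. It is not part of the REUSE tool itself: the `COPYLEFT` set and the `needs_review` policy are hypothetical choices made for this example, and a real tool would need a proper parser for SPDX license expressions rather than splitting on whitespace.

```python
import re

# Matches SPDX header lines such as "# SPDX-License-Identifier: GPL-3.0-or-later"
# or "/* SPDX-License-Identifier: MIT */" (an optional trailing "*/" is dropped).
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/)?\s*$", re.MULTILINE)

# Hypothetical policy: identifiers this sketch treats as copyleft.
COPYLEFT = {
    "GPL-2.0-only", "GPL-2.0-or-later",
    "GPL-3.0-only", "GPL-3.0-or-later",
    "AGPL-3.0-only", "AGPL-3.0-or-later",
}


def licenses_in(source: str) -> list[str]:
    """Return every SPDX license expression declared in the source text."""
    return [m.group(1) for m in SPDX_RE.finditer(source)]


def needs_review(source: str) -> bool:
    """Flag a snippet whose declared licence expression mentions a copyleft licence."""
    return any(
        token in COPYLEFT
        for expr in licenses_in(source)
        for token in expr.split()
    )


# Hypothetical snippets a code completion tool might want to reuse.
python_snippet = (
    "# SPDX-FileCopyrightText: 2021 Jane Doe <jane@example.org>\n"
    "# SPDX-License-Identifier: GPL-3.0-or-later\n"
    "def add(a, b):\n"
    "    return a + b\n"
)
c_snippet = "/* SPDX-License-Identifier: MIT */\nint x;\n"

print(licenses_in(python_snippet), needs_review(python_snippet))
print(licenses_in(c_snippet), needs_review(c_snippet))
```

The point of the sketch is that once licensing information is stated in a consistent, machine-readable way, a tool can at least notice that a suggestion carries obligations, instead of silently copying it.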
Another legal issue concerns patents, where the question of whether an AI can be named as an inventor is still undecided. In the UK and the EU, patent applications naming an AI as the inventor were rejected on the grounds that an AI has no legal personality and cannot hold legal rights over its output. But a couple of months ago, the first patent listing an AI as the inventor was approved.
Is there any book about artificial intelligence you would like to recommend to our readers?
I can't recommend "Genesis" by Bernard Beckett enough. It is a short novel depicting a philosophical debate around what it means to be human and whether machines can have consciousness. The classic "I, Robot" by Isaac Asimov also raises many questions that make a lot of sense today (it was published in 1950!). If we are building autonomous robots with some freedom of action, what safeguards must we put in place? The book is really about how to ensure AI works as intended.
You have been a part of the FSFE for several years. What is an important thing that you learnt from this experience?
I learnt that Free Software can be viewed from a lot of different angles and is not only a technical topic. This translates into the diversity and breadth of our community. This diversity is a huge strength.
And what is a story that still makes you smile when you remember it?
My first FOSDEM in 2019. I met some awesome people from our community. That was really heartwarming.
As a last question, what do you wish for the FSFE for the next 20 years?
I hope the FSFE will be able to tackle the challenges ahead. The coming years will be full of innovations that will make technology even more ubiquitous in our lives. I hope we will be able to keep spreading the word about Free Software and the values behind it.
FSFE: Thank you very much!
About "20 Years FSFE"
In 2021 the Free Software Foundation Europe turns 20. This means two decades of empowering users to control technology.
Turning 20 is a time when we like to take a breath and look back on the road we have travelled, to reflect on the milestones we have passed, the successes we have achieved, the stories we have written, and the moments that brought us together and that we will always joyfully remember. In 2021 we want to give momentum to the FSFE and even more to our pan-European community, the community that has formed and always will form the shoulders that our movement relies on.
20 Years FSFE is meant to be a celebration of everyone who has accompanied us in the past or still does. Thank you for your place in the structure of the FSFE today and for setting the foundation for the next decades of software freedom to come.