Transcript of SFP#7 Artificial intelligence as Free Software with Vincent Lequertier

This is a transcript created with the Free Software tool Whisper. For more information and feedback reach out to podcast@fsfe.org

WEBVTT

00:00.000 --> 00:18.080
Welcome to the Software Freedom Podcast.

00:18.080 --> 00:22.560
This podcast is presented to you by the Free Software Foundation Europe, a charity

00:22.560 --> 00:25.480
that empowers users to control technology.

00:25.480 --> 00:29.020
I'm Matthias Kirschner, the President of the Free Software Foundation Europe, and I'm

00:29.020 --> 00:31.820
doing this podcast with my colleague, Bonnie Mehring.

00:31.820 --> 00:33.060
Hello!

00:33.060 --> 00:36.980
In this episode, we will talk about artificial intelligence and free software, which

00:36.980 --> 00:41.620
for me is also a lot about the question of how power is distributed between computers or machines

00:41.620 --> 00:43.900
on one side and humans on the other side.

00:43.900 --> 00:46.020
Our guest for today is Vincent Lequertier.

00:46.020 --> 00:51.860
He is an active FSFE contributor in our French team, our system hackers team, and

00:51.860 --> 00:55.140
also regularly gives talks for the FSFE.

00:55.140 --> 01:00.060
In his day job, he is a PhD student at Claude Bernard University, researching

01:00.060 --> 01:03.740
artificial intelligence for healthcare systems.

01:03.740 --> 01:05.060
Hello Vincent.

01:05.060 --> 01:06.060
Hello Bonnie.

01:06.060 --> 01:07.060
Hello, Matthias.

01:07.060 --> 01:08.060
Hello Vincent.

01:08.060 --> 01:13.540
To get started, Vincent: when we talk about artificial intelligence, I automatically

01:13.540 --> 01:18.220
think of HAL from 2001: A Space Odyssey, or Samantha, the artificial intelligence

01:18.220 --> 01:20.100
from the movie Her.

01:20.100 --> 01:24.900
Is this what I should imagine artificial intelligence looks like?

01:24.900 --> 01:31.140
You know, I never got around to seeing 2001: A Space Odyssey, but I did watch

01:31.140 --> 01:32.140
the movie Her.

01:32.140 --> 01:38.100
I found it a bit creepy, but I don't think that we are anywhere close to making voice

01:38.100 --> 01:42.180
assistants with emotions and personality.

01:42.180 --> 01:45.540
But AI is much more than interactive robots.

01:45.540 --> 01:51.220
It encompasses a lot of different techniques aiming at simulating and, in some cases,

01:51.220 --> 01:54.660
surpassing human intelligence.

01:54.660 --> 02:01.660
It also includes chat bots, voice recognition, text translation, bots in video games, and

02:01.660 --> 02:02.660
so on.

02:02.660 --> 02:08.820
A formal definition of artificial intelligence may be any system that can learn how to

02:08.820 --> 02:12.500
perform a task based on observation.

02:12.500 --> 02:19.340
If I want to cite practical examples of AI, I might say things like Mycroft, the voice

02:19.340 --> 02:21.940
assistant that is free software.

02:22.500 --> 02:27.500
OK, so we do have artificial intelligence in our lives.

02:27.500 --> 02:35.980
Yes, whether we realize it or not, AI is here, and it's a powerful technology that

02:35.980 --> 02:43.820
has been in our lives for maybe a decade or two.

02:43.820 --> 02:49.660
Vincent, in your presentations about your work with AI, your main demand in the

02:49.660 --> 02:55.740
past was that artificial intelligence should be accessible, transparent, and fair.

02:55.740 --> 03:00.100
I think it would be very interesting for our listeners to dive more into those criteria

03:00.100 --> 03:02.700
and what you understand by them.

03:02.700 --> 03:06.580
Maybe we could start with the fairness part. Bonnie, you had some questions when we were

03:06.580 --> 03:07.580
preparing for this.

03:07.580 --> 03:09.740
Do you want to go ahead?

03:09.740 --> 03:10.740
Yes, please.

03:10.740 --> 03:15.940
Vincent, I was wondering, what does fairness mean for an AI?

03:15.940 --> 03:21.820
Would it be seen as unfair if an AI does not follow the laws of a society, like the law

03:21.820 --> 03:26.540
not to discriminate against people regardless of race, sex, or gender?

03:26.540 --> 03:34.460
So yes, if I want to define fairness for artificial intelligence, fairness would mean equality

03:34.460 --> 03:41.140
of treatment for everyone, regardless of things that you don't want to include in

03:41.140 --> 03:43.620
your prediction models.

03:43.620 --> 03:48.180
For example, you might want to have a fair artificial intelligence that does not take into

03:48.180 --> 03:57.260
account your gender or your race or your religion or your age or any kind of sensitive attributes.

03:57.260 --> 04:02.420
Do you have an example of how an AI could discriminate against someone?

04:02.420 --> 04:05.580
Yes, so I have a couple of examples.

04:05.580 --> 04:09.540
There was the case of racial bias in healthcare a couple of years ago.

04:10.100 --> 04:15.500
This has been reported in a research article titled "Dissecting Racial Bias in

04:15.500 --> 04:19.460
an Algorithm Used to Manage the Health of Populations".

04:19.460 --> 04:25.500
And in this article, the authors found that a widely used algorithm for assessing the risk

04:25.500 --> 04:32.260
of health issues in people had racial bias.

04:32.260 --> 04:38.180
And this algorithm is used to identify high-risk patients, which get more care resources

04:38.180 --> 04:41.660
and attention from the hospital staff.

04:41.660 --> 04:46.780
But unfortunately, the issue with this algorithm is that to get the same risk score as white

04:46.780 --> 04:51.180
people, black people had to be much sicker.

04:51.180 --> 04:57.260
And this is presumably caused by basing the risk estimation not only on the health of the people,

04:57.260 --> 05:01.060
but also on the estimated health care cost.

05:01.060 --> 05:06.940
So as you can see, AI bias can have important real-world consequences.

05:06.940 --> 05:12.980
And I can give you another example, this time in the US justice system.

05:12.980 --> 05:19.020
There is a proprietary program called COMPAS, used to tell how likely someone is

05:19.020 --> 05:21.620
to reoffend.

05:21.620 --> 05:27.180
An analysis by ProPublica revealed that the algorithm was racist.

05:27.180 --> 05:33.860
It turned out that, compared to white people, black people were at a much higher risk of being

05:33.860 --> 05:41.340
falsely classified as high-risk criminals who are going to commit crimes again.

05:41.340 --> 05:47.900
So in other words, the algorithm said that black people were much more dangerous to society

05:47.900 --> 05:49.940
than white people.

05:49.940 --> 05:56.580
And conversely, white people were often misclassified as low-risk offenders, meaning unlikely

05:56.580 --> 05:59.060
to reoffend.

05:59.060 --> 06:05.340
So the false positive rate was much higher for black people than for white people, and

06:05.340 --> 06:07.900
the reverse was true for the false negative rate.

06:07.900 --> 06:13.620
Again, this shows that unfair algorithms exist in the wild and that they are used in critical

06:13.620 --> 06:15.020
cases.

06:15.020 --> 06:18.500
And on top of that, those two algorithms aren't free software.

06:18.500 --> 06:24.740
OK, before I go on to my next question, could you briefly describe what false positive

06:24.740 --> 06:27.100
and false negative mean?

06:27.660 --> 06:34.220
Yes, so to explain false positives and false negatives, and true positives and true negatives,

06:34.220 --> 06:38.980
I will give you an example based on spam detection.

06:38.980 --> 06:45.540
So spam is email you don't want to see, and to tackle spam

06:45.540 --> 06:51.620
there is software that is used to classify whether an email is spam or a legitimate

06:51.620 --> 06:54.060
email.

06:54.060 --> 07:00.500
So if you get a message and it's a completely legitimate email, but the software classifies

07:00.500 --> 07:06.860
it as spam, it is called a false positive, because the software thought that the email

07:06.860 --> 07:09.540
was spam, but it wasn't.

07:09.540 --> 07:17.940
If the email was in fact spam, but the software thought it was completely legitimate, it is

07:17.940 --> 07:24.900
called a false negative, because the software thought that the email wasn't spam.

07:24.900 --> 07:30.300
And so the true positives and true negatives are correct classifications, meaning that the

07:30.300 --> 07:36.420
software correctly classified the emails as spam or legitimate email.

07:36.420 --> 07:43.580
So this is an example that can be used to explain this concept.
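
As an editorial illustration (not from the episode), the four outcomes can be counted with a short Python sketch. It assumes, as in the explanation above, that "positive" means "classified as spam"; the function name `confusion_counts` and the sample labels are made up for illustration.

```python
# Count the four outcomes of a binary spam classifier.
# "Positive" means "classified as spam"; the data below is made up.


def confusion_counts(actual, predicted):
    """Return (TP, FP, FN, TN) for parallel lists of booleans (True = spam)."""
    tp = fp = fn = tn = 0
    for is_spam, flagged in zip(actual, predicted):
        if flagged and is_spam:
            tp += 1  # spam correctly caught
        elif flagged and not is_spam:
            fp += 1  # legitimate email wrongly flagged as spam
        elif not flagged and is_spam:
            fn += 1  # spam that slipped through as legitimate
        else:
            tn += 1  # legitimate email correctly let through
    return tp, fp, fn, tn


actual = [True, False, True, False, False, True]
predicted = [True, True, False, False, False, True]
print(confusion_counts(actual, predicted))  # (2, 1, 1, 2)
```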

07:43.580 --> 07:46.700
Vincent, I also have a question about what you just said.

07:46.700 --> 07:51.580
So you said that sometimes mistakes happen.

07:51.580 --> 07:57.900
But when we look back at the history of humankind, there were a lot of occasions

07:57.900 --> 08:02.580
when humans deliberately discriminated against certain groups.

08:02.580 --> 08:08.420
And a lot of that was also done on purpose through architecture, through technical means. For

08:08.420 --> 08:13.060
example, in a book by Lawrence Lessig, with whom we also talked about regulation

08:13.060 --> 08:20.500
in one of our earlier podcasts, there's an example of how bridges and train lines were used to

08:20.500 --> 08:27.180
make it harder for certain minorities to go to other parts of a city and get better

08:27.180 --> 08:28.380
jobs.

08:28.380 --> 08:34.940
How can we find out if something is done by mistake or on purpose when you

08:34.940 --> 08:38.500
have an AI involved?

08:38.500 --> 08:45.620
So if you only have the results of the AI, I mean if you only have the predictions,

08:45.620 --> 08:51.420
then you cannot really know the intent, that is, the purpose of the predictions.

08:51.420 --> 08:53.140
What you need is the source code.

08:53.140 --> 09:00.700
So free software will help you to know the purpose behind the predictions, because you know

09:00.700 --> 09:09.900
what the input of the AI was, and you can also know what the design behind the prediction

09:09.900 --> 09:10.900
model was.

09:10.900 --> 09:17.060
So you can see how the data was processed and how the algorithm was used, and that way

09:17.060 --> 09:24.260
you know the purpose of the AI, and you can also know how the model was evaluated.

09:24.260 --> 09:31.020
I mean what metric was used to evaluate the performance of the artificial intelligence.

09:31.020 --> 09:38.100
So you can know, for example, if the true positive rate was the same regardless of whether the person was

09:38.100 --> 09:46.420
male or female, or black or white, or whatever, and if you can do this kind of test, then

09:46.420 --> 09:50.940
you can see whether the AI was fair.

09:50.940 --> 09:58.820
So by seeing the source code, and with transparency, you can work out the purpose.
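
The test described here, comparing error rates across groups, can be sketched in Python as follows. This is an editorial illustration, not the method of any particular tool: `rates_by_group`, the group labels "a" and "b", and the data are hypothetical, and a real audit would need far larger samples.

```python
from collections import defaultdict


def rates_by_group(actual, predicted, groups):
    """Return {group: (true_positive_rate, false_positive_rate)}.

    A rate is None when it is undefined for that group: TPR when the group
    has no actual positives, FPR when it has no actual negatives.
    """
    counts = defaultdict(lambda: [0, 0, 0, 0])  # tp, fn, fp, tn per group
    for y, y_hat, g in zip(actual, predicted, groups):
        c = counts[g]
        if y and y_hat:
            c[0] += 1  # true positive
        elif y:
            c[1] += 1  # false negative
        elif y_hat:
            c[2] += 1  # false positive
        else:
            c[3] += 1  # true negative
    result = {}
    for g, (tp, fn, fp, tn) in counts.items():
        tpr = tp / (tp + fn) if tp + fn else None
        fpr = fp / (fp + tn) if fp + tn else None
        result[g] = (tpr, fpr)
    return result


actual = [1, 1, 0, 0, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(rates_by_group(actual, predicted, groups))
# {'a': (1.0, 0.5), 'b': (0.5, 0.0)}
```

Here group "a" gets a true positive rate of 1.0 and a false positive rate of 0.5, while group "b" gets 0.5 and 0.0; a gap like that is what such a test is meant to surface.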

09:58.820 --> 10:05.780
So this part is now about the demand for transparency you talk about in your presentations, right?

10:05.780 --> 10:13.620
Yeah, I think that there is a connection between how transparent an algorithm is and

10:13.620 --> 10:22.180
how fair it can be, because, much like when we talk about security and free software, I think

10:22.180 --> 10:28.660
that we need transparency for algorithms to ensure that they are fair.

10:28.660 --> 10:34.100
If you cannot see the source code of an AI, and if it's not transparent, then you cannot

10:34.100 --> 10:36.300
ensure that it will be fair.

10:36.300 --> 10:41.780
Much like you cannot really be sure about the security of software if you cannot

10:41.780 --> 10:44.340
see its source code.

10:44.340 --> 10:49.580
Is it the case that if you have the source code of the AI, this would be sufficient

10:49.580 --> 10:55.420
to understand how it's actually working, or do you also need a lot of the training data

10:55.420 --> 11:00.100
or other data the AI learned from?

11:00.100 --> 11:06.140
I think that, to answer your question, to really understand the AI you will need three things,

11:06.140 --> 11:07.140
basically.

11:07.140 --> 11:13.900
You will need the data that was used to train the AI, or, if the data is really sensitive

11:13.900 --> 11:20.060
and you cannot access it, you can have its characteristics.

11:20.060 --> 11:27.740
So what were the variables, what was their distribution, what did they look like?

11:27.740 --> 11:31.860
Then you need to know how the AI was trained.

11:31.860 --> 11:35.860
So what was the source code used to train the AI?

11:35.860 --> 11:40.700
And then you need to be able to evaluate the AI.

11:40.700 --> 11:49.220
You need to have some kind of metric that tells if the AI was accurate and if the accuracy

11:49.220 --> 11:56.820
was the same regardless of some kind of attribute such as your age or gender or any kind

11:56.820 --> 11:59.860
of protected attribute.

11:59.860 --> 12:07.540
From my understanding, one thing that AI is able to do is to adapt very quickly and learn

12:07.540 --> 12:10.060
way, way faster than humans do.

12:10.060 --> 12:15.940
So when we are now talking about source code, is it correct that this means that at one

12:15.940 --> 12:22.660
point, say one minute, it is that source code, and a few minutes later it's completely different,

12:22.660 --> 12:28.580
and the AI might act on different rules? Or how should I imagine that?

12:28.580 --> 12:36.140
So I don't really think that AI learns faster than humans do, because if you, like,

12:36.140 --> 12:43.460
show 10 pictures of cats to a two-year-old, he or she will be able to, you know, recognize

12:43.460 --> 12:50.860
cats or any kind of animal, but for AI you need to put through the algorithm, like, millions

12:50.860 --> 12:55.980
or billions of images for it to grasp any kind of subject.

12:55.980 --> 13:01.180
So I don't think this is generally true that the algorithm is faster.

13:01.180 --> 13:05.940
It just appears to be because we have a lot of computational power so we can use a lot

13:05.940 --> 13:12.300
of computation to train algorithms for days and days and days in data centers.

13:12.300 --> 13:18.620
So for an AI to work, you have to train it with the right data and the right training

13:18.620 --> 13:25.220
code, and evaluate its performance in a good way that measures how fair it is, and after that

13:25.260 --> 13:30.100
you have to monitor its accuracy over time.

13:30.100 --> 13:38.780
You have to check whether the fairness of the algorithm stays the same, and if the AI's fairness drops,

13:38.780 --> 13:46.420
you have to stop using it: you have to detect it and then retrain your AI with

13:46.420 --> 13:53.380
new data or with new source code to make sure that the fairness is good.

13:53.380 --> 13:59.580
So yes, this is how the source code of the AI changes, and it has to be checked.

13:59.580 --> 14:07.220
So that means that the AI itself would also have to be set up in a way that it documents

14:07.220 --> 14:11.540
itself in a way that humans understand.

14:11.540 --> 14:14.340
Do I understand that correctly, Vincent?

14:14.340 --> 14:20.820
Yes, so what you need is to make sure that the AI can give you some kind of metrics of

14:20.820 --> 14:29.540
fairness regularly, like each day you measure the fairness score, so that you have some

14:29.540 --> 14:37.220
kind of measure and you can detect anomalies in the fairness.
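
The daily monitoring described here could look roughly like the following Python sketch. It assumes the fairness score is the gap in true positive rate between two groups (0 meaning equal treatment) and uses an arbitrary threshold of 0.1; both choices are illustrative assumptions, and the function name `flag_fairness_drift` is hypothetical.

```python
# Assumed fairness score: gap in true positive rate between two groups,
# measured once per day. The 0.1 threshold is an arbitrary illustration.


def flag_fairness_drift(daily_gaps, threshold=0.1):
    """Return the indices of days whose absolute TPR gap exceeds the threshold."""
    return [day for day, gap in enumerate(daily_gaps) if abs(gap) > threshold]


daily_gaps = [0.02, 0.03, 0.05, 0.12, 0.15]  # made-up daily measurements
alerts = flag_fairness_drift(daily_gaps)
print(alerts)  # [3, 4]
if alerts:
    # As described above: stop using the model, investigate, and retrain it.
    print("Fairness dropped: stop using the model and retrain it.")
```

A fixed threshold is the simplest choice; what counts as an acceptable gap is exactly the kind of decision that has to be agreed on per problem.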

14:37.220 --> 14:42.340
I do have a basic question here, because you have already mentioned the training data

14:42.340 --> 14:49.060
for an artificial intelligence: who actually trains an artificial intelligence, and how should

14:49.060 --> 14:50.900
I imagine what the data looks like?

14:50.900 --> 14:58.140
For example, if you take Alexa, one of the examples you gave at the beginning for an AI,

14:58.140 --> 15:04.500
who trains Alexa? Would this be Amazon, or is it a person at home?

15:04.500 --> 15:10.100
So an AI is trained with both data and source code.

15:10.100 --> 15:16.900
Basically, when you are using Alexa or any kind of voice recording device, you create

15:16.900 --> 15:20.980
data that is used to train Alexa again.

15:20.980 --> 15:27.660
So you participate in the training of the AI because your data is used, but the training

15:27.660 --> 15:34.180
code is written by Amazon. Because Alexa is proprietary, we can really only guess what

15:34.180 --> 15:41.600
is happening there, but I guess that its data scientists train the AI, often with

15:41.600 --> 15:42.600
research tools.

15:42.600 --> 15:54.120
AI is developed a lot with open source software, and that is done inside companies by data scientists.

15:54.120 --> 15:59.120
I can imagine that when you have lots of data and you have to train such an AI, that also means

15:59.120 --> 16:05.560
that you need a lot of processing power. Is it something that

16:05.560 --> 16:10.640
you can actually run on your computer, or should people imagine it more like you

16:10.640 --> 16:17.240
need huge data centers to train an AI? How should we think about that?

16:17.240 --> 16:21.960
So it depends on what you want and also it depends on the AI itself.

16:21.960 --> 16:28.960
If you want to reproduce the state of the art, I mean the paper that was just published

16:28.960 --> 16:34.840
last month, and reproduce all their results, well, you can't if you don't have giant

16:34.840 --> 16:42.040
data centers with entire teams that monitor computers and stuff, so you need a lot

16:42.040 --> 16:46.560
of money and computing power to do that.

16:46.560 --> 16:52.160
Because you need to train your AI for a long time and with a lot of data, and by a lot

16:52.160 --> 16:57.920
of data I mean gigabytes or terabytes of data.

16:57.920 --> 17:04.680
But thankfully, with your home computer, I mean with your laptop, you can

17:04.680 --> 17:11.160
still get good results if you have more modest goals.

17:11.160 --> 17:18.640
Because of improvements in hardware, with GPUs, I mean graphics processing

17:18.640 --> 17:25.560
units, getting cheaper and cheaper, you can have powerful machines at home and you can

17:25.560 --> 17:28.480
use them to train some AI.

17:28.480 --> 17:31.360
And that's also possible because of free software.

17:31.360 --> 17:37.560
Because free software is available to you, you can use it yourself, and so you

17:37.560 --> 17:43.680
can train AI on your personal computer and it will work. It will also work because you

17:43.680 --> 17:47.840
can leverage already trained models.

17:47.840 --> 17:55.600
What you can do is take existing models and incorporate them into your

17:55.600 --> 17:56.600
work.

17:56.600 --> 18:04.760
You can take a state-of-the-art model and just train some part of it to repurpose it for your

18:04.760 --> 18:05.760
needs.

18:05.760 --> 18:14.000
So I think that it's a very powerful technique, and it enables you to use AI with your

18:14.000 --> 18:19.840
simple, I mean basic, computers and still get amazing results.

18:19.840 --> 18:24.640
So you kind of use a pre-trained AI and continue from there.

18:24.640 --> 18:33.360
Yes, you can use pre-trained AI for a lot of common tasks such as image classification,

18:33.360 --> 18:42.000
or, for example, NLP models, I mean natural language processing models, that have gathered

18:42.000 --> 18:49.040
a lot of knowledge about language, and you can take these giant models and

18:49.040 --> 18:53.720
use them as part of your own model.

18:53.720 --> 18:59.880
For example, there is a very large AI competition that is called ImageNet.

18:59.880 --> 19:09.080
In this competition, you have to classify I think 10,000 different categories of dogs

19:09.080 --> 19:12.920
or animals or objects or things.

19:12.920 --> 19:17.080
So you have 10,000 different things to classify.

19:17.080 --> 19:22.960
And this is a competition done by researchers or scientists.

19:22.960 --> 19:30.060
And so the winner, the model that is the most accurate at doing that, is often released

19:30.060 --> 19:32.560
publicly as free software.

19:32.560 --> 19:38.680
So what you can do, let's say that you want to classify between two different

19:38.680 --> 19:41.960
things, like cats and dogs.

19:41.960 --> 19:45.800
For example, you have images of cats and images of dogs.

19:45.800 --> 19:52.520
What you can do, instead of starting from scratch, is to take these big models and repurpose

19:52.520 --> 19:54.640
them for your needs.

19:54.640 --> 20:03.360
So you can train only part of it and reduce the 10,000 classification labels to only cats

20:03.360 --> 20:04.960
and dogs.

20:04.960 --> 20:11.320
And that will be much faster and more efficient than starting from scratch.
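
The idea of reusing a pre-trained model and training only a small part of it can be shown with a toy Python sketch. It is deliberately simplified: `pretrained_features` is a hypothetical stand-in for the frozen layers of a large pre-trained model, the two-number "images" are made up, and only the new cat-vs-dog output layer (a logistic regression) is trained.

```python
import math

# Toy illustration of transfer learning: the "pre-trained" part stays frozen
# and only a new, small output layer is trained for the new task.
# pretrained_features is a hypothetical stand-in for a real pre-trained model.


def pretrained_features(x):
    """Frozen feature extractor (hypothetical): never updated during training."""
    return [x[0] + x[1], x[0] - x[1]]


def train_head(samples, labels, epochs=200, lr=0.5):
    """Train a logistic-regression output layer on top of the frozen features."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = pretrained_features(x)
            z = w[0] * f[0] + w[1] * f[1] + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "dog"
            err = p - y                     # gradient of the log loss w.r.t. z
            w[0] -= lr * err * f[0]
            w[1] -= lr * err * f[1]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 ("dog") or 0 ("cat") using the frozen features and trained head."""
    f = pretrained_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0


# Made-up two-number "images": label 0 = cat, label 1 = dog.
samples = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
labels = [0, 0, 1, 1]
w, b = train_head(samples, labels)
print([predict(w, b, x) for x in samples])
```

With a real framework the same idea usually means loading pre-trained weights, freezing them (in PyTorch, for example, by setting `requires_grad = False` on those parameters) and replacing the final classification layer with a new one for the two classes.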

20:11.320 --> 20:15.040
So I think we are now already partly in the accessibility part.

20:15.040 --> 20:19.200
So I mean, we talked about the fairness, we talked about transparency.

20:19.200 --> 20:24.880
Now with accessibility, I mean, one part is that the tools are free software.

20:24.880 --> 20:30.000
So you can use them for any purpose, you can understand how they work, you can

20:30.000 --> 20:33.720
share them with others, and you can make modifications.

20:33.720 --> 20:38.400
Is there anything else that is necessary for AI to be accessible?

20:39.040 --> 20:39.840
Yes.

20:39.840 --> 20:48.280
So what you need is powerful hardware, but thankfully, as I said, powerful hardware is getting

20:48.280 --> 20:50.160
cheaper every day.

20:50.160 --> 20:57.480
So you can have accessible hardware that you can use to train your own artificial

20:57.480 --> 20:59.760
intelligence.

20:59.760 --> 21:07.960
But unfortunately, the drivers for these graphics processing unit cards are proprietary.

21:07.960 --> 21:14.040
That means that the software that is used to make your card communicate with your

21:14.040 --> 21:22.880
computer is proprietary, and that prevents AI from being fully accessible, unfortunately.

21:22.880 --> 21:31.440
So it makes AI training with free software much more complicated than it should be.

21:31.440 --> 21:35.520
So maybe we can summarize a bit at this point.

21:35.520 --> 21:41.600
So for fairness, what do we need to have a fair AI?

21:41.600 --> 21:46.640
So you need to be able to measure the fairness of the AI.

21:46.640 --> 21:52.320
You need to evaluate how fair it is with some kind of score.

21:52.320 --> 21:58.840
And then you need to be able to monitor this score to make sure that it stays the same.

21:58.840 --> 22:05.280
And then you need to make sure that this score has been well established because I mean,

22:05.280 --> 22:08.360
there are multiple definitions of fairness.

22:08.360 --> 22:11.400
And so fairness can be measured in different ways.

22:11.400 --> 22:17.720
So you have to agree with all stakeholders to make sure that your fairness definition

22:17.720 --> 22:21.680
is appropriate for the problem at hand.

22:21.680 --> 22:28.880
And then you need to, as I said, monitor the fairness of the software.
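
That several definitions of fairness exist, and can disagree, is easy to show with a small Python sketch. It compares two common definitions on made-up predictions: demographic parity (equal rates of positive predictions across groups) and equal opportunity (equal true positive rates). The data and helper names are hypothetical.

```python
# Two common fairness definitions evaluated on the same made-up predictions.


def positive_rate(predicted):
    """Share of cases predicted positive (used by demographic parity)."""
    return sum(predicted) / len(predicted)


def true_positive_rate(actual, predicted):
    """Share of actual positives predicted positive (used by equal opportunity)."""
    hits = [p for a, p in zip(actual, predicted) if a == 1]
    return sum(hits) / len(hits)


# Group A has more actual positives than group B; predictions are perfect.
actual_a, pred_a = [1, 1, 1, 0], [1, 1, 1, 0]
actual_b, pred_b = [1, 0, 0, 0], [1, 0, 0, 0]

parity_gap = positive_rate(pred_a) - positive_rate(pred_b)
opportunity_gap = (true_positive_rate(actual_a, pred_a)
                   - true_positive_rate(actual_b, pred_b))
print(parity_gap, opportunity_gap)  # 0.5 0.0
```

Here the predictions are perfectly accurate, so equal opportunity is satisfied (gap 0.0), yet demographic parity is violated (gap 0.5) because the two groups have different base rates. Which definition fits depends on the problem at hand.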

22:28.880 --> 22:33.840
Could you also summarize transparency and accessibility for us?

22:33.840 --> 22:34.840
Yes.

22:35.000 --> 22:43.600
Transparency of AI means having access to the data that was used to train the algorithm,

22:43.600 --> 22:49.760
or at least being able to know the characteristics of the input data.

22:49.760 --> 22:54.320
Then you need to have access to the source code of the AI.

22:54.320 --> 23:01.600
And then you need to define a metric that is used to tell if the model is accurate,

23:01.600 --> 23:08.360
and also whether it's accurate for every value of a protected attribute.

23:08.360 --> 23:15.080
And then you need to make sure that everything is released as free software.

23:15.080 --> 23:22.720
And also what is great with regard to transparency is that recently,

23:22.720 --> 23:28.960
with the Free Software Foundation Europe, what we want to do is to have open science.

23:28.960 --> 23:35.320
So open science means making science accessible to all and considering software

23:35.320 --> 23:38.480
as a result of research.

23:38.480 --> 23:44.440
As a citizen, you should be able to have access to the data that was used for the research,

23:44.440 --> 23:46.600
and also to its source code,

23:46.600 --> 23:51.560
and to everything that was used to create an AI.

23:51.560 --> 23:59.800
And so with these two things, you are able to have access to the artificial intelligence

23:59.800 --> 24:02.040
and to make it transparent.

24:02.040 --> 24:11.720
So to summarize the accessibility point, what you need is to be able to train the AI yourself.

24:11.720 --> 24:15.320
We need to have free software to train AI.

24:15.320 --> 24:22.040
So we need to have free frameworks and methods to train artificial intelligence.

24:22.040 --> 24:28.400
We also need to have cheap and reliable hardware to train artificial intelligence.

24:28.400 --> 24:36.000
And you need to have free drivers to be able to control these GPUs.

24:36.000 --> 24:39.560
Is there any AI out there which implements those three criteria?

24:39.560 --> 24:43.800
So do we have any positive examples there?

24:43.840 --> 24:51.880
So yeah, unfortunately, I don't know of any AI that is fair, accessible,

24:51.880 --> 24:54.280
and transparent at the same time.

24:54.280 --> 25:01.480
And I think it's really bad and we can do much better with regard to these three things.

25:01.480 --> 25:04.760
So yeah, no AI is perfect yet.

25:04.760 --> 25:09.640
Do you know of any upcoming legislation in Europe that plans to implement

25:09.640 --> 25:12.640
those three criteria for AI?

25:12.640 --> 25:14.240
No, unfortunately not.

25:14.240 --> 25:21.600
I'm not aware of any kind of legislation that is ongoing, but fortunately there is some progress,

25:21.600 --> 25:26.480
because the European Commission released a white paper in February.

25:26.480 --> 25:32.640
Its title is "On Artificial Intelligence: A European Approach to Excellence and Trust",

25:32.640 --> 25:35.760
and it talks about AI transparency.

25:35.760 --> 25:40.320
And it calls for information about the data used to train models,

25:40.320 --> 25:45.760
and about how their accuracy is measured, to be provided to everyone.

25:45.760 --> 25:51.840
So this is not legislation, but I think it's a step in the right direction.

25:51.840 --> 25:52.880
So there's hope.

25:52.880 --> 25:56.800
Vincent, to wrap it up, what are the biggest challenges you see for free software

25:56.800 --> 26:02.080
in the field of artificial intelligence at the moment?

26:02.080 --> 26:06.240
So I think that artificial intelligence is really powerful.

26:06.240 --> 26:09.120
I mean, we have made a lot of progress.

26:09.120 --> 26:13.760
And in some regards, AI is much better than humans.

26:13.760 --> 26:19.200
Like it can run for hours without any kind of concentration issues.

26:19.200 --> 26:24.160
I mean, it never gets bored and it has a consistent behavior.

26:24.160 --> 26:28.640
And you know, it can remember a lot of information.

26:28.640 --> 26:34.400
So I think that for these points, AI has a lot of advantages

26:34.400 --> 26:37.120
over humans.

26:37.120 --> 26:41.360
But I think that, yeah, AI can be leveraged to improve society.

26:41.360 --> 26:46.000
But I'm afraid of AI for a couple of reasons.

26:46.000 --> 26:50.000
I think that the first one would be aggressive behavior.

26:50.000 --> 26:54.800
So for example, AI systems are employed to filter out, you know,

26:54.800 --> 26:59.440
harmful content or to detect copyright infringement.

26:59.440 --> 27:02.560
And it's done in an automated way,

27:02.560 --> 27:05.280
with limited human oversight.

27:05.280 --> 27:08.960
And more specifically, for example, YouTube uses AI

27:08.960 --> 27:12.400
to detect unauthorized use of copyrighted material.

27:12.400 --> 27:14.320
But sometimes it gets things wrong.

27:14.320 --> 27:18.480
And it doesn't understand things like parodies or memes,

27:18.480 --> 27:21.360
or, more generally, fair use.

27:21.360 --> 27:27.040
I think that being able to test AI and measure its fairness,

27:27.040 --> 27:31.120
and being able to detect when it gets things wrong,

27:31.200 --> 27:34.000
is one big challenge for fairness.

27:34.000 --> 27:37.200
One point I'm also thinking a little bit about is

27:37.200 --> 27:40.960
when people or companies say, well, we don't know

27:40.960 --> 27:44.880
why this was the result of our software.

27:44.880 --> 27:47.680
It's so complex, we cannot understand it anymore.

27:47.680 --> 27:51.600
So we're sorry about that, but it was the AI.

27:51.600 --> 27:55.520
So when people say something like that, do you think that's true?

27:55.520 --> 27:58.640
Or do you think that this is something they

27:58.640 --> 28:00.000
rather use as an apology?

28:02.720 --> 28:06.000
So I think that a decade ago it was true,

28:06.000 --> 28:10.560
because we weren't able to really understand AI.

28:10.560 --> 28:14.720
I mean, AI can sometimes give a lot of good predictions,

28:14.720 --> 28:16.640
but we are not able to interpret them,

28:16.640 --> 28:20.000
because the neural networks and the technologies

28:20.000 --> 28:23.120
used to make predictions are so complex

28:23.120 --> 28:27.600
that we are not able to interpret the results,

28:27.680 --> 28:33.840
in the sense that we aren't able to connect the input to the output.

28:33.840 --> 28:38.400
I mean, we aren't able to know what in the input

28:38.400 --> 28:40.240
led to the prediction.

28:41.040 --> 28:43.360
But I think that we are getting better at this.

28:43.360 --> 28:49.440
And we are researching ways to interpret the results of the AI.

28:49.440 --> 28:55.280
So if companies or people do not want to take responsibility for that,

28:55.280 --> 28:58.880
it's probably rather that maybe they don't know

28:58.880 --> 29:03.600
at the moment why certain decisions are happening like that.

29:03.600 --> 29:06.960
But they also maybe don't want to know at the moment.

29:06.960 --> 29:08.480
Because if they would like to know,

29:08.480 --> 29:10.320
they would have the means to find out

29:10.320 --> 29:12.640
why certain decisions are made by the AI.

29:15.600 --> 29:20.000
Yes, yes, but I think that it boils down to money.

29:20.640 --> 29:24.560
I think that being able to produce a system

29:24.560 --> 29:28.240
that is interpretable costs a lot of money.

29:28.240 --> 29:29.760
And it takes a lot of time.

29:29.760 --> 29:35.120
And so you need to be able to spend money

29:35.120 --> 29:38.240
to create powerful AI that are well designed,

29:39.120 --> 29:42.720
that are transparent, that are fair, accessible,

29:42.720 --> 29:45.360
and that you are able to interpret.

29:46.160 --> 29:51.840
So I think that one issue with this is time and money.

29:52.560 --> 29:55.280
If you now think about what we talked

29:55.280 --> 29:59.200
and maybe also about how AI without free software

29:59.200 --> 30:01.440
could shape and control our future,

30:01.440 --> 30:05.040
are you then afraid of the increasing usage of artificial intelligence

30:05.040 --> 30:06.160
in our society?

30:07.360 --> 30:11.360
I think that with this issue with our full AI

30:11.360 --> 30:16.240
that are proprietary and that don't have any kind of human oversight.

30:16.960 --> 30:21.360
That's the danger, because, as in the examples I gave earlier,

30:21.440 --> 30:26.320
artificial intelligence has a lot of consequences in our world.

30:26.320 --> 30:31.520
And sometimes it's good, but sometimes it leads to mistakes

30:31.520 --> 30:34.480
or things that we don't want to see.

30:35.280 --> 30:39.200
And I think that it's a bit scary, to be honest,

30:39.200 --> 30:43.760
to have these systems that we aren't able to access

30:43.760 --> 30:48.240
and we aren't able to inspect because they are proprietary.

30:49.200 --> 30:52.160
And also I'm a bit scared about AI

30:52.160 --> 30:54.480
because of its impact on the environment.

30:55.200 --> 30:59.200
Because a lot of jobs will be replaced with AI at some point.

30:59.760 --> 31:02.640
And I hope that we will find a way to not put people

31:02.640 --> 31:05.040
whose jobs might become irrelevant

31:05.040 --> 31:06.560
in an embarrassing situation.

31:08.400 --> 31:11.200
And how about an AI that would be free software?

31:11.200 --> 31:12.320
Would you then be afraid?

31:14.240 --> 31:17.680
A bit less, because with free software,

31:18.240 --> 31:21.600
we are able to inspect how the AI works.

31:22.240 --> 31:25.760
And so we are able to tackle a lot of issues

31:25.760 --> 31:28.240
that we can't with a proprietary AI.

31:28.960 --> 31:32.640
And with this, we can evaluate how accurate it is,

31:32.640 --> 31:34.080
how fair it is.

31:34.080 --> 31:36.960
And I think that it should be mandatory

31:36.960 --> 31:41.680
and it's much less scary to have AI that is open and accessible.

31:41.680 --> 31:44.880
So Vincent, unfortunately, we are coming to the end.

31:44.880 --> 31:49.520
So I think this topic is a big challenge for human freedoms.

31:50.000 --> 31:53.920
And I'm not sure yet how exactly AI should look in the future.

31:54.720 --> 31:57.120
I think on the way there, we will learn a lot

31:57.120 --> 32:00.080
and also have some good and some bad experiences.

32:00.080 --> 32:03.120
But in general, the idea you're promoting

32:03.120 --> 32:05.440
of supporting people who build AI

32:05.440 --> 32:08.000
that is accessible, transparent and fair

32:08.000 --> 32:10.400
seems like a good first step for humankind.

32:11.360 --> 32:14.880
Even if that process might then sometimes be slower

32:14.880 --> 32:17.040
than if you don't apply those criteria.

32:17.760 --> 32:21.600
So thank you already very much for talking with us about AI.

32:22.320 --> 32:26.000
In our podcast, we always ask one question at the end.

32:26.720 --> 32:29.760
And I would also like to ask that to you.

32:29.760 --> 32:34.160
So as our regular listeners know, on the 14th of February,

32:34.160 --> 32:36.720
we always celebrate I Love Free Software Day

32:37.440 --> 32:41.760
so that not just the flower industry benefits from this day.

32:41.760 --> 32:45.120
And we use this day to thank free software developers

32:45.120 --> 32:47.440
and communities out there for their effort and work

32:47.440 --> 32:49.840
to make our society a better place to live.

32:50.480 --> 32:52.720
But of course, the 14th of February

32:52.720 --> 32:55.360
shouldn't be the only day where you thank people

32:55.360 --> 32:57.200
for their work for free software.

32:57.200 --> 32:58.400
I wanted to ask you the question,

32:58.400 --> 32:59.920
is there any software out there

32:59.920 --> 33:04.240
or any developer out there whom you would like to thank or to mention?

33:07.680 --> 33:10.320
Yes, so I'd like to mention a few pieces of software.

33:10.960 --> 33:13.680
So I want to thank Keras,

33:13.680 --> 33:16.800
an artificial intelligence framework.

33:16.800 --> 33:19.920
So it's a software that is used to build

33:19.920 --> 33:21.680
artificial intelligence very easily.

33:22.240 --> 33:26.880
And I'm also very grateful to the PyTorch

33:26.880 --> 33:27.760
developers.

33:27.760 --> 33:30.320
I think it's a project from Facebook.

33:30.320 --> 33:34.800
And also to the TensorFlow software developed by Google.

33:34.880 --> 33:36.960
And I'm deeply thankful for this

33:36.960 --> 33:40.160
because I'm basing my PhD project on this software.

33:40.800 --> 33:42.720
And so far, it's been working great.

33:43.520 --> 33:47.920
And I'm also really thankful to the Mozilla community

33:47.920 --> 33:50.720
for developing the Firefox web browser

33:50.720 --> 33:52.960
because it's a web browser that I really like.

33:53.520 --> 33:55.280
Because it's free software.

33:55.280 --> 33:57.520
It respects your privacy.

33:57.520 --> 33:58.720
It's powerful.

33:58.720 --> 33:59.840
It's fast.

33:59.840 --> 34:00.240
So yeah.

34:00.240 --> 34:00.880
Thank you, Vincent.

34:03.440 --> 34:03.920
You're welcome.

34:04.960 --> 34:06.080
Thank you, Vincent.

34:06.080 --> 34:08.640
for talking with us about artificial intelligence

34:08.640 --> 34:09.440
and Free Software.

34:10.320 --> 34:12.480
This was the Software Freedom Podcast.

34:12.480 --> 34:15.840
If you liked this episode, please recommend it to your friends

34:15.840 --> 34:16.560
and rate it.

34:17.200 --> 34:20.320
Also subscribe to make sure you will get the next episode.

34:20.960 --> 34:24.400
This podcast is presented to you by the Free Software Foundation Europe,

34:24.400 --> 34:27.440
a charity that works on promoting software freedom.

34:27.440 --> 34:30.880
If you like our work, please consider supporting us with a donation.

34:30.880 --> 34:33.440
You'll find more information at

34:33.520 --> 34:34.560
fsfe.org/donate.

34:34.560 --> 34:35.600
Thank you very much.

34:35.600 --> 34:36.800
Thank you very much, Vincent.
