#1411 F21 privacy issue, Geolocation done for every install
Closed None Opened 5 years ago by hadess.

Anaconda, apparently for the past 2 years, has been phoning home to get a location without any warnings to the user.

https://github.com/rhinstaller/anaconda/blob/master/pyanaconda/ui/gui/spokes/welcome.py#L76

Notice how in both Android and iOS, you either get asked about Geolocation before it's used (iOS), or it's not used at all as part of the installation/first use.

http://www.lukew.com/ff/entry.asp?1937

Furthermore, Anaconda, when run on a live CD, doesn't make use of the system-wide Geolocation API (Geoclue), which would have allowed the user to authorise the request for a location.

Anaconda should:
- disable Geolocation in future releases
- advertise the privacy leak, and how to disable it
- re-enable Geolocation only when it: 1) asks users about it, 2) can link to a privacy policy when doing so, and 3) uses an existing system-wide service to request it

Note that I am aware that NetworkManager does connectivity checking by default, but its purpose is different, and the data gathered and computed is different. Nevertheless, both should be covered by Fedora's privacy policy, and it's not clear that they are right now.


So, IMHO:

This isn't really much of a privacy leak, but I can see how it might appear so or trouble some folks.

The geoloc service hits fedoraproject.org servers and returns data from the free GeoIP database via a WSGI application. The only record of the access is an Apache log entry with an IP address and the fact that it accessed the WSGI application.

I agree we should have a privacy policy to cover this.

On using one framework, have you approached the anaconda folks and offered to work with them on this?

Fedora users are rare enough that a DNS request for a fedoraproject.org domain could be enough to out the person.

Replying to [comment:3 rishi]:

Fedora users are rare enough that a DNS request for a fedoraproject.org domain could be enough to out the person.

But without massively retooling, we need to do that to deliver updates -- this feature having nothing to do with that.

Replying to [comment:4 mattdm]:

Replying to [comment:3 rishi]:

Fedora users are rare enough that a DNS request for a fedoraproject.org domain could be enough to out the person.

But without massively retooling, we need to do that to deliver updates -- this feature having nothing to do with that.

It is a user choice whether to update their system or not. Even though we strongly encourage it and do it by default, users can opt out of it. And it can be essential to avoid any 'spontaneous' network access.

I spoke to the Anaconda guys at DevConf. To disable geolocation one needs to pass inst.geoloc=0 as a boot option:
https://github.com/rhinstaller/anaconda/blob/master/docs/boot-options.txt
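
For illustration, here is a minimal Python sketch of how an installer could read this option from the kernel command line. The helper names `geoloc_setting` and `geoloc_enabled` are hypothetical and this is not Anaconda's actual code; only the `inst.geoloc=0` value is taken from the boot options documentation.

```python
def geoloc_setting(cmdline):
    """Return the value of inst.geoloc from a kernel command line, or None if absent.

    Hypothetical helper for illustration; not Anaconda's implementation.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "inst.geoloc" and sep:
            value = val  # if the option is repeated, the last occurrence wins
    return value


def geoloc_enabled(cmdline):
    """Treat geolocation as enabled unless inst.geoloc=0 was passed."""
    return geoloc_setting(cmdline) != "0"
```

On a running Linux system the command line can be read from `/proc/cmdline` and passed to `geoloc_enabled`.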

This ticket will be discussed in the FESCo meeting on Wednesday at 18:00 UTC in #fedora-meeting on irc.freenode.net.

Bastien, why is this a significant privacy issue? It's not like anaconda is some untrusted application. We certainly don't want to display a privacy policy just so it can come up with a better timezone suggestion?

The comparison with iOS and Android is misleading; their vendors perform IP-based geo-location without prompting all the time. Prompts only occur if something wants to use the really precise location information, something that Fedora does not do.

From today’s FESCo meeting: Cc: spot for fedora-legal to see whether there is any issue or whether we need to update privacy policy (+5)

Replying to [comment:11 fweimer]:

The comparison with iOS and Android is misleading; their vendors perform IP-based geo-location without prompting all the time. Prompts only occur if something wants to use the really precise location information, something that Fedora does not do.

Given the flows shown, I don't see how it could do IP-based geolocation when it hasn't set up networking first. In any case, notice how they ask for language/region first, which would preclude them doing geolocation as they don't even have Internet up yet.

Replying to [comment:10 catanzaro]:

Bastien, why is this a significant privacy issue? It's not like anaconda is some untrusted application.

Data about the user is transmitted to Fedora's servers. There is no explanation of how the data will be used, kept, etc.

We certainly don't want to display a privacy policy just so it can come up with a better timezone suggestion?

Why on earth do you need the timezone this early in any case? A privacy policy when users' data is transmitted is not optional; there needs to be one.

Replying to [comment:15 hadess]:

Replying to [comment:10 catanzaro]:

Bastien, why is this a significant privacy issue? It's not like anaconda is some untrusted application.

Data about the user is transmitted to Fedora's servers. There is no explanation of how the data will be used, kept, etc.

All that happens is https://github.com/rhinstaller/anaconda/blob/master/pyanaconda/geoloc.py#L526 . There is no “data about the user” being transmitted within the request (but the server does know the sending IP address). In my non-lawyer evaluation I don’t think an IP address really is “personally identifiable information” (in the EU sense), but I am really not following the legal development in this area, let alone the legal development all over the world’s jurisdictions.

Why on earth do you need the timezone this early in any case? A privacy policy when users' data is transmitted is not optional; there needs to be one.

We need the timezone to allow the user to set the clock correctly, and we need the clock set correctly so that files written during installation do not have a newer timestamp than files written after the installation finishes.

Okay. Here's what I see:

Fedora runs a GeoIP service at https://geoip.fedoraproject.org. This service takes in an IP address from Apache and returns the nearest city. It's not the most accurate thing ever, but it is good enough. The only record of a query is the Apache logs showing that IP XYZ queried the service. This data is always sent over HTTPS.
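
To make the server side concrete, here is a rough Python sketch of a WSGI application of this kind. It is not the code Fedora runs: the lookup table `FAKE_GEOIP_DB` stands in for a real GeoIP database, and the response keys (`city`, `country_code`, `time_zone`) are assumptions for illustration. The point it shows is that the only input is the client address the web server already sees.

```python
import json

# Stand-in for a real GeoIP database lookup (assumed data, illustration only).
FAKE_GEOIP_DB = {
    "192.0.2.1": {"city": "Example City", "country_code": "XX", "time_zone": "Etc/UTC"},
}


def application(environ, start_response):
    """WSGI app: look up the client's address and return the record as JSON."""
    # REMOTE_ADDR is supplied by the web server; the request carries no other user data.
    client_ip = environ.get("REMOTE_ADDR", "")
    record = FAKE_GEOIP_DB.get(client_ip, {})
    body = json.dumps(record).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Nothing in this sketch writes the lookup result anywhere; any record of the query would come from the web server's own access log.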

Anaconda needs a country and timezone to be set as part of installation. If, during installation, the network is brought up and an IP assigned, it will send that IP address to the Fedora GeoIP service and parse the JSON response to set the timezone and country code. This data is used to configure the local system clock and set localization functionality appropriately. The user can still override that data if it is not correct; the lookup merely sets the default choices intelligently based on IP. If there is no networking, this functionality is skipped. If the user overrides it by passing "inst.geoloc=0", then the functionality is skipped. The geographical data collected is not sent back to Fedora systems; it is only set on the local system being installed.
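
The client-side flow described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not Anaconda's code: the function names, the `fetch` callable (which in practice would be an HTTPS request to the GeoIP service), and the JSON keys `country_code` and `time_zone` are all hypothetical.

```python
import json


def apply_geoip_defaults(response_text, defaults):
    """Return a copy of the installer defaults updated from a GeoIP JSON response.

    The keys "country_code" and "time_zone" are assumed for illustration.
    """
    updated = dict(defaults)
    try:
        data = json.loads(response_text)
    except ValueError:
        return updated  # unparsable response: keep the existing defaults
    if not isinstance(data, dict):
        return updated
    if data.get("time_zone"):
        updated["timezone"] = data["time_zone"]
    if data.get("country_code"):
        updated["territory"] = data["country_code"]
    return updated


def choose_defaults(network_up, geoloc_enabled, fetch, defaults):
    """Skip the lookup entirely when there is no network or the user opted out."""
    if not (network_up and geoloc_enabled):
        return dict(defaults)
    return apply_geoip_defaults(fetch(), defaults)
```

The result only changes the preselected values; the user can still pick a different timezone or territory afterwards, and nothing is sent back to the server.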

Based on the fact that Fedora is not keeping any record of this geographical data (nor any mapping of installed system, geodata, and IP address), I do not believe any changes to our privacy policy are merited for this issue.

If, for example, we were using this data to set fields in the Fedora Account system, or were storing this IP/geodata to track Fedora installations, that would definitely require privacy policy updates and a requirement that the user opt-in to providing us with that data.

Replying to [comment:17 spot]:

Based on the fact that Fedora is not keeping any record of this geographical data (nor any mapping of installed system, geodata, and IP address), I do not believe any changes to our privacy policy are merited for this issue.

Where's the privacy policy? I don't see any that's relevant to what the installer does.

A privacy policy should include:
- what data is sent out (in this case, the IP address is sent and kept in the logs at least)
- how it's used
- how long it's kept for

If, for example, we were using this data to set fields in the Fedora Account system, or were storing this IP/geodata to track Fedora installations, that would definitely require privacy policy updates and a requirement that the user opt-in to providing us with that data.

You don't need to be storing the results for it to be a privacy policy problem. From the moment that you transform the data provided by the user's system, you're a data provider, and you're storing them (even if they seem innocuous to you). Furthermore, the code in question in Anaconda makes me think that 1) Fedora might not always be the only provider (there are unused HostIP and Google providers), and 2) Wi-Fi access point data might be passed to that server when it supports it (not even ignoring the "_nomap" access points...)

I'll add that we're missing privacy policies, at the very least, for our connectivity checker, for that geolocation code in Anaconda, for the retrace servers...

From today's FESCo meeting: Make sure that the kernel boot option is listed on a wiki page somewhere and otherwise do nothing. (+1:9, -1:0, 0:0)

Not even "fill in the privacy policy as a matter of urgency"?

This is ridiculous.

inst.geoloc=provider_hostip will also work
