#3807 GeoIP API for Anaconda
Closed: Fixed None Opened 10 years ago by m4rtink.

Anaconda[1] (the Fedora installer) recently got support for GeoIP. It is currently used to preset the language and timezone and works with the Fedora MirrorManager mirror list API. Unfortunately, even though this API returns a territory code, its purpose is to return the fastest mirror for the user, taking into account nearby mirror bandwidth, etc. It also returns a lot of data (mirror URLs), while Anaconda only needs the territory code.

So we would like to propose a dedicated GeoIP API hosted on Fedora infrastructure. What could it look like?
* Anaconda calls the API URL
* the server handles the call and returns the most probable territory code, based on the public IP of the caller
* the response is returned in some standard way, e.g. just a territory code string, JSON, XML, etc.
* optionally, also returning an accurate timezone would be nice (especially for countries covering many timezones, such as Canada)

FAQ
* Why not use an existing external service ?
Because the GeoIP data for users is potentially sensitive and therefore we think it should not be handled outside of Fedora infrastructure. GeoIP is enabled by default in Anaconda, so we would basically be telling a third party the public IP address of everyone installing Fedora.

[1] http://fedoraproject.org/wiki/Anaconda


https://github.com/mdomsch/geoip-city-wsgi
has a stab at such a web service. It listens as a WSGI app, grabs the client IP address (borrowing code from MirrorManager for same), does a lookup in the MaxMind GeoIP City Lite (free) database, and returns that data as JSON. About as simple as I can make it. FI already has a monthly cronjob to download and distribute new GeoIP databases to the app servers. This should be as simple as running it on app servers that get the same GeoIP databases, and hooking up haproxy in front of it.
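For readers who have not seen the linked code, the shape of such a service can be sketched roughly as below. This is not Matt's implementation: the lookup table `FAKE_CITY_DB`, the helper `get_client_ip`, the field names in the sample record and the use of the `X-Forwarded-For` header are all assumptions made for illustration. A real deployment would query the MaxMind database instead of a hard-coded dict.

```python
import json

# Hypothetical stand-in for the GeoIP City Lite lookup; a real service
# would query the database instead of this hard-coded table.
FAKE_CITY_DB = {
    "192.0.2.1": {"country_code": "CA", "city": "Ottawa",
                  "time_zone": "America/Toronto"},
}


def get_client_ip(environ):
    """Return the caller's IP, preferring a proxy-supplied header (assumed)."""
    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    if forwarded:
        # When proxies append to the list, the first entry is the client.
        return forwarded.split(",")[0].strip()
    return environ.get("REMOTE_ADDR", "")


def application(environ, start_response):
    """WSGI entry point: look up the client IP and answer with JSON."""
    record = FAKE_CITY_DB.get(get_client_ip(environ))
    if record is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"no GeoIP record\n"]
    body = json.dumps(record).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Any WSGI server can host a callable like `application`; only the lookup step needs the GeoIP databases to be present on the host.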

The bugs[1][2] caused by using the MirrorManager API are slowly piling up. Would it be possible to get the new GeoIP API online before the next infra freeze ? It would really help to have the API working when Fedora 19 is released (or else we can expect another batch of "Anaconda thinks I'm on another continent" bugs). Thanks in advance !

[1] https://bugzilla.redhat.com/show_bug.cgi?id=957809

[2] https://bugzilla.redhat.com/show_bug.cgi?id=960763

Is there anything that needs to be done? I'm willing to help with any packaging work, code development or anything like that. If the only missing step is the deployment of the service, I have no idea what could be done to help with it.

I was going to look at this next week... but perhaps we should get together and discuss a bit (or discuss here).

If we just use Matt's wsgi it's just a matter of deployment I think.

There was some discussion around splitting this out into a "what is my IP" service and a GeoIP service, do we want to look at that or not bother?

The first step would be getting it running in our stg env, and have you test it out and see if it's working as you expect from there.

Replying to [comment:4 kevin]:

> I was going to look at this next week... but perhaps we should get together and discuss a bit (or discuss here).
Thanks!

> If we just use Matt's wsgi it's just a matter of deployment I think.
Yeah, that one looks perfectly fine for our purposes.

> There was some discussion around splitting this out into a "what is my IP" service and a GeoIP service, do we want to look at that or not bother?
We really prefer a server-based GeoIP API and can't really do offline GeoIP (Anaconda runs from a ramdisk and the database is too big for that, could easily be outdated, etc.). And we need to call an API anyway, so why not just return all the data we need at once (from a server-based database that can be big & easily updated if need be).

tl;dr: We only need a GeoIP API.

> The first step would be getting it running in our stg env, and have you test it out and see if it's working as you expect from there.
Sure, just tell us the port & IP. :)

Something like the attached should make its way into staging. Need to figure out what URL we want to host this at. I randomly chose geoip-city.fp.o but am far from wedded to it. Just keep it out of the way of an existing service please...

ok, thanks to Matt's excellent work we now have a staging instance for you to test against. ;)

The url is:

https://geoip.stg.fedoraproject.org/city/

(the ssl cert will be invalid, sorry if that causes problems).

If you could look at this and see if it provides what you need that would be great. If all looks well, we can move it to production next week.

https://geoip.stg.fedoraproject.org/city (no trailing slash please). It also currently accepts one override: ?ip=(something), e.g. city?ip=18.0.0.1 which then uses the passed IP address instead of the determined IP address of the client.

This does not work for IPv6, and in staging, you will only see IPv4 addresses for this URL. When we move to production, we need to figure out how to be sure we only return IPv4 addresses for this name, as the current wildcard setup will return both v4 and v6, but the WSGI script can't properly look up v6 addresses, and will return HTTP 404 if presented one.
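A client-side sketch of the two points above, the `?ip=` override and the IPv4-only limitation, might look like this. The helper names `build_lookup_url` and `is_ipv4` are made up for the example; only the staging URL and the `ip` parameter come from this ticket.

```python
import ipaddress
from urllib.parse import urlencode

STAGING_URL = "https://geoip.stg.fedoraproject.org/city"  # no trailing slash


def build_lookup_url(ip=None, base=STAGING_URL):
    """Return the lookup URL, optionally overriding the client IP via ?ip=."""
    if ip is None:
        return base
    return base + "?" + urlencode({"ip": ip})


def is_ipv4(address):
    """True only for a syntactically valid IPv4 address."""
    try:
        return ipaddress.ip_address(address).version == 4
    except ValueError:
        return False


# Only send an override the service can actually look up.
candidate = "18.0.0.1"
if is_ipv4(candidate):
    print(build_lookup_url(candidate))
```

Checking the address family on the client side avoids a round trip that would end in an error for v6 addresses.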

Replying to [comment:7 kevin]:

> The url is:
>
> https://geoip.stg.fedoraproject.org/city/
>
> (the ssl cert will be invalid, sorry if that causes problems).
>
> If you could look at this and see if it provides what you need that would be great. If all looks well, we can move it to production next week.
Wow, that looks ultra-nice, it even directly returns the timezone ! I'll make an Anaconda updates image using this API (so that it can be easily tested) and let you know about the results. :)

Replying to [comment:8 mdomsch]:
First - thanks a lot ! :)

> https://geoip.stg.fedoraproject.org/city (no trailing slash please). It also currently accepts one override: ?ip=(something), e.g. city?ip=18.0.0.1 which then uses the passed IP address instead of the determined IP address of the client.
That's a nice touch - really makes testing & debugging much easier.

Be aware this is simply a JSON-formatted dict in utf8. The order of the items in the dict varies from request to request based on how the dict was converted to JSON, so do not manually parse the dict and expect a guaranteed order of dict elements.
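To illustrate the ordering caveat: a JSON parser makes key order irrelevant, whereas string matching on the raw response would not. The two sample responses below are invented, and the `country_code` field name is an assumption.

```python
import json

# Two hypothetical responses carrying the same fields in a different order.
first = '{"country_code": "CA", "time_zone": "America/Toronto"}'
second = '{"time_zone": "America/Toronto", "country_code": "CA"}'

# The raw strings differ, but the parsed dicts compare equal.
assert first != second
assert json.loads(first) == json.loads(second)

# Fields are read by name, never by position.
print(json.loads(second)["country_code"])
```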

Replying to [comment:10 mdomsch]:

> Be aware this is simply a JSON-formatted dict in utf8. The order of the items in the dict varies from request to request based on how the dict was converted to JSON, so do not manually parse the dict and expect a guaranteed order of dict elements.
Yep, I've noticed that - I'm currently checking the output for various public IPs from random parts of the world. So far the output is very good !

My plan is to just pipe the response through the built-in json Python module, and check if the resulting dictionary contains a valid "time_zone" field. If it does, use that, if it doesn't, fall back to using just the country code to get a matching timezone (current behaviour).
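A rough sketch of that plan, not the actual Anaconda code: the function name `pick_timezone`, the `country_code` field name and the small `DEFAULT_TIMEZONES` table are assumptions standing in for the existing country-based logic. "Valid" is reduced here to "present and non-empty".

```python
import json

# Hypothetical country-to-timezone table standing in for the existing
# country-code fallback.
DEFAULT_TIMEZONES = {"CA": "America/Toronto", "CZ": "Europe/Prague"}


def pick_timezone(raw_response):
    """Prefer the "time_zone" field; fall back to the country code."""
    try:
        data = json.loads(raw_response)
    except ValueError:
        # Not JSON at all (e.g. an error page); nothing to work with.
        return None
    if not isinstance(data, dict):
        return None
    timezone = data.get("time_zone")
    if timezone:
        return timezone
    # Missing or empty "time_zone": use the country default, if any.
    return DEFAULT_TIMEZONES.get(data.get("country_code"))
```

Returning `None` for unusable responses lets the caller keep whatever default it already had.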

Also if "time_zone" is empty but "city" is not, it might be possible to match the city to some timezone - but I haven't seen such a response yet.

BTW, just found an IP address that returns "Internal server error":

https://geoip.stg.fedoraproject.org/city/?ip=198.72.101.119

(it's the IP address of the ottawa.ca website)

from app01.stg (I hate Unicode!)

{{{
[Mon Jun 03 14:08:42 2013] [error] [client 10.5.126.88] Traceback (most recent call last):
[Mon Jun 03 14:08:42 2013] [error] [client 10.5.126.88] File "/usr/share/geoip-city-wsgi/geoip-city.wsgi", line 61, in application
[Mon Jun 03 14:08:42 2013] [error] 2013-06-03 14:08:42,225 urllib3.connectionpool INFO Starting new HTTPS connection (1): admin.fedoraproject.org
[Mon Jun 03 14:08:42 2013] [error] [client 10.5.126.88] results = json.dumps(results)
[Mon Jun 03 14:08:42 2013] [error] [client 10.5.126.88] File "/usr/lib/python2.6/json/__init__.py", line 230, in dumps
[Mon Jun 03 14:08:42 2013] [error] [client 10.5.126.88] return _default_encoder.encode(obj)
[Mon Jun 03 14:08:42 2013] [error] [client 10.5.126.88] File "/usr/lib/python2.6/json/encoder.py", line 367, in encode
[Mon Jun 03 14:08:42 2013] [error] [client 10.5.126.88] chunks = list(self.iterencode(o))
[Mon Jun 03 14:08:42 2013] [error] [client 10.5.126.88] File "/usr/lib/python2.6/json/encoder.py", line 309, in _iterencode
[Mon Jun 03 14:08:42 2013] [error] [client 10.5.126.88] for chunk in self._iterencode_dict(o, markers):
[Mon Jun 03 14:08:42 2013] [error] [client 10.5.126.88] File "/usr/lib/python2.6/json/encoder.py", line 275, in _iterencode_dict
[Mon Jun 03 14:08:42 2013] [error] [client 10.5.126.88] for chunk in self._iterencode(value, markers):
[Mon Jun 03 14:08:42 2013] [error] [client 10.5.126.88] File "/usr/lib/python2.6/json/encoder.py", line 294, in _iterencode
[Mon Jun 03 14:08:42 2013] [error] [client 10.5.126.88] yield encoder(o)
[Mon Jun 03 14:08:42 2013] [error] [client 10.5.126.88] UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 5: invalid continuation byte
}}}

Fixed in staging now. Needed an explicit call to GeoIP.set_encoding().
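For anyone puzzled by the traceback: it is consistent with the database handing back a byte string that is not UTF-8. The snippet below reproduces the same class of error in plain Python, without the GeoIP library. The city name is a guess chosen so that byte 0xe9 sits at position 5, and Latin-1 as the source encoding is an assumption.

```python
# Hypothetical city name as Latin-1 bytes; 0xe9 is "é" in ISO-8859-1.
raw_city = b"Montr\xe9al"

failed_at = None
try:
    raw_city.decode("utf-8")
except UnicodeDecodeError as err:
    # 0xe9 opens a multi-byte UTF-8 sequence, but the next byte ("a")
    # is not a continuation byte, so decoding fails here.
    failed_at = err.start

# Decoding with the charset the bytes were written in recovers the text.
city = raw_city.decode("latin-1")
print(failed_at, city)
```

Telling the lookup library which charset to emit, as the fix does, avoids having to re-decode every string field by hand.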

According to the feedback from our testers, the API seems to work very well. Therefore, I would say - let's go ahead and move it to production. :)

BTW, would it be possible to leave the staging API running for say a week or two ? There might still be some testers out there using it.

ok, we should be live and working in production now:

https://geoip.fedoraproject.org/city

We will keep the staging one around essentially forever. ;) We keep a stg env setup and running so we can test updates against it and such. So, it should be around, but subject sometimes to reboots/outages or breakage due to other things being updated, so people wanting reliable service should move to the production url.

Please let us know if you need anything further. thanks!

Big THANKS to all of you, guys, for making this happen and in such a short time!

Replying to [comment:17 vpodzime]:

> Big THANKS to all of you, guys, for making this happen and in such a short time!
+1

Thanks a lot to everyone involved ! :)

No problem. I'm glad we could get things going for you. ;)
