#3485 Request for Resources: datanommer
Closed: Fixed None Opened 9 years ago by ralph.

The datanommer RPM is ready for staging in our infrastructure. https://admin.fedoraproject.org/updates/datanommer
It is a fedmsg consumer, which means it will get executed (among others) by the fedmsg-hub service. It needs a database to talk to (postgres) and has a "datanommer-create-db" script to set up its own tables.

= hosts? =

Where should this be run? It could go on busgateway01 for the time being. Another consumer is running there already (which does some work for the busmon webapp).

At some point in the future I'd like to break these message processing consumers off onto their own isolated node (just for organization's sake). We could call it, say, busfarm01 or buscrunch01 or busproc01 or ... More consumers will be coming. For instance, herlo is working on one that listens for messages from fas and makes updates to the fama ambassadors trac. rossdylan's badge processors need a place to live someday as well.

As for where this should run, it seems akin to what we use bapp for. We might want to look at setting up multiple bapp servers, though, or at whether it still makes sense for bapp* servers to be app servers plus extra scripts.

lmacken mentioned in IRC that we might want to separate the DB for this from our other DBs. It could be collecting data at a higher rate than our other apps.

toshio mentioned in IRC that the concern is more about reading the data. Data for each "topic" is stored in its own DB table. Doing a select on a table would mean scanning over every message ever emitted on that topic, which could be costly and could slow down our other DBs.
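To make the read concern concrete, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for postgres, and the table and column names are hypothetical (datanommer's real schema may differ). It only shows that a select against an unindexed per-topic table is planned as a full scan of every stored message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical per-topic table; not datanommer's actual schema.
conn.execute(
    "CREATE TABLE topic_messages ("
    "id INTEGER PRIMARY KEY, timestamp TEXT, msg TEXT)"
)
conn.executemany(
    "INSERT INTO topic_messages (timestamp, msg) VALUES (?, ?)",
    [("2012-11-%02d" % day, "{}") for day in range(1, 29)],
)

# Ask the planner how it would answer a simple filtered select.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM topic_messages WHERE timestamp >= ?",
    ("2012-11-20",),
).fetchall()

# The last column of each plan row is the human-readable detail.
details = [row[-1] for row in plan]
print(details)
```

With no index on the filtered column, the plan detail reports a scan of the whole table, so the cost grows with every message ever stored for that topic.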

So, is datanommer just the collection/logging part? Or is it also the reporting/query part?

My inclination is to start with just using busgateway01 and our regular db host, and then split things out as they grow to the point where they are causing issues or needing more resources. I would think it should be pretty easy to dump the db and move it to another host, or to move the service to its own instance, as things grow.

I agree this is kinda like bapp, but for fedmsg stuff instead of application stuff.


datanommer is just the collection/logging part. For admin purposes it also includes two scripts, 'datanommer-dump' and 'datanommer-stats', to see what it's doing.

I'm fine with setting it up on busgateway01+regular-db-host for now, too. It's good to have started the discussion about scaling, though.

Can someone set up the database and add credentials to the private repo?

I'd do it myself, but I don't think I have the privileges.

After that we'll need to:

  • Try out the RPMs in staging
  • Initialize the tables
  • Restart fedmsg-hub on busgateway01 to see if it picks up the datanommer consumer and behaves rationally.
  • Emit a few messages with fedmsg-logger and then check to see if they landed in the DB with datanommer-dump.
  • Let it sit for a few days, then rinse/repeat in production.
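The command-line part of the steps above could be sketched as a dry-run checklist. The script names come from this ticket; the `service fedmsg-hub restart` invocation, the use of sudo for `datanommer-create-db`, and the test message text are assumptions. Nothing is executed unless `dry_run=False` is passed.

```python
import shlex
import subprocess

# (label, argv) pairs. The restart invocation, the sudo on create-db and
# the message text are assumptions for this sketch.
STEPS = [
    ("initialize the tables", ["sudo", "datanommer-create-db"]),
    ("restart the hub", ["sudo", "service", "fedmsg-hub", "restart"]),
    ("emit a test message",
     ["fedmsg-logger", "--message=datanommer staging test"]),
    ("check what landed in the DB", ["sudo", "datanommer-dump"]),
]


def run_steps(steps=STEPS, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    printed = []
    for label, argv in steps:
        line = "# %s\n$ %s" % (label, " ".join(shlex.quote(a) for a in argv))
        print(line)
        printed.append(line)
        if not dry_run:
            # Stop at the first failing command.
            subprocess.check_call(argv)
    return printed


if __name__ == "__main__":
    run_steps()
```

Run as-is, it only prints the commands so they can be reviewed before anyone runs them on busgateway01.stg.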

I'd really like to get this in place before the freeze begins on Tuesday if possible. It would be nice to be able to collect stats over the period from beta to the F18 release.

I've set up the db in stg.

Hopefully all will go well and we can set up prod soon.

Staging looks good. :) If you'd like to test it, you can log in to busgateway01.stg and run "sudo datanommer-stats" and/or "sudo datanommer-dump".

Kevin, if you or someone else can set up the db in the production environment, I can copy the puppet config over and make sure everything's working.
