#24 File F21 change: Big Data Image
Closed None Opened 10 years ago by mattdm.

'''Summary:''' Fedora Cloud agreed to make a base image plus several tailored to specific purposes. This is one of the tailored ones, produced in collaboration with the Big Data SIG.

'''Importance:''' nice to have (if we fail to do it, we are not any worse off than we are now, and big data tools can still be used on the base image.)

'''Timeframe:''' F21 alpha / If it's not ready for alpha, we have missed this release

'''Scope:''' self-contained (in collaboration with Big Data SIG)

'''Cloud SIG owner:''' TBD

https://fedoraproject.org/wiki/Changes/Big_Data_Cloud_Image


Some work has been done on this:

https://git.fedorahosted.org/cgit/dockerfiles.git/tree/needs_work/hadoop/multi_container

That contains 3 images:

namenode
nodemanager
resource manager

Those three images are configured to talk to each other on the same host. At some point, we need to figure out cross-host networking.

There is also a functioning single image configuration which needs quite a bit of cleanup:

https://git.fedorahosted.org/cgit/dockerfiles.git/tree/hadoop/single_container

Both of these are great for POC deployments.

Are we looking to create a separate kickstart file in the spin-kickstarts repo, or pass some kind of build argument to fedora-cloud-base.ks? I would assume we want to set things like hugepages, vm.swappiness, limits, etc.

Replying to [comment:2 marshyski]:

> Are we looking to create a separate kickstart file in the spin-kickstarts repo, or pass some kind of build argument to fedora-cloud-base.ks? I would assume we want to set things like hugepages, vm.swappiness, limits, etc.

I was thinking a separate kickstart, but we can use whatever includes make sense....

I would be really interested in working on this ticket. I just have a few questions about it. Would it be possible to make different images for different sets of software? For example, https://fedoraproject.org/wiki/SIGs/bigdata/packaging lists the software the Big Data SIG packages. If https://fedorahosted.org/cloud/ticket/4 requires images to be less than 600MB at boot, we may exceed that limit by including all of those packages.

I would be interested in making two different images revolving around two of the groups of packages in that list. One image would be an Apache Mesos image that would have Mesos, Tachyon, Spark, and Zookeeper. The other image would be a Hadoop ecosystem image with Hadoop, HBase, Pig, and Hive. I'm more familiar with Mesos and packages that use it than Hadoop, so I would probably do the Mesos image first to get my feet wet. I was thinking of a separate kickstart script as well.

Also, which F21 deadline would we have to meet?

Good questions. I think that we could do two, especially if they clearly serve different needs. But we also wanted to start small with the number of options available, both to reduce our own burden and to avoid offering a dizzying array of confusing options before we're ready to move to a full community-supported library of images (which is a future plan).

The 600MB number was something I pulled out of thin air as an example, and was intended to apply to the generic image. I don't think we have any constraint for this one.

We need to have a change proposal (the very basic plan) filed by next Tuesday. Actually having something would be the alpha change deadline, and we could probably slip past that if we really needed to, as long as the generic cloud base image is in good shape at that time.

scollier, jeid64, marshyski -- any of you want to own or co-own this change proposal? We need to get it in by next Tuesday.

Remember that we can't have everything ready within the F21 timeframe. When you say big data, people think Hadoop, so I'd rather focus on getting the Hadoop ecosystem flavored image ready.
If we can get other flavors, that would be awesome too.
If nobody steps up by Tuesday, I'll take ownership.

Mattdm, I can own this ticket and work on it.

Replying to [comment:9 jeid64]:

> Mattdm, I can own this ticket and work on it.

Awesome, thanks. hguemar, do you want to be co-owner?

If jeid64 is ok, no problem.
Owner or not, I'll be helping anyway (currently giving a hand in reviewing the hadoop ecosystem). :)

Updated https://fedoraproject.org/wiki/Changes/Big_Data_Cloud_Image ahead of tonight's deadline for change proposals. Can anyone else take a look at it (especially hguemar) and see if there's anything they'd like to add or clarify?

Change has been proposed and announced.
