#95 GSoC Project Proposal - Interactive Performance Monitoring for Podman
Closed 3 years ago by siddharthvipul1. Opened 3 years ago by t0xic0der.

This is a proposed project for Google Summer of Code

  • Skill level: Intermediate to advanced
  • Skills required:
    • Programming languages - Python, JavaScript
    • Markup languages and design - HTML5, CSS3
    • Framework and libraries - Falcon, Terminado, Flask, Click, JQuery, XTerm
    • Concepts - Computer Networks, Operating Systems, Containerization
    • Technologies - Podman and Docker
  • Mentor(s): @t0xic0der, [Please comment under the ticket if you are interested]
  • Contacts (IRC & email): t0xic0der (t0xic0der@fedoraproject.org)

  • Idea description:

    • There aren't many FOSS tools that help you monitor the performance of a pod and inspect related resources such as images, networks and volumes.
    • The idea is to build a web-based interactive performance monitoring tool for Podman hosts which helps with abstraction and actively keeps up with the changes in a pod.
    • This would require creating a JSON-based API which talks to the Podman UNIX socket and exposes those details through endpoints in a passcode-based, encrypted fashion.
    • A decoupled web interface would then connect securely to the endpoints of that API and display graphs, logs and statistics, refreshed at regular intervals.
  • What are we looking for:

    • A self-hosted service for folks using Podman to actively monitor performance, delivered by the end of the GSoC period.
    • People who are interested in working with computer networks and containers, and/or have a sound knowledge of both.
  • Notes & references:

    • I have a WIP plan-of-action about how this can be achieved. Please let me know via mail/Matrix/Telegram if you want to know about it.
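
To make the "API which talks to the Podman UNIX socket" part concrete, here is a minimal sketch using only the Python standard library. It is not taken from the SuperVisor code; the class and function names are hypothetical, and the socket path and URL shown in the usage note are assumptions that depend on the Podman version and on whether the Podman API service is enabled on the host.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks over a UNIX domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # "localhost" only fills the Host header; the socket path decides
        # where the connection actually goes.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def get_json(socket_path, url_path):
    """Send `GET url_path` over the socket and decode the JSON response body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", url_path)
        response = conn.getresponse()
        body = response.read()
        if response.status != 200:
            raise RuntimeError(f"GET {url_path} returned HTTP {response.status}")
        return json.loads(body)
    finally:
        conn.close()
```

Assuming a rootful socket at /run/podman/podman.sock and a libpod-style route, a call might look like get_json("/run/podman/podman.sock", "/v4.0.0/libpod/containers/json"). Both values are assumptions to be checked against the installed Podman. The passcode check and encryption mentioned above would sit in the service that wraps this client, not in the client itself.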

EDIT - Added necessary information regarding expectations from participants, implementation, benchmarks for success/failure determination, milestones and roadmap.

  • Expectations from participant - The participant is expected to work on both the backend (monitoring Podman/Docker statistics and conveying them through a JSON-based HTTP REST API) and the frontend (actively synchronizing state from the endpoints and visualizing graphs/statistics in a web application), with a special emphasis on efficiency of execution and maintainability of code.

  • Implementation - Yes, it would more or less be an implementation from scratch, but there are libraries available which can facilitate monitoring, help build APIs, update DOM elements periodically etc. Participants are free to take a look at my implementation as well, if needed.

  • Judgement of success or failure
    There are five important parameters to monitor and report here: host performance, container information, image information, volume information and network information. Once these five elements are crossed off the checklist, along with good session management and user experience, the project is successful. Failing to meet one or more of these requirements marks the failure of the project.

  • Roadmap

    • Week 1
      • Part 1 - Monitoring host system performance (CPU/Memory/Thermal/Storage/Battery etc.)
      • Part 2 - Setting up an API to securely convey the above information
      • Part 3 - Review #1 - Peer testing, good practices conformity and bug fixing
    • Week 2
      • Part 1 - Monitoring container information (Preliminaries/Statistics/Logging/Processes etc.)
      • Part 2 - Monitoring image information (Preliminaries/Revisions etc.)
      • Part 3 - Review #2 - Peer testing, good practices conformity and bug fixing
    • Week 3
      • Part 1 - Monitoring volume information (Preliminaries etc.)
      • Part 2 - Monitoring network information (Preliminaries etc.)
      • Part 3 - Review #3 - Peer testing, good practices conformity and bug fixing
    • Week 4
      • Part 1 - Wrapping the above information in the same API
      • Part 2 - Picking routes, methods, cryptographic methods etc.
      • Part 3 - Review #4 - Peer testing, good practices conformity and bug fixing
    • Week 5
      • Part 1 - Setting up a web application with proper secure session management
      • Part 2 - Building frontend for actively rendering dashboard, host perf and container information
      • Part 3 - Review #5 - Peer testing, good practices conformity and bug fixing
    • Week 6
      • Part 1 - Building frontend for actively rendering image, volume and network information
      • Part 2 - Rigorous introspective testing and bag packing
      • Part 3 - Review #6 - Peer testing, good practices conformity and bug fixing
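
As an illustration of the Week 1 host-performance step, the following sketch collects load, memory and storage figures with the Python standard library only. It assumes a Linux host with /proc mounted. The function names and the layout of the returned dictionary are hypothetical, and thermal and battery readings are left out.

```python
import json
import os
import shutil


def parse_meminfo(text):
    """Turn /proc/meminfo text into {field: number} (KiB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def host_snapshot(mount_point="/"):
    """Collect a small, JSON-serialisable snapshot of host load, memory and storage."""
    load_1m, load_5m, load_15m = os.getloadavg()
    disk = shutil.disk_usage(mount_point)
    try:
        with open("/proc/meminfo", encoding="ascii") as handle:
            meminfo = parse_meminfo(handle.read())
    except OSError:
        meminfo = {}  # not Linux, or /proc is unavailable
    return {
        "cpu": {
            "count": os.cpu_count(),
            "load_1m": load_1m,
            "load_5m": load_5m,
            "load_15m": load_15m,
        },
        "memory_kib": {
            "total": meminfo.get("MemTotal"),
            "available": meminfo.get("MemAvailable"),
        },
        "storage_bytes": {"total": disk.total, "used": disk.used, "free": disk.free},
    }
```

Calling json.dumps(host_snapshot()) gives a payload that an API endpoint from Part 2 could return as-is.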

Hi @t0xic0der, you have shown me the work you have been putting into this, and I like the idea and the work you have done, but I am conflicted because this has nothing to do with the Fedora project yet.

  • Have you discussed this with Podman folks to get their feedback?
  • How is it going to benefit the Fedora project?
  • How is it different (or what other problem does it solve) compared to other monitoring tools like Prometheus + Grafana (or others that I am not aware of)?

Metadata Update from @siddharthvipul1:
- Issue tagged with: GSoC

3 years ago

but I am conflicted because this has nothing to do with the Fedora project yet.

Fedora recommends Podman so it can potentially become an offering along with Podman.

Have you discussed this with Podman folks to get their feedback?

I would really need directions on this one. Links, references to mailing lists, folks to contact - if you could suggest anything, that could help me a lot to get a review on the idea.

how is it going to benefit fedora project?

It can become a quality offering "from" Fedora to be used alongside Podman. Call it a product of sorts. If there are folks who are using Podman in Fedora (and there are definitely a lot of them), the end project can lower the learning curve of monitoring containers for them.

How is it different (or what other problem it solves) compared to other monitoring tools like prometheus + grafana (or more that I am not aware of)

Prometheus + Grafana is a composite tool. For someone looking for a way to make performance monitoring easier while still having choices, this project can potentially help. Plus, there would certainly be folks who would want to avoid the hacky nature of joining endpoints and having two distinct containers talk with each other just to monitor, when they can do so with just one.

There are offerings like Datadog etc. as well which can be good alternatives, but I am not so sure if they work with Podman. The last I checked, their support was specific to Docker only, but I may have only scratched the surface. The Podman field is kind of empty when it comes to interactive monitoring and control, and that's where this project comes in.

[Plus if I could rack up a PoC (and fever + bad right wrist) within three days of time by working on both the monitoring and displaying ends, I am sure as hell - the participants would be able to do so as well within the limited span of time GSoC allots us]

but I am conflicted because this has nothing to do with the Fedora project yet.

Fedora recommends Podman so it can potentially become an offering along with Podman.

Right, but a recommendation is not the same as an offering :)
That being said, I am not saying that a Podman-related project is not relevant here (jumping to the next point).

Have you discussed this with Podman folks to get their feedback?

I would really need directions on this one. Links, references to mailing lists, folks to contact - if you could suggest anything, that could help me a lot to get a review on the idea.

I meant before opening a project that you want to offer alongside with podman, did you discuss with them :) I just wanted to hear from people from podman what they think about this or who wants this project :)

how is it going to benefit fedora project?

It can become a quality offering "from" Fedora to be used alongside Podman. Call it a product of sorts. If there are folks who are using Podman in Fedora (and there are definitely a lot of them), the end project can lower the learning curve of monitoring containers for them.

I would recommend discussing it on devel list and seeing interest

<snip>

[Plus if I could rack up a PoC (and fever + bad right wrist) within three days of time by working on both the monitoring and displaying ends, I am sure as hell - the participants would be able to do so as well within the limited span of time GSoC allots us]

If you are saying this project is in PoC stage and you have worked on it for 3 days, I believe it's not yet ready to put in front of a GSoC student. I am not worried about whether the student can do what you expect them to do or not; I am worried about the lack of path and stability in the project's direction.
Please discuss this on devel list.

Leaving it to @sumantrom

also, can you please describe the task you are expecting to be done from the student?
it's missing in the ticket
Is it implementation from scratch?
how would you judge success/failure of student. Identifying milestones and writing roadmap weekly/bi-weekly would help

[Plus if I could rack up a PoC (and fever + bad right wrist) within three days of time by working on both the monitoring and displaying ends, I am sure as hell - the participants would be able to do so as well within the limited span of time GSoC allots us]

Having been a mentor a few times, I can tell that this shows how good of a developer you are, but it doesn't say anything about the student. What took you three days, could take three months to the student. It really depends on the student and your capacity of doing something does not relate in any way to the student's capabilities.

but I am conflicted because this has nothing to do with the Fedora project yet.

Fedora recommends Podman so it can potentially become an offering along with Podman.

That sounds like it would be a good project to be hosted under the podman GSoC organization. I have no idea if there is such a GSoC organization, but that would be where I would reach out.

I meant before opening a project that you want to offer alongside with podman, did you discuss with them :) I just wanted to hear from people from podman what they think about this or who wants this project :)

Yes @siddharthvipul1, I know that a discussion would be important and all I stand to ask are waypoints as to where I can take up the discussions.

I would recommend discussing it on devel list and seeing interest

Noted. I would do so soon.

If you are saying this project is in PoC stage and you have worked on it for 3 days, I believe it's not yet ready to put in front of a GSoC student. I am not worried about whether the student can do what you expect them to do or not; I am worried about the lack of path and stability in the project's direction.

Seems like I was not able to explain myself clearly here. :)

The PoC consists of the driver service providing JSON API endpoints (https://github.com/t0xic0der/supervisor-driver-service/tree/master) and the frontend visualizing the statistics (https://github.com/t0xic0der/supervisor-frontend-service/tree/master), both at master. I built on it to reach https://github.com/t0xic0der/supervisor-driver-service/tree/container-monitoring and https://github.com/t0xic0der/supervisor-frontend-service/tree/container-monitoring, which I demonstrated to you yesterday. Both were achieved within 3 days each.

I would suggest exploring the code and the functionality first before judging whether the project is mature or not, as a lot of nuances were taken care of to reach that goal. For instance, I had to push away Flask and introduce Falcon just because I wanted to make sure that the backend is as fast and efficient in frequently transmitting information as possible.

also, can you please describe the task you are expecting to be done from the student?
it's missing in the ticket
Is it implementation from scratch?
how would you judge success/failure of student. Identifying milestones and writing roadmap weekly/bi-weekly would help

I would update the ticket contents shortly to make a note of it.

Having been a mentor a few times, I can tell that this shows how good of a developer you are, but it doesn't say anything about the student. What took you three days, could take three months to the student. It really depends on the student and your capacity of doing something does not relate in any way to the student's capabilities.

Certainly. But with me keeping faith on how the participants are selected for the same, I can be assured that someone right would be picked up who would be able to achieve the target, with the help of correct mentors in a justified amount of time.

I would implore you to please take a look at the aforementioned links to decide on the project's maturity and to see if it is worth pursuing or not. If you can be free for even half an hour, we could arrange a video meet - where I would go on explaining the nuances considered for the project and demonstrate its current state as well.

That sounds like it would be a good project to be hosted under the podman GSoC organization. I have no idea if there is such a GSoC organization, but that would be where I would reach out.

I don't think Podman, as an organization, applies for GSoC participation.

This is one of the past projects Fedora has pursued as a part of GSoC which has ties to Podman: https://summerofcode.withgoogle.com/archive/2019/projects/5713938757451776/

I really think that the proposition can turn out good under favorable circumstances, participants and mentors.

Why not add this to Cockpit's Podman plugin instead of making another self hosted service for it ? It already provides basic metrics like CPU, memory usage and logs, alongside container and image management features.

It's a good suggestion, but I had already scoped out what it would take. The resulting project would be complex enough that we would not want to include it just as a plugin but as a project of its own. Plus, with the Cockpit plugin option, the students would be expected to write their own API to dispatch monitored details to the client, which essentially means that they would first have to learn about Cockpit and then move on to writing an API backend and a graphical frontend. Wouldn't that be too much to ask for?

Hi all, I made comments in mentored-projects#85 that also include this ticket. Would be great to get wider feedback in the other ticket, in order to figure out where to drive the conversation next.

Hi @t0xic0der @siddharthvipul1. I've contributed to Fedora in the past and after a dormant period, I am looking for ways to change that. I came across this issue and since I have some relevant experience running container workloads in production in the past, I want to add my 2 cents here.

I really liked the idea of having a monitoring tool with a web UI that tells you the state of your containers and don't we all love beautiful graphs that ease the pain of having to sift through metrics ourselves? :). That being said, I have a few points that I would like to bring into the discussion:

  • The above mentioned idea seems a bit too specific to a project. With it, we might be solving problems that are already solved by general purpose monitoring tools like Prometheus + Grafana.
    The beauty of having general purpose tools is that they can be plugged into any system to monitor it without having to learn a new tool every time or have a new dashboard to look at for every system in operation.

  • Podman like other container engines provides the podman stats command to monitor a container's stats via the CLI which usually works well for single developers.
    Most people running serious container workloads in containers use tools for orchestrating/managing their lifecycle and therefore as mentioned earlier, a general purpose monitoring tool works well for them as they can use the same tool to monitor the system itself, the cluster metrics, the container metrics, the metrics of the node that the containers are running on etc.

  • I second @icewreck's comment on adding a plugin for Cockpit, as it means that we would be adding/improving support for monitoring podman in an already existing general purpose tool. This can benefit everyone who uses it and it would improve their experience in monitoring podman deployed containers. I also think that adding support for something in a general purpose monitoring tool benefits the wider community who consume the tool more and adds more value in general.

  • Also, there are design questions I have about how the metrics(time series data) would be stored and retrieved in the system mentioned above. Most times, folks like me like to look at historical data to correlate with incidents and events to make sense of what happened. A good monitoring system provides you with reasonably long retention for metrics so that they can be analyzed to understand changes in the system being monitored. Therefore the question of metrics storage/retrieval remains unanswered.
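
To illustrate the retention question in the last point, here is a minimal in-memory sketch of a time-windowed sample store. It is only an assumption about one possible design, not something the proposal specifies. The class name is hypothetical, samples are assumed to arrive in timestamp order, and nothing survives a restart, which is exactly the gap a real time series store would fill.

```python
import time
from collections import deque


class MetricHistory:
    """Keep (timestamp, value) samples no older than `retention_seconds`."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._samples = deque()

    def add(self, value, timestamp=None):
        """Append a sample, then drop everything that fell out of the window."""
        now = time.time() if timestamp is None else timestamp
        self._samples.append((now, value))
        cutoff = now - self.retention_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def between(self, start, end):
        """Return the retained samples with start <= timestamp <= end."""
        return [(ts, value) for ts, value in self._samples if start <= ts <= end]
```

With MetricHistory(3600), for example, a dashboard could correlate an incident with the last hour of samples, but anything older is gone.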

To summarize, I feel that a monitoring tool + dashboard specific to podman would be a bit too specific. The value added with this system to those who run their workloads in containers might not be greater in comparison to improving podman's monitoring experience with general purpose and non opinionated tools. I believe that doing so will benefit the wider community consuming it as it plays well with tools that they already know and use and helps them keep their monitoring all in one place rather than look at multiple tool specific dashboards.

Are there ways you think we can improve podman's monitoring to benefit the wider community as a whole? I can think of one way: building a Prometheus exporter for podman, which would enable Prometheus to scrape/monitor metrics of containers spun up by podman. This means that podman containers could then be monitored using Prometheus, and when someone runs podman containers on their cluster, they can consume the exporter to monitor their containers. The complexity of this should not be more than the existing proposal, as there are libraries that make writing an exporter really simple. I can't comment much about adding a Cockpit plugin as I don't have much idea about that.

But again, as @siddharthvipul1 questioned earlier, doing any of the above might not directly impact Fedora
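
For a sense of scale, the core of such an exporter is rendering samples in the Prometheus text exposition format. The sketch below does that by hand with the standard library instead of an exporter library. The metric names, the name label and the shape of the input dictionary are all hypothetical, and collecting the numbers from podman is left out.

```python
def escape_label(value):
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(samples):
    """Render per-container samples as Prometheus text exposition format.

    `samples` maps a container name to a dict such as
    {"cpu_percent": 1.5, "mem_bytes": 1048576} (hypothetical layout).
    """
    metrics = [
        ("podman_container_cpu_percent", "cpu_percent",
         "CPU usage of the container in percent."),
        ("podman_container_memory_bytes", "mem_bytes",
         "Memory used by the container in bytes."),
    ]
    lines = []
    for metric_name, key, help_text in metrics:
        lines.append(f"# HELP {metric_name} {help_text}")
        lines.append(f"# TYPE {metric_name} gauge")
        for container, values in sorted(samples.items()):
            label = escape_label(container)
            lines.append(f'{metric_name}{{name="{label}"}} {values[key]}')
    return "\n".join(lines) + "\n"
```

Serving the returned string at a /metrics path over HTTP is what lets Prometheus scrape it. An exporter library would take care of that part as well as the formatting.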

Hi @t0xic0der @siddharthvipul1. I've contributed to Fedora in the past and after a dormant period, I am looking for ways to change that. I came across this issue and since I have some relevant experience running container workloads in production in the past, I want to add my 2 cents here.

Hey @t0xic0der , @siddharthvipul1 and Prakash!

I really liked the idea of having a monitoring tool with a web UI that tells you the state of your containers and don't we all love beautiful graphs that ease the pain of having to sift through metrics ourselves? :). That being said, I have a few points that I would like to bring into the discussion:

  • The above mentioned idea seems a bit too specific to a project. With it, we might be solving problems that are already solved by general purpose monitoring tools like Prometheus + Grafana.
    The beauty of having general purpose tools is that they can be plugged into any system to monitor it without having to learn a new tool every time or have a new dashboard to look at for every system in operation.

  • Podman like other container engines provides the podman stats command to monitor a container's stats via the CLI which usually works well for single developers.
    Most people running serious container workloads in containers use tools for orchestrating/managing their lifecycle and therefore as mentioned earlier, a general purpose monitoring tool works well for them as they can use the same tool to monitor the system itself, the cluster metrics, the container metrics, the metrics of the node that the containers are running on etc.

  • I second @icewreck's comment on adding a plugin for Cockpit, as it means that we would be adding/improving support for monitoring podman in an already existing general purpose tool. This can benefit everyone who uses it and it would improve their experience in monitoring podman deployed containers. I also think that adding support for something in a general purpose monitoring tool benefits the wider community who consume the tool more and adds more value in general.

I too side with @icewreck here. Cockpit is a large project and, from a student's point of view, it has a community; post GSoC, we still need numbers on how many folks can be retained. I see this going hand in hand with a Cockpit plugin implementation, since the student will be able to continue developing their skills in Cockpit or a relevant functional area.

  • Also, there are design questions I have about how the metrics(time series data) would be stored and retrieved in the system mentioned above. Most times, folks like me like to look at historical data to correlate with incidents and events to make sense of what happened. A good monitoring system provides you with reasonably long retention for metrics so that they can be analyzed to understand changes in the system being monitored. Therefore the question of metrics storage/retrieval remains unanswered.

I have the same questions. There are possible storage solutions (Influx), but the real question is this: the whole idea of a container is to keep it simple, and having a monitoring tool that the community won't want to ship out-of-the-box with Podman raises the question of how we tell the student that their code would probably exist in a repo without any participation from the community, or acceptance for that matter.

To summarize, I feel that a monitoring tool + dashboard specific to podman would be a bit too specific. The value added with this system to those who run their workloads in containers might not be greater in comparison to improving podman's monitoring experience with general purpose and non opinionated tools. I believe that doing so will benefit the wider community consuming it as it plays well with tools that they already know and use and helps them keep their monitoring all in one place rather than look at multiple tool specific dashboards.

Are there ways you think we can improve podman's monitoring to benefit the wider community as a whole? I can think of one way: building a Prometheus exporter for podman, which would enable Prometheus to scrape/monitor metrics of containers spun up by podman. This means that podman containers could then be monitored using Prometheus, and when someone runs podman containers on their cluster, they can consume the exporter to monitor their containers. The complexity of this should not be more than the existing proposal, as there are libraries that make writing an exporter really simple. I can't comment much about adding a Cockpit plugin as I don't have much idea about that.

But again, as @siddharthvipul1 questioned earlier, doing any of the above might not directly impact Fedora

I somewhat fail to understand whether you have already spoken to the Podman folks. For any GSoC project, the community needs to stand behind the project and come to an agreement about participation guidelines. Mostly, people start by talking to community members, building a roadmap with them and then finally having a group of 2-3 mentors who will help the mentee if they get stuck. Google has mandated that we have at least 2 mentors since last GSoC.
I am -1 to the idea for this being a GSoC project in general. However, @siddharthvipul1 I would like to know your final thoughts.

@t0xic0der, I will be glad to hear whether you have collaborated with the Podman folks and how they feel about shipping this out of the box or putting it somewhere where it will be visible to the community, as it's the community that will help maintain this project going forward.

Hey @prakashmishra1598, :smile:

The above mentioned idea seems a bit too specific to a project. With it, we might be solving problems that are already solved by general purpose monitoring tools like Prometheus + Grafana.
The beauty of having general purpose tools is that they can be plugged into any system to monitor it without having to learn a new tool every time or have a new dashboard to look at for every system in operation.

That is something I completely agree with. Why would someone want to pick up an alternative to Prometheus when it already exists and also performs better in most cases than what SuperVisor plans to achieve? SuperVisor might not be a complete replacement for Prometheus + Grafana, and it does not plan to be one, but it does aim to provide a simpler way to monitor performance across multiple containers which are running as a service.

Though the usability of it itself is a big and valid question. It would not be of much value as a standalone general purpose tool and there would not be folks using it unfortunately in the wake of alternatives. We would want to rescope this differently to enable people to actually take advantage of what's been made.

Podman like other container engines provides the podman stats command to monitor a container's stats via the CLI which usually works well for single developers.
Most people running serious container workloads in containers use tools for orchestrating/managing their lifecycle and therefore as mentioned earlier, a general purpose monitoring tool works well for them as they can use the same tool to monitor the system itself, the cluster metrics, the container metrics, the metrics of the node that the containers are running on etc.

Maintaining multiple containers and then popping open another terminal just to run podman stats sounds like yak-shaving and could seriously hamper the productivity of single developers. Instead, it would be preferable to keep a service running to monitor the performance. But yes, the project would take a lot of effort (and time) to be able to serve those maintaining bigger container workloads, as the project just isn't there yet and I could only accomplish so much in a given time. An influx of contributors could bring it to a state where it is that good and effective, which unfortunately isn't the case yet.

I second @icewreck's comment on adding a plugin for Cockpit, as it means that we would be adding/improving support for monitoring podman in an already existing general purpose tool. This can benefit everyone who uses it and it would improve their experience in monitoring podman deployed containers. I also think that adding support for something in a general purpose monitoring tool benefits the wider community who consume the tool more and adds more value in general.

It is a good idea, but I am concerned about having the participants learn how Cockpit works first, which might be a bit too much work for the one and a half months of the period. But if this is how the project would be able to serve the community, I do agree to this, as Cockpit is a tool with a lot of users and community folks behind it.

To summarize, I feel that a monitoring tool + dashboard specific to podman would be a bit too specific. The value added with this system to those who run their workloads in containers might not be greater in comparison to improving podman's monitoring experience with general purpose and non opinionated tools. I believe that doing so will benefit the wider community consuming it as it plays well with tools that they already know and use and helps them keep their monitoring all in one place rather than look at multiple tool specific dashboards.

Yes, including it as a feature of an already existing and utilized tool, rather than providing a standalone tool, might just be the way forward. We would not want the efforts of the GSoC participants to go unappreciated and unused by creating a tool that might take some time to build its userbase. Creating its own userbase could have been the experimental part of the process, but with too much risk to account for.

Are there ways you think we can improve podman's monitoring to benefit the wider community as a whole? I can think of one way: building a Prometheus exporter for podman, which would enable Prometheus to scrape/monitor metrics of containers spun up by podman. This means that podman containers could then be monitored using Prometheus, and when someone runs podman containers on their cluster, they can consume the exporter to monitor their containers. The complexity of this should not be more than the existing proposal, as there are libraries that make writing an exporter really simple. I can't comment much about adding a Cockpit plugin as I don't have much idea about that.

This is one of the great propositions that can both be innovative and garner usage from the community at the same time. I don't know if Prometheus is actually doing it already or not but if they are not, we would definitely want to pursue this.

Hey @sumantrom,

Retention of participants beyond the scope of GSoC has also been one of my top priorities (as evidenced by the conversations that have been made at https://pagure.io/mentored-projects/issue/85) and I too believe that introducing a general-purpose standalone tool would have fewer takers and support from the community than an addition to an already existing tool. If there is less utilization of the tool, the contributions made by the folks from the community and the participants from GSoC would go unappreciated giving them no incentive to stay back and contribute beyond the scope of GSoC - which we do not want.

Plus, I think I am missing the point here for I do not understand as to why the contributions made by the students remain just as a plain repository and not serve the community. There are packages like cockpit-podman which can be installed as a plugin to enable Podman preliminary monitoring support for Cockpit. We could use the same approach here as well, right? This can be packaged in the repositories and made available for the community when the project is complete. This can be of potential help as an addon like @icewreck, @prakashmishra1598 and you reckoned but the scoping is happening when we are inching close.

My bad. Maybe I should have made the proposition earlier.

I am -1 to the idea for this being a GSoC project in general. However, @siddharthvipul1 I would like to know your final thoughts.

I am -1 as well considering it's not been discussed with podman community and not relevant directly to the Fedora Project.
I like the idea and I would really love to see it a successful project and down the road, being a Fedora initiative (or a mentored project) with podman.
Please take the time and discuss with the wider community how to make it the best possible effort that works with podman (or containers in general).
I would like to thank @t0xic0der for opening this proposal and @icewreck, @prakashmishra1598 and @pingou for participating in this ticket.

Closing this as it's not accepted in this year's slot. Hoping to see it succeed and get back to it in future.

Metadata Update from @siddharthvipul1:
- Issue status updated to: Closed (was: Open)

3 years ago

Alright, Vipul.

I will continue developing the project by myself and with the new WebDev SIG, and keep you updated on the progress. :)

On Thu, Mar 4, 2021 at 11:10 AM Vipul Siddharth pagure@pagure.io wrote:

siddharthvipul1 added a new comment to an issue you are following:
Closing this as it's not accepted in this year's slot. Hoping to see it succeed and get back to it in future.

