#1411 Refactor ModuleBuild.current_batch to filter using SQL instead of Python
Opened 4 years ago by mprahl. Modified 4 years ago

ModuleBuild.current_batch queries for all component builds associated with the module and then filters the component builds using Python. This is inefficient and should just filter using SQL instead.


Probably this refactor may not be applicable for the use cases in MBS.

module_build = db.relationship("ModuleBuild", backref="component_builds", lazy=False)

component_builds is defined in ComponentBuild as a backref with lazy=False. IIRC, that means SQLAlchemy will load all associated component builds eagerly at once, so after the first time access of module_build.component_builds, the loaded component builds will be accessible from memory without any other additional query from database.

Let's see the use cases of module_build.component_builds in MBS. Once component builds are recorded into database, no component build will be removed from module build and no new component build will be added. So, every time access to module_build.component_builds, code will get same set of builds and query database only once to get all component builds. To optimize the access to ModuleBuild.current_batch() a little bit, we can simply save the result to a variable and reuse it:

current_batch = module_build.current_batch()
...
failed_components = [... for b in current_batch]
completed_components = [... for b in current_batch]

If we refactor current_batch by querying database, that would mean every call to ModuleBuild.current_batch will query database and cannot use the SQLAlchemy "cache" mechanism.

So, looping component builds inside memory should be fast enough than querying database every time.

By the way, do we have some statistics of component builds? So far, what is the maximum number of component builds a module build has?

@cqi you are correct. I had forgotten that SQLAlchemy caches relationship queries in the session until a commit occurs. I tried locally, and it seems that SQLAlchemy only eagerly loads ComponentBuild.module_build. The backref is still lazy, which is the correct behavior here, but I also wonder if we should remove lazy=False on ComponentBuild.module_build. That relationship is probably not always used when a component build is queried for.

I still think it is wasteful since some batches can contain a single component where as another batch can include hundreds of components. One example is the ffsend module, which has module-build-macros in the first batch, 196 components in the second batch, and 1 component in the third batch. That means for batch 1 and 3, 197 components are being pulled down for no reason.

@mprahl Make sense to me. Let's do it.

Login to comment on this ticket.

Metadata