# IPC Notes To support the [async](https://docs.kde.org/trunk5/en/kdesrc-build/kdesrc-build/conf-options-table.html#conf-async) parameter, which permits network updates to be run in parallel with the build process, kdesrc-build implements some limited inter-process communication (IPC). In reality there are 3 separate long-term processes during an async build: +-----------------+ +---------------+ +------------+ | | | | | | | main / build <------ monitor <--------- update | | | ^ | | ^ | | +--------^--------+ | +---------------+ | +------------+ | | | | $ipc $updaterToMonitorIPC | | +--------v--------+ | | user ------->| TTY | | | +-----------------+ - 1. The main (build) process - 2. The update process, normally squelched - 3. A "monitor" process, connected to the other two ## Why IPC is necessary IPC is used to carry information about the status of build updates back to the main process. Over the years this has evolved to include, using a custom app-specific protocol: - 1. Success/failure codes (per-module) - 2. Whether the module was even attempted to be updated at all - 3. Failure codes (overall) - 4. Log messages for a module (normally squelched during update) - 5. Changes to persistent options (must be forwarded to main proc to be persisted) - 6. "Post build" messages, which must be shown by the main thread just before exit. You could in principle do most of this by doing something like serializing changes into a file after each module update and then reading the results from the file in the main thread using file locking or similar. However it seemed simpler to ferry the information over IPC pipes instead. ## How it works, today At this stage, the IPC data flow is mediated by [ksb::IPC](https://metacpan.org/pod/ksb%3A%3AIPC), which is an interface class with a couple of methods meant to be reimplemented by subclasses, and which implements the IPC API on top of those subclass-defined methods. The user code in kdesrc-build is required to create the IPC object before forking using ["fork" in perlfunc](https://metacpan.org/pod/perlfunc#fork). The parent then declares that it will be the receiver and the child declared that it will be the sender. ### Monitor process Early experiments used only the two build (main) and update processes. However this quickly ran into issues trying to keep the main process U/I in sync. During a build there was no easy way to monitor the build child's output along with the update child's, and the update child would block if it tried to write too much output to the build process if the build process was itself blocked waiting for a build. The solution was to reinvent a message queue, poorly, for much the same reason you would use a message queue today in a distributed architecture. It simplified the problem for build and update and allowed the update process to send at will without blocking, and likewise the build thread did not have to worry about blocking by trying to read from the child unless it was safe to wait. The monitor simply uses a second [ksb::IPC](https://metacpan.org/pod/ksb%3A%3AIPC) object to connect to the update child process, and feeds messages it receives from the child to the parent, in the order received and exactly once. ### Ordering the update and build To keep the build from proceeding before the update has completed, the IPC class supports methods to wait for the module to complete if it hasn't already. By their nature these are blocking methods, ultimately these block waiting on I/O from the monitor. This means that the build process will block forever if the update thread forgets to send the right message. The update process should build modules in the same order the build process will expect them, though this won't cause the build to block forever if it does not. ### Squelching log messages The various logging methods all output the message immediately. This is problematic in the context of concurrent build and update processes, especially since most log messages do not duplicate the name of the module (since it's normally nearby in the U/I output). We resolve this tension by having the update process pass the IPC object into ksb::Debug, which will then feed the output to the IPC handle instead of STDOUT/STDERR. In the build process, as log messages are read in from the update process, they are stored and then printed out once it comes time to build the module. This system only works because the update and build processes are separate processes. The 'modern' scheme I'm building towards does not require the existence of a separate update process at all, but we may still retain it to make squelching work. ### Commands that do not require IPC The log\_command() call in [ksb::Util](https://metacpan.org/pod/ksb%3A%3AUtil) also uses a fork-based construct to read I/O from a child (to redirect output to the log file and/or to a callback). It is safe to use this function from the update thread, as long as we are disciplined about using unique names for each log-file. The update process will set the `latest` and `error.log` symlinks as necessary, and the main process will find `error.log` where it expects to when making the report at the end. Note that this works only if the base log directory for the module is created in [ksb::BuildContext](https://metacpan.org/pod/ksb%3A%3ABuildContext) before the fork occurs! ## How it will work, tomorrow Looking back, the IPC stuff I coded isn't as bad as I remember it to be. However there are still good reasons to work to replace it with some of the superior options supported by [Mojolicious](https://docs.mojolicious.org/). - Easier use of the Web and APIs A lot of this work was kicked off based on conversations at Akademy 2018, where people asked about a way to track the progress of a kdesrc-build build using APIs or RPC. kdesrc-build isn't setup today to host a web server interface **during** the build, and the [ksb::IPC](https://metacpan.org/pod/ksb%3A%3AIPC) stuff isn't helping on that front. But this is what Mojolicious is built for. However not only would it be good to have kdesrc-build be able to feed information to e.g. a running Plasma applet, but it would also be good for kdesrc-build to be able to make API calls to KDE infrastructure, for things like bug management, creating new work branches in Gitlab, and so on. For all these things it will be greatly helpful to have the Web-native capability and event loop provided by Mojolicious. - Improved API In addition, Mojolicious's concurrent code is just a simpler API. It doesn't hurt that their "promise"-based API is the same API you'd find in JavaScript, including browsers and Node.js ecosystems. Unfortunately a lot of this is a fair bit different than what kdesrc-build has been built to date. But I think I understand how to port it over time without breaking everything. - Improved code correctness Even though we ferry a lot of information from the update process to the main process, there are still information types we do not that might be considered bugs. For example, [ksb::OptionsBase](https://metacpan.org/pod/ksb%3A%3AOptionsBase)'s `getOption`/`setOption` methods (which power the same in [ksb::Module](https://metacpan.org/pod/ksb%3A%3AModule) and [ksb::BuildContext](https://metacpan.org/pod/ksb%3A%3ABuildContext) do not make any attempt to forward changes to the options dictionaries in the update process back to the main process. Mojolicious's "subprocess" feature would allow us to move the blocking portions of the update command into a subprocess, while allowing the business logic to be retained in the main process. This way there is only one place to call `getOption`/`setOption` from, simplifying how the information flows. - Fewer bugs Mojolicious has quite a few more users testing their code base compared to my custom IPC stuff. - Finer granularity Ultimately, Mojolicious would permit the main process to split the work of the build process up to an even finer degree than "module update" or "module build". This will allow the operating system a better opportunity to let kdesrc-build use whatever is available between disk I/O, CPU, or network I/O. ### The plan Ultimately, the plan is to introduce some porting-aid functions to [ksb::Util](https://metacpan.org/pod/ksb%3A%3AUtil) and use these to slowly port calls to `log_command()` to split the function into two parts: - 1. Generate a promise-based logging command - 2. Wait on the promise for the result By splitting the work into two steps we can avoid changing too much at once, while allowing for the slow merger of promise-based code into a chain of promises that can be handled at once using standard [Mojo::Promise](https://metacpan.org/pod/Mojo%3A%3APromise) methods. The significant limitation to this is that we **cannot call `promise->wait`** recursively! #### The recursive promise->wait issue The issue is that `promise->wait` requires that the I/O loop **not** be running, for much the same reason that we avoid nested event loops with Qt-based GUI programs. `wait` is simply a convenience method to run the event loop just long enough for the given promise to resolve or reject. If multiple promises are trying to `wait` at the same time in the presence of truly concurrent code then nothing good can happen. It might be possible to do this safely with structured concurrency (i.e. if the second promise being awaited were _guaranteed_ to always complete before the first promise then it should not be an issue). There might be a way to create a new [Mojo::IOLoop](https://metacpan.org/pod/Mojo%3A%3AIOLoop) to use with the inner promise so that we can safely wait on it. But it's better to avoid it entirely. That's why the porting methods I've written check to see if the I/O loop is running and, if it is, aborts the program. #### Implication of recursive waiting problems Since we can't wait recursively on promises, we generally need to port from blocking code to promises from leaf function calls on up. As an example if a call tree looks like: + runAllTasks \-+ handle_update \-+ ksb::Module::update \-+ ksb::Updater::Git::updateInternal \-+ ksb::Updater::Git::updateExistingClone +-+ ksb::Updater::Git::_auxFunction1 | \-- ksb::Updater::Git::_nestedAuxFunction1 \---- ksb::Updater::Git::_nestedAuxFunction2 We might first port the "nested aux functions", making them create promises and then immediately await them so that they remain blocking. However, once we port their caller `_auxFunction1`, and make `_auxFunction1` create a promise and then await it to remain blocking, we **must** get rid of the blocking calls within the "nested aux functions", and have them deal only in promises. This will apply so on up the chain. At each level of the call tree that we want to block on a promise for each of porting, **all child calls** must be promise-native with no blocking at all! This can require rewriting complicated functions that have several "await\_result" calls to instead return a chain of promises.