DESIGN:
=======

The KIO framework uses workers (separate processes) that handle a given
protocol. Launching those workers is taken care of by the kdeinit/klauncher
tandem, which are notified via DBus.
(TODO: update for the klauncher removal, also below)

Connection is the most low-level class, the one that encapsulates the pipe.

WorkerInterface is the main class for transferring anything to the worker.
Worker, which inherits WorkerInterface, is the subclass that Job should handle.

A worker inherits WorkerBase, which is the other half of WorkerInterface.

The scheduling is supposed to be on a two-level basis. One is in the daemon
and one is in the application. The daemon one (as opposed to the holy one? :)
determines how many workers this app is allowed to have open, and also assigns
tasks to the workers that already exist. The application still has some kind
of scheduler, but it should be a lot simpler as it doesn't have to decide
anything besides which task goes to which pool of workers (related to the
protocol/host/user/port) and move tasks around. Currently a design study
(to give it a flattering name) lives in scheduler.cpp, but on the application
side. This is just to test other things like recursive jobs and signals/slots
within WorkerInterface. If someone feels brave, the scheduler is yours!

On a second thought: at the daemon side there is no real scheduler, but a
pool of workers. So what we need is some kind of load calculation in the
application's scheduler and load balancing in the daemon.

A third thought: Maybe the daemon can just take care of a number of 'unused'
workers. When an application needs a worker, it can request it from the
daemon. The application will get one, either from the pool of unused workers,
or a new one will be created. This keeps things simple at the daemon level.
It is up to the application to give the workers back to the daemon. The
scheduler in the application must take care not to request too many workers
and could implement priorities.

Thoughts on usage:
* Typically a single worker type is used exclusively in one application.
  E.g. http workers are used in a web browser, POP3 workers in a mail
  program.
* Sometimes a single program can have multiple roles. E.g. konqueror is both
  a web browser and a file manager. As a web browser it primarily uses http
  workers, as a file manager file workers.
* Selecting a link in konqueror: konqueror does a partial download of the
  file to check the MIME type (right??), then the application is started
  which downloads the complete file. In this case it should be possible to
  pass the worker which did the partial download from konqueror to the
  application, where it can do the complete download.

Do we need a hard limit on the number of workers per host? It seems so,
because some protocols tend to fail if two workers run in parallel (e.g.
POP3). This has to be implemented in the daemon, because only at the daemon
level are all the workers known. As a consequence, workers must be returned
to the daemon before connecting to another host. (Returning the workers to
the daemon after every job is not strictly needed and only causes extra
overhead.)

Instead of actually returning the worker to the daemon, it could be enough to
ask 'recycling permission' from the daemon: the application asks the daemon
whether it is ok to use a worker for another host. The daemon can then update
its administration of which worker is connected to which host.

The above does of course not apply to hostless protocols (like file); they
will never change host.
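To make the pooling and 'recycling permission' ideas above a bit more
concrete, here is a small hypothetical sketch of the application-side
bookkeeping. None of these names (AppScheduler, WorkerKey, the askDaemon*()
calls) exist in KIO; they are invented purely for illustration of the design
being discussed:

    // Hypothetical sketch only -- not part of the KIO API.
    #include <QHash>
    #include <QList>
    #include <QString>

    class Worker { };  // stand-in for the real Worker class

    struct WorkerKey {
        QString protocol, host, user;
        int port = 0;
        bool operator==(const WorkerKey &o) const
        {
            return protocol == o.protocol && host == o.host
                && user == o.user && port == o.port;
        }
    };

    inline size_t qHash(const WorkerKey &k, size_t seed = 0)
    {
        return qHash(k.protocol, seed) ^ qHash(k.host, seed)
             ^ qHash(k.user, seed) ^ size_t(k.port);
    }

    class AppScheduler
    {
    public:
        // Ask for a worker for this (protocol, host, user, port) pool.
        // Returns nullptr if the per-host limit is reached; the caller
        // then queues the task instead of opening another connection.
        Worker *requestWorker(const WorkerKey &key, int limitForHost)
        {
            QList<Worker *> &pool = m_pools[key];
            if (pool.size() >= limitForHost)
                return nullptr;
            Worker *w = askDaemonForWorker(key);  // idle worker, or a new one
            pool.append(w);
            return w;
        }

        // 'Recycling permission': before pointing a worker at another host,
        // ask the daemon so its per-host bookkeeping stays correct.
        bool recycleWorker(Worker *w, const WorkerKey &from, const WorkerKey &to)
        {
            if (!askDaemonForRecycling(w, to))
                return false;
            m_pools[from].removeOne(w);
            m_pools[to].append(w);
            return true;
        }

    private:
        // Stand-ins for requests to the daemon (e.g. over DBus).
        Worker *askDaemonForWorker(const WorkerKey &) { return new Worker; }
        bool askDaemonForRecycling(Worker *, const WorkerKey &) { return true; }

        QHash<WorkerKey, QList<Worker *>> m_pools;
    };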
Apart from a 'hard limit' on the number of workers per host we can have a
'soft limit'. E.g. upon connection to a HTTP 1.1 server, the web server tells
the worker the number of parallel connections allowed. The simplest solution
seems to be to treat 'soft limits' the same as 'hard limits'. This means that
the worker has to communicate the 'soft limit' to the daemon.

Jobs using multiple workers:
If a job needs multiple workers in parallel (e.g. copying a file from a web
server to an ftp server, or browsing a tar file on an ftp site) we must make
sure to request all workers from the daemon together, since otherwise there
is a risk of deadlock. (If two applications both need a 'pop3' and an 'ftp'
worker for a single job, and only a single worker per host is allowed for
pop3 and ftp, we must prevent giving the single pop3 worker to application #1
and the single ftp worker to application #2. Both applications would then
wait till the end of time for the other worker so that they can start the
job. This is a quite unlikely situation, but nevertheless possible.)

File Operations:
listRecursive is implemented as listDir plus checking whether the result
contains a directory. If it does, another listDir job is issued. As listDir
is a readonly operation it fails when a directory isn't readable ... but the
main job goes on and discards the error, because bIgnoreSubJobsError is true,
which is what we want (David)

del is implemented as listRecursive, removing all files and removing all
empty directories. This basically means that if one directory isn't readable
we don't remove it, as listRecursive didn't find it. But del will later on
try to remove its parent directory and fail. Yet there are cases where it
would be possible to delete the dir by chmod'ing it first. On the other hand,
del("/") shouldn't list the whole file system and remove all user-owned files
just to find out it can't remove everything else (this basically means we
have to take care of the things we can remove before we try) ... Well,
rm -rf / refuses to do anything, so we should just do the same: use a
listRecursive with bIgnoreSubJobsError = false. If anything can't be removed,
we just abort. (David)

... My concern was more that the fact that we can list / doesn't mean we can
remove it. So we shouldn't remove everything we could list without checking
that we can. But then the question arises: how do we check whether we can
remove it? (Stephan)

... I was wrong: rm -rf /, even as a user, lists everything and removes
everything it can (don't try this at home!). I don't think we can do better,
unless we add a protocol-dependent "canDelete(path)", which is _really_ not
easy to implement, whatever the protocol. (David)

Lib docu
========

mkdir: ...
rmdir: ...
chmod: ...
special: ...
stat: ...

get is implemented as TransferJob. Clients get 'data' signals with the data.
A data block of zero size indicates end of data (EOD).

put is implemented as TransferJob. Clients have to connect to the 'dataReq'
signal. The worker will call you when it needs your data.

mimetype: ...

file_copy: copies a single file, either using CMD_COPY if the worker supports
that or get & put otherwise.

file_move: moves a single file, either using CMD_RENAME if the worker
supports that, CMD_COPY + del otherwise, or eventually get & put & del.

file_delete: deletes a single file.
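As a client-side illustration of the 'get' case described above, here is a
minimal sketch (assuming the current KIO client API; exact headers may vary
between framework versions, and error handling is trimmed) that connects to
the 'data' signal and treats an empty block as EOD:

    #include <KIO/TransferJob>
    #include <KJob>
    #include <QByteArray>
    #include <QObject>
    #include <QUrl>

    void startDownload(const QUrl &url, QObject *context)
    {
        // 'get' is a TransferJob; each data() signal carries one block.
        KIO::TransferJob *job = KIO::get(url, KIO::NoReload, KIO::HideProgressInfo);

        QObject::connect(job, &KIO::TransferJob::data, context,
                         [](KIO::Job *, const QByteArray &block) {
                             if (block.isEmpty()) {
                                 // zero-size block: end of data (EOD)
                                 return;
                             }
                             // process 'block' here...
                         });

        QObject::connect(job, &KJob::result, context, [](KJob *finished) {
            if (finished->error()) {
                // report finished->errorString() to the user
            }
        });
    }

For 'put' the direction is reversed: the client connects to the 'dataReq'
signal and fills the provided QByteArray, with an empty array signalling EOD.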
copy: copies a file or directory, recursively if the latter.

move: moves a file or directory, recursively if the latter.

del: deletes a file or directory, recursively if the latter.

Resuming
--------

If a .part file exists, KIO offers to resume the download. This requires
negotiation between the worker that reads (handled by the get job) and the
worker that writes (handled by the put job).

Here's how the negotiation goes. (PJ = put-job, GJ = get-job)

PJ can't resume:
PJ-->app: canResume(0) (emitted by dataReq)
GJ-->app: data()
PJ-->app: dataReq()
app->PJ: data()

PJ can resume but GJ can't resume:
PJ-->app: canResume(xx)
app->GJ: start job with "resume=xxx" metadata.
GJ-->app: data()
PJ-->app: dataReq()
app->PJ: data()

PJ can resume and GJ can resume:
PJ-->app: canResume(xx)
app->GJ: start job with "resume=xxx" metadata.
GJ-->app: canResume(xx)
GJ-->app: data()
PJ-->app: dataReq()
app->PJ: canResume(xx)
app->PJ: data()

So when the worker supports resume for "put" it has to check, after the first
dataReq(), whether it has got a canResume() back from the app. If it did, it
must resume. Otherwise it must start from 0. (A rough application-side sketch
of this negotiation appears after the Protocols section below.)

Protocols
=========

Most KIO workers (but not all) implement internet protocols. In this case,
the worker name matches the URI scheme for the protocol. A list of such URI
schemes can be found here, as per RFC 4395:
https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
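Relating to the Resuming section above: the following is a rough,
non-authoritative sketch of what the application side of that negotiation
looks like with the current KIO client API. KIO::FileCopyJob does this wiring
internally, so applications normally just use file_copy instead; feeding GJ's
data() blocks into PJ's dataReq() is omitted here for brevity.

    #include <KIO/TransferJob>
    #include <QObject>
    #include <QString>
    #include <QUrl>

    void copyWithResume(const QUrl &src, const QUrl &dest, QObject *context)
    {
        // PJ: the put job writing the destination; KIO::Resume asks it to
        // check for an existing .part file and emit canResume(offset).
        KIO::TransferJob *putJob =
            KIO::put(dest, -1, KIO::Resume | KIO::HideProgressInfo);

        QObject::connect(putJob, &KIO::TransferJob::canResume, context,
                         [src](KIO::Job *, KIO::filesize_t offset) {
                             // GJ: started only once we know where to resume from.
                             KIO::TransferJob *getJob =
                                 KIO::get(src, KIO::NoReload, KIO::HideProgressInfo);
                             if (offset > 0) {
                                 // "app->GJ: start job with resume=xxx metadata"
                                 getJob->addMetaData(QStringLiteral("resume"),
                                                     QString::number(offset));
                             }
                             // From here on, each GJ data() block is handed to PJ
                             // on its next dataReq() (buffering omitted in this
                             // sketch).
                         });
    }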