COMP 3511: Lecture 7

Date: 2024-09-24 14:35:41

Reviewed:

Topic / Chapter:

summary

❓Questions

Notes

Operation on Processes (cont.)

Process creation
- fork()
  - parent & child: can both be added to the ready queue and run in parallel
    - behavior (e.g. execution order): cannot be determined from the code alone
  - parent & child: resume execution after fork() w/ the same PC
    - i.e. at the return of the fork() call
  - after
  - return values of fork()
    - each process: receives exactly one return value
    - -1: unsuccessful (to parent)
    - 0: successful (to child)
    - >0: successful (to parent)
      - returned value: child's PID
      - 👨‍🏫 losing track of PID / etc.: not discussed here
    - 👨‍🏫 this value: used to distinguish post-fork behavior of process
  - almost everything of the parent gets copied
    - memory / file descriptors / etc.
    - i.e. such a copy of everything: very costly & time consuming
      - the parent can't access the child's cloned memory
      - nor can the child access the parent's original memory
- UNIX fork
  - create & initialize process control block (PCB) in kernel
  - create new address space / allocate memory
  - initialize the address space w/ a copy of the parent's entire contents
    - time consuming!
  - inherit the execution context of the parent (e.g. open files)
    - i.e. the stack and other such information
  - inform the CPU scheduler: child process is ready to run
- parent & child comparison
  - after fork()

    | Duplicated | Different |
    | --- | --- |
    | address space | PID |
    | global & local variables | fork() return value |
    | current working dir | running time |
    | root dir | running state |
    | process resources | |
    | resource limits | |
    | program counter | |
    | ... | |
  - example
```
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main(void) {
    pid_t cpid, mypid;
    pid_t pid = getpid();
    printf("Parent pid: %d\n", (int)pid);
    fflush(stdout); // flush now so the child does not inherit (and re-print) buffered output
    cpid = fork(); // branch: both processes continue from here
    if (cpid > 0) {
        mypid = getpid(); // parent: fork() returned the child's PID
        printf("[%d] parent of [%d]\n", (int)mypid, (int)cpid);
    } else if (cpid == 0) {
        mypid = getpid(); // child: fork() returned 0
        printf("[%d] child\n", (int)mypid);
    } else {
        perror("Fork failed"); // fork() returned -1: no child was created
        exit(1);
    }
    return 0;
}
```
  - note that
    - output printed after fork() (i.e. inside the parent / child branches) has an undetermined order
    - the order depends on the CPU scheduler
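The point that parent and child cannot touch each other's memory after fork() can be checked with a small sketch. The helper name `fork_isolation` and the global `counter` are made up for illustration; the child reports its value through its exit status, which only works for values in 0..255.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int counter = 10; // lives in the address space that fork() duplicates

// Hypothetical helper: returns 0 on success, -1 on error.
// *in_child  = counter as seen by the child after the child's update
// *in_parent = counter as seen by the parent after the child has finished
int fork_isolation(int *in_child, int *in_parent) {
    pid_t cpid = fork();
    if (cpid == -1) {
        return -1;               // fork failed, no child exists
    }
    if (cpid == 0) {
        counter += 5;            // changes the child's private copy only
        _exit(counter);          // report the child's value via exit status
    }
    int status;
    if (waitpid(cpid, &status, 0) != cpid || !WIFEXITED(status)) {
        return -1;
    }
    *in_child = WEXITSTATUS(status);
    *in_parent = counter;        // the parent's copy was never modified
    return 0;
}
```

Calling `fork_isolation(&c, &p)` from main() should leave `c` at 15 and `p` at 10: the child's write never reaches the parent.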
system calls
- exec(), execlp(): syscall to change program in current process
  - creates a new process image from a regular executable file
  - 👨‍🎓 ~= j (jump) to another program?
    - ⭐ no return to the original program on success!
- wait(): syscall to wait for child process to finish
  - or, in general: wait for an event
  - 👨‍🏫 enter waiting state & give up CPU
    - 👨‍🎓 if the parent kept occupying the CPU: a single-core machine couldn't run the child while the parent sits in wait()
    - ⭐ very very important!
- exit(): syscall to terminate current process
  - frees all resources
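- signal(): syscall to send notification to another process
  - in POSIX terms: kill() sends the signal; signal() / sigaction() installs the handler that receives it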
- implementing a shell
```
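char *prog, **args;
pid_t child_pid;

// readAndParseCmdLine(): helper assumed by the lecture, not defined here
while (readAndParseCmdLine(&prog, &args)) {
    child_pid = fork();
    if (child_pid == 0) {
        execvp(prog, args); // run command in child
        perror("exec");     // reached only if exec failed
        _exit(127);
    } else {
        waitpid(child_pid, NULL, 0); // parent waits for the child to finish
        // then loop back to read the next command
    }
}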
```
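The fork / exec / wait pattern of the shell loop can be isolated into one function. This is a minimal sketch, not the lecture's code: `run_command` is a hypothetical helper, and it uses the real POSIX calls `execvp()` and `waitpid()` in place of the generic exec() and wait() named above.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs argv[0] with the NULL-terminated argument
// vector argv in a child process. Returns the child's exit status,
// or -1 if fork/waitpid failed or the child did not exit normally.
int run_command(char *const argv[]) {
    pid_t child_pid = fork();
    if (child_pid == -1) {
        perror("fork");
        return -1;
    }
    if (child_pid == 0) {
        execvp(argv[0], argv); // replaces the child's program image
        perror("execvp");      // reached only if exec failed
        _exit(127);            // conventional "command not found" status
    }
    int status;
    if (waitpid(child_pid, &status, 0) == -1) { // parent blocks, giving up the CPU
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with `char *cmd[] = {"sh", "-c", "exit 3", NULL};` the call `run_command(cmd)` returns 3. A shell would call this once per parsed command line.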
- fork tracing
  - to trace forks in loops, try to expand the loop
    - e.g. for (int i=0; i < 10; ++i) fork(); into fork(); fork(); fork();...
  - fork diagrams (credit: IA Peter)


    ---
    title: fork()
    ---
    flowchart LR
        p0((p0)) --> f1{fork 1}
        f1 --> p0_2((p0))
        f1 --0--> p1((p1))

    ---
    title: fork(); fork()
    ---
    flowchart LR
        p0((p0)) --> f1{fork 1}
        f1 --> p0_2((p0))
        f1 --0--> p1((p1))
        p0_2 --> f2{fork 2}
        f2 --> p0_3((p0))
        f2 --0--> p2((p2))
        p1 --> f3{fork 3}
        f3 --> p1_2((p1))
        f3 --0--> p3((p3))

    ---
    title: fork()&&fork()
    ---
    flowchart LR
        p0((p0)) --> f1{fork 1}
        f1 --> p0_2((p0))
        f1 --0--> p1((p1))
        p0_2 --> f2{fork 2}
        f2 --> p0_3((p0))
        f2 --0--> p2((p2))

    ---
    title: fork()||fork()
    ---
    flowchart LR
        p0((p0)) --> f1{fork 1}
        f1 --> p0_2((p0))
        f1 --0--> p1((p1))
        p1 --> f2{fork 2}
        f2 --> p1_2((p1))
        f2 --0--> p2((p2))
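The process counts implied by the diagrams can be checked by letting every resulting process report once through a shared pipe. This is a sketch under assumptions: `count_processes` and the three `body` functions are hypothetical names, and every fork() is assumed to succeed (a failed fork() returns -1, which is truthy and would change the && / || cases).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

void loop3(void)     { for (int i = 0; i < 3; ++i) fork(); } // fork(); fork(); fork();
void and_forks(void) { (void)(fork() && fork()); } // child gets 0 and skips the 2nd fork
void or_forks(void)  { (void)(fork() || fork()); } // parent gets >0 and skips the 2nd fork

// Hypothetical helper: runs body() and returns how many processes exist
// afterwards (including the caller), or -1 on error.
int count_processes(void (*body)(void)) {
    int fds[2];
    if (pipe(fds) == -1) return -1;
    pid_t root = getpid();

    body();                          // one process may become many here

    char mark = 'x';
    if (write(fds[1], &mark, 1) != 1) _exit(1); // every process reports once
    while (wait(NULL) > 0) {}        // reap own children before leaving
    if (getpid() != root) _exit(0);  // descendants stop here

    close(fds[1]);                   // all writers are gone: read() will hit EOF
    int n = 0;
    while (read(fds[0], &mark, 1) == 1) n++;
    close(fds[0]);
    return n;
}
```

With these bodies, `count_processes(loop3)` gives 8 (2^3), while `count_processes(and_forks)` and `count_processes(or_forks)` both give 3, matching the fork()&&fork() and fork()||fork() diagrams.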


Final notes on fork
- process creation in Unix: unique (no pun intended)
  - most OSes: create a process in a new address space, read in an executable file, and execute it
  - Unix: separates this into fork() and exec()
- Linux: fork() is implemented via copy-on-write
  - since usually: we don't need the entire copy
    - thus we can delay / avoid copying the data
      - until the content is changed / written to
      - improves speed a lot!
  - child process: initially points to the parent process's address space
- Linux also implements fork() via clone() (more general)
  - clone(): takes a set of flags that specify which resources are shared by parent & child
Process termination
- termination
  - after process executes the last statement
    - exit syscall: used to ask OS to delete it
- some OSes: do not allow a child to exist if its parent has terminated
  - i.e. all children are terminated once the parent terminates
    - cascading termination by the OS
- process termination:
  - deallocation: must involve the OS
  - e.g. kernel data, etc.: cannot be accessed / modified by the user application
- concepts
  - zombie process: process has terminated, but its parent has not called wait() yet
    - what remains: its entry in the process table, its PID, and its PCB
    - 👨‍🏫 every process enters this state, at least for a moment after termination
      - nothing wrong. It's just "we are almost over"
    - but zombies can accumulate, and it was a problem back then
      - because memory restrictions were very tight ~30 years ago
    - once the parent calls wait(): the zombie's PID and other corresponding entries are released
    - such design: by calling wait(), the parent acknowledges the child's termination to the OS (and collects its exit status)
      - 👨‍🏫 not the best design, nor the only. but the design of Unix
  - if parent terminates without invoking wait()
    - child becomes an orphan
    - without cascading, the process might still be runnable
    - or become a zombie
      - which will never be released, as no parent exists
    - thus: all processes (except root or so): must have a parent
      - (or kill them all using cascading)
      - 👨‍🎓 can we assign
  - 👨‍🏫 this is the design chosen by UNIX
    - 👨‍🎓 can't we make the child call the OS and take care of itself?
      - maybe we can, but this is a choice / trade-off made by Unix
    - 👨‍🏫&👨‍🎓 parent has ability to track all its children, but it is not required.

COMP 3511: Operating Systems