操作系统之进程与线程

现代操作系统可以在同一时间做不同的事情，比如一边浏览网页，一边听音乐，这就是因为实现了多进程或者是多线程，在单核 CPU 中，CPU 在不同进程中来回切换，所以严格的说，在单核 CPU 中任意时刻，只有一个进程在运行，如果是多核的 CPU, 则每个 CPU 都可以运行一个进程，这样就真正实现了并行。

进程模型

计算机中的所有可以运行的软件组成一系列的进程，每个进程就是一个可执行的程序，在这个进程中有自己独立的内存，寄存器，变量，有自己虚拟的 CPU, 真正的 CPU 在进程中来回切换。

进程创建

导致进程被创建的四个场景

系统初始化
通过运行程序执行创建进程的系统调用
用户请求创建新进程
一堆作业的初始化

Linux 下的进程创建

首先来一段简单的进程创建代码

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20


#include <unistd.h>
#include <stdio.h>

int main() {
	pid_t pid;
	int count = 0;
	pid = fork();
	if(pid < 0) {
		fprintf(stderr, "Error: fork() failed!!!");
	}
	else if(pid == 0) {
		printf("Child Process, pid: %d, count: %d, count address: %p\n", getpid(), count, &count);
		count++;
	}
	else {
		printf("Parent Process, pid: %d, count: %d, count address: %p\n", getpid(), count, &count);
	}
	printf("count: %d, count address: %p\n", count, &count);
	return 0;
}

下面是输出结果：

1
2
3
4


Parent Process, pid: 2622, count: 0, count address: 0x7ffe909453d0
count: 0, count address: 0x7ffe909453d0
Child Process, pid: 2623, count: 0, count address: 0x7ffe909453d0
count: 1, count address: 0x7ffe909453d0

从输出结果中可以发现 count 这个变量的地址在子进程和父进程中是完全一样的，但是子进程中改变 count 的值，父进程中 count 的值却并没有变，说明子进程和父进程中的 count 应该不是同一个，那么为什么地址相同，值却不同呢？原来每个进程都有各自独立的逻辑地址，子进程完全复制了父进程，拥有了和父进程完全一样的变量，地址，状态，但是确是在自己所属的那个物理内存中运行的，尽管两个进程长得一模一样，但是实际上已经是相互独立运行的进程了。

进程退出

进程退出的场景

正常退出
错误退出
内存读取错误退出
被其他进程杀死

进程的层次结构

父进程创建子进程，子进程又可以创建新的子进程，形成树状结构。

下面这段代码是循环 fork 的一个例子：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10


#include <stdio.h>
#include <unistd.h>

int main() {
        for(int i = 0; i < 3; i++) {
                fork();
        }
        sleep(30);
        return 0;
}

通过 linux 下的 pstree 命令查看进程

1

pstree -p | grep a.out

输出结果：

1
2
3
4


          |-konsole(2517)-+-zsh(2523)---a.out(3769)-+-a.out(3770)-+-a.out(3772)---a.out(3776)
           |               |                         |             `-a.out(3775)
           |               |                         |-a.out(3771)---a.out(3773)
           |               |                         `-a.out(3774)

进程状态

进程状态由下图表示：

总共可分为五个状态，创建，就绪，运行，等待（阻塞），退出，一般会在就绪态，等待态，运行态来回切换。

进程的实现

操作系统通过维护一张进程表 (Process Tabel or Process Control Blocks), 在这张表中记录了进程 id, 寄存器, 栈指针, 进程状态, 以及内存状态, 文件状态等一系列的信息，这张表记录了程序运行在某一时刻的现场，当某一进程要运行时，就通过这张表恢复现场，继续接着上次保存的地方运行。

线程

线程也叫做轻量级进程，与进程类似，也是为了实现并行, 线程运行在进程中，所以线程可以有共享的内存, 共享数据, 这样就避免了进程创建所需要的额外开销，因此线程相对于进程来说更加高效

线程实现

用户级线程

用户级线程由用户（开发者）在自己的程序中模拟出并行的状态

优点：效率高，实现方便
缺点：一旦某个线程阻塞讲导致整个进程阻塞

系统级线程

由操作系统维护一张线程表，和进程类似，只是数据实现了共享

优点：可以真正利用多核 CPU 进行并行计算
缺点：创建线程所需的开销大，因此效率低

线程创建

linux 下的线程创建：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24


#include <stdio.h>
#include <pthread.h>

void* thread(void *arg) {
        printf("This is a thread and arg = %d.\n", *(int*)arg);
        *(int*)arg = 0;
        return arg;
}

int main(int argc, char *argv[]) {
        pthread_t th;
        int ret;
        int arg = 10;
        int *thread_ret = NULL;
        ret = pthread_create(&th, NULL, thread, &arg);
        if(ret != 0) {
                printf("Create thread failed!\n");
                return -1;
        }
        printf("This is the main thread.\n");
        pthread_join(th, (void**)&thread_ret);//等待子线程退出
        printf("thread_ret = %d.\n", *thread_ret);
        return 0;
}

输出：

1
2
3


This is the main thread.
This is a thread and arg = 10.
thread_ret = 0.

我们通过子线程改变了父线程中的值，说明线程中的数据是共享的，而不是像进程一样相互独立。

进程同步

当两个进程或线程向同一个共享数据进行操作时，可能会对共享数据产生破坏下面是一段多线程同时操作同一个变量的程序

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31


#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

unsigned long value;
unsigned long parent_value;
unsigned long child_value;

void* adder(void *param);

int main() {
        pthread_t tid_adder;

        pthread_create(&tid_adder, NULL, adder, NULL);

        for(;;) {
                value++;
                parent_value++;
                if(value % 1000000 == 0) {
                        printf("V: %lu, P: %lu, C: %lu, ", value, parent_value, child_value);
                        printf("Diff: %lu\n", (parent_value + child_value) - value);
                }
        }
}

void* adder(void *param) {
        for(;;) {
                value++;
                child_value++;
        }
}

这个程序开启了两个线程，在子线程和父线程中都做对 value+1 操作，并且 child_value, parent_value, 在各自线程进程加一操作。

下面是一段输出结果：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


V: 1207000000, P: 850983002, C: 1210082658, Diff: 854065544
V: 1208000000, P: 851687818, C: 1211085473, Diff: 854772993
V: 1209000000, P: 852387897, C: 1212087227, Diff: 855475078
V: 1210000000, P: 853087710, C: 1213091716, Diff: 856178886
V: 1213000000, P: 855189165, C: 1216104669, Diff: 858293784
V: 1214000000, P: 855904389, C: 1217107154, Diff: 859011425
V: 1215000000, P: 856609882, C: 1218111971, Diff: 859721659
V: 1216000000, P: 857319487, C: 1219110929, Diff: 860429838
V: 1217000000, P: 858024567, C: 1220117094, Diff: 861140514
V: 1218000000, P: 858728514, C: 1221118047, Diff: 861846436
V: 1219000000, P: 859434428, C: 1222122718, Diff: 862557001
V: 1220000000, P: 860137072, C: 1223126294, Diff: 863262926

从上面的结果可以看出 (child_value+ parent_value)-value 的值一直在增大，说明 value 的值少加了，分析原因： value++操作可分为：

1
2
3


eax = value
eax += 1
value = eax

如果线程在这三个操作中进行切换是就会导致 value 少加。

要想解决这个问题就要保证当一个线程在对一个可能产生破坏的共享数据（临界区）进行操作时，其他线程不能对其进行操作。在 linux 下的做法是，进入临界区的线程向系统申请锁，离开临界区时释放锁。下面来一段代码验证一下：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37


#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

unsigned long value;
unsigned long parent_value;
unsigned long child_value;

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void* adder(void *param);

int main() {
        pthread_t tid_adder;

        pthread_create(&tid_adder, NULL, adder, NULL);

        for(;;) {
                pthread_mutex_lock(&mutex);
                value++;
                parent_value++;
                if(value % 1000000 == 0) {
                        printf("V: %lu, P: %lu, C: %lu, ", value, parent_value, child_value);
                        printf("Diff: %lu\n", (parent_value + child_value) - value);
                }
                pthread_mutex_unlock(&mutex);
        }
}

void* adder(void *param) {
        for(;;) {
                pthread_mutex_lock(&mutex);
                value++;
                pthread_mutex_unlock(&mutex);
                child_value++;
        }
}

输出结果：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28


V: 2000000, P: 956480, C: 1043519, Diff: 0
V: 4000000, P: 1960595, C: 2039404, Diff: 0
V: 5000000, P: 2462517, C: 2537482, Diff: 0
V: 7000000, P: 3464045, C: 3535954, Diff: 0
V: 8000000, P: 3963956, C: 4036044, Diff: 0
V: 10000000, P: 4971290, C: 5028709, Diff: 0
V: 15000000, P: 7483515, C: 7516484, Diff: 0
V: 16000000, P: 7986805, C: 8013195, Diff: 0
V: 17000000, P: 8486186, C: 8513813, Diff: 0
V: 19000000, P: 9489035, C: 9510964, Diff: 0
V: 21000000, P: 10495956, C: 10504043, Diff: 0
V: 23000000, P: 11496255, C: 11503744, Diff: 0
V: 24000000, P: 11995933, C: 12004066, Diff: 0
V: 25000000, P: 12491340, C: 12508660, Diff: 0
V: 27000000, P: 13495655, C: 13504344, Diff: 0
V: 29000000, P: 14502558, C: 14497441, Diff: 0
V: 30000000, P: 15004959, C: 14995040, Diff: 0
V: 31000000, P: 15502752, C: 15497247, Diff: 0
V: 32000000, P: 16029160, C: 15970839, Diff: 0
V: 34000000, P: 17161598, C: 16838401, Diff: 0
V: 36000000, P: 18290382, C: 17709618, Diff: 0
V: 38000000, P: 19466091, C: 18533908, Diff: 0
V: 39000000, P: 20122410, C: 18877590, Diff: 0
V: 41000000, P: 21454397, C: 19545603, Diff: 0
V: 42000000, P: 22118900, C: 19881099, Diff: 0
V: 44000000, P: 23455632, C: 20544367, Diff: 0
V: 45000000, P: 24118266, C: 20881733, Diff: 0
V: 46000000, P: 24767635, C: 21232364, Diff: 0

从上面的结果可以看出加锁以后，保证临界区不会被破坏，但是在运行这个程序时，明显比不加锁时要慢，说明加锁，会影响并行的效率。

死锁

当两个或两个以上进程因争夺资源而相互等待的状态。

下面是一个死锁代码示例：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48


#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

unsigned long value = 0;

pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;

void* thread1(void *param);
void* thread2(void *param);

int main() {
        int i;
        pthread_t tid1, tid2;

        pthread_create(&tid1, NULL, thread1, NULL);
        pthread_create(&tid2, NULL, thread2, NULL);

        for(;;) {
                printf("Value = %lu\n", value);
                sleep(1);
        }
}

void* thread1(void *param) {
        for(;;) {
                pthread_mutex_lock(&mutex1);
                pthread_mutex_lock(&mutex2);

                value++;

                pthread_mutex_unlock(&mutex1);
                pthread_mutex_unlock(&mutex2);
        }
}

void* thread2(void *param) {
        for(;;) {
                pthread_mutex_lock(&mutex2);
                pthread_mutex_lock(&mutex1);

                value++;

                pthread_mutex_unlock(&mutex2);
                pthread_mutex_unlock(&mutex1);
        }
}

输出结果：

1
2


Value = 219
Value = 481

可能的情况是：线程一得到锁一，线程二得到锁二，线程一在等待锁二，而线程二在等待锁一，造成死锁

死锁发生的条件

互斥条件：指进程对所分配到的资源进行排它性使用，即在一段时间内某资源只由一个进程占用。如果此时还有其它进程请求资源，则请求者只能等待，直至占有资源的进程用毕释放。
请求和保持条件：指进程已经保持至少一个资源，但又提出了新的资源请求，而该资源已被其它进程占有，此时请求进程阻塞，但又对自己已获得的其它资源保持不放。
不剥夺条件：指进程已获得的资源，在未使用完之前，不能被剥夺，只能在使用完时由自己释放。
环路等待条件：指在发生死锁时，必然存在一个进程——资源的环形链，即进程集合{P0，P1，P2，···，Pn}中的 P0 正在等待一个 P1 占用的资源；P1 正在等待 P2 占用的资源，……，Pn 正在等待已被 P0 占用的资源。

解决死锁

预防死锁。
避免死锁。
检测死锁。
解除死锁。

Contents

进程模型

进程创建

导致进程被创建的四个场景

Linux 下的进程创建

进程退出

进程退出的场景

进程的层次结构

进程状态

进程的实现

线程

线程实现

用户级线程

系统级线程

线程创建

进程同步

死锁

死锁发生的条件

解决死锁