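On shared-memory multiprocessor architectures, such as SMPs (translator's note 1), threads can be used to implement parallelism. Historically, hardware vendors implemented their own proprietary versions of threads, which made portability a concern for software developers. For UNIX systems, a standardized C language threads programming interface has been specified by the IEEE POSIX (translator's note 2) 1003.1c standard. Implementations that adhere to this standard are referred to as POSIX threads, or Pthreads.

This tutorial begins with an introduction to the concepts, motivations, and design considerations for using Pthreads. It then covers the three major classes of routines in the Pthreads API: thread management, mutex variables, and condition variables. Example code is used throughout to show how to use most of the Pthreads routines a new Pthreads programmer needs. The tutorial concludes with a discussion of LLNL specifics and of how to mix MPI with Pthreads. A lab exercise with numerous example codes is also included.

Prerequisites: This tutorial is one of eight tutorials in the 4+ day "Using LLNL's Supercomputers" workshop. It is intended for those who are new to parallel programming with threads. A basic understanding of parallel programming in C is required. For those who are unfamiliar with parallel programming in general, the material covered in EC3500: Introduction To Parallel Computing would be helpful.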

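Pthreads Overview

What is a Thread?

Technically, a thread is an independent stream of instructions that can be scheduled to run by the operating system. But what does this mean?
To the software developer, the concept of a "procedure" that runs independently from its main program may best describe a thread.
To go one step further, imagine a main program (a.out) that contains a number of procedures. Then imagine all of these procedures being scheduled by the operating system to run simultaneously and/or independently. That would describe a multi-threaded program.
How is this accomplished?
Before understanding a thread, one first needs to understand a UNIX process. A process is created by the operating system and requires a fair amount of "overhead". Processes contain information about program resources and program execution state, including:
- Process ID, process group ID, user ID, and group ID
- Environment
- Working directory
- Program instructions
- Registers
- Stack
- Heap
- File descriptors
- Signal actions
- Shared libraries
- Inter-process communication tools (such as message queues, pipes, semaphores, or shared memory)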

[Figure: a UNIX process (translator's note 3)] [Figure: threads within a UNIX process]

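Threads use and exist within these process resources, yet can be scheduled by the operating system and run as independent entities, largely because they duplicate only the bare essential resources that enable them to exist as executable code.
This independent flow of control is possible because a thread maintains its own:
- Stack pointer
- Registers
- Scheduling properties (such as policy or priority)
- Set of pending and blocked signals
- Thread-specific data
So, in summary, in the UNIX environment a thread:
- Exists within a process and uses the process resources
- Has its own independent flow of control as long as its parent process exists and the OS supports it
- Duplicates only the essential resources it needs to be independently schedulable
- May share the process resources with other threads that act equally independently (and dependently)
- Dies if the parent process dies, or something similar happens
- Is "lightweight" because most of the overhead has already been paid when its process was created
Because threads within the same process share resources:
- Changes made by one thread to shared system resources (such as closing a file) are seen by all other threads
- Two pointers having the same value point to the same data
- Reading and writing the same memory locations is possible, and therefore requires explicit synchronization by the programmer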

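What are Pthreads?

Historically, hardware vendors implemented their own proprietary versions of threads. These implementations differed substantially from each other, making it difficult for programmers to develop portable threaded applications.
To take full advantage of the capabilities provided by threads, a standardized programming interface was required.
- For UNIX systems, this interface has been specified by the IEEE POSIX 1003.1c standard (1995).
- Implementations adhering to this standard are referred to as POSIX threads, or Pthreads.
- Most hardware vendors now offer Pthreads in addition to their proprietary APIs.
- The POSIX standard has continued to evolve and undergo revisions, including the Pthreads specification.
Some useful links:
- standards.ieee.org/findstds/standard/1003.1-2008.html
- www.opengroup.org/austin/papers/posix_faq.html
- www.unix.org/version3/ieee_std.html
Pthreads are defined as a set of C language programming types and procedure calls, implemented with a pthread.h header/include file and a thread library. In some implementations this library may be part of another library, such as libc (translator's note 4).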

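Why Pthreads?

In the world of high performance computing, the primary motivation for using Pthreads is to realize potential program performance gains.
Compared to the cost of creating and managing a process, a thread can be created with much less operating system overhead. Managing threads also requires fewer system resources than managing processes.

For example, the following table compares timing results for the fork() subroutine and the pthread_create() subroutine. Timings reflect 50,000 process/thread creations, were measured with the time utility, and are given in seconds. No optimization flags were used.

Note: do not expect the system and user times to add up to the real time, because these are SMP systems with multiple CPUs/cores working at the same time. At best, these are approximations run on local machines, past and present.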

Platform	`fork()`			`pthread_create()`
Platform	real	user	sys	real	user	sys
Intel 2.6 GHz Xeon E5-2670 (16 cores/node)	8.1	0.1	2.9	0.9	0.2	0.3
Intel 2.8 GHz Xeon 5660 (12 cores/node)	4.4	0.4	4.3	0.7	0.2	0.5
AMD 2.3 GHz Opteron (16 cores/node)	12.5	1.0	12.5	1.2	0.2	1.3
AMD 2.4 GHz Opteron (8 cores/node)	17.6	2.2	15.7	1.4	0.3	1.3
IBM 4.0 GHz POWER6 (8 cpus/node)	9.5	0.6	8.8	1.6	0.1	0.4
IBM 1.9 GHz POWER5 p5-575 (8 cpus/node)	64.2	30.7	27.6	1.7	0.6	1.1
IBM 1.5 GHz POWER4 (8 cpus/node)	104.5	48.6	47.2	2.1	1.0	1.5
INTEL 2.4 GHz Xeon (2 cpus/node)	54.9	1.5	20.8	1.6	0.7	0.9
INTEL 1.4 GHz Itanium2 (4 cpus/node)	54.5	1.1	22.2	2.0	1.2	0.6

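All threads within a process share the same address space. Inter-thread communication is more efficient and, in many cases, easier to use than inter-process communication.
Threaded applications offer potential performance gains and practical advantages over non-threaded applications in several other ways:
- Overlapping CPU work with I/O: for example, a program may have sections where it performs a long I/O operation. While one thread is waiting for an I/O system call to complete, CPU-intensive work can be performed by other threads.
- Priority/real-time scheduling: tasks that are more important can be scheduled to supersede or interrupt lower priority tasks. (translator's note 5)
- Asynchronous event handling: tasks that service events of indeterminate frequency and duration can be interleaved. For example, a web server can both transfer data from previous requests and manage the arrival of new requests.
The primary motivation for using Pthreads on an SMP architecture is to achieve better performance. In particular, if an application uses MPI for on-node communications, there is a potential that performance could be improved by using Pthreads for on-node data transfer instead.
- MPI libraries usually implement on-node task communication via shared memory, which involves at least one memory copy operation (process to process). (translator's note 6)
- For Pthreads there is no intermediate memory copy required, because the threads of a single process share the same address space. There is no data transfer, per se. It becomes more of a cache-to-CPU or memory-to-CPU bandwidth situation (worst case), and these speeds are much higher.
- Some local comparisons are shown below: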

Platform	MPI Shared Memory Bandwidth (GB/sec)	Pthreads Worst Case Memory-to-CPU Bandwidth (GB/sec)
Intel 2.6 GHz Xeon E5-2670	4.5	51.2
Intel 2.8 GHz Xeon 5660	5.6	32
AMD 2.3 GHz Opteron	1.8	5.3
AMD 2.4 GHz Opteron	1.2	5.3
IBM 1.9 GHz POWER5 p5-575	4.1	16
IBM 1.5 GHz POWER4	2.1	4
Intel 2.4 GHz Xeon	0.3	4.3
Intel 1.4 GHz Itanium 2	1.8	6.4

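Pthreads can also be used by serial applications, to emulate parallel execution and/or take advantage of spare cycles.
A good example is the typical web browser running on a single-CPU desktop, where many things appear to happen at the same time.
Many other common serial applications and operating systems use threads. For example, the MS Windows operating system and many of its applications use threads.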

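Designing Threaded Programs

Parallel Programming

On modern multi-core machines, Pthreads are well suited to parallel programming, and whatever applies to parallel programming in general applies to parallel Pthreads programs.
There are many considerations in designing parallel programs, such as:
- What type of parallel programming model to use?
- Problem partitioning
- Load balancing
- Communications
- Data dependencies
- Synchronization and race conditions
- Memory issues
- I/O issues
- Program complexity
- Programmer effort, cost, and time
Covering all of these topics is beyond the scope of this tutorial, but interested readers can refer to the Introduction to Parallel Computing tutorial.
In general though, in order for a program to take advantage of Pthreads, it must be possible to organize it into discrete, independent tasks that can execute concurrently. For example, if routine1 and routine2 can be interchanged, interleaved, and/or overlapped in real time, they are candidates for threading.

Programs with the following characteristics may be well suited for Pthreads:
- Work that can be executed, or data that can be operated on, by multiple tasks simultaneously
- Blocking on potentially long I/O waits
- Heavy use of CPU cycles in some places but not in others
- A need to respond to asynchronous events
- Some work being more important than other work (priority interrupts)
Several common models for threaded programs exist:
- Manager/worker: a single manager thread assigns work to other worker threads. Typically, the manager handles all input and parcels out work to the other tasks. At least two forms of the manager/worker model are common: a static worker pool and a dynamic worker pool.
- Pipeline: a task is broken into a series of suboperations, each of which is handled in series, but concurrently, by a different thread. An automobile assembly line best describes this model.
- Peer: similar to the manager/worker model, but after the main thread creates the other threads, it participates in the work as well.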

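Shared Memory Model

All threads have access to the same global, shared memory.
Threads also have their own private data.
Programmers are responsible for synchronizing access to (protecting) globally shared data.

Thread safety:

Thread safety, in a nutshell, refers to an application's ability to execute multiple threads simultaneously without "clobbering" shared data or creating race conditions.
For example, suppose your application creates several threads, each of which calls the same library routine:
- This library routine accesses and modifies a global structure or location in memory.
- As each thread calls this routine, it is possible that they will try to modify this global structure or memory location at the same time.
- If the routine does not employ some sort of synchronization construct to prevent data corruption, then it is not thread-safe.

The implication for users of external library routines is that if you are not 100% certain a routine is thread-safe, you take your chances with the problems that could arise.
Recommendation: be careful if your application uses libraries or other objects that do not explicitly guarantee thread safety. When in doubt, assume they are not thread-safe until proven otherwise. This can be done by serializing the calls to the uncertain routine, so that only one thread is inside it at a time. (translator's note 7)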

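Thread Limits

Although the Pthreads API is an ANSI/IEEE standard, implementations can, and usually do, vary in ways not specified by the standard.
Because of this, a program that runs fine on one platform may fail or produce wrong results on another platform. For example, the maximum number of threads permitted and the default thread stack size are two important limits to consider when designing your program.
Several thread limits are discussed in more detail later in this tutorial.

Pthreads API

The original Pthreads API was defined in the ANSI/IEEE POSIX 1003.1 - 1995 standard. The POSIX standard has continued to evolve and undergo revisions, including the Pthreads specification.
Copies of the standard can be purchased from IEEE or downloaded for free from other sites.
The subroutines that make up the Pthreads API can be informally grouped into four major groups:
- Thread management: routines that work directly on threads - creating, detaching, joining, etc. They also include functions to set and query thread attributes (joinable, scheduling, etc.).
- Mutexes: routines that deal with synchronization through a "mutex", which is an abbreviation for "mutual exclusion". Mutex functions provide for creating, destroying, locking, and unlocking mutexes. They are supplemented by mutex attribute functions that set or modify attributes associated with mutexes.
- Condition variables: routines that address communication between threads that share a mutex, based upon programmer-specified conditions. This group includes functions to create, destroy, wait, and signal based upon specified variable values. Functions to set and query condition variable attributes are also included.
- Synchronization: routines that manage read/write locks and barriers.
Naming conventions: all identifiers in the threads library begin with pthread_. Some examples are shown below: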

Routine Prefix	Functional Group
pthread_	Threads themselves and miscellaneous subroutines
pthread_attr_	Thread attributes objects
pthread_mutex_	Mutexes
pthread_mutexattr_	Mutex attributes objects.
pthread_cond_	Condition variables
pthread_condattr_	Condition attributes objects
pthread_key_	Thread-specific data keys
pthread_rwlock_	Read/write locks
pthread_barrier_	Synchronization barriers

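The concept of opaque objects pervades the design of the API. The basic calls work to create or modify opaque objects, and the opaque objects can be modified by calls to attribute functions.
The Pthreads API contains around 100 subroutines. This tutorial focuses on a subset of these, specifically those most likely to be immediately useful to the beginning Pthreads programmer.
For portability, the pthread.h header file should be included in each source file that uses the Pthreads library.
The current POSIX standard is defined only for the C language. Fortran programmers can use wrappers around the C function calls. Some Fortran compilers (such as IBM AIX Fortran) may provide a Fortran Pthreads API.
A number of excellent books about Pthreads are available. Several of these are listed in the References section.

Compiling Threaded Programs

The following table lists several examples of commands used to compile code that uses Pthreads: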

Compiler / Platform	Compiler Command	Description
INTEL Linux	`icc -pthread`	C
INTEL Linux	`icpc -pthread`	C++
PGI Linux	`pgcc -lpthread`	C
PGI Linux	`pgCC -lpthread`	C++
GNU Linux, Blue Gene	`gcc -pthread`	GNU C
GNU Linux, Blue Gene	`g++ -pthread`	GNU C++
IBM Blue Gene	`bgxlc_r / bgcc_r`	C (ANSI / non-ANSI)
	`bgxlC_r, bgxlc++_r`	C++

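Thread Management

Creating and Terminating Threads

Routines:

pthread_create (thread,attr,start_routine,arg)

pthread_exit (status)

pthread_cancel (thread)

pthread_attr_init (attr)

pthread_attr_destroy (attr)

Creating Threads

Initially, your main() program comprises a single, default thread. All other threads must be explicitly created by the programmer.
pthread_create creates a new thread and makes it executable. This routine can be called any number of times from anywhere within your code.
pthread_create arguments:
- thread: an opaque, unique identifier for the new thread, returned by the subroutine.
- attr: an opaque attribute object that may be used to set thread attributes. You can specify a thread attributes object, or NULL for the default values.
- start_routine: the C routine that the thread will execute once it is created.
- arg: a single argument that may be passed to start_routine. It must be passed by reference as a pointer cast to type void *. NULL may be used if no argument is to be passed.
The maximum number of threads that may be created by a process is implementation dependent. Programs that attempt to exceed the limit can fail or produce wrong results.
Querying and setting your implementation's thread limit (Linux example shown): the example queries the default (soft) limits, sets the maximum number of processes (including threads) to the hard limit, and then verifies that the limit has been overridden.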

bash / ksh / sh

```
$ ulimit -a
core file size          (blocks, -c) 16
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 255956
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

$ ulimit -Hu
7168

$ ulimit -u 7168

$ ulimit -a
core file size          (blocks, -c) 16
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 255956
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 7168
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
```

tcsh / csh

```
% limit
cputime      unlimited
filesize     unlimited
datasize     unlimited
stacksize    unlimited
coredumpsize 16 kbytes
memoryuse    unlimited
vmemoryuse   unlimited
descriptors  1024
memorylocked 64 kbytes
maxproc      1024

% limit maxproc unlimited

% limit
cputime      unlimited
filesize     unlimited
datasize     unlimited
stacksize    unlimited
coredumpsize 16 kbytes
memoryuse    unlimited
vmemoryuse   unlimited
descriptors  1024
memorylocked 64 kbytes
maxproc      7168
```

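Once created, threads are peers, and may create other threads. There is no implied hierarchy or dependency between threads.

Thread Attributes

By default, a thread is created with certain attributes. Some of these attributes can be changed by the programmer via the thread attribute object.
pthread_attr_init and pthread_attr_destroy are used to initialize and destroy the thread attribute object.
Other routines are then used to query and set specific attributes in the thread attribute object. Attributes include:
- Detached or joinable state
- Scheduling inheritance
- Scheduling policy
- Scheduling parameters
- Scheduling contention scope
- Stack size
- Stack address
- Stack guard (overflow) size
Some of these attributes are discussed later.

Thread Binding and Scheduling

Question:

After a thread has been created, how do you know (a) when it will be scheduled to run by the operating system, and (b) which processor/core it will run on?

Answer: unless you are using the Pthreads scheduling mechanism, it is up to the implementation and/or operating system to decide where and when threads will execute. Robust programs should not depend upon threads executing in a specific order or on a specific processor/core.

The Pthreads API provides several routines that may be used to specify how threads are scheduled for execution. For example, threads can be scheduled to run FIFO (first-in first-out), RR (round-robin), or OTHER (the operating system decides). It also provides the ability to set a thread's scheduling priority value.
These topics are not covered here; however, a good overview of "how things work" under Linux can be found in the sched_setscheduler man page.
In addition, the local operating system may provide its own ways to control this, for example the sched_setaffinity routine on Linux.

Terminating Threads and pthread_exit()

There are several ways in which a thread may be terminated:
- The thread returns normally from its starting routine. Its work is done.
- The thread makes a call to pthread_exit(), whether its work is done or not.
- The thread is canceled by another thread via pthread_cancel().
- The entire process is terminated by a call to either exec() or exit().
- main() finishes first, without calling pthread_exit() explicitly itself.
pthread_exit() allows the programmer to specify an optional termination status parameter. This optional parameter is typically returned to threads "joining" the terminated thread.
In subroutines that execute to completion normally, you can often dispense with calling pthread_exit(), unless, of course, you want to pass the optional status code back.
Cleanup: pthread_exit() does not close files; any files opened inside the thread remain open after the thread terminates (unless you have closed them explicitly).
Discussion on calling pthread_exit() from main():
- If main() finishes before the threads it spawned and does not call pthread_exit() explicitly, all of the threads it created will terminate, because main() is done and no longer exists to support them.
- If main() explicitly calls pthread_exit() as the last thing it does, main() will block and be kept alive to support the threads it created until they are done.

Example: Pthread Creation and Termination

This simple example creates 5 threads with pthread_create(). Each thread prints a "Hello World!" message and then terminates with a call to pthread_exit().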

Example Code - Pthread Creation and Termination

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#define NUM_THREADS 5

/* Start routine: the thread number arrives packed into the pointer argument. */
void *PrintHello(void *threadid)
{
    long tid;
    tid = (long)threadid;
    printf("Hello World! It's me, thread #%ld!\n", tid);
    pthread_exit(NULL);
}

int main(int argc, char *argv[])
{
    pthread_t threads[NUM_THREADS];
    int rc;
    long t;
    for (t = 0; t < NUM_THREADS; t++) {
        printf("In main: creating thread %ld\n", t);
        rc = pthread_create(&threads[t], NULL, PrintHello, (void *)t);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }

    /* Last thing that main() should do */
    pthread_exit(NULL);
}
```

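Passing Arguments to Threads

pthread_create() permits the programmer to pass one argument to the thread start routine. For cases where multiple arguments must be passed, this limitation is easily overcome by creating a structure that contains all of the arguments and passing a pointer to that structure.
All arguments must be passed by reference and cast to (void *).
Question: how can you safely pass data to newly created threads, given their non-deterministic start-up and scheduling?
Answer: make sure that all passed data is thread safe, meaning that it cannot be changed by other threads. The following examples demonstrate what to do and what not to do.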

Example 1 - Thread Argument Passing

```c
long *taskids[NUM_THREADS];

for (t = 0; t < NUM_THREADS; t++)
{
    /* Each thread gets its own heap-allocated copy of its id. */
    taskids[t] = (long *)malloc(sizeof(long));
    *taskids[t] = t;
    printf("Creating thread %ld\n", t);
    rc = pthread_create(&threads[t], NULL, PrintHello, (void *)taskids[t]);
    ...
}
```

Example 2 - Thread Argument Passing

```c
struct thread_data {
    int thread_id;
    int sum;
    char *message;
};

/* One private argument structure per thread. */
struct thread_data thread_data_array[NUM_THREADS];

void *PrintHello(void *threadarg)
{
    struct thread_data *my_data;
    ...
    my_data = (struct thread_data *)threadarg;
    taskid = my_data->thread_id;
    sum = my_data->sum;
    hello_msg = my_data->message;
    ...
}

int main(int argc, char *argv[])
{
    ...
    thread_data_array[t].thread_id = t;
    thread_data_array[t].sum = sum;
    thread_data_array[t].message = messages[t];
    rc = pthread_create(&threads[t], NULL, PrintHello,
                        (void *)&thread_data_array[t]);
    ...
}
```

Example 3 - Thread Argument Passing (Incorrect)

```c
int rc;
long t;

for (t = 0; t < NUM_THREADS; t++)
{
    printf("Creating thread %ld\n", t);
    /* Incorrect: &t is shared by all threads, and the loop may change t
       before a newly created thread has read it. */
    rc = pthread_create(&threads[t], NULL, PrintHello, (void *)&t);
    ...
}
```

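Joining and Detaching Threads

Routines:

pthread_join (threadid,status)

pthread_detach (threadid)

pthread_attr_setdetachstate (attr,detachstate)

pthread_attr_getdetachstate (attr,detachstate)

Joining:

"Joining" is one way to accomplish synchronization between threads. For example:
pthread_join() blocks the calling thread until the specified threadid thread terminates.
If the target thread specified a status in its call to pthread_exit(), the programmer can obtain that termination status in the calling (joining) thread.
A thread can be matched by only one pthread_join() call. It is a logical error to attempt multiple joins on the same thread.
Two other synchronization methods, mutexes and condition variables, are discussed later.

Joinable or Not?

When a thread is created, one of its attributes defines whether it is joinable or detached. Only threads that are created as joinable can be joined. If a thread is created as detached, it can never be joined.
The POSIX standard specifies that threads should be created as joinable.
To explicitly create a thread as joinable or detached, the attr argument of pthread_create() is used. The typical 4-step process is:
- Declare a pthread attribute variable of the pthread_attr_t data type
- Initialize the attribute variable with pthread_attr_init()
- Set the attribute's detached status with pthread_attr_setdetachstate()
- When done, free the library resources used by the attribute with pthread_attr_destroy()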

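Detaching

pthread_detach() can be used to explicitly detach a thread even though it was created as joinable.
There is no converse routine.

Stack Management

Routines:

pthread_attr_getstacksize (attr, stacksize)

pthread_attr_setstacksize (attr, stacksize)

pthread_attr_getstackaddr (attr, stackaddr)

pthread_attr_setstackaddr (attr, stackaddr)

Preventing Stack Problems

The POSIX standard does not dictate the size of a thread's stack. This is implementation dependent.
Exceeding the default stack limit is often very easy to do, with the usual results: program termination and/or corrupted data.
Safe and portable programs do not depend upon the default stack limit. Instead, they explicitly allocate enough stack for each thread by using the pthread_attr_setstacksize routine.
The pthread_attr_getstackaddr and pthread_attr_setstackaddr routines can be used by applications in an environment where the stack for a thread must be placed in a particular region of memory.

Some Practical Examples at LC

The default thread stack size varies greatly. The maximum size that can be obtained also varies greatly, and may depend upon the number of threads per node.
The table below shows default stack sizes on both past and present architectures, to demonstrate how widely they differ.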

Node Architecture	#CPUs	Memory (GB)	Default Size (bytes)
Intel Xeon E5-2670	16	32	2,097,152
Intel Xeon 5660	12	24	2,097,152
AMD Opteron	8	16	2,097,152
Intel IA64	4	8	33,554,432
Intel IA32	2	4	2,097,152
IBM Power5	8	32	196,608
IBM Power4	8	16	196,608
IBM Power3	16	16	98,304

Example Code - Stack Management

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#define NTHREADS 4
#define N 1000
#define MEGEXTRA 1000000

pthread_attr_t attr;

void *dowork(void *threadid)
{
    double A[N][N];   /* large automatic array: needs roughly 8 MB of stack */
    int i, j;
    long tid;
    size_t mystacksize;

    tid = (long)threadid;
    pthread_attr_getstacksize(&attr, &mystacksize);
    printf("Thread %ld: stack size = %zu bytes \n", tid, mystacksize);
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            A[i][j] = ((i * j) / 3.452) + (N - i);
    pthread_exit(NULL);
}

int main(int argc, char *argv[])
{
    pthread_t threads[NTHREADS];
    size_t stacksize;
    int rc;
    long t;

    pthread_attr_init(&attr);
    pthread_attr_getstacksize(&attr, &stacksize);
    printf("Default stack size = %zu\n", stacksize);
    /* Room for the array plus some extra for everything else. */
    stacksize = sizeof(double) * N * N + MEGEXTRA;
    printf("Amount of stack needed per thread = %zu\n", stacksize);
    pthread_attr_setstacksize(&attr, stacksize);
    printf("Creating threads with stack size = %zu bytes\n", stacksize);
    for (t = 0; t < NTHREADS; t++) {
        rc = pthread_create(&threads[t], &attr, dowork, (void *)t);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }
    printf("Created %ld threads.\n", t);
    pthread_exit(NULL);
}
```

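Miscellaneous Routines

pthread_self ()
pthread_equal (thread1,thread2)

pthread_self returns the unique, system-assigned thread ID of the calling thread.
pthread_equal compares two thread IDs. If the two IDs are different, 0 is returned; otherwise a non-zero value is returned.
Note that for both of these routines, the thread identifier objects are opaque and cannot be easily inspected. Because thread IDs are opaque objects, the C language equivalence operator == should not be used to compare two thread IDs against each other, or to compare a single thread ID against another value.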

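pthread_once (once_control, init_routine)

pthread_once executes init_routine exactly once in a process. The first call to this routine by any thread in the process executes the given init_routine, without parameters. Any subsequent call has no effect.
The init_routine routine is typically an initialization routine.
The once_control parameter is a synchronization control structure that must be initialized before pthread_once is called. For example:

pthread_once_t once_control = PTHREAD_ONCE_INIT;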

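Mutex Variables

Overview

Mutex is an abbreviation for "mutual exclusion". Mutex variables are one of the primary means of implementing thread synchronization and of protecting shared data when multiple writes occur.
A mutex variable acts like a "lock" protecting access to a shared data resource. The basic concept of a mutex as used in Pthreads is that only one thread can lock (or own) a mutex variable at any given time. Thus, even if several threads try to lock a mutex, only one thread will succeed. No other thread can own that mutex until the owning thread unlocks it. Threads must "take turns" accessing protected data.

Mutexes can be used to prevent race conditions. An example of a race condition involving a bank transaction is shown below: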

Thread 1	Thread 2	Balance
Read balance: $1000		$1000
	Read balance: $1000	$1000
	Deposit $200	$1000
Deposit $200		$1000
Update balance $1000+$ 200		$1200
	Update balance $1000+$ 200	$1200

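In the above example, a mutex should be used to lock "Balance" while a thread is using this shared data resource.
Very often the action performed by a thread owning a mutex is the updating of global variables. This is a safe way to ensure that when several threads update the same variable, the final value is the same as it would be if only one thread performed the update. The variables being updated belong to a "critical section".
A typical sequence in the use of a mutex is as follows:
- Create and initialize a mutex variable
- Several threads attempt to lock the mutex
- Only one succeeds, and that thread owns the mutex
- The owner thread performs some set of actions
- The owner unlocks the mutex
- Another thread acquires the mutex and repeats the process
- Finally, the mutex is destroyed
When several threads compete for a mutex, the losers block at that call.
When protecting shared data, it is the programmer's responsibility to make sure that every thread that needs to use a mutex does so. For example, if 4 threads are updating the same data but only one uses a mutex, the data can still be corrupted.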

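Creating and Destroying Mutexes

Routines:

pthread_mutex_init (mutex,attr)
pthread_mutex_destroy (mutex)
pthread_mutexattr_init (attr)
pthread_mutexattr_destroy (attr)

Mutex variables must be declared with type pthread_mutex_t, and must be initialized before they can be used. There are two ways to initialize a mutex variable:
- Statically, when it is declared. For example: pthread_mutex_t mymutex = PTHREAD_MUTEX_INITIALIZER;
- Dynamically, with the pthread_mutex_init() routine. This method permits setting mutex object attributes.
The mutex is initially unlocked.
The attr object is used to establish properties for the mutex object, and must be of type pthread_mutexattr_t if used. The Pthreads standard defines three optional mutex attributes:
- Protocol: specifies the protocol used to prevent priority inversions for a mutex
- Prioceiling: specifies the priority ceiling of a mutex
- Process-shared: specifies the process sharing of a mutex
Note that not all implementations may provide the three optional mutex attributes.
The pthread_mutexattr_init() and pthread_mutexattr_destroy() routines are used to create and destroy mutex attribute objects, respectively.
pthread_mutex_destroy() should be used to free a mutex object that is no longer needed.

Locking and Unlocking Mutexes

Routines:

pthread_mutex_lock (mutex)
pthread_mutex_trylock (mutex)
pthread_mutex_unlock (mutex)

pthread_mutex_lock() is used by a thread to acquire a lock on the specified mutex variable. If the mutex is already locked by another thread, this call blocks the calling thread until the mutex is unlocked.
pthread_mutex_trylock() attempts to lock a mutex. However, if the mutex is already locked, the routine returns immediately with a "busy" error code. This routine may be useful in preventing deadlock conditions, as in a priority-inversion situation.
pthread_mutex_unlock() unlocks a mutex if called by the owning thread. Calling this routine is required after a thread has completed its use of protected data, if other threads are to acquire the mutex for their work with the protected data. An error is returned if:
- The mutex was already unlocked
- The mutex is owned by another thread
There is nothing "magical" about mutexes; in fact, they are akin to a "gentlemen's agreement" between participating threads. It is up to the programmer to ensure that all necessary threads make the mutex lock and unlock calls correctly. The following scenario demonstrates a logical error (Thread 3 updates A without locking the mutex):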

 Thread 1     Thread 2     Thread 3
    Lock         Lock         
    A = 2        A = A+1      A = A*B
    Unlock       Unlock

Example Code - Using Mutexes

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/*
The following structure contains the necessary information
to allow the function "dotprod" to access its input data and
place its output into the structure.
*/
typedef struct
{
    double *a;
    double *b;
    double sum;
    int veclen;
} DOTDATA;

/* Define globally accessible variables and a mutex */

#define NUMTHRDS 4
#define VECLEN 100
DOTDATA dotstr;
pthread_t callThd[NUMTHRDS];
pthread_mutex_t mutexsum;

/*
The function dotprod is activated when the thread is created.
All input to this routine is obtained from a structure
of type DOTDATA and all output from this function is written into
this structure. The benefit of this approach is apparent for the
multi-threaded program: when a thread is created we pass a single
argument to the activated function - typically this argument
is a thread number. All the other information required by the
function is accessed from the globally accessible structure.
*/
void *dotprod(void *arg)
{
    /* Define and use local variables for convenience */
    int i, start, end, len;
    long offset;
    double mysum, *x, *y;
    offset = (long)arg;

    len = dotstr.veclen;
    start = offset * len;
    end = start + len;
    x = dotstr.a;
    y = dotstr.b;

    /*
    Perform the dot product and assign result
    to the appropriate variable in the structure.
    */
    mysum = 0;
    for (i = start; i < end; i++)
    {
        mysum += (x[i] * y[i]);
    }

    /*
    Lock a mutex prior to updating the value in the shared
    structure, and unlock it upon updating.
    */
    pthread_mutex_lock(&mutexsum);
    dotstr.sum += mysum;
    pthread_mutex_unlock(&mutexsum);

    pthread_exit((void *)0);
}

/*
The main program creates threads which do all the work and then
print out result upon completion. Before creating the threads,
the input data is created. Since all threads update a shared structure,
we need a mutex for mutual exclusion. The main thread needs to wait for
all threads to complete, it waits for each one of the threads. We specify
a thread attribute value that allow the main thread to join with the
threads it creates. Note also that we free up handles when they are
no longer needed.
*/
int main(int argc, char *argv[])
{
    long i;
    double *a, *b;
    void *status;
    pthread_attr_t attr;

    /* Assign storage and initialize values */
    a = (double *)malloc(NUMTHRDS * VECLEN * sizeof(double));
    b = (double *)malloc(NUMTHRDS * VECLEN * sizeof(double));

    for (i = 0; i < VECLEN * NUMTHRDS; i++)
    {
        a[i] = 1.0;
        b[i] = a[i];
    }

    dotstr.veclen = VECLEN;
    dotstr.a = a;
    dotstr.b = b;
    dotstr.sum = 0;

    pthread_mutex_init(&mutexsum, NULL);

    /* Create threads to perform the dotproduct */
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

    for (i = 0; i < NUMTHRDS; i++)
    {
        /*
        Each thread works on a different set of data.
        The offset is specified by 'i'. The size of
        the data for each thread is indicated by VECLEN.
        */
        pthread_create(&callThd[i], &attr, dotprod, (void *)i);
    }

    pthread_attr_destroy(&attr);

    /* Wait on the other threads */
    for (i = 0; i < NUMTHRDS; i++)
    {
        pthread_join(callThd[i], &status);
    }

    /* After joining, print out the results and cleanup */
    printf("Sum = %f \n", dotstr.sum);
    free(a);
    free(b);
    pthread_mutex_destroy(&mutexsum);
    pthread_exit(NULL);
}
```

Condition Variables

Overview

Condition variables provide yet another way for threads to synchronize. While mutexes implement synchronization by controlling thread access to data, condition variables allow threads to synchronize based upon the actual value of data.
Without condition variables, the programmer would need to have threads continually polling (possibly in a critical section) to check whether the condition is met. This can be very resource consuming, since the thread would be continuously busy in this activity. A condition variable is a way to achieve the same goal without polling.
A condition variable is always used in conjunction with a mutex lock.
A representative sequence for using condition variables is shown below.

Main Thread
- Declare and initialize global data/variables which require synchronization (such as "count")
- Declare and initialize a condition variable object
- Declare and initialize an associated mutex
- Create threads A and B to do work

Thread A
- Do work up to the point where a certain condition must occur (such as "count" must reach a specified value)
- Lock associated mutex and check value of a global variable
- Call `pthread_cond_wait()` to perform a blocking wait for signal from Thread-B. Note that a call to `pthread_cond_wait()` automatically and atomically unlocks the associated mutex variable so that it can be used by Thread-B.
- When signalled, wake up. Mutex is automatically and atomically locked.
- Explicitly unlock mutex
- Continue

Thread B
- Do work
- Lock associated mutex
- Change the value of the global variable that Thread-A is waiting upon.
- Check value of the global Thread-A wait variable. If it fulfills the desired condition, signal Thread-A.
- Unlock mutex.
- Continue

Main Thread
- Join / Continue

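Creating and Destroying Condition Variables

Routines:

pthread_cond_init (condition,attr)

pthread_cond_destroy (condition)

pthread_condattr_init (attr)

pthread_condattr_destroy (attr)

Condition variables must be declared with type pthread_cond_t, and must be initialized before they can be used. There are two ways to initialize a condition variable:
- Statically, when it is declared. For example: pthread_cond_t myconvar = PTHREAD_COND_INITIALIZER;
- Dynamically, with the pthread_cond_init() routine. The ID of the created condition variable is returned to the calling thread through the condition parameter. This method permits setting condition variable object attributes.
The optional attr object is used to set condition variable attributes. There is only one attribute defined for condition variables: process-shared, which allows the condition variable to be seen by threads in other processes. The attribute object, if used, must be of type pthread_condattr_t (it may be specified as NULL to accept defaults).
Note that not all implementations may provide the process-shared attribute.
The pthread_condattr_init() and pthread_condattr_destroy() routines are used to create and destroy condition variable attribute objects.
pthread_cond_destroy() should be used to free a condition variable that is no longer needed.

Waiting and Signaling on Condition Variables

Routines:

pthread_cond_wait (condition,mutex)
pthread_cond_signal (condition)
pthread_cond_broadcast (condition)

pthread_cond_wait() blocks the calling thread until the specified condition is signalled. This routine should be called while the mutex is locked, and it automatically releases the mutex while it waits. After the signal is received and the thread is awakened, the mutex is automatically locked again for use by the thread. The programmer is then responsible for unlocking the mutex when the thread is finished with it.
pthread_cond_signal() is used to signal (or wake up) another thread that is waiting on the condition variable. It should be called after the mutex is locked, and the mutex must then be unlocked in order for pthread_cond_wait() to complete.
pthread_cond_broadcast() should be used instead of pthread_cond_signal() if more than one thread is in a blocking wait state.
It is a logical error to call pthread_cond_signal() before calling pthread_cond_wait().
Proper locking and unlocking of the associated mutex variable is essential when using these routines. For example:
- Failing to lock the mutex before calling pthread_cond_wait() may cause it NOT to block.
- Failing to unlock the mutex after calling pthread_cond_signal() may not allow a matching pthread_cond_wait() routine to complete (it will remain blocked).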

Example Code - Using Condition Variables

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NUM_THREADS 3
#define TCOUNT 10
#define COUNT_LIMIT 12

int count = 0;
int thread_ids[3] = {0, 1, 2};
pthread_mutex_t count_mutex;
pthread_cond_t count_threshold_cv;

void *inc_count(void *t)
{
    int i;
    long my_id = (long)t;

    for (i = 0; i < TCOUNT; i++) {
        pthread_mutex_lock(&count_mutex);
        count++;

        /*
        Check the value of count and signal waiting thread when condition is
        reached. Note that this occurs while mutex is locked.
        */
        if (count == COUNT_LIMIT) {
            pthread_cond_signal(&count_threshold_cv);
            printf("inc_count(): thread %ld, count = %d Threshold reached.\n",
                   my_id, count);
        }
        printf("inc_count(): thread %ld, count = %d, unlocking mutex\n",
               my_id, count);
        pthread_mutex_unlock(&count_mutex);

        /* Do some "work" so threads can alternate on mutex lock */
        sleep(1);
    }
    pthread_exit(NULL);
}

void *watch_count(void *t)
{
    long my_id = (long)t;

    printf("Starting watch_count(): thread %ld\n", my_id);

    /*
    Lock mutex and wait for signal. Note that the pthread_cond_wait
    routine will automatically and atomically unlock mutex while it waits.
    Also, note that if COUNT_LIMIT is reached before this routine is run by
    the waiting thread, the loop will be skipped to prevent pthread_cond_wait
    from never returning.
    */
    pthread_mutex_lock(&count_mutex);
    while (count < COUNT_LIMIT) {
        pthread_cond_wait(&count_threshold_cv, &count_mutex);
        printf("watch_count(): thread %ld Condition signal received.\n", my_id);
        count += 125;
        printf("watch_count(): thread %ld count now = %d.\n", my_id, count);
    }
    pthread_mutex_unlock(&count_mutex);
    pthread_exit(NULL);
}

int main(int argc, char *argv[])
{
    int i, rc;
    long t1 = 1, t2 = 2, t3 = 3;
    pthread_t threads[3];
    pthread_attr_t attr;

    /* Initialize mutex and condition variable objects */
    pthread_mutex_init(&count_mutex, NULL);
    pthread_cond_init(&count_threshold_cv, NULL);

    /* For portability, explicitly create threads in a joinable state */
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    pthread_create(&threads[0], &attr, watch_count, (void *)t1);
    pthread_create(&threads[1], &attr, inc_count, (void *)t2);
    pthread_create(&threads[2], &attr, inc_count, (void *)t3);

    /* Wait for all threads to complete */
    for (i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    printf("Main(): Waited on %d threads. Done.\n", NUM_THREADS);

    /* Clean up and exit */
    pthread_attr_destroy(&attr);
    pthread_mutex_destroy(&count_mutex);
    pthread_cond_destroy(&count_threshold_cv);
    pthread_exit(NULL);
}
```

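Translator's notes:

(1) SMP: symmetric multiprocessing is a multiprocessor computer hardware architecture in which every processor has equal status and the same access rights to resources.

(2) POSIX: the Portable Operating System Interface is the collective name for a family of related standards that IEEE defined as an API for software running on the various UNIX operating systems. Its formal designation is IEEE 1003, and its international standard name is ISO/IEC 9945. OS X is fully compliant; Linux is mostly compliant but has not been certified.

(3) The memory layout of threads on Linux may not look like this. [Figure: more detailed Linux thread memory layout; image not available]

(4) On Linux, pthreads is implemented in glibc; the relevant code can be found in the nptl directory.

(5) A while ago I debugged an online judge (OJ) program whose backend had a monitoring thread that used priorities to terminate the operations of other threads.

(6) MPI (Message Passing Interface) and Pthreads were designed for programming different kinds of parallel systems: the former is mainly used on distributed-memory systems, the latter on shared-memory systems.

(7) I do not have much multi-threaded programming experience myself, but my roommate told me about a bug in a Hadoop-based program his team wrote. The program produced strange errors, they suspected the Hadoop code and went off to modify it, and only in the end discovered that they had been calling a method that was not thread-safe. Thread safety clearly matters.

(8) A critical section is a section of code that modifies a shared resource. In a concurrent program, only one process (thread) may be inside the critical section at a time.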

[Translation] POSIX Threads Programming - webdancer's Blog