DBA不可不知的操作系统内核参数

背景

操作系统为了适应更多的硬件环境,许多初始的设置值,宽容度都很高。

如果不经调整,这些值可能无法适应HPC,或者硬件稍好些的环境。

无法发挥更好的硬件性能,甚至可能影响某些应用软件的使用,特别是数据库。

数据库关心的OS内核参数

512GB 内存为例

1.

参数

fs.aio-max-nr  

支持系统

CentOS 6, 7       

参数解释

aio-nr & aio-max-nr:    
.  
aio-nr is the running total of the number of events specified on the    
io_setup system call for all currently active aio contexts.    
.  
If aio-nr reaches aio-max-nr then io_setup will fail with EAGAIN.    
.  
Note that raising aio-max-nr does not result in the pre-allocation or re-sizing    
of any kernel data structures.    
.  
aio-nr & aio-max-nr:    
.  
aio-nr shows the current system-wide number of asynchronous io requests.    
.  
aio-max-nr allows you to change the maximum value aio-nr can grow to.    

推荐设置

fs.aio-max-nr = 1xxxxxx  
.  
PostgreSQL, Greenplum 均未使用io_setup创建aio contexts. 无需设置。    
如果Oracle数据库,要使用aio的话,需要设置它。    
设置它也没什么坏处,如果将来需要适应异步IO,可以不需要重新修改这个设置。   

2.

参数

fs.file-max  

支持系统

CentOS 6, 7       

参数解释

file-max & file-nr:    
.  
The value in file-max denotes the maximum number of file handles that the Linux kernel will allocate.   
.  
When you get lots of error messages about running out of file handles,   
you might want to increase this limit.    
.  
Historically, the kernel was able to allocate file handles dynamically,   
but not to free them again.     
.  
The three values in file-nr denote :      
the number of allocated file handles ,     
the number of allocated but unused file handles ,     
the maximum number of file handles.     
.  
Linux 2.6 always reports 0 as the number of free    
file handles -- this is not an error, it just means that the    
number of allocated file handles exactly matches the number of    
used file handles.    
.  
Attempts to allocate more file descriptors than file-max are reported with printk,   
look for "VFS: file-max limit <number> reached".    

推荐设置

fs.file-max = 7xxxxxxx  
.  
PostgreSQL 有一套自己管理的VFS,真正打开的FD与内核管理的文件打开关闭有一套映射的机制,所以真实情况不需要使用那么多的file handlers。     
max_files_per_process 参数。     
假设1GB内存支撑100个连接,每个连接打开1000个文件,那么一个PG实例需要打开10万个文件,一台机器按512G内存来算可以跑500个PG实例,则需要5000万个file handler。     
以上设置绰绰有余。     

3.

参数

kernel.core_pattern  

支持系统

CentOS 6, 7       

参数解释

core_pattern:    
.  
core_pattern is used to specify a core dumpfile pattern name.    
. max length 128 characters; default value is "core"    
. core_pattern is used as a pattern template for the output filename;    
  certain string patterns (beginning with '%') are substituted with    
  their actual values.    
. backward compatibility with core_uses_pid:    
        If core_pattern does not include "%p" (default does not)    
        and core_uses_pid is set, then .PID will be appended to    
        the filename.    
. corename format specifiers:    
        %<NUL>  '%' is dropped    
        %%      output one '%'    
        %p      pid    
        %P      global pid (init PID namespace)    
        %i      tid    
        %I      global tid (init PID namespace)    
        %u      uid    
        %g      gid    
        %d      dump mode, matches PR_SET_DUMPABLE and    
                /proc/sys/fs/suid_dumpable    
        %s      signal number    
        %t      UNIX time of dump    
        %h      hostname    
        %e      executable filename (may be shortened)    
        %E      executable path    
        %<OTHER> both are dropped    
. If the first character of the pattern is a '|', the kernel will treat    
  the rest of the pattern as a command to run.  The core dump will be    
  written to the standard input of that program instead of to a file.    

推荐设置

kernel.core_pattern = /xxx/core_%e_%u_%t_%s.%p    
.  
这个目录要777的权限,如果它是个软链,则真实目录需要777的权限  
mkdir /xxx  
chmod 777 /xxx  
留足够的空间  

4.

参数

kernel.sem   

支持系统

CentOS 6, 7       

参数解释

kernel.sem = 4096 2147483647 2147483646 512000    
.  
4096 每组多少信号量 (>=17, PostgreSQL 每16个进程一组, 每组需要17个信号量) ,     
2147483647 总共多少信号量 (2^31-1 , 且大于4096*512000 ) ,     
2147483646 每个semop()调用支持多少操作 (2^31-1),     
512000 多少组信号量 (假设每GB支持100个连接, 512GB支持51200个连接, 加上其他进程, > 51200*2/16 绰绰有余)     
.  
# sysctl -w kernel.sem="4096 2147483647 2147483646 512000"    
.  
# ipcs -s -l    
  ------ Semaphore Limits --------    
max number of arrays = 512000    
max semaphores per array = 4096    
max semaphores system wide = 2147483647    
max ops per semop call = 2147483646    
semaphore max value = 32767    

推荐设置

kernel.sem = 4096 2147483647 2147483646 512000    
.  
4096可能能够适合更多的场景, 所以大点无妨,关键是512000 arrays也够了。    

5.

参数

kernel.shmall = 107374182    
kernel.shmmax = 274877906944    
kernel.shmmni = 819200    

支持系统

CentOS 6, 7        

参数解释

假设主机内存 512GB    
.  
shmmax 单个共享内存段最大 256GB (主机内存的一半,单位字节)      
shmall 所有共享内存段加起来最大 (主机内存的80%,单位PAGE)      
shmmni 一共允许创建819200个共享内存段 (每个数据库启动需要2个共享内存段。  将来允许动态创建共享内存段,可能需求量更大)     
.  
# getconf PAGE_SIZE    
4096    

推荐设置

kernel.shmall = 107374182    
kernel.shmmax = 274877906944    
kernel.shmmni = 819200    
.  
9.2以及以前的版本,数据库启动时,对共享内存段的内存需求非常大,需要考虑以下几点  
Connections:	(1800 + 270 * max_locks_per_transaction) * max_connections  
Autovacuum workers:	(1800 + 270 * max_locks_per_transaction) * autovacuum_max_workers  
Prepared transactions:	(770 + 270 * max_locks_per_transaction) * max_prepared_transactions  
Shared disk buffers:	(block_size + 208) * shared_buffers  
WAL buffers:	(wal_block_size + 8) * wal_buffers  
Fixed space requirements:	770 kB  
.  
以上建议参数根据9.2以前的版本设置,后期的版本同样适用。  

6.

参数

net.core.netdev_max_backlog  

支持系统

CentOS 6, 7     

参数解释

netdev_max_backlog    
  ------------------    
Maximum number  of  packets,  queued  on  the  INPUT  side,    
when the interface receives packets faster than kernel can process them.    

推荐设置

net.core.netdev_max_backlog=1xxxx    
.  
INPUT链表越长,处理耗费越大,如果用了iptables管理的话,需要加大这个值。    

7.

参数

net.core.rmem_default  
net.core.rmem_max  
net.core.wmem_default  
net.core.wmem_max  

支持系统

CentOS 6, 7     

参数解释

rmem_default    
  ------------    
The default setting of the socket receive buffer in bytes.    
.  
rmem_max    
  --------    
The maximum receive socket buffer size in bytes.    
.  
wmem_default    
  ------------    
The default setting (in bytes) of the socket send buffer.    
.  
wmem_max    
  --------    
The maximum send socket buffer size in bytes.    

推荐设置

net.core.rmem_default = 262144    
net.core.rmem_max = 4194304    
net.core.wmem_default = 262144    
net.core.wmem_max = 4194304    

8.

参数

net.core.somaxconn   

支持系统

CentOS 6, 7        

参数解释

somaxconn - INTEGER    
        Limit of socket listen() backlog, known in userspace as SOMAXCONN.    
        Defaults to 128.    
	See also tcp_max_syn_backlog for additional tuning for TCP sockets.    

推荐设置

net.core.somaxconn=4xxx    

9.

参数

net.ipv4.tcp_max_syn_backlog  

支持系统

CentOS 6, 7         

参数解释

tcp_max_syn_backlog - INTEGER    
        Maximal number of remembered connection requests, which have not    
        received an acknowledgment from connecting client.    
        The minimal value is 128 for low memory machines, and it will    
        increase in proportion to the memory of machine.    
        If server suffers from overload, try increasing this number.    

推荐设置

net.ipv4.tcp_max_syn_backlog=4xxx    
pgpool-II 使用了这个值,用于将超过num_init_child以外的连接queue。     
所以这个值决定了有多少连接可以在队列里面等待。    

10.

参数

net.ipv4.tcp_keepalive_intvl=20    
net.ipv4.tcp_keepalive_probes=3    
net.ipv4.tcp_keepalive_time=60     

支持系统

CentOS 6, 7        

参数解释

tcp_keepalive_time - INTEGER    
        How often TCP sends out keepalive messages when keepalive is enabled.    
        Default: 2hours.    
.  
tcp_keepalive_probes - INTEGER    
        How many keepalive probes TCP sends out, until it decides that the    
        connection is broken. Default value: 9.    
.  
tcp_keepalive_intvl - INTEGER    
        How frequently the probes are send out. Multiplied by    
        tcp_keepalive_probes it is time to kill not responding connection,    
        after probes started. Default value: 75sec i.e. connection    
        will be aborted after ~11 minutes of retries.    

推荐设置

net.ipv4.tcp_keepalive_intvl=20    
net.ipv4.tcp_keepalive_probes=3    
net.ipv4.tcp_keepalive_time=60    
.  
连接空闲60秒后, 每隔20秒发心跳包, 尝试3次心跳包没有响应,关闭连接。 从开始空闲,到关闭连接总共历时120秒。    

11.

参数

net.ipv4.tcp_mem=8388608 12582912 16777216    

支持系统

CentOS 6, 7    

参数解释

tcp_mem - vector of 3 INTEGERs: min, pressure, max    
单位 page    
        min: below this number of pages TCP is not bothered about its    
        memory appetite.    
.  
        pressure: when amount of memory allocated by TCP exceeds this number    
        of pages, TCP moderates its memory consumption and enters memory    
        pressure mode, which is exited when memory consumption falls    
        under "min".    
.  
        max: number of pages allowed for queueing by all TCP sockets.    
.  
        Defaults are calculated at boot time from amount of available    
        memory.    
64GB 内存,自动计算的值是这样的    
net.ipv4.tcp_mem = 1539615      2052821 3079230    
.  
512GB 内存,自动计算得到的值是这样的    
net.ipv4.tcp_mem = 49621632     66162176        99243264    
.  
这个参数让操作系统启动时自动计算,问题也不大  

推荐设置

net.ipv4.tcp_mem=8388608 12582912 16777216    
.  
这个参数让操作系统启动时自动计算,问题也不大  

12.

参数

net.ipv4.tcp_fin_timeout  

支持系统

CentOS 6, 7        

参数解释

tcp_fin_timeout - INTEGER    
        The length of time an orphaned (no longer referenced by any    
        application) connection will remain in the FIN_WAIT_2 state    
        before it is aborted at the local end.  While a perfectly    
        valid "receive only" state for an un-orphaned connection, an    
        orphaned connection in FIN_WAIT_2 state could otherwise wait    
        forever for the remote to close its end of the connection.    
        Cf. tcp_max_orphans    
        Default: 60 seconds    

推荐设置

net.ipv4.tcp_fin_timeout=5    
.  
加快僵尸连接回收速度   

13.

参数

net.ipv4.tcp_synack_retries  

支持系统

CentOS 6, 7         

参数解释

tcp_synack_retries - INTEGER    
        Number of times SYNACKs for a passive TCP connection attempt will    
        be retransmitted. Should not be higher than 255. Default value    
        is 5, which corresponds to 31seconds till the last retransmission    
        with the current initial RTO of 1second. With this the final timeout    
        for a passive TCP connection will happen after 63seconds.    

推荐设置

net.ipv4.tcp_synack_retries=2    
.  
缩短tcp syncack超时时间  

14.

参数

net.ipv4.tcp_syncookies  

支持系统

CentOS 6, 7         

参数解释

tcp_syncookies - BOOLEAN    
        Only valid when the kernel was compiled with CONFIG_SYN_COOKIES    
        Send out syncookies when the syn backlog queue of a socket    
        overflows. This is to prevent against the common 'SYN flood attack'    
        Default: 1    
.  
        Note, that syncookies is fallback facility.    
        It MUST NOT be used to help highly loaded servers to stand    
        against legal connection rate. If you see SYN flood warnings    
        in your logs, but investigation shows that they occur    
        because of overload with legal connections, you should tune    
        another parameters until this warning disappear.    
        See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow.    
.  
        syncookies seriously violate TCP protocol, do not allow    
        to use TCP extensions, can result in serious degradation    
        of some services (f.e. SMTP relaying), visible not by you,    
        but your clients and relays, contacting you. While you see    
        SYN flood warnings in logs not being really flooded, your server    
        is seriously misconfigured.    
.  
        If you want to test which effects syncookies have to your    
        network connections you can set this knob to 2 to enable    
        unconditionally generation of syncookies.    

推荐设置

net.ipv4.tcp_syncookies=1    
.  
防止syn flood攻击   

15.

参数

net.ipv4.tcp_timestamps  

支持系统

CentOS 6, 7         

参数解释

tcp_timestamps - BOOLEAN    
        Enable timestamps as defined in RFC1323.    

推荐设置

net.ipv4.tcp_timestamps=1    
.  
tcp_timestamps 是 tcp 协议中的一个扩展项,通过时间戳的方式来检测过来的包以防止 PAWS(Protect Against Wrapped  Sequence numbers),可以提高 tcp 的性能。  

16.

参数

net.ipv4.tcp_tw_recycle  
net.ipv4.tcp_tw_reuse  
net.ipv4.tcp_max_tw_buckets  

支持系统

CentOS 6, 7         

参数解释

tcp_tw_recycle - BOOLEAN    
        Enable fast recycling TIME-WAIT sockets. Default value is 0.    
        It should not be changed without advice/request of technical    
        experts.    
.  
tcp_tw_reuse - BOOLEAN    
        Allow to reuse TIME-WAIT sockets for new connections when it is    
        safe from protocol viewpoint. Default value is 0.    
        It should not be changed without advice/request of technical    
        experts.    
.  
tcp_max_tw_buckets - INTEGER  
        Maximal number of timewait sockets held by system simultaneously.  
        If this number is exceeded time-wait socket is immediately destroyed  
        and warning is printed.   
	This limit exists only to prevent simple DoS attacks,   
	you _must_ not lower the limit artificially,   
        but rather increase it (probably, after increasing installed memory),    
        if network conditions require more than default value.   

推荐设置

net.ipv4.tcp_tw_recycle=0    
net.ipv4.tcp_tw_reuse=1    
net.ipv4.tcp_max_tw_buckets = 2xxxxx    
.  
net.ipv4.tcp_tw_recycle和net.ipv4.tcp_timestamps不建议同时开启    

17.

参数

net.ipv4.tcp_rmem  
net.ipv4.tcp_wmem  

支持系统

CentOS 6, 7         

参数解释

tcp_wmem - vector of 3 INTEGERs: min, default, max    
        min: Amount of memory reserved for send buffers for TCP sockets.    
        Each TCP socket has rights to use it due to fact of its birth.    
        Default: 1 page    
.  
        default: initial size of send buffer used by TCP sockets.  This    
        value overrides net.core.wmem_default used by other protocols.    
        It is usually lower than net.core.wmem_default.    
        Default: 16K    
.  
        max: Maximal amount of memory allowed for automatically tuned    
        send buffers for TCP sockets. This value does not override    
        net.core.wmem_max.  Calling setsockopt() with SO_SNDBUF disables    
        automatic tuning of that socket's send buffer size, in which case    
        this value is ignored.    
        Default: between 64K and 4MB, depending on RAM size.    
.  
tcp_rmem - vector of 3 INTEGERs: min, default, max    
        min: Minimal size of receive buffer used by TCP sockets.    
        It is guaranteed to each TCP socket, even under moderate memory    
        pressure.    
        Default: 1 page    
.  
        default: initial size of receive buffer used by TCP sockets.    
        This value overrides net.core.rmem_default used by other protocols.    
        Default: 87380 bytes. This value results in window of 65535 with    
        default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit    
        less for default tcp_app_win. See below about these variables.    
.  
        max: maximal size of receive buffer allowed for automatically    
        selected receiver buffers for TCP socket. This value does not override    
        net.core.rmem_max.  Calling setsockopt() with SO_RCVBUF disables    
        automatic tuning of that socket's receive buffer size, in which    
        case this value is ignored.    
        Default: between 87380B and 6MB, depending on RAM size.    

推荐设置

net.ipv4.tcp_rmem=8192 87380 16777216    
net.ipv4.tcp_wmem=8192 65536 16777216    
.  
许多数据库的推荐设置,提高网络性能  

18.

参数

net.nf_conntrack_max  
net.netfilter.nf_conntrack_max  

支持系统

CentOS 6    

参数解释

nf_conntrack_max - INTEGER    
        Size of connection tracking table.    
	Default value is nf_conntrack_buckets value * 4.    

推荐设置

net.nf_conntrack_max=1xxxxxx    
net.netfilter.nf_conntrack_max=1xxxxxx    

19.

参数

vm.dirty_background_bytes   
vm.dirty_expire_centisecs   
vm.dirty_ratio   
vm.dirty_writeback_centisecs   

支持系统

CentOS 6, 7        

参数解释

==============================================================    
.  
dirty_background_bytes    
.  
Contains the amount of dirty memory at which the background kernel    
flusher threads will start writeback.    
.  
Note: dirty_background_bytes is the counterpart of dirty_background_ratio. Only    
one of them may be specified at a time. When one sysctl is written it is    
immediately taken into account to evaluate the dirty memory limits and the    
other appears as 0 when read.    
.  
==============================================================    
.  
dirty_background_ratio    
.  
Contains, as a percentage of total system memory, the number of pages at which    
the background kernel flusher threads will start writing out dirty data.    
.  
==============================================================    
.  
dirty_bytes    
.  
Contains the amount of dirty memory at which a process generating disk writes    
will itself start writeback.    
.  
Note: dirty_bytes is the counterpart of dirty_ratio. Only one of them may be    
specified at a time. When one sysctl is written it is immediately taken into    
account to evaluate the dirty memory limits and the other appears as 0 when    
read.    
.  
Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any    
value lower than this limit will be ignored and the old configuration will be    
retained.    
.  
==============================================================    
.  
dirty_expire_centisecs    
.  
This tunable is used to define when dirty data is old enough to be eligible    
for writeout by the kernel flusher threads.  It is expressed in 100'ths    
of a second.  Data which has been dirty in-memory for longer than this    
interval will be written out next time a flusher thread wakes up.    
.  
==============================================================    
.  
dirty_ratio    
.  
Contains, as a percentage of total system memory, the number of pages at which    
a process which is generating disk writes will itself start writing out dirty    
data.    
.  
==============================================================    
.  
dirty_writeback_centisecs    
.  
The kernel flusher threads will periodically wake up and write `old' data    
out to disk.  This tunable expresses the interval between those wakeups, in    
100'ths of a second.    
.  
Setting this to zero disables periodic writeback altogether.    
.  
==============================================================    

推荐设置

vm.dirty_background_bytes = 4096000000    
vm.dirty_expire_centisecs = 6000    
vm.dirty_ratio = 80    
vm.dirty_writeback_centisecs = 50    
.  
减少数据库进程刷脏页的频率,dirty_background_bytes根据实际IOPS能力以及内存大小设置    

20.

参数

vm.extra_free_kbytes  

支持系统

CentOS 6    

参数解释

extra_free_kbytes    
.  
This parameter tells the VM to keep extra free memory   
between the threshold where background reclaim (kswapd) kicks in,   
and the threshold where direct reclaim (by allocating processes) kicks in.    
.  
This is useful for workloads that require low latency memory allocations    
and have a bounded burstiness in memory allocations,   
for example a realtime application that receives and transmits network traffic    
(causing in-kernel memory allocations) with a maximum total message burst    
size of 200MB may need 200MB of extra free memory to avoid direct reclaim    
related latencies.    
.  
目标是尽量让后台进程回收内存,比用户进程提早多少kbytes回收,因此用户进程可以快速分配内存。    

推荐设置

vm.extra_free_kbytes=4xxxxxx    

21.

参数

vm.min_free_kbytes  

支持系统

CentOS 6, 7         

参数解释

min_free_kbytes:    
.  
This is used to force the Linux VM to keep a minimum number    
of kilobytes free.  The VM uses this number to compute a    
watermark[WMARK_MIN] value for each lowmem zone in the system.    
Each lowmem zone gets a number of reserved free pages based    
proportionally on its size.    
.  
Some minimal amount of memory is needed to satisfy PF_MEMALLOC    
allocations; if you set this to lower than 1024KB, your system will    
become subtly broken, and prone to deadlock under high loads.    
.  
Setting this too high will OOM your machine instantly.    

推荐设置

vm.min_free_kbytes = 2xxxxxx    
.  
防止在高负载时系统无响应,减少内存分配死锁概率。    

22.

参数

vm.mmap_min_addr  

支持系统

CentOS 6, 7       

参数解释

mmap_min_addr    
.  
This file indicates the amount of address space  which a user process will    
be restricted from mmapping.  Since kernel null dereference bugs could    
accidentally operate based on the information in the first couple of pages    
of memory userspace processes should not be allowed to write to them.  By    
default this value is set to 0 and no protections will be enforced by the    
security module.  Setting this value to something like 64k will allow the    
vast majority of applications to work correctly and provide defense in depth    
against future potential kernel bugs.    

推荐设置

vm.mmap_min_addr=6xxxx    
.  
防止内核隐藏的BUG导致的问题  

23.

参数

vm.overcommit_memory   
vm.overcommit_ratio   

支持系统

CentOS 6, 7         

参数解释

==============================================================    
.  
overcommit_kbytes:    
.  
When overcommit_memory is set to 2, the committed address space is not    
permitted to exceed swap plus this amount of physical RAM. See below.    
.  
Note: overcommit_kbytes is the counterpart of overcommit_ratio. Only one    
of them may be specified at a time. Setting one disables the other (which    
then appears as 0 when read).    
.  
==============================================================    
.  
overcommit_memory:    
.  
This value contains a flag that enables memory overcommitment.    
.  
When this flag is 0,   
the kernel attempts to estimate the amount    
of free memory left when userspace requests more memory.    
.  
When this flag is 1,   
the kernel pretends there is always enough memory until it actually runs out.    
.  
When this flag is 2,   
the kernel uses a "never overcommit"    
policy that attempts to prevent any overcommit of memory.    
Note that user_reserve_kbytes affects this policy.    
.  
This feature can be very useful because there are a lot of    
programs that malloc() huge amounts of memory "just-in-case"    
and don't use much of it.    
.  
The default value is 0.    
.  
See Documentation/vm/overcommit-accounting and    
security/commoncap.c::cap_vm_enough_memory() for more information.    
.  
==============================================================    
.  
overcommit_ratio:    
.  
When overcommit_memory is set to 2,   
the committed address space is not permitted to exceed   
      swap + this percentage of physical RAM.    
See above.    
.  
==============================================================    

推荐设置

vm.overcommit_memory = 0    
vm.overcommit_ratio = 90    
.  
vm.overcommit_memory = 0 时 vm.overcommit_ratio可以不设置   

24.

参数

vm.swappiness   

支持系统

CentOS 6, 7         

参数解释

swappiness    
.  
This control is used to define how aggressive the kernel will swap    
memory pages.    
Higher values will increase agressiveness, lower values    
decrease the amount of swap.    
.  
The default value is 60.    

推荐设置

vm.swappiness = 0    

25.

参数

vm.zone_reclaim_mode   

支持系统

CentOS 6, 7         

参数解释

zone_reclaim_mode:    
.  
Zone_reclaim_mode allows someone to set more or less aggressive approaches to    
reclaim memory when a zone runs out of memory. If it is set to zero then no    
zone reclaim occurs. Allocations will be satisfied from other zones / nodes    
in the system.    
.  
This is value ORed together of    
.  
1       = Zone reclaim on    
2       = Zone reclaim writes dirty pages out    
4       = Zone reclaim swaps pages    
.  
zone_reclaim_mode is disabled by default.  For file servers or workloads    
that benefit from having their data cached, zone_reclaim_mode should be    
left disabled as the caching effect is likely to be more important than    
data locality.    
.  
zone_reclaim may be enabled if it's known that the workload is partitioned    
such that each partition fits within a NUMA node and that accessing remote    
memory would cause a measurable performance reduction.  The page allocator    
will then reclaim easily reusable pages (those page cache pages that are    
currently not used) before allocating off node pages.    
.  
Allowing zone reclaim to write out pages stops processes that are    
writing large amounts of data from dirtying pages on other nodes. Zone    
reclaim will write out dirty pages if a zone fills up and so effectively    
throttle the process. This may decrease the performance of a single process    
since it cannot use all of system memory to buffer the outgoing writes    
anymore but it preserve the memory on other nodes so that the performance    
of other processes running on other nodes will not be affected.    
.  
Allowing regular swap effectively restricts allocations to the local    
node unless explicitly overridden by memory policies or cpuset    
configurations.    

推荐设置

vm.zone_reclaim_mode=0    
.  
不使用NUMA  

26.

参数

net.ipv4.ip_local_port_range  

支持系统

CentOS 6, 7         

参数解释

ip_local_port_range - 2 INTEGERS  
        Defines the local port range that is used by TCP and UDP to  
        choose the local port. The first number is the first, the  
        second the last local port number. The default values are  
        32768 and 61000 respectively.  
.  
ip_local_reserved_ports - list of comma separated ranges  
        Specify the ports which are reserved for known third-party  
        applications. These ports will not be used by automatic port  
        assignments (e.g. when calling connect() or bind() with port  
        number 0). Explicit port allocation behavior is unchanged.  
.  
        The format used for both input and output is a comma separated  
        list of ranges (e.g. "1,2-4,10-10" for ports 1, 2, 3, 4 and  
        10). Writing to the file will clear all previously reserved  
        ports and update the current list with the one given in the  
        input.  
.  
        Note that ip_local_port_range and ip_local_reserved_ports  
        settings are independent and both are considered by the kernel  
        when determining which ports are available for automatic port  
        assignments.  
.  
        You can reserve ports which are not in the current  
        ip_local_port_range, e.g.:  
.  
        $ cat /proc/sys/net/ipv4/ip_local_port_range  
        32000   61000  
        $ cat /proc/sys/net/ipv4/ip_local_reserved_ports  
        8080,9148  
.  
        although this is redundant. However such a setting is useful  
        if later the port range is changed to a value that will  
        include the reserved ports.  
.  
        Default: Empty  

推荐设置

net.ipv4.ip_local_port_range=40000 65535    
.  
限制本地动态端口分配范围,防止占用监听端口。  

27.

参数

  vm.nr_hugepages  

支持系统

CentOS 6, 7  

参数解释

==============================================================  
nr_hugepages  
Change the minimum size of the hugepage pool.  
See Documentation/vm/hugetlbpage.txt  
==============================================================  
nr_overcommit_hugepages  
Change the maximum size of the hugepage pool. The maximum is  
nr_hugepages + nr_overcommit_hugepages.  
See Documentation/vm/hugetlbpage.txt  
.  
The output of "cat /proc/meminfo" will include lines like:  
......  
HugePages_Total: vvv  
HugePages_Free:  www  
HugePages_Rsvd:  xxx  
HugePages_Surp:  yyy  
Hugepagesize:    zzz kB  
.  
where:  
HugePages_Total is the size of the pool of huge pages.  
HugePages_Free  is the number of huge pages in the pool that are not yet  
                allocated.  
HugePages_Rsvd  is short for "reserved," and is the number of huge pages for  
                which a commitment to allocate from the pool has been made,  
                but no allocation has yet been made.  Reserved huge pages  
                guarantee that an application will be able to allocate a  
                huge page from the pool of huge pages at fault time.  
HugePages_Surp  is short for "surplus," and is the number of huge pages in  
                the pool above the value in /proc/sys/vm/nr_hugepages. The  
                maximum number of surplus huge pages is controlled by  
                /proc/sys/vm/nr_overcommit_hugepages.  
.  
/proc/filesystems should also show a filesystem of type "hugetlbfs" configured  
in the kernel.  
.  
/proc/sys/vm/nr_hugepages indicates the current number of "persistent" huge  
pages in the kernel's huge page pool.  "Persistent" huge pages will be  
returned to the huge page pool when freed by a task.  A user with root  
privileges can dynamically allocate more or free some persistent huge pages  
by increasing or decreasing the value of 'nr_hugepages'.  

推荐设置

如果要使用PostgreSQL的huge page,建议设置它。    
大于数据库需要的共享内存即可。    

28.

参数

  fs.nr_open

支持系统

CentOS 6, 7

参数解释

nr_open:

This denotes the maximum number of file-handles a process can
allocate. Default value is 1024*1024 (1048576) which should be
enough for most machines. Actual limit depends on RLIMIT_NOFILE
resource limit.

它还影响security/limits.conf 的文件句柄限制,单个进程的打开句柄不能大于fs.nr_open,所以要加大文件句柄限制,首先要加大nr_open

推荐设置

对于有很多对象(表、视图、索引、序列、物化视图等)的PostgreSQL数据库,建议设置为2000万,
例如fs.nr_open=20480000

数据库关心的资源限制

1. 通过/etc/security/limits.conf设置,或者ulimit设置

2. 通过/proc/$pid/limits查看当前进程的设置

#        - core - limits the core file size (KB)  
#        - memlock - max locked-in-memory address space (KB)  
#        - nofile - max number of open files  建议设置为1000万 , 但是必须设置sysctl, fs.nr_open大于它,否则会导致系统无法登陆。
#        - nproc - max number of processes  
以上四个是非常关心的配置  
....  
#        - data - max data size (KB)  
#        - fsize - maximum filesize (KB)  
#        - rss - max resident set size (KB)  
#        - stack - max stack size (KB)  
#        - cpu - max CPU time (MIN)  
#        - as - address space limit (KB)  
#        - maxlogins - max number of logins for this user  
#        - maxsyslogins - max number of logins on the system  
#        - priority - the priority to run user process with  
#        - locks - max number of file locks the user can hold  
#        - sigpending - max number of pending signals  
#        - msgqueue - max memory used by POSIX message queues (bytes)  
#        - nice - max nice priority allowed to raise to values: [-20, 19]  
#        - rtprio - max realtime priority  

数据库关心的IO调度规则

1. 目前操作系统支持的IO调度策略包括cfq, deadline, noop 等。

/kernel-doc-xxx/Documentation/block  
-r--r--r-- 1 root root   674 Apr  8 16:33 00-INDEX  
-r--r--r-- 1 root root 55006 Apr  8 16:33 biodoc.txt  
-r--r--r-- 1 root root   618 Apr  8 16:33 capability.txt  
-r--r--r-- 1 root root 12791 Apr  8 16:33 cfq-iosched.txt  
-r--r--r-- 1 root root 13815 Apr  8 16:33 data-integrity.txt  
-r--r--r-- 1 root root  2841 Apr  8 16:33 deadline-iosched.txt  
-r--r--r-- 1 root root  4713 Apr  8 16:33 ioprio.txt  
-r--r--r-- 1 root root  2535 Apr  8 16:33 null_blk.txt  
-r--r--r-- 1 root root  4896 Apr  8 16:33 queue-sysfs.txt  
-r--r--r-- 1 root root  2075 Apr  8 16:33 request.txt  
-r--r--r-- 1 root root  3272 Apr  8 16:33 stat.txt  
-r--r--r-- 1 root root  1414 Apr  8 16:33 switching-sched.txt  
-r--r--r-- 1 root root  3916 Apr  8 16:33 writeback_cache_control.txt  

如果你要详细了解这些调度策略的规则,可以查看WIKI或者看内核文档。

从这里可以看到它的调度策略

cat /sys/block/vdb/queue/scheduler   
noop [deadline] cfq   

修改

echo deadline > /sys/block/hda/queue/scheduler  

或者修改启动参数

grub.conf  
elevator=deadline  

从很多测试结果来看,数据库使用deadline调度,性能会更稳定一些。

其他

1. 关闭透明大页

2. 禁用NUMA

3. SSD的对齐

PostgreSQL on Linux 最佳部署手册

背景

数据库的安装一直以来都挺复杂的,特别是Oracle,现在身边都还有安装Oracle数据库赚外快的事情。

PostgreSQL其实安装很简单,但是那仅仅是可用,并不是好用。很多用户使用默认的方法安装好数据库之后,然后测试一通性能,发现性能不行就不用了。

原因不用说,多方面没有优化的结果。

PostgreSQL数据库为了适应更多的场景能使用,默认的参数都设得非常保守,通常需要优化,比如检查点,SHARED BUFFER等。

本文将介绍一下PostgreSQL on Linux的最佳部署方法,其实在我的很多文章中都有相关的内容,但是没有总结成一篇文档。

OS与硬件认证检查

目的是确认服务器与OS通过certification

Intel Xeon v3和v4的cpu,能支持的RHEL的最低版本是不一样的,

详情请见:https://access.redhat.com/support/policy/intel

Intel Xeon v3和v4的cpu,能支持的Oracle Linux 的最低版本是不一样的,

详情请见:http://linux.oracle.com/pls/apex/f?p=117:1

第一:RedHat生态系统–来自RedHat的认证列表https://access.redhat.com/ecosystem

第二:Oracle Linux 对服务器和存储的硬件认证列表 http://linux.oracle.com/pls/apex/f?p=117:1

安装常用包

# yum -y install coreutils glib2 lrzsz mpstat dstat sysstat e4fsprogs xfsprogs ntp readline-devel zlib-devel openssl-devel pam-devel libxml2-devel libxslt-devel python-devel tcl-devel gcc make smartmontools flex bison perl-devel perl-ExtUtils* openldap-devel jadetex  openjade bzip2

配置OS内核参数

1. sysctl

注意某些参数,根据内存大小配置(已说明)

含义详见

《DBA不可不知的操作系统内核参数》

# vi /etc/sysctl.conf

# add by digoal.zhou
fs.aio-max-nr = 1048576
fs.file-max = 76724600
kernel.core_pattern= /data01/corefiles/core_%e_%u_%t_%s.%p         
# /data01/corefiles事先建好,权限777,如果是软链接,对应的目录修改为777
kernel.sem = 4096 2147483647 2147483646 512000    
# 信号量, ipcs -l 或 -u 查看,每16个进程一组,每组信号量需要17个信号量。
kernel.shmall = 107374182      
# 所有共享内存段相加大小限制(建议内存的80%)
kernel.shmmax = 274877906944   
# 最大单个共享内存段大小(建议为内存一半), >9.2的版本已大幅降低共享内存的使用
kernel.shmmni = 819200         
# 一共能生成多少共享内存段,每个PG数据库集群至少2个共享内存段
net.core.netdev_max_backlog = 10000
net.core.rmem_default = 262144       
# The default setting of the socket receive buffer in bytes.
net.core.rmem_max = 4194304          
# The maximum receive socket buffer size in bytes
net.core.wmem_default = 262144       
# The default setting (in bytes) of the socket send buffer.
net.core.wmem_max = 4194304          
# The maximum send socket buffer size in bytes.
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_keepalive_intvl = 20
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_mem = 8388608 12582912 16777216
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syncookies = 1    
# 开启SYN Cookies。当出现SYN等待队列溢出时,启用cookie来处理,可防范少量的SYN攻击
net.ipv4.tcp_timestamps = 1    
# 减少time_wait
net.ipv4.tcp_tw_recycle = 0    
# 如果=1则开启TCP连接中TIME-WAIT套接字的快速回收,但是NAT环境可能导致连接失败,建议服务端关闭它
net.ipv4.tcp_tw_reuse = 1      
# 开启重用。允许将TIME-WAIT套接字重新用于新的TCP连接
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_rmem = 8192 87380 16777216
net.ipv4.tcp_wmem = 8192 65536 16777216
net.nf_conntrack_max = 1200000
net.netfilter.nf_conntrack_max = 1200000
vm.dirty_background_bytes = 409600000       
#  系统脏页到达这个值,系统后台刷脏页调度进程 pdflush(或其他) 自动将(dirty_expire_centisecs/100)秒前的脏页刷到磁盘
vm.dirty_expire_centisecs = 3000             
#  比这个值老的脏页,将被刷到磁盘。3000表示30秒。
vm.dirty_ratio = 95                          
#  如果系统进程刷脏页太慢,使得系统脏页超过内存 95 % 时,则用户进程如果有写磁盘的操作(如fsync, fdatasync等调用),则需要主动把系统脏页刷出。
#  有效防止用户进程刷脏页,在单机多实例,并且使用CGROUP限制单实例IOPS的情况下非常有效。  
vm.dirty_writeback_centisecs = 100            
#  pdflush(或其他)后台刷脏页进程的唤醒间隔, 100表示1秒。
vm.mmap_min_addr = 65536
vm.overcommit_memory = 0     
#  在分配内存时,允许少量over malloc, 如果设置为 1, 则认为总是有足够的内存,内存较少的测试环境可以使用 1 .  
vm.overcommit_ratio = 90     
#  当overcommit_memory = 2 时,用于参与计算允许指派的内存大小。
vm.swappiness = 0            
#  关闭交换分区
vm.zone_reclaim_mode = 0     
# 禁用 numa, 或者在vmlinux中禁止. 
net.ipv4.ip_local_port_range = 40000 65535    
# 本地自动分配的TCP, UDP端口号范围
fs.nr_open=20480000
# 单个进程允许打开的文件句柄上限

# 以下参数请注意
# vm.extra_free_kbytes = 4096000
# vm.min_free_kbytes = 2097152
# 如果是小内存机器,以上两个值不建议设置
# vm.nr_hugepages = 66536    
#  建议shared buffer设置超过64GB时 使用大页,页大小 /proc/meminfo Hugepagesize
# vm.lowmem_reserve_ratio = 1 1 1
# 对于内存大于64G时,建议设置,否则建议默认值 256 256 32

2. 生效配置

sysctl -p

配置OS资源限制

# vi /etc/security/limits.conf

# nofile超过1048576的话,一定要先将sysctl的fs.nr_open设置为更大的值,并生效后才能继续设置nofile.

* soft    nofile  1024000
* hard    nofile  1024000
* soft    nproc   unlimited
* hard    nproc   unlimited
* soft    core    unlimited
* hard    core    unlimited
* soft    memlock unlimited
* hard    memlock unlimited

最好在关注一下/etc/security/limits.d目录中的文件内容,会覆盖limits.conf的配置。

已有进程的ulimit请查看/proc/pid/limits,例如

Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            10485760             unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             11286                11286                processes 
Max open files            1024                 4096                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       11286                11286                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us

如果你要启动其他进程,建议退出SHELL再进一遍,确认ulimit环境配置已生效,再启动。

配置OS防火墙

(建议按业务场景设置,我这里先清掉)

iptables -F

配置范例

# 私有网段
-A INPUT -s 192.168.0.0/16 -j ACCEPT
-A INPUT -s 10.0.0.0/8 -j ACCEPT
-A INPUT -s 172.16.0.0/16 -j ACCEPT

selinux

如果没有这方面的需求,建议禁用

# vi /etc/sysconfig/selinux 

SELINUX=disabled
SELINUXTYPE=targeted

关闭不必要的OS服务

chkconfig --list|grep on  
关闭不必要的,例如 
chkconfig iscsi off

部署文件系统

注意SSD对齐,延长寿命,避免写放大。

parted -s /dev/sda mklabel gpt
parted -s /dev/sda mkpart primary 1MiB 100%

格式化(如果你选择ext4的话)

mkfs.ext4 /dev/sda1 -m 0 -O extent,uninit_bg -E lazy_itable_init=1 -T largefile -L u01

建议使用的ext4 mount选项

# vi /etc/fstab

LABEL=u01 /u01     ext4        defaults,noatime,nodiratime,nodelalloc,barrier=0,data=writeback    0 0

# mkdir /u01
# mount -a

为什么需要data=writeback?

pic

建议pg_xlog放到独立的IOPS性能贼好的块设备中。

设置SSD盘的调度为deadline

如果不是SSD的话,还是使用CFQ,否则建议使用DEADLINE。

临时设置(比如sda盘)

echo deadline > /sys/block/sda/queue/scheduler

永久设置

编辑grub文件修改块设备调度策略

vi /boot/grub.conf

elevator=deadline

注意,如果既有机械盘,又有SSD,那么可以使用/etc/rc.local,对指定磁盘修改为对应的调度策略。

关闭透明大页、numa

加上前面的默认IO调度,如下

vi /boot/grub.conf

elevator=deadline numa=off transparent_hugepage=never 

编译器

建议使用较新的编译器,安装 gcc 6.2.0 略

cd ~
tar -jxvf gcc6.2.0.tar.bz2
tar -jxvf python2.7.12.tar.bz2


# vi /etc/ld.so.conf

/home/digoal/gcc6.2.0/lib
/home/digoal/gcc6.2.0/lib64
/home/digoal/python2.7.12/lib

# ldconfig

环境变量

# vi ~/env_pg.sh

export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=$1
export PGDATA=/$2/digoal/pg_root$PGPORT
export LANG=en_US.utf8
export PGHOME=/home/digoal/pgsql9.6
export LD_LIBRARY_PATH=/home/digoal/gcc6.2.0/lib:/home/digoal/gcc6.2.0/lib64:/home/digoal/python2.7.12/lib:$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export PATH=/home/digoal/gcc6.2.0/bin:/home/digoal/python2.7.12/bin:/home/digoal/cmake3.6.3/bin:$PGHOME/bin:$PATH:.
export DATE=`date +"%Y%m%d%H%M"`
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=$PGDATA
export PGUSER=postgres
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi

icc, clang

如果你想使用ICC或者clang编译PostgreSQL,请参考

《[转载]用intel编译器icc编译PostgreSQL》

《PostgreSQL clang vs gcc 编译》

编译PostgreSQL

建议使用NAMED_POSIX_SEMAPHORES

src/backend/port/posix_sema.c

create sem : 
named :
                mySem = sem_open(semname, O_CREAT | O_EXCL,
                                                 (mode_t) IPCProtection, (unsigned) 1);


unamed :
/*
 * PosixSemaphoreCreate
 *
 * Attempt to create a new unnamed semaphore.
 */
static void
PosixSemaphoreCreate(sem_t * sem)
{
        if (sem_init(sem, 1, 1) < 0)
                elog(FATAL, "sem_init failed: %m");
}


remove sem : 

#ifdef USE_NAMED_POSIX_SEMAPHORES
        /* Got to use sem_close for named semaphores */
        if (sem_close(sem) < 0)
                elog(LOG, "sem_close failed: %m");
#else
        /* Got to use sem_destroy for unnamed semaphores */
        if (sem_destroy(sem) < 0)
                elog(LOG, "sem_destroy failed: %m");
#endif

编译项

. ~/env_pg.sh 1921 u01

cd postgresql-9.6.1
export USE_NAMED_POSIX_SEMAPHORES=1
LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O3 -flto" ./configure --prefix=/home/digoal/pgsql9.6
LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O3 -flto" make world -j 64
LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O3 -flto" make install-world

如果你是开发环境,需要调试,建议这样编译。

cd postgresql-9.6.1
export USE_NAMED_POSIX_SEMAPHORES=1
LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O0 -flto -g -ggdb -fno-omit-frame-pointer" ./configure --prefix=/home/digoal/pgsql9.6 --enable-cassert
LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O0 -flto -g -ggdb -fno-omit-frame-pointer" make world -j 64
LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O0 -flto -g -ggdb -fno-omit-frame-pointer" make install-world

初始化数据库集群

pg_xlog建议放在IOPS最好的分区。

. ~/env_pg.sh 1921 u01
initdb -D $PGDATA -E UTF8 --locale=C -U postgres -X /u02/digoal/pg_xlog$PGPORT

配置postgresql.conf

以PostgreSQL 9.6, 512G内存主机为例

最佳到文件末尾即可,重复的会以末尾的作为有效值。  
  
$ vi postgresql.conf

listen_addresses = '0.0.0.0'
port = 1921
max_connections = 5000
unix_socket_directories = '.'
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 128GB
maintenance_work_mem = 4GB
dynamic_shared_memory_type = posix
vacuum_cost_delay = 0
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 10.0
bgwriter_flush_after = 0  # IO很好的机器,不需要考虑平滑调度
max_worker_processes = 128
max_parallel_workers_per_gather = 0
old_snapshot_threshold = -1
backend_flush_after = 0  # IO很好的机器,不需要考虑平滑调度
wal_level = replica
synchronous_commit = off
full_page_writes = on
wal_buffers = 1GB
wal_writer_delay = 10ms
wal_writer_flush_after = 0  # IO很好的机器,不需要考虑平滑调度
checkpoint_timeout = 30min  # 不建议频繁做检查点,否则XLOG会产生很多的FULL PAGE WRITE。
max_wal_size = 256GB  # 建议是SHARED BUFFER的2倍
min_wal_size = 64GB
checkpoint_completion_target = 0.05  # 硬盘好的情况下,可以让检查点快速结束,恢复时也可以快速达到一致状态。
checkpoint_flush_after = 0  # IO很好的机器,不需要考虑平滑调度
archive_mode = on
archive_command = '/bin/date'      #  后期再修改,如  'test ! -f /disk1/digoal/arch/%f && cp %p /disk1/digoal/arch/%f'
max_wal_senders = 8
random_page_cost = 1.3  # IO很好的机器,不需要考虑离散和顺序扫描的成本差异
parallel_tuple_cost = 0
parallel_setup_cost = 0
min_parallel_relation_size = 0
effective_cache_size = 300GB  # 看着办,剩下的都是OS可用的CACHE。
force_parallel_mode = off
log_destination = 'csvlog'
logging_collector = on
log_truncate_on_rotation = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_timezone = 'PRC'
vacuum_defer_cleanup_age = 0
hot_standby_feedback = off
max_standby_archive_delay = 300s
max_standby_streaming_delay = 300s
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_max_workers = 16  # CPU核多,并且IO好的情况下,可多点,但是注意16*autovacuum mem,会消耗较多内存,所以内存也要有基础。  
autovacuum_naptime = 30s
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.2
autovacuum_freeze_max_age = 1600000000
autovacuum_multixact_freeze_max_age = 1600000000
vacuum_freeze_table_age = 1500000000
vacuum_multixact_freeze_table_age = 1500000000
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
shared_preload_libraries='pg_stat_statements'

## 如果你的数据库有非常多小文件(比如有几十万以上的表,还有索引等,并且每张表都会被访问到时),建议FD可以设多一些,避免进程需要打开关闭文件。
## 但是不要大于前面章节系统设置的ulimit -n(open files)
max_files_per_process=655360

配置pg_hba.conf

避免不必要的访问,开放允许的访问,建议务必使用密码访问。

$ vi pg_hba.conf

host replication xx 0.0.0.0/0 md5  # 流复制

host all postgres 0.0.0.0/0 reject # 拒绝超级用户从网络登录
host all all 0.0.0.0/0 md5  # 其他用户登陆

启动数据库

pg_ctl start

好了,你的PostgreSQL数据库基本上部署好了,可以愉快的玩耍了。

分析java进程假死情况

1 引言

 

1.1 编写目的

 

为了方便大家以后发现进程假死的时候能够正常的分析并且第一时间保留现场快照。

 

1.2编写背景

 

最近服务器发现tomcat的应用会偶尔出现无法访问的情况。经过一段时间的观察最近又发现有台tomcat的应用出现了无法访问情况。简单描述下该台tomcat当时具体的表现:客户端请求没有响应,查看服务器端tomcat的进程是存活的,查看业务日志的时候发现日志停止没有任何最新的访问日志。连tomcat下面的catalina.log也没有任何访问记录,基本断定该台tomcat已不能提供服务。

 

2 分析步骤

 

根据前面我描述的假死现象,我最先想到的是网络是否出现了问题,是不是有什么丢包严重的情况,于是我开始从请求的数据流程开始分析,由于我们业务的架构采用的是nginx+tomcat的集群配置,一个请求上来的流向可以用下图来简单的描述一下:

 

 

 

2.1检查nginx的网络情况

 

更改nginx的配置,让该台nginx请求只转到本机器的出现问题的tomcat应用上面,在access.log里看是否有网络请求,结果可以查看到当前所有的网络请求,也就是说可以排除是网络的问题。

 

2.2检查tomcat 的网络情况

 

分析业务配置的tomcat访问日志xxxx.log上是否有日志访问记录,经过查询该台tomcat应用日志完全没有任何访问记录,由于我们的部署是本机的nginx转到本机的tomcat应用,所以可以排除不是网络问题。到此基本可以断定网络没有问题,tomcat 本身出现了假死的情况。在tomcat的日志里有报过OutOfMemoryError的异常,所以可以肯定tomcat假死的原因是OOM

 

3 分析JVM内存溢出

 

3.1为什么会发生内存泄漏

 

在我们学习Java的时候就知道它最为方便的地方就是我们不需要管理内存的分配和释放,一切由JVM自己来进行处理,当Java对象不再被应用时,等到堆内存不够用时JVM会进行GC处理,清除这些对象占用的堆内存空间,但是如果对象一直被应用,那么JVM是无法对其进行GC处理的,那么我们创建新的对象时,JVM就没有办法从堆中获取足够的内存分配给此对象,这时就会导致OOM。我们出现OOM原因,一般都是因为我们不断的往容器里存放对象,然而容器没有相应的大小限制或清除机制,这样就容易导致OOM。

 

3.2快速定位问题

 

当我们的应用服务器占用了过多内存的时候,我们怎么样才能快速的定位问题呢?要想快速定位问题,首先我们必需获取服务器JVM某时刻的内存快照。Jdk里面提供了很多相应的命令比如:jstack,jstat,jmap,jps等等. 在出现问题后我们应该快速保留现场。

 

3.2.1 jstack

 

可以观察到jvm中当前所有线程的运行情况和线程当前状态.

 

sudo jstack -F 进程ID
输出内容如下:

从上面的图我们可以看到tomcat进程里面没有死锁的情况,而且每个线程都处理等待的状态。这个时候我们可以telnet命令连上tomcat的端口查看tomcat进程是否有任务回应。这时发现tomcat没有任何回应可以证明tomcat应用已没有响应处理假死状态。

 

3.2.2 jstat

这是jdk命令中比较重要,也是相当实用的一个命令,可以观察到classloader,compiler,gc相关信息
具体参数如下:
-class:统计class loader行为信息
-compile:统计编译行为信息
-gc:统计jdk gc时heap信息
-gccapacity:统计不同的generations(包括新生区,老年区,permanent区)相应的heap容量情况
-gccause:统计gc的情况,(同-gcutil)和引起gc的事件
-gcnew:统计gc时,新生代的情况
-gcnewcapacity:统计gc时,新生代heap容量
-gcold:统计gc时,老年区的情况
-gcoldcapacity:统计gc时,老年区heap容量
-gcpermcapacity:统计gc时,permanent区heap容量
-gcutil:统计gc时,heap情况
-printcompilation:不知道干什么的,一直没用过。

一般比较常用的几个参数是:
sudo jstat -class 2083 1000 10 (每隔1秒监控一次,一共做10次)

 

查看当时的head情况

 

sudo jstat -gcutil  20683 2000

注:该图不是出错截取

 

出现时候截取的数据是gc已经完全没有处理了,因为没有加上full gc的日志所以不确定JVM GC 时间过长,导致应用暂停.

 

3.2.3获取内存快照

 

Jdk自带的jmap可以获取内在某一时刻的快照

 

命令:jmap -dump:format=b,file=heap.bin <pid>
file:保存路径及文件名
pid:进程编号(windows通过任务管理器查看,linux通过ps aux查看)
dump文件可以通过MemoryAnalyzer分析查看,网址:http://www.eclipse.org/mat/,可以查看dump时对象数量,内存占用,线程情况等。

从上面的图可以看得出来对象没有内存溢出。

从上图我们可以明确的看出此项目的HashMap内存使用率比较高,因为我们的系统都是返回Map的数据结构所以占用比较高的内存是正常情况。

 

 

3.2.4观察运行中的jvm物理内存的占用情况

 

观察运行中的jvm物理内存的占用情况。我们也可以用jmap命令
参数如下:
-heap:打印jvm heap的情况
-histo:打印jvm heap的直方图。其输出信息包括类名,对象数量,对象占用大小。
-histo:live :同上,但是只答应存活对象的情况
-permstat:打印permanent generation heap情况

命令使用:
jmap -heap 2083
可以观察到New Generation(Eden Space,From Space,To Space),tenured generation,Perm Generation的内存使用情况
输出内容:

上图为tomcat应用出错前JVM的配置信息,可以明确的看到当时的信息:

 

MaxHeapSize堆内存大小为:3500M

 

MaxNewSize新生代内存大小:512M

 

PermSize永久代内存大小:192M

 

NewRatio设置年轻代(包括Eden和两个Survivor区)与年老代的比值(除去持久代)。设置为2,则年轻代与年老代所占比值为1:2,年轻代占整个堆栈的1/3

 

SurvivorRatio设置年轻代中Eden区与Survivor区的大小比值。设置为8,则两个Survivor区与一个Eden区的比值为2:8,一个Survivor区占整个年轻代的1/10

 

 

 

在New Generation中,有一个叫Eden的空间,主要是用来存放新生的对象,还有两个Survivor Spaces(from,to), 它们用来存放每次垃圾回收后存活下来的对象。在Old Generation中,主要存放应用程序中生命周期长的内存对象,还有个Permanent Generation,主要用来放JVM自己的反射对象,比如类对象和方法对象等。

 

 

 

 

从上面的图可以看出来JVM的新生代设置太小,可以看出应用的新生代区完全占满了,无法再往新生代区增加新的对象此时的这些对象都处于活跃状态,所以不会被GC处理,但是tomcat应用还在继续产生新的对象,这样就会导致OOM的发生,这就是导致tomcat假死的原因.

 

4 Tomcat假死其它情况

 

以下是网上资料说的tomcat假的情况:

 

1、应用本身程序的问题,造成死锁。

 

2、load 太高,已经超出服务的极限

 

3、jvm GC 时间过长,导致应用暂停

 

因为出错项目里面没有打出GC的处理情况,所以不确定此原因是否也是我项目tomcat假死的原因之一。

 

4、大量tcp 连接 CLOSE_WAIT

 

netstat -n | awk ‘/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}’

 

TIME_WAIT 48

 

CLOSE_WAIT 2228

 

ESTABLISHED 86

 

常用的三个状态是:ESTABLISHED 表示正在通信,TIME_WAIT 表示主动关闭,CLOSE_WAIT 表示被动关

JetBrains IntelliJ IDEA Ultimate 2016.3.4 Crack

How to crack

1.Install the latest version of IntelliJ IDEA (v2016.3.4)
2.Start it
3.When you have to enter the license, change to [License server]
In the Server URL input field enter: http://idea.strongd.net/ . For older Servers, check out the bottom of the page, they are all listed

4.Click on [Ok] and everything should work

Building TensorFlow for Raspberry Pi: a Step-By-Step Guide

What You Need

  • Raspberry Pi 2 or 3 Model B
  • An SD card running Raspbian with several GB of free space
    • An 8 GB card with a fresh install of Raspbian does not have enough space. A 16 GB SD card minimum is recommended.
    • These instructions may work on Linux distributions other than Raspbian
  • Internet connection to the Raspberry Pi
  • A USB memory drive that can be installed as swap memory (if it is a flash drive, make sure you don’t care about the drive). Anything over 1 GB should be fine
  • A fair amount of time

Overview

These instructions were crafted for a Raspberry Pi 3 Model B running a vanilla copy of Raspbian 8.0 (jessie). It appears to work on Raspberry Pi 2, but there are some kinks that are being worked out. If these instructions work for different distributions, let me know!

Here’s the basic plan: build a 32-bit version of Protobuf, use that to build a RPi-friendly version of Bazel, and finally use Bazel to build TensorFlow.

The Build

1. Install basic dependencies

First, update apt-get to make sure it knows where to download everything.

sudo apt-get update

Next, install some base dependencies and tools we’ll need later.

For Protobuf:

sudo apt-get install autoconf automake libtool maven

For gRPC:

sudo apt-get install oracle-java7-jdk
# Select the jdk-7-oracle option for the update-alternatives command
sudo update-alternatives --config java

For Bazel:

sudo apt-get install pkg-config zip g++ zlib1g-dev unzip

For TensorFlow:

# For Python 2.7
sudo apt-get install python-pip python-numpy swig python-dev
sudo pip install wheel

# For Python 3.3+
sudo apt-get install python3-pip python3-numpy swig python3-dev
sudo pip3 install wheel

To be able to take advantage of certain optimization flags:

sudo apt-get install gcc-4.8 g++-4.8
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 100
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 100

Finally, for cleanliness, make a directory that will hold the Protobuf, Bazel, and TensorFlow repositories.

mkdir tf
cd tf

2. Build Protobuf

Clone the Protobuf repository.

git clone https://github.com/google/protobuf.git

Now move into the new protobuf directory, configure it, and make it. Note: this takes a little while.

cd protobuf
git checkout v3.0.0-beta-3.3
./autogen.sh
./configure --prefix=/usr
make -j 4
sudo make install

Once it’s made, we can move into the java directory and use Maven to build the project.

cd java
mvn package

After following these steps, you’ll have two spiffy new files: /usr/bin/protoc and protobuf/java/core/target/protobuf-java-3.0.0-beta3.jar

3. Build gRPC

Next, we need to build gRPC-Java, the Java implementation of gRPC. Move out of the protobuf/java directory and clone gRPC’s repository.

cd ../..
git clone https://github.com/grpc/grpc-java.git
cd grpc-java
git checkout v0.14.1
cd compiler
nano build.gradle

Around line 47:

gcc(Gcc) {
    target("linux_arm-v7") {
        cppCompiler.executable = "/usr/bin/gcc"
    }
}

Around line 60, add section for 'linux_arm-v7':

...
    x86_64 {
        architecture "x86_64"
    }
    'linux_arm-v7' {
        architecture "arm32"
        operatingSystem "linux"
    }

Around line 64, add 'arm32' to list of architectures:

...
components {
    java_plugin(NativeExecutableSpec) {
            if (arch in ['x86_32', 'x86_64', 'arm32'])
...

Around line 148, replace content inside of protoc section to hard code path to protoc binary:

protoc {
    path = '/usr/bin/protoc'
}

Once all of that is taken care of, run this command to build gRPC:

../gradlew java_pluginExecutable

4. Build Bazel

First, move out of the grpc-java/compiler directory and clone Bazel’s repository.

cd ../..
git clone https://github.com/bazelbuild/bazel.git

Next, go into the new bazel directory and immediately checkout version 0.3.1 of Bazel.

cd bazel
git checkout 0.3.2

After that, copy the generated Protobuf and gRPC files we created earlier into the Bazel project. Note the naming of the files in this step- it must be precise.

sudo cp /usr/bin/protoc third_party/protobuf/protoc-linux-arm32.exe
sudo cp ../protobuf/java/core/target/protobuf-java-3.0.0-beta-3.jar third_party/protobuf/protobuf-java-3.0.0-beta-1.jar
sudo cp ../grpc-java/compiler/build/exe/java_plugin/protoc-gen-grpc-java third_party/grpc/protoc-gen-grpc-java-0.15.0-linux-x86_32.exe

Before building Bazel, we need to set the javac maximum heap size for this job, or else we’ll get an OutOfMemoryError. To do this, we need to make a small addition to bazel/scripts/bootstrap/compile.sh. (Shout-out to @SangManLINUX for pointing this out..

nano scripts/bootstrap/compile.sh

Around line 46, you’ll find this code:

if [ "${MACHINE_IS_64BIT}" = 'yes' ]; then
    PROTOC=${PROTOC:-third_party/protobuf/protoc-linux-x86_64.exe}
    GRPC_JAVA_PLUGIN=${GRPC_JAVA_PLUGIN:-third_party/grpc/protoc-gen-grpc-java-0.15.0-linux-x86_64.exe}
else
    if [ "${MACHINE_IS_ARM}" = 'yes' ]; then
        PROTOC=${PROTOC:-third_party/protobuf/protoc-linux-arm32.exe}
    else
        PROTOC=${PROTOC:-third_party/protobuf/protoc-linux-x86_32.exe}
        GRPC_JAVA_PLUGIN=${GRPC_JAVA_PLUGIN:-third_party/grpc/protoc-gen-grpc-java-0.15.0-linux-x86_32.exe}
    fi
fi

Change it to the following:

if [ "${MACHINE_IS_64BIT}" = 'yes' ]; then
    PROTOC=${PROTOC:-third_party/protobuf/protoc-linux-x86_64.exe}
    GRPC_JAVA_PLUGIN=${GRPC_JAVA_PLUGIN:-third_party/grpc/protoc-gen-grpc-java-0.15.0-linux-x86_64.exe}
else
    PROTOC=${PROTOC:-third_party/protobuf/protoc-linux-arm32.exe}
    GRPC_JAVA_PLUGIN=${GRPC_JAVA_PLUGIN:-third_party/grpc/protoc-gen-grpc-java-0.15.0-linux-linux-arm32.exe}
fi

Move down to line 149, where you’ll see the following block of code:

run "${JAVAC}" -classpath "${classpath}" -sourcepath "${sourcepath}" \
      -d "${output}/classes" -source "$JAVA_VERSION" -target "$JAVA_VERSION" \
      -encoding UTF-8 "@${paramfile}"

At the end of this block, add in the -J-Xmx500M flag, which sets the maximum size of the Java heap to 500 MB:

run "${JAVAC}" -classpath "${classpath}" -sourcepath "${sourcepath}" \
      -d "${output}/classes" -source "$JAVA_VERSION" -target "$JAVA_VERSION" \
      -encoding UTF-8 "@${paramfile}" -J-Xmx500M

Next up, we need to adjust third_party/protobuf/BUILD – open it up in your text editor.

nano third_party/protobuf/BUILD

We need to add this last line around line 29:

...
    "//third_party:freebsd": ["protoc-linux-x86_32.exe"],
    "//third_party:arm": ["protoc-linux-arm32.exe"],
}),
...

Finally, we have to add one thing to tools/cpp/cc_configure.bzl – open it up for editing:

nano tools/cpp/cc_configure.bzl

And place this in around line 141 (at the beginning of the _get_cpu_value function):

...
"""Compute the cpu_value based on the OS name."""
return "arm"
...

Now we can build Bazel! Note: this also takes some time.

sudo ./compile.sh

When the build finishes, you end up with a new binary, output/bazel. Copy that to your /usr/local/bin directory.

sudo mkdir /usr/local/bin
sudo cp output/bazel /usr/local/bin/bazel

To make sure it’s working properly, run bazel on the command line and verify it prints help text. Note: this may take 15-30 seconds to run, so be patient!

$ bazel

Usage: bazel <command> <options> ...

Available commands:
  analyze-profile     Analyzes build profile data.
  build               Builds the specified targets.
  canonicalize-flags  Canonicalizes a list of bazel options.
  clean               Removes output files and optionally stops the server.
  dump                Dumps the internal state of the bazel server process.
  fetch               Fetches external repositories that are prerequisites to the targets.
  help                Prints help for commands, or the index.
  info                Displays runtime info about the bazel server.
  mobile-install      Installs targets to mobile devices.
  query               Executes a dependency graph query.
  run                 Runs the specified target.
  shutdown            Stops the bazel server.
  test                Builds and runs the specified test targets.
  version             Prints version information for bazel.

Getting more help:
  bazel help <command>
                   Prints help and options for <command>.
  bazel help startup_options
                   Options for the JVM hosting bazel.
  bazel help target-syntax
                   Explains the syntax for specifying targets.
  bazel help info-keys
                   Displays a list of keys used by the info command.

Move out of the bazel directory, and we’ll move onto the next step.

cd ..

5. Install a Memory Drive as Swap for Compiling

In order to succesfully build TensorFlow, your Raspberry Pi needs a little bit more memory to fall back on. Fortunately, this process is pretty straightforward. Grab a USB storage drive that has at least 1GB of memory. I used a flash drive I could live without that carried no important data. That said, we’re only going to be using the drive as swap while we compile, so this process shouldn’t do too much damage to a relatively new USB drive.

First, put insert your USB drive, and find the /dev/XXX path for the device.

sudo blkid

As an example, my drive’s path was /dev/sda1

Once you’ve found your device, unmount it by using the umount command.

sudo umount /dev/XXX

Then format your device to be swap:

sudo mkswap /dev/XXX

If the previous command outputted an alphanumeric UUID, copy that now. Otherwise, find the UUID by running blkid again. Copy the UUID associated with /dev/XXX

sudo blkid

Now edit your /etc/fstab file to register your swap file. (I’m a Vim guy, but Nano is installed by default)

sudo nano /etc/fstab

On a separate line, enter the following information. Replace the X’s with the UUID (without quotes)

UUID=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX none swap sw,pri=5 0 0

Save /etc/fstab, exit your text editor, and run the following command:

sudo swapon -a

If you get an error claiming it can’t find your UUID, go back and edit /etc/fstab. Replace the UUID=XXX.. bit with the original /dev/XXX information.

sudo nano /etc/fstab
# Replace the UUID with /dev/XXX
/dev/XXX none swap sw,pri=5 0 0

Alright! You’ve got swap! Don’t throw out the /dev/XXX information yet- you’ll need it to remove the device safely later on.

6. Compiling TensorFlow

First things first, clone the TensorFlow repository and move into the newly created directory.

git clone --recurse-submodules https://github.com/tensorflow/tensorflow
cd tensorflow

Note: if you’re looking to build to a specific version or commit of TensorFlow (as opposed to the HEAD at master), you should git checkout it now.

Once in the directory, we have to write a nifty one-liner that is incredibly important. The next line goes through all files and changes references of 64-bit program implementations (which we don’t have access to) to 32-bit implementations. Neat!

grep -Rl 'lib64' | xargs sed -i 's/lib64/lib/g'

Next, we need to delete a particular line in tensorflow/core/platform/platform.h. Open up the file in your favorite text editor:

$ sudo nano tensorflow/core/platform/platform.h

Now, scroll down toward the bottom and delete the following line containing #define IS_MOBILE_PLATFORM:

#elif defined(__arm__)
#define PLATFORM_POSIX
...
#define IS_MOBILE_PLATFORM   <----- DELETE THIS LINE

This keeps our Raspberry Pi device (which has an ARM CPU) from being recognized as a mobile device.

Now let’s configure Bazel:

$ ./configure

Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
Do you wish to build TensorFlow with GPU support? [y/N] N

Note: if you want to build for Python 3, specify /usr/bin/python3 for Python’s location.

Now we can use it to build TensorFlow! Warning: This takes a really, really long time. Several hours.

bazel build -c opt --copt="-mfpu=neon-vfpv4" --copt="-funsafe-math-optimizations" --copt="-ftree-vectorize" --local_resources 1024,1.0,1.0 --verbose_failures tensorflow/tools/pip_package:build_pip_package

Note: I toyed around with telling Bazel to use all four cores in the Raspberry Pi, but that seemed to make compiling more prone to completely locking up. This process takes a long time regardless, so I’m sticking with the more reliable options here. If you want to be bold, try using --local_resources 1024,2.0,1.0 or --local_resources 1024,4.0,1.0

When you wake up the next morning and it’s finished compiling, you’re in the home stretch! Use the built binary file to create a Python wheel.

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

And then install it!

sudo pip install /tmp/tensorflow_pkg/tensorflow-0.10-cp27-none-linux_armv7l.whl

7. Cleaning Up

There’s one last bit of house-cleaning we need to do before we’re done: remove the USB drive that we’ve been using as swap.

First, turn off your drive as swap:

sudo swapoff /dev/XXX

Finally, remove the line you wrote in /etc/fstab referencing the device

sudo nano /etc/fstab

Then reboot your Raspberry Pi.

And you’re done! You deserve a break.

Centos6安装ocserv/openconnect/cisco AnyConnect vpn

安装编译环境及依赖,如部分软件不能安装请先安装epel源。

yum install pam-devel readline-devel http-parser-devel unbound gmp-devel
yum install tar gzip xz wget gcc make autoconf
ocserv编译安装依赖,ocserv需要gnutls3版本以上,gnutls依赖nettle2.7.1:

wget ftp://ftp.gnu.org/gnu/nettle/nettle-2.7.1.tar.gz
tar zxvf nettle-2.7.1.tar.gz
cd nettle-2.7.1/
./configure –prefix=/usr/local/nettle
make && make install
echo ‘/usr/local/nettle/lib64/’ > /etc/ld.so.conf.d/nettle.conf
ldconfig
安装gnutls3.3.9:

export NETTLE_CFLAGS=”-I/usr/local/nettle/include/”
export NETTLE_LIBS=”-L/usr/local/nettle/lib64/ -lnettle”
export HOGWEED_LIBS=”-L/usr/local/nettle/lib64/ -lhogweed”
export HOGWEED_CFLAGS=”-I/usr/local/nettle/include”
wget ftp://ftp.gnutls.org/gcrypt/gnutls/v3.3/gnutls-3.3.9.tar.xz
tar xvf gnutls-3.3.9.tar.xz
cd gnutls-3.3.9/
./configure –prefix=/usr/local/gnutls
make && make install
ln -s /usr/local/gnutls/bin/certtool /usr/bin/certtool
echo ‘/usr/local/gnutls/lib/’ > /etc/ld.so.conf.d/gnutls.conf
ldconfig
安装libnl:

yum install bison flex
wget http://www.carisma.slowglass.com/~tgr/libnl/files/libnl-3.2.24.tar.gz
tar xvf libnl-3.2.24.tar.gz
cd libnl-3.2.24
./configure –prefix=/usr/local/libnl
make && make install
echo ‘/usr/local/libnl/lib/’ > /etc/ld.so.conf.d/libnl.conf
ldconfig
安装ocserv:

export LIBNL3_CFLAGS=”-I/usr/local/libnl/include/libnl3″
export LIBNL3_LIBS=”-L//usr/local/libnl/lib/ -lnl-3 -lnl-route-3″
export LIBGNUTLS_LIBS=”-L/usr/local/gnutls/lib/ -lgnutls”
export LIBGNUTLS_CFLAGS=”-I/usr/local/gnutls/include/”
wget ftp://ftp.infradead.org/pub/ocserv/ocserv-0.9.0.1.tar.xz
tar xvf ocserv-0.9.0.1.tar.xz
cd ocserv-0.9.0
./configure –prefix=/usr/local/ocserv
make && make install
echo ‘export PATH=$PATH://usr/local/ocserv/sbin/:/usr/local/ocserv/bin/’ >> $HOME/.bashrc
source $HOME/.bashrc
生成SSL证书:

mkdir /etc/ocserv/
cd /etc/ocserv
#CA私钥:
certtool –generate-privkey –outfile ca-key.pem
#CA模板:
cat << EOF > ca.tmpl
cn = “thinkingandcreating.com”
organization = “thinkingandcreating.com”
serial = 1
expiration_days = 3650
ca
signing_key
cert_signing_key
crl_signing_key
EOF
#CA证书:
certtool –generate-self-signed –load-privkey ca-key.pem –template ca.tmpl –outfile ca-cert.pem
#Server私钥:
certtool –generate-privkey –outfile server-key.pem
#Server证书模板:
cat << EOF > server.tmpl
cn = “thinkingandcreating.com”
o = “thinkingandcreating.com”
expiration_days = 3650
signing_key
encryption_key
tls_www_server
EOF
#Server证书:
certtool –generate-certificate –load-privkey server-key.pem –load-ca-certificate ca-cert.pem –load-ca-privkey ca-key.pem –template server.tmpl –outfile server-cert.pem
密码登录,生成密码文件:

ocpasswd -c /etc/ocserv/passwd username
证书登录:

#user私钥
certtool –generate-privkey –outfile user-key.pem
#user模板
cat << EOF > user.tmpl
cn = “some random name”
unit = “some random unit”
expiration_days = 365
signing_key
tls_www_client
EOF
#user证书
certtool –generate-certificate –load-privkey user-key.pem –load-ca-certificate ca-cert.pem –load-ca-privkey ca-key.pem –template user.tmpl –outfile user-cert.pem
配置文件:

auth = “plain[/etc/ocserv/passwd]”
#证书认证
#auth = “certificate”
ca-cert /etc/ocserv/ca-cert.pem
max-clients = 16
max-same-clients = 2
tcp-port = 5551
udp-port = 5551
keepalive = 32400
try-mtu-discovery = true
cisco-client-compat = true
server-cert = /etc/ocserv/server-cert.pem
server-key = /etc/ocserv/server-key.pem
auth-timeout = 40
pid-file = /var/run/ocserv.pid
socket-file = /var/run/ocserv-socket
run-as-user = nobody
run-as-group = daemon
device = vpns
ipv4-network = 172.16.37.0
ipv4-netmask = 255.255.255.0
route = 172.16.37.0/255.255.255.0
启动opserv:

ocserv -f -c /etc/ocserv/ocserv.conf
IP转发及SNAT:

iptables -t nat -A POSTROUTING -j SNAT –to-source <服务器公网 IP > -o <对应网卡的名称>
iptables -t nat -A POSTROUTING -s 172.16.37.0/24 -o venet0 -j MASQUERADE
iptables -A FORWARD -s 172.16.37.0/24 -j ACCEPT
service iptables save
记得开始iptables
使用用户密码连接:

echo passwd|openconnect -u username thinkingandcreating.com:5551 –no-cert-check
使用证书连接:

openconnect -k user-key.pem -c user-cert.pem thinkingandcreating.com:5551 –no-cert-check

使用 ffmpeg 缩放、裁剪、剪辑视频

使用 ffmpeg 缩放、裁剪、剪辑视频

我们平时使用手机拍摄的视频一般都在1080p及以上,在实际应用中,比如共享给朋友等,可能需要处理一下会比较方便。下面列出几个 ffmpeg 常用的处理视频方法(如果只需要格式转换,可以参考我之前的一篇文章《使用H264编码转换视频》

缩小视频

假设原始视频尺寸是 1080p(即 1920×1080 px,16:9),使用下面命令可以缩小到 480p:

$ ffmpeg -i a.mov -vf scale=853:480 -acodec aac -vcodec h264 out.mp4

各个参数的含义:

  • -i a.mov 指定待处理视频的文件名
  • -vf scale=853:480 vf 参数用于指定视频滤镜,其中 scale 表示缩放,后面的数字表示缩放至 853×480 px,其中的 853px 是计算而得,因为原始视频的宽高比为 16:9,所以为了让目标视频的高度为 480px,则宽度 = 480 x 9 / 16 = 853
  • -acodec aac 指定音频使用 aac 编码。注:因为 ffmpeg 的内置 aac 编码目前(写这篇文章时)还是试验阶段,故会提示添加参数 “-strict -2” 才能继续,尽管添加即可。又或者使用外部的 libfaac(需要重新编译 ffmpeg)。
  • -vcodec h264 指定视频使用 h264 编码。注:目前手机一般视频拍摄的格式(封装格式、文件格式)为 mov 或者 mp4,这两者的音频编码都是 aac,视频都是 h264。
  • out.mp4 指定输出文件名

上面的参数 scale=853:480 当中的宽度和高度实际应用场景中通常只需指定一个,比如指定高度为 480 或者 720,至于宽度则可以传入 “-1” 表示由原始视频的宽高比自动计算而得。即参数可以写为:scale=-1:480,当然也可以 scale=480:-1

 裁剪视频

有时可能只需要视频的正中一块,而两头的内容不需要,这时可以对视频进行裁剪(crop),比如有一个竖向的视频 1080 x 1920,如果指向保留中间 1080×1080 部分,可以使用下面的命令:

$ ffmpeg -i a.mov -strict -2 -vf crop=1080:1080:0:420 out.mp4

其中的 crop=1080:1080:0:420 才裁剪参数,具体含义是 crop=width:height:x:y,其中 width 和 height 表示裁剪后的尺寸,x:y 表示裁剪区域的左上角坐标。比如当前这个示例,我们只需要保留竖向视频的中间部分,所以 x 不用偏移,故传入0,而 y 则需要向下偏移:(1920 – 1080) / 2 = 420

视频缩放和裁剪是可以同时进行的,如下命令则为将视频缩小至 853×480,然后裁剪保留横向中间部分:

$ ffmpeg -i IMG_4940.MOV -strict -2 -vf scale=853:480,crop=480:480:186:0 out.mp4

剪辑视频

如果有一段很长的视频只需保留其中的一段,可以使用下面命令对视频进行剪辑。

$ ffmpeg -i a.mov -ss 00:00:21 -t 00:00:10 -acodec aac -vcodec h264 -strict -2 out.mp4

其中 -ss 00:00:21 表示开始剪辑的位置(时间点),-t 00:00:10 表示剪辑的长度,即 10 秒钟。

当然一段视频是可以在一个命令里同时进行剪辑、缩放、裁剪的,只需把相关的参数合在一起即可。

使用H264编码转换视频

使用H264编码转换视频

H264(即MPEG4 AVC)是目前比较流行的视频编码格式,相对MPEG2编码而言,在画质大致相同的情况下能将视频大小再压缩到50%~25%,即如果一个MPEG2(如DVD)视频大小是1GB,用H264编码能缩小到250MB左右。另外H264编码的视频还能直接在浏览器(如Chrome)和移动设备(如iPhone、Android手机)上直接播放。
如果你有一堆家庭视频(旧款的家用DV一般是MPEG2格式)想刻录到光盘保存,或者有一堆手机不直接支持的格式的视频想在手机上播放,那么用H264编码转换和压缩它们是一个不错的选择。

mencoder 是一个很方便的视频编码程序,它几乎支持所有的视频格式,而且参数丰富、速度快。

首选你需要安装 mencoder 程序(下面分别是 Archlinux, Fedora, Ubuntu下的安装方式):

$ sudo pacman -S memcoder
$ sudo yum install mencoder
$ sudo apt-get install mencoder

然后就可以查看你当前系统支持哪些视频和音频编码器,以及支持哪些封装格式了:

$ mencoder -ovc help
$ mencoder -oac help
$ mencoder -of help

如果看到有x264视频编码器、有mp3lame音频编码器、以及有mp4封装格式,那么就可以开始下面的编码转换了,否则你可能需要安装相应的音频和视频编码器,一般安装 ffmpeg 组件就会同时附带这些编码器。

压缩一段MPEG2视频:

$ mencoder m001.mpg -o m001.mp4 -oac mp3lame -ovc x264 -of lavf -vf lavcdeint

上面命令中的 m001.mpg 和 m001.mp4 分别是输入和输出文件名,-oac 用于指定音频编码器,-ovc 指定视频编码器, -of 指定输出文件封装方式,lavf表示输出文件封装方式由输出的文件名(的扩展名)决定(比如m001.mp4表示用mp4封装,m001.avi表示用avi封装),最后 -vf lavcdeint 参数用于去除视频中的拉丝条纹(锯齿纹),如果没有的话不要这个参数也可以。

h264的编码过程比较耗时,比如 AMD 四核2.8G的编码速度大概是 30fps,大概是视频正常播放所需的时间。

如果待编码转换的视频文件很多,则最好写一个批量处理的脚本:

#!/bin/bash
find . -type f ( -name "*.mpg" -o -name "*.mpeg" )|while read line;do
echo $line
mencoder $line -o ${line}.mp4 -oac mp3lame -ovc x264 -of lavf -vf lavcdeint
done

执行上面的脚本会将当前目录里所有后缀名为“mpg”和“mpeg”的视频编码为H264格式。

最后,除了mencoder之外,还可以使用ffmpeg来编码,详细方法请参考这里

MYSQL入门全套

mysql简介

1、什么是数据库 ?

数据库(Database)是按照数据结构来组织、存储和管理数据的仓库,它产生于距今六十多年前,随着信息技术和市场的发展,特别是二十世纪九十年代以后,数据管理不再仅仅是存储和管理数据,而转变成用户所需要的各种数据管理的方式。数据库有很多种类型,从最简单的存储有各种数据的表格到能够进行海量数据存储的大型数据库系统都在各个方面得到了广泛的应用。

主流的数据库有:sqlserver,mysql,Oracle、SQLite、Access、MS SQL Server等,本文主要讲述的是mysql

2、数据库管理是干什么用的?

  • a. 将数据保存到文件或内存
  • b. 接收特定的命令,然后对文件进行相应的操作

PS:如果有了以上管理系统,无须自己再去创建文件和文件夹,而是直接传递 命令 给上述软件,让其来进行文件操作,他们统称为数据库管理系统(DBMS,Database Management System)

mysql安装

MySQL是一种开放源代码的关系型数据库管理系统(RDBMS),MySQL数据库系统使用最常用的数据库管理语言–结构化查询语言(SQL)进行数据库管理。在 WEB 应用方面MySQL是最好的 RDBMS (Relational Database Management System,关系数据库管理系统) 应用软件之一。

使用mysql必须具备一下条件

  •   a. 安装MySQL服务端
  •   b. 安装MySQL客户端
  •   b. 【客户端】连接【服务端】
  •   c. 【客户端】发送命令给【服务端MySQL】服务的接受命令并执行相应操作(增删改查等)

1、下载地址:http://dev.mysql.com/downloads/mysql/

2、安装

注:以上两个链接有完整的安装方式,撸主也是参考他的安装的,安装完以后mysql.server start启动mysql服务

mysql操作

一、连接数据库

mysql  -u user -p                   例:mysql -u root -p

常见错误如下:

ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2), it means that the MySQL server daemon (Unix) or service (Windows) is not running.

退出连接:

QUIT 或者 Ctrl+D

二、查看数据库,创建数据库,使用数据库查看数据库: show databases;

默认数据库:
             mysql - 用户权限相关数据
             test - 用于用户测试数据
             information_schema - MySQL本身架构相关数据
 
创建数据库:     
               create database db1 DEFAULT CHARSET utf8 COLLATE utf8_general_ci;     # utf8编码                       create database db1 DEFAULT CHARACTER SET gbk COLLATE gbk_chinese_ci; # gbk编码
使用数据库:     use db1;

显示当前使用的数据库中所有表:SHOW TABLES;

三、用户管理

创建用户
    create user '用户名'@'IP地址' identified by '密码';
删除用户
    drop user '用户名'@'IP地址';
修改用户
    rename user '用户名'@'IP地址'; to '新用户名'@'IP地址'$$
修改密码
    set password for '用户名'@'IP地址' = Password('新密码')

注:用户权限相关数据保存在mysql数据库的user表中,所以也可以直接对其进行操作(不建议)

四、权限管理

mysql对于权限这块有以下限制:

all privileges          除grant外的所有权限
            select                  仅查权限
            select,insert           查和插入权限
            ...
            usage                   无访问权限
            alter                   使用alter table
            alter routine           使用alter procedure和drop procedure
            create                  使用create table
            create routine          使用create procedure
            create temporary tables 使用create temporary tables
            create user             使用create user、drop user、rename user和revoke  all privileges
            create view             使用create view
            delete                  使用delete
            drop                    使用drop table
            execute                 使用call和存储过程
            file                    使用select into outfile 和 load data infile
            grant option            使用grant 和 revoke
            index                   使用index
            insert                  使用insert
            lock tables             使用lock table
            process                 使用show full processlist
            select                  使用select
            show databases          使用show databases
            show view               使用show view
            update                  使用update
            reload                  使用flush
            shutdown                使用mysqladmin shutdown(关闭MySQL)
            super                   使用change master、kill、logs、purge、master和set global。还允许mysqladmin调试登陆
            replication client      服务器位置的访问
            replication slave       由复制从属使用

对于数据库及内部其他权限如下:

数据库名.*           数据库中的所有
            数据库名.表          指定数据库中的某张表
            数据库名.存储过程     指定数据库中的存储过程
            *.*                所有数据库

对于用户和IP的权限如下:

用户名@IP地址         用户只能在改IP下才能访问
            用户名@192.168.1.%   用户只能在改IP段下才能访问(通配符%表示任意)
            用户名@%             用户可以再任意IP下访问(默认IP地址为%)

1、查看权限:

show grants for '用户'@'IP地址'

2、授权

grant  权限 on 数据库.表 to   '用户'@'IP地址'

3、取消授权

revoke 权限 on 数据库.表 from '用户'@'IP地址'

授权实例如下:

grant all privileges on db1.tb1 TO '用户名'@'IP'

grant select on db1.* TO '用户名'@'IP'

grant select,insert on *.* TO '用户名'@'IP'

revoke select on db1.tb1 from '用户名'@'IP'

mysql表操作

1、查看表

show tables;                    # 查看数据库全部表

select * from 表名;             # 查看表所有内容

2、创建表

create table 表名(
    列名  类型  是否可以为空,
    列名  类型  是否可以为空
)ENGINE=InnoDB DEFAULT CHARSET=utf8

来一个实例好详解

CREATE TABLE `tab1` (
  `nid` int(11) NOT NULL auto_increment,                   # not null表示不能为空,auto_increment表示自增
  `name` varchar(255) DEFAULT zhangyanlin,                 # default 表示默认值
  `email` varchar(255),
  PRIMARY KEY (`nid`)                                      # 把nid列设置成主键
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

注:

  • 默认值,创建列时可以指定默认值,当插入数据时如果未主动设置,则自动添加默认值
  • 自增,如果为某列设置自增列,插入数据时无需设置此列,默认将自增(表中只能有一个自增列)注意:1、对于自增列,必须是索引(含主键)2、对于自增可以设置步长和起始值
  • 主键,一种特殊的唯一索引,不允许有空值,如果主键使用单个列,则它的值必须唯一,如果是多列,则其组合必须唯一。

3、删除表

drop table 表名

3、清空表内容

delete from 表名
truncate table 表名

4、修改表

添加列:   alter table 表名 add 列名 类型
删除列:   alter table 表名 drop column 列名
修改列:
          alter table 表名 modify column 列名 类型;  -- 类型
          alter table 表名 change 原列名 新列名 类型; -- 列名,类型
  
添加主键:
          alter table 表名 add primary key(列名);
删除主键:
          alter table 表名 drop primary key;
          alter table 表名  modify  列名 int, drop primary key;
  
添加外键: alter table 从表 add constraint 外键名称(形如:FK_从表_主表) foreign key 从表(外键字段) references 主表(主键字段);
删除外键: alter table 表名 drop foreign key 外键名称
  
修改默认值:ALTER TABLE testalter_tbl ALTER i SET DEFAULT 1000;
删除默认值:ALTER TABLE testalter_tbl ALTER i DROP DEFAULT;

对于上述这些操作是不是看起来很麻烦,很浪费时间,别慌!有专门的软件能提供这些功能,操作起来非常简单,这个软件名字叫Navicat Premium ,大家自行在网上下载,练练手,但是下面的即将讲到表内容操作还是建议自己写命令来进行

5、基本数据类型

MySQL的数据类型大致分为:数值、时间和字符串

bit[(M)]
            二进制位(101001),m表示二进制位的长度(1-64),默认m=1

        tinyint[(m)] [unsigned] [zerofill]

            小整数,数据类型用于保存一些范围的整数数值范围:
            有符号:
                -128 ~ 127.
            无符号:
                0 ~ 255

            特别的: MySQL中无布尔值,使用tinyint(1)构造。

        int[(m)][unsigned][zerofill]

            整数,数据类型用于保存一些范围的整数数值范围:
                有符号:
                    -2147483648 ~ 2147483647
                无符号:
                    0 ~ 4294967295

            特别的:整数类型中的m仅用于显示,对存储范围无限制。例如: int(5),当插入数据2时,select 时数据显示为: 00002

        bigint[(m)][unsigned][zerofill]
            大整数,数据类型用于保存一些范围的整数数值范围:
                有符号:
                    -9223372036854775808 ~ 9223372036854775807
                无符号:
                    0  ~  18446744073709551615

        decimal[(m[,d])] [unsigned] [zerofill]
            准确的小数值,m是数字总个数(负号不算),d是小数点后个数。 m最大值为65,d最大值为30。

            特别的:对于精确数值计算时需要用此类型
                   decaimal能够存储精确值的原因在于其内部按照字符串存储。

        FLOAT[(M,D)] [UNSIGNED] [ZEROFILL]
            单精度浮点数(非准确小数值),m是数字总个数,d是小数点后个数。
                无符号:
                    -3.402823466E+38 to -1.175494351E-38,
                    0
                    1.175494351E-38 to 3.402823466E+38
                有符号:
                    0
                    1.175494351E-38 to 3.402823466E+38

            **** 数值越大,越不准确 ****

        DOUBLE[(M,D)] [UNSIGNED] [ZEROFILL]
            双精度浮点数(非准确小数值),m是数字总个数,d是小数点后个数。

                无符号:
                    -1.7976931348623157E+308 to -2.2250738585072014E-308
                    0
                    2.2250738585072014E-308 to 1.7976931348623157E+308
                有符号:
                    0
                    2.2250738585072014E-308 to 1.7976931348623157E+308
            **** 数值越大,越不准确 ****


        char (m)
            char数据类型用于表示固定长度的字符串,可以包含最多达255个字符。其中m代表字符串的长度。
            PS: 即使数据小于m长度,也会占用m长度
        varchar(m)
            varchars数据类型用于变长的字符串,可以包含最多达255个字符。其中m代表该数据类型所允许保存的字符串的最大长度,只要长度小于该最大值的字符串都可以被保存在该数据类型中。

            注:虽然varchar使用起来较为灵活,但是从整个系统的性能角度来说,char数据类型的处理速度更快,有时甚至可以超出varchar处理速度的50%。因此,用户在设计数据库时应当综合考虑各方面的因素,以求达到最佳的平衡

        text
            text数据类型用于保存变长的大字符串,可以组多到65535 (2**16 − 1)个字符。

        mediumtext
            A TEXT column with a maximum length of 16,777,215 (2**24 − 1) characters.

        longtext
            A TEXT column with a maximum length of 4,294,967,295 or 4GB (2**32 − 1) characters.


        enum
            枚举类型,
            An ENUM column can have a maximum of 65,535 distinct elements. (The practical limit is less than 3000.)
            示例:
                CREATE TABLE shirts (
                    name VARCHAR(40),
                    size ENUM('x-small', 'small', 'medium', 'large', 'x-large')
                );
                INSERT INTO shirts (name, size) VALUES ('dress shirt','large'), ('t-shirt','medium'),('polo shirt','small');

        set
            集合类型
            A SET column can have a maximum of 64 distinct members.
            示例:
                CREATE TABLE myset (col SET('a', 'b', 'c', 'd'));
                INSERT INTO myset (col) VALUES ('a,d'), ('d,a'), ('a,d,a'), ('a,d,d'), ('d,a,d');

        DATE
            YYYY-MM-DD(1000-01-01/9999-12-31)

        TIME
            HH:MM:SS('-838:59:59'/'838:59:59')

        YEAR
            YYYY(1901/2155)

        DATETIME

            YYYY-MM-DD HH:MM:SS(1000-01-01 00:00:00/9999-12-31 23:59:59    Y)

        TIMESTAMP

            YYYYMMDD HHMMSS(1970-01-01 00:00:00/2037 年某时)

mysql表内容操作

表内容操作无非就是增删改查,当然用的最多的还是查,而且查这一块东西最多,用起来最难,当然对于大神来说那就是so easy了,对于我这种小白还是非常难以灵活运用的,下面咱来一一操作一下

1、增

insert into 表 (列名,列名...) values (值,值,...)
insert into 表 (列名,列名...) values (值,值,...),(值,值,值...)
insert into 表 (列名,列名...) select (列名,列名...) from 表
例:
    insert into tab1(name,email) values('zhangyanlin','[email protected]')

2、删

delete from 表                                      # 删除表里全部数据
delete from 表 where id=1 and name='zhangyanlin'   # 删除ID =1 和name='zhangyanlin' 那一行数据

3、改

update 表 set name = 'zhangyanlin' where id>1

4、查

select * from 表
select * from 表 where id > 1
select nid,name,gender as gg from 表 where id > 1

查这块的条件太多太多我给列举出来至于组合还得看大家的理解程度哈

a、条件判断where

select * from 表 where id > 1 and name != 'aylin' and num = 12;
    select * from 表 where id between 5 and 16;
    select * from 表 where id in (11,22,33)
    select * from 表 where id not in (11,22,33)
    select * from 表 where id in (select nid from 表)

b、通配符like

select * from 表 where name like 'zhang%'  # zhang开头的所有(多个字符串)
    select * from 表 where name like 'zhang_'  # zhang开头的所有(一个字符)

c、限制limit

select * from 表 limit 5;            - 前5行
    select * from 表 limit 4,5;          - 从第4行开始的5行
    select * from 表 limit 5 offset 4    - 从第4行开始的5行

d、排序asc,desc

select * from 表 order by 列 asc              - 根据 “列” 从小到大排列
    select * from 表 order by 列 desc             - 根据 “列” 从大到小排列
    select * from 表 order by 列1 desc,列2 asc    - 根据 “列1” 从大到小排列,如果相同则按列2从小到大排序

e、分组group by

select num from 表 group by num
    select num,nid from 表 group by num,nid
    select num,nid from 表  where nid > 10 group by num,nid order nid desc
    select num,nid,count(*),sum(score),max(score),min(score) from 表 group by num,nid
    select num from 表 group by num having max(id) > 10
 
    特别的:group by 必须在where之后,order by之前