nmcli使用方法非常类似linux ip命令、cisco交换机命令,并且支持tab补全,也可在命令最后通过-h、–help、help查看帮助。在nmcli中有2个命令最为常用:

nmcli [ OPTIONS ] OBJECT { COMMAND | help }



nmcli          ##查看ip(类似于ifconfig、ip addr)

nmcli device status      ##所有接口的简略信息

nmcli device show       ##所有接口的详细信息

nmcli device show interface-name     ##特定接口的详细信息

nmcli connection show         ##所有连接的简略信息

nmcli connection show –active      ##显示激活的连接

nmcli connection show inteface-name   ##某个接口的详细连接信息

nmcli connection up connection-name
nmcli device connect interface-name

nmcli connection down connection-name    ##这个操作当取消一个激活后,如果有其它连接会自动激活其它连接
nmcli device disconnect interface-name     ##这个操作会取消接口上的激活,如果有其它连接也不会自动激活其它连接
建议使用 nmcli device disconnect(connect) interface-name,因为连接断开可将该接口放到“手动”模式,这样做用户让 NetworkManager 启动某个连接前,或发生外部事件(比如载波变化、休眠或睡眠)前,不会启动任何自动连接。

nmcli connection add type ethernet con-name connection-name ifname interface-name

add表示添加连接,type后面是指定创建连接时候必须指定类型,类型有很多,可以通过nmcli c add type -h看到,这里指定为ethernet。con-name后面是指定创建连接的名字,ifname后面是指定物理设备,网络接口

例子:nmcli connection add type ethernet con-name dhcp-ens33 ifname ens33

nmcli connection add type ethernet con-name connection-name ifname interface-name ipv4.method manual ipv4.addresses address ipv4.gateway address


例子:nmcli connection add type ethernet con-name static-enp0s3 ifname enp0s3 ipv4.method manual ipv4.addresses ipv4.gateway
注意:创建连接后,NetworkManager 自动将 connection.autoconnect 设定为 yes。还会将设置保存到 /etc/sysconfig/network-scripts/connection-name 文件中,且自动将 ONBOOT 参数设定为 yes。



nmcli connection modify connection-name ipv4.addresses     ##如果已经存在ip会更改现有ip

nmcli connection modify connection-name ipv4.addresses


nmcli connection modify connection-name ipv4.method manual


nmcli connection modify connection-name ipv4.method auto


nmcli connection modify connection-name ipv4.dns


nmcli connection modify connection-name -ipv4.dns (注意这里的减号)


nmcli connection modify connection-name ipv4.gateway


nmcli connection modify connection-name ipv4.dns ipv4.gateway

nmcli connection modify connection-name connection.autoconnect no/on


nmcli connection modify connection-name +ipv4.routes “”

这样会将 子网的流量指向位于 的网关,同时在 /etc/sysconfig/network-scripts/目录下生产一个route-connection-name的文件,这里记录了这个连接的路由信息


nmcli connection reload
nmcli connection load /etc/sysconfig/network-scripts/ifcfg-connection-name
nmcli connection load /etc/sysconfig/network-scripts/route-connection-name

nmcli connection delete connection-name

nmcli general hostname

nmcli general hostname new-hostname

systemctl restart systemd-hostnamed
注意:CentOS7 / Redhat7 下的主机名管理是基于系统服务systemd-hostnamed,服务自身提供了hostnamectl命令用于修改主机名,推荐这种方式进行修改;
使用nmcli命令更改主机名时,systemd-hostnamed服务并不知晓 /etc/hostname 文件被修改,因此需要重启服务去读取配置;

How to Setup VyprVPN on the Raspberry Pi

In this tutorial, I will be going through all the steps to setting up Raspberry Pi VyprVPN.

Raspberry Pi VyprVPN

This tutorial is handy if you’re looking to connect your Pi to the VyprVPN service.

There are many reasons why you may want to set up a VPN on the Raspberry Pi. The most common is that you want an extra layer of security and anonymity to your network activities. These benefits are handy for a range of different Raspberry Pi projects.

Most of our projects have been tested for the latest version of Raspbian. I recommend upgrading to the most recent for the best experience when following this tutorial.

If VyprVPN doesn’t take your fancy, then we do have other tutorials that cover services such as ExpressVPN or NordVPN.

You can find the tutorial right below if you have any issues then be sure to let us know over at our forum.


All the equipment that you need to set up this Raspberry Pi VyprVPN tutorial is listed right below.


 Raspberry Pi

 Micro SD Card

 Ethernet Cable or WiFi dongle (Pi 3 has WiFi inbuilt)

 Power Adapter

 VyprVPN Subscription


 Raspberry Pi Case

 USB Keyboard

 USB Mouse

 Installing VyprVPN to the Raspberry Pi

VyprVPN isn’t much different to installing most VPN services on the Raspberry Pi as most make use of the OpenVPN software.

1. If you haven’t already, then you will need to sign up to VyprVPN.

2. Load the terminal on the Raspberry Pi or make use of SSH to remotely it access.

3. Update the Raspbian to the latest packages.

sudo apt-get update
sudo apt-get upgrade

4. Now, let’s install the OpenVPN package, you can do this by entering the following command.

sudo apt-get install openvpn

5. Change directory to the OpenVPN directory by entering the following.

cd /etc/openvpn/

6. We will now need to download the VyprVPN ovpn files.

sudo wget -O vyprvpn.zip \

7. Next, we will now need to extract the files that we need.

sudo unzip vyprvpn.zip

8. Now let’s move all the files to the base directory and delete VyprVPN directory.

sudo mv /etc/openvpn/OpenVPN256/* /etc/openvpn/
sudo rm -r /etc/openvpn/OpenVPN256

9. To connect to VyprVPN simply use the following command.

sudo openvpn file_name

Replace file_name with the location of where you wish to connect. For example, If I wanted Canada for example, then I will use Canada.ovpn. You can view all the locations by using the following command.

ls -l /etc/openvpn

Below is an example of connecting to Canada.

sudo openvpn /etc/openvpn/Canada.ovpn

10. It will now ask for your credentials, and you will need to enter them to be able to connect to VyprVPN. Test your connection by going ipleak.net. You should have a different IP to your usual one.

11. If you need to disconnect, then you can easily use either ctrl+c or the following command.

sudo killall openvpn

 Auto Start VyprVPN

Most of us love to reduce the amount of manual input required for when it comes to technology. The following steps will show you how to set up VyprVPN to connect automatically on bootup.

1. Firstly, we will need to save both our username and password in a file.

sudo nano /etc/openvpn/auth.txt

2. In this file, add your chosen username and password for the service. Make sure the username and password are both on separate lines.


3. Save and exit by pressing ctrl+x, then y and lastly enter.

4. Now we will need to copy the ovpn file, simplify its name at the same time.

sudo cp "/etc/openvpn/Australia - Sydney.ovpn" /etc/openvpn/aussyd.conf

5. Now let’s edit this new file.

sudo nano /etc/openvpn/aussyd.conf

6. We will only need to do a straightforward edit in this file.



Replace with

auth-user-pass auth.txt

7. Finally, we need to setup OpenVPN to auto start using our ovpn file.

sudo nano /etc/default/openvpn



Replace with


Replace aussyd with the filename you set.

8. Save and exit.

9. Reboot the Raspberry Pi to test out our new configuration.

sudo reboot

10. Now test the VPN by going to ipleak.net or a similar website. The IP should be VyprVPNs and not your own. Doing this step will confirm that we have successfully set up VyprVPN on the Raspberry Pi.

 Preventing DNS Leaks

To ensure that your DNS isn’t leaking your location you will need to do a tweak on your Pi. To fix this, we will simply force our DNS to run through Cloudflare’s public DNS rather than our internet service providers (ISP) DNS. This process is pretty easy and won’t take long to do.

1. Firstly, load into the dhcpcd configuration file and update the following line.


sudo nano /etc/dhcpcd.conf


#static domain_name_servers=

Replace with

static domain_name_servers=

2. Save & exit the file.

3. Now reboot your Pi by entering the following command.

sudo reboot

4. Go to ipleak.net and check that your DNS is no longer leaking. If you’re still leaking. then you might want to look at this page on WebRTC requests for more information.


If you run into trouble while setting up Raspberry Pi VyprVPN then the troubleshooting tips might help you out.

  • You’re able to start and stop your VPN by using the following command. Replacing stop with start will start the VPN backup. This command will only work if you have it set up for autostart.
sudo systemctl stop openvpn
  • It’s important to be aware that we are storing credentials in plain text. This lack of security makes it essential that you keep your Pi secure against unauthorized access. Just changing the default password will heavily improve your security.

As I mentioned above, there is plenty of other projects that work great with a VPN. Something as simple as a Torrentbox will benefit. Just make sure your VPN provider allows torrenting as some will ban you for using up too much bandwidth.

Hopefully, by the end of this Raspberry Pi VyprVPN tutorial, you have everything set up and working as it should be. If you require further help, then I highly recommend that you leave a comment.

Top 10 Python Libraries You Must Know in 2019

In this article, we will discuss some of the top libraries in Python that can be used by developers to prase, clean, and represent data and implement machine learning in their existing applications.

We will be considering the following 10 libraries:

  • TensorFlow
  • Scikit-Learn
  • Numpy
  • Keras
  • PyTorch
  • LightGBM
  • Eli5
  • SciPy
  • Theano
  • Pandas

Image title


Python is one of the most popular and widely used programming languages and has replaced many programming languages in the industry.

There are many reasons why Python is popular among developers. However, one of the most significant is its large collection of libraries that users can work with.

The simplicity of Python has attracted many developers to create new libraries for machine learning. Because of the huge collection of libraries, Python is becoming hugely popular among machine learning experts.

So, the first library is TensorFlow.


Top 10 Python Libraries - Edureka

What Is TensorFlow?

If you are currently working on a machine learning project in Python, then you may have heard about this popular open-source library known as TensorFlow.

This library was developed by Google in collaboration with the Brain Team. TensorFlow is used in almost every Google application for machine learning.

TensorFlow works like a computational library for writing new algorithms that involve a large number of tensor operations. Since neural networks can be easily expressed as computational graphs, they can be implemented using TensorFlow as a series of operations on Tensors. Plus, tensors are N-dimensional matrices that represent your data.

Features of TensorFlow

TensorFlow is optimized for speed, and it makes use of techniques like XLA for quick linear algebra operations.

1. Responsive Construct

With TensorFlow, we can easily visualize each and every part of the graph, which is not an option while using Numpy or SciKit.

2. Flexible

One of the very important Tensorflow Features is that it is flexible in its operability, meaning it has modularity, and for the parts of it that you want to make stand alone, it offers you that option.

3. Easily Trainable

It is easily trainable on CPU as well as GPU for distributed computing.

4. Parallel Neural Network Training

TensorFlow offers pipelining, in the sense that you can train multiple neural networks and multiple GPUs, which makes the models very efficient on large-scale systems.

5. Large Community

Needless to say, if it has been developed by Google, there is already a large team of software engineers who work on stability improvements continuously.

6. Open Source

The best thing about this machine learning library is that it is open source, so anyone can use it as long as they have internet connectivity.

Where Is TensorFlow Used?

You are using TensorFlow daily but indirectly with applications like Google Voice Search or Google Photos. These applications are developed using this library.

All the libraries created in TensorFlow are written in C and C++. However, it has a complicated frontend for Python. Your Python code will get compiled and then executed on TensorFlow distributed execution engine built using C and C++.

The number of applications of TensorFlow is literally unlimited, and that is the beauty of TensorFlow.


Top 10 Python Libraries - Edureka

What Is Scikit-learn?

It is a Python library is associated with NumPy and SciPy. It is considered one of the best libraries for working with complex data.

There are a lot of changes being made in this library. One modification is the cross-validation feature, providing the ability to use more than one metric. Lots of training methods like logistics regression and nearest neighbors have received some little improvements.

Features Of Scikit-Learn

1. Cross-validation: There are various methods to check the accuracy of supervised models on unseen data.

2.Unsupervised learning algorithms: Again, there is a large spread of algorithms in the offering — starting from clustering, factor analysis, and principal component analysis to unsupervised neural networks.

3. Feature extraction: Useful for extracting features from images and text (e.g. Bag of words

Where Is Scikit-Learn Used?

It contains a numerous number of algorithms for implementing standard machine learning and data mining tasks like reducing dimensionality, classification, regression, clustering, and model selection.


Top 10 Python Libraries - Edureka

What Is Numpy?

Numpy is considered one of the most popular machine learning libraries in Python.

TensorFlow and other libraries use Numpy internally for performing multiple operations on Tensors. Array interface is the best and the most important feature of Numpy.

Features Of Numpy

  1. Interactive: Numpy is very interactive and easy to use
  2. Mathematics: Makes complex mathematical implementations very simple
  3. Intuitive: Makes coding real easy and grasping the concepts is easy
  4. Lots of Interaction: Widely used, hence a lot of open source contribution

Where Is Numpy Used?

This interface can be utilized for expressing images, sound waves, and other binary raw streams as an array of real numbers in N-dimensional.

For implementing this library for machine learning, having knowledge of Numpy is important for full-stack developers.


Top 10 Python Libraries - Edureka

What Is Keras?

Keras is considered one of the coolest machine learning libraries in Python. It provides an easier mechanism to express neural networks. Keras also provides some of the best utilities for compiling models, processing data-sets, visualization of graphs, and much more.

In the backend, Keras uses either Theano or TensorFlow internally. Some of the most popular neural networks like CNTK can also be used. Keras is comparatively slow when we compare it with other machine learning libraries because it creates a computational graph by using back-end infrastructure and then makes use of it to perform operations. All the models in Keras are portable.

Features Of Keras

  • It runs smoothly on both CPU and GPU.
  • Keras supports almost all the models of a neural network — fully connected, convolutional, pooling, recurrent, embedding, etc. Furthermore, these models can be combined to build more complex models.
  • Keras, being modular in nature, is incredibly expressive, flexible, and apt for innovative research.
  • Keras is a completely Python-based framework, which makes it easy to debug and explore.

Where Is Keras Used?

You are already constantly interacting with features built with Keras — it is in use at Netflix, Uber, Yelp, Instacart, Zocdoc, Square, and many others. It is especially popular among startups that place deep learning at the core of their products.

Keras contains numerous implementations of commonly used neural network building blocks such as layers, objectives, activation functions, optimizers and a host of tools to make working with image and text data easier.

Plus, it provides many pre-processed data-sets and pre-trained models like MNIST, VGG, Inception, SqueezeNet, ResNet, etc.

Keras is also a favorite among deep learning researchers, coming in at #2. Keras has also been adopted by researchers at large scientific organizations, in particular, CERN and NASA.


Top 10 Python Libraries - Edureka

What Is PyTorch?

PyTorch is the largest machine learning library that allows developers to perform tensor computations with the acceleration of GPU, creates dynamic computational graphs, and calculate gradients automatically. Other than this, PyTorch offers rich APIs for solving application issues related to neural networks.

This machine learning library is based on Torch, which is an open-source machine library implemented in C with a wrapper in Lua.

This machine library, in Python, was introduced in 2017, and since its inception, the library is gaining popularity and attracting an increasing number of machine learning developers.

Features Of PyTorch

Hybrid Front-End

A new hybrid frontend provides ease-of-use and flexibility in eager mode, while seamlessly transitioning to graph mode for speed, optimization, and functionality in C++ runtime environments.

Distributed Training

Optimize performance in both research and production by taking advantage of native support for asynchronous execution of collective operations and peer-to-peer communication that is accessible from Python and C++.

Python First

PyTorch is not a Python binding into a monolithic C++ framework. It’s built to be deeply integrated into Python so it can be used with popular libraries and packages such as Cython and Numba.

Libraries and Tools

An active community of researchers and developers have built a rich ecosystem of tools and libraries for extending PyTorch and supporting development in areas from computer vision to reinforcement learning.

Where Is PyTorch Used?

PyTorch is primarily used for applications such as natural language processing.

It is primarily developed by Facebook’s artificial-intelligence research group and Uber’s “Pyro” software for probabilistic programming is built on it.

PyTorch is outperforming TensorFlow in multiple ways and it is gaining a lot of attention in recent days.


Top 10 Python Libraries - Edureka

What Is LightGBM?

Gradient Boosting is one of the best and most popular machine learning(ML) library, which helps developers in building new algorithms by using redefined elementary models and namely decision trees. Therefore, there are special libraries that are designed for fast and efficient implementation of this method.

These libraries are LightGBM, XGBoost, and CatBoost. All these libraries are competitors that help in solving a common problem and can be utilized in almost a similar manner.

Features of LightGBM

Very fast computation ensures high production efficiency.

Intuitive, hence makes it user-friendly.

Faster training than many other deep learning libraries.

Will not produce errors when you consider NaN values and other canonical values.

Where Is LightGBM Used?

This library provides highly scalable, optimized, and fast implementations of gradient boosting, which makes it popular among machine learning developers. Because most of the machine learning full-stack developers won machine learning competitions by using these algorithms.


Top 10 Python Libraries - Edureka

What Is Eli5?

Most often, the results of machine learning model predictions are not accurate, and Eli5 machine learning library built-in Python helps in overcoming this challenge. It is a combination of visualization and debugs all the machine learning models and tracks all working steps of an algorithm.

Features of Eli5

Moreover, Eli5 supports other libraries XGBoost, lightning, scikit-learn, and sklearn-crfsuite libraries. All the above-mentioned libraries can be used to perform different tasks using each one of them.

Where Is Eli5 Used?

  • Mathematical applications that require a lot of computation in a short time.
  • Eli5 plays a vital role where there are dependencies with other Python packages.
  • Legacy applications and implementing newer methodologies in various fields.


Top 10 Python Libraries - Edureka

What Is SciPy?

SciPy is a machine learning library for application developers and engineers. However, you still need to know the difference between SciPy library and SciPy stack. SciPy library contains modules for optimization, linear algebra, integration, and statistics.

Features Of SciPy

The main feature of the SciPy library is that it is developed using NumPy, and its array makes the most use of NumPy.

In addition, SciPy provides all the efficient numerical routines like optimization, numerical integration, and many others using its specific submodules.

All the functions in all submodules of SciPy are well documented.

Where Is SciPy Used?

SciPy is a library that uses NumPy for the purpose of solving mathematical functions. SciPy uses NumPy arrays as the basic data structure and comes with modules for various commonly used tasks in scientific programming.

Tasks including linear algebra, integration (calculus), ordinary differential equation solving and signal processing are handled easily by SciPy.


Top 10 Python Libraries - Edureka

What Is Theano?

Theano is a computational framework machine learning library in Python for computing multidimensional arrays. Theano works similar to TensorFlow, but it not as efficient as TensorFlow. Because of its inability to fit into production environments.

Moreover, Theano can also be used on a distributed or parallel environments just similar to TensorFlow.

Features Of Theano

  • Tight integration with NumPy – Ability to use completely NumPy arrays in Theano-compiled functions.
  • Transparent use of a GPU – Perform data-intensive computations much faster than on a CPU.
  • Efficient symbolic differentiation – Theano does your derivatives for functions with one or many inputs.
  • Speed and stability optimizations – Get the right answer for log(1+x) even when x is very tiny. This is just one of the examples to show the stability of Theano.
  • Dynamic C code generation – Evaluate expressions faster than ever before, thereby increasing efficiency by a lot.
  • Extensive unit-testing and self-verification – Detect and diagnose multiple types of errors and ambiguities in the model.

Where Is Theano Used?

The actual syntax of Theano expressions is symbolic, which can be off-putting to beginners used to normal software development. Specifically, an expression is defined in the abstract sense, compiled, and later actually used to make calculations.

It was specifically designed to handle the types of computation required for large neural network algorithms used in Deep Learning. It was one of the first libraries of its kind (development started in 2007) and is considered an industry standard for Deep Learning research and development.

Theano is being used in multiple neural network projects today, and the popularity of Theano is only growing with time.


Top 10 Python Libraries - Edureka

What Is Pandas?

Pandas is a machine learning library in Python that provides data structures of high-level and a wide variety of tools for analysis. One of the great features of this library is the ability to translate complex operations with data using one or two commands. Pandas has so many inbuilt methods for grouping, combining data, filtering, as well as time-series functionality.

All these are followed by outstanding speed indicators.

Features Of Pandas

Pandas makes sure that the entire process of manipulating data will be easier. Support for operations such as Re-indexing, Iteration, Sorting, Aggregations, Concatenations, and Visualizations are among the feature highlights of Pandas.

Where Is Pandas Used?

Currently, there are fewer releases of the Pandas library, which includes hundreds of new features, bug fixes, enhancements, and changes in API. The improvements in Pandas are its ability to group and sort data, select the best-suited output for the applied method, and provide support for performing custom types operations.

Data Analysis, among everything else, takes the highlight when it comes to using Pandas. But when used with other libraries and tools, Pandas ensures high functionality and a good amount of flexibility.

That’s it, folks! I hope this article helped you kickstart your learning the libraries available in Python.

18.4 systemd-journald.service 簡介

過去只有rsyslogd 的年代中,由於rsyslogd 必須要開機完成並且執行了rsyslogd 這個daemon 之後,登錄文件才會開始記錄。所以,核心還得要自己產生一個klogd 的服務, 才能將系統在開機過程、啟動服務的過程中的信息記錄下來,然後等rsyslogd 啟動後才傳送給它來處理~

現在有了systemd 之後,由於這玩意兒是核心喚醒的,然後又是第一支執行的軟件,它可以主動調用systemd-journald 來協助記載登錄文件~ 因此在開機過程中的所有信息,包括啟動服務與服務若啟動失敗的情況等等,都可以直接被記錄到systemd-journald 裡頭去!

不過systemd-journald 由於是使用於內存的登錄文件記錄方式,因此重新開機過後,開機前的登錄文件信息當然就不會被記載了。為此,我們還是建議啟動rsyslogd 來協助分類記錄!也就是說, systemd-journald 用來管理與查詢這次開機後的登錄信息,而rsyslogd 可以用來記錄以前及現在的所以數據到磁盤文件中,方便未來進行查詢喔!


Tips雖然systemd-journald所記錄的數據其實是在內存中,但是系統還是利用文件的型態將它記錄到/run/log/下面!不過我們從前面幾章也知道, /run在CentOS 7其實是內存內的數據,所以重新開機過後,這個/run/log下面的數據當然就被刷新,舊的當然就不再存在了!

18.4.1 使用journalctl 觀察登錄信息

那麼systemd-journald.service 的數據要如何叫出來查閱呢?很簡單!就通過journalctl 即可!讓我們來瞧瞧這個指令可以做些什麼事?

[root@study ~]# journalctl [-nrpf] [--since TIME] [--until TIME] _optional
默认会秀出全部的 log 内容,从旧的输出到最新的讯息
-n  :秀出最近的几行的意思~找最新的信息相当有用
-r  :反向输出,从最新的输出到最旧的数据
-p  :秀出后面所接的讯息重要性排序!请参考前一小节的 rsyslogd 信息
-f  :类似 tail -f 的功能,持续显示 journal 日志的内容(实时监测时相当有帮助!)
--since --until:设置开始与结束的时间,让在该期间的数据输出而已
_SYSTEMD_UNIT=unit.service :只输出 unit.service 的信息而已
_COMM=bash :只输出与 bash 有关的信息
_PID=pid   :只输出 PID 号码的信息
_UID=uid   :只输出 UID 为 uid 的信息
SYSLOG_FACILITY=[0-23] :使用 syslog.h 规范的服务相对序号来调用出正确的数据!

范例一:秀出目前系统中所有的 journal 日志数据
[root@study ~]# journalctl
-- Logs begin at Mon 2015-08-17 18:37:52 CST, end at Wed 2015-08-19 00:01:01 CST. --
Aug 17 18:37:52 study.centos.vbird systemd-journal[105]: Runtime journal is using 8.0M (max 
 142.4M, leaving 213.6M of free 1.3G, current limit 142.4M).
Aug 17 18:37:52 study.centos.vbird systemd-journal[105]: Runtime journal is using 8.0M (max
 142.4M, leaving 213.6M of free 1.3G, current limit 142.4M).
Aug 17 18:37:52 study.centos.vbird kernel: Initializing cgroup subsys cpuset
Aug 17 18:37:52 study.centos.vbird kernel: Initializing cgroup subsys cpu
Aug 19 00:01:01 study.centos.vbird run-parts(/etc/cron.hourly)[19268]: finished 0anacron
Aug 19 00:01:01 study.centos.vbird run-parts(/etc/cron.hourly)[19270]: starting 0yum-hourly.cron
Aug 19 00:01:01 study.centos.vbird run-parts(/etc/cron.hourly)[19274]: finished 0yum-hourly.cron
# 从这次开机以来的所有数据都会显示出来!通过 less 一页页翻动给管理员查阅!数据量相当大!

范例二:(1)仅显示出 2015/08/18 整天以及(2)仅今天及(3)仅昨天的日志数据内容
[root@study ~]# journalctl --since "2015-08-18 00:00:00" --until "2015-08-19 00:00:00"
[root@study ~]# journalctl --since today
[root@study ~]# journalctl --since yesterday --until today

范例三:只找出 crond.service 的数据,同时只列出最新的 10 笔即可
[root@study ~]# journalctl _SYSTEMD_UNIT=crond.service -n 10

范例四:找出 su, login 执行的登录文件,同时只列出最新的 10 笔即可
[root@study ~]# journalctl _COMM=su _COMM=login -n 10

范例五:找出讯息严重等级为错误 (error) 的讯息!
[root@study ~]# journalctl -p err

范例六:找出跟登录服务 (auth, authpriv) 有关的登录文件讯息
[root@study ~]# journalctl SYSLOG_FACILITY=4 SYSLOG_FACILITY=10
# 更多关于 syslog_facility 的数据,请参考 18.2.1 小节的内容啰!

基本上,有journalctl 就真的可以搞定你的訊息數據囉!全部的數據都在這裡面耶~再來假設一下,你想要了解到登錄文件的實時變化, 那又該如何處置呢?現在,請開兩個終端機,讓我們來處理處理!

# 第一号终端机,请使用下面的方式持续侦测系统!
[root@study ~]# journalctl -f
# 这时系统会好像卡住~其实不是卡住啦!是类似 tail -f 在持续的显示登录文件信息的!

# 第二号终端机,使用下面的方式随便发一封 email 给系统上的帐号!
[root@study ~]# echo "testing" | mail -s 'tset' dmtsai
# 这时,你会发现到第一号终端机竟然一直输出一些讯息吧!没错!这就对了!

如果你有一些必須要偵測的行為,可以使用這種方式來實時了解到系統出現的訊息~而取消journalctl -f 的方法,就是[crtl]+c 啊!

18.4.2 logger 指令的應用

上面談到的是叫出登錄文件給我們查閱,那換個角度想,“如果你想要讓你的數據儲存到登錄文件當中”呢?那該如何是好?這時就得要使用logger 這個好用的傢伙了!這個傢伙可以傳輸很多信息,不過,我們只使用最簡單的本機信息傳遞~ 更多的用法就請您自行man logger 囉!

[root@study ~]# logger [-p 服务名称.等级] "讯息"
服务名称.等级 :这个项目请参考 rsyslogd 的本章后续小节的介绍;

范例一:指定一下,让 dmtsai 使用 logger 来传送数据到登录文件内
[root@study ~]# logger -p user.info "I will check logger command"
[root@study ~]# journalctl SYSLOG_FACILITY=1 -n 3
-- Logs begin at Mon 2015-08-17 18:37:52 CST, end at Wed 2015-08-19 18:03:17 CST. --
Aug 19 18:01:01 study.centos.vbird run-parts(/etc/cron.hourly)[29710]: starting 0yum-hourly.cron
Aug 19 18:01:01 study.centos.vbird run-parts(/etc/cron.hourly)[29714]: finished 0yum-hourly.cron
Aug 19 18:03:17 study.centos.vbird dmtsai[29753]: I will check logger command

現在,讓我們來瞧一瞧,如果我們之前寫的backup.service 服務中,如果使用手動的方式來備份,亦即是使用”/backups/backup.sh log” 來執行備份時, 那麼就通過logger來記錄備份的開始與結束的時間!該如何是好呢?這樣作看看!

[root@study ~]# vim /backups/backup.sh

if [ "${1}" == "log" ]; then
        logger -p syslog.info "backup.sh is starting"
source="/etc /home /root /var/lib /var/spool/{cron,at,mail}"
target="/backups/backup-system-$(date +%Y-%m-%d).tar.gz"
[ ! -d /backups ] && mkdir /backups
tar -zcvf ${target} ${source} &> /backups/backup.log
if [ "${1}" == "log" ]; then
        logger -p syslog.info "backup.sh is finished"

[root@study ~]# /backups/backup.sh log
[root@study ~]# journalctl SYSLOG_FACILITY=5 -n 3
Aug 19 18:09:37 study.centos.vbird dmtsai[29850]: backup.sh is starting
Aug 19 18:09:54 study.centos.vbird dmtsai[29855]: backup.sh is finished


18.4.3 保存journal 的方式

再強調一次,這個systemd-journald.servicd 的訊息是不會放到下一次開機後的,所以,重新開機後,那之前的記錄通通會遺失。雖然我們大概都有啟動rsyslogd 這個服務來進行後續的登錄文件放置,不過如果你比較喜歡journalctl 的存取方式,那麼可以將這些數據儲存下來喔!

基本上,systemd-journald.service 的配置文件主要參考/etc/systemd/journald.conf 的內容,詳細的參數你可以參考man 5 journald.conf 的數據。因為默認的情況下面,配置文件的內容應該已經符合我們的需求,所以這邊鳥哥就不再修改配置文件了。只是如果想要保存你的journalctl 所讀取的登錄文件, 那麼就得要創建一個/var/log/journal 的目錄,並且處理一下該目錄的權限,那麼未來重新啟動systemd-journald.service 之後, 日誌登錄文件就會主動的複制一份到/var/log/journal 目錄下囉!

# 1\. 先处理所需要的目录与相关权限设置
[root@study ~]# mkdir /var/log/journal
[root@study ~]# chown root:systemd-journal /var/log/journal
[root@study ~]# chmod 2775 /var/log/journal

# 2\. 重新启动 systemd-journald 并且观察备份的日志数据!
[root@study ~]# systemctl restart systemd-journald.service
[root@study ~]# ll /var/log/journal/
drwxr-sr-x. 2 root systemd-journal 27 Aug 20 02:37 309eb890d09f440681f596543d95ec7a

你得要注意的是,因為現在整個日誌登錄文件的容量會持續長大,因此你最好還是觀察一下你係統能用的總容量喔!避免不小心文件系統的容量被灌爆!此外,未來在/run/log 下面就沒有相關的日誌可以觀察了!因為移動到/var/log/journal 下面來囉!

其實鳥哥是這樣想的,既然我們還有rsyslog.service 以及logrotate 的存在,因此這個systemd-journald.service 產生的登錄文件, 個人建議最好還是放置到/run/log 的內存當中,以加快存取的速度!而既然rsyslog.service 可以存放我們的登錄文件, 似乎也沒有必要再保存一份journal 登錄文件到系統當中就是了。單純的建議!如何處理,依照您的需求即可喔!

Go 語言很好很強大,但我有幾個問題想吐槽

Go 是一門非常不錯的編程語言。然而,我在公司的Slack 編程頻道中對Go 的抱怨卻越來越多(猜到我是做啥了的吧?),因此我認為有必要把這些吐槽寫下來並放在這裡,這樣當人們問我抱怨什麼時,我給他們一個鏈接就行了。



我這些批評全部是針對Go 語言的。但是,我對使用過的每種語言都有不滿。我非常贊同下面的話:

“世界上只有兩種語言:人們抱怨的語言和沒人使用的語言。” —— Bjarne Stroustrup

1 不支持函數式編程

我並不是一個函數式編程狂熱者。說到Lisp 語言,我首先想到的是語言障礙。

這可能是Go 語言最大的痛點了。與大部分人不同,我不希望Go 支持泛型,因為它會為多數Go 項目帶來不必要的複雜性。我希望Go 語言支持適用於內置切片和Map 的函數式方法。切片和Map 具有通用性,並且可以容納任何類型,從這個意義上講,它們已經非常神奇。在Go 語言中只有利用接口才能實現類似效果,但這樣一來將喪失安全性和速度。




existsBoth := []string{}
for _, first := range firstSlice {
for _, second := range secondSlice {
if first == second {
existsBoth = append(existsBoth, proxy)

上面是一個用Go 語言實現的簡單方案。當然還有其它方法,比如借助Map 來減少運行時間。這裡我們假設內存足夠用或者切片都不太大,同時假設優化運行時間帶來的複雜性遠超收益,因此不值得優化。作為對比,使用Java 流和函數式編程把相同的邏輯重寫如下:


var existsBoth = firstList.stream()
.filter(x -> secondList.contains(x))


與Go 代碼相比,Java 代碼的意圖一目了然。真正靈活之處在於,添加更多的過濾條件易如反掌。如果使用Go 語言添加下面例子中的過濾條件,我們需要在嵌套的for 循環中再添加兩個if 條件。


var existsBoth = firstList.stream()
.filter(x -> secondList.contains(x))
.filter(x -> x.startsWith(needle))
.filter(x -> x.length() >= 5)

有些借助go generate 命令的項目可以幫你實現上面的一些功能。但是,如果缺少良好的IDE 支持,抽取循環中的語句作為單獨的方法是一件低效又麻煩的事情。

2 通道/ 並行切片處理

Go 通道通常都很好用。但它並不能提供無限的並發能力。它確實存在一些會導致永久阻塞的問題,但這些問題用競爭檢測器能很容易地解決。對於數量不確定或不知何時結束的流式數據,以及非CPU 密集型的數據處理方法,Go 通道都是很好的選擇。

Go 通道不太適合併行處理大小已知的切片。



幾乎在其它任何語言中,當列表或切片很大時,為了充分利用所有CPU 內核,通常都會使用並行流、並行Linq、Rayon、多處理或其它語法來遍歷列表。遍歷後的返回值是一個包含已處理元素的列表。如果元素足夠多,或者處理元素的函數足夠複雜,多核系統會更高效。

但是在Go 語言中,實現高效處理所需要做的事情卻並不顯而易見。

一種可能的解決方案是為切片中的每個元素都創建一個Go 例程。由於Go 例程的開銷很低,因此從某種程度上來說這是一個有效的策略。


toProcess := []int{1,2,3,4,5,6,7,8,9}
var wg sync.WaitGroup
for i, _ := range toProcess {
go func(j int) {
toProcess[j] = someSlowCalculation(toProcess[j])


這段代碼的第一個問題是增加了一個WaitGroup,並且必須要記得調用它的Add 和Done 方法。這增加了開發人員的工作量。如果弄錯了,這個程序不會產生正確的輸出,結果是要么輸出不確定,要么程序永不結束。此外,如果列表很長,你會為每個列表創建一個Go 例程。正如我之前所說,這不是問題,因為Go 能輕鬆搞定。問題在於,每個Go 例程都會爭搶CPU 時間片。因此,這不是執行該任務的最有效方式。



toProcess := []int{1,2,3,4,5,6,7,8,9}
var input = make(chan int, len(toProcess))
for i, _ := range toProcess {
input <- i
var wg sync.WaitGroup
for i := 0; i < runtime.NumCPU(); i++ {
go func(input chan int, output []int) {
for j := range input {
toProcess[j] = someSlowCalculation(toProcess[j])
}(input, toProcess)

上面的代碼創建了一個通道,然後遍歷切片,將索引值放入通道。接下來我們為每個CPU 內核創建一個Go 例程,操作系統會報告並處理相應的輸入,然後等待,直到所有操作完成。這裡有很多代碼需要理解。

然而,這種實現有待商榷。如果切片非常大,通道的緩衝區長度和切片大小相同,你可能不希望創建一個有這麼大緩衝區的通道。因此,你應該創建另一個Go 例程來遍歷切片,並將切片中的值放入通道,完成後關閉通道。但這樣一來代碼會變得冗長,因此我把它去掉了。我希望可以大概地闡明基本思路。

使用Java 語言大致這樣實現:


var firstList = List.of(1,2,3,4,5,6,7,8,9);
firstList = firstList.parallelStream()

通道和流並不等價。使用隊列去仿寫Go 代碼的邏輯更好一些,因為它們更具有可比性,但我們的目的不是進行1 對1 的比較。我們的目標是充分利用所有的CPU 內核處理切片或列表。

如果someSlowCalucation 方法調用了網絡或其它非CPU 密集型任務,這當然不是問題。在這種情況下,通道和Go 例程都會表現得很好。

這個問題與問題#1 有關。如果Go 語言支持適用於切片/Map 對象的函數式方法,那麼就能實現這個功能。但是,如果Go 語言支持泛型,有人就可以把上面的功能封裝成像Rust 的Rayon 一樣的庫,讓每個人都從中受益,這就很令人討厭了(我不希望Go 支持泛型)。

順便說一下,我認為這個缺陷妨礙了Go 語言在數據科學領域的成功,這也是為什麼Python 仍然是數據科學領域的王者。Go 語言在數值操作方面缺乏表現力和能力,原因就是以上討論的這些。

3 垃圾回收器

Go 的垃圾回收器做得非常不錯。我開發的應用程序通常都會因為新版本的改進而變得更快。但是,它以低延遲為最高優先級。對於API 和UI 應用來說,這個選擇完全可以接受。對於包含網絡調用的應用,因為網絡調用往往會是瓶頸,所以它也沒問題。


缺乏對GC 的控制時常令人沮喪。你得學會適應它,但是,有時候如果能做到這樣該有多好:“嘿,這些代碼確實需要盡可能快地運行,所以如果你能在高吞吐模式運行一會,那就太好了。”


我認為這種情況在Go 1.12 版本中有所改善,因為GC 得到了進一步的改進。但僅僅是關閉和打開GC 還不夠,我期望更多的控制。如果有時間我會再進行研究。

4 錯誤處理



value, err := someFunc()
if err != nil {
// Do something here
err = someOtherFunc(value)
if err != nil {
// Do something here

上面的代碼很乏味。Go 甚至不會像有些人建議的那樣強制你處理錯誤。你可以使用“_”顯式忽略它(這是否算作對它進行了處理呢?),你還可以完全忽略它。比如上面的代碼可以重寫為:


value, _ := someFunc()

很顯然,我顯式忽略了someFunc 方法的返回。someOtherFunc(value)方法也可能返回錯誤值,但我完全忽略了它。這裡的錯誤都沒有得到處理。

說實話,我不知道如何解決這個問題。我喜歡Rust中的“?”運算符,它可以幫助避免這種情況。V-Lang https://vlang.io/看起來也可能有一些有趣的解決方案。

另一個辦法是使用可選類型(Optional types)並去掉nil,但這不會發生在Go 語言裡,即使是Go 2.0 版本,因為它會破壞向後兼容性。


Go 仍然是一種非常不錯的語言。如果你讓我寫一個API,或者完成某個需要大量磁盤/ 網絡調用的任務,它依然是我的首選。現在我會用Go 而非Python 去完成很多一次性任務,數據合併任務是例外,因為函數式編程的缺失使執行效率難以達到要求。

與Java 不同,Go 語言盡量遵循“最小驚喜“原則。比如可以這樣比較字兩個符串是否相等:stringA == stringB。但如果你這樣比較兩個切片,那麼會產生編譯錯誤。這些都是很好的特性。


它仍然是我使用過的效率較高的語言之一。我會繼續使用它,雖然我希望https://vlang.io/能最終發布,並解決我的很多抱怨。V語言或Go 2.0,Nim或Rust。現在有很多很酷的新語言可以使用,我們開發人員真的要被寵壞了。



本博客Nginx 配置之安全篇

之前有細心的朋友問我,為什麼你的博客副標題是「專注WEB 端開發」,是不是少了「前端」的「前」。我想說的是,儘管我從畢業到現在七年左右的時間一直都在專業前端團隊從事前端相關工作,但這並不意味著我的知識體係就必須局限於前端這個範疇內。現在比較流行「全棧工程師」的概念,我覺得全棧意味著一個項目中,各個崗位所需要的技能你都具備,但並不一定意味著你什麼都需要做。你需要做什麼,更多是由能力、人員配比以及成本等各個因素所決定。儘管我現在的工作職責是在WEB 前端領域,但是我的關注點在整個WEB 端。


去年我用Lua + OpenResty替換了線上千萬級的PHP + Nginx服務,至今穩定運行,算是前端之外的一點嘗試。我一直認為學習任何知識很重要的一點是實踐,所以我一直都在折騰我的VPS,進行各種WEB安全、優化相關的嘗試。我打算從安全和性能兩方面介紹一下本博客所用Nginx的相關配置,今天先寫安全相關的。


大家可以看一下我的博客請求響應頭,有這麼一行server: nginx,說明我用的是Nginx服務器,但並沒有具體的版本號。由於某些Nginx漏洞只存在於特定的版本,隱藏版本號可以提高安全性。這只需要在配置裡加上這個就可以了:

server_tokens   off;

如果想要更徹底隱藏所用Web Server,可以修改Nginx源碼,把Server Name改掉再編譯,具體步驟可以自己搜索。需要提醒的是:如果你的網站支持SPDY,只改動網上那些文章寫到的地方還不夠,跟SPDY有關的代碼也要改。更簡單的做法是改用Tengine這個Nginx的增強版,並指定server_tag為off或者任何想要的值就可以了。另外,既然想要徹底隱藏Nginx,404、500等各種出錯頁也需要自定義。


proxy_hide_header        X-Powered-By;


由於我的博客只處理了GET、POST 兩種請求方法,而HTTP/1 協議還規定了TRACE 這樣的方法用於網絡診斷,這也可能會暴露一些信息。所以我針對GET、POST 以及HEAD 之外的請求,直接返回了444 狀態碼(444 是Nginx 定義的響應狀態碼,會立即斷開連接,沒有響應正文)。具體配置是這樣的:

NGINXif ($request_method !~ ^(GET|HEAD|POST)$ ) {
    return    444;


我的博客是由自己用ThinkJS 寫的Node 程序提供服務,Nginx 通過proxy_pass 把請求反向代理給Node 綁定的IP 和端口。在最終輸出時,我給響應增加了以下頭部:

NGINXadd_header  Strict-Transport-Security  "max-age=31536000";
add_header  X-Frame-Options  deny;
add_header  X-Content-Type-Options  nosniff;
add_header  Content-Security-Policy  "default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval' https://a.disquscdn.com; img-src 'self' data: https://www.google-analytics.com; style-src 'self' 'unsafe-inline'; frame-src https://disqus.com";




Content-Security-Policy(簡稱為CSP)用來指定頁面可以加載哪些資源,主要目的是減少XSS的發生。我允許了來自本站、disquscdn的外鏈JS,還允許內聯JS,以及在JS中使用eval;允許來自本站和google統計的圖片,以及內聯圖片(Data URI形式);允許本站外鏈CSS以及內聯CSS;允許iframe加載來自disqus的頁面。對於其他未指定的資源,都會走默認規則self,也就是只允許加載本站的。關於CSP的詳細介紹請看這裡



HTTPS 安全配置

啟用HTTPS 並正確配置了證書,意味著數據傳輸過程中無法被第三者解密或修改。有了HTTPS,也得合理配置好Web Server,才能發揮最大價值。我的博客關於HTTPS 這一塊有以下配置:

NGINXssl_certificate      /home/jerry/ssl/server.crt;
ssl_certificate_key  /home/jerry/ssl/server.key;
ssl_dhparam          /home/jerry/ssl/dhparams.pem;


ssl_prefer_server_ciphers  on;

ssl_protocols        TLSv1 TLSv1.1 TLSv1.2;


ssllabs test

如何配置ssl_ciphers可以參考這個網站。需要注意的是,這個網站默認提供的加密方式安全性較高,一些低版本客戶端並不支持,例如IE9-、Android2.2-和Java6-。如果需要支持這些老舊的客戶端,需要點一下網站上的「Yes, give me a ciphersuite that works with legacy / old software」鏈接。


chacha20 poly1305





關於ssl_dhparam的配置,可以參考這篇文章:Guide to Deploying Diffie-Hellman for TLS




搭建 Docker 环境

安装 Docker

Docker 软件包已经包括在默认的 CentOS-Extras 软件源里。因此想要安装 docker,只需要运行下面的 yum 命令:

yum install docker-io -y


docker -v


service docker start


chkconfig docker on

配置 Docker

因为国内访问 Docker Hub 较慢, 可以使用腾讯云提供的国内镜像源, 加速访问 Docker Hub


echo "OPTIONS='--registry-mirror=https://mirror.ccs.tencentyun.com'" >> /etc/sysconfig/docker
systemctl daemon-reload
service docker restart

Docker 的简单操作

任务时间:10min ~ 20min


下载一个官方的 CentOS 镜像到本地

docker pull centos


docker images


这时我们可以在刚才下载的 CentOS 镜像生成的容器内操作了。

生成一个 centos 镜像为模板的容器并使用 bash shell

docker run -it centos /bin/bash

这个时候可以看到命令行的前端已经变成了 [root@(一串 hash Id)] 的形式, 这说明我们已经成功进入了 CentOS 容器。

在容器内执行任意命令, 不会影响到宿主机, 如下

mkdir -p /data/simple_docker

可以看到 /data 目录下已经创建成功了 simple_docker 文件夹

ls /data



查看宿主机的 /data 目录, 并没有 simple_docker 文件夹, 说明容器内的操作不会影响到宿主机

ls /data


查看所有的容器信息, 能获取容器的id

docker ps -a


docker commit -m="备注" 你的CONTAINER_ID 你的IMAGE

请自行将 -m 后面的信息改成自己的容器的信息


恭喜你结束了 Docker 的教程并学会了 Docker 的一些基本操作,


使用docker attach命令
docker attach db3 或者 docker attach d48b21a7e439


db3是后台容器的NAMES,d48b21a7e439是容器的进程ID  CONTAINER ID
使用docker exec命令
docker exec -it db3 /bin/sh 或者 docker exec -it d48b21a7e439 /bin/sh

Jenkins 环境搭建

Jenkins 简介

Jenkins 是一个开源软件项目,是基于Java开发的一种持续集成工具,用于监控持续重复的工作,旨在提供一个开放易用的软件平台,使软件的持续集成变成可能。

Java 安装

首先我们需要准备 Java 环境,使用下面命令来安装 Java:

yum -y install java-1.8.0-openjdk-devel

Jenkins 安装

为了使用 Jenkins 仓库,我们要执行以下命令:

sudo wget -O /etc/yum.repos.d/jenkins.repo https://pkg.jenkins.io/redhat-stable/jenkins.repo
sudo rpm --import https://pkg.jenkins.io/redhat-stable/jenkins.io.key

如果您以前从 Jenkins 导入过 key,那么 rpm --import 将失败,因为您已经有一个 key。请忽略,继续下面步骤。


接着我们可以使用 yum 安装 Jenkins:

yum -y install jenkins

启动 Jenkins

任务时间:5min ~ 10min


启动 Jenkins 并设置为开机启动:

systemctl start jenkins.service
chkconfig jenkins on


Jenkins 默认运行在 8080端口。

稍等片刻,打开 http://<您的 CVM IP 地址>:8080 测试访问。

进入 Jenkins

任务时间:5min ~ 10min


登入 Jenkins 需要输入管理员密码,按照提示,我们使用以下命令查看初始密码:

cat /var/lib/jenkins/secrets/initialAdminPassword

复制密码,填入,进入 Jenkins。

定制 Jenkins

我们选择默认的 install suggested plugins 来安装插件。


请填入相应信息创建用户,然后即可登入 Jenkins 的世界。


恭喜,您已完成 搭建 Jenkins 环境搭建 实验。

搭建 Hadoop 伪分布式环境


  • CentOS 7.2 64位
  • OpenJDK-1.7
  • Hadoop-2.7



安装 SSH 客户端

任务时间:1min ~ 5min



sudo yum install openssh-clients openssh-server


ssh localhost


安装 JAVA 环境

任务时间:5min ~ 10min

安装 JDK


sudo yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel


配置 JAVA 环境变量


编辑 ~/.bashrc,在结尾追加:

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk


source ~/.bashrc


java -version
$JAVA_HOME/bin/java -version


安装 Hadoop

任务时间:10min ~ 15min

下载 Hadoop


wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz

安装 Hadoop


tar -zxf hadoop-2.7.4.tar.gz -C /usr/local


cd /usr/local
mv ./hadoop-2.7.4/ ./hadoop


/usr/local/hadoop/bin/hadoop version


Hadoop 伪分布式环境配置

任务时间:15min ~ 30min


设置 Hadoop 的环境变量

编辑 ~/.bashrc,在结尾追加如下内容:

export HADOOP_HOME=/usr/local/hadoop


source ~/.bashrc

修改 Hadoop 的配置文件



编辑 core-site.xml,修改<configuration></configuration>节点的内容为如下所示:

        <description>location to store temporary files</description>

同理,编辑 hdfs-site.xml,修改<configuration></configuration>节点的内容为如下所示:


格式化 NameNode


/usr/local/hadoop/bin/hdfs namenode -format


Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
Exiting with status 0

启动 NameNode 和 DataNode 守护进程




检查 NameNode 和 DataNode 是否正常启动:



[hadoop@VM_80_152_centos ~]$ jps
3689 SecondaryNameNode
3520 DataNode
3800 Jps
3393 NameNode

运行 Hadoop 伪分布式实例

任务时间:10min ~ 20min

Hadoop自带了丰富的例子,包括 wordcount、grep、sort 等。下面我们将以grep例子为教程,输入一批文件,从中筛选出符合正则表达式dfs[a-z.]+的单词并统计出现的次数。

查看 Hadoop 自带的例子

Hadoop 附带了丰富的例子, 执行下面命令可以查看:

cd /usr/local/hadoop
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar

在 HDFS 中创建用户目录

在 HDFS 中创建用户目录 hadoop:

/usr/local/hadoop/bin/hdfs dfs -mkdir -p /user/hadoop


本教程中,我们将以 Hadoop 所有的 xml 配置文件作为输入数据来完成实验。执行下面命令在 HDFS 中新建一个 input 文件夹并将 hadoop 配置文件上传到该文件夹下:

cd /usr/local/hadoop
./bin/hdfs dfs -mkdir /user/hadoop/input
./bin/hdfs dfs -put ./etc/hadoop/*.xml /user/hadoop/input

使用下面命令可以查看刚刚上传到 HDFS 的文件:

/usr/local/hadoop/bin/hdfs dfs -ls /user/hadoop/input



cd /usr/local/hadoop
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar grep /user/hadoop/input /user/hadoop/output 'dfs[a-z.]+'

上述命令以 HDFS 文件系统中的 input 为输入数据来运行 Hadoop 自带的 grep 程序,提取其中符合正则表达式 dfs[a-z.]+ 的数据并进行次数统计,将结果输出到 HDFS 文件系统的 output 文件夹下。


上述例子完成后的结果保存在 HDFS 中,通过下面命令查看结果:

/usr/local/hadoop/bin/hdfs dfs -cat /user/hadoop/output/*


1       dfsadmin
1       dfs.replication
1       dfs.namenode.name.dir
1       dfs.datanode.data.dir

删除 HDFS 上的输出结果

删除 HDFS 中的结果目录:

/usr/local/hadoop/bin/hdfs dfs -rm -r /user/hadoop/output

运行 Hadoop 程序时,为了防止覆盖结果,程序指定的输出目录不能存在,否则会提示错误,因此在下次运行前需要先删除输出目录。

关闭 Hadoop 进程

关闭 Hadoop 进程:







恭喜您已经完成了搭建 Hadoop 伪分布式环境的学习