One Program Written in Python, Go, and Rust

Python, Go, Rust mascots

Update (2019-07-04): Some kind folks have suggested changes on the implementations to make them more idiomatic, so the code here may differ from what’s currently in the repos.


This is a subjective, primarily developer-ergonomics-based comparison of the three languages from the perspective of a Python developer, but you can skip the prose and go to the code samplesthe performance comparison if you want some hard numbers, the takeaway for the tl;dr, or the PythonGo, and Rust diffimg implementations.

A few years ago, I was tasked with rewriting an image processing service. To tell whether my new service was creating the same output as the old given an image and one or more transforms (resize, make a circular crop, change formats, etc.), I had to inspect the images myself. Clearly I needed to automate this, but I could find no existing Python library that simply told me how different two images were on a per-pixel basis. Hence diffimg, which can give you a difference ratio/percentage, or generate a diff image (check out the readme to see an example).

The initial implementation was in Python (the language I’m most comfortable in), with the heavy lifting done by Pillow. It’s usable as a library or a command line tool. The actual meatof the program is very small, only a few dozen lines, thanks to Pillow. Not a lot of effort went into building this tool (xkcd was right, there’s a Python module for nearly everything), but it’s at least been useful for a few dozen people other than myself.

A few months ago, I joined a company that had several services written in Go, and I needed to get up to speed quickly on the language. Writing diffimg-go seemed like an fun and possibly even useful way to do this. Here are a few points of interest that came out of the experience, along with some that came up while using it at work:

Comparing Python and Go

(Again, the code: diffimg (python) and diffimg-go)

  • Standard Library: Go comes with a decent image standard library module, as well as a command line flag parsing library. I didn’t need to look for any external dependencies; the diffimg-go implementation has none, where the Python implementation uses the fairly heavy third party module (ironically) named Pillow. Go’s standard library in general is more structured and well thought out, while Python’s is organically evolved, created by many authors over years, with many differing conventions. The Go standard library’s consistency makes it easier to predict how any given module will function, and the source code is extremely well documented.
    • One downside of using the standard image library is that it does not automatically detect if the image has an alpha channel; pixel values have four channels (RGBA) for all image types. The diffimg-go implementation therefore requires the user to indicate whether or not they want to use the alpha channel. This small inconvenience wasn’t worth finding a third party library to fix.
    • One big upside is that there’s enough in the standard library that you don’t need a web framework like Django. It’s possible to build a real, usable web service in Go without any dependencies. Python’s claim is that it’s batteries-included, but Go does it better, in my opinion.
  • Static Type System: I’ve used statically typed languages in the past, but my programming for the past few years has mostly been in Python. The experience was somewhat annoying at first, it felt as though it was simply slowing me down and forcing me to be excessively explicit whereas Python would just let me do what I wanted, even if I got it wrong occasionally. Somewhat like giving instructions to someone who always stops you to ask you to clarify what you mean, versus someone who always nods along and seems to understand you, though you’re not always sure they’re absorbing everything. It will decrease the amount of type-related bugs for free, but I’ve found that I still need to spend nearly the same amount of time writing tests.
    • One of the common complaints of Go is that it does not have user-implementable generic types. While this is not a must-have feature for building a large, extensible application, it certainly slows development speed. Alternative patterns have been suggested, but none of them are as effective as having real generic types.
    • One upside of the static type system is that it reading through an unfamiliar codebase is easier and faster. Good use of types imbues a lot of extra information that is lost with a dynamic type system.
  • Interfaces and Structs: Go uses interfaces and structs where Python would use classes. This was probably the most interesting difference to me, as it forced me to differentiate the concept of a type that defines behavior versus a type that holds information. Python and other “traditionally object-oriented” languages would encourage you to mash these together, but there are pros and cons to both paradigms:
    • Go heavily encourages composition over inheritance. While it has inheritance via embedding, without classes, it’s not as easy to forward both data and methods. I generally agree that composition is the better default pattern to reach for, but I’m not an absolutist and some situations are a better fit for inheritance, so I’d prefer not to have the language make this decision for me.
    • Divorcing implementations for interfaces means you need to write similar code several times if you have many types that are similar to each other. Because of the lack of generic types, there are situations in Go where I wouldn’t be able to reuse code, though I would in Python.
    • However, because Go is statically typed, the compiler/linter will tell you when you’re writing code that would have caused a runtime error in Python when you try to access a method or attribute that may not exist. Python linters can get a bit of this functionality, but because of the language’s dynamicity, the linter can’t know exactly what methods/attributes will exist until runtime. Statically defined interfaces and structs are the only way to know what’s available at compile time and during development, making Go that compiles more trustworthy than Python that runs.
  • No Optional Arguments: Go only has variadic functions which are similar to Python’s keyword arguments, but less useful, since the arguments need to be of the same type. I found keyword arguments to be something I really missed, mainly for how much easier refactoring is if you can just throw a kwarg of any type onto whatever function needs it without having to rewrite every one of its calls. I use this feature quite often in at work, it’s saved me a lot of time over the years. Not having the feature made my implementation for how to handle whether or not the diff image should be created based on the command line flags somewhat clumsy.
  • Verbosity: Go is a bit more verbose (though not Java verbose). Part of that is because type system does not have generics, but mainly the fact that the language itself is very small and not heavily loaded with features (you only get one looping construct!). I missed having Python’s list comprehensions and other functional programming features. If you’re comfortable with Python, you can go through the Tour of Go in a day or two, and you’ll have been exposed to the entirety of the language.
  • Error Handling: Python has exceptions, whereas Go propagates errors by returning tuples: value, error from functions wherever something may go wrong. Python lets you catch errors at any point in the call stack as opposed to requiring you to manually pass them back up over and over again. This again results in brevity and code that isn’t littered with Go’s infamous if err != nil pattern, though you do need to be aware of what possible exceptions can be thrown by a function and all(!) of its internal calls (using except Exception: is a usually-bad-practice workaround for this). Good docstrings and tests can help here, which you should be writing in either language. Go’s system is definitely safer. You’re still allowed to shoot yourself in the foot by ignoring the err value, but the system makes it obvious that this is a bad idea.
  • Third Party Modules: Prior to Go modules, Go’s package manager would just throw all downloaded packages into $GOPATH/src instead of the project’s directory (like most other languages). The path for these modules inside $GOPATH would also be built from the URL where the package is hosted, so your import would look something like import "github.com/someuser/somepackage". Embedding github.cominside the source code of almost all Go codebases seems like a strange choice. In any case, Go now allows the conventional way of doing things, but Go modules are still new so this quirk will remain common in wild Go code for some time.
  • Asynchronicity: Goroutines are a very convenient way to fire off asynchronous tasks. Before async/await, Python’s asynchronous solutions were somewhat hairy. Unfortunately I haven’t written much real-world async code in Python or Go, and the simplicity of diffimg didn’t seem to lend itself to the added overhead of asynchronicity, so I don’t have too much to say here, though I do like Go’s channels as a way to handle multiple async tasks. My understanding is that for performance, Go still has the upper hand here as goroutines can make use of full multiprocessor parallelism, where Python’s basic async/await is still stuck on one processor, so mainly useful for I/O bound tasks.
  • Debugging: Python wins. pdb (and more sophisticated options like ipdb are available) is extremely flexible, once you’ve entered the REPL, you’re able to write whatever code you want. Delve is a good debugger, but it’s not the same as dropping straight into an interpreter, the full power of the language at your fingertips.

Go Summary

My initial impression of Go is that because its ability to abstract is (purposely) limited, it’s not as fun a language as Python is. Python has more features and thus more ways of doing something, and it can be a lot of fun to find the fastest, most readable, or “cleverest” solution. Go actively tries to stop you from being “clever.” I would go as far as saying that Go’s strength is that it’s not clever.

Its minimalism and lack of freedom are constraining as a single developer just trying to materialize an idea. However, this weakness becomes its strength when the project scales to dozens or hundreds of developers – because everyone’s working with the same small toolset of language features, it’s more likely to be uniform and thus understandable by others. It’s still very possible to write bad Go, but it’s more difficult to create monstrosities that more “powerful” languages will let you produce.

After using it for a while, it makes sense to me why a company like Google would want a language like this. New engineers are being introduced to enormous codebases constantly, and in a messier/more powerful language and under the pressure of deadlines, complexity could be introduced faster than it can be removed. The best way to prevent that is with a language that has less capacity for it.

With that said, I’m happy to work on a Go codebase in the context of a large application with a diverse and ever-growing team. In fact, I think I’d prefer it. I just have no desire to use it for my own personal projects.

Enter Rust

A few weeks ago, I decided to give an honest go at learning Rust. I had attempted to do so before but found the type system and borrow checker confusing and without enough context for why all these constraints were being forced on me, cumbersome for the tasks I was trying to do. However, since then, I’ve learned a bit more about what happens with memory during the execution of a program. I also started with the book instead of just attempting to dive in headfirst. This was massively helpful, and probably the best introduction to any programming language I’ve ever experienced.

After I had gone through the first dozen or so chapters of the book, I felt confident enough to try another implementation of diffimg (at this point, I had about as much experience with Rust as I’d had with Go when I wrote diffimg-go). It took me a bit longer to write than the Go implementation, which itself took longer than Python. I think this would be true even taking into account my greater comfort with Python – there’s just more to write in both languages.

Some of the things that I took notice of when writing diffimg-rs:

  • Type System: I was comfortable with the more basic static type system of Go by now, but Rust’s is significantly more powerful (and complicated). Generic types, enumerated types, traits, reference types, lifetimes are all additional concepts that I had to learn on top of Go’s much simpler interfaces and structs. Additionally, Rust uses its type system to implement features that other languages don’t use the type system for (example: the Result type, which I’ll talk about soon). Luckily, the compiler/linter is extremely helpful in telling you what you’re doing wrong, and often even tells you exactly how to fix it. Despite this, I’ve spent significantly more time than I did learning Go’s type system and I’m still not comfortable with all the features yet.
    • There was one place where because of the type system, the implementation of the imaging library I was using would have led to an uncomfortable amount of code repetition. I only ended up matching the two most important enum types, but matching the others would lead another half dozen or so lines of nearly identical code. At this scale it’s not an issue, but it rubs me the wrong way. Maybe it’s a good candidate for using macros, which I still need to experiment with.
      let mut diff = match image1.color() {
          image::ColorType::RGB(_) => image::DynamicImage::new_rgb8(w, h),
          image::ColorType::RGBA(_) => image::DynamicImage::new_rgba8(w, h),
          // keep going for all 7 types?
          _ => return Err(
              format!("color mode {:?} not yet supported", image1.color())
          ),
      };
      
  • Manual Memory Management: Python and Go pick up your trash for you. C lets you litter everywhere, but throws a fit when it steps on your banana peel. Rust slaps you and demands that you clean up after yourself. This stung at first, since I’m spoiled and usually have my languages pick up after me, moreso even than moving from a dynamic to a statically typed language. Again, the compiler tries to help you as much as is possible, but there’s still a good amount of studying you’ll need to do to understand what’s really going on.
    • One nice part about having such direct access to the memory (and the functional programming features of Rust) is that it simplified the difference ratio calculationbecause I could simply map over the raw byte arrays instead of having to index each pixel by coordinate.
  • Functional Features: Rust strongly encourages a functional approach: it has a FP-friendly type system like Haskell, immutable types, closures, iterators, pattern matching, and more, but also allows imperative code. It’s similar to writing OCaml (interestingly, the original Rust compiler was written in OCaml). Because of this, code is more concise than you’d expect for a language that competes with C.
  • Error Handling: Instead of the exception model that Python uses or the tuple returns that Go uses for error handling, Rust makes use of its enumerated types: Resultreturns either Ok(value) or Err(error). This is closer to Go’s way if you squint, but is a bit more explicit and leverages the type system. There’s also syntactic sugar for checking a statement for an Err and returning early: the ? operator (Go could use something like this, IMO).
  • Asynchronicity: Async/await hasn’t quite landed for Rust yet, but the final syntax has recently been agreed upon. Rust also has some basic threading features in the standard library that seem a bit easier to use than Python’s, but I haven’t spent much time with it. Go still seems to have the best offerings here.
  • Toolingrustup and cargo are extremely polished implementations of a language version manager and package/module manager, respectively. Everything “just works.” I especially love the autogenerated docs. The Python options for these are somewhat organic and finicky, and as I mentioned before, Go has a strange way of managing modules, though aside from that, its tooling is in a much better state than Python’s.
  • Editor Plugins: My .vimrc is embarrassingly large, with at least three dozen plugins. I have some plugins for linting, autocompleting, and formatting both Python and Go, but the Rust plugins were easier to set up, more helpful, and more consistent compared to the other two languages. The rust.vim and vim-lsp plugins (along with the Rust Language Server) were all I needed to get an extremely powerful configuration. I haven’t tested out other editors with Rust but with the excellent editor-agnostic tooling that Rust comes with, I’d expect them to be just as helpful. The setup provides the best go-to-definition I’ve ever used. It works perfectly on local, standard library, and third-party code out of the box.
  • Debugging: I haven’t tried out a debugger with Rust yet (since the type system andprintln! take you pretty far), but you can use rust-gdb and rust-lldb, wrappers around the gdb and lldb debuggers that are installed with the initial rustup. The experience should be predictable if you’ve used those debuggers before with C. As mentioned previously, the compiler error messages are extremely helpful.

Rust Summary

I definitely wouldn’t recommend attempting to write Rust without at least going through the first few chapters of the book, even if you’re already familiar with C and memory management. With Go and Python, as long as you have some experience with another modern imperative programming language, they’re not difficult to just start writing, referring to the docs when necessary. Rust is a large language. Python also has a lot of features, but they’re mostly opt-in. You can get a lot done just by understanding a few primitive data structures and some builtin functions. With Rust, you really need to understand the complexity inherent to the type system and borrow checker, or you’re going to be getting tangled up a lot.

As far as how I feel when I write Rust, it’s a lot of fun, like Python. Its breadth of features makes it very expressive. While the compiler stops you a lot, it’s also very helpful, and its suggestions on how to solve your borrowing/typing problems usually work. The tooling as I’ve mentioned is the best I’ve encountered for any language and doesn’t bring me a lot of headaches like some other languages I’ve used. I really like using the language and will continue to look for opportunities to do so, where the performance of Python isn’t good enough.

Code Samples

I’ve extracted the chunks of each diffimg which calculate the difference ratio. To summarize how it works for Python, this takes the diff image generated by Pillow, sums the values of all channels of all pixels, and returns the ratio produced by dividing the maximum possible value (a pure white image of the same size) by this sum.

Python:


diff_img = ImageChops.difference(im1, im2)
stat = ImageStat.Stat(diff_img)
sum_channel_values = sum(stat.mean)
max_all_channels = len(stat.mean) * 255
diff_ratio = sum_channel_values / max_all_channels

For Go and Rust, the method is a little different: Instead of creating a diff image, we just loop over both input images and keep a running sum of the differences of each pixel. In Go, we’re indexing into each image by coordinate…

Go:


func GetRatio(im1, im2 image.Image, ignoreAlpha bool) float64 {
  var sum uint64
  width, height := getWidthAndHeight(im1)
  for y := 0; y < height; y++ {
    for x := 0; x < width; x++ {
      sum += uint64(sumPixelDiff(im1, im2, x, y, ignoreAlpha))
    }
  }
  var numChannels = 4
  if ignoreAlpha {
    numChannels = 3
  }
  totalPixVals := (height * width) * (maxChannelVal * numChannels)
  return float64(sum) / float64(totalPixVals)
}

… but in Rust, we’re treating the images as what they really are in memory, a series of bytes that we can just zip together and consume.

Rust:


pub fn calculate_diff(
    image1: DynamicImage,
    image2: DynamicImage
  ) -> f64 {
  let max_val = u64::pow(2, 8) - 1;
  let mut diffsum: u64 = 0;
  for (&p1, &p2) in image1
      .raw_pixels()
      .iter()
      .zip(image2.raw_pixels().iter()) {
    diffsum += u64::from(abs_diff(p1, p2));
  }
  let total_possible = max_val * image1.raw_pixels().len() as u64;
  let ratio = diffsum as f64 / total_possible as f64;

  ratio
}

Some things to take note of in these examples:

  • Python has the least code by far. Obviously, it’s leaning heavily on features of the image library it’s using, but this is indicative of the general experience of using Python. In many cases, a lot of the work has been done for you because the ecosystem is so developed that there are mature pre-existing solutions for everything.
  • There’s type conversion in the Go and Rust examples. In each block there are three numerical types being used: uint8/u8 for the pixel channel values (the type is inferred in both Go and Rust, so you don’t see any explicit mention of either type),uint64/u64 for the sum, and float64/f64 for the final ratio. For Go and Rust, there was time spent getting the types to line up, whereas Python converts everything implicitly.
  • The Go implementation’s style is very imperative, but also explicit and understandable (minus the ignoreAlpha part I mentioned earlier), even to those unaccustomed to the language. The Python example is fairly clear as well, once you understand what ImageStat is doing. Rust is definitely murkier to those unfamiliar with the language:
    • .raw_pixels() gets the image as a vector of unsigned 8-bit integers.
    • .iter() creates an iterator for that vector. Vectors by default are not iterable.
    • .zip() you may be familiar with, it takes two iterators and produces one, with each element being a tuple: (element from first vector, element from second vector).
    • We need a mut in our diffsum declaration because by default, variables are immutable.
    • If you’re familiar with C you can probably figure out why we have the &s in for (&p1, &p2): The iterator produces references to the pixel values, but abs_diff() takes the values themselves. Go supports pointers (which are not quite the same as references), but they’re not as commonly used as references are in Rust.
    • The last statement in a function is used as the return value if there isn’t a line-ending ;. A few other functional languages do this as well.

    This snippet gives you some insight into how much language-specific knowledge you’ll need to pick up to be effective in Rust.

Performance

Now for something resembling a scientific comparison. I first generated three random images of different sizes: 1×1, 2000×2000, and 10,000×10,000. Then I measured each (language, image size) combination’s performance 10 times for each diffimg ratio calculation and averaged them, using the values given by the real values from the timecommand. diffimg-rs was built using --releasediffimg-go with just go build, and the Python diffimg invoked with python3 -m diffimg. The results, on a 2015 Macbook Pro:

Image size: 1×1 2000×2000 10,000×10,000
Rust 0.001s 0.490s 5.871s
Go 0.002s (2x) 0.756s (1.54x) 14.060s (2.39x)
Python 0.095s (95x) 1.419s (2.90x) 28.751s (4.89x)

I’m losing a lot of precision because time only goes down to 10ms resolution (one more digit is shown here because of the averaging). The task only requires a very specific type of calculation as well, so a different or more complex one could have very different numbers. Despite these caveats, we can still learn something from the data.

With the 1×1 image, virtually all the time is spent in setup, not ratio calculation. Rust wins, despite using two third-party libraries (clap and image) and Go only using the standard library. I’m not surprised Python’s startup is as slow as it is, since importing a large library (Pillow) is one of its steps, and even just time python -c '' takes 0.030s.

At 2000×2000, the gap narrows for both Go and Python compared to Rust, presumably because less of the overall time is spent in setup compared to calculation. However, at 10,000×10,000, Rust is more performant in comparison, which I would guess is due to its compiler’s optimizations producing the smallest block of machine code that is looped through 100,000,000 times, dwarfing the setup time. Never needing to pause for garbage collection could also be a factor.

The Python implementation definitely has room for improvement, because as efficient as Pillow is, we’re still creating a diff image in memory (traversing both input images) and then adding up each of its pixel’s channel values. A more direct approach like the Go and Rust implementations would probably be marginally faster. However, a pure Python implementation would be wildly slower, since Pillow does its main work in C. Because the other two are pure language implementations, this isn’t really a fair comparison, though in some ways it is, because Python has an absurd amount of libraries available to you that are performant thanks to C extensions (and Python and C have a very tight relationship in general).

I should also mention the binary sizes: Rust’s is 2.1mb with the --release build, and Go’s is comparable at 2.5mb. Python doesn’t create binaries, but .pyc files are sort ofcomparable, and diffimg’s .pyc files are about 3kb in total. Its source code is also only about 3kb, but including the Pillow dependency, it weighs in at 24mb(!). Again, not a fair comparison because I’m using a third party imaging library, but it should be mentioned.

The Takeaway

Obviously, these are three very different languages fulfilling different niches. I’ve heard Go and Rust often mentioned together, but I think Go and Python are the two more similar/competing languages. They’re both good for writing server-side application logic (what I spend most of my time doing at work). Comparing just native code performance, Go blows Python away, but many of Python’s libraries that require speed are wrappers around fast C implementations – in practice, it’s more complicated than a naive comparison. Writing a C extension for Python doesn’t really count as Python anymore (and then you’ll need to know C), but the option is open to you.

For your backend server needs, Python has proven itself to be “fast enough” for most applications, though if you need more performance, Go has it. Rust even more so, but you pay for it with development time. Go is not far off from Python in this regard, though it certainly is slower to develop, primarily due to its small feature set. Rust is very fully featured, but managing memory will always take more time than having the language do it, and this outweighs having to deal with Go’s minimality.

It should also be mentioned that there are many, many Python developers in the world, some with literally decades of experience. It will likely never be hard to find more people with language experience to add to your backend team if you choose Python. However, Go developers are not particularly rare, and can easily be created because the language is so easy to learn. Rust developers are both rarer and harder to make since the language takes longer to internalize.

With respect to the type systems: static type systems make it easier to write more correct code, but it’s not a panacea. You still need to write comprehensive tests no matter the language you use. It requires a bit more discipline, but I’ve found that the code I write in Python is not necessarily more error prone than Go as long as I’m able to write a good suite of tests. That said, I much prefer Rust’s type system to Go’s: it supports generics, pattern matching, handles errors, and just does more for you in general.

In the end, this comparison is a bit silly, because though the use cases of these languages overlap, they occupy very different niches. Python is high on the development-speed, low on the performance scale, while Rust is the opposite, and Go is in the middle. I enjoy writing Python and Rust more than Go (this may be unsurprising), though I’ll continue to use Go at work happily (along with Python) since it really is a great language for building stable and maintainable applications with many contributors from many backgrounds. Its inflexibility and minimalism which makes it less enjoyable to use (for me) becomes its strength here. If I had to choose the language for the backend of a new web application, it would be Go.

I’m pretty satisfied with the range of programming tasks that are covered by these three languages – there’s virtually no project that one of them wouldn’t be a great choice for.

Top 10 Python Libraries You Must Know in 2019

In this article, we will discuss some of the top libraries in Python that can be used by developers to prase, clean, and represent data and implement machine learning in their existing applications.

We will be considering the following 10 libraries:

  • TensorFlow
  • Scikit-Learn
  • Numpy
  • Keras
  • PyTorch
  • LightGBM
  • Eli5
  • SciPy
  • Theano
  • Pandas

Image title

Introduction

Python is one of the most popular and widely used programming languages and has replaced many programming languages in the industry.

There are many reasons why Python is popular among developers. However, one of the most significant is its large collection of libraries that users can work with.

The simplicity of Python has attracted many developers to create new libraries for machine learning. Because of the huge collection of libraries, Python is becoming hugely popular among machine learning experts.

So, the first library is TensorFlow.

TensorFlow

Top 10 Python Libraries - Edureka

What Is TensorFlow?

If you are currently working on a machine learning project in Python, then you may have heard about this popular open-source library known as TensorFlow.

This library was developed by Google in collaboration with the Brain Team. TensorFlow is used in almost every Google application for machine learning.

TensorFlow works like a computational library for writing new algorithms that involve a large number of tensor operations. Since neural networks can be easily expressed as computational graphs, they can be implemented using TensorFlow as a series of operations on Tensors. Plus, tensors are N-dimensional matrices that represent your data.

Features of TensorFlow

TensorFlow is optimized for speed, and it makes use of techniques like XLA for quick linear algebra operations.

1. Responsive Construct

With TensorFlow, we can easily visualize each and every part of the graph, which is not an option while using Numpy or SciKit.

2. Flexible

One of the very important Tensorflow Features is that it is flexible in its operability, meaning it has modularity, and for the parts of it that you want to make stand alone, it offers you that option.

3. Easily Trainable

It is easily trainable on CPU as well as GPU for distributed computing.

4. Parallel Neural Network Training

TensorFlow offers pipelining, in the sense that you can train multiple neural networks and multiple GPUs, which makes the models very efficient on large-scale systems.

5. Large Community

Needless to say, if it has been developed by Google, there is already a large team of software engineers who work on stability improvements continuously.

6. Open Source

The best thing about this machine learning library is that it is open source, so anyone can use it as long as they have internet connectivity.

Where Is TensorFlow Used?

You are using TensorFlow daily but indirectly with applications like Google Voice Search or Google Photos. These applications are developed using this library.

All the libraries created in TensorFlow are written in C and C++. However, it has a complicated frontend for Python. Your Python code will get compiled and then executed on TensorFlow distributed execution engine built using C and C++.

The number of applications of TensorFlow is literally unlimited, and that is the beauty of TensorFlow.

Scikit-Learn

Top 10 Python Libraries - Edureka

What Is Scikit-learn?

It is a Python library is associated with NumPy and SciPy. It is considered one of the best libraries for working with complex data.

There are a lot of changes being made in this library. One modification is the cross-validation feature, providing the ability to use more than one metric. Lots of training methods like logistics regression and nearest neighbors have received some little improvements.

Features Of Scikit-Learn

1. Cross-validation: There are various methods to check the accuracy of supervised models on unseen data.

2.Unsupervised learning algorithms: Again, there is a large spread of algorithms in the offering — starting from clustering, factor analysis, and principal component analysis to unsupervised neural networks.

3. Feature extraction: Useful for extracting features from images and text (e.g. Bag of words

Where Is Scikit-Learn Used?

It contains a numerous number of algorithms for implementing standard machine learning and data mining tasks like reducing dimensionality, classification, regression, clustering, and model selection.

Numpy

Top 10 Python Libraries - Edureka

What Is Numpy?

Numpy is considered one of the most popular machine learning libraries in Python.

TensorFlow and other libraries use Numpy internally for performing multiple operations on Tensors. Array interface is the best and the most important feature of Numpy.

Features Of Numpy

  1. Interactive: Numpy is very interactive and easy to use
  2. Mathematics: Makes complex mathematical implementations very simple
  3. Intuitive: Makes coding real easy and grasping the concepts is easy
  4. Lots of Interaction: Widely used, hence a lot of open source contribution

Where Is Numpy Used?

This interface can be utilized for expressing images, sound waves, and other binary raw streams as an array of real numbers in N-dimensional.

For implementing this library for machine learning, having knowledge of Numpy is important for full-stack developers.

Keras

Top 10 Python Libraries - Edureka

What Is Keras?

Keras is considered one of the coolest machine learning libraries in Python. It provides an easier mechanism to express neural networks. Keras also provides some of the best utilities for compiling models, processing data-sets, visualization of graphs, and much more.

In the backend, Keras uses either Theano or TensorFlow internally. Some of the most popular neural networks like CNTK can also be used. Keras is comparatively slow when we compare it with other machine learning libraries because it creates a computational graph by using back-end infrastructure and then makes use of it to perform operations. All the models in Keras are portable.

Features Of Keras

  • It runs smoothly on both CPU and GPU.
  • Keras supports almost all the models of a neural network — fully connected, convolutional, pooling, recurrent, embedding, etc. Furthermore, these models can be combined to build more complex models.
  • Keras, being modular in nature, is incredibly expressive, flexible, and apt for innovative research.
  • Keras is a completely Python-based framework, which makes it easy to debug and explore.

Where Is Keras Used?

You are already constantly interacting with features built with Keras — it is in use at Netflix, Uber, Yelp, Instacart, Zocdoc, Square, and many others. It is especially popular among startups that place deep learning at the core of their products.

Keras contains numerous implementations of commonly used neural network building blocks such as layers, objectives, activation functions, optimizers and a host of tools to make working with image and text data easier.

Plus, it provides many pre-processed data-sets and pre-trained models like MNIST, VGG, Inception, SqueezeNet, ResNet, etc.

Keras is also a favorite among deep learning researchers, coming in at #2. Keras has also been adopted by researchers at large scientific organizations, in particular, CERN and NASA.

PyTorch

Top 10 Python Libraries - Edureka

What Is PyTorch?

PyTorch is the largest machine learning library that allows developers to perform tensor computations with the acceleration of GPU, creates dynamic computational graphs, and calculate gradients automatically. Other than this, PyTorch offers rich APIs for solving application issues related to neural networks.

This machine learning library is based on Torch, which is an open-source machine library implemented in C with a wrapper in Lua.

This machine library, in Python, was introduced in 2017, and since its inception, the library is gaining popularity and attracting an increasing number of machine learning developers.

Features Of PyTorch

Hybrid Front-End

A new hybrid frontend provides ease-of-use and flexibility in eager mode, while seamlessly transitioning to graph mode for speed, optimization, and functionality in C++ runtime environments.

Distributed Training

Optimize performance in both research and production by taking advantage of native support for asynchronous execution of collective operations and peer-to-peer communication that is accessible from Python and C++.

Python First

PyTorch is not a Python binding into a monolithic C++ framework. It’s built to be deeply integrated into Python so it can be used with popular libraries and packages such as Cython and Numba.

Libraries and Tools

An active community of researchers and developers have built a rich ecosystem of tools and libraries for extending PyTorch and supporting development in areas from computer vision to reinforcement learning.

Where Is PyTorch Used?

PyTorch is primarily used for applications such as natural language processing.

It is primarily developed by Facebook’s artificial-intelligence research group and Uber’s “Pyro” software for probabilistic programming is built on it.

PyTorch is outperforming TensorFlow in multiple ways and it is gaining a lot of attention in recent days.

LightGBM

Top 10 Python Libraries - Edureka

What Is LightGBM?

Gradient Boosting is one of the best and most popular machine learning(ML) library, which helps developers in building new algorithms by using redefined elementary models and namely decision trees. Therefore, there are special libraries that are designed for fast and efficient implementation of this method.

These libraries are LightGBM, XGBoost, and CatBoost. All these libraries are competitors that help in solving a common problem and can be utilized in almost a similar manner.

Features of LightGBM

Very fast computation ensures high production efficiency.

Intuitive, hence makes it user-friendly.

Faster training than many other deep learning libraries.

Will not produce errors when you consider NaN values and other canonical values.

Where Is LightGBM Used?

This library provides highly scalable, optimized, and fast implementations of gradient boosting, which makes it popular among machine learning developers. Because most of the machine learning full-stack developers won machine learning competitions by using these algorithms.

Eli5

Top 10 Python Libraries - Edureka

What Is Eli5?

Most often, the results of machine learning model predictions are not accurate, and Eli5 machine learning library built-in Python helps in overcoming this challenge. It is a combination of visualization and debugs all the machine learning models and tracks all working steps of an algorithm.

Features of Eli5

Moreover, Eli5 supports other libraries XGBoost, lightning, scikit-learn, and sklearn-crfsuite libraries. All the above-mentioned libraries can be used to perform different tasks using each one of them.

Where Is Eli5 Used?

  • Mathematical applications that require a lot of computation in a short time.
  • Eli5 plays a vital role where there are dependencies with other Python packages.
  • Legacy applications and implementing newer methodologies in various fields.

SciPy

Top 10 Python Libraries - Edureka

What Is SciPy?

SciPy is a machine learning library for application developers and engineers. However, you still need to know the difference between SciPy library and SciPy stack. SciPy library contains modules for optimization, linear algebra, integration, and statistics.

Features Of SciPy

The main feature of the SciPy library is that it is developed using NumPy, and its array makes the most use of NumPy.

In addition, SciPy provides all the efficient numerical routines like optimization, numerical integration, and many others using its specific submodules.

All the functions in all submodules of SciPy are well documented.

Where Is SciPy Used?

SciPy is a library that uses NumPy for the purpose of solving mathematical functions. SciPy uses NumPy arrays as the basic data structure and comes with modules for various commonly used tasks in scientific programming.

Tasks including linear algebra, integration (calculus), ordinary differential equation solving and signal processing are handled easily by SciPy.

Theano

Top 10 Python Libraries - Edureka

What Is Theano?

Theano is a computational framework machine learning library in Python for computing multidimensional arrays. Theano works similar to TensorFlow, but it not as efficient as TensorFlow. Because of its inability to fit into production environments.

Moreover, Theano can also be used on a distributed or parallel environments just similar to TensorFlow.

Features Of Theano

  • Tight integration with NumPy – Ability to use completely NumPy arrays in Theano-compiled functions.
  • Transparent use of a GPU – Perform data-intensive computations much faster than on a CPU.
  • Efficient symbolic differentiation – Theano does your derivatives for functions with one or many inputs.
  • Speed and stability optimizations – Get the right answer for log(1+x) even when x is very tiny. This is just one of the examples to show the stability of Theano.
  • Dynamic C code generation – Evaluate expressions faster than ever before, thereby increasing efficiency by a lot.
  • Extensive unit-testing and self-verification – Detect and diagnose multiple types of errors and ambiguities in the model.

Where Is Theano Used?

The actual syntax of Theano expressions is symbolic, which can be off-putting to beginners used to normal software development. Specifically, an expression is defined in the abstract sense, compiled, and later actually used to make calculations.

It was specifically designed to handle the types of computation required for large neural network algorithms used in Deep Learning. It was one of the first libraries of its kind (development started in 2007) and is considered an industry standard for Deep Learning research and development.

Theano is being used in multiple neural network projects today, and the popularity of Theano is only growing with time.

Pandas

Top 10 Python Libraries - Edureka

What Is Pandas?

Pandas is a machine learning library in Python that provides data structures of high-level and a wide variety of tools for analysis. One of the great features of this library is the ability to translate complex operations with data using one or two commands. Pandas has so many inbuilt methods for grouping, combining data, filtering, as well as time-series functionality.

All these are followed by outstanding speed indicators.

Features Of Pandas

Pandas makes sure that the entire process of manipulating data will be easier. Support for operations such as Re-indexing, Iteration, Sorting, Aggregations, Concatenations, and Visualizations are among the feature highlights of Pandas.

Where Is Pandas Used?

Currently, there are fewer releases of the Pandas library, which includes hundreds of new features, bug fixes, enhancements, and changes in API. The improvements in Pandas are its ability to group and sort data, select the best-suited output for the applied method, and provide support for performing custom types operations.

Data Analysis, among everything else, takes the highlight when it comes to using Pandas. But when used with other libraries and tools, Pandas ensures high functionality and a good amount of flexibility.

That’s it, folks! I hope this article helped you kickstart your learning the libraries available in Python.

18.4 systemd-journald.service 簡介

過去只有rsyslogd 的年代中,由於rsyslogd 必須要開機完成並且執行了rsyslogd 這個daemon 之後,登錄文件才會開始記錄。所以,核心還得要自己產生一個klogd 的服務, 才能將系統在開機過程、啟動服務的過程中的信息記錄下來,然後等rsyslogd 啟動後才傳送給它來處理~

現在有了systemd 之後,由於這玩意兒是核心喚醒的,然後又是第一支執行的軟件,它可以主動調用systemd-journald 來協助記載登錄文件~ 因此在開機過程中的所有信息,包括啟動服務與服務若啟動失敗的情況等等,都可以直接被記錄到systemd-journald 裡頭去!

不過systemd-journald 由於是使用於內存的登錄文件記錄方式,因此重新開機過後,開機前的登錄文件信息當然就不會被記載了。為此,我們還是建議啟動rsyslogd 來協助分類記錄!也就是說, systemd-journald 用來管理與查詢這次開機後的登錄信息,而rsyslogd 可以用來記錄以前及現在的所以數據到磁盤文件中,方便未來進行查詢喔!

 

Tips雖然systemd-journald所記錄的數據其實是在內存中,但是系統還是利用文件的型態將它記錄到/run/log/下面!不過我們從前面幾章也知道, /run在CentOS 7其實是內存內的數據,所以重新開機過後,這個/run/log下面的數據當然就被刷新,舊的當然就不再存在了!

18.4.1 使用journalctl 觀察登錄信息

那麼systemd-journald.service 的數據要如何叫出來查閱呢?很簡單!就通過journalctl 即可!讓我們來瞧瞧這個指令可以做些什麼事?

[root@study ~]# journalctl [-nrpf] [--since TIME] [--until TIME] _optional
选项与参数:
默认会秀出全部的 log 内容,从旧的输出到最新的讯息
-n  :秀出最近的几行的意思~找最新的信息相当有用
-r  :反向输出,从最新的输出到最旧的数据
-p  :秀出后面所接的讯息重要性排序!请参考前一小节的 rsyslogd 信息
-f  :类似 tail -f 的功能,持续显示 journal 日志的内容(实时监测时相当有帮助!)
--since --until:设置开始与结束的时间,让在该期间的数据输出而已
_SYSTEMD_UNIT=unit.service :只输出 unit.service 的信息而已
_COMM=bash :只输出与 bash 有关的信息
_PID=pid   :只输出 PID 号码的信息
_UID=uid   :只输出 UID 为 uid 的信息
SYSLOG_FACILITY=[0-23] :使用 syslog.h 规范的服务相对序号来调用出正确的数据!

范例一:秀出目前系统中所有的 journal 日志数据
[root@study ~]# journalctl
-- Logs begin at Mon 2015-08-17 18:37:52 CST, end at Wed 2015-08-19 00:01:01 CST. --
Aug 17 18:37:52 study.centos.vbird systemd-journal[105]: Runtime journal is using 8.0M (max 
 142.4M, leaving 213.6M of free 1.3G, current limit 142.4M).
Aug 17 18:37:52 study.centos.vbird systemd-journal[105]: Runtime journal is using 8.0M (max
 142.4M, leaving 213.6M of free 1.3G, current limit 142.4M).
Aug 17 18:37:52 study.centos.vbird kernel: Initializing cgroup subsys cpuset
Aug 17 18:37:52 study.centos.vbird kernel: Initializing cgroup subsys cpu
.....(中间省略).....
Aug 19 00:01:01 study.centos.vbird run-parts(/etc/cron.hourly)[19268]: finished 0anacron
Aug 19 00:01:01 study.centos.vbird run-parts(/etc/cron.hourly)[19270]: starting 0yum-hourly.cron
Aug 19 00:01:01 study.centos.vbird run-parts(/etc/cron.hourly)[19274]: finished 0yum-hourly.cron
# 从这次开机以来的所有数据都会显示出来!通过 less 一页页翻动给管理员查阅!数据量相当大!

范例二:(1)仅显示出 2015/08/18 整天以及(2)仅今天及(3)仅昨天的日志数据内容
[root@study ~]# journalctl --since "2015-08-18 00:00:00" --until "2015-08-19 00:00:00"
[root@study ~]# journalctl --since today
[root@study ~]# journalctl --since yesterday --until today

范例三:只找出 crond.service 的数据,同时只列出最新的 10 笔即可
[root@study ~]# journalctl _SYSTEMD_UNIT=crond.service -n 10

范例四:找出 su, login 执行的登录文件,同时只列出最新的 10 笔即可
[root@study ~]# journalctl _COMM=su _COMM=login -n 10

范例五:找出讯息严重等级为错误 (error) 的讯息!
[root@study ~]# journalctl -p err

范例六:找出跟登录服务 (auth, authpriv) 有关的登录文件讯息
[root@study ~]# journalctl SYSLOG_FACILITY=4 SYSLOG_FACILITY=10
# 更多关于 syslog_facility 的数据,请参考 18.2.1 小节的内容啰!

基本上,有journalctl 就真的可以搞定你的訊息數據囉!全部的數據都在這裡面耶~再來假設一下,你想要了解到登錄文件的實時變化, 那又該如何處置呢?現在,請開兩個終端機,讓我們來處理處理!

# 第一号终端机,请使用下面的方式持续侦测系统!
[root@study ~]# journalctl -f
# 这时系统会好像卡住~其实不是卡住啦!是类似 tail -f 在持续的显示登录文件信息的!

# 第二号终端机,使用下面的方式随便发一封 email 给系统上的帐号!
[root@study ~]# echo "testing" &#124; mail -s 'tset' dmtsai
# 这时,你会发现到第一号终端机竟然一直输出一些讯息吧!没错!这就对了!

如果你有一些必須要偵測的行為,可以使用這種方式來實時了解到系統出現的訊息~而取消journalctl -f 的方法,就是[crtl]+c 啊!

18.4.2 logger 指令的應用

上面談到的是叫出登錄文件給我們查閱,那換個角度想,“如果你想要讓你的數據儲存到登錄文件當中”呢?那該如何是好?這時就得要使用logger 這個好用的傢伙了!這個傢伙可以傳輸很多信息,不過,我們只使用最簡單的本機信息傳遞~ 更多的用法就請您自行man logger 囉!

[root@study ~]# logger [-p 服务名称.等级] "讯息"
选项与参数:
服务名称.等级 :这个项目请参考 rsyslogd 的本章后续小节的介绍;

范例一:指定一下,让 dmtsai 使用 logger 来传送数据到登录文件内
[root@study ~]# logger -p user.info "I will check logger command"
[root@study ~]# journalctl SYSLOG_FACILITY=1 -n 3
-- Logs begin at Mon 2015-08-17 18:37:52 CST, end at Wed 2015-08-19 18:03:17 CST. --
Aug 19 18:01:01 study.centos.vbird run-parts(/etc/cron.hourly)[29710]: starting 0yum-hourly.cron
Aug 19 18:01:01 study.centos.vbird run-parts(/etc/cron.hourly)[29714]: finished 0yum-hourly.cron
Aug 19 18:03:17 study.centos.vbird dmtsai[29753]: I will check logger command

現在,讓我們來瞧一瞧,如果我們之前寫的backup.service 服務中,如果使用手動的方式來備份,亦即是使用”/backups/backup.sh log” 來執行備份時, 那麼就通過logger來記錄備份的開始與結束的時間!該如何是好呢?這樣作看看!

[root@study ~]# vim /backups/backup.sh
#!/bin/bash

if [ "${1}" == "log" ]; then
        logger -p syslog.info "backup.sh is starting"
fi
source="/etc /home /root /var/lib /var/spool/{cron,at,mail}"
target="/backups/backup-system-$(date +%Y-%m-%d).tar.gz"
[ ! -d /backups ] && mkdir /backups
tar -zcvf ${target} ${source} &&gt; /backups/backup.log
if [ "${1}" == "log" ]; then
        logger -p syslog.info "backup.sh is finished"
fi

[root@study ~]# /backups/backup.sh log
[root@study ~]# journalctl SYSLOG_FACILITY=5 -n 3
Aug 19 18:09:37 study.centos.vbird dmtsai[29850]: backup.sh is starting
Aug 19 18:09:54 study.centos.vbird dmtsai[29855]: backup.sh is finished

通過這個玩意兒,我們也能夠將數據自行處置到登錄文件當中囉!

18.4.3 保存journal 的方式

再強調一次,這個systemd-journald.servicd 的訊息是不會放到下一次開機後的,所以,重新開機後,那之前的記錄通通會遺失。雖然我們大概都有啟動rsyslogd 這個服務來進行後續的登錄文件放置,不過如果你比較喜歡journalctl 的存取方式,那麼可以將這些數據儲存下來喔!

基本上,systemd-journald.service 的配置文件主要參考/etc/systemd/journald.conf 的內容,詳細的參數你可以參考man 5 journald.conf 的數據。因為默認的情況下面,配置文件的內容應該已經符合我們的需求,所以這邊鳥哥就不再修改配置文件了。只是如果想要保存你的journalctl 所讀取的登錄文件, 那麼就得要創建一個/var/log/journal 的目錄,並且處理一下該目錄的權限,那麼未來重新啟動systemd-journald.service 之後, 日誌登錄文件就會主動的複制一份到/var/log/journal 目錄下囉!

# 1\. 先处理所需要的目录与相关权限设置
[root@study ~]# mkdir /var/log/journal
[root@study ~]# chown root:systemd-journal /var/log/journal
[root@study ~]# chmod 2775 /var/log/journal

# 2\. 重新启动 systemd-journald 并且观察备份的日志数据!
[root@study ~]# systemctl restart systemd-journald.service
[root@study ~]# ll /var/log/journal/
drwxr-sr-x. 2 root systemd-journal 27 Aug 20 02:37 309eb890d09f440681f596543d95ec7a

你得要注意的是,因為現在整個日誌登錄文件的容量會持續長大,因此你最好還是觀察一下你係統能用的總容量喔!避免不小心文件系統的容量被灌爆!此外,未來在/run/log 下面就沒有相關的日誌可以觀察了!因為移動到/var/log/journal 下面來囉!

其實鳥哥是這樣想的,既然我們還有rsyslog.service 以及logrotate 的存在,因此這個systemd-journald.service 產生的登錄文件, 個人建議最好還是放置到/run/log 的內存當中,以加快存取的速度!而既然rsyslog.service 可以存放我們的登錄文件, 似乎也沒有必要再保存一份journal 登錄文件到系統當中就是了。單純的建議!如何處理,依照您的需求即可喔!

Go 語言很好很強大,但我有幾個問題想吐槽

Go 是一門非常不錯的編程語言。然而,我在公司的Slack 編程頻道中對Go 的抱怨卻越來越多(猜到我是做啥了的吧?),因此我認為有必要把這些吐槽寫下來並放在這裡,這樣當人們問我抱怨什麼時,我給他們一個鏈接就行了。

image

先聲明一下,在過去的一年裡,我大量地使用Go語言開發命令行應用程序、scclc和API。其中既有供客戶端調用的大規模API,也有即將在https://searchcode.com/使用的語法高亮顯示器

我這些批評全部是針對Go 語言的。但是,我對使用過的每種語言都有不滿。我非常贊同下面的話:

“世界上只有兩種語言:人們抱怨的語言和沒人使用的語言。” —— Bjarne Stroustrup

1 不支持函數式編程

我並不是一個函數式編程狂熱者。說到Lisp 語言,我首先想到的是語言障礙。

這可能是Go 語言最大的痛點了。與大部分人不同,我不希望Go 支持泛型,因為它會為多數Go 項目帶來不必要的複雜性。我希望Go 語言支持適用於內置切片和Map 的函數式方法。切片和Map 具有通用性,並且可以容納任何類型,從這個意義上講,它們已經非常神奇。在Go 語言中只有利用接口才能實現類似效果,但這樣一來將喪失安全性和速度。

例如,請考慮下面的問題。

給定兩個字符串切片,找出二者都包含的字符串,並將其放入新的切片以備後用。

複製代碼

existsBoth := []string{}
for _, first := range firstSlice {
for _, second := range secondSlice {
if first == second {
existsBoth = append(existsBoth, proxy)
break
}
}
}

上面是一個用Go 語言實現的簡單方案。當然還有其它方法,比如借助Map 來減少運行時間。這裡我們假設內存足夠用或者切片都不太大,同時假設優化運行時間帶來的複雜性遠超收益,因此不值得優化。作為對比,使用Java 流和函數式編程把相同的邏輯重寫如下:

複製代碼

var existsBoth = firstList.stream()
.filter(x -> secondList.contains(x))
.collect(Collectors.toList());

上面的代碼隱藏了算法的複雜性,但是,你更容易理解它實際做的事情。

與Go 代碼相比,Java 代碼的意圖一目了然。真正靈活之處在於,添加更多的過濾條件易如反掌。如果使用Go 語言添加下面例子中的過濾條件,我們需要在嵌套的for 循環中再添加兩個if 條件。

複製代碼

var existsBoth = firstList.stream()
.filter(x -> secondList.contains(x))
.filter(x -> x.startsWith(needle))
.filter(x -> x.length() >= 5)
.collect(Collectors.toList());

有些借助go generate 命令的項目可以幫你實現上面的一些功能。但是,如果缺少良好的IDE 支持,抽取循環中的語句作為單獨的方法是一件低效又麻煩的事情。

2 通道/ 並行切片處理

Go 通道通常都很好用。但它並不能提供無限的並發能力。它確實存在一些會導致永久阻塞的問題,但這些問題用競爭檢測器能很容易地解決。對於數量不確定或不知何時結束的流式數據,以及非CPU 密集型的數據處理方法,Go 通道都是很好的選擇。

Go 通道不太適合併行處理大小已知的切片。

多線程編程、理論和實踐

image

幾乎在其它任何語言中,當列表或切片很大時,為了充分利用所有CPU 內核,通常都會使用並行流、並行Linq、Rayon、多處理或其它語法來遍歷列表。遍歷後的返回值是一個包含已處理元素的列表。如果元素足夠多,或者處理元素的函數足夠複雜,多核系統會更高效。

但是在Go 語言中,實現高效處理所需要做的事情卻並不顯而易見。

一種可能的解決方案是為切片中的每個元素都創建一個Go 例程。由於Go 例程的開銷很低,因此從某種程度上來說這是一個有效的策略。

複製代碼

toProcess := []int{1,2,3,4,5,6,7,8,9}
var wg sync.WaitGroup
for i, _ := range toProcess {
wg.Add(1)
go func(j int) {
toProcess[j] = someSlowCalculation(toProcess[j])
wg.Done()
}(i)
}
wg.Wait()
fmt.Println(toProcess)

上面的代碼會保持切片中元素的順序,但我們假設不必保持元素順序。

這段代碼的第一個問題是增加了一個WaitGroup,並且必須要記得調用它的Add 和Done 方法。這增加了開發人員的工作量。如果弄錯了,這個程序不會產生正確的輸出,結果是要么輸出不確定,要么程序永不結束。此外,如果列表很長,你會為每個列表創建一個Go 例程。正如我之前所說,這不是問題,因為Go 能輕鬆搞定。問題在於,每個Go 例程都會爭搶CPU 時間片。因此,這不是執行該任務的最有效方式。

你可能希望為每個CPU內核創建一個Go例程,並讓這些例程選取列表並處理。創建Go例程的開銷很小,但是在一個非常緊湊的循環中創建它們會使開銷陡增。當我開發scc時就遇到了這種情況,因此我採用了每個CPU內核對應一個Go例程的策略。在Go語言中,要這樣做的話,你首先要創建一個通道,然後遍歷切片中的元素,使函數從該通道讀取數據,之後從另一個通道讀取。我們來看一下。

複製代碼

toProcess := []int{1,2,3,4,5,6,7,8,9}
var input = make(chan int, len(toProcess))
for i, _ := range toProcess {
input <- i
}
close(input)
var wg sync.WaitGroup
for i := 0; i < runtime.NumCPU(); i++ {
wg.Add(1)
go func(input chan int, output []int) {
for j := range input {
toProcess[j] = someSlowCalculation(toProcess[j])
}
wg.Done()
}(input, toProcess)
}
wg.Wait()
fmt.Println(toProcess)

上面的代碼創建了一個通道,然後遍歷切片,將索引值放入通道。接下來我們為每個CPU 內核創建一個Go 例程,操作系統會報告並處理相應的輸入,然後等待,直到所有操作完成。這裡有很多代碼需要理解。

然而,這種實現有待商榷。如果切片非常大,通道的緩衝區長度和切片大小相同,你可能不希望創建一個有這麼大緩衝區的通道。因此,你應該創建另一個Go 例程來遍歷切片,並將切片中的值放入通道,完成後關閉通道。但這樣一來代碼會變得冗長,因此我把它去掉了。我希望可以大概地闡明基本思路。

使用Java 語言大致這樣實現:

複製代碼

var firstList = List.of(1,2,3,4,5,6,7,8,9);
firstList = firstList.parallelStream()
.map(this::someSlowCalculation)
.collect(Collectors.toList());

通道和流並不等價。使用隊列去仿寫Go 代碼的邏輯更好一些,因為它們更具有可比性,但我們的目的不是進行1 對1 的比較。我們的目標是充分利用所有的CPU 內核處理切片或列表。

如果someSlowCalucation 方法調用了網絡或其它非CPU 密集型任務,這當然不是問題。在這種情況下,通道和Go 例程都會表現得很好。

這個問題與問題#1 有關。如果Go 語言支持適用於切片/Map 對象的函數式方法,那麼就能實現這個功能。但是,如果Go 語言支持泛型,有人就可以把上面的功能封裝成像Rust 的Rayon 一樣的庫,讓每個人都從中受益,這就很令人討厭了(我不希望Go 支持泛型)。

順便說一下,我認為這個缺陷妨礙了Go 語言在數據科學領域的成功,這也是為什麼Python 仍然是數據科學領域的王者。Go 語言在數值操作方面缺乏表現力和能力,原因就是以上討論的這些。

3 垃圾回收器

Go 的垃圾回收器做得非常不錯。我開發的應用程序通常都會因為新版本的改進而變得更快。但是,它以低延遲為最高優先級。對於API 和UI 應用來說,這個選擇完全可以接受。對於包含網絡調用的應用,因為網絡調用往往會是瓶頸,所以它也沒問題。

我發現的問題是Go對UI應用來講一點也不好(我不知道它有任何良好的支持)。如果你想要盡可能高的吞吐量,那這個選擇會讓你很受傷。這是我開發scc時遇到的一個主要問題。scc是一個CPU密集型的命令行工具。為了解決這個問題,我不得不在代碼裡添加邏輯關閉GC,直到達到某個閾值。但是我又不能簡單的禁用它,因為有些任務會很快耗盡內存。

缺乏對GC 的控制時常令人沮喪。你得學會適應它,但是,有時候如果能做到這樣該有多好:“嘿,這些代碼確實需要盡可能快地運行,所以如果你能在高吞吐模式運行一會,那就太好了。”

image

我認為這種情況在Go 1.12 版本中有所改善,因為GC 得到了進一步的改進。但僅僅是關閉和打開GC 還不夠,我期望更多的控制。如果有時間我會再進行研究。

4 錯誤處理

我並不是唯一一個抱怨這個問題的人,但我不吐不快。

複製代碼

value, err := someFunc()
if err != nil {
// Do something here
}
err = someOtherFunc(value)
if err != nil {
// Do something here
}

上面的代碼很乏味。Go 甚至不會像有些人建議的那樣強制你處理錯誤。你可以使用“_”顯式忽略它(這是否算作對它進行了處理呢?),你還可以完全忽略它。比如上面的代碼可以重寫為:

複製代碼

value, _ := someFunc()
someOtherFunc(value)

很顯然,我顯式忽略了someFunc 方法的返回。someOtherFunc(value)方法也可能返回錯誤值,但我完全忽略了它。這裡的錯誤都沒有得到處理。

說實話,我不知道如何解決這個問題。我喜歡Rust中的“?”運算符,它可以幫助避免這種情況。V-Lang https://vlang.io/看起來也可能有一些有趣的解決方案。

另一個辦法是使用可選類型(Optional types)並去掉nil,但這不會發生在Go 語言裡,即使是Go 2.0 版本,因為它會破壞向後兼容性。

結語

Go 仍然是一種非常不錯的語言。如果你讓我寫一個API,或者完成某個需要大量磁盤/ 網絡調用的任務,它依然是我的首選。現在我會用Go 而非Python 去完成很多一次性任務,數據合併任務是例外,因為函數式編程的缺失使執行效率難以達到要求。

與Java 不同,Go 語言盡量遵循“最小驚喜“原則。比如可以這樣比較字兩個符串是否相等:stringA == stringB。但如果你這樣比較兩個切片,那麼會產生編譯錯誤。這些都是很好的特性。

的確,二進製文件還可以變的更小(一些編譯標誌和upx可以解決這個問題),我希望它在某些方面變得更快,GOPATH雖然不是很好,但也沒有人們想得那麼糟糕,默認的單元測試框架缺少很多功能,模擬(mocking)有點讓人痛苦…

它仍然是我使用過的效率較高的語言之一。我會繼續使用它,雖然我希望https://vlang.io/能最終發布,並解決我的很多抱怨。V語言或Go 2.0,Nim或Rust。現在有很多很酷的新語言可以使用,我們開發人員真的要被寵壞了。

查看英文原文:

https://boyter.org/posts/my-personal-complaints-about-golang/

IDEA 2019.02.07注册码

N757JE0KCT-eyJsaWNlbnNlSWQiOiJONzU3SkUwS0NUIiwibGljZW5zZWVOYW1lIjoid3UgYW5qdW4iLCJhc3NpZ25lZU5hbWUiOiIiLCJhc3NpZ25lZUVtYWlsIjoiIiwibGljZW5zZVJlc3RyaWN0aW9uIjoiRm9yIGVkdWNhdGlvbmFsIHVzZSBvbmx5IiwiY2hlY2tDb25jdXJyZW50VXNlIjpmYWxzZSwicHJvZHVjdHMiOlt7ImNvZGUiOiJJSSIsInBhaWRVcFRvIjoiMjAyMC0wMS0wNyJ9LHsiY29kZSI6IkFDIiwicGFpZFVwVG8iOiIyMDIwLTAxLTA3In0seyJjb2RlIjoiRFBOIiwicGFpZFVwVG8iOiIyMDIwLTAxLTA3In0seyJjb2RlIjoiUFMiLCJwYWlkVXBUbyI6IjIwMjAtMDEtMDcifSx7ImNvZGUiOiJHTyIsInBhaWRVcFRvIjoiMjAyMC0wMS0wNyJ9LHsiY29kZSI6IkRNIiwicGFpZFVwVG8iOiIyMDIwLTAxLTA3In0seyJjb2RlIjoiQ0wiLCJwYWlkVXBUbyI6IjIwMjAtMDEtMDcifSx7ImNvZGUiOiJSUzAiLCJwYWlkVXBUbyI6IjIwMjAtMDEtMDcifSx7ImNvZGUiOiJSQyIsInBhaWRVcFRvIjoiMjAyMC0wMS0wNyJ9LHsiY29kZSI6IlJEIiwicGFpZFVwVG8iOiIyMDIwLTAxLTA3In0seyJjb2RlIjoiUEMiLCJwYWlkVXBUbyI6IjIwMjAtMDEtMDcifSx7ImNvZGUiOiJSTSIsInBhaWRVcFRvIjoiMjAyMC0wMS0wNyJ9LHsiY29kZSI6IldTIiwicGFpZFVwVG8iOiIyMDIwLTAxLTA3In0seyJjb2RlIjoiREIiLCJwYWlkVXBUbyI6IjIwMjAtMDEtMDcifSx7ImNvZGUiOiJEQyIsInBhaWRVcFRvIjoiMjAyMC0wMS0wNyJ9LHsiY29kZSI6IlJTVSIsInBhaWRVcFRvIjoiMjAyMC0wMS0wNyJ9XSwiaGFzaCI6IjExNTE5OTc4LzAiLCJncmFjZVBlcmlvZERheXMiOjAsImF1dG9Qcm9sb25nYXRlZCI6ZmFsc2UsImlzQXV0b1Byb2xvbmdhdGVkIjpmYWxzZX0=-AE3x5sRpDellY4SmQVy2Pfc2IT7y1JjZFmDA5JtOv4K5gwVdJOLw5YGiOskZTuGu6JhOi50nnd0WaaNZIuVVVx3T5MlXrAuO3kb2qPtLtQ6/n3lp4fIv+6384D4ciEyRWijG7NA9exQx39Tjk7/xqaGk7ooKgq5yquIfIA+r4jlbW8j9gas1qy3uTGUuZQiPB4lv3P5OIpZzIoWXnFwWhy7s//mjOWRZdf/Du3RP518tMk74wizbTeDn84qxbM+giNAn+ovKQRMYHtLyxntBiP5ByzfAA9Baa5TUGW5wDiZrxFuvBAWTbLrRI0Kd7Nb/tB9n1V9uluB2WWIm7iMxDg==-MIIElTCCAn2gAwIBAgIBCTANBgkqhkiG9w0BAQsFADAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBMB4XDTE4MTEwMTEyMjk0NloXDTIwMTEwMjEyMjk0NlowaDELMAkGA1UEBhMCQ1oxDjAMBgNVBAgMBU51c2xlMQ8wDQYDVQQHDAZQcmFndWUxGTAXBgNVBAoMEEpldEJyYWlucyBzLnIuby4xHTAbBgNVBAMMFHByb2QzeS1mcm9tLTIwMTgxMTAxMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxcQkq+zdxlR2mmRYBPzGbUNdMN6OaXiXzxIWtMEkrJMO/5oUfQJbLLuMSMK0QHFmaI37WShyxZcfRCidwXjot4zmNBKnlyHodDij/78TmVqFl8nOeD5+07B8VEaIu7c3E1N+e1doC6wht4I4+IEmtsPAdoaj5WCQVQbrI8KeT8M9VcBIWX7fD0fhexfg3ZRt0xqwMcXGNp3DdJHiO0rCdU+Itv7EmtnSVq9jBG1usMSFvMowR25mju2JcPFp1+I4ZI+FqgR8gyG8oiNDyNEoAbsR3lOpI7grUYSvkB/xVy/VoklPCK2h0f0GJxFjnye8NT1PAywoyl7RmiAVRE/EKwIDAQABo4GZMIGWMAkGA1UdEwQCMAAwHQYDVR0OBBYEFGEpG9oZGcfLMGNBkY7SgHiMGgTcMEgGA1UdIwRBMD+AFKOetkhnQhI2Qb1t4Lm0oFKLl/GzoRykGjAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBggkA0myxg7KDeeEwEwYDVR0lBAwwCgYIKwYBBQUHAwEwCwYDVR0PBAQDAgWgMA0GCSqGSIb3DQEBCwUAA4ICAQAF8uc+YJOHHwOFcPzmbjcxNDuGoOUIP+2h1R75Lecswb7ru2LWWSUMtXVKQzChLNPn/72W0k+oI056tgiwuG7M49LXp4zQVlQnFmWU1wwGvVhq5R63Rpjx1zjGUhcXgayu7+9zMUW596Lbomsg8qVve6euqsrFicYkIIuUu4zYPndJwfe0YkS5nY72SHnNdbPhEnN8wcB2Kz+OIG0lih3yz5EqFhld03bGp222ZQCIghCTVL6QBNadGsiN/lWLl4JdR3lJkZzlpFdiHijoVRdWeSWqM4y0t23c92HXKrgppoSV18XMxrWVdoSM3nuMHwxGhFyde05OdDtLpCv+jlWf5REAHHA201pAU6bJSZINyHDUTB+Beo28rRXSwSh3OUIvYwKNVeoBY+KwOJ7WnuTCUq1meE6GkKc4D/cXmgpOyW/1SmBz3XjVIi/zprZ0zf3qH5mkphtg6ksjKgKjmx1cXfZAAX6wcDBNaCL+Ortep1Dh8xDUbqbBVNBL4jbiL3i3xsfNiyJgaZ5sX7i8tmStEpLbPwvHcByuf59qJhV/bZOl8KqJBETCDJcY6O2aqhTUy+9x93ThKs1GKrRPePrWPluud7ttlgtRveit/pcBrnQcXOl1rHq7ByB8CFAxNotRUYL9IF5n3wJOgkPojMy6jetQA5Ogc8Sm7RG6vg1yow==

本博客Nginx 配置之安全篇

之前有細心的朋友問我,為什麼你的博客副標題是「專注WEB 端開發」,是不是少了「前端」的「前」。我想說的是,儘管我從畢業到現在七年左右的時間一直都在專業前端團隊從事前端相關工作,但這並不意味著我的知識體係就必須局限於前端這個範疇內。現在比較流行「全棧工程師」的概念,我覺得全棧意味著一個項目中,各個崗位所需要的技能你都具備,但並不一定意味著你什麼都需要做。你需要做什麼,更多是由能力、人員配比以及成本等各個因素所決定。儘管我現在的工作職責是在WEB 前端領域,但是我的關注點在整個WEB 端。

我接觸過的有些前端朋友,從一開始就把自己局限在一個很小的範圍之中,這在大公司到也無所謂,大公司分工明確,基礎設施齊全,你只要做好自己擅長的那部分就可以了。但是當他們進入創業公司之後,會發現一下子來了好多之前完全沒有接觸過的東西,十分被動。

去年我用Lua + OpenResty替換了線上千萬級的PHP + Nginx服務,至今穩定運行,算是前端之外的一點嘗試。我一直認為學習任何知識很重要的一點是實踐,所以我一直都在折騰我的VPS,進行各種WEB安全、優化相關的嘗試。我打算從安全和性能兩方面介紹一下本博客所用Nginx的相關配置,今天先寫安全相關的。

隱藏不必要的信息

大家可以看一下我的博客請求響應頭,有這麼一行server: nginx,說明我用的是Nginx服務器,但並沒有具體的版本號。由於某些Nginx漏洞只存在於特定的版本,隱藏版本號可以提高安全性。這只需要在配置裡加上這個就可以了:

server_tokens   off;

如果想要更徹底隱藏所用Web Server,可以修改Nginx源碼,把Server Name改掉再編譯,具體步驟可以自己搜索。需要提醒的是:如果你的網站支持SPDY,只改動網上那些文章寫到的地方還不夠,跟SPDY有關的代碼也要改。更簡單的做法是改用Tengine這個Nginx的增強版,並指定server_tag為off或者任何想要的值就可以了。另外,既然想要徹底隱藏Nginx,404、500等各種出錯頁也需要自定義。

同樣,一些WEB語言或框架默認輸出的x-powered-by也會洩露網站信息,他們一般都提供了修改或移除的方法,可以自行查看手冊。如果部署上用到了Nginx的反向代理,也可以通過proxy_hide_header指令隱藏它:

proxy_hide_header        X-Powered-By;

禁用非必要的方法

由於我的博客只處理了GET、POST 兩種請求方法,而HTTP/1 協議還規定了TRACE 這樣的方法用於網絡診斷,這也可能會暴露一些信息。所以我針對GET、POST 以及HEAD 之外的請求,直接返回了444 狀態碼(444 是Nginx 定義的響應狀態碼,會立即斷開連接,沒有響應正文)。具體配置是這樣的:

NGINXif ($request_method !~ ^(GET|HEAD|POST)$ ) {
    return    444;
}

合理配置響應頭

我的博客是由自己用ThinkJS 寫的Node 程序提供服務,Nginx 通過proxy_pass 把請求反向代理給Node 綁定的IP 和端口。在最終輸出時,我給響應增加了以下頭部:

NGINXadd_header  Strict-Transport-Security  "max-age=31536000";
add_header  X-Frame-Options  deny;
add_header  X-Content-Type-Options  nosniff;
add_header  Content-Security-Policy  "default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval' https://a.disquscdn.com; img-src 'self' data: https://www.google-analytics.com; style-src 'self' 'unsafe-inline'; frame-src https://disqus.com";

Strict-Transport-Security(簡稱為HSTS)可以告訴瀏覽器,在指定的max-age內,始終通過HTTPS訪問我的博客。即使用戶自己輸入HTTP的地址,或者點擊了HTTP鏈接,瀏覽器也會在本地替換為HTTPS再發送請求。另外由於我的證書不支持多域名,我沒有加上includeSubDomains。關於HSTS更多信息,可以查看我之前的介紹

X-Frame-Options用來指定此網頁是否允許被iframe嵌套,deny就是不允許任何嵌套發生。關於這個響應頭的更多介紹可以看這裡

X-Content-Type-Options用來指定瀏覽器對未指定或錯誤指定Content-Type資源真正類型的猜測行為,nosniff表示不允許任何猜測。這部分內容更多介紹見這裡

Content-Security-Policy(簡稱為CSP)用來指定頁面可以加載哪些資源,主要目的是減少XSS的發生。我允許了來自本站、disquscdn的外鏈JS,還允許內聯JS,以及在JS中使用eval;允許來自本站和google統計的圖片,以及內聯圖片(Data URI形式);允許本站外鏈CSS以及內聯CSS;允許iframe加載來自disqus的頁面。對於其他未指定的資源,都會走默認規則self,也就是只允許加載本站的。關於CSP的詳細介紹請看這裡

之前的博客中,我還介紹過X-XSS-Protection這個響應頭,也可以用來防範XSS。不過由於有了CSP,所以我沒配置它。

需要注意的是,以上這些響應頭現代瀏覽器才支持,所以並不是說加上他們,網站就可以不管XSS,萬事大吉了。但是鑑於低廉的成本,還是都配上。

HTTPS 安全配置

啟用HTTPS 並正確配置了證書,意味著數據傳輸過程中無法被第三者解密或修改。有了HTTPS,也得合理配置好Web Server,才能發揮最大價值。我的博客關於HTTPS 這一塊有以下配置:

NGINXssl_certificate      /home/jerry/ssl/server.crt;
ssl_certificate_key  /home/jerry/ssl/server.key;
ssl_dhparam          /home/jerry/ssl/dhparams.pem;

ssl_ciphers          ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:AES128-GCM-SHA256:AES256-GCM-SHA384:DES-CBC3-SHA;

ssl_prefer_server_ciphers  on;

ssl_protocols        TLSv1 TLSv1.1 TLSv1.2;

最終效果是我的博客在ssllabs的測試中達到了A+,如下圖:

ssllabs test

如何配置ssl_ciphers可以參考這個網站。需要注意的是,這個網站默認提供的加密方式安全性較高,一些低版本客戶端並不支持,例如IE9-、Android2.2-和Java6-。如果需要支持這些老舊的客戶端,需要點一下網站上的「Yes, give me a ciphersuite that works with legacy / old software」鏈接。

另外,我在ssl_ciphers最開始加上了CHACHA20,這是因為我的Nginx支持了CHACHA20_POLY1305加密算法,這是由Google開發的新一代加密方式,它有兩方面優勢:更好的安全性和更好的性能(尤其是在移動和可穿戴設備上)。下面有一張移動平台上它與AES-GCM的加密速度對比圖(via):

chacha20 poly1305

啟用CHACHA20_POLY1305最簡單的方法是在編譯Nginx時,使用LibreSSL代替OpenSSL。下面是用Chrome訪問我的博客時,點擊地址欄小鎖顯示的信息,可以看到加密方式使用的就是CHACHA20_POLY1305:

imququ.com

關於CHACHA20_POLY1305安全性和性能的詳細介紹可以查看本文

補充:使用CHACHA20_POLY1305的最佳實踐是「僅針對不支持AES-NI的終端使用CHACHA20算法,否則使用AES-GCM」。關於這個話題的詳細解釋和配置方法,請參考我的這篇文章:使用BoringSSL優化HTTPS加密算法選擇

關於ssl_dhparam的配置,可以參考這篇文章:Guide to Deploying Diffie-Hellman for TLS

SSLv3已被證實不安全,所以在ssl_protocols指令中,我並沒有包含它。

ssl_prefer_server_ciphers配置為on,可以確保在TLSv1握手時,使用服務端的配置項,以增強安全性。

好了,本文先就這樣,後面再寫跟性能有關的配置。

一天学会PostgreSQL应用开发与管理 – 1 如何搭建一套学习、开发PostgreSQL的环境

背景

万事开头难,搭建好一套学习、开发PostgreSQL的环境,是重中之重。

因为其他平台(Ubuntu, CentOS, MAC)的用户大多数都具备了自行安装数据库的能力,在这里我只写一个面向Windows用户的学习环境搭建文档。

分为三个部分,用户可以自由选择。

如果你想深入的学习PostgreSQL,建议搭建PostgreSQL on Linux的环境。如果你只是想将数据库使用在日常的应用开发工作中,有也不需要PG的其他附加插件的功能,那么你可以选择PostgreSQL on Win的环境搭建。

如果你不想搭建本地的PostgreSQL,那么你可以使用云数据库服务,比如阿里云RDS for PostgreSQL。

本章大纲

一、PostgreSQL on Win环境搭建

1 环境要求

2 下载PostgreSQL安装包

3 解压PostgreSQL安装包

4 下载pgadmin安装包(可选)

5 安装pgadmin(可选)

6 规划数据文件目录

7 初始化数据库集群

8 配置postgresql.conf

9 配置pg_hba.conf(可选)

10 启动、停止数据库集群

11 如何自动启动数据库集群

12 使用psql 命令行连接数据库

13 新增用户

14 使用psql帮助

15 使用psql语法补齐

16 使用psql sql语法帮助

17 查看当前配置

18 设置会话参数

19 在psql中切换到另一个用户或数据库

20 使用pgadmin4连接数据库

21 文档

二、PostgreSQL on Linux(虚拟机)环境搭建

1 环境要求

2 下载Linux镜像

3 安装VMware Workstation(试用版本)

4 安装securecrt(试用版本)

5 安装Linux虚拟机

6 配置Linux虚拟机网络

7 securecrt终端连接Linux

8 配置linux

9 配置yum仓库(可选)

10 创建普通用户

11 规划数据库存储目录

12 下载PostgreSQL源码

13 安装PostgreSQL

14 配置linux用户环境变量

15 初始化数据库集群

16 配置数据库

17 启动数据库集群

18 连接数据库

19 安装pgadmin(可选)

20 配置pgadmin(可选)

21 使用pgadmin连接数据库(可选)

三、云数据库RDS for PostgreSQL

1 购买云数据库

2 设置并记住RDS for PostgreSQL数据库根用户名和密码

3 配置网络

4 配置白名单

5 本地安装pgadmin(可选)

6 本地配置pgadmin(可选)

7 使用pgadmin连接RDS PostgreSQL数据库(可选)

一、PostgreSQL on Win环境搭建

1 环境要求

Win 7 x64, 8GB以上内存, 4核以上, SSD硬盘(推荐),100GB以上剩余空间, 可以访问公网(10MB/s以上网络带宽)

2 下载PostgreSQL安装包

https://www.postgresql.org/download/windows/

建议下载高级安装包,不需要安装,直接使用。

下载win x64的版本(建议下载最新版本)

http://www.enterprisedb.com/products/pgbindownload.do

例如

https://get.enterprisedb.com/postgresql/postgresql-9.6.2-3-windows-x64-binaries.zip

3 解压PostgreSQL安装包

postgresql-9.6.2-3-windows-x64-binaries.zip

例如解压到d:\pgsql

pic

bin: 二进制文件

doc: 文档

include: 头文件

lib: 动态库

pgAdmin 4: 图形化管理工具

share: 扩展库

StackBuilder: 打包库

symbols: 符号表

4 下载pgadmin安装包(可选)

如果PostgreSQL包中没有包含pgAdmin,建议自行下载一个

建议下载pgadmin4(pgadmin3不再维护)

https://www.pgadmin.org/index.php

https://www.postgresql.org/ftp/pgadmin3/pgadmin4/v1.3/windows/

5 安装pgadmin(可选)

6 规划数据文件目录

例如将D盘的pgdata作为数据库目录。

新建d:\pgdata空目录。

7 初始化数据库集群

以管理员身份打开cmd.exe

pic

>d:  
  
>cd pgsql  
  
>cd bin  
  
>initdb.exe -D d:\pgdata -E UTF8 --locale=C -U postgres  
  
初始化时,指定数据库文件目录,字符集,本地化,数据库超级用户名  

pic

pic

8 配置postgresql.conf

数据库配置文件名字postgresql.conf,这个文件在数据文件目录D:\pgdata中。

将以下内容追加到postgresql.conf文件末尾

listen_addresses = '0.0.0.0'  
port = 1921  
max_connections = 200  
tcp_keepalives_idle = 60  
tcp_keepalives_interval = 10  
tcp_keepalives_count = 6  
shared_buffers = 512MB  
maintenance_work_mem = 64MB  
dynamic_shared_memory_type = windows  
vacuum_cost_delay = 0  
bgwriter_delay = 10ms  
bgwriter_lru_maxpages = 1000  
bgwriter_lru_multiplier = 5.0  
bgwriter_flush_after = 0  
old_snapshot_threshold = -1  
wal_level = minimal  
synchronous_commit = off  
full_page_writes = on  
wal_buffers = 64MB  
wal_writer_delay = 10ms  
wal_writer_flush_after = 4MB  
checkpoint_timeout = 35min  
max_wal_size = 2GB  
min_wal_size = 80MB  
checkpoint_completion_target = 0.1  
checkpoint_flush_after = 0  
random_page_cost = 1.5  
log_destination = 'csvlog'  
logging_collector = on  
log_directory = 'pg_log'  
log_truncate_on_rotation = on  
log_checkpoints = on  
log_connections = on  
log_disconnections = on  
log_error_verbosity = verbose  
log_temp_files = 8192  
log_timezone = 'Asia/Hong_Kong'  
autovacuum = on  
log_autovacuum_min_duration = 0  
autovacuum_naptime = 20s  
autovacuum_vacuum_scale_factor = 0.05  
autovacuum_freeze_max_age = 1500000000  
autovacuum_multixact_freeze_max_age = 1600000000  
autovacuum_vacuum_cost_delay = 0  
vacuum_freeze_table_age = 1400000000  
vacuum_multixact_freeze_table_age = 1500000000  
datestyle = 'iso, mdy'  
timezone = 'Asia/Hong_Kong'  
lc_messages = 'C'  
lc_monetary = 'C'  
lc_numeric = 'C'  
lc_time = 'C'  
default_text_search_config = 'pg_catalog.english'  

9 配置pg_hba.conf(可选)

数据库防火墙文件名字pg_hba.conf,这个文件在数据文件目录D:\pgdata中。

将以下内容追加到文件末尾,表示允许网络用户使用用户密码连接你的postgresql数据库.

host all all 0.0.0.0/0 md5  

10 启动、停止数据库集群

使用命令行启动数据库集群

>d:  
  
>cd pgsql  
  
>cd bin  
  
D:\pgsql\bin>pg_ctl.exe start -D d:\pgdata  
正在启动服务器进程  
  
D:\pgsql\bin>LOG:  00000: redirecting log output to logging collector process  
HINT:  Future log output will appear in directory "pg_log".  
LOCATION:  SysLogger_Start, syslogger.c:622  

使用命令行停止数据库集群

D:\pgsql\bin>pg_ctl.exe stop -m fast -D "d:\pgdata"
等待服务器进程关闭 .... 完成
服务器进程已经关闭

11 如何自动启动数据库集群

配置windows自动启动服务.

12 使用psql 命令行连接数据库

psql -h IP地址 -p 端口 -U 用户名 数据库名

D:\pgsql\bin>psql -h 127.0.0.1 -p 1921 -U postgres postgres  
psql (9.6.2)  
输入 "help" 来获取帮助信息.  
  
postgres=# \dt  

13 新增用户

新建用户属于数据库操作,先使用psql和超级用户postgres连接到数据库。

新增一个普通用户

postgres=# create role digoal login encrypted password 'pwd_digoal';  
CREATE ROLE  

新增一个超级用户

postgres=# create role dba_digoal login superuser encrypted password 'dba_pwd_digoal';  
CREATE ROLE  

新增一个流复制用户

postgres=# create role digoal_rep replication login encrypted password 'pwd';  
CREATE ROLE  

你还可以将一个用户在不同角色之间切换

例如将digoal设置为超级用户

postgres=# alter role digoal superuser;  
ALTER ROLE  

查看已有用户

postgres=# \du+  
                                 角色列表  
  角色名称  |                    属性                    | 成员属于 | 描述  
------------+--------------------------------------------+----------+------  
 dba_digoal | 超级用户                                   | {}       |  
 digoal     | 超级用户                                   | {}       |  
 digoal_rep | 复制                                       | {}       |  
 postgres   | 超级用户, 建立角色, 建立 DB, 复制, 绕过RLS | {}       |  

14 使用psql帮助

psql有很多快捷的命令,使用\?就可以查看。

postgres=# \?  
一般性  
  \copyright            显示PostgreSQL的使用和发行许可条款  
  \errverbose            以最冗长的形式显示最近的错误消息  
  \g [文件] or;     执行查询 (并把结果写入文件或 |管道)  
  \gexec                 执行策略,然后执行其结果中的每个值  
  \gset [PREFIX]     执行查询并把结果存到psql变量中  
  \q             退出 psql  
  \crosstabview [COLUMNS] 执行查询并且以交叉表显示结果  
  \watch [SEC]          每隔SEC秒执行一次查询  
  
帮助  
  \? [commands]          显示反斜线命令的帮助  
  
  ......  
  

15 使用psql语法补齐

如果你编译PostgreSQL使用了补齐选项,那么在psql中按TAB键,可以自动补齐命令。

16 使用psql sql语法帮助

如果你忘记了某个SQL的语法,使用\h 命令即可打印命令的帮助

例如

postgres=# \h create table  
命令:       CREATE TABLE  
描述:       建立新的数据表  
语法:  
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI  
STS ] 表名 ( [  
  { 列名称 数据_类型 [ COLLATE 校对规则 ] [ 列约束 [ ... ] ]  
    | 表约束  
    | LIKE 源表 [ like选项 ... ] }  
    [, ... ]  
] )  
  
......  

17 查看当前配置

show 参数名

postgres=# show client_encoding;  
 client_encoding  
-----------------  
 GBK  
(1 行记录)  

查看pg_settings

postgres=# select * from pg_settings;  

18 设置会话参数

set 参数名=值;

postgres=# set client_encoding='sql_ascii';  
SET  

19 在psql中切换到另一个用户或数据库

\c 切换到其他用户或数据库

postgres=# \c template1 digoal  
您现在已经连接到数据库 "template1",用户 "digoal".  

20 使用pgadmin4连接数据库

pgAdmin4被安装在这个目录

d:\pgsql\pgAdmin 4\bin  

双击pgAdmin4.exe打开pgadmin4(有点耗时,自动启动HTTPD服务)

点击server,右键,创建server.

配置server别名,连接数据库的 IP,端口,用户,密码,数据库名

pic

21 文档

PostgreSQL的安装包中包含了pgadmin, PostgreSQL的文档,找到对应的doc目录,打开index.html。

二、PostgreSQL on Linux(虚拟机)环境搭建

1 环境要求

Win 7 x64, 8GB以上内存, 4核以上, SSD硬盘(推荐),100GB以上剩余空间, 可以访问公网(10MB/s以上网络带宽)

2 下载Linux镜像

http://isoredirect.centos.org/centos/6/isos/x86_64/

http://mirrors.163.com/centos/6.9/isos/x86_64/CentOS-6.9-x86_64-minimal.iso

3 安装VMware Workstation(试用版本)

http://www.vmware.com/cn/products/workstation/workstation-evaluation.html

4 安装securecrt(试用版本)

securecrt可以用来连接Linux终端,方便使用

https://www.vandyke.com/products/securecrt/windows.html

5 安装Linux虚拟机

打开vmware, 创建虚拟机, 选择CentOS 6 x64版本.

1. 配置建议:

4G内存,40G磁盘,2核以上,NAT网络模式。

2. 安装建议:

minimal最小化安装。

3. root密码:

记住你设置的root密码。

4. Linux安装配置建议

配置主机名,配置网络(根据你的vmware NAT网络进行配置),关闭selinux,关闭防火墙或开放ssh端口(测试环境)。

6 配置Linux虚拟机网络

vmware窗口连接linux

例子,192.168.150 请参考你的vmware NAT网络修改一下。

配置网关

vi /etc/sysconfig/network  
  
NETWORKING=yes  
HOSTNAME=digoal01  
GATEWAY=192.168.150.2  

配置IP

cat /etc/sysconfig/network-scripts/ifcfg-eth0   
  
DEVICE=eth0  
TYPE=Ethernet  
UUID=d28f566a-b0b9-4bde-95e7-20488af19eb6  
ONBOOT=yes  
NM_CONTROLLED=yes  
BOOTPROTO=static  
HWADDR=00:0C:29:5D:6D:9C  
IPADDR=192.168.150.133  
PREFIX=24  
GATEWAY=192.168.150.2  
DNS1=192.168.150.2  
DEFROUTE=yes  
IPV4_FAILURE_FATAL=yes  
IPV6INIT=no  
NAME="System eth0"  

配置DNS

cat /etc/resolv.conf  
  
nameserver 192.168.150.2  

重启网络服务

service network restart  

7 securecrt终端连接Linux

添加一个session,连接到Linux虚拟机。

pic

8 配置linux

1. /etc/sysctl.conf

vi /etc/sysctl.conf  
  
追加到文件末尾  
  
kernel.shmall = 4294967296  
kernel.shmmax=135497418752  
kernel.shmmni = 4096  
kernel.sem = 50100 64128000 50100 1280  
fs.file-max = 7672460  
fs.aio-max-nr = 1048576  
net.ipv4.ip_local_port_range = 9000 65000  
net.core.rmem_default = 262144  
net.core.rmem_max = 4194304  
net.core.wmem_default = 262144  
net.core.wmem_max = 4194304  
net.ipv4.tcp_max_syn_backlog = 4096  
net.core.netdev_max_backlog = 10000  
net.ipv4.netfilter.ip_conntrack_max = 655360  
net.ipv4.tcp_timestamps = 0  
net.ipv4.tcp_tw_recycle=1  
net.ipv4.tcp_timestamps=1  
net.ipv4.tcp_keepalive_time = 72   
net.ipv4.tcp_keepalive_probes = 9   
net.ipv4.tcp_keepalive_intvl = 7  
vm.zone_reclaim_mode=0  
vm.dirty_background_bytes = 40960000  
vm.dirty_ratio = 80  
vm.dirty_expire_centisecs = 6000  
vm.dirty_writeback_centisecs = 50  
vm.swappiness=0  
vm.overcommit_memory = 0  
vm.overcommit_ratio = 90  

生效

sysctl -p  

2. /etc/security/limits.conf

vi /etc/security/limits.conf   
  
* soft    nofile  131072  
* hard    nofile  131072  
* soft    nproc   131072  
* hard    nproc   131072  
* soft    core    unlimited  
* hard    core    unlimited  
* soft    memlock 500000000  
* hard    memlock 500000000  

3. /etc/security/limits.d/*

rm -f /etc/security/limits.d/*  

4. 关闭selinux

# vi /etc/sysconfig/selinux   
  
SELINUX=disabled  
SELINUXTYPE=targeted  

5. 配置OS防火墙
(建议按业务场景设置,我这里先清掉)

iptables -F  

配置范例

# 私有网段  
-A INPUT -s 192.168.0.0/16 -j ACCEPT  
-A INPUT -s 10.0.0.0/8 -j ACCEPT  
-A INPUT -s 172.16.0.0/16 -j ACCEPT  

重启linux。

reboot  

9 配置yum仓库(可选)

在linux虚拟机中,找一个有足够空间的分区,下载ISO镜像

wget http://mirrors.163.com/centos/6.9/isos/x86_64/CentOS-6.9-x86_64-bin-DVD1.iso  
  
wget http://mirrors.163.com/centos/6.9/isos/x86_64/CentOS-6.9-x86_64-bin-DVD2.iso  

新建ISO挂载点目录

mkdir /mnt/cdrom1  
mkdir /mnt/cdrom2  

挂载ISO

mount -o loop,defaults,ro /u01/CentOS-6.8-x86_64-bin-DVD1.iso /mnt/cdrom1  
mount -o loop,defaults,ro /u01/CentOS-6.8-x86_64-bin-DVD2.iso /mnt/cdrom2  

备份并删除原有的YUM配置文件

mkdir /tmp/yum.bak  
cd /etc/yum.repos.d/  
mv * /tmp/yum.bak/  

新增YUM配置文件

cd /etc/yum.repos.d/  
  
vi local.repo  
  
[local-yum]  
name=Local Repository  
baseurl=file:///mnt/cdrom1  
enabled=1  
gpgcheck=0  

刷新YUM缓存

yum clean all  

测试

yum list  
  
yum install createrepo   -- 方便后面测试  

修改YUM配置,修改路径为上层目录

cd /etc/yum.repos.d/  
  
vi local.repo  
  
[local-yum]  
name=Local Repository  
baseurl=file:///mnt/  
enabled=1  
gpgcheck=0  

创建YUM索引

cd /mnt/  
createrepo .  

刷新YUM缓存,测试

yum clean all  
  
yum list  
  
yum install vim  

10 创建普通用户

useradd digoal  

11 规划数据库存储目录

假设/home分区有足够的空间, /home/digoal/pgdata规划为数据文件目录

Filesystem      Size  Used Avail Use% Mounted on  
/dev/sda3        14G  5.7G  7.2G  45% /  

12 下载PostgreSQL源码

https://www.postgresql.org/ftp/source/

su - digoal  
  
wget https://ftp.postgresql.org/pub/source/v9.6.2/postgresql-9.6.2.tar.bz2  

13 安装PostgreSQL

安装依赖包

root用户下,使用yum 安装依赖包  
  
yum -y install coreutils glib2 lrzsz mpstat dstat sysstat e4fsprogs xfsprogs ntp readline-devel zlib-devel openssl-devel pam-devel libxml2-devel libxslt-devel python-devel tcl-devel gcc make smartmontools flex bison perl-devel perl-Ext  
Utils* openldap-devel jadetex  openjade bzip2  

编译安装PostgreSQL

digoal用户下,编译安装PostgreSQL  
  
tar -jxvf postgresql-9.6.2.tar.bz2  
cd postgresql-9.6.2  
./configure --prefix=/home/digoal/pgsql9.6  
make world -j 8  
make install-world  

14 配置linux用户环境变量

digoal用户下,配置环境变量

su - digoal  
vi ~/.bash_profile  
  
追加  
  
export PS1="$USER@`/bin/hostname -s`-> "  
export PGPORT=1921  
export PGDATA=/home/digoal/pgdata  
export LANG=en_US.utf8  
export PGHOME=/home/digoal/pgsql9.6  
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH  
export PATH=$PGHOME/bin:$PATH:.  
export DATE=`date +"%Y%m%d%H%M"`  
export MANPATH=$PGHOME/share/man:$MANPATH  
export PGHOST=$PGDATA  
export PGUSER=postgres  
export PGDATABASE=postgres  
alias rm='rm -i'  
alias ll='ls -lh'  
unalias vi  

重新登录digoal用户,配置生效

exit  
  
su - digoal  

15 初始化数据库集群

initdb -D $PGDATA -E UTF8 --locale=C -U postgres  

16 配置数据库

配置文件在$PGDATA目录中

1. 配置postgresql.conf

追加  
  
listen_addresses = '0.0.0.0'  
port = 1921  
max_connections = 200  
unix_socket_directories = '.'  
tcp_keepalives_idle = 60  
tcp_keepalives_interval = 10  
tcp_keepalives_count = 10  
shared_buffers = 512MB  
dynamic_shared_memory_type = posix  
vacuum_cost_delay = 0  
bgwriter_delay = 10ms  
bgwriter_lru_maxpages = 1000  
bgwriter_lru_multiplier = 10.0  
bgwriter_flush_after = 0   
old_snapshot_threshold = -1  
backend_flush_after = 0   
wal_level = minimal  
synchronous_commit = off  
full_page_writes = on  
wal_buffers = 16MB  
wal_writer_delay = 10ms  
wal_writer_flush_after = 0   
checkpoint_timeout = 30min   
max_wal_size = 2GB  
min_wal_size = 128MB  
checkpoint_completion_target = 0.05    
checkpoint_flush_after = 0    
random_page_cost = 1.3   
log_destination = 'csvlog'  
logging_collector = on  
log_truncate_on_rotation = on  
log_checkpoints = on  
log_connections = on  
log_disconnections = on  
log_error_verbosity = verbose  
autovacuum = on  
log_autovacuum_min_duration = 0  
autovacuum_naptime = 20s  
autovacuum_vacuum_scale_factor = 0.05  
autovacuum_freeze_max_age = 1500000000  
autovacuum_multixact_freeze_max_age = 1600000000  
autovacuum_vacuum_cost_delay = 0  
vacuum_freeze_table_age = 1400000000  
vacuum_multixact_freeze_table_age = 1500000000  
datestyle = 'iso, mdy'  
timezone = 'PRC'  
lc_messages = 'C'  
lc_monetary = 'C'  
lc_numeric = 'C'  
lc_time = 'C'  
default_text_search_config = 'pg_catalog.english'  
shared_preload_libraries='pg_stat_statements'  

2. 配置pg_hba.conf

追加  
  
host all all 0.0.0.0/0 md5  

17 启动数据库集群

su - digoal  
  
pg_ctl start  

18 连接数据库

su - digoal  
  
psql  
psql (9.6.2)  
Type "help" for help.  
  
postgres=#   

19 安装pgadmin(可选)

在windows 机器上,安装pgadmin

https://www.pgadmin.org/download/windows4.php

20 配置pgadmin(可选)

参考章节1

21 使用pgadmin连接数据库(可选)

参考章节1

三、云数据库RDS for PostgreSQL

1 购买云数据库

https://www.aliyun.com/product/rds/postgresql

2 设置并记住RDS for PostgreSQL数据库根用户名和密码

在RDS 控制台操作。

3 配置网络

在RDS 控制台操作,配置连接数据库的URL和端口。

4 配置白名单

在RDS 控制台操作,配置来源IP的白名单,如果来源IP为动态IP,白名单设置为0.0.0.0。

(数据库开放公网连接有风险,请谨慎设置,本文仅为测试环境。)

5 本地安装pgadmin(可选)

在windows 机器上,安装pgadmin

https://www.pgadmin.org/download/windows4.php

6 本地配置pgadmin(可选)

参考章节1

7 使用pgadmin连接RDS PostgreSQL数据库(可选)

参考章节1

mongodb 数据库操作–备份 还原 导出 导入

mongodb数据备份和还原主要分为二种,一种是针对于库的mongodump和mongorestore,一种是针对库中表的mongoexport和mongoimport。

一,mongodump备份数据库

1,常用命令格

mongodump -h IP --port 端口 -u 用户名 -p 密码 -d 数据库 -o 文件存在路径 

如果没有用户谁,可以去掉-u和-p。
如果导出本机的数据库,可以去掉-h。
如果是默认端口,可以去掉–port。
如果想导出所有数据库,可以去掉-d。

2,导出所有数据库

[root@localhost mongodb]# mongodump -h 127.0.0.1 -o /home/zhangy/mongodb/ 
connected to: 127.0.0.1 
Tue Dec 3 06:15:55.448 all dbs 
Tue Dec 3 06:15:55.449 DATABASE: test   to   /home/zhangy/mongodb/test 
Tue Dec 3 06:15:55.449   test.system.indexes to /home/zhangy/mongodb/test/system.indexes.bson 
Tue Dec 3 06:15:55.450     1 objects 
Tue Dec 3 06:15:55.450   test.posts to /home/zhangy/mongodb/test/posts.bson 
Tue Dec 3 06:15:55.480     0 objects 
 
。。。。。。。。。。。。。。。。。。。。省略。。。。。。。。。。。。。。。。。。。。。。。。。。 

3,导出指定数据库

[root@localhost mongodb]# mongodump -h 192.168.1.108 -d tank -o /home/zhangy/mongodb/ 
connected to: 192.168.1.108 
Tue Dec 3 06:11:41.618 DATABASE: tank   to   /home/zhangy/mongodb/tank 
Tue Dec 3 06:11:41.623   tank.system.indexes to /home/zhangy/mongodb/tank/system.indexes.bson 
Tue Dec 3 06:11:41.623     2 objects 
Tue Dec 3 06:11:41.623   tank.contact to /home/zhangy/mongodb/tank/contact.bson 
Tue Dec 3 06:11:41.669     2 objects 
Tue Dec 3 06:11:41.670   Metadata for tank.contact to /home/zhangy/mongodb/tank/contact.metadata.json 
Tue Dec 3 06:11:41.670   tank.users to /home/zhangy/mongodb/tank/users.bson 
Tue Dec 3 06:11:41.685     2 objects 
Tue Dec 3 06:11:41.685   Metadata for tank.users to /home/zhangy/mongodb/tank/users.metadata.json 

三,mongorestore还原数据库

1,常用命令格式

mongorestore -h IP --port 端口 -u 用户名 -p 密码 -d 数据库 --drop 文件存在路径

–drop的意思是,先删除所有的记录,然后恢复。

2,恢复所有数据库到mongodb中

[root@localhost mongodb]# mongorestore /home/zhangy/mongodb/  #这里的路径是所有库的备份路径

3,还原指定的数据库

[root@localhost mongodb]# mongorestore -d tank /home/zhangy/mongodb/tank/  #tank这个数据库的备份路径 
 
[root@localhost mongodb]# mongorestore -d tank_new /home/zhangy/mongodb/tank/  #将tank还有tank_new数据库中

这二个命令,可以实现数据库的备份与还原,文件格式是json和bson的。无法指写到表备份或者还原。

四,mongoexport导出表,或者表中部分字段

1,常用命令格式

mongoexport -h IP --port 端口 -u 用户名 -p 密码 -d 数据库 -c 表名 -f 字段 -q 条件导出 --csv -o 文件名 

上面的参数好理解,重点说一下:
-f    导出指字段,以字号分割,-f name,email,age导出name,email,age这三个字段
-q    可以根查询条件导出,-q ‘{ “uid” : “100” }’ 导出uid为100的数据
–csv 表示导出的文件格式为csv的,这个比较有用,因为大部分的关系型数据库都是支持csv,在这里有共同点

2,导出整张表

[root@localhost mongodb]# mongoexport -d tank -c users -o /home/zhangy/mongodb/tank/users.dat 
connected to: 127.0.0.1 
exported 4 records 

3,导出表中部分字段

[root@localhost mongodb]# mongoexport -d tank -c users --csv -f uid,name,sex -o tank/users.csv 
connected to: 127.0.0.1 
exported 4 records 

4,根据条件敢出数据

[root@localhost mongodb]# mongoexport -d tank -c users -q '{uid:{$gt:1}}' -o tank/users.json 
connected to: 127.0.0.1 
exported 3 records 

五,mongoimport导入表,或者表中部分字段

1,常用命令格式

1.1,还原整表导出的非csv文件
mongoimport -h IP –port 端口 -u 用户名 -p 密码 -d 数据库 -c 表名 –upsert –drop 文件名
重点说一下–upsert,其他参数上面的命令已有提到,–upsert 插入或者更新现有数据
1.2,还原部分字段的导出文件
mongoimport -h IP –port 端口 -u 用户名 -p 密码 -d 数据库 -c 表名 –upsertFields 字段 –drop 文件名
–upsertFields根–upsert一样
1.3,还原导出的csv文件
mongoimport -h IP –port 端口 -u 用户名 -p 密码 -d 数据库 -c 表名 –type 类型 –headerline –upsert –drop 文件名
上面三种情况,还可以有其他排列组合的。

2,还原导出的表数据

[root@localhost mongodb]# mongoimport -d tank -c users --upsert tank/users.dat 
connected to: 127.0.0.1 
Tue Dec 3 08:26:52.852 imported 4 objects

3,部分字段的表数据导入

[root@localhost mongodb]# mongoimport -d tank -c users  –upsertFields uid,name,sex  tank/users.dat
connected to: 127.0.0.1
Tue Dec  3 08:31:15.179 imported 4 objects

4,还原csv文件

[root@localhost mongodb]# mongoimport -d tank -c users --type csv --headerline --file tank/users.csv 
connected to: 127.0.0.1 
Tue Dec 3 08:37:21.961 imported 4 objects 

总体感觉,mongodb的备份与还原,还是挺强大的,虽然有点麻烦。

开源系统管理资源大合辑

Automation build.

自动化构建

  • Apache Ant – 用 Java 编写的自动化构建工具,与 make 类似
  • Apache Maven – 主要为 Java 开发的自动化构建工具
  • Bazel – Google 的分布式构建系统
  • GNU Make – 最流行的自动化构建系统
  • Gradle – 另一个自动化构建系统

Backup software.

备份软件

  • Amanda – C/S 模式的备份软件
  • Attic – 用 Python 编写的去重备份程序
  • Bareos – Bacula 备份程序的衍生版本
  • Backupninja – 轻量级、可扩展的元数据备份
  • Brebis – 全自动的备份检查
  • Burp – 网络备份和还原程序
  • Duplicity – 使用 rsync 算法加密的带宽-效率备份软件
  • Elkarbackup – 基于 RSnapshot 的、带有简单 Web 交互接口的备份解决方案
  • Lsyncd – 对文件进行监控,并开启一个进程来同步更改(默认是用 rsync)
  • Obnam – 一个简便、安全、基于快照、带有数据备份程序
  • Rdiff-backup – 远程增量备份工具
  • Rsnapshot – 文件系统快照辅助工具
  • Snebu – 带有多客户端去重和透明压缩的快照备份程序
  • UrBackup – 另一个 C/S 备份系统
  • DREBS – 官方策略支持的 AWS EBS 备份脚本
  • ZBackup – 一个通用去重备份工具

Build and software organization tools

构建和软件安排

  • EasyBuild – EasyBuild builds software and modulefiles for High Performance Computing (HPC) systems in an efficient way.
  • environment-modules Lmod – Lmod is a Lua based module system that easily handles the MODULEPATH Hierarchical problem.
  • HPCBIOS – HPCBIOS is an effort to setup a common, well-documented and reproducible, environment spanning across multiple HPC systems & sites, inclusive of documentation.

ChatOps

运维机器人

对话驱动的运维和管理。请看 reddit 查看更多信息
– CloudBot – Python 编写的简单、快速、可扩展的 IRC 机器人
– Eggdrop – the world’s most popular IRC bot, designed for flexibility and ease of use, and is freely distributable under the GNU GPL.
– Err – a plugin based chatbot designed to be easily deployable, extensible and maintainable.
– Hubot – 可定制的、生活改良型机器人
– Lazlo – 用 Go 编写的运维机器人自动化框架
– Lita – 你公司的聊天室的机器人同伴

Client management

客户端管理

  • OCS Inventory NG – 资产管理、部署和网络扫描
  • Opsi (开放式 PC 服务器集合) – 运行于 Debian 专为 Windows 客户端开发的客户端管理软件
  • WAPT – 全网范围的 Windows 软件安装、卸载、配置和升级
  • WPKG – Windows 程序的部署、升级和移除

Cloning

克隆软件

  • Clonezilla – 硬盘分区、硬盘镜像/克隆程序
  • Fog – 另一个计算机克隆问题解决方案

Cloud Computing

云计算

  • AppScale – 兼容 GAE 的云计算软件
  • Archipel – Manage and supervise virtual machines using Libvirt.
  • CloudStack – Cloud computing software for creating, managing, and deploying infrastructure cloud services.
  • Cobbler – Cobbler is a Linux installation server that allows for rapid setup of network installation environments.
  • Cracow Cloud One – Polish Private Cloud – The CC1 system provides a complete solution for Private Cloud Computing.
  • Eucalyptus – Private cloud software with AWS compatibility.
  • Flynn – PaaS
  • Mesos – Develop and run resource-efficient distributed systems.
  • OpenNebula – User-driven cloud management platform for sysadmins and devops.
  • Openshift – PaaS product from Red Hat.
  • OpenStack – 构建你的私有或共有云
  • The Foreman – 面向物理和虚拟服务器的全生命周期管理工具
  • Tsuru – Tsuru is an extensible Platform as a Service software.

Cloud Orchestration

云业务流程

  • BOSH – IaaS orchestration platform originally written for deploying and managing Cloud Foundry PaaS, but also useful for general purpose distributed systems.
  • Cloudify – 使用 Python 和 YAML 编写,基于 TOSCA 的云业务流程软件平台
  • CloudSlang – 面向管理开发应用程序、基于流的业务流程管理工具,支持 Docker
  • Juju – Cloud orchestration tool which manages services as charms, YAML configuration and deployment script bundles.
  • MCollective – 用于管理服务器的业务流程的 Ruby 框架,由 Puppet 实验室开发
  • Overcast – Deploy VMs across different cloud providers, and run commands and scripts across any or all of them in parallel via SSH.
  • Rundeck – 简单的业务流程工具
  • Salt – 用 Python/ZeroMQ 编写的快速、可扩展、灵活的系统管理软件
  • StackStorm – Event Driven Operations and ChatOps platform for infrastructure management. Written in Python

Cloud Storage

云存储

  • git-annex assistant – 你全部设备的同步文件夹(包括你的 OSX 、 Linux 、安卓设备 、可移动设备 、NAS 、 NAS 应用、 云服务)
  • ownCloud – 通过 Web 端、电脑以及移动设备来提供对你文件和数据的通用访问
  • Pydio – Pydio (formerly AjaXplorer) is a mature solution for file sharing and synchronization.
  • Seafile – 另一个云存储解决方案
  • SparkleShare – 提供云存储和文件同步服务,默认使用 Git 作为存储后端
  • Swift – 高可用、分布式、最终一致的对象/二进制大对象存储
  • Syncthing – 私人、加密、带有身份验证的分布式数据系统

Code Review

代码评审

  • Gerrit – 基于 Git 的版本控制,可以帮助软件开发者通过接受或拒绝源代码的改动来评审、改进源代码
  • Phabricator – 由 Facebook 开发的代码评审工具,被 WikiMedia 、Facebook 、Dropbox 等公司使用。附带了一个集成的 Wiki 、Bug 跟踪、 VC 集成、 和一个被叫做“奥术师”的 CLI 工具。
  • Review Board – MIT 许可证下的免费软件

Collaborative Software

协作软件

  • Citadel/UX – Collaboration suite (messaging and groupware) that is descended from the Citadel family of programs.
  • EGroupware – 用 PHP 编写的协作软件
  • Horde Groupware – 基于 PHP 的协作软件套件,包括电子邮件、日历、Wiki、进度跟踪和文件管理
  • Kolab – 另一个协作套件
  • SOGo – 专注于简单和弹性的协作软件服务器
  • Zimbra – 包含电子邮件服务器和 Web 客户端的协作软件套件

Configuration Management Database

配置管理数据库软件

  • i-doit – IT 文档和配置数据库管理
  • iTop – Complete ITIL web based service management tool.
  • Ralph – Asset management, DCIM and CMDB system for large Data Centers as well as smaller LAN networks.
  • Clusto – Helps you keep track of your inventory, where it is, how it’s connected, and provides an abstracted interface for interacting with the elements of the infrastructure.
  • Collins – At Tumblr, it’s the infrastructure source of truth and knowledge.

Configuration Management

配置管理工具

  • Ansible – 用 Python 编写的,使用 SSH 来管理节点
  • CFEngine – 轻量级代理系统,配置情况通过声明式语言指定
  • Chef – 使用 Ruby 和 Erlang 编写的,使用纯 Ruby 的 DSL
  • Pallet – 通过 Clojure DSL 来进行基础设施的定义、配置和管理
  • Puppet – 用 Ruby 编写的,使用 Puppet 的声明式语言或者 Ruby 的 DSL
  • Salt – 用 Python 编写的配置管理工具
  • Slaughter – 用 Perl 编写的配置管理工具

Continuous Integration & Continuous Deployment

持续集成和持续部署

  • Buildbot – 基于 Python 的持续集成套件
  • Drone – 基于 Docker 构建的持续集成服务器,使用 YAML 文件进行配置
  • GitLab CI – Based off of ruby. They also provide GitLab, which manages git repositories.
  • Go – 持续交付服务器
  • Jenkins – 一个可扩展的持续集成服务器

    Control Panels

控制面板

◦Froxlor – 使用 Nginx 和 PHP-FPM 开发的面向 Linux 的使用面板
◦ISPConfig – Linux 主机控制面板
◦Sentora – Control panel for Linux, BSD, and Windows based on ZPanel.
◦VestaCP – 使用 Nginx 的面向 Linux 的主机控制面板

  • DNS

域名解析

◦Atomia DNS – DNS 管理系统
◦PDNS Gui – 拥有管理域名的 Web 界面,并将记录通过 PowerDNS 记录到 MySQL 中
◦Poweradmin – 基于 PowerDNS 服务器友好的 Web 管理工具

  • Revision Control

版本控制

◦iF.SVNAdmin – 通过 Web 界面来管理资料库和用户/组的权限
◦SCM-Manager – 用最简单的办法来管理你的 Git、Mercurial 和资料库
◦WebSVN – 开源的 Web 资料库浏览器

  • Virtualization

虚拟化

◦Feathur – VPS 资源调配和管理软件
◦Panamax – Project that makes deploying complex containerized apps as easy as Drag-and-Drop.
◦OpenVZ Web Panel – 控制你的 OpenVZ 服务器的 Web 面板
◦Virtkick – 简易管理虚拟机或 Docker 容器的控制器
◦WebVirtMgr – 基于 libvirt 的管理虚拟机 Web 接口 machines.

  • Server

服务器

◦Ajenti – Linux 和 BSD 的控制面板
◦Cockpit – 用 C 编写的,针对 Linux 服务器的, 多服务器 Web 管理接口
◦Virtualmin – 基于 webmin 的 Linux 系统控制面板
◦Webmin – Linux 服务器控制面板

•Deployment Automation

自动部署

  • Capistrano – Deploy your application to any number of machines simultaneously, in sequence or as a rolling set via SSH (rake based).
  • Fabric – Python 库和命令行工具,为了简化应用部署时 SSH 的使用、系统管理任务等
  • Mina – 基于 rake 的快速部署和服务器自动化工具
  • Rocketeer – PHP 任务运行和部署工具
  • Vlad the Deployer – 基于 rake 的自动化部署工具

Distributed Filesystems

分布式存储系统

  • Ceph – 分布式对象存储和文件系统
  • DRBD – 分布式复制块设备
  • LeoFS – 非结构化对象/数据存储,高可用、分布式、一致的存储系统
  • GlusterFS – Scale-out network-attached storage file system.
  • HDFS – 用 Java 编写的,面向 hadoop 框架的分布式、可伸缩、便携式文件系统
  • Lustre – Parallel distributed file system, generally used for large-scale cluster computing.
  • MooseFS – 容错的网络分布式文件系统
  • MogileFS – 应用层的网络分布式文件系统
  • OpenAFS – Distributed network file system with read-only replicas and multi-OS support.
  • TahoeLAFS – 安全、分散、容错、点对点分布式数据存储和分布式文件系统
  • XtreemFS – XtreemFS is a fault-tolerant distributed file system for all storage needs.

DNS

域名解析系统

  • Bind – 最广泛使用的域名服务器软件
  • djbdns – DNS 应用集合,包括 tinydns
  • Designate – DNS REST API that support several DNS servers as its backend.
  • dnsmasq – 一个轻量级的为小型网络提供 DNS \ DHCP \ TFTP 服务的服务
  • Knot – 高性能授权的 DNS 服务器
  • NSD – 仅权威的、高效、简单的域名服务器
  • PowerDNS – 带有后端海量数据存储和负载均衡的DNS 服务器
  • Unbound – 验证、递归和缓存 DNS 解析程序
  • Yadifa – 使用 DNSSEC 提供的 .eu 顶级域名的轻量级权威域名服务器

Editors

开源代码编辑器

  • Atom – Github 开发的一个可编程文本编辑器
  • Brackets – 面向前段工程师和 Web 设计师的代码编辑器
  • Eclipse – 用 Java 编写的可扩展插件的 IDE
  • Geany – GTK2 文本编辑器
  • GNU Emacs – 可扩展的、可自定义的文本编辑器
  • Haroopad – 带有实时预览的 Markdown 编辑器
  • ICEcoder – 彪悍的代码编辑器,用来架构常见的 Web 语言
  • jotgit – 基于 Git 的实时协作代码编辑
  • KDevelop – IDE by the people behind KDE.
  • Light Table – 下一代编辑器
  • Lime – 旨在提供一个对标 Sublime Text 且开放源代码的解决方案
  • TextMate – OS X 上的图形文本编辑器.
  • Vim – 目的是高效编辑的高度可配置的文本编辑器

Identity Management

身份管理

LDAP
– 389 Directory Server – 由 Red Hat 开发的
– Apache Directory Server – 用 Java 编写的, Apache 软件基金会项目
– OpenDJ – 是 OpenDS 令一个实现版本
– OpenDS – 用 Java 写的另一个目录服务器
– OpenLDAP – 由 OpenLDAP 项目组开发

Tools and web interfaces

工具和 Web 接口

  • Fusion Directory – 改善基于 OpenLDAP 的公司目录和服务的管理
  • FreeIPA – 安全管理解决方案,可以管理 LDAP、KRB、DNS、sudo 等等
  • LDAP Account Manager (LAM) – 存储在 LDAP 的 Web 前端管理条目(包括用户、组、DHCP 设置等)
  • Samba – 活动目录和 CIFS 协议控制

IT Asset Management

IT 资产管理软件

  • GLPI – 带有额外管理接口的信息资源管理
  • OCS Inventory NG – 管理 IT 资产清单
  • RackTables – Datacenter and server room asset management like document hardware assets, network addresses, space in racks, networks configuration.
  • Ralph – 面向大型数据中心和小型局域网的资产管理、DCIM 、CMDB 系统
  • Snipe IT – 资产和许可证管理软件

Log Management

日志管理工具(包括收集、分析、可视化)

  • Elasticsearch – 基于 Lucene 的文档存储,主要用于日志索引、存储和分析
  • Fluentd – 日志收集和传送
  • Flume – 分布式日志收集和聚合系统
  • Graylog2 – 带有报警选项的插件化日志、事件分析服务器
  • Heka – 用于日志聚合的流处理系统
  • Kibana – 日志和时间戳数据的可视化
  • Logstash – 用于管理事件和日志的工具
  • Octopussy – 日志管理解决方案(可视化、报警、报告)

Mail Clients

邮件客户端

  • Claws Mail – Old school email client (and news reader), based on GTK+.
  • Mutt – 强大的基于文本的邮件客户端
  • Thunderbird – 易于设置和自定义的免费邮件应用

Webmail

Web 邮件应用

  • Roundcube – 带有用户接口的基于浏览器的 IMAP 客户端
  • SquirrelMail – 另一个基于浏览器的 IMAP 客户端
  • Horde – Web 电子邮件和群组客户端
  • Rainloop – 支持 IMAP / SMTP 和多账户的 Web 电子邮件

Mail Servers

邮件服务器

MDA (IMAP/POP3)
Mail Delivery Agents (IMAP/POP3 software).
– Courier IMAP/POP3 – 快速、弹性、企业级 IMAP 和 POP3 服务器
– Cyrus IMAP/POP3 – Intended to be run on sealed servers, where normal users are not permitted to log in.
– Dovecot – IMAP and POP3 server written primarily with security in mind.

MTA (SMTP)
Mail Transfer Agents (SMTP servers).
– Exim – Message transfer agent (MTA) developed at the University of Cambridge.
– Haraka – A high-performance, pluginable SMTP server written in JavaScript.
– MailCatcher – Ruby gem that deploys a simply SMTP MTA gateway that accepts all mail and displays in web interface. Useful for debugging or development.
– Maildrop – Disposable email SMTP server, also useful for development.
– OpenSMTPD – Secure SMTP server implementation from the OpenBSD project.
– Postfix – Fast, easy to administer, and secure Sendmail replacement.
– Qmail – Secure Sendmail replacement.
– Sendmail – Message transfer agent (MTA).

complete solutions

Software for simple deployment of a mail server, e.g. for inexperienced or impatient admins.
– hMailServer – Open source e-mail server for Microsoft Windows
– Mail-in-a-Box – Take back control of your email with this easy-to-deploy mail server in a box.
– iRedMail – Full-featured mail server solution based on Postfix and Dovecot.
– Citadel – Feature packed, easy, versatile, and powerful mail server, thanks to exclusive “rooms” based architecture.
– Modoboa – Modoboa is a mail hosting and management platform including a modern and simplified Web User Interface.
– Fufix – Fufix is a mailserver installer based on Dovecot, Postfix, Postfixadmin, Nginx, PHP, MySQL and Fail2ban.

Monitoring

监控软件

  • Alerta – 分布式、可扩展、灵活的监控系统
  • Cacti – 带有制图工具的基于 Web 的网络监控
  • Cabot – 带有监视和报警功能,与 PagerDuty 类似
  • check_mk – Nagios 的扩展
  • Dash – 对 GNU/Linux 机器的低开销的监控 Web 仪表盘
  • Flapjack – 监控通知事件处理系统
  • Icinga – Nagios 的另一个分支版本
  • LibreNMS – 全功能网络监视系统,提供了丰富的功能和设备支持
  • Monit – 使用小程序,用于管理、监控 Unix 系统
  • Munin – 网络资源监控工具
  • Naemon – Network monitoring tool based on the基于 Nagios 4 的网络监控工具,带有核心性能和新功能改进
  • Nagios – 计算机系统、网络和基础设施监控软件
  • Node-Bell – Real-time anomalies detection for periodic time series, metrics monitor.
  • Observium – 通过 SNMP 监控服务器和网络设备,运行在 Linux 上
  • OMD – 分布式监控
  • PhpSysInfo – 可定制的 PHP 脚本,更好地显示相关系统信息
  • Riemann – 复杂、快速的事项处理
  • Sensu – 监控框架
  • Sentry – 应用程序监视、事件日志和聚合
  • ServerStatus BotoX – 监视并展示你的服务器统计信息
  • ServerStatus moejda – 服务器状态脚本,展示正常运行时间、空闲 RAM、空闲硬盘空间
  • Shinken – 另一个监控框架
  • Thruk – Multibackend monitoring web interface with support for Naemon, Nagios, Icinga and Shinken.
  • Xymon – 受到 Big Brother 启发的网络监控
  • Zabbix – 为监控网络和应用开发的企业级软件
  • Zenoss – 基于 Zope 的应用程序、服务器和网络管理平台

Metric & Metric Collection

度量收集和展示软件

  • Collectd – 守护进程的系统统计信息的收集
  • Collectl – 高精度系统性能指标收集工具
  • Dashing – Ruby gem that allows for rapid statistical dashboard development. An all HTML5 approach allows for big screen displays in data centers or conference rooms.
  • Diamond – 基于 Python 的统计信息收集守护程序
  • Facette – Time series data visualization and graphing software written in Go.
  • Freeboard – 一个前端实时控制面板,原生的 JSON 串转换成 UI
  • Ganglia – High performance, scalable RRD based monitoring for grids and/or clusters of servers. Compatible with Graphite using a single collection process.
  • Grafana – Graphite 和 InfluxDB 的仪表盘和图形编辑
  • Graphite – 弹性的图形显示服务器
  • InfluxDB – 不需要外部依赖的分布式时间序列数据库
  • KairosDB – Fast distributed scalable time series database, fork of OpenTSDB 1.x.
  • OpenTSDB – 没有粒度损失的服务器时间序列数据存储收集
  • Packetbeat – 捕获网络流量并显示到 Kibana 的仪表盘上
  • Prometheus – 服务监控系统和时间序列数据库
  • RRDtool – 应用于时间序列数据的行业标准、高性能数据日志和绘图系统
  • Statsd – 应用统计监听程序

Network Configuration Management

网络配置管理工具

  • GestióIP – 自动化的基于 Web 的 IPv4/IPv6 地址管理工具
  • Oxidized – 用 Web 接口和 Git 存储的网络设备配置监控模块
  • RANCID – 监视网络设备的配置和维护变换的历史
  • rConfig – 另一个网络设备配置管理工具
  • trigger – 用 Python 编写的强大网络自动化工具集

Newsletters

通讯软件

  • DadaMail – 用 Perl 编写的邮件列表管理
  • phpList – 用 PHP 编写的通信管理
  • LibreMailer – Libre Mailer is a modest and simple web based email marketing application.
  • Lewsnetter – 电子邮件营销管理程序(通过 SES 创建和发送电子邮件)包括订阅管理、交付、投诉通知、模版和数据.

NoSQL

NoSQL 数据库

  • Column-Family
    ◦Apache HBase – Hadoop 数据库,分布式的大数据存储
    ◦Cassandra – 设计为处理跨多台服务器的海量数据的分布式数据库管理系统
    ◦Hypertable – C++ based BigTable-like DBMS, communicates through Thrift and runs either as stand-alone or on distributed FS such as Hadoop.
  • Document Store (面向海量数据访问)
    ◦CouchDB – 易用的、多主机复制的文档型数据库
    ◦ElasticSearch – 基于 Java 的数据库,因为日志聚合以及电子邮件归档而流行
    ◦MongoDB – 另一个文档型数据库
    ◦RavenDB – 用于 ACID/事务性功能基于文档的数据库
    ◦RethinkDB – 分布式文档存储数据库,聚焦与 JSON
  • Graph
    ◦FlockDB – Twitter 的分布式容错图数据库
    ◦Neo4j – 图型数据库
  • Key-Value (面向高性能并发读写)
    ◦LevelDB – Google 的高性能键值对数据库
    ◦Redis – 网络上、内存中、键值对存储,持久性数据库
    ◦Riak – 另一个容错的键值对 NoSQL 数据库

Packaging

打包工具

  • fpm – Versatile multi format package creator.
  • omnibus-ruby – 依赖 Ruby 的全栈、跨平台的打包软件
  • packman – 依赖 Python 的全栈、跨平台的打包软件
  • tito – 为基于 Git 的项目构建 RPM 包

Project Management

项目管理和 Bug 跟踪

  • CaseBox – 在一个系统中管理你组织的全部信息
  • ChiliProject – Redmine 的另一个分支版本
  • GitBucket – 用 Scala 编写的 Github 的克隆
  • GitLab – 用 Ruby 编写的 Github 的克隆
  • Gogs – 用 Go 编写的自托管 Git 服务
  • OpenProject – 项目协同
  • Phabricator 用 PHP 编写
  • Redmine – 用 Ruby 编写的跑在 rails 上
  • Taiga – 基于 Kanban 和 Scrum 的敏捷项目管理工具
  • The Bug Genie – 用 PHP 编写
  • Trac – 用 Python 编写

Queuing

消息队列

  • ActiveMQ – Java 消息代理
  • BeanstalkD – 简单、快速的工作队列
  • Gearman – 快速、多语言队列/任务处理平台
  • Kafka – 性能极高的发布/订阅消息系统
  • NSQ – 实时分布式消息平台
  • RabbitMQ – 功能齐全、跨各发行版的队列系统
  • ZeroMQ – 轻量级队列系统

RDBMS

关系型数据库管理系统

  • Firebird – 通用数据库
  • Galera – Galera Cluster for MySQL is an easy-to-use high-availability solution with high system up-time, no data loss, and scalability for future growth.
  • MariaDB – 社区驱动的 MySQL 分支版本
  • Percona Server – Enhanced, drop-in MySQL replacement.
  • PostgreSQL – 对象关系型数据库管理系统
  • PostgreSQL-XL – 可伸缩、基于 PostgreSQL 的数据库集群
  • SQLite – 实现了独立的、无服务器、零配置、事务性 SQL 数据库

Security

安全工具

  • Blackbox – 在 Git 中安全存储机密信息,提供工具来自动加密机密信息(例如密码)
  • Bro – 网络分析和安全监控的强大框架
  • Denyhosts – 守卫 SSH 免受字典和暴力破解攻击
  • Fail2Ban – 扫描日志文件并显示出恶意行为的 IP 地址
  • fwknop – 通过单个数据包授权来保护你防火墙的端口
  • Glastopf – 用于模拟漏洞和共济数据的收集的低交互蜜罐
  • Kippo – 中交互的 SSH 蜜罐,主要是带有可配置文件系统沙盒的独立 SSH 守护进程
  • Linux Malware Detect – 为了解决共享主机环境所面临风险的 Linux 恶意软件扫描器
  • OSSEC – 一个可执行日至分析、FIM、Rootkit 检测等的 HIDS
  • OSQuery – 通过类似 SQL 的用户接口来查询你服务器的状态和相关信息
  • pfSense – FreeBSD 的防火墙和路由器分支
  • Snort – 网络入侵防御系统(NIPS)和网络入侵检测系统(NIDS)
  • SpamAssassin – 使用各种检测技术的垃圾电子邮件过滤器

Service Discovery

服务发现

  • Consul – 服务发现、监控和配置的工具
  • Doozerd – Doozer is a highly-available, completely consistent store for small amounts of extremely important data.
  • etcd – distributed K/V-Store, authenticating via SSL PKI and a REST HTTP Api for shared configuration and service discovery.
  • ZooKeeper – 一个集中维护配置信息服务、命名、提供分布式同步并提供服务的系统

Software Containers

软件容器

  • Docker – 为开发人员和系统管理员建立的分布式应用程序
  • LXC – Userspace interface for the Linux kernel containment features.
  • OpenVZ – Container-based virtualization for Linux.

SSH

SSH 工具

  • Advanced SSH config – 全透明地增强 ssh_config 文件能力
  • autossh – 在网络中断后自动重连的 ssh 会话
  • Cluster SSH – 通过单一的图形化控制台控制一系列窗口
  • DSH – 分布式 Shell, 通过一个命令终端在多个远程 Shell 中执行命令
  • Mosh – 手机 Shell
  • parallel-ssh – 提供 OpenSSH 的并行版本和相关工具
  • ssh-cert-authority – 一个 SSH 证书颁发工具
  • ssh-ca – 推送用户密钥到服务器,从而允许 SSH 对服务器的访问权限
  • SSH Power Tool – 同时使用预共享的密钥来执行命令、上传文件到多个服务器
  • sshrc – sources ~/.sshrc on your local computer after logging in remotely.
  • stormssh – 一个用来管理 SSH 连接的命令行工具

Statistics

统计分析软件

  • AWStats – 生成 Web 、流、FTP、邮件服务器的统计图
  • GoAccess – 实时 Web 日志分析和交互查看器,在终端中运行
  • Open Web Analytics – 使用 JS、 PHP or REST APIs 向网站中添加 Web 网站分析
  • Piwik – Web 分析应用
  • Webalizer – 快速 Web 服务器日志文件分析

Status Pages

状态页

  • Cachet – 用 PHP 编写的状态页系统
  • Stashboard – 为云服务和 API 开发的状态页
  • System Status Dashboard (SSD) – 概览组织的基础设施健康状况
  • Staytus – 一个完整的发布信息的解决方案,关于你的 Web 应用、网络或者服务的最新信息

Ticketing systems

任务跟踪系统

  • Bugzilla – 通用的 Bug 跟踪和测试工具,最初由 Mozilla 项目组开发
  • Cerb – 小组的电子邮件管理项目
  • Flyspray – 用 PHP 编写的,基于 Web 的 Bug 跟踪系统
  • MantisBT – 基于 Web 的 Bug 跟踪系统
  • osTicket – Simple support ticket system.
  • OTRS – Trouble ticket system for assigning tickets to incoming queries and tracking further communications.
  • Request Tracker – 用 Perl 编写的任务跟踪系统
  • TheBugGenie – 可扩展的用户权限的跟踪系统

Troubleshooting

故障排除工具

  • grml – 带有强大 CLI 工具的 Debian 启动盘
  • mitmproxy – 用于拦截、查看、修改网络流量的 Python 工具,在确定的问题的故障排除时是神器
  • Sysdig – 在 Linux 实例中捕捉系统状态和活动,保存、过滤并分析数据
  • Wireshark – 世界上最好的网络协议分析工具

Version control

版本控制

  • Fossil – 内置 Wiki 和 Bug 跟踪的分布式版本控制
  • Git – 以速度为重点的分布式版本控制和源码管理
  • GNU Bazaar – 由 Canonical 发起的分布式版本控制系统
  • Mercurial – 另一个分布式版本控制系统
  • Subversion – C/S 架构的版本控制系统

Virtualization

虚拟化软件

  • Archipel – 基于 XMPP 的虚拟化管理平台
  • ConVirt – 为集中管理您的 KVM 和 Xen 虚拟化环境提供核心功能
  • Ganeti – Cluster virtual server management software tool built on top of KVM and Xen.
  • KVM – Linux 内核虚拟机
  • OpenNebula – 灵活的企业云
  • oVirt – 管理虚拟机,存储和虚拟网络
  • Packer – A tool for creating identical machine images for multiple platforms from a single source configuration.
  • Proxmox VE – 虚拟化管理解决方案
  • QEMU – QEMU is a generic machine emulator and virtualizer.
  • Vagrant – 构建完整开发环境的工具
  • VirtualBox – 甲骨文公司的虚拟化产品
  • Xen – Virtual machine monitor for 32/64 bit Intel / AMD (IA 64) and PowerPC 970 architectures.

VPN

VPN 软件

  • OpenVPN – 使用自定义的安全协议,使用 SSL/TLS 进行密钥交换
  • Pritunl – OpenVPN based solution. Easy to set up.
  • SoftEther – 具有高级功能的多协议 VPN 软件
  • sshuttle – Poor man’s VPN.
  • strongSwan – Complete IPsec implementation for Linux.
  • tinc – 分布式 P2P VPN

XMPP

XMPP 服务器

  • ejabberd – 用 Erlang/OTP 编写的 XMPP 即时消息服务器
  • Metronome IM – Prosody 的分支版本
  • MongooseIM – ejabberd 的分支版本
  • Openfire – 实时协作(RTC)服务器
  • Prosody IM – 用 Lua 编写的 XMPP 服务器
  • Tigase – 用 Java 实现的 XMPP 服务器

XMPP Web Clients

XMPP Web 客户端

  • Candy – 使用 Javascript 编写的多用户 XMPP 客户端
  • Kaiwa – 现代风格基于 Web 的聊天客户端
  • Lets-Chat – 用 Node 编写的自托管的聊天套件

Web

Web 服务器

  • Apache – 最流行的 Web 服务器
  • Cherokee – 轻量级、高性能 Web 服务器/反向代理服务器
  • Lighttpd – 高速环境下最优的 Web 服务器
  • Nginx – 反向代理、负载均衡、HTTP 缓存和 Web 服务器
  • uWSGI – The uWSGI project aims at developing a full stack for building hosting services.

Web Performance

Web 性能

  • HAProxy – 基于软件的负载均衡,使用 SSL 减轻负载,进行性能优化、压缩和通用 Web 路由
  • Varnish – 关注优化缓存和压缩的基于 HTTP 的 Web 应用程序加速器

Wiki Software

Wiki 软件

  • DokuWiki – 简易、高度灵活的 Wiki,不需要数据库
  • Gollum – 带有 API 接口、本地前端,简易、Git 驱动的 Wiki
  • ikiwiki – 一个 Wiki 编译器
  • MDwiki – 完全基于 HTML5/Javascript 的 Wiki
  • Mediawiki – Used to power Wikipedia.
  • MoinMoin – 拥有很大用户群的、易于使用、可扩展的 Wiki 引擎
  • Ōlelo Wiki – 页面存储在 Git 仓库中的 Wiki
  • PmWiki – Wiki-based system for collaborative creation and maintenance of websites.
  • TiddlyWiki – 用 Javascript 编写的完整交互式 Wiki