
TestRegex regexp

package net.strong.util;

import org.apache.oro.text.regex.MatchResult;
import org.apache.oro.text.regex.Pattern;
import org.apache.oro.text.regex.PatternCompiler;
import org.apache.oro.text.regex.PatternMatcher;
import org.apache.oro.text.regex.PatternMatcherInput;
import org.apache.oro.text.regex.Perl5Compiler;
import org.apache.oro.text.regex.Perl5Matcher;
import org.apache.oro.text.regex.Perl5Substitution;
import org.apache.oro.text.regex.Util;

public class TestRegex {

    public TestRegex() {}

    // Pull the client IP and the timestamp out of one access-log line.
    public void parseLog() throws Exception {
        String log1 = "172.26.155.241 - - [26/Feb/2001:10:56:03 -0500] \"GET /IsAlive.htm HTTP/1.0\" 200 15 ";
        String regexp = "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s-\\s-\\s\\[([^\\]]+)\\]";
        PatternCompiler compiler = new Perl5Compiler();
        Pattern pattern = compiler.compile(regexp);
        PatternMatcher matcher = new Perl5Matcher();
        if (matcher.contains(log1, pattern)) {
            MatchResult result = matcher.getMatch();
            System.out.println("IP: " + result.group(1));
            System.out.println("Timestamp: " + result.group(2));
        }
    }

    // Find the <font> tag, then walk its attribute list one name="value" pair at a time.
    public void parseHTML() throws Exception {
        String html = "<font face=\"Arial, Serif\" size=\"+1\"color=\"red\">";
        String regexpForFontTag = "<\\s*font\\s+([^>]*)\\s*>";
        String regexpForFontAttrib = "([a-z]+)\\s*=\\s*\"([^\"]+)\"";
        PatternCompiler compiler = new Perl5Compiler();
        Pattern patternForFontTag = compiler.compile(regexpForFontTag, Perl5Compiler.CASE_INSENSITIVE_MASK);
        Pattern patternForFontAttrib = compiler.compile(regexpForFontAttrib, Perl5Compiler.CASE_INSENSITIVE_MASK);
        PatternMatcher matcher = new Perl5Matcher();
        if (matcher.contains(html, patternForFontTag)) {
            MatchResult result = matcher.getMatch();
            String attrib = result.group(1);
            // PatternMatcherInput remembers the offset, so each contains() resumes after the last match.
            PatternMatcherInput input = new PatternMatcherInput(attrib);
            while (matcher.contains(input, patternForFontAttrib)) {
                result = matcher.getMatch();
                System.out.println(result.group(1) + ": " + result.group(2));
            }
        }
    }

    // Rewrite the host in a link while keeping the anchor captured in $1.
    public void substitutelink() throws Exception {
        String link = "<a href=\"http://widgets.acme.com/interface.html#How_To_Trade\">";
        String regexpForLink = "<\\s*a\\s+href\\s*=\\s*\"http://widgets.acme.com/interface.html#([^\"]+)\">";
        PatternCompiler compiler = new Perl5Compiler();
        Pattern patternForLink = compiler.compile(regexpForLink, Perl5Compiler.CASE_INSENSITIVE_MASK);
        PatternMatcher matcher = new Perl5Matcher();
        String result = Util.substitute(matcher,
                patternForLink,
                new Perl5Substitution("<a href=\"http://newserver.acme.com/interface.html#$1\">"),
                link,
                Util.SUBSTITUTE_ALL);
        System.out.println(result);
    }

    public static void main(String[] args) throws Exception {
        TestRegex test = new TestRegex();
        System.out.println("\n\nLog Parsing Example");
        test.parseLog();
        System.out.println("\n\nHtml Example 1");
        test.parseHTML();
        System.out.println("\n\nHtml Example 2");
        test.substitutelink();
    }
}

PERL5 Regular Expression Description

Why is Perl so useful for sysadmin and WWW and text hacking? It has a lot of nice little features that make it easy to do nearly anything you want to text. A lot of perl programs look like a weird synergy of C and shell and sed and awk. For example:

    #!/usr/bin/perl
    # manpath -- [email protected]
    foreach $bindir (split(/:/, $ENV{PATH})) {
        ($mandir = $bindir) =~ s/[^\/]+$/man/;
        next if $mandir =~ /^\./ || $mandir eq '';
        if (-d $mandir && ! $seen{$mandir}++ ) {
            ($dev,$ino) = stat($mandir);
            if (! $seen{$dev,$ino}++) {
                push(@manpath,$mandir);
            }
        }
    }
    print join(":", @manpath), "\n";

Can anyone see what that does? I’d like to think it’s not too hard, even devoid of commentary. It does have some naughty bits, like using side effect operators of assignment operators as expressions and double-plus postfix autoincrement. C programmers don’t have a problem with it, but a lot of others do. That’s why Guido banned such things in Python (a rather nice language in many ways), and why I don’t advocate using them to non-C programmers, whom it generally confuses whether it be done in C or in Perl or C++ or any such language.
By far the most bizarre thing is that dread punctuation lying within funny slashes. Often folks call Perl unreadable because they don’t grok regexps, which all true perl wizards — and acolytes — adore. The slashes and their patterns govern matching and splitting and substituting, and here is where a lot of the Perl magic resides: its unmatched 🙂 regular expressions. Certainly the above code could be rewritten in tcl or python or nearly anything else. It could even be rewritten in more legible perl. 🙂

So what’s so special about perl’s regexps? Quite a bit, actually, although the real magic isn’t demonstrated very well in the manpath program. Once you’ve read the perlre(1) and the perlop(1) man pages, there’s still a lot to talk about. So permit me, if you would, to now explain Far More Than Everything You Ever Wanted to Know about Perl Regular Expressions… 🙂

Perl starts with POSIX regexps of the “modern” variety, that is, egrep style not grep style. Here’s the simple case of matching a name in its several spellings:

/Th?om(as)? (Ch|K)rist(ia|e)ns{1,2}s[eo]n/

This avoids a lot of backslashes. I believe many languages also support such regular expressions.
Now, Perl’s regexps “aren’t” — that is, they aren’t “regular” because backreferences per sed and grep are also supported, which renders the language no longer strictly regular and so forbids “pure” DFA implementations.

But this is exceedingly useful. Backreferences let you refer back to part of what you have just matched. Consider lines like these:

    1. This is a fine kettle of fish.
    2. The best card is Island Fish Jasconius.
    3. Is isolation unpleasant?
    4. That’s his isn’t it?
    5. Is is outlawed?

If you’d like to pick up duplicate “is” strings there, you could use the pattern
/(is) \1/ # matches 1,4

As written, that will match sentences 1 and 4. The others fail due to mixed case. You can’t fix it just by saying
/([Ii]s) \1/ # still matches 1,4

because the \1 refers back to the real match, not the potential match. So what do we do? Well, POSIX specifies a REG_ICASE flag you can pass into your matcher to help support “grep -i” etc. To get perl to do this, affix an i flag after the match:
/(is) \1/i # matches 1,2,3,4,5

And now all 5 of those sentences match. If you only wanted them to match legit words, you might use the \b notation for word boundaries, making it
    /\b(is) \1/i        # matches 2,3,5
    /(is) \1\b/i        # matches 1,5
    /\b(is) \1\b/i      # matches 5
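The match lists above can be checked mechanically. What follows is only a sketch: the names `@lines`, `@tests` and `matching_lines` are made up for the illustration, the sentences are typed with plain ASCII apostrophes, and it leans on `qr//`, which arrived in perls later than the ones this article describes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The five sample sentences, numbered 1..5 as in the text.
my @lines = (
    "This is a fine kettle of fish.",
    "The best card is Island Fish Jasconius.",
    "Is isolation unpleasant?",
    "That's his isn't it?",
    "Is is outlawed?",
);

# Return the 1-based numbers of the sentences that match $re.
sub matching_lines {
    my ($re) = @_;
    return grep { $lines[$_ - 1] =~ $re } 1 .. scalar @lines;
}

# Each entry pairs a printable label with the compiled pattern.
my @tests = (
    [ '/(is) \1/',      qr/(is) \1/      ],
    [ '/([Ii]s) \1/',   qr/([Ii]s) \1/   ],
    [ '/(is) \1/i',     qr/(is) \1/i     ],
    [ '/\b(is) \1/i',   qr/\b(is) \1/i   ],
    [ '/(is) \1\b/i',   qr/(is) \1\b/i   ],
    [ '/\b(is) \1\b/i', qr/\b(is) \1\b/i ],
);

for my $t (@tests) {
    my ($label, $re) = @$t;
    printf "%-18s matches %s\n", $label, join(",", matching_lines($re));
}
```

Sentence 1 matches the unanchored patterns only because “This is” happens to contain “is is”, which is exactly the accident that the \b anchors rule out.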

This means you will see Perl code like
    if ( $variable =~ /\b(is) \1\b/i ) {
        print "gotta match";
    }

One might argue that this “should” be written more like
    if ( rematch(variable, '\b(is) \1\b', 'i') ) {
        print "gotta match";
    }

but that’s not how Perl works. I suspect that other languages could make it work that way.
If you’d like to know where you matched, you might want to use these:

        $MATCH                  full match
        $PREMATCH               before the match
        $POSTMATCH              after the match
        $LAST_PAREN_MATCH       useful for alternatives

Although the most normal case is just to use $1, $2, etc, which match the first, second, etc parenthesized subexpressions.
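As a rough illustration of those variables, here is a sketch that assumes the English module is loaded to provide the long names. The sample strings and the variable names `$pre`, `$match`, `$post`, `$first` and `$last` are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use English;    # provides $MATCH, $PREMATCH, $POSTMATCH, $LAST_PAREN_MATCH

my $line = "The food is under the bar";
my ($pre, $match, $post, $first);
if ( $line =~ /f(o+)d/ ) {
    # Copy the match variables at once; the next successful match resets them.
    ($pre, $match, $post, $first) = ($PREMATCH, $MATCH, $POSTMATCH, $1);
    print "before <$pre> match <$match> after <$post> \$1 <$first>\n";
}

# With alternatives you may not know which group fired;
# $LAST_PAREN_MATCH holds the text of whichever one did.
my $last;
if ( "Kristiansen" =~ /(Ch)rist|(K)rist/ ) {
    $last = $LAST_PAREN_MATCH;
    print "initial spelling: $last\n";
}
```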
Another nice thing is that Perl supports the notions from C’s ctype.h include file:

        C function      Perl regexp

        isalnum         \w      (alphanumerics, plus underscore)
        isspace         \s
        isdigit         \d

That means that you don’t have to hard-code [A-Z] and have it break when someone has some interesting locale settings. For example, under charset=ISO-8859-1, something like “façade” properly matches /^\w+$/, because the c-cedille is considered an alphanum. In theory, LC_NUMERIC settings should also take, but I’ve never tried.
This quickly leads to a pattern that detects duplicate words in sentences:

/\b(\w+)(\s+\1)+\b/i

In fact, that one matches multiple duplicates as well. If if if you read in your input data a paragraph at a time, it will catch dups crossing line boundaries as as well. For example, using some convenient command line flags, here’s a one-liner:
    perl -00 -ne 'if ( /\b(\w+)(\s+\1)+\b/i ) { print "dup $1 at $.\n" }'

which when used on this article says:
dup Is at 10
dup If at 33

The $. variable ($NR in English mode) is the record number. I set it to read paragraph records, so paragraphs 10 and 33 of this posting contain duplicate words.
Actually, we can do something a bit nicer: we can find multiple duplicates in the same paragraph. The /g flag causes a match to store a bit of state and start up where it last left off. This gives us:

    #!/usr/bin/perl -00 -n
    while ( /\b(\w+)(\s+\1)+\b/gi ) {
        print "dup $1 at paragraph $.\n";
    }

This now yields:
    dup Is at paragraph 10
    dup if at paragraph 33
    dup as at paragraph 33
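The /g loop is easy to try on a string instead of a file. A minimal sketch, with the helper name `find_dups` and the sample sentence invented for the purpose:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first word of every run of repeated words in $text.
sub find_dups {
    my ($text) = @_;
    my @dups;
    # In scalar context /g resumes where the previous match ended,
    # so the loop visits each run of duplicates once.
    while ( $text =~ /\b(\w+)(\s+\1)+\b/gi ) {
        push @dups, $1;
    }
    return @dups;
}

my @dups = find_dups("If if if you wrap\nwrap lines, it finds them as as well.");
print "dup $_\n" for @dups;
```

The trailing \b is what keeps a phrase such as “the theory” from being reported: the second “the” is followed by a word character, so no boundary is found there.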

Of course, we’re getting a bit hard to read here. So let’s use the /x flag to permit embedded white space and comments in our pattern; you’ll want 5.002 for this (the white space worked in 5.000, but the comments were added later :-). For legibility, instead of slashes for the match, I’ll embrace the real m() function, since /foo/ and m(foo) and m{foo} are all equivalent.
    #!/usr/bin/perl -n
    require 5.002;
    use English;
    $RS = '';

    while (
        m{                      # m{foo} is like /foo/, but helps vi's % key
            \b                  # first find a word boundary
            (\w+)               # followed by the biggest word we can find
                                # which we'll save in the \1 buffer
            (
              \s+               # now have some white space following it
              \1                # and the word itself
            )+                  # repeat the space+word combo ad libitum
            \b                  # make sure there's a boundary at the end too
        }xgi                    # /x for space/comment-expanded patterns
                                # /g for global matching
                                # /i for case-insensitive matching
    )
    {
        print "dup $1 at paragraph $NR\n";
    }

While it’s true that someone who doesn’t know regular expressions won’t be able to read this at first glance, this is not a problem. So even though we can build up rather complex patterns, we can format and comment them nicely, preserving understandability. I wonder why no one else has done this in their regexp libraries?
I actually wrote a sublegible version of this many years ago. It runs even on ancient versions of Perl. I’d probably do that a bit differently these days; my coding style has certainly matured. It violates several of my own current style guidelines.

    #!/usr/bin/perl
    undef $/; $* = 1;
    while ( $ARGV = shift ) {
        if (!open ARGV) { warn "$ARGV: $!\n"; next; }
        $_ = <ARGV>;
        s/\b(\s?)(([A-Za-z]\w*)(\s+\3)+\b)/$1\200$2\200/gi || next;
        split(/\n/);
        $NR = 0;
        @hits = ();
        for (@_) {
            $NR++;
            push(@hits, sprintf("%5d %s", $NR, $_)) if /\200/;
        }
        $_ = join("\n",@hits);
        s/\200([^\200]+)\200/[* $1 *]/g;
        print "$ARGV:\n$_\n";
    }

Here’s what that will output when run on this article up to this current point:
51 5. [* Is is *] outlawed?
124 In fact, that one matches multiple duplicates as well. [* If
125 if if *] you read in your input data a paragraph at a time, it will
126 catch dups crossing line boundaries [* as as *] well. For example, using

Which is pretty neat.
Speaking of ctype.h macros, Perl borrows the vi notation of case translation via \u, \l, \U, and \L. So you could say

$variable = "façade niño coöperate molière renée naïve hæmo tschüß";
and then do a

$variable =~ s/(\w+)/\U$1/g;

and it would come out
FAÇADE NIÑO COÖPERATE MOLIÈRE RENÉE NAÏVE HÆMO TSCHÜß

Oh well. My clib doesn’t know to turn ß -> SS. That’s a harder issue.
This is much better than writing things like

$variable =~ tr[a-z][A-Z];

because that would give you:
FAçADE NIñO COöPERATE MOLIèRE RENéE NAïVE HæMO TSCHüß

which isn’t right at all.
Actually, perl can beat vi and do this:

$variable =~ s/(\w+)/\u\L$1/g;

Yielding:
Façade Niño Coöperate Molière Renée Naïve Hæmo Tschüß

which is somewhat interesting.
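For a version that does not depend on locale settings, the same escapes can be tried on plain ASCII. This is just a sketch; the sample string and the names `$title`, `$upper` and `$caps` are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $title = "tHe quICK brown FOX";

# \U upper-cases everything that follows it in the replacement.
(my $upper = $title) =~ s/(\w+)/\U$1/g;

# \L lower-cases the whole word, then \u raises just its first letter.
(my $caps = $title) =~ s/(\w+)/\u\L$1/g;

print "$upper\n$caps\n";
```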
Speaking of substitutes, we can use a /e flag on the substitute to get the RHS to evaluate to code instead of just a string. Consider:

s/(\d+)/8 * $1/ge; # multiply all numbers by 8
s/(\d+)/sprintf("%x", $1)/ge; # convert them to hex

This is nice when renumbering paragraphs. I often write
s/^(\d+)/1 + $1/e

or from within vi, just
%!perl -pe 's/^(\d+)/1 + $1/e'

Here’s a more elaborate example of this. If you wanted to expand %d or %s or whatnot, you might just do
s/%(.)/$percent{$1}/g;

given a %percent definition like this:
    %percent = (
                'd'     => 'digit',
                's'     => 'string',
    );

But in fact, that’s not quite enough. You might well want to call a function, like
s/%(.)/unpercent($1)/ge;

(assuming you have an unpercent() function defined.)
You can even use /ee for a double-eval, but that seems like going overboard in most cases. It is, however, nice for converting embedded variables like $foo or whatever in text into their values. This way a sentence with $HOME and $TERM in it, assuming those were valid variables, might become a sentence with /home/tchrist and xterm in it. Just do this:

s/(\$\w+)/$1/eeg;
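To see /e and /ee side by side, here is a small sketch. The sample text and the values given to `$HOME` and `$TERM` are stand-ins chosen for the example, not anything read from a real environment.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "chapter 3 has 12 pages";

# /e: the replacement side is Perl code, evaluated once per match.
(my $times8 = $text) =~ s/(\d+)/8 * $1/ge;
(my $hex    = $text) =~ s/(\d+)/sprintf("%x", $1)/ge;

# /ee: the first evaluation yields the string '$HOME'; the second
# evaluates that string as code, giving the variable's value.
our ($HOME, $TERM) = ("/home/tchrist", "xterm");    # stand-in values
my $sentence = 'I live in $HOME and use $TERM';
(my $expanded = $sentence) =~ s/(\$\w+)/$1/eeg;

print "$times8\n$hex\n$expanded\n";
```

Since /ee runs whatever the text contains as code, it is only reasonable on input you trust.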

Ok, what more can we do with perl patterns? split takes a pattern. Imagine that you have a record stored in plain text as blank line separated paragraphs with FIELD: VALUE pairs on each line.
    field: value here
    somefield: some value here
    morefield: other value here

    field: second record’s value here
    somefield: some value here
    morefield: other value here
    newfield: other funny stuff

You could process that this way. We’ll put it into key value pairs in a hash, just as though it had been initialized as
    %hash = (
        'field'         => 'value here',
        'somefield'     => 'some value here',
        'morefield'     => 'other value here',
    );

I’ll use a few command line switches for short cuts:
    #!/usr/bin/perl -00n
    (undef, %hash) = split( /^([^:]+):\s*/m );  # drop split's empty leading field
    if ( $hash{"somefield"} =~ /here/) {
        print "record $. has here in somefield\n";
    }

The /m flag governs whether ^ can match internally. I believe this is the POSIX value REG_NEWLINE. Normally perl does not have ^ match anywhere but the beginning of the string.
Or you could eschew shortcuts and write:

    #!/usr/bin/perl
    use English;
    $RS = '';
    while ( $line = <ARGV> ) {
        (undef, %hash) = split(/^([^:]+):\s*/m, $line);  # drop the empty leading field
        if ( $hash{"somefield"} =~ /here/) {
            print "record $NR has here in somefield\n";
        }
    }

Actually, in the current version of perl, you can use the getline() object method on the predefined ARGV file handle object:
    #!/usr/bin/perl
    use English;
    $RS = '';
    while ( $line = ARGV->getline() ) {
        (undef, %hash) = split(/^([^:]+):\s*/m, $line);  # drop the empty leading field
        if ( $hash{"somefield"} =~ /here/) {
            print "record $NR has here in somefield\n";
        }
    }

This can be especially convenient for handling mail messages.
Here, for example, is a bare-bones mail-sorting program:

    #!/usr/bin/perl -00
    while (<>) {
        if ( /^From / ) {
            ($id) = /^Message-ID:\s*(.*)/mi;
            $sub{$id} = /^Subject:\s*(Re:\s*)*(.*)/mi
                            ? uc($2)
                            : $id;
        }
        $msg{$id} .= $_;
    }
    print @msg{ sort { $sub{$a} cmp $sub{$b} } keys %msg};
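The split-into-a-hash idiom used for the FIELD: VALUE records can be tried on a single record held in a string. One wrinkle worth knowing: split returns an empty leading field when the pattern matches at the very start of the string, so this sketch throws that field away. The record text and the variable names are invented for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $record = "field: value here\n"
           . "somefield: some value here\n"
           . "morefield: other value here\n";

# The parentheses make split return the captured field names between
# the pieces, so the list alternates key, value, key, value. The first
# element is the (empty) text before the first key; undef discards it.
my (undef, %hash) = split /^([^:]+):\s*/m, $record;

# Each value still carries its trailing newline; chomp on a hash
# trims the values and leaves the keys alone.
chomp(%hash);

for my $key (sort keys %hash) {
    print "$key => '$hash{$key}'\n";
}
```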

Now, I still haven’t mentioned a couple of features which are to my mind critical in any analysis of the strengths of Perl’s pattern matching. These are stingy matching and lookaheads.
Stingy matching solves the problem of greedy matching. A greedy match picks up everything, as in:

       $line = "The food is under the bar in the barn.";
       if ( $line =~ /foo(.*)bar/ ) {
           print "got <$1>\n";
       }

That prints out
got <d is under the bar in the >

Which is often not what you want. Instead, we can add an extra ? after a repetition operator to render it stingy instead of greedy.
       if ( $line =~ /foo(.*?)bar/ ) {
           print "got <$1>\n";
       }

That prints out
got <d is under the >

which is often more what folks want. It turns out that having both stingy and greedy repetition operators in no way compromises a regexp engine’s regularity, nor is it particularly hard to implement. This comes up in matching quoted things. You can do tricks like using [^:] or [^"] or [^"'] for the simple cases, but negating multicharacter strings is hard. You can just use stingy matching instead.
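For the quoted-string case, the three approaches can be compared directly. A small sketch, with the sample text and variable names made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'say "hello" and then "goodbye" twice';

my ($greedy)  = $text =~ /"(.*)"/;      # runs on to the last quote
my ($stingy)  = $text =~ /"(.*?)"/;     # stops at the first closing quote
my ($negated) = $text =~ /"([^"]*)"/;   # same result via a negated class
my @all       = $text =~ /"(.*?)"/g;    # every quoted piece in turn

print "greedy:  <$greedy>\n";
print "stingy:  <$stingy>\n";
print "negated: <$negated>\n";
print "all:     <@all>\n";
```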
Or you could just use lookaheads.

This is the other important aspect of perl matching I wanted to mention. These are 0-width assertions that state that what follows must match or must not match a particular thing. These are phrased as either (?=pattern) for the assertion or (?!pattern) for the negation.

/\bfoo(?!bar)\w+/

That will match “foostuff” but not “foo” or “foobar”, because I said there must be some alphanums after the word foo, but these may not begin with bar.
Why would you need this? Oh, there are lots of times. Imagine splitting on newlines that are not followed by a space or a tab:

@list_of_results = split ( /\n(?![\t ])/, $data );
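One place such a split is handy is mail-style headers, where a line starting with a space or tab continues the previous one. A sketch under that assumption, with invented header text:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two of these three headers are folded onto a second line.
my $data = "Subject: a long\n\tfolded subject\n"
         . "From: someone\n"
         . "To: you\n also you\n";

# Split only at newlines that do NOT start a continuation line.
my @list = split /\n(?![\t ])/, $data;

printf "%d headers\n", scalar @list;
print "[$_]\n" for @list;
```

The lookahead consumes nothing, so the tab or space that marks a continuation stays in the field it belongs to.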

Let’s put this all together and look at a couple of examples. Both have to do with HTML munging, the current rage. First, let’s solve the problem of detecting URLs in plaintext and highlighting them properly. A problem is if the URL has trailing punctuation, like ftp://host/path.file. Is that last dot supposed to be in the URL? We can probably just assume that a trailing dot doesn’t count, but even so, most scanners seem to get this wrong. Here’s a different approach:
    #!/usr/bin/perl
    # urlify -- [email protected]
    require 5.002;  # well, or 5.000 if you see below

    $urls = '(' . join ('|', qw{
                    http
                    telnet
                    gopher
                    file
                    wais
                    ftp
                } )
            . ')';

    $ltrs = '\w';
    $gunk = '/#~:.?+=&%@!\-';
    $punc = '.:?\-';
    $any  = "${ltrs}${gunk}${punc}";

    while (<>) {
      ## use this if early-ish perl5 (pre 5.002)
      ## s{\b(${urls}:[$any]+?)(?=[$punc]*[^$any]|\Z)}{<A HREF="$1">$1</A>}goi;
      ## otherwise use this -- it just has 5.002ish comments
      s{
        \b                          # start at word boundary
        (                           # begin $1 {
          $urls     :               # need resource and a colon
          [$any] +?                 # followed by one or more
                                    # of any valid character, but
                                    # be conservative and take only
                                    # what you need to....
        )                           # end   $1 }
        (?=                         # look-ahead non-consumptive assertion
                [$punc]*            # either 0 or more punctuation
                [^$any]             #   followed by a non-url char
            |                       # or else
                $                   #   the end of the string
        )
      }{<A HREF="$1">$1</A>}igox;
      print;
    }

Pretty nifty, eh? 🙂
Here’s another HTML thing: we have an html document, and we want to remove all of its embedded markup text. This requires three steps:

    1) Strip <!-- html comments -->
    2) Strip <TAGS>
    3) Convert &entities; into what they should be.

This is complicated by the horrible specs on how html comments work: they can have embedded tags in them. So you have to be way more careful. But it still only takes three substitutions. 🙂 I’ll use the /s flag to make sure that my “.” can stretch to match a newline as well (normally it doesn’t).
    #!/usr/bin/perl -p0777
    #
    #########################################################
    # striphtml ("striff tummel")
    # [email protected]
    # version 1.0: Thu 01 Feb 1996 1:53:31pm MST
    # version 1.1: Sat Feb 3 06:23:50 MST 1996
    #           (fix up comments in annoying places)
    #########################################################
    #
    # how to strip out html comments and tags and transform
    # entities in just three -- count 'em three -- substitutions;
    # sed and awk eat your heart out. :-)
    #
    # as always, translations from this nacré rendition into
    # more characteristically marine, herpetoid, titillative,
    # or indonesian idioms are welcome for the furthering of
    # comparative cyberlinguistic studies.
    #
    #########################################################

    require 5.001;          # for nifty embedded regexp comments

    #########################################################
    # first we'll shoot all the <!-- comments -->
    #########################################################

    s{ <!                   # comments begin with a `<!'
                            # followed by 0 or more comments;

        (.*?)               # this is actually to eat up comments in non
                            # random places

         (                  # not supposed to have any white space here

                            # just a quick start;
          --                # each comment starts with a `--'
            .*?             # and includes all text up to and including
          --                # the *next* occurrence of `--'
            \s*             # and may have trailing white space
                            #   (albeit not leading white space XXX)
         )+                 # repetire ad libitum XXX should be * not +
        (.*?)               # trailing non comment text
       >                    # up to a `>'
    }{
        if ($1 || $3) {     # this silliness for embedded comments in tags
            "<!$1 $3>";
        }
    }gesx;                  # mutate into nada, nothing, and niente

    #########################################################
    # next we’ll remove all the <tags>
    #########################################################

    s{ <                    # opening angle bracket

        (?:                 # Non-backreffing grouping paren
             [^>'"] *       # 0 or more things that are neither > nor ' nor "
                |           #    or else
             ".*?"          # a section between double quotes (stingy match)
                |           #    or else
             '.*?'          # a section between single quotes (stingy match)
        ) +                 # repetire ad libitum
                            # hm.... are null tags <> legal? XXX
       >                    # closing angle bracket
    }{}gsx;                 # mutate into nada, nothing, and niente

    #########################################################
    # finally we’ll translate all &valid; HTML 2.0 entities
    #########################################################

    s{ (
            &              # an entity starts with an ampersand
            (
                \x23\d+    # and is either a pound (# == hex 23) and numbers
                 |         #   or else
                \w+        # has alphanumunders up to a semi
            )
            ;?             # a semi terminates AS DOES ANYTHING ELSE (XXX)
        )
    } {

        $entity{$2}        # if it’s a known entity use that
            ||             #   but otherwise
            $1             # leave what we’d found; NO WARNINGS (XXX)

    }gex;                  # execute replacement — that’s code not a string

    #########################################################
    # but wait! load up the %entity mappings enwrapped in
    # a BEGIN that the last might be first, and only execute
    # once, since we’re in a -p “loop”; awk is kinda nice after all.
    #########################################################

    BEGIN {

        %entity = (

            lt     => '<',     #a less-than
            gt     => '>',     #a greater-than
            amp    => '&',     #a nampersand
            quot   => '"',     #a (vertical) double-quote

            nbsp   => chr 160, #no-break space
            iexcl => chr 161, #inverted exclamation mark
            cent   => chr 162, #cent sign
            pound => chr 163, #pound sterling sign CURRENCY NOT WEIGHT
            curren => chr 164, #general currency sign
            yen    => chr 165, #yen sign
            brvbar => chr 166, #broken (vertical) bar
            sect   => chr 167, #section sign
            uml    => chr 168, #umlaut (dieresis)
            copy   => chr 169, #copyright sign
            ordf   => chr 170, #ordinal indicator, feminine
            laquo => chr 171, #angle quotation mark, left
            not    => chr 172, #not sign
            shy    => chr 173, #soft hyphen
            reg    => chr 174, #registered sign
            macr   => chr 175, #macron
            deg    => chr 176, #degree sign
            plusmn => chr 177, #plus-or-minus sign
            sup2   => chr 178, #superscript two
            sup3   => chr 179, #superscript three
            acute => chr 180, #acute accent
            micro => chr 181, #micro sign
            para   => chr 182, #pilcrow (paragraph sign)
            middot => chr 183, #middle dot
            cedil => chr 184, #cedilla
            sup1   => chr 185, #superscript one
            ordm   => chr 186, #ordinal indicator, masculine
            raquo => chr 187, #angle quotation mark, right
            frac14 => chr 188, #fraction one-quarter
            frac12 => chr 189, #fraction one-half
            frac34 => chr 190, #fraction three-quarters
            iquest => chr 191, #inverted question mark
            Agrave => chr 192, #capital A, grave accent
            Aacute => chr 193, #capital A, acute accent
            Acirc => chr 194, #capital A, circumflex accent
            Atilde => chr 195, #capital A, tilde
            Auml   => chr 196, #capital A, dieresis or umlaut mark
            Aring => chr 197, #capital A, ring
            AElig => chr 198, #capital AE diphthong (ligature)
            Ccedil => chr 199, #capital C, cedilla
            Egrave => chr 200, #capital E, grave accent
            Eacute => chr 201, #capital E, acute accent
            Ecirc => chr 202, #capital E, circumflex accent
            Euml   => chr 203, #capital E, dieresis or umlaut mark
            Igrave => chr 204, #capital I, grave accent
            Iacute => chr 205, #capital I, acute accent
            Icirc => chr 206, #capital I, circumflex accent
            Iuml   => chr 207, #capital I, dieresis or umlaut mark
            ETH    => chr 208, #capital Eth, Icelandic
            Ntilde => chr 209, #capital N, tilde
            Ograve => chr 210, #capital O, grave accent
            Oacute => chr 211, #capital O, acute accent
            Ocirc => chr 212, #capital O, circumflex accent
            Otilde => chr 213, #capital O, tilde
            Ouml   => chr 214, #capital O, dieresis or umlaut mark
            times => chr 215, #multiply sign
            Oslash => chr 216, #capital O, slash
            Ugrave => chr 217, #capital U, grave accent
            Uacute => chr 218, #capital U, acute accent
            Ucirc => chr 219, #capital U, circumflex accent
            Uuml   => chr 220, #capital U, dieresis or umlaut mark
            Yacute => chr 221, #capital Y, acute accent
            THORN => chr 222, #capital THORN, Icelandic
            szlig => chr 223, #small sharp s, German (sz ligature)
            agrave => chr 224, #small a, grave accent
            aacute => chr 225, #small a, acute accent
            acirc => chr 226, #small a, circumflex accent
            atilde => chr 227, #small a, tilde
            auml   => chr 228, #small a, dieresis or umlaut mark
            aring => chr 229, #small a, ring
            aelig => chr 230, #small ae diphthong (ligature)
            ccedil => chr 231, #small c, cedilla
            egrave => chr 232, #small e, grave accent
            eacute => chr 233, #small e, acute accent
            ecirc => chr 234, #small e, circumflex accent
            euml   => chr 235, #small e, dieresis or umlaut mark
            igrave => chr 236, #small i, grave accent
            iacute => chr 237, #small i, acute accent
            icirc => chr 238, #small i, circumflex accent
            iuml   => chr 239, #small i, dieresis or umlaut mark
            eth    => chr 240, #small eth, Icelandic
            ntilde => chr 241, #small n, tilde
            ograve => chr 242, #small o, grave accent
            oacute => chr 243, #small o, acute accent
            ocirc => chr 244, #small o, circumflex accent
            otilde => chr 245, #small o, tilde
            ouml   => chr 246, #small o, dieresis or umlaut mark
            divide => chr 247, #divide sign
            oslash => chr 248, #small o, slash
            ugrave => chr 249, #small u, grave accent
            uacute => chr 250, #small u, acute accent
            ucirc => chr 251, #small u, circumflex accent
            uuml   => chr 252, #small u, dieresis or umlaut mark
            yacute => chr 253, #small y, acute accent
            thorn => chr 254, #small thorn, Icelandic
            yuml   => chr 255, #small y, dieresis or umlaut mark
        );

        ####################################################
        # now fill in all the numbers to match themselves
        ####################################################
        for $chr ( 0 .. 255 ) {
            $entity{ '#' . $chr } = chr $chr;
        }
    }

    #########################################################
    # premature finish lest someone clip my signature
    #########################################################

# NOW FOR SOME SAMPLE DATA — Switch ARGV to DATA above
# to test

__END__

<title>Tom Christiansen's Mox.Perl.COM Home Page</title>
<!-- begin header -->
<A HREF="http://perl-ora.songline.com/universal/header.map"><IMG SRC="http://perl-ora.songline.com/graphics/header-nav.gif" HEIGHT="18" WIDTH="515" ALT="Nav bar" BORDER="0" usemap="#header-nav"></A>

<!-- end header -->
<BODY BGCOLOR=#ffffff TEXT=#000000>

<!--

!-->

    <CENTER>
    <h3>
    <A HREF="#PERL">perl</a> /
    <A HREF="#MAGIC">magic</a> /
    <A HREF="#USENIX">usenix</a> /
    <A HREF="#BOULDER">boulder</a>
    </h3>
    <BR>
    The word of the day is <i>nidificate</i>.
    </CENTER>

Testing: E Ê Ä
</a>

    DOCTYPE START1
    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"
      -- This is an annoying comment > --
    >
    END1

    DOCTYPE START2
    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"
      -- This is an annoying comment --
    >
    END2

    <I>
    <BLOCKQUOTE>
    <DL><DT>A ship then new they built for him
    <DD>of mithril and of elven glass…
    </DL>
    </I>
    </BLOCKQUOTE>
    </CENTER>

    <BLOCKQUOTE>
        Wow! I really can’t believe that anyone has read this far
        in this very long news posting about irregular expressions. 🙂
        Is anyone really still with me? If so, make my day and
        drop me a piece of email.
    </BLOCKQUOTE>

    <UL>
    <LI>
    <A HREF="/CPAN/README.html">CPAN
    (Comprehensive Perl Archive Network)</a> sites are replicated around the world; please choose
    from <A HREF="/CPAN/CPAN.html">one near you</a>.
    The <A HREF="/CPAN/modules/01modules.index.html">CPAN index</a
>
    to the <A HREF="/CPAN/modules/00modlist.long.html">full modules file</a>
    are also good places to look.

    <LI><IMG SRC="/deckmaster/gifs/new.gif" WIDTH=26 HEIGHT=13 ALT="NEW">
    Here's a table of perl and CGI-related books and publications, in either
    <A HREF="/info/books.html"><SMALL>HTML</SMALL> 3.0 table format</a>
    or else in
    <A HREF="/info/books.txt">pre-formatted</a> for old browsers.

What’s missing from Perl’s regular expressions? Anything? Well, yes. The first is that they should be first-class objects. There are some really embarrassing optimization hacks to get around not having compiled regexps directly accessible. The /o flag I used above is just one of them. (I’m *not* talking about the study() function, which is a neat thing to turbo-ize your matching.) A much more egregious hack involving closures is demonstrated here using the match_any function, which itself returns a function to do the work:
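    # Build one matcher from several patterns, then filter input with it.
    $f = match_any('^begin', 'end$', 'middle');
    while (<>) {
        print if &$f();
    }

    # Returns an anonymous sub that tests $_ against each pattern in turn.
    sub match_any {
        die "usage: match_any pats" unless @_;
        my $code = <<EOCODE;
    sub {
    EOCODE
        # with more than five patterns, study() the input first
        $code .= <<EOCODE if @_ > 5;
        study;
    EOCODE
        for my $pat (@_) {
            $code .= <<EOCODE;
        return 1 if /$pat/;
    EOCODE
        }
        $code .= "}\n";
        print "CODE: $code\n";   # show the generated source
        my $func = eval $code;   # compile it once
        die "bad pattern: $@" if $@;
        return $func;
    }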

That’s the kind of thing I just despise writing: the only thing worse would be not being able to do it at all. 🙁 1st-class compiled regexps would surely help a great deal here.
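For comparison, here is a minimal sketch of the same job in a language where compiled patterns are ordinary values. It uses Python's `re` module rather than Perl, purely as an illustration; the name `match_any` is carried over from the example above and the sample lines are made up.

```python
import re

def match_any(*patterns):
    """Return a function that reports whether any pattern matches a line."""
    if not patterns:
        raise ValueError("usage: match_any(pattern, ...)")
    # Each pattern is compiled once; the compiled objects are plain values
    # that can be stored in a list and captured by the closure below.
    compiled = [re.compile(p) for p in patterns]
    return lambda line: any(rx.search(line) for rx in compiled)

f = match_any(r"^begin", r"end$", r"middle")
for line in ["begin here", "in the middle", "the end", "nothing"]:
    if f(line):
        print(line)
```

No source text is generated and nothing is eval'd: the closure simply holds on to the compiled pattern objects.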
Sometimes people expect backreferences to be forward references, as in the pattern /\1\s(\w+)/, which just isn’t the way it works. A related issue is that while lookaheads work, these are not lookbehinds, which can confuse people. This means /\n(?=\s)/ is ok, but you cannot use this for lookbehind: /(?!foo)bar/ will not find an occurrence of “bar” that is preceded by something which is not “foo”. That’s because the (?!foo) is just saying that the next thing cannot be “foo”–and it’s not, it’s a “bar”, so “foobar” will match.
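To make the difference concrete, here is a small sketch in Python, whose `re` module does provide a negative lookbehind, `(?<!...)`. It is an illustration in another language, not something the Perl described here supports.

```python
import re

# Negative lookahead placed before "bar": it only asserts that the text
# starting at this position is not "foo". At the "b" of "foobar" that is
# true, so the pattern matches.
lookahead = re.compile(r"(?!foo)bar")

# Negative lookbehind: asserts that the text just before "bar" is not "foo".
lookbehind = re.compile(r"(?<!foo)bar")

print(bool(lookahead.search("foobar")))   # True
print(bool(lookbehind.search("foobar")))  # False
print(bool(lookbehind.search("bazbar")))  # True
```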

There isn’t really much support for user-defined character classes. You see a bit of that in the urlify program above. On the other hand, this might be the most clear way of writing it.

Another thing that would be nice to have is the ability to somehow specify a recursive match with nesting. That way you could pull out matching parens or braces or begin/end blocks etc. I don’t know what a good syntax for this might be. Maybe (?{…) for the opening one and (?}…) for the closing one, as in:

/\b(?{begin)\b.*\b(?}end)\b/i
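Without such a syntax, nesting has to be tracked outside the pattern. A minimal sketch in Python, assuming single-character delimiters and a made-up helper name `balanced_spans`, counts depth by hand:

```python
def balanced_spans(text, open_ch="(", close_ch=")"):
    """Yield (start, end) index pairs for each outermost balanced group."""
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i          # remember where the outermost group opens
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
            if depth == 0:
                yield (start, i + 1)

s = "f(a, g(b, c)) + h(d)"
groups = [s[a:b] for a, b in balanced_spans(s)]
print(groups)
```

Unmatched closing delimiters are ignored and an unclosed group is never reported; a real parser would want to flag both.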

Finally, while it’s cool that perl’s patterns are 8-bit clean, will match strings even with null bytes in them, and have support for alternate 8-bit character sets, it would certainly make the world happy if there were full Unicode support.

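Escape Mission

Software: Escape Mission (version: N/A)
Category: Puzzle game
License: Freeware

[Editor: 宗文]

This is a puzzle game. The player has to push aside the crates blocking the character, find a path to the exit, and walk the character there to clear the level. (The exit is the flashing brown tile.) Below the word "MOVES" at the bottom of the screen is a row of small white dots. Each move costs one dot, and the more dots left when the level ends, the higher the score.

Only one crate can be pushed at a time, and a crate pushed against a wall or the edge of a path cannot be pushed any further, so think carefully before pushing or you may trap yourself. The game also has obstacles. An iron revolving door, for example, cannot be turned while a crate sits next to it; the crate has to be moved away first. Floor tiles marked with a triangle can be crossed only by a specific character. These restrictions make the game harder. Some levels require several characters to reach the exit, so they may need to cooperate to clear a path.

Controls:
1. Use the four arrow keys to move the character.
2. Press the space bar to switch between characters.
3. Left-click "RESET!" to restart the level, at the cost of one life.

Download: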

http://www.cartoonnetwork.com/games/adventure/knd/escapemission/index.html

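Super Pirate Isle

Software: Super Pirate Isle (version: N/A)
Category: Action game
License: Freeware

[Editor: 宗文]

As the game progresses the player travels to different islands and guides the pirate to the treasure chests. A level is cleared only if every chest in it is found within the time limit, which unlocks the next, harder level. If time runs out, the whole game ends. The sooner a level is cleared, the larger the time bonus.

The player meets many enemies that wander around and get in the way. Bombs can be used against them, but after placing a bomb move away quickly or you may be caught in the blast. Touching an enemy and being hit by a bomb each cost one life. Take care when crossing to the other side of the map, because enemies may be hiding in the corners and are easy to run into.

Controls:
1. Use the four arrow keys to move the character.
2. Press the space bar to place a bomb.

Download: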

http://www.gamesforwork.com/games/play-6039-Super_Pirate_Isle-Flash_Game

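Panik in Platform Peril

Software: Panik in Platform Peril (version: N/A)
Category: Action game
License: Freeware

[Editor: 宗文]

This is an action game. The goal is to find the key, then find the exit and unlock it with the key, which clears the level and opens the next, harder one.

Fish bones appear along the way; collecting them earns more points. When the character is attacked by an enemy or touches a trap, the power meter at the bottom of the screen drops; finding a carrot restores part of it. The player can throw a weapon at enemies to freeze them temporarily, but they start moving again after a short while.

Besides enemies there are traps, such as prickly cacti and mechanical boxing gloves, which must be avoided. Jumping is the key skill: many places require consecutive jumps, for example hopping across several balloons to reach a high spot, and because the balloons burst easily you have to move fast.

Controls:
1. Use the left and right arrow keys to move the character.
2. The up arrow jumps and the down arrow crouches.
3. The space bar throws the weapon, which temporarily freezes enemies.

Download: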

http://www.gamesforwork.com/games/play-6054-Panik_in_Platform_Peril-Flash_Game

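Asteroids Revenge 3

Software: Asteroids Revenge 3 (version: N/A)
Category: Action game
License: Freeware

[Editor: 宗文]

The player controls a planet and fights a host of enemies, attacking either by ramming them with the planet itself or by using the small asteroids around it. There are many kinds of enemies: some arrive in groups and fire large volleys of bullets, some are very good at dodging and take real effort to defeat, some act like bombs and do heavy damage if you are too close when they explode, and some generate a repelling force that makes it hard to get near them. Beating them is not easy.

The player must protect the main planet from damage, and the surrounding asteroids can be used to block incoming bullets. After each level the player chooses one upgrade, such as a faster or a larger planet, which helps in later levels.

Controls:
1. Move the mouse or use the four arrow keys to move the planet.
2. The Z key pulls the surrounding asteroids in; the X key pushes them farther out.
3. The C key detaches the planet from the asteroids.

Download: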

http://www.crazymonkeygames.com/Asteroids-Revenge-3.html

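Stay the Distance

Software: stay the distance (version: N/A)
Category: Action game
License: Freeware

[Editor: 宗文]

This is a horse-racing game in which the goal is to be the first rider across the finish line. Winning is not simple: many factors matter and each needs attention. Indicators in the upper-right corner of the screen show the state of the race. The green bar is the distance remaining to the finish, and the red bar is the horse's stamina; with no stamina left the horse moves very slowly. "PACE", nearest the upper-right corner, is the horse's speed. The faster the horse runs, the faster its stamina drains, so balancing the two is up to the player.

The "WHIPS" item whips the horse, giving a sharp burst of speed that lasts only a moment before the speed drops back. It can be used only three times, so pick the moment carefully. Obstacles have to be jumped, and the timing must be right or the rider falls off.

Controls:
1. Use the up and down arrow keys to speed the horse up or slow it down.
2. Use the left and right arrow keys to move the horse sideways.
3. The Ctrl key uses "WHIPS".
4. The space bar makes the horse jump.

Download: http://www.miniclip.com/games/stay-the-distance/en/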

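Super Chick Sisters

Software: Super Chick Sisters (version: N/A)
Category: Action game
License: Freeware

[Editor: 宗文]

This is an action game that plays much like Super Mario. The ultimate goal is to defeat the boss, but reaching the boss means getting through level after level of enemies. Chicks appear along the way; collecting one hundred of them earns an extra life.

Stones marked with a question mark can be hit with the character's head. Sometimes that adds to the chick count and sometimes it yields an item that makes the character bigger. A character who has grown shrinks back to normal size when hit by an enemy, without losing a life. Besides many kinds of enemies there are dangerous terrain and traps, such as pits, vats of very hot oil, and machinery that crushes the character, which have to be jumped over or passed with careful timing.

There are many hidden spots holding extra-life items or lots of chicks, so it pays to explore. Only two characters are selectable at first; finishing the game reveals a password that, entered on the title screen, unlocks another female character. Controls: use the left and right arrow keys to move and the up arrow to jump.

Download: http://www.kentuckyfriedcruelty.com/superchicksisters/superChickSisters.zip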

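MySQL server tuning

30 July 2007

Developers keep building and deploying applications on the LAMP (Linux®, Apache, MySQL, and PHP/Perl) architecture, but server administrators often have little control over the applications themselves because someone else wrote them. This three-part series covers server configuration issues that affect application performance. This article, the third and final part, focuses on tuning the database layer for maximum efficiency.

About MySQL tuning

There are three ways to speed up a MySQL server, listed from least to most effective:

Replace problematic hardware.
Tune the settings of the MySQL process.
Optimize the queries.

Replacing problematic hardware is usually the first thing considered, mainly because databases consume a lot of resources. This solution only goes so far, though: in practice you can usually double CPU or disk speed, or increase memory four to eight times.

The second approach is to tune the MySQL server (also called mysqld). Tuning this process means allocating memory appropriately and telling mysqld what kind of load to expect. Reducing the number of disk accesses needed is better than making the disks faster. Likewise, making sure the MySQL process operates correctly means it spends more time servicing queries than handling background tasks such as working with temporary disk tables or opening and closing files. Tuning mysqld is the focus of this article.

The best approach is to make sure the queries themselves are optimized. That means the tables have appropriate indexes and the queries are written to take full advantage of what MySQL offers. Query tuning is outside the scope of this article (many books cover it), but the article does configure mysqld to report queries that may need tuning.

Although these tasks are ranked, pay attention to the hardware and the mysqld settings as well as to the queries. A slow machine is one thing, but I have seen very fast machines fail under heavy load while running well-designed queries, because mysqld was tied up with busywork and could not service queries.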

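Logging slow queries

In a SQL server, tables are stored on disk. Indexes give the server a way to find particular rows in a table without searching the whole table. When the whole table must be searched, that is called a table scan. You usually want only a subset of a table's data, so a full table scan wastes a lot of disk I/O and therefore a lot of time. The problem is compounded when data must be joined, because many rows on both sides of the join have to be compared.

Table scans are not always a problem, of course; sometimes reading the whole table is more efficient than picking through it (the query planner in the server process makes these decisions). If indexes are used inefficiently, or cannot be used at all, queries slow down, and the effect grows as server load and table size increase. A query that takes longer than a given time to run is called a slow query.

You can configure mysqld to log these slow queries to the aptly named slow query log. Administrators then review the log to determine which parts of the application need further investigation. Listing 1 shows the my.cnf configuration needed to enable the slow query log.

Listing 1. Enabling the MySQL slow query log

[mysqld]
; enable the slow query log, default 10 seconds
log-slow-queries
; log queries taking longer than 5 seconds
long_query_time = 5
; log queries that don't use indexes even if they take less than long_query_time
; MySQL 4.1 and newer only
log-queries-not-using-indexes

Together, these three settings log queries that take longer than 5 seconds and queries that do not use indexes. Note the warning about log-queries-not-using-indexes: it requires MySQL 4.1 or later. The slow query log is kept in the MySQL data directory under the name hostname-slow.log. To use a different name or path, set log-slow-queries = /new/path/to/file in my.cnf.

The slow query log is best read with the mysqldumpslow command. Given the path to the log file, it prints a sorted list of the slow queries along with the number of times each appears in the log. A very useful feature is that mysqldumpslow strips any user-specified data before comparing entries, so different invocations of the same query are counted as one; this helps identify the queries that need the most work.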
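The idea of stripping data values before comparing queries can be sketched in a few lines of Python. This only illustrates the technique and is not mysqldumpslow's actual code; the `normalize` helper, its two substitutions, and the sample queries are assumptions made for the example.

```python
import re
from collections import Counter

def normalize(query):
    """Replace literal values so calls that differ only in data compare equal."""
    query = re.sub(r"'[^']*'", "'S'", query)   # quoted strings -> 'S'
    query = re.sub(r"\b\d+\b", "N", query)     # bare numbers   -> N
    return query

# Hypothetical slow queries, as they might appear in a log.
queries = [
    "SELECT * FROM users WHERE id = 42",
    "SELECT * FROM users WHERE id = 7",
    "SELECT * FROM users WHERE name = 'bob'",
]

counts = Counter(normalize(q) for q in queries)
for query, n in counts.most_common():
    print(n, query)
```

The two lookups by id collapse into one entry with a count of 2, which is what makes the heaviest query shapes stand out.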

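Caching queries

Many LAMP applications rely heavily on the database yet run the same queries over and over. Each time a query runs, the database has to do the same work: parse the query, decide how to execute it, load the data from disk, and return the result to the client. MySQL has a feature called the query cache that keeps the results of queries in memory for later reuse. In many cases this improves performance dramatically. The catch is that the query cache is disabled by default.

Adding query_cache_size = 32M to /etc/my.cnf enables a 32MB query cache.

Monitoring the query cache

Once the query cache is enabled, it is important to know whether it is being used effectively. MySQL exposes several variables that show what is happening in the cache. Listing 2 shows the cache status.

Listing 2. Displaying query cache statistics

mysql> SHOW STATUS LIKE 'qcache%';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| Qcache_free_blocks      | 5216       |
| Qcache_free_memory      | 14640664   |
| Qcache_hits             | 2581646882 |
| Qcache_inserts          | 360210964  |
| Qcache_lowmem_prunes    | 281680433  |
| Qcache_not_cached       | 79740667   |
| Qcache_queries_in_cache | 16927      |
| Qcache_total_blocks     | 47042      |
+-------------------------+------------+
8 rows in set (0.00 sec)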

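These items are explained in Table 1.

Table 1. MySQL query cache variables

Variable name	Description
`Qcache_free_blocks`	The number of free contiguous memory blocks in the cache. A high number suggests fragmentation. `FLUSH QUERY CACHE` defragments the cache, leaving a single free block.
`Qcache_free_memory`	The free memory in the cache.
`Qcache_hits`	Incremented each time a query is served from the cache.
`Qcache_inserts`	Incremented each time a query is inserted. Dividing the inserts by the sum of hits and inserts gives the miss rate; subtracting that from 1 gives the hit rate. In the example above, about 88% of queries are served from the cache.
`Qcache_lowmem_prunes`	The number of times the cache ran out of memory and had to be pruned to make room for more queries. This number is best watched over time; if it keeps growing, either fragmentation is severe or memory is low (`free_blocks` and `free_memory` above tell you which).
`Qcache_not_cached`	The number of queries that were not candidates for caching, usually because they were not `SELECT` statements.
`Qcache_queries_in_cache`	The number of queries (and responses) currently cached.
`Qcache_total_blocks`	The number of blocks in the cache.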
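As a quick check on the hit rate, here is a minimal sketch in Python using the counters from Listing 2. It assumes that every cacheable query that misses the cache is then inserted, so hits plus inserts approximates the number of cacheable queries; the function name is made up for the example.

```python
def query_cache_hit_rate(hits, inserts):
    """Fraction of cacheable queries answered from the cache."""
    return hits / (hits + inserts)

# Qcache_hits and Qcache_inserts as shown in Listing 2.
rate = query_cache_hit_rate(hits=2581646882, inserts=360210964)
print(f"hit rate: {rate:.1%}")
```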

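Sampling these variables every few seconds usually shows the differences, which helps determine whether the cache is being used effectively. Running FLUSH STATUS resets some of the counters, which is useful if the server has been running for a while.

It is tempting to use a very large query cache in the hope of caching everything. But mysqld has to maintain the cache, for example pruning it when memory runs low, so the server can get bogged down managing it. As a rule of thumb, if FLUSH QUERY CACHE takes a long time, the cache is too large.

Enforcing limits

You can enforce limits in mysqld so that system load does not exhaust resources. Listing 3 shows some important resource-related settings in my.cnf.

Listing 3. MySQL resource settings

set-variable=max_connections=500
set-variable=wait_timeout=10
max_connect_errors = 100

The first line manages the maximum number of connections. As with MaxClients in Apache, the idea is to allow only as many connections as the server can service. To find the highest number of simultaneous connections the server has seen so far, run SHOW STATUS LIKE 'max_used_connections'.

The second line tells mysqld to terminate any connection that has been idle for more than 10 seconds. In a LAMP application, a database connection usually lasts only as long as the Web server takes to process a request. Sometimes, under heavy load, connections hang and take up space in the connection table. If you have many interactive users or use persistent connections to the database, setting this value low is not advisable!

The last line is a safety measure. If a host has trouble connecting to the server and gives up after many retries, the host is locked out until FLUSH HOSTS is run. By default, 10 failures are enough to cause a lockout. Raising the value to 100 gives the server enough time to recover from whatever the problem is. If a connection cannot be established after 100 tries, a higher value will not help much; the host probably cannot connect at all.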

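Buffers and caches

MySQL supports more than 100 tunable settings; fortunately, mastering a handful covers most needs. The right values are found by looking at the status variables with the SHOW STATUS command, which show whether mysqld is behaving as expected. The memory allocated to buffers and caches cannot exceed the memory available in the system, so tuning usually involves some compromise.

MySQL tunables apply either to the whole mysqld process or to individual client sessions.

Server-side settings

Each table can be represented as a file on disk, which must be opened before it is read. To speed up reading from these files, mysqld caches the open files, up to a maximum set by table_cache in /etc/my.cnf. Listing 4 shows how to display activity related to opening tables.

Listing 4. Displaying open table activity

mysql> SHOW STATUS LIKE 'open%tables';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Open_tables   | 5000  |
| Opened_tables | 195   |
+---------------+-------+
2 rows in set (0.00 sec)

Listing 4 shows that 5,000 tables are currently open and that 195 tables had to be opened because no file descriptor was available in the cache (the statistics were cleared earlier, which is how there can be 5,000 open tables but only 195 opens recorded). If Opened_tables rises quickly as you rerun SHOW STATUS, the cache hit rate is too low. If Open_tables is much lower than the table_cache setting, the value is too high (although room to grow is never a bad thing). For example, table_cache = 5000 sets the size of the table cache.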

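Like the table cache, there is also a cache for threads. mysqld spawns threads as needed when it accepts connections. On a busy server where connections come and go quickly, caching threads for later reuse speeds up the initial connection.

Listing 5 shows how to determine whether enough threads are cached.

Listing 5. Displaying thread usage statistics

mysql> SHOW STATUS LIKE 'threads%';
+-------------------+--------+
| Variable_name     | Value  |
+-------------------+--------+
| Threads_cached    | 27     |
| Threads_connected | 15     |
| Threads_created   | 838610 |
| Threads_running   | 3      |
+-------------------+--------+
4 rows in set (0.00 sec)

The important value here is Threads_created, which is incremented every time mysqld has to create a new thread. If it rises quickly between successive SHOW STATUS commands, try increasing the thread cache, for example with thread_cache = 40 in my.cnf (the full name of the variable is thread_cache_size).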

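The key buffer holds index blocks for MyISAM tables. Ideally, requests for these blocks are satisfied from memory rather than from disk. Listing 6 shows how to determine how many blocks were read from disk and how many from memory.

Listing 6. Determining key efficiency

mysql> show status like '%key_read%';
+-------------------+-----------+
| Variable_name     | Value     |
+-------------------+-----------+
| Key_read_requests | 163554268 |
| Key_reads         | 98247     |
+-------------------+-----------+
2 rows in set (0.00 sec)

Key_reads is the number of requests that hit disk, and Key_read_requests is the total. Dividing the reads that hit disk by the total read requests gives the miss rate: in this example, about 0.6 of every 1,000 requests missed memory. If more than 1 request in 1,000 hits disk, consider increasing the key buffer. For example, key_buffer = 384M sets the buffer to 384MB.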
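The arithmetic behind that rule of thumb is short enough to write down. This Python sketch uses the counters from Listing 6; the function name and the `needs_bigger_key_buffer` flag are made up for the example.

```python
def key_misses_per_thousand(key_reads, key_read_requests):
    """Index block requests that went to disk, per 1,000 requests."""
    return 1000 * key_reads / key_read_requests

# Key_reads and Key_read_requests as shown in Listing 6.
misses = key_misses_per_thousand(key_reads=98247, key_read_requests=163554268)

# Rule of thumb from the text: more than 1 miss per 1,000 requests
# suggests a larger key buffer.
needs_bigger_key_buffer = misses > 1
print(round(misses, 2), needs_bigger_key_buffer)
```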

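Temporary tables are used in more advanced queries where data must be stored in a temporary table before further processing (a GROUP BY clause, for example). Ideally, temporary tables are created in memory, but if one grows too large it has to be written to disk. Listing 7 shows the statistics related to temporary table creation.

Listing 7. Determining temporary table usage

mysql> SHOW STATUS LIKE 'created_tmp%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 30660 |
| Created_tmp_files       | 2     |
| Created_tmp_tables      | 32912 |
+-------------------------+-------+
3 rows in set (0.00 sec)

Every use of a temporary table increments Created_tmp_tables; disk-based tables also increment Created_tmp_disk_tables. There is no hard rule for the ratio, because it depends on the queries involved. Watching Created_tmp_disk_tables over time shows the proportion of temporary tables created on disk, which lets you judge how effective your settings are. Both tmp_table_size and max_heap_table_size limit the maximum size of temporary tables, so make sure both are set in my.cnf.

Per-session settings

The following settings apply to each session. Be very careful when setting these numbers, because multiplied by the number of connections that may exist they represent a lot of memory! You can change them within a session through code, or change them for all sessions in my.cnf.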
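To see how quickly per-session buffers add up, here is a rough worst-case estimate in Python. It assumes every connection allocates every buffer in full at the same time, which overstates real usage, and it borrows max_connections = 500 and the two 4M buffer sizes from the examples in this article; the function name is made up.

```python
MB = 1024 * 1024

def worst_case_session_memory(max_connections, per_session_buffers):
    """Upper bound in bytes if every connection fills every buffer at once."""
    return max_connections * sum(per_session_buffers.values())

buffers = {
    "sort_buffer_size": 4 * MB,
    "read_buffer_size": 4 * MB,
}
total = worst_case_session_memory(500, buffers)
print(total // MB, "MB")
```

Two modest 4MB buffers across 500 connections already come to several gigabytes in the worst case.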

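When MySQL has to sort, it allocates a sort buffer to hold the rows as they are read from disk. If the data to sort is too large, it has to be written to temporary files on disk and sorted again. A large sort_merge_passes status variable is a sign of this disk activity. Listing 8 shows some of the sort-related status counters.

Listing 8. Displaying sort statistics

mysql> SHOW STATUS LIKE "sort%";
+-------------------+---------+
| Variable_name     | Value   |
+-------------------+---------+
| Sort_merge_passes | 1       |
| Sort_range        | 79192   |
| Sort_rows         | 2066532 |
| Sort_scan         | 44006   |
+-------------------+---------+
4 rows in set (0.00 sec)

If sort_merge_passes is high, look at sort_buffer_size. For example, sort_buffer_size = 4M sets the sort buffer to 4MB.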

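MySQL also allocates memory for reading tables. Ideally the indexes provide enough information to read in only the rows needed, but sometimes a query (through poor design or the nature of the data) has to read a large part of a table. To understand this behaviour you need to know how many SELECT statements were run and how many times the next row of a table had to be read (instead of being accessed directly through an index). Listing 9 shows the commands.

Listing 9. Determining the table scan ratio

mysql> SHOW STATUS LIKE "com_select";
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| Com_select    | 318243 |
+---------------+--------+
1 row in set (0.00 sec)

mysql> SHOW STATUS LIKE "handler_read_rnd_next";
+-----------------------+-----------+
| Variable_name         | Value     |
+-----------------------+-----------+
| Handler_read_rnd_next | 165959471 |
+-----------------------+-----------+
1 row in set (0.00 sec)

Handler_read_rnd_next / Com_select gives the table scan ratio, 521:1 in this example. If the ratio exceeds 4000, look at read_buffer_size, for example read_buffer_size = 4M. If that setting grows beyond 8M, it is time to talk to the developers about tuning those queries!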
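The ratio and its threshold can be checked with a short Python sketch using the counters from Listing 9. The function name and the `review_read_buffer` flag are made up for the example.

```python
def table_scan_ratio(handler_read_rnd_next, com_select):
    """Sequential row reads per SELECT statement."""
    return handler_read_rnd_next / com_select

# Handler_read_rnd_next and Com_select as shown in Listing 9.
ratio = table_scan_ratio(handler_read_rnd_next=165959471, com_select=318243)

# Threshold from the text: above 4000, look at read_buffer_size.
review_read_buffer = ratio > 4000
print(int(ratio), review_read_buffer)
```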

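Three essential tools

The SHOW STATUS command is very useful for examining specific settings, but you also need tools to interpret the large amount of data mysqld provides. I have found three tools to be indispensable; links to them are in the Resources section.

Most system administrators know the top command, which gives a continuously updated view of the CPU and memory consumed by tasks. mytop is modelled on top; it provides a view of all connected clients and the queries they are running. mytop also provides real-time and historical data on key buffer and query cache efficiency, along with statistics about the queries being run. It is a useful tool for seeing what is going on in the system right now (within the last 10 seconds, say): you get a view of the server's health and can see any connections causing problems.

mysqlard is a daemon that connects to the MySQL server, collects data every 5 minutes, and stores it in a Round Robin Database in the background. A Web page displays the data, such as table cache usage, key efficiency, connected clients, and temporary table usage. Where mytop gives a snapshot of server health, mysqlard gives the long-term picture. As a bonus, mysqlard uses the information it has gathered to make some suggestions on how to tune the server.

Another tool for collecting SHOW STATUS information is mysqlreport. Its reports are far more detailed than mysqlard's, because it analyzes every aspect of the server. It is an excellent tool for tuning a server, because it performs the appropriate calculations on the status variables to help determine what needs fixing.

Conclusion

This article covered some basics of tuning MySQL and concludes this three-part series on tuning LAMP components. Tuning is largely a matter of understanding how the components work, determining whether they are working properly, making adjustments, and measuring again. Each component, whether Linux, Apache, PHP, or MySQL, has its own requirements. Understanding each one individually helps reduce the bottlenecks that can slow an application down.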

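Resources

Learn

The English original of this article is available on the developerWorks worldwide site.
"Migrating from MySQL or PostgreSQL to DB2 Express-C" (developerWorks, June 2006) offers a simple way to move from MySQL to DB2 Express-C.
IBM also helps MySQL administrators who want to move to DB2 Express-C; see "Use your MySQL skills to learn DB2 Express: DB2 and MySQL administration and basic tasks" (developerWorks, February 2006) and the other articles in that series.
"Using MySQL in a federated database environment" (developerWorks, December 2004) is a tutorial on accessing data stored in MySQL from WebSphere. IBM makes sure that WebSphere® software works well with MySQL.
SHOW VARIABLES and SHOW STATUS are both well defined in the MySQL documentation.
If you like blogs, the MySQL Performance Blog, Xaprb, and MySQL DBA are all well worth reading.
The Architecture zone on developerWorks has resources for improving your skills in architecture design. Getting the architecture right is key to scaling a LAMP application.
The developerWorks Linux zone has more resources for Linux developers, including Linux tutorials and last month's reader-favorite Linux articles and tutorials.
Stay current with developerWorks technical events and webcasts.

Get products and technologies

Although it was published three years ago, High Performance MySQL is still a very valuable book. The authors also have a Web site with various articles about MySQL.
mytop shows what is happening on your MySQL server right now and provides some key statistics. It should be the first program you turn to when you find a problem with the database.
mysqlard gives a graphical representation of the MySQL server's key performance indicators and offers some tuning suggestions.
mysqlreport is a must-have tool. It analyzes the SHOW STATUS variables for you.
No MySQL article would be complete without a link to phpMyAdmin. Although it gives some interpretation of the status variables, its real strength is in simplifying administrative tasks.
Order the SEK for Linux, a two-DVD set containing the latest IBM trial software for Linux from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
Build your next development project on Linux with IBM trial software, available for download directly from developerWorks.

Discuss

Join the developerWorks community through the developer blogs, forums, podcasts, and community topics in the new developerWorks spaces.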

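Linux tip: Scheduling jobs with cron and at

How to manage your system with ease

27 August 2007

System administrators need to run jobs in the middle of the night when the system load is low, or every day or every month, without giving up sleep or vacation time. Other reasons for scheduling tasks include automating routine work and making sure a task is handled the same way every time. This article helps you use the cron and at facilities to schedule jobs to run periodically or once at a specified time.

Linux® and UNIX® systems let you schedule a task to run once in the future or to run repeatedly. This article is excerpted from the developerWorks tutorial "LPI exam 102 prep: Administrative tasks" and explains how to schedule jobs to run periodically or once at a specified time.

On a Linux system, many administrative tasks must be done frequently and regularly. They include rotating log files so that file systems do not fill up, backing up data, and connecting to a time server to keep the system clock synchronized. The tutorial mentioned above covers these administrative tasks in more detail. In this article you learn about the scheduling facilities available in Linux, including the cron and anacron facilities and the crontab and at commands. anacron helps schedule jobs even on systems that are often powered off.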

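Running jobs at regular intervals

Running jobs at regular intervals is managed by the cron facility, which consists of the crond daemon and a set of tables describing what work to do and how often. The daemon wakes up every minute and checks the crontabs to determine what needs to be done. Users manage crontabs with the crontab command. The crond daemon is usually started by the init process at system startup.

To keep things simple, suppose you want to run the command shown in Listing 1 on a regular basis. The command does nothing but report the day and the time, but it illustrates how to use crontab to set up cron jobs, and its output shows when the job ran. A crontab entry is a string in which shell metacharacters must be escaped, so it is best suited to simple commands and arguments. In this example the echo command is run from the script /home/ian/mycrontest.sh, which takes no arguments. That saves some work with escape characters.

Listing 1. A simple command example

[ian@lyrebird ~]$ cat mycrontest.sh
#!/bin/bash
 echo "It is now $(date +%T) on $(date +%A)"
[ian@lyrebird ~]$ ./mycrontest.sh
It is now 18:37:42 on Friday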

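Creating a crontab

Use the crontab command with the -e (for "edit") option to create a crontab. This opens the vi editor unless another editor is specified in the EDITOR or VISUAL environment variable.

Each crontab entry contains six fields:

Minute
Hour
Day of the month
Month of the year
Day of the week
String to be executed by sh

Minutes and hours range from 0-59 and 0-23 respectively, and day of the month and month of the year range from 1-31 and 1-12. Day of the week ranges from 0-6, with 0 being Sunday. Day of the week may also be given as sun, mon, tue, and so on. The sixth field is everything after the first five fields and is the string passed to sh. A percent sign (%) is translated to a newline, so to use a literal % or any other special character, precede it with a backslash (\). The line up to the first % is passed to the shell, and all lines after that % are passed as standard input.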
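The percent-sign rule is easy to get wrong, so here is a minimal sketch of it in Python. It only models the behaviour described above and is not cron's own code; the function name `split_cron_command` and the sample commands are assumptions.

```python
def split_cron_command(field):
    """Split the sixth crontab field into (command line, stdin text)."""
    parts = []
    current = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field) and field[i + 1] == "%":
            current.append("%")              # \% is a literal percent sign
            i += 2
            continue
        if ch == "%":
            parts.append("".join(current))   # unescaped % ends the line
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    # The first line goes to the shell; the remaining lines become stdin.
    return parts[0], "\n".join(parts[1:])

print(split_cron_command(r"date +\%T"))
print(split_cron_command("mail -s hi ian%line one%line two"))
```

This is why the date format in Listing 1 lives in a script: written directly in a crontab entry, `date +%T` would need its percent sign escaped.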

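Each time-related field can hold a single value, a range of values (such as 0-10 or sun-wed), or a comma-separated list of single values and ranges. Listing 2 shows a sample crontab entry.

Listing 2. A simple crontab example

0,20,40 22-23 * 7 fri-sat /home/ian/mycrontest.sh

In this example, our command runs at minutes 0, 20, and 40 (every 20 minutes) of the hours between 10 P.M. and midnight on every Friday and Saturday in July. See the crontab(5) man page for details on other ways to specify times.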
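To show how the fields of Listing 2 expand, here is a small Python sketch. It handles only what this article describes (single values, ranges, comma lists, `*`, and day names) and does no range checking; the helper `expand_field` and the `DAYS` table are made up for the example, and other forms described in crontab(5) are not covered.

```python
DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]

def expand_field(field, low, high, names=None):
    """Expand one crontab time field into a sorted list of integers."""
    def to_int(token):
        token = token.lower()
        if names and token in names:
            return names.index(token) + low   # "sun" -> 0 when low is 0
        return int(token)

    values = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(low, high + 1))
        elif "-" in part:
            start, end = part.split("-", 1)
            values.update(range(to_int(start), to_int(end) + 1))
        else:
            values.add(to_int(part))
    return sorted(values)

minutes = expand_field("0,20,40", 0, 59)
hours = expand_field("22-23", 0, 23)
weekdays = expand_field("fri-sat", 0, 6, DAYS)
print(minutes, hours, weekdays)
```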

Output

You may wonder what happens to the output from the command. Most commands designed for use with cron log their output through syslog (see the discussion in the tutorial "LPI exam 102 prep: Administrative tasks"). Output directed to stdout, however, is mailed to the user. Listing 3 shows the output you might receive from our example command.

Listing 3. cron output sent by e-mail

From [email protected]  Fri Jul  6 23:00:02 2007
Date: Fri, 6 Jul 2007 23:00:01 -0400
From: [email protected] (Cron Daemon)
To: [email protected]
Subject: Cron <ian@lyrebird> /home/ian/mycrontest.sh
Content-Type: text/plain; charset=UTF-8
Auto-Submitted: auto-generated
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/home/ian>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=ian>
X-Cron-Env: <USER=ian>

It is now 23:00:01 on Friday

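Where is my crontab stored?

Crontabs created with the crontab command are stored under /var/spool/cron, under the name of the user who created them, so the crontab above is stored as /var/spool/cron/ian. Like the passwd command, therefore, the crontab command is a suid program that runs with root authority.

Sidebar: suid programs

A suid program runs with the authority of the program file's owner rather than that of the user running it. For more on suid, see the tutorial "LPI exam 101 prep: Devices, Linux filesystems, and the Filesystem Hierarchy Standard"; for more on the passwd command, see the tutorial "LPI exam 102 prep: Administrative tasks".

/etc/crontab

In addition to the user crontab files in /var/spool/cron, cron also checks /etc/crontab and the files in the /etc/cron.d directory. These system crontabs have one extra field between the fifth time field (day of the week) and the command. The field names the user that the command runs as, normally root. Listing 4 shows a sample /etc/crontab.

Listing 4. /etc/crontab

SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/

# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly

In this example the real work is done by the run-parts command, which runs the scripts in /etc/cron.hourly, /etc/cron.daily, and so on; /etc/crontab only controls when the jobs run. Note that all the commands here run as root. Note too that a crontab can contain shell variable assignments, which are set before the commands run.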

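anacron

The cron facility suits systems that run continuously. For systems that are often powered off, such as laptops, another utility, anacron (for "anachronistic cron"), can schedule the jobs that run daily, weekly, or monthly. anacron does not handle hourly jobs.

anacron keeps timestamp files in /var/spool/anacron to record when jobs run. When anacron runs, it checks whether the required number of days has passed since a job last ran, and runs the job if so. The anacron job table is stored in /etc/anacrontab, whose format differs slightly from /etc/crontab. Like /etc/crontab, /etc/anacrontab can contain environment settings. Each job has four fields:

Period
Delay
Job identifier
Command

The period is a number of days, but it may be given as @monthly, which makes the job run once a month regardless of how many days the month has. The delay is the number of minutes to wait after the job becomes due before actually starting it; use it to keep jobs from all starting at once when the system boots. The job identifier may contain any non-blank character except the slash (/).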
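The decision anacron makes for each job can be sketched in Python as follows. This models only the day-count rule described above (not @monthly or the delay), and the function name and the dates are assumptions for the example.

```python
from datetime import date

def anacron_should_run(last_run, today, period_days):
    """True when at least period_days have passed since the job last ran."""
    if last_run is None:          # no timestamp recorded: the job never ran
        return True
    return (today - last_run).days >= period_days

# A weekly job (period 7) on a laptop that was switched off for a while.
print(anacron_should_run(date(2007, 7, 1), date(2007, 7, 8), 7))
print(anacron_should_run(date(2007, 7, 5), date(2007, 7, 8), 7))
```

Because the check compares elapsed days rather than a fixed clock time, a job missed while the machine was off still runs the next time anacron is invoked.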

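Both /etc/crontab and /etc/anacrontab are updated by editing them directly. The crontab command is not used to update these files or the files in the /etc/cron.d directory.

Running jobs at a specified time

Sometimes you need to run a job just once rather than periodically. For that, use the at command. The commands to run are read from the file given with the -f option, or from stdin if -f is not used. The -m option sends mail to the user even if the command produces no stdout. The -v option displays the time at which the job will run; that time also appears in the output.

Listing 5 shows an example that runs the mycrontest.sh script. Listing 6 shows the output mailed to the user after the job runs. Note that it is somewhat simpler than the output from the corresponding cron job.

Listing 5. Using the at command

[ian@lyrebird ~]$ at -f mycrontest.sh -v 10:25
Sat Jul  7 10:25:00 2007
job 5 at Sat Jul  7 10:25:00 2007

Listing 6. Job output from at

From [email protected]  Sat Jul  7 10:25:00 2007
Date: Sat, 7 Jul 2007 10:25:00 -0400
From: Ian Shields <[email protected]>
Subject: Output from your job        5
To: [email protected]

It is now 10:25:00 on Saturday

Time specifications can be quite complex. Listing 7 shows a few examples. See the at man page, the file /usr/share/doc/at/timespec, or a file such as /usr/share/doc/at-3.1.10/timespec (3.1.10 in this example is the version of the at package).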

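Listing 7. Time values with the at command

[ian@lyrebird ~]$ at -f mycrontest.sh  10pm tomorrow
job 14 at Sun Jul  8 22:00:00 2007
[ian@lyrebird ~]$ at -f mycrontest.sh 2:00 tuesday
job 15 at Tue Jul 10 02:00:00 2007
[ian@lyrebird ~]$ at -f mycrontest.sh 2:00 july 11
job 16 at Wed Jul 11 02:00:00 2007
[ian@lyrebird ~]$ at -f mycrontest.sh 2:00 next week
job 17 at Sat Jul 14 02:00:00 2007

The at command also has a -q option that selects a queue; queues with higher letters run jobs with a higher nice value. There is also a batch command, which is like at except that jobs run only when the system load is low enough. See the man pages for details of these features.

Sidebar: nice values

A nice value indicates how nice a job is to other users, that is, how readily it gives way to their work. For more on the nice and renice commands, see the tutorial "LPI exam 101 prep: GNU and UNIX commands".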

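Managing scheduled jobs

Listing scheduled jobs

You can manage your cron and at jobs. Use the crontab command with the -l option to list your crontab, and use the atq command to display the jobs queued with the at command, as shown in Listing 8.

Listing 8. Displaying scheduled jobs

[ian@lyrebird ~]$ crontab -l
0,20,40 22-23 * 7 fri-sat /home/ian/mycrontest.sh
[ian@lyrebird ~]$ atq
16      Wed Jul 11 02:00:00 2007 a ian
17      Sat Jul 14 02:00:00 2007 a ian
14      Sun Jul  8 22:00:00 2007 a ian
15      Tue Jul 10 02:00:00 2007 a ian

To see the actual command scheduled by at, use the at command with the -c option and the job number. You will notice that most of the environment in effect when the at command was issued is saved with the scheduled job. Listing 9 shows part of the output for job 15 from Listing 7 and Listing 8.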

Listing 9. Using at -c with a job number

#!/bin/sh
# atrun uid=500 gid=500
# mail ian 0
umask 2
HOSTNAME=lyrebird.raleigh.ibm.com; export HOSTNAME
SHELL=/bin/bash; export SHELL
HISTSIZE=1000; export HISTSIZE
SSH_CLIENT=9.67.219.151\ 3210\ 22; export SSH_CLIENT
SSH_TTY=/dev/pts/5; export SSH_TTY
USER=ian; export USER
 …
HOME=/home/ian; export HOME
LOGNAME=ian; export LOGNAME
 …
cd /home/ian || {
         echo 'Execution directory inaccessible' >&2
         exit 1
}
${SHELL:-/bin/sh} << `(dd if=/dev/urandom count=200 bs=1 \
   2>/dev/null|LC_ALL=C tr -d -c '[:alnum:]')`
#!/bin/bash
 echo "It is now $(date +%T) on $(date +%A)"

Note that the contents of our script file have been copied into a here-document that will be executed by the shell named in the SHELL variable, or by /bin/sh if SHELL is not set. For information on here-documents, see the tutorial "LPI exam 101 prep, Topic 103: GNU and UNIX commands".

Deleting scheduled jobs

You can delete all your scheduled cron jobs with the crontab command and the -r option, as shown in Listing 10.

Listing 10. Displaying and deleting cron jobs

[ian@lyrebird ~]$ crontab -l
0,20,40 22-23 * 7 fri-sat /home/ian/mycrontest.sh
[ian@lyrebird ~]$ crontab -r
[ian@lyrebird ~]$ crontab -l
no crontab for ian

To delete system cron or anacron jobs, edit /etc/crontab or /etc/anacrontab, or edit or delete files in the /etc/cron.d directory.

You can delete one or more jobs scheduled with the at command by using the atrm command with the job numbers. Separate multiple jobs with spaces. Listing 11 shows an example.

Listing 11. Displaying and deleting jobs with atq and atrm

[ian@lyrebird ~]$ atq
16      Wed Jul 11 02:00:00 2007 a ian
17      Sat Jul 14 02:00:00 2007 a ian
14      Sun Jul  8 22:00:00 2007 a ian
15      Tue Jul 10 02:00:00 2007 a ian
[ian@lyrebird ~]$ atrm 16 14 15
[ian@lyrebird ~]$ atq
17      Sat Jul 14 02:00:00 2007 a ian

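Configuring user access to job scheduling

If the file /etc/cron.allow exists, a non-root user must be listed in it to use crontab and the cron facility. If /etc/cron.allow does not exist but /etc/cron.deny does, non-root users listed in it cannot use crontab or the cron facility. If neither file exists, only the superuser may use the command. An empty /etc/cron.deny file lets all users use the cron facility, and is the default.

The /etc/at.allow and /etc/at.deny files have the same effect for the at facility.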
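The order in which the two files are consulted can be written as a short decision function. This Python sketch follows the rules as stated above; the function name is made up, and `None` stands for a file that does not exist while an empty set stands for an empty file.

```python
def may_use_crontab(user, allow=None, deny=None, is_root=False):
    """Apply the cron.allow / cron.deny rules to one user.

    allow and deny are sets of user names, or None if the file is absent.
    """
    if is_root:
        return True
    if allow is not None:          # cron.allow exists: user must be listed
        return user in allow
    if deny is not None:           # only cron.deny exists: user must not be listed
        return user not in deny
    return False                   # neither file exists: superuser only

print(may_use_crontab("ian", allow={"ian"}))
print(may_use_crontab("ian", deny=set()))      # empty cron.deny: everyone allowed
print(may_use_crontab("ian"))                  # neither file exists
```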

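Learn more

To learn more about administrative tasks on Linux, read the tutorial "LPI exam 102 prep: Administrative tasks", or see the Resources below.

Resources

Learn

The English original of this article is available on the developerWorks worldwide site.
Review the tutorial "LPI exam 102 prep: Administrative tasks" (developerWorks, July 2007) for information on other administrative tasks on Linux, including user management, backup, system logs, and the Network Time Protocol. It is part of the LPI exam prep tutorial series, which covers Linux fundamentals and helps you prepare for system administrator certification. This article also refers to two other tutorials in the series, "LPI exam 101 prep: GNU and UNIX commands" and "LPI exam 101 prep: Devices, Linux filesystems, and the Filesystem Hierarchy Standard".
The Linux Documentation Project has many useful documents, especially its HOWTOs.
The developerWorks Linux zone has more resources for Linux developers, including Linux tutorials and the Linux articles and tutorials readers have liked best in recent months.
Stay current with developerWorks technical events and webcasts.

Get products and technologies

Order the SEK for Linux, a two-DVD set containing the latest IBM trial software for Linux from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
Build your next Linux development project with IBM trial software, available for download directly from developerWorks.

Discuss

Take part in the discussion forum.
Join the developerWorks community through the developer blogs, forums, podcasts, and community topics in the new developerWorks spaces.

About the author

Ian Shields works on many Linux projects for the developerWorks Linux zone. He is a Senior Programmer at IBM in Research Triangle Park, North Carolina. He joined IBM's subsidiary in Canberra, Australia, as a Systems Engineer in 1973, and has since worked on communications systems and pervasive computing in Montreal, Canada, and RTP, North Carolina. He holds several patents and has published several papers. His undergraduate degree, from the Australian National University, is in pure mathematics and philosophy. He has an M.S. and a Ph.D. in computer science from North Carolina State University. You can contact Ian at [email protected].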