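EclipseCrossword crossword maker: a great way to pass the time!
Software: EclipseCrossword (version 1.2.54)
Category: Puzzle games
License: Freeware (360 K)
[Editor: lillian]
EclipseCrossword is a free crossword puzzle generator, but its features are anything but bare-bones. Installation is quick and the interface is simple: enter your word list, choose the grid size, and it automatically generates a complete crossword puzzle for you.
Better still, it can export puzzles as interactive web pages. Using the Java applet and scripting supplied by the publisher, you can easily integrate a puzzle into a web page and make your site richer and more varied. EclipseCrossword is also well suited to editors, since its wide range of layout styles saves a good deal of editing time.
Not sure where to go over the holiday break? Get a few friends together, have each design a crossword, and challenge one another!
Download: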
Month: October 2007
Perl Regular Expression Tutorial
Overview
Simple Regular Expressions
gauss
carbon
hydro
oxy
top ten
Metacharacters
The period (.) is a commonly used metacharacter. It matches exactly one character, regardless of what the character is. For example, the regular expression:
2,.-Dimethylbutane
But what if you wanted to search for a string containing a period? For example, suppose we wished to search for references to pi. The following regular expression would not work:
3.14 (THIS IS WRONG!)
To get around this, we introduce a second metacharacter, the backslash (\). The backslash can be used to indicate that the character immediately to its right is to be taken literally. Thus, to search for the string “3.14”, we would use:
3\.14 (This will work.)
(Unfortunately, the backslash is used for other things besides quoting metacharacters. Many “normal” characters take on special meanings when preceded by a backslash. The rule of thumb is, quoting a metacharacter turns it into a normal character, and quoting a normal character may turn it into a metacharacter.)
Let’s look at some more common metacharacters. We consider first the question mark (?). The question mark indicates that the character immediately preceding it may occur either zero times or one time. Thus:
m?ethane
comm?a
Another metacharacter is the star (*). This indicates that the character immediately to its left may be repeated any number of times, including zero. Thus:
ab*c
The plus (+) metacharacter indicates that the character immediately preceding it may be repeated one or more times. It is just like the star metacharacter, except it doesn’t match the null string. Thus:
ab+c
Metacharacters may be combined. A common combination includes the period and star metacharacters, with the star immediately following the period. This is used to match an arbitrary string of any length, including the null string. For example:
cyclo.*ane
If you wanted to search for articles on cyclodecane and cyclohexane, but didn’t want to match articles about how cyclones drive one insane, you could string together three periods, as follows:
cyclo...ane
Here are some more examples. These involve the backslash. Note that the placement of the backslash is important.
a\.*z
a.\*z
Matches any string starting with an “a”, followed by one arbitrary character, and terminated with “*z”. Thus, “ag*z”, “a5*z” and “a@*z” are all matched. Only strings of length four, where the first character is “a”, the third “*”, and the fourth “z”, are matched.
a\++z
a\+\+z
a+\+z
a.?e
a\.?e
a.\?e
a\.\?e
The digit metacharacter, “\d”, matches exactly one digit. Thus:
2,\d-Dimethylbutane
1\.\d\d\d\d\d
a\d+z
The letter “d” in the string “\d” must be lower-case. This is because there is another metacharacter, the non-digit metacharacter, which uses the uppercase “D”. The non-digit metacharacter looks like “\D” and matches any character except a digit. Thus:
a\Dz
\D+
Notice that in changing the “d” from lower-case to upper-case, we have reversed the meaning of the digit metacharacter. This holds true for most other metacharacters of the format backslash-letter.
There are three other metacharacters in the backslash-letter format. The first is the word metacharacter, which matches exactly one letter, one number, or the underscore character (_). It is written as “\w”. Its opposite, “\W”, matches any one character except a letter, a number or the underscore. Thus:
a\wz
a\Wz
The whitespace metacharacter matches exactly one character of whitespace. (Whitespace is defined as spaces, tabs, newlines, or any character which would not use ink if printed on a printer.) The whitespace metacharacter looks like this: “\s”. Its opposite, which matches any character that is not whitespace, looks like this: “\S”. Thus:
a\sz
a\Sz
The third is the word-boundary metacharacter, “\b”, which matches the boundary between a word character and a non-word character; its opposite is “\B”. Thus:
\bcomput
\Bcomput
Note that the underscore (_) is considered a “word” character. Thus:
super\bcomputer
There is one other metacharacter starting with a backslash, the octal metacharacter. The octal metacharacter looks like this: “\nnn”, where “n” is a number from zero to seven. This is used for specifying control characters that have no typed equivalent. For example:
\007
There are three other metacharacters that may be of use. The first is the braces metacharacter. This metacharacter follows a normal character and contains two numbers separated by a comma (,) and surrounded by braces ({}). It is like the star metacharacter, except the length of the string it matches must be within the minimum and maximum length specified by the two numbers in braces. Thus:
ab{3,5}c
.{3,5}pentane
The alternative metacharacter is represented by a vertical bar (|). It indicates an either/or behavior by separating two or more possible choices. For example:
isopentane|cyclopentane
The third is the bracket metacharacter, which matches exactly one of the characters listed between square brackets ([]); a dash between two characters denotes a range. Thus:
\s[cmt]an\s
2,[23]-dimethylbutane
a[a-d]z
textfile0[3-5]
If you wish to include a dash within brackets as one of the characters to match, instead of to denote a range, put the dash immediately before the right bracket. Thus:
a[1234-]z
a[1-4-]z
The bracket metacharacter can also be inverted by placing a caret (^) immediately after the left bracket. Thus:
textfile0[^02468]
\W[^f-h]ood\W
Note that within brackets, ordinary quoting rules do not apply and other metacharacters are not available. The only characters that can be quoted in brackets are “[”, “]”, and “\”. Thus:
[\[\\\]]abc
Forbidden Characters
Things To Remember
mopac
and Mopac
and MOPAC
all search for the same set of strings. Each will match “mopac”, “MOPAC”, “Mopac”, “mopaC”, “MoPaC”, “mOpAc” and so forth. Thus you need not worry about capitalization. (Note, however, that metacharacters must still have the proper case. This is especially important for metacharacters whose case determines whether their meaning is reversed or not.)
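A quick way to check how the metacharacters above behave is to try them against a handful of strings. The sketch below does that with Python's standard `re` module, which is an assumption on my part: the tutorial does not say which engine its search uses, but these particular patterns mean the same thing in any Perl-style engine. The test strings and the helper names `whole`, `found` and `failures` are invented for illustration.

```python
import re

def whole(pattern, text, flags=0):
    """Return True when the pattern matches the entire string."""
    return re.fullmatch(pattern, text, flags) is not None

def found(pattern, text, flags=0):
    """Return True when the pattern matches somewhere inside the string."""
    return re.search(pattern, text, flags) is not None

# (pattern from the tutorial, strings it should match, strings it should reject)
CASES = [
    (r"2,.-Dimethylbutane", ["2,3-Dimethylbutane"], ["2,-Dimethylbutane"]),
    (r"3.14", ["3.14", "3514"], ["314"]),      # unquoted period: too permissive
    (r"3\.14", ["3.14"], ["3514"]),            # quoted period: literal dot only
    (r"m?ethane", ["methane", "ethane"], ["mmethane"]),
    (r"ab*c", ["ac", "abc", "abbbc"], ["adc"]),
    (r"ab+c", ["abc", "abbc"], ["ac"]),
    (r"a.\*z", ["ag*z", "a5*z", "a@*z"], ["a*z", "agz"]),
    (r"a\d+z", ["a1z", "a123z"], ["az", "abz"]),
    (r"ab{3,5}c", ["abbbc", "abbbbbc"], ["abbc", "abbbbbbc"]),
    (r"isopentane|cyclopentane", ["isopentane", "cyclopentane"], ["pentane"]),
    (r"a[a-d]z", ["aaz", "adz"], ["aez"]),
    (r"a[1-4-]z", ["a3z", "a-z"], ["a5z"]),
    (r"textfile0[^02468]", ["textfile01"], ["textfile02"]),
    (r"[\[\\\]]abc", ["[abc", "\\abc", "]abc"], ["xabc"]),
]

def failures():
    """List every (pattern, string) pair that disagrees with CASES."""
    wrong = []
    for pattern, good, bad in CASES:
        wrong += [(pattern, s) for s in good if not whole(pattern, s)]
        wrong += [(pattern, s) for s in bad if whole(pattern, s)]
    return wrong

if __name__ == "__main__":
    print("disagreements:", failures())
    # The cyclone example is about searching inside longer text.
    print(found(r"cyclo.*ane", "cyclones drive one insane"))
    print(found(r"cyclo...ane", "cyclones drive one insane"))
    # Case-insensitive searching, as under "Things To Remember".
    print(whole("mopac", "MoPaC", re.IGNORECASE))
```

`whole` is used for the table so that a pattern cannot pass by matching only part of a string; `found` is closer to what a search engine does.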
TestRegex regexp
package net.strong.util;

import org.apache.oro.text.regex.MatchResult;
import org.apache.oro.text.regex.Pattern;
import org.apache.oro.text.regex.PatternCompiler;
import org.apache.oro.text.regex.PatternMatcher;
import org.apache.oro.text.regex.PatternMatcherInput;
import org.apache.oro.text.regex.Perl5Compiler;
import org.apache.oro.text.regex.Perl5Matcher;
import org.apache.oro.text.regex.Perl5Substitution;
import org.apache.oro.text.regex.Util;

public class TestRegex {
    public TestRegex() {}

    // Pull the client IP address and the timestamp out of an access-log line.
    public void parseLog() throws Exception {
        String log1 = "172.26.155.241 - - [26/Feb/2001:10:56:03 -0500] \"GET /IsAlive.htm HTTP/1.0\" 200 15 ";
        // group 1: dotted-quad IP; group 2: everything between the square brackets
        String regexp = "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s-\\s-\\s\\[([^\\]]+)\\]";
        PatternCompiler compiler = new Perl5Compiler();
        Pattern pattern = compiler.compile(regexp);
        PatternMatcher matcher = new Perl5Matcher();
        if (matcher.contains(log1, pattern)) {
            MatchResult result = matcher.getMatch();
            System.out.println("IP: " + result.group(1));
            System.out.println("Timestamp: " + result.group(2));
        }
    }

    // Find a <font> tag, then walk through its name="value" attributes.
    public void parseHTML() throws Exception {
        String html = "<font face=\"Arial, Serif\" size=\"+1\" color=\"red\">";
        String regexpForFontTag = "<\\s*font\\s+([^>]*)\\s*>";
        String regexpForFontAttrib = "([a-z]+)\\s*=\\s*\"([^\"]+)\"";
        PatternCompiler compiler = new Perl5Compiler();
        Pattern patternForFontTag = compiler.compile(regexpForFontTag, Perl5Compiler.CASE_INSENSITIVE_MASK);
        Pattern patternForFontAttrib = compiler.compile(regexpForFontAttrib, Perl5Compiler.CASE_INSENSITIVE_MASK);
        PatternMatcher matcher = new Perl5Matcher();
        if (matcher.contains(html, patternForFontTag)) {
            MatchResult result = matcher.getMatch();
            String attrib = result.group(1);
            // PatternMatcherInput remembers the offset, so each contains() call
            // resumes after the previous attribute.
            PatternMatcherInput input = new PatternMatcherInput(attrib);
            while (matcher.contains(input, patternForFontAttrib)) {
                result = matcher.getMatch();
                System.out.println(result.group(1) + ": " + result.group(2));
            }
        }
    }

    // Rewrite the host in a link while keeping the anchor name ($1).
    public void substitutelink() throws Exception {
        String link = "<a href=\"http://widgets.acme.com/interface.html#How_To_Trade\">";
        String regexpForLink = "<\\s*a\\s+href\\s*=\\s*\"http://widgets.acme.com/interface.html#([^\"]+)\">";
        PatternCompiler compiler = new Perl5Compiler();
        Pattern patternForLink = compiler.compile(regexpForLink, Perl5Compiler.CASE_INSENSITIVE_MASK);
        PatternMatcher matcher = new Perl5Matcher();
        String result = Util.substitute(matcher,
                patternForLink,
                new Perl5Substitution("<a href=\"http://newserver.acme.com/interface.html#$1\">"),
                link,
                Util.SUBSTITUTE_ALL);
        System.out.println(result);
    }

    public static void main(String[] args) throws Exception {
        TestRegex test = new TestRegex();
        System.out.println("\n\nLog Parsing Example");
        test.parseLog();
        System.out.println("\n\nHtml Example 1");
        test.parseHTML();
        System.out.println("\n\nHtml Example 2");
        test.substitutelink();
    }
}
PERL5 Regular Expression Description
Why is Perl so useful for sysadmin and WWW and text hacking? It has a lot of nice little features that make it easy to do nearly anything you want to text. A lot of perl programs look like a weird synergy of C and shell and sed and awk. For example:
#!/usr/bin/perl
# manpath -- tchrist@perl.com
foreach $bindir (split(/:/, $ENV{PATH})) {
    ($mandir = $bindir) =~ s/[^\/]+$/man/;
    next if $mandir =~ /^\./ || $mandir eq '';
    if (-d $mandir && ! $seen{$mandir}++ ) {
        ($dev,$ino) = stat($mandir);
        if (! $seen{$dev,$ino}++) {
            push(@manpath,$mandir);
        }
    }
}
print join(":", @manpath), "\n";
Can anyone see what that does? I’d like to think it’s not too hard, even devoid of commentary. It does have some naughty bits, like using side effect operators of assignment operators as expressions and double-plus postfix autoincrement. C programmers don’t have a problem with it, but a lot of others do. That’s why Guido banned such things in Python (a rather nice language in many ways), and why I don’t advocate using them to non-C programmers, whom it generally confuses whether it be done in C or in Perl or C++ or any such language.
By far the most bizarre thing is that dread punctuation lying within funny slashes. Often folks call Perl unreadable because they don’t grok regexps, which all true perl wizards — and acolytes — adore. The slashes and their patterns govern matching and splitting and substituting, and here is where a lot of the Perl magic resides: its unmatched 🙂 regular expressions. Certainly the above code could be rewritten in tcl or python or nearly anything else. It could even be rewritten in more legible perl. 🙂
So what’s so special about perl’s regexps? Quite a bit, actually, although the real magic isn’t demonstrated very well in the manpath program. Once you’ve read the perlre(1) and the perlop(1) man pages, there’s still a lot to talk about. So permit me, if you would, to now explain Far More Than Everything You Ever Wanted to Know about Perl Regular Expressions… 🙂
Perl starts with POSIX regexps of the “modern” variety, that is, egrep style not grep style. Here’s a simple case, matching several spellings of a name:
/Th?om(as)? (Ch|K)rist(ia|e)ns{1,2}s[eo]n/
This avoids a lot of backslashes. I believe many languages also support such regular expressions.
Now, Perl’s regexps “aren’t” — that is, they aren’t “regular” because backreferences per sed and grep are also supported, which renders the language no longer strictly regular and so forbids “pure” DFA implementations.
But this is exceedingly useful. Backreferences let you refer back to, and match again, part of what you have just matched. Consider lines like these:
1. This is a fine kettle of fish.
2. The best card is Island Fish Jasconius.
3. Is isolation unpleasant?
4. That’s his isn’t it?
5. Is is outlawed?
If you’d like to pick up duplicate “is” strings there, you could use the pattern
/(is) \1/ # matches 1,4
As written, that will match sentences 1 and 4. The others fail due to mixed case. You can’t fix it just by saying
/([Ii]s) \1/ # still matches 1,4
because the \1 refers back to the real match, not the potential match. So what do we do? Well, POSIX specifies a REG_ICASE flag you can pass into your matcher to help support “grep -i” etc. To get perl to do this, affix an i flag after the match:
/(is) \1/i # matches 1,2,3,4,5
And now all 5 of those sentences match. If you only wanted them to match legit words, you might use the \b notation for word boundaries, making it
/\b(is) \1/i # matches 2,3,5
/(is) \1\b/i # matches 1,5
/\b(is) \1\b/i # matches 5
This means you will see Perl code like
if ( $variable =~ /\b(is) \1\b/i ) {
    print "gotta match";
}
One might argue that this “should” be written more like
if ( rematch(variable, '\b(is) \1\b', 'i') ) {
    print "gotta match";
}
but that’s not how Perl works. I suspect that other languages could make it work that way.
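The match lists quoted for the five sample sentences can be checked mechanically. The following is a small sketch in Python rather than Perl, on the assumption that Python's `re` module treats backreferences, the case-insensitive flag and `\b` the same way for these patterns; the helper name `matching` is made up for the example.

```python
import re

SENTENCES = {
    1: "This is a fine kettle of fish.",
    2: "The best card is Island Fish Jasconius.",
    3: "Is isolation unpleasant?",
    4: "That's his isn't it?",
    5: "Is is outlawed?",
}

def matching(pattern, flags=0):
    """Return the numbers of the sentences in which the pattern is found."""
    return [n for n, s in SENTENCES.items() if re.search(pattern, s, flags)]

if __name__ == "__main__":
    print(matching(r"(is) \1"))                       # case-sensitive
    print(matching(r"([Ii]s) \1"))                    # \1 repeats the actual text
    print(matching(r"(is) \1", re.IGNORECASE))
    print(matching(r"\b(is) \1", re.IGNORECASE))
    print(matching(r"(is) \1\b", re.IGNORECASE))
    print(matching(r"\b(is) \1\b", re.IGNORECASE))
```

The second pattern is the instructive one: the character class widens what the group may capture, but `\1` still has to repeat whatever was actually captured, so it finds nothing new.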
If you’d like to know where you matched, you might want to use these:
$MATCH              full match
$PREMATCH           before the match
$POSTMATCH          after the match
$LAST_PAREN_MATCH   useful for alternatives
Although the most normal case is just to use $1, $2, etc, which match the first, second, etc parenthesized subexpressions.
Another nice thing that Perl supports is the set of notions from C’s ctype.h include file:
C function    Perl regexp
isalnum       \w
isspace       \s
isdigit       \d
That means that you don’t have to hard-code [A-Z] and have it break when someone has some interesting locale settings. For example, under charset=ISO-8859-1, something like “façade” properly matches /^\w+$/, because the c-cedille is considered an alphanum. In theory, LC_NUMERIC settings should also take, but I’ve never tried.
This quickly leads to a pattern that detects duplicate words in sentences:
/\b(\w+)(\s+\1)+\b/i
In fact, that one matches multiple duplicates as well. If if if you read in your input data a paragraph at a time, it will catch dups crossing line boundaries as as well. For example, using some convenient command line flags, here’s a
perl -00 -ne 'if ( /\b(\w+)(\s+\1)+\b/i ) { print "dup $1 at $.\n" }'
which when used on this article says:
dup Is at 10
dup If at 33
The $. variable ($NR in English mode) is the record number. I set it to read paragraph records, so paragraphs 10 and 33 of this posting contain duplicate words.
Actually, we can do something a bit nicer: we can find multiple duplicates in the same paragraph. The /g flag causes a match to store a bit of state and start up where it last left off. This gives us:
#!/usr/bin/perl -00 -n
while ( /\b(\w+)(\s+\1)+\b/gi ) {
    print "dup $1 at paragraph $.\n";
}
This now yields:
dup Is at paragraph 10
dup if at paragraph 33
dup as at paragraph 33
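For comparison, here is roughly the same paragraph-at-a-time scan sketched in Python. It assumes that splitting on blank lines is a fair stand-in for Perl's -00 paragraph mode and that `re.finditer` plays the role of the /g loop; the function name `find_dups` and the sample text are invented for the example.

```python
import re

# Same pattern as in the text: a word, then one or more repeats of it.
DUP = re.compile(r"\b(\w+)(\s+\1)+\b", re.IGNORECASE)

def find_dups(text):
    """Return (paragraph number, first copy of the repeated word) pairs."""
    hits = []
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for number, para in enumerate(paragraphs, start=1):
        # finditer resumes after each match, like Perl's /g in a while loop
        for m in DUP.finditer(para):
            hits.append((number, m.group(1)))
    return hits

SAMPLE = (
    "Is is outlawed?\n"
    "\n"
    "Nothing repeated here.\n"
    "\n"
    "If if if you read a paragraph at a time, it will\n"
    "catch dups crossing line boundaries as\n"
    "as well.\n"
)

if __name__ == "__main__":
    for number, word in find_dups(SAMPLE):
        print(f"dup {word} at paragraph {number}")
```

Because `\s+` also matches a newline, the "as as" pair is found even though it straddles a line break.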
Of course, we’re getting a bit hard to read here. So let’s use the /x flag to permit embedded white space and comments in our pattern; you’ll want 5.002 for this (the white space worked in 5.000, but the comments were added later :-). For legibility, instead of slashes for the match, I’ll embrace the real m() function, since /foo/ and m(foo) and m{foo} are all equivalent.
#!/usr/bin/perl -n
require 5.002;
use English;
$RS = '';
while (
    m{                  # m{foo} is like /foo/, but helps vi's % key
        \b              # first find a word boundary
        (\w+)           # followed by the biggest word we can find
                        # which we'll save in the \1 buffer
        (
            \s+         # now have some white space following it
            \1          # and the word itself
        )+              # repeat the space+word combo ad libitum
        \b              # make sure there's a boundary at the end too
    }xgi                # /x for space/comment-expanded patterns
                        # /g for global matching
                        # /i for case-insensitive matching
    )
{
    print "dup $1 at paragraph $NR\n";
}
While it’s true that someone who doesn’t know regular expressions won’t be able to read this at first glance, this is not a problem. So even though we can build up rather complex patterns, we can format and comment them nicely, preserving understandability. I wonder why no one else has done this in their regexp libraries?
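Commented patterns of this kind do exist elsewhere. As one illustration, and not as a claim about what was available when this was written, Python's `re` module has a `re.VERBOSE` flag that works much like /x. The sketch below restates the duplicate-word pattern that way; the names `DUP_VERBOSE` and `repeated_words` are invented.

```python
import re

DUP_VERBOSE = re.compile(
    r"""
    \b          # first find a word boundary
    (\w+)       # then the longest word available, saved as group 1
    (
        \s+     # some white space
        \1      # and the same word again
    )+          # repeat the space+word combination as often as possible
    \b          # make sure there is a boundary at the end too
    """,
    re.VERBOSE | re.IGNORECASE,
)

def repeated_words(text):
    """Return the first copy of every run of repeated words in the text."""
    return [m.group(1) for m in DUP_VERBOSE.finditer(text)]

if __name__ == "__main__":
    print(repeated_words("It is is so so so true"))
```

In verbose mode, white space in the pattern is ignored and `#` starts a comment, so a literal space has to be written as `\s`, `[ ]` or similar.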
I actually wrote a sublegible version of this many years ago. It runs even on ancient versions of Perl. I’d probably do that a bit differently these days, since my coding style has certainly matured. It violates several of my own current style guidelines.
#!/usr/bin/perl
undef $/; $* = 1;    # slurp whole files; $* is the ancient multi-line switch
while ( $ARGV = shift ) {
    if (!open ARGV) { warn "$ARGV: $!\n"; next; }
    $_ = <ARGV>;
    # bracket each run of duplicates with \200 markers
    s/\b(\s?)(([A-Za-z]\w*)(\s+\3)+\b)/$1\200$2\200/gi || next;
    split(/\n/);     # ancient idiom: splits $_ into @_
    $NR = 0;
    @hits = ();
    for (@_) {
        $NR++;
        push(@hits, sprintf("%5d %s", $NR, $_)) if /\200/;
    }
    $_ = join("\n",@hits);
    s/\200([^\200]+)\200/[* $1 *]/g;
    print "$ARGV:\n$_\n";
}
Here’s what that will output when run on this article up to this current point:
51 5. [* Is is *] outlawed?
124 In fact, that one matches multiple duplicates as well. [* If
125 if if *] you read in your input data a paragraph at a time, it will
126 catch dups crossing line boundaries [* as as *] well. For example, using
Which is pretty neat.
Speaking of ctype.h macros, Perl borrows the vi notation of case translation via \u, \l, \U, and \L. So you could say
$variable = "façade niño coöperate molière renée naïve hæmo tschüß";
and then do a
$variable =~ s/(\w+)/\U$1/g;
and it would come out
FAÇADE NIÑO COÖPERATE MOLIÈRE RENÉE NAÏVE HÆMO TSCHÜß
Oh well. My clib doesn’t know to turn ß -> SS. That’s a harder issue.
This is much better than writing things like
$variable =~ tr[a-z][A-Z];
because that would give you:
FAçADE NIñO COöPERATE MOLIèRE RENéE NAïVE HæMO TSCHüß
which isn’t right at all.
Actually, perl can beat vi and do this:
$variable =~ s/(\w+)/\u\L$1/g;
Yielding:
Façade Niño Coöperate Molière Renée Naïve Hæmo Tschüß
which is somewhat interesting.
Speaking of substitutes, we can use a /e flag on the substitute to get the RHS to evaluate to code instead of just a string. Consider:
s/(\d+)/8 * $1/ge; # multiply all numbers by 8
s/(\d+)/sprintf("%x", $1)/ge; # convert them to hex
This is nice when renumbering paragraphs. I often write
s/^(\d+)/1 + $1/e
or from within vi, just
%!perl -pe 's/^(\d+)/1 + $1/e'
Here’s a more elaborate example of this. If you wanted to expand %d or %s or whatnot, you might just do
s/%(.)/$percent{$1}/g;
given a %percent definition like this:
%percent = (
    'd' => 'digit',
    's' => 'string',
);
But in fact, that’s not quite enough. You might well want to call a function, like
s/%(.)/unpercent($1)/ge;
(assuming you have an unpercent() function defined.)
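The idea behind /e, computing each replacement instead of pasting in a fixed string, carries over to other regex libraries. Below is a sketch using Python's `re.sub`, which accepts a function as the replacement. The `unpercent` here is a hypothetical stand-in for the function the text assumes you have, and the other names are likewise made up.

```python
import re

PERCENT = {"d": "digit", "s": "string"}

def unpercent(code):
    """Hypothetical unpercent(): expand a known code, leave others alone."""
    return PERCENT.get(code, "%" + code)

def expand(text):
    # analogue of s/%(.)/unpercent($1)/ge
    return re.sub(r"%(.)", lambda m: unpercent(m.group(1)), text)

def times_eight(text):
    # analogue of s/(\d+)/8 * $1/ge
    return re.sub(r"\d+", lambda m: str(8 * int(m.group())), text)

def to_hex(text):
    # analogue of s/(\d+)/sprintf("%x", $1)/ge
    return re.sub(r"\d+", lambda m: format(int(m.group()), "x"), text)

if __name__ == "__main__":
    print(expand("a %d and a %s"))
    print(times_eight("3 apples, 12 pears"))
    print(to_hex("255 and 16"))
```

The function receives the match object, so it can look at any captured group before deciding what to return.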
You can even use /ee for a double-eval, but that seems going overboard in most cases. It is, however, nice for converting embedded variables like $foo or whatever in text into their values. This way a sentence with $HOME and $TERM in it, assuming those were valid variables, might become a sentence with /home/tchrist and xterm in it. Just do this:
s/(\$\w+)/$1/eeg;
Ok, what more can we do with perl patterns? split takes a pattern. Imagine that you have a record stored in plain text as blank line separated paragraphs with FIELD: VALUE pairs on each line.
field: value here
somefield: some value here
morefield: other value here

field: second record’s value here
somefield: some value here
morefield: other value here
newfield: other funny stuff
You could process that this way. We’ll put it into key value pairs in a hash, just as though it had been initialized as
%hash = (
    'field' => 'value here',
    'somefield' => 'some value here',
    'morefield' => 'other value here',
);
I’ll use a few command line switches for short cuts:
#!/usr/bin/perl -00n
# split yields an empty leading field before the first name; discard it
(undef, %hash) = split( /^([^:]+):\s*/m );
if ( $hash{"somefield"} =~ /here/ ) {
    print "record $. has here in somefield\n";
}
The /m flag governs whether ^ can match internally. I believe this is the POSIX value REG_NEWLINE. Normally perl does not have ^ match anywhere but the beginning of the string.
Or you could eschew shortcuts and write:
#!/usr/bin/perl
use English;
$RS = '';
while ( $line = <ARGV> ) {
    # split yields an empty leading field before the first name; discard it
    (undef, %hash) = split(/^([^:]+):\s*/m, $line);
    if ( $hash{"somefield"} =~ /here/ ) {
        print "record $NR has here in somefield\n";
    }
}
Actually, in the current version of perl, you can use the getline() object method on the predefined ARGV file handle object:
#!/usr/bin/perl
use English;
$RS = '';
while ( $line = ARGV->getline() ) {
    # split yields an empty leading field before the first name; discard it
    (undef, %hash) = split(/^([^:]+):\s*/m, $line);
    if ( $hash{"somefield"} =~ /here/ ) {
        print "record $NR has here in somefield\n";
    }
}
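The split-with-a-capturing-pattern trick is not specific to Perl. As a sketch, Python's `re.split` also returns the captured field names between the pieces, along with an empty string before the first field, which has to be skipped before pairing names with values. The function names and sample records below are invented for the example.

```python
import re

# Same pattern as in the text; re.MULTILINE plays the role of /m.
FIELD = re.compile(r"^([^:]+):\s*", re.MULTILINE)

def parse_record(record):
    """Turn one 'name: value' paragraph into a dict."""
    parts = FIELD.split(record)
    # parts[0] is whatever precedes the first field name (normally empty),
    # then names and values alternate.
    names = parts[1::2]
    values = [v.rstrip("\n") for v in parts[2::2]]
    return dict(zip(names, values))

def parse_records(text):
    """Split blank-line separated paragraphs and parse each one."""
    return [parse_record(p) for p in re.split(r"\n{2,}", text.strip())]

SAMPLE = (
    "field: value here\n"
    "somefield: some value here\n"
    "morefield: other value here\n"
    "\n"
    "field: second record's value here\n"
    "somefield: some value here\n"
    "morefield: other value here\n"
    "newfield: other funny stuff\n"
)

if __name__ == "__main__":
    for number, rec in enumerate(parse_records(SAMPLE), start=1):
        if "here" in rec.get("somefield", ""):
            print(f"record {number} has here in somefield")
```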
This can be especially convenient for handling mail messages.
Here, for example, is a bare-bones mail-sorting program:
#!/usr/bin/perl -00
while (<>) {
    if ( /^From / ) {
        ($id) = /^Message-ID:\s*(.*)/mi;
        $sub{$id} = /^Subject:\s*(Re:\s*)*(.*)/mi
            ? uc($2)
            : $id;
    }
    $msg{$id} .= $_;
}
print @msg{ sort { $sub{$a} cmp $sub{$b} } keys %msg};
Now, I still haven’t mentioned a couple of features which are to my mind critical in any analysis of the strengths of Perl’s pattern matching. These are stingy matching and lookaheads.
Stingy matching solves the problem of greedy matching. A greedy match picks up everything, as in:
$line = "The food is under the bar in the barn.";
if ( $line =~ /foo(.*)bar/ ) {
    print "got <$1>\n";
}
That prints out
got <d is under the bar in the >
Which is often not what you want. Instead, we can add an extra ? after a repetition operator to render it stingy instead of greedy.
if ( $line =~ /foo(.*?)bar/ ) {
    print "got <$1>\n";
}
That prints out
got <d is under the >
which is often more what folks want. It turns out that having both stingy and greedy repetition operators in no way compromises a regexp engine’s regularity, nor is it particularly hard to implement. This comes up in matching quoted things. You can do tricks like using [^:] or [^"] or [^"'] for the simple cases, but negating multicharacter strings is hard. You can just use stingy matching instead.
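The greedy and stingy results are easy to reproduce side by side. This sketch uses Python's `re` module, on the assumption that its `*` and `*?` behave like Perl's here; the quoted-string sample is made up to show the "matching quoted things" case.

```python
import re

LINE = "The food is under the bar in the barn."

def between(pattern, text):
    """Return group 1 of the first match, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

if __name__ == "__main__":
    print("got <%s>" % between(r"foo(.*)bar", LINE))    # greedy
    print("got <%s>" % between(r"foo(.*?)bar", LINE))   # stingy
    quoted = 'say "one" and "two"'
    print(re.findall(r'"(.*)"', quoted))     # greedy runs to the last quote
    print(re.findall(r'"(.*?)"', quoted))    # stingy stops at the next quote
    print(re.findall(r'"([^"]*)"', quoted))  # negated class, same result
```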
Or you could just use lookaheads.
This is the other important aspect of perl matching I wanted to mention. These are 0-width assertions that state that what follows must match or must not match a particular thing. These are phrased as either (?=pattern) for the assertion or (?!pattern) for the negation.
/\bfoo(?!bar)\w+/
That will match “foostuff” but not “foo” or “foobar”, because I said there must be some alphanums after the word foo, but these may not begin with bar.
Why would you need this? Oh, there are lots of times. Imagine splitting on newlines that are not followed by a space or a tab:
@list_of_results = split ( /\n(?![\t ])/, $data );
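Both lookahead examples can be tried out in a few lines. The sketch below uses Python's `re`, assuming its (?!...) has the same meaning as in Perl; the header-like sample data and the function name `split_unfolded` are invented for illustration.

```python
import re

FOO = re.compile(r"\bfoo(?!bar)\w+")

def split_unfolded(data):
    """Split on newlines that are not followed by a space or a tab."""
    return re.split(r"\n(?![\t ])", data)

DATA = "Subject: hello\n\tworld\nFrom: me\nTo: you\n  and them"

if __name__ == "__main__":
    for word in ("foostuff", "foo", "foobar"):
        print(word, bool(FOO.search(word)))
    print(split_unfolded(DATA))
```

Since the lookahead consumes nothing, the space or tab that starts a continuation line stays in the data; only the newlines that begin a fresh item are removed by the split.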
Let’s put this all together and look at a couple of examples. Both have to do with HTML munging, the current rage. First, let’s solve the problem of detecting URLs in plaintext and highlighting them properly. A problem is if the URL has trailing punctuation, like ftp://host/path.file. Is that last dot supposed to be in the URL? We can probably just assume that a trailing dot doesn’t count, but even so, most scanners seem to get this wrong. Here’s a different approach:
#!/usr/bin/perl
# urlify -- tchrist@perl.com
require 5.002; # well, or 5.000 if you see below
$urls = '(' . join ('|', qw{
            http
            telnet
            gopher
            file
            wais
            ftp
        } )
    . ')';
$ltrs = '\w';
$gunk = '/#~:.?+=&%@!\-';
$punc = '.:?\-';
$any = "${ltrs}${gunk}${punc}";
while (<>) {
    ## use this if early-ish perl5 (pre 5.002)
    ## s{\b(${urls}:[$any]+?)(?=[$punc]*[^$any]|\Z)}{<A HREF="$1">$1</A>}goi;
    ## otherwise use this -- it just has 5.002ish comments
    s{
      \b                # start at word boundary
      (                 # begin $1 {
       $urls :          # need resource and a colon
       [$any] +?        # followed by one or more
                        # of any valid character, but
                        # be conservative and take only
                        # what you need to....
      )                 # end $1 }
      (?=               # look-ahead non-consumptive assertion
       [$punc]*         # either 0 or more punctuation
       [^$any]          # followed by a non-url char
       |                # or else
       $                # then end of the string
      )
    }{<A HREF="$1">$1</A>}igox;
    print;
}
Pretty nifty, eh? 🙂
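To see the stingy match and the lookahead cooperate, here is a sketch of the same idea in Python. It is an adaptation, not a translation: the alternation in the lookahead is grouped so that trailing punctuation is also dropped when the URL is the very last thing in the input (the Perl loop normally sees a newline there). The names `URL` and `urlify` and the example.com address are made up.

```python
import re

SCHEMES = "http|telnet|gopher|file|wais|ftp"
ANY = r"\w/#~:.?+=&%@!\-"    # characters allowed inside a URL
PUNC = r".:?\-"              # punctuation that may trail a URL

URL = re.compile(
    rf"""
    \b                      # start at a word boundary
    (                       # group 1 is the URL itself
        (?:{SCHEMES}) :     # a scheme and a colon
        [{ANY}]+?           # then as few URL characters as will do
    )
    (?=                     # look ahead without consuming anything
        [{PUNC}]*           # zero or more trailing punctuation marks
        (?: [^{ANY}] | $ )  # then a non-URL character or the end
    )
    """,
    re.VERBOSE | re.IGNORECASE,
)

def urlify(text):
    """Wrap every URL found in the text in an anchor tag."""
    return URL.sub(r'<A HREF="\1">\1</A>', text)

if __name__ == "__main__":
    print(urlify("See ftp://host/path.file. Thanks"))
    print(urlify("go to http://example.com/a?b=1."))
```

The stingy `+?` keeps extending the URL one character at a time until the lookahead is satisfied, which is why the dot in "path.file" is kept while the sentence-ending dot is left outside the link.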
Here’s another HTML thing: we have an html document, and we want to remove all of its embedded markup text. This requires three steps:
1) Strip <!-- html comments -->
2) Strip <TAGS>
3) Convert &entities; into what they should be.
This is complicated by the horrible specs on how html comments work: they can have embedded tags in them. So you have to be way more careful. But it still only takes three substitutions. 🙂 I’ll use the /s flag to make sure that my “.” can stretch to match a newline as well (normally it doesn’t).
#!/usr/bin/perl -p0777
#
#########################################################
# striphtml ("striff tummel")
# tchrist@perl.com
# version 1.0: Thu 01 Feb 1996 1:53:31pm MST
# version 1.1: Sat Feb 3 06:23:50 MST 1996
# (fix up comments in annoying places)
#########################################################
#
# how to strip out html comments and tags and transform
# entities in just three -- count 'em three -- substitutions;
# sed and awk eat your heart out. 🙂
#
# as always, translations from this nacré rendition into
# more characteristically marine, herpetoid, titillative,
# or indonesian idioms are welcome for the furthering of
# comparative cyberlinguistic studies.
#
#########################################################
require 5.001; # for nifty embedded regexp comments
#########################################################
# first we'll shoot all the <!-- comments -->
#########################################################
s{ <!                   # comments begin with a `<!'
                        # followed by 0 or more comments;
    (.*?)               # this is actually to eat up comments in non
                        # random places
    (                   # not supposed to have any white space here
                        # just a quick start;
      --                # each comment starts with a `--'
        .*?             # and includes all text up to and including
      --                # the *next* occurrence of `--'
        \s*             # and may have trailing white space
                        # (albeit not leading white space XXX)
    )+                  # repetire ad libitum XXX should be * not +
    (.*?)               # trailing non comment text
    >                   # up to a `>'
}{
    if ($1 || $3) {     # this silliness for embedded comments in tags
        "<!$1 $3>";
    }
}gesx;                  # mutate into nada, nothing, and niente
#########################################################
# next we’ll remove all the <tags>
#########################################################
s{ <                    # opening angle bracket
    (?:                 # Non-backreffing grouping paren
        [^>'"] *        # 0 or more things that are neither > nor ' nor "
        |               # or else
        ".*?"           # a section between double quotes (stingy match)
        |               # or else
        '.*?'           # a section between single quotes (stingy match)
    ) +                 # repetire ad libitum
                        # hm.... are null tags <> legal? XXX
    >                   # closing angle bracket
}{}gsx;                 # mutate into nada, nothing, and niente
#########################################################
# finally we'll translate all &valid; HTML 2.0 entities
#########################################################
s{ (
        &               # an entity starts with an ampersand
        (
            \x23\d+     # and is either a pound (# == hex 23) and numbers
            |           # or else
            \w+         # has alphanumunders up to a semi
        )
        ;?              # a semi terminates AS DOES ANYTHING ELSE (XXX)
    )
} {
    $entity{$2}         # if it's a known entity use that
        ||              # but otherwise
        $1              # leave what we'd found; NO WARNINGS (XXX)
}gex;                   # execute replacement -- that's code not a string
#########################################################
# but wait! load up the %entity mappings enwrapped in
# a BEGIN that the last might be first, and only execute
# once, since we’re in a -p “loop”; awk is kinda nice after all.
#########################################################
BEGIN {
%entity = (
lt => '<', #a less-than
gt => '>', #a greater-than
amp => '&', #a nampersand
quot => '"', #a (vertical) double-quote
nbsp => chr 160, #no-break space
iexcl => chr 161, #inverted exclamation mark
cent => chr 162, #cent sign
pound => chr 163, #pound sterling sign CURRENCY NOT WEIGHT
curren => chr 164, #general currency sign
yen => chr 165, #yen sign
brvbar => chr 166, #broken (vertical) bar
sect => chr 167, #section sign
uml => chr 168, #umlaut (dieresis)
copy => chr 169, #copyright sign
ordf => chr 170, #ordinal indicator, feminine
laquo => chr 171, #angle quotation mark, left
not => chr 172, #not sign
shy => chr 173, #soft hyphen
reg => chr 174, #registered sign
macr => chr 175, #macron
deg => chr 176, #degree sign
plusmn => chr 177, #plus-or-minus sign
sup2 => chr 178, #superscript two
sup3 => chr 179, #superscript three
acute => chr 180, #acute accent
micro => chr 181, #micro sign
para => chr 182, #pilcrow (paragraph sign)
middot => chr 183, #middle dot
cedil => chr 184, #cedilla
sup1 => chr 185, #superscript one
ordm => chr 186, #ordinal indicator, masculine
raquo => chr 187, #angle quotation mark, right
frac14 => chr 188, #fraction one-quarter
frac12 => chr 189, #fraction one-half
frac34 => chr 190, #fraction three-quarters
iquest => chr 191, #inverted question mark
Agrave => chr 192, #capital A, grave accent
Aacute => chr 193, #capital A, acute accent
Acirc => chr 194, #capital A, circumflex accent
Atilde => chr 195, #capital A, tilde
Auml => chr 196, #capital A, dieresis or umlaut mark
Aring => chr 197, #capital A, ring
AElig => chr 198, #capital AE diphthong (ligature)
Ccedil => chr 199, #capital C, cedilla
Egrave => chr 200, #capital E, grave accent
Eacute => chr 201, #capital E, acute accent
Ecirc => chr 202, #capital E, circumflex accent
Euml => chr 203, #capital E, dieresis or umlaut mark
Igrave => chr 204, #capital I, grave accent
Iacute => chr 205, #capital I, acute accent
Icirc => chr 206, #capital I, circumflex accent
Iuml => chr 207, #capital I, dieresis or umlaut mark
ETH => chr 208, #capital Eth, Icelandic
Ntilde => chr 209, #capital N, tilde
Ograve => chr 210, #capital O, grave accent
Oacute => chr 211, #capital O, acute accent
Ocirc => chr 212, #capital O, circumflex accent
Otilde => chr 213, #capital O, tilde
Ouml => chr 214, #capital O, dieresis or umlaut mark
times => chr 215, #multiply sign
Oslash => chr 216, #capital O, slash
Ugrave => chr 217, #capital U, grave accent
Uacute => chr 218, #capital U, acute accent
Ucirc => chr 219, #capital U, circumflex accent
Uuml => chr 220, #capital U, dieresis or umlaut mark
Yacute => chr 221, #capital Y, acute accent
THORN => chr 222, #capital THORN, Icelandic
szlig => chr 223, #small sharp s, German (sz ligature)
agrave => chr 224, #small a, grave accent
aacute => chr 225, #small a, acute accent
acirc => chr 226, #small a, circumflex accent
atilde => chr 227, #small a, tilde
auml => chr 228, #small a, dieresis or umlaut mark
aring => chr 229, #small a, ring
aelig => chr 230, #small ae diphthong (ligature)
ccedil => chr 231, #small c, cedilla
egrave => chr 232, #small e, grave accent
eacute => chr 233, #small e, acute accent
ecirc => chr 234, #small e, circumflex accent
euml => chr 235, #small e, dieresis or umlaut mark
igrave => chr 236, #small i, grave accent
iacute => chr 237, #small i, acute accent
icirc => chr 238, #small i, circumflex accent
iuml => chr 239, #small i, dieresis or umlaut mark
eth => chr 240, #small eth, Icelandic
ntilde => chr 241, #small n, tilde
ograve => chr 242, #small o, grave accent
oacute => chr 243, #small o, acute accent
ocirc => chr 244, #small o, circumflex accent
otilde => chr 245, #small o, tilde
ouml => chr 246, #small o, dieresis or umlaut mark
divide => chr 247, #divide sign
oslash => chr 248, #small o, slash
ugrave => chr 249, #small u, grave accent
uacute => chr 250, #small u, acute accent
ucirc => chr 251, #small u, circumflex accent
uuml => chr 252, #small u, dieresis or umlaut mark
yacute => chr 253, #small y, acute accent
thorn => chr 254, #small thorn, Icelandic
yuml => chr 255, #small y, dieresis or umlaut mark
);
####################################################
# now fill in all the numbers to match themselves
####################################################
for $chr ( 0 .. 255 ) {
$entity{ '#' . $chr } = chr $chr;
}
}
#########################################################
# premature finish lest someone clip my signature
#########################################################
# NOW FOR SOME SAMPLE DATA -- Switch ARGV to DATA above
# to test
__END__
<title>Tom Christiansen's Mox.Perl.COM Home Page</title>
<!-- begin header -->
<A HREF="http://perl-ora.songline.com/universal/header.map"><IMG SRC="http://perl-ora.songline.com/graphics/header-nav.gif" HEIGHT="18" WIDTH="515" ALT="Nav bar" BORDER="0" usemap="#header-nav"></A>
<map name="header-nav">
<area shape="rect" alt="Perl.com" coords="5,1,103,17" href="http://www.perl.com/index.html">
<area shape="rect" alt="CPAN" coords="114,1,171,17" href="http://www.perl.com/CPAN/CPAN.html">
<area shape="rect" alt="Perl Language" coords="178,0,248,16" href="http://language.perl.com/">
<area shape="rect" alt="Perl Reference" coords="254,0,328,16" href="http://reference.perl.com/">
<area shape="rect" alt="Perl Conference" coords="334,0,414,17" href="http://perl-conf.songline.com">
<area shape="rect" alt="Programming Republic of Perl" coords="422,0,510,17" href="http://republic.perl.com">
</map>
<!-- end header -->
<BODY BGCOLOR=#ffffff TEXT=#000000>
<!--
<BODY BGCOLOR="#000000" TEXT="#FFFFFF"
LINK="#FFFF00" VLINK="#22AA22" ALINK="#0077FF">
!-->
<A NAME=TOP>
<CENTER>
<h3>
<A HREF=”#PERL”>perl</a> /
<A HREF=”#MAGIC”>magic</a> /
<A HREF=”#USENIX”>usenix</a> /
<A HREF=”#BOULDER”>boulder</a>
</h3>
<BR>
The word of the day is <i>nidificate</i>.
</CENTER>
Testing: E Ê Ä
</a>
<HR NOSHADE SIZE=3>
<A NAME=PERL>
<CENTER>
<h1>
<IMG SRC="/deckmaster/gifs/camel.gif" ALT="">
<font size=7>
Perl
</font>
<IMG SRC="/deckmaster/gifs/camel.gif" ALT="">
</h1>
</a>
DOCTYPE START1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"
-- This is an annoying comment > --
>
END1
DOCTYPE START2
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"
-- This is an annoying comment --
>
END2
<I>
<BLOCKQUOTE>
<DL><DT>A ship then new they built for him
<DD>of mithril and of elven glass…
</DL>
</I>
</BLOCKQUOTE>
</CENTER>
<HR size=3 noshade>
<BLOCKQUOTE>
Wow! I really can't believe that anyone has read this far
in this very long news posting about irregular expressions. :-)
Is anyone really still with me? If so, make my day and
drop me a piece of email.
</BLOCKQUOTE>
<UL>
<LI>
<A HREF="/CPAN/README.html">CPAN
(Comprehensive Perl Archive Network)</a> sites are replicated around the world; please choose
from <A HREF="/CPAN/CPAN.html">one near you</a>.
The <A HREF="/CPAN/modules/01modules.index.html">CPAN index</a
>
to the <A HREF="/CPAN/modules/00modlist.long.html">full modules file</a>
are also good places to look.
<LI><IMG SRC="/deckmaster/gifs/new.gif" WIDTH=26 HEIGHT=13 ALT="NEW">
Here's a table of perl and CGI-related books and publications, in either
<A HREF="/info/books.html"><SMALL>HTML</SMALL> 3.0 table format</a>
or else in
<A HREF="/info/books.txt">pre-formatted</a> for old browsers.
What’s missing from Perl’s regular expressions? Anything? Well, yes. The first is that they should be first-class objects. There are some really embarrassing optimization hacks to get around not having compiled regexps directly accessible. The /o flag I used above is just one of them. (I’m *not* talking about the study() function, which is a neat thing to turbo-ize your matching.) A much more egregious hack involving closures is demonstrated here using the match_any function, which itself returns a function to do the work:
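# Build one matcher for several patterns, then apply it to each input line.
$f = match_any('^begin', 'end$', 'middle');
while (<>) {
    print if &$f();
}
# match_any(PATTERNS): generate the source of an anonymous sub that tests
# $_ against each pattern in turn, then eval that source so each pattern
# is compiled only once.
sub match_any {
    die "usage: match_any pats" unless @_;
    my $code = <<EOCODE;
sub {
EOCODE
    # with more than five patterns, study $_ first
    $code .= <<EOCODE if @_ > 5;
    study;
EOCODE
    for my $pat (@_) {
        $code .= <<EOCODE;
    return 1 if /$pat/;
EOCODE
    }
    $code .= "}\n";
    print "CODE: $code\n";    # show the generated source
    my $func = eval $code;
    die "bad pattern: $@" if $@;
    return $func;
}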
That’s the kind of thing I just despise writing: the only thing worse would be not being able to do it at all. :-( First-class compiled regexps would surely help a great deal here.
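For comparison, Perl releases from 5.005 onward provide the qr// operator, which hands back a compiled pattern as an ordinary scalar. Assuming such a Perl, match_any could be sketched without generating source text and eval'ing it. The name match_any_qr and the sample lines are made up for this illustration:

```perl
use strict;

# Hypothetical rewrite of match_any, assuming qr// is available.
sub match_any_qr {
    die "usage: match_any_qr pats" unless @_;
    # compile each pattern once; a malformed pattern dies right here
    my @compiled = map { qr/$_/ } @_;
    # the closure tests its argument (or $_ if none) against each pattern
    return sub {
        my $text = @_ ? $_[0] : $_;
        for my $re (@compiled) {
            return 1 if $text =~ $re;
        }
        return 0;
    };
}

my $f = match_any_qr('^begin', 'end$', 'middle');
for my $line ("begin here", "in the middle", "nothing") {
    print "$line\n" if $f->($line);
}
```

The patterns live in a plain array of scalars, so they can be stored, passed around, and reused like any other value.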
Sometimes people expect backreferences to be forward references, as in the pattern /\1\s(\w+)/, which just isn’t the way it works. A related issue is that while lookaheads work, they are not lookbehinds, which can confuse people. This means /\n(?=\s)/ is ok, but you cannot use this for lookbehind: /(?!foo)bar/ will not find an occurrence of “bar” that is preceded by something which is not “foo”. That’s because the (?!foo) is just saying that the next thing cannot be “foo”, and it’s not, it’s a “bar”, so “foobar” will match.
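Both points are easy to check at the keyboard. The sketch below puts the capturing group before its backreference, and shows /(?!foo)bar/ matching inside “foobar”. The helper names doubled_word and lookahead_bar and the sample strings are made up for this illustration:

```perl
use strict;

# A backreference can only refer to a group that has already matched,
# so the group has to come first in the pattern.
sub doubled_word {
    my ($text) = @_;
    return $text =~ /\b(\w+)\s\1\b/ ? $1 : undef;
}

# (?!foo) looks ahead from the current position; it says nothing about
# what came before "bar".
sub lookahead_bar {
    my ($text) = @_;
    return $text =~ /(?!foo)bar/ ? 1 : 0;
}

print "doubled word: ", doubled_word("Paris in the the spring"), "\n";
print "foobar matched: ", lookahead_bar("foobar"), "\n";
```

On “foobar” the match succeeds at offset 3, where the next three characters are “bar” rather than “foo”.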
There isn’t really much support for user-defined character classes. You see a bit of that in the urlify program above. On the other hand, this might be the most clear way of writing it.
Another thing that would be nice to have is the ability to somehow specify a recursive match with nesting. That way you could pull out matching parens or braces or begin/end blocks etc. I don’t know what a good syntax for this might be. Maybe (?{…) for the opening one and (?}…) for the closing one, as in:
/\b(?{begin)\b.*\b(?}end)\b/i
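Short of such a syntax, nesting can be tracked by hand. A minimal sketch, assuming only round parentheses matter, walks the string with a depth counter. The helper name extract_balanced and the sample string are made up for this illustration:

```perl
use strict;

# Hypothetical helper: return the first balanced (...) group in $text,
# or undef if there is none or the parens never close.
sub extract_balanced {
    my ($text) = @_;
    my ($depth, $start) = (0, undef);
    while ($text =~ /([()])/g) {
        my $pos = pos($text) - 1;    # offset of the paren just matched
        if ($1 eq '(') {
            $start = $pos if $depth == 0;
            $depth++;
        }
        elsif ($depth > 0) {
            $depth--;
            return substr($text, $start, $pos - $start + 1) if $depth == 0;
        }
        # a stray ")" at depth 0 is simply skipped
    }
    return undef;
}

print extract_balanced("f(a, g(b, c), d) + h(e)"), "\n";
```

The pattern only finds the next paren; the counting that a recursive match would do lives in the surrounding loop.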
Finally, while it’s cool that perl’s patterns are 8-bit clean, will match strings even with null bytes in them, and have support for alternate 8-bit character sets, it would certainly make the world happy if there were full Unicode support.
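Escape Mission
Software: Escape Mission (version: N/A)
Category: Puzzle game
License: Freeware
[Editor: 宗文]
This is a puzzle game. The player's goal is to push aside the boxes blocking the character and find a path to the exit; once the character reaches the exit, the level is cleared. (The flashing brown tile is the exit.) At the bottom of the screen is a "MOVES" label with a row of small white dots beneath it. Each move costs one dot, and the more dots that remain when the level ends, the higher the score.
Only one box can be pushed at a time, and a box that has been pushed against a wall or the edge of the path cannot be pushed any further, so think carefully before pushing or you may trap yourself. The game also has obstacles. An iron revolving door, for example, cannot be turned while a box sits next to it; the box has to be moved away first. Floor tiles marked with a triangle can be crossed only by a particular character. These restrictions raise the difficulty and give the player more to puzzle over. Some levels require several characters to reach the exit, so they may have to cooperate to clear a path to it.
Controls:
1. Use the four arrow keys to move the character.
2. Press the space bar to switch between characters.
3. Left-click "RESET!" to restart the current level, at the cost of one life.
Download: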
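Super Pirate Isle
Software: Super Pirate Isle (version: N/A)
Category: Action game
License: Freeware
[Editor: 宗文]
As the game progresses, the player travels to a series of islands and guides the pirate to the treasure chests. A level is cleared only if every chest in it is found within the time limit, which also unlocks the next, harder level. If time runs out first, the whole game ends. The sooner a level is cleared, the larger the time bonus.
Many enemies roam the islands and get in the way of the search. The player can attack them with bombs, but after placing one must get clear quickly or risk being caught in the blast. Touching an enemy and being hit by a bomb each cost one life, so care is needed to clear a level. Take care when crossing to the other side of the map, because an enemy may be hiding in a corner and is easy to bump into.
Controls:
1. Use the four arrow keys to move the character.
2. Press the space bar to place a bomb.
Download: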
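Panik in Platform Peril
Software: Panik in Platform Peril (version: N/A)
Category: Action game
License: Freeware
[Editor: 宗文]
This is an action game. The player's goal is to find the key first, then locate the exit and unlock it with the key. That clears the level and opens up the next, harder one.
Fish bones are scattered along the way, and collecting as many as possible earns more points. When the character is attacked by an enemy or touches a trap, the power meter at the bottom of the screen drops; finding a carrot restores part of it. The player can throw a weapon at enemies to freeze them temporarily, but they start moving again after a short while.
Besides enemies there are various traps, such as prickly cacti and a fearsome mechanical boxing glove, which the player must carefully avoid. Jumping is the skill the player relies on most. Many places call for a string of jumps: in some spots, for instance, the only way up is to hop across several balloons, and since the balloons burst easily the player has to move fast to reach the destination.
Controls:
1. Use the left and right arrow keys to move the character.
2. The up arrow key jumps and the down arrow key crouches.
3. Press the space bar to throw the weapon, which temporarily freezes enemies.
Download: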
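Asteroids Revenge 3
Software: Asteroids Revenge 3 (version: N/A)
Category: Action game
License: Freeware
[Editor: 宗文]
The player controls a planet and fights off a host of enemies. Attacks are made by ramming enemies with the planet itself or by striking them with the small asteroids that surround it. There are many kinds of enemy. Some appear in groups and fire volleys of bullets at the player, while others are very good at dodging and take real effort to bring down. Some behave like bombs and set off a large explosion that does heavy damage to anything too close. Others generate a repelling force that makes it hard to get near enough to attack. Beating them is no easy task!
The player must protect the main planet and keep it from taking damage, and the surrounding asteroids can be used to block incoming bullets. After each level the player is offered a list of upgrades and may pick one, such as making the planet faster or larger. These upgrades are the key to getting through the later levels.
Controls:
1. Move the mouse or use the four arrow keys to move the planet.
2. The Z key pulls the surrounding asteroids in closer; the X key moves them farther from the planet.
3. The C key detaches the asteroids from the planet.
Download: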
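stay the distance
Software: stay the distance (version: N/A)
Category: Action game
License: Freeware
[Editor: 宗文]
This is a horse-racing game in which the player's goal is to be the first jockey across the finish line. Becoming champion is not easy, though! Many factors have to be weighed, and every one of them needs attention. Indicators are shown in the upper-right corner of the screen: the green bar shows the distance remaining to the finish, and the red bar shows how much stamina the horse has left. A horse with no stamina moves very slowly. "PACE", nearest the upper-right corner, shows the horse's speed. The faster the player's horse runs, the faster the stamina bar below it drains, so striking the right balance is up to the player's judgment!
The game has a "WHIPS" option, which whips the horse for a sharp burst of speed. The burst lasts only a moment before the speed drops back, and the whip can be used only three times, so choose the moment carefully. When an obstacle appears the horse has to be made to jump it, and the timing must be right or the rider will be thrown off!
Controls:
1. Use the up and down arrow keys to increase or decrease the horse's speed.
2. Use the left and right arrow keys to move the horse sideways.
3. Press the Ctrl key to use "WHIPS".
4. Press the space bar to make the horse jump.
Download: http://www.miniclip.com/games/stay-the-distance/en/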
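Super Chick Sisters
Software: Super Chick Sisters (version: N/A)
Category: Action game
License: Freeware
[Editor: 宗文]
This is an action game that plays very much like Super Mario. The player's ultimate goal is to defeat the final boss, but reaching him means getting through level after level and fighting many enemies. Chicks appear throughout the game; collect as many as possible, because collecting one hundred earns an extra life.
Some stones are marked with a question mark, and the character can strike them with her head. Sometimes this adds to the chick count, and sometimes it yields an item that makes the character bigger! Once the character has picked up the item and grown, being hit by an enemy shrinks her back to normal size without costing a life. Besides the many kinds of enemy, there is plenty of dangerous terrain and many traps to watch for, such as pits, vats of scalding oil, and devices that crush the character. The player has to jump over them or time a way past.
Many hidden spots hold items that add a life, or large numbers of chicks, so it pays to explore. Only two characters are available at the start; after completing the whole game the player receives a password, and entering it on the title screen unlocks another female character. As for controls, the left and right arrow keys move the character and the up arrow key jumps.
Download: http://www.kentuckyfriedcruelty.com/superchicksisters/superChickSisters.zip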