How MySQL Optimises ORDER BY

How MySQL Optimises ORDER BY



In some cases MySQL can uses index to satisfy an ORDER BY or GROUP BY request without doing any extra sorting.


The index can also be used even if the ORDER BY doesn’t match the index exactly, as long as all the unused index parts and all the extra are ORDER BY columns are constants in the WHERE clause. The following queries will use the index to resolve the ORDER BY / GROUP BY part:



SELECT * FROM t1 ORDER BY key_part1,key_part2,…
SELECT * FROM t1 WHERE key_part1=constant ORDER BY key_part2
SELECT * FROM t1 WHERE key_part1=constant GROUP BY key_part2
SELECT * FROM t1 ORDER BY key_part1 DESC,key_part2 DESC
SELECT * FROM t1 WHERE key_part1=1 ORDER BY key_part1 DESC,key_part2 DESC


Some cases where MySQL can not use indexes to resolve the ORDER BY: (Note that MySQL will still use indexes to find the rows that matches the WHERE clause):


* You are doing an ORDER BY on different keys: SELECT * FROM t1 ORDER BY key1,key2
* You are doing an ORDER BY using non-consecutive key parts. SELECT * FROM t1 WHERE key2=constant ORDER BY key_part2
* You are mixing ASC and DESC. SELECT * FROM t1 ORDER BY key_part1 DESC,key_part2 ASC
* The key used to fetch the rows are not the same one that is used to do the ORDER BY: SELECT * FROM t1 WHERE key2=constant ORDER BY key1
* You are joining many tables and the columns you are doing an ORDER BY on are not all from the first not-const table that is used to retrieve rows (This is the first table in the EXPLAIN output which doesn’t use a const row fetch method).
* You have different ORDER BY and GROUP BY expressions.
* The used table index is an index type that doesn’t store rows in order. (Like the HASH index in HEAP tables).


In the cases where MySQL have to sort the result, it uses the following algorithm:


* Read all rows according to key or by table scanning. Rows that don’t match the WHERE clause are skipped.
* Store the sort-key in a buffer (of size sort_buffer).
* When the buffer gets full, run a qsort on it and store the result in a temporary file. Save a pointer to the sorted block. (In the case where all rows fits into the sort buffer, no temporary file is created)
* Repeat the above until all rows have been read.
* Do a multi-merge of up to MERGEBUFF (7) regions to one block in another temporary file. Repeat until all blocks from the first file are in the second file.
* Repeat the following until there is less than MERGEBUFF2 (15) blocks left.
* On the last multi-merge, only the pointer to the row (last part of the sort-key) is written to a result file.
* Now the code in `sql/records.cc’ will be used to read through them in sorted order by using the row pointers in the result file. To optimise this, we read in a big block of row pointers, sort these and then we read the rows in the sorted order into a row buffer (read_rnd_buffer_size) .


You can with EXPLAIN SELECT … ORDER BY check if MySQL can use indexes to resolve the query. If you get Using filesort in the extra column, then MySQL can’t use indexes to resolve the ORDER BY. See section 5.2.1 EXPLAIN Syntax (Get Information About a SELECT).


If you want to have a higher ORDER BY speed, you should first see if you can get MySQL to use indexes instead of having to do an extra sorting phase. If this is not possible, then you can do:


* Increase the size of the sort_buffer_size variable.
* Increase the size of the read_rnd_buffer_size variable.
* Change tmpdir to point to a dedicated disk with lots of empty space. If you use MySQL 4.1 or later you can spread load between several physical disks by setting tmpdir to a list of paths separated by colon : (semicolon ; on Windows). They will be used in round-robin fashion. Note: These paths should end up on different physical disks, not different partitions of the same disk.


By default, MySQL sorts all GROUP BY x,y[,…] queries as if you specified ORDER BY x,y[,…] in the query as well. If you include the ORDER BY clause explicitly, MySQL optimises it away without any speed penalty, though the sorting still occurs. If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can supress sorting by specifying ORDER BY NULL:


INSERT INTO foo SELECT a,COUNT(*) FROM bar GROUP BY a ORDER BY NULL;


Learning the Java Language

Learning the Java Language

This trail covers the fundamentals of programming in the Java programming language.

 Object-Oriented Programming Concepts teaches you the core concepts behind object-oriented programming: objects, messages, classes, and inheritance. This lesson ends by showing you how these concepts translate into code. Feel free to skip this lesson if you are already familiar with object-oriented programming.


 Language Basics describes the traditional features of the language, including variables, arrays, data types, operators, and control flow.


 Classes and Objects describes how to write the classes from which objects are created, and how to create and use the objects.


 Interfaces and Inheritance describes interfaces—what they are, why you would want to write one, and how to write one. This section also describes the way in which you can derive one class from another. That is, how a subclass can inherit fields and methods from a superclass. You will learn that all classes are derived from the Object class, and how to modify the methods that a subclass inherits from superclasses.


 Numbers and Strings This lesson describes how to use Number and String objects The lesson also shows you how to format data for output.


 Generics are a powerful feature of the Java programming language. They improve the type safety of your code, making more of your bugs detectable at compile time.


 Packages are a feature of the Java programming language that help you to organize and structure your classes and their relationships to one another.


Packages

Packages

This lesson explains how to bundle classes and interfaces into packages, how to use classes that are in packages, and how to arrange your file system so that the compiler can find your source files.

Generics

Generics

Generics are a built-in language feature that will make your software more reliable. This lesson discusses the following topics:

Introduction


This section explains some common shortcomings associated with non-generic code. Specifically, it shows how certain kinds of bugs will crash an application at runtime, since they are not detectable by the compiler.

Generic Types


This section explains generic type declarations, type variables, type parameters, and type arguments. It also describes the naming conventions that are specific to generics.

Generic Methods and Constructors


This section shows how type parameters can be used to define generic methods and constructors.

Bounded Type Parameters


This section describes how type parameters can specify an upper bound that limits the kind of types that can be passed in.

Subtyping


This section describes how generic subtyping differs from non-generic subtyping.

Wildcards


This section continues the discussion of subtyping by describing bounded and unbounded wildcards.

Type Erasure


This section describes type erasure, raw types, and unchecked warnings.

Type Erasure

Type Erasure


When a generic type is instantiated, the compiler translates those types by a technique called type erasure — a process where the compiler removes all information related to type parameters and type arguments within a class or method. Type erasure enables Java applications that use generics to maintain binary compatibility with Java libraries and applications that were created before generics.


For instance, Box<String> is translated to type Box, which is called the raw type — a raw type is a generic class or interface name without any type arguments. This means that you can’t find out what type of Object a generic class is using at runtime. The following operations are not possible:


public class MyClass<E> {
public static void myMethod(Object item) {
if (item instanceof E) { //Compiler error

}
E item2 = new E(); //Compiler error
E[] iArray = new E[10]; //Compiler error
E obj = (E)new Object(); //Unchecked cast warning
}
}
The operations shown in bold are meaningless at runtime because the compiler removes all information about the actual type argument (represented by the type parameter E) at compile time.

Type erasure exists so that new code may continue to interface with legacy code. Using a raw type for any other reason is considered bad programming practice and should be avoided whenever possible.


When mixing legacy code with generic code, you may encounter warning messages similar to the following:


Note: WarningDemo.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
This can happen when using an older API that operates on raw types, as shown in the following WarningDemo program:
public class WarningDemo {
public static void main(String[] args){
Box<Integer> bi;
bi = createBox();
}

/**
* Pretend that this method is part of an old library,
* written before generics. It returns
* Box instead of Box<T>.
*/
static Box createBox(){
return new Box();
}
}

Recompiling with -Xlint:unchecked reveals the following additional information:
WarningDemo.java:4: warning: [unchecked] unchecked conversion
found : Box
required: Box<java.lang.Integer>
bi = createBox();
^
1 warning

Wildcards

Wildcards

Earlier we mentioned that English is ambiguous. The phrase “animal cage” can reasonably mean “all-animal cage”, but it also suggests an entirely different concept: a cage designed not for any kind of animal, but rather for some kind of animal whose type is unknown. In generics, an unknown type is represented by the wildcard character “?“.

To specify a cage capable of holding some kind of animal:

	Cage<? extends Animal> someCage = …;
Read “? extends Animal” as “an unknown type that is a subtype of Animal, possibly Animal itself”, which boils down to “some kind of animal”. This is an example of a bounded wildcard, where Animal forms the upper bound of the expected type. If you’re asked for a cage that simply holds some kind of animal, you’re free to provide a lion cage or a butterfly cage.





Note: It’s also possible to specify a lower bound by using the super keyword instead of extends. The code <? super Animal>, therefore, would be read as “an unknown type that is a supertype of Animal, possibly Animal itself”. You can also specify an unknown type with an unbounded wilcard, which simply looks like <?>. An unbounded wildcard is essentially the same as saying <? extends Object>.



While Cage<Lion> and Cage<Butterfly> are not subtypes of Cage<Animal>, they are in fact subtypes of Cage<? extends Animal>:

	someCage = lionCage; // OK
someCage = butterflyCage; // OK
So now the question becomes, “Can you add butterflies and lions directly to someCage?”. As you can probably guess, the answer to this question is “no”.
	someCage.add(king);	// compiler-time error
someCage.add(monarch); // compiler-time error
If someCage is a butterfly cage, it would hold butterflies just fine, but the lions would be able to break free. If it’s a lion cage, then all would be well with the lions, but the butterflies would fly away. So if you can’t put anything at all into someCage, is it useless? No, because you can still read its contents:
	void feedAnimals(Cage<? extends Animal> someCage) {
for (Animal a : someCage)
a.feedMe();
}
Therefore, you could house your animals in their individual cages, as shown earlier, and invoke this method first for the lions and then for the butterflies:
	feedAnimals(lionCage);
feedAnimals(butterflyCage);
Or, you could choose to combine your animals in the all-animal cage instead:
	feedAnimals(animalCage);

Generic Subtyping

Subtyping

As you already know, it’s possible to assign an object of one type to an object of another type provided that the types are compatible. For example, you can assign an Integer to an Object, since Object is one of Integer‘s supertypes:
    Object someObject = new Object();
Integer someInteger = new Integer(10);
someObject = someInteger; // OK
In object-oriented terminology, this is called an “is a” relationship. Since an Integer is a kind of Object, the assignment is allowed. But Integer is also a kind of Number, so the following code is valid as well:
    public void someMethod(Number n){
// method body omitted
}

someMethod(new Integer(10)); // OK
someMethod(new Double(10.1)); // OK


The same is also true with generics. You can perform a generic type invocation, passing Number as its type argument, and any subsequent invocation of add will be allowed if the argument is compatible with Number:

    Box<Number> box = new Box<Number>();
box.add(new Integer(10)); // OK
box.add(new Double(10.1)); // OK

Now consider the following method:

    public void boxTest(Box<Number> n){
// method body omitted
}
What type of argument does it accept? By looking at its signature, we can see that it accepts a single argument whose type is Box<Number>. But what exactly does that mean? Are you allowed to pass in Box<Integer> or Box<Double>, as you might expect? Surprisingly, the answer is “no”, because Box<Integer> and Box<Double> are not subtypes of Box<Number>.

Understanding why becomes much easier if you think of tangible objects — things you can actually picture — such as a cage:

	// A cage is a collection of things, with bars to keep them in.
interface Cage<E> extends Collection<E&gt$$




Note: The Collection interface is the root interface of the collection hierarchy; it represents a group of objects. Since a cage would be used for holding a collection of objects (the animals), it makes sense to include it in this example.



A lion is a kind of animal, so Lion would be a subtype of Animal:

	interface Lion extends Animal {}
Lion king = …;

Where we need some animal, we’re free to provide a lion:

	Animal a = king;

A lion can of course be put into a lion cage:

	Cage<Lion> lionCage = …;
lionCage.add(king);

and a butterfly into a butterfly cage:

	interface Butterfly extends Animal {}
Butterfly monarch = …;
Cage<Butterfly> butterflyCage = …;
butterflyCage.add(monarch);

But what about an “animal cage”? English is ambiguous, so to be precise let’s assume we’re talking about an “all-animal cage”:


	Cage<Animal> animalCage = …;
This is a cage designed to hold all kinds of animals, mixed together. It must have bars strong enough to hold in the lions, and spaced closely enough to hold in the butterflies. Such a cage might not even be feasible to build, but if it is, then:
	animalCage.add(king);
animalCage.add(monarch);

Since a lion is a kind of animal (Lion is a subtype of Animal), the question then becomes, “Is a lion cage a kind of animal cage? Is Cage<Lion> a subtype of Cage<Animal>?”. By the above definition of animal cage, the answer must be “no”. This is surprising! But it makes perfect sense when you think about it: A lion cage cannot be assumed to keep in butterflies, and a butterfly cage cannot be assumed to hold in lions. Therefore, neither cage can be considered an “all-animal” cage:

	animalCage = lionCage;	// compile-time error
animalCage = butterflyCage; // compile-time error

Without generics, the animals could be placed into the wrong kinds of cages, where it would be possible for them to escape.

Generic Bounded Type Parameters

Bounded Type Parameters

There may be times when you’ll want to restrict the kinds of types that are allowed to be passed to a type parameter. For example, a method that operates on numbers might only want to accept instances of Number or its subclasses. This is what bounded type parameters are for.

To declare a bounded type parameter, list the type parameter’s name, followed by the extends keyword, followed by its upper bound, which in this example is Number. Note that, in this context, extends is used in a general sense to mean either “extends” (as in classes) or “implements” (as in interfaces).



/**
* This version introduces a bounded type parameter.
*/
public class Box<T> {

private T t;

public void add(T t) {
this.t = t;
}

public T get() {
return t;
}

public <U extends Number> void inspect(U u){
System.out.println(“T: ” + t.getClass().getName());
System.out.println(“U: ” + u.getClass().getName());
}

public static void main(String[] args) {
Box<Integer> integerBox = new Box<Integer>();
integerBox.add(new Integer(10));
integerBox.inspect(“some text”); // error: this is still String!
}
}

By modifying our generic method to include this bounded type parameter, compilation will now fail, since our invocation of inspect still includes a String:
Box.java:21: <U>inspect(U) in Box<java.lang.Integer> cannot
be applied to (java.lang.String)
integerBox.inspect(“10”);
^
1 error

To specify additional interfaces that must be implemented, use the & character, as in:


<U extends Number & MyInterface>

Generic Methods and Constructors


Generic Methods and Constructors

Type parameters can also be declared within method and constructor signatures to create generic methods and generic constructors. This is similar to declaring a generic type, but the type parameter’s scope is limited to the method or constructor in which it’s declared.
/**
* This version introduces a generic method.
*/
public class Box<T> {

private T t;

public void add(T t) {
this.t = t;
}

public T get() {
return t;
}

public <U> void inspect(U u){
System.out.println(“T: ” + t.getClass().getName());
System.out.println(“U: ” + u.getClass().getName());
}

public static void main(String[] args) {
Box<Integer> integerBox = new Box<Integer>();
integerBox.add(new Integer(10));
integerBox.inspect(“some text”);
}
}

Here we’ve added one generic method, named inspect, that defines one type parameter, named U. This method accepts an object and prints its type to standard output. For comparison, it also prints out the type of T. For convenience, this class now also has a main method so that it can be run as an application.

The output from this program is:


T: java.lang.Integer
U: java.lang.String

By passing in different types, the output will change accordingly.


A more realistic use of generic methods might be something like the following, which defines a static method that stuffs references to a single item into multiple boxes:

    public static <U> void fillBoxes(U u, List<Box<U>> boxes) {
for (Box<U> box : boxes) {
box.add(u);
}
}

To use this method, your code would look something like the following:
    Crayon red = …;
List<Box<Crayon>> crayonBoxes = …;
The complete syntax for invoking this method is:
Box.<Crayon>fillBoxes(red, crayonBoxes);
Here we’ve explicitly provided the type to be used as U, but more often than not, this can be left out and the compiler will infer the type that’s needed:
Box.fillBoxes(red, crayonBoxes); // compiler infers that U is Crayon
This feature, known as type inference, allows you to invoke a generic method as you would an ordinary method, without specifying a type between angle brackets.

Generic Types

Generic Types


Let’s update our Box class to use generics. We’ll first create a generic type declaration by changing the code “public class Box” to “public class Box<T>“; this introduces one type variable, named T, that can be used anywhere inside the class. This same technique can be applied to interfaces as well. There’s nothing particularly complex about this concept. In fact, it’s quite similar to what you already know about variables in general. Just think of T as a special kind of variable, whose “value” will be whatever type you pass in; this can be any class type, any interface type, or even another type variable. It just can’t be any of the primitive data types. In this context, we also say that T is a formal type parameter of the Box class.

/**
* Generic version of the Box class.
*/
public class Box<T> {

private T t; // T stands for “Type”

public void add(T t) {
this.t = t;
}

public T get() {
return t;
}
}


As you can see, we’ve replaced all occurrences of Object with T. To reference this generic class from within your own code, you must perform a generic type invocation, which replaces T with some concrete value, such as Integer:


Box<Integer> integerBox;
You can think of a generic type invocation as being similar to an ordinary method invocation, but instead of passing an argument to a method, you’re passing a type argumentInteger in this case — to the Box class itself. Like any other variable declaration, this code does not actually create a new Box object. It simply declares that integerBox will hold a reference to a “Box of Integer“, which is how Box<Integer> is read.

An invocation of a generic type is generally known as a parameterized type.



To instantiate this class, use the new keyword, as usual, but place <Integer> between the class name and the parenthesis:


integerBox = new Box<Integer>();

Or, you can put the entire statement on one line, such as:


Box<Integer> integerBox = new Box<Integer>();

Once integerBox is initialized, you’re free to invoke its get method without providing a cast, as in BoxDemo3:


public class BoxDemo3 {

public static void main(String[] args) {
Box<Integer> integerBox = new Box<Integer>();
integerBox.add(new Integer(10));
Integer someInteger = integerBox.get(); // no cast!
System.out.println(someInteger);
}
}

Furthermore, if you try adding an incompatible type to the box, such as String, compilation will fail, alerting you to what previously would have been a runtime bug:
    BoxDemo3.java:5: add(java.lang.Integer) in Box<java.lang.Integer>
cannot be applied to (java.lang.String)
integerBox.add(“10”);
^
1 error

It’s important to understand that type variables are not actually types themselves. In the above examples, you won’t find T.java or T.class anywhere on the filesystem. Furthermore, T is not a part of the Box class name. In fact during compilation, all generic information will be removed entirely, leaving only Box.class on the filesystem. We’ll discuss this later in the section on Type Erasure


Also note that a generic type may have multiple type parameters, but each parameter must be unique within its declaring class or interface. A declaration of Box<T,T>, for example, would generate an error on the second occurrence of T, but Box<T,U>, however, would be allowed.



Type Parameter Naming Conventions

By convention, type parameter names are single, uppercase letters. This stands in sharp contrast to the variable naming conventions that you already know about, and with good reason: Without this convention, it would be difficult to tell the difference between a type variable and an ordinary class or interface name.

The most commonly used type parameter names are:



  • E – Element (used extensively by the Java Collections Framework)

  • K – Key

  • N – Number

  • T – Type

  • V – Value

  • S,U,V etc. – 2nd, 3rd, 4th types
You’ll see these names used throughout the Java SE API and the rest of this tutorial.