Skip to main content

Encoding multi-dimensional value into 1-D with preserving closeness

After a very long time, this will be my first post. I hope that it will not be the last, and the rest will come.

For a project about taxi trajectory prediction, I have applied k-NN algorithm on coordinate sequence of taxi trips. Did I have promising results? Well, it is arguable, but I have discovered some good techniques. One of them is Z-order curve (Figure 1).

Figure 1: Four levels of Z-ordering (taken from Wikipedia page)

I was looking for a technique to decrease dimensionality of 2-D coordinate representation (lat-lon) into 1-D. But on this 1-D value, I would like to apply k-NN algorithm. For this purpose, 1-D values must preserve the closeness of two different coordinates. If I do linear ordering of latitude and longitude values, I couldn't achieve this goal, because closeness will depend on only one value which is placed on the right-most. Let's assume I do linear ordering as lat-lon:

Latitude : +47.312
Longitude: +35.02

To represent these float values in int, we multiply them with 1000.

Latitude (int) : 47312
Longitude(int) : 35020
 
Latitude  (Binary) : 1011100011010000
Longitude (Binary) : 1000100011001100

Linear Representation (bin) : 10111000110100001000100011001100
Linear Representation (int) : 3100674252
It is sensitive to the closeness of longitude values, but same situation does not apply for latitude values. Let's assume we have another coordinate (47.313, 35.02) in (lat,lon) representation. Linear representation of this coordinate will be 3,100,739,788. The distance with the one in the example is

|3100674252-3100739788| = 65,536
For such close coordinates, it is not an ignorable distance. Obviously, it will not claim closeness. Therefore, encoding coordinates with linear ordering is not sensible to use in k-NN.

On the other hand, Z-ordering brings a novel approach to address such problem. In this method, an index value (k) is assigned to each one(e.g. index values are 0 and 1 for 2-D). Then, each value is represented in binary way. Each bit in index i of the binary representation is placed into (2*i+k)th index of encoded value. As an example:

Latitude : +47.312
Longitude: +35.02

To represent these float values in int, we multiply them with 1000.

Latitude (int) : 47312
Longitude(int) : 35020
 
Latitude  (Binary) : 1011100011010000
Longitude (Binary) : 1000100011001100

Z-Order Representation (bin) : 11001010110000001111001001010000

Z-Order Representation (int) : 3401642576
Let's use same coordinate (47.313, 35.02) to check distance in linear representation method. Its Z-order representation will be 3,401,642,578. The distance with the one in the example is

|3401642576-3401642578| = 2
By following this way, I have encoded all coordinates (latitude and longitude pairs) into 1-D values. Since closeness property among coordinates is preserved, it is suitable to apply k-NN algorithm on such data. It gives considerably good performance improvement with accuracy.

This method is also applicable on N-dimension values.  For value N, assign an index (k) to each number from beginning 0 to N-1. Then, bits of each number will be placed at (2*i+k)th index. One major drawback is that the final value will contain many number of bits. Therefore, it would be necessary to diminish the numbers into smaller values. For example, if the value range is 40000-42000, then it can be represented as 0-2000. By this way, you will be able to decode N-dimensions into lower number of bits.

Don't hesitate to mention any improvements if I forget to mention. What could be the other ways to achieve this goal? Any better/alternative approach is welcome. Just drop a comment. I will try to write a post about sensible ones :)

Comments

  1. The Hilbert curve might be interesting too, it achieves similar outcomes in traversing a 2D space in 1D, but doesn't have the large jumps of the Z-order curve.

    ReplyDelete
    Replies
    1. All space filling curves will have discontinuities.

      Delete
    2. I remember I had to extract pixel brightness information from the iris of the eye, using Hilbert curves, for a faculty project. I tried writing my own algorithm but ended up using somebody's code for generating hilbert curves in matlab and adapted that.

      Delete
    3. This comment has been removed by the author.

      Delete
    4. It appears that neither the Z curve nor the Hilbert curve preserves closeness of the 2D coordinates. There appear to be points arbitrarily close together in the 2D space that are not close together in Z coordinates and the same is true for Hilbert coordinates.

      In the case of the Z curve, the problem is that points very close together don't only differ in the lower-order bits of their coordinate values.

      Going from (000111111111, 000000000000) to (001000000000, 000000000000) is a tiny jump (in the x-axis), but the Z curve coordinates are 000000101010101010101010 and 000010000000000000000000 which are a long way apart.

      The Hilbert curve does preserve locality _in the opposite direction_. So, points with close Hilbert coordinates come from close points in 2D. The Z-curve also does this, but slightly more roughly.

      Delete
  2. What was the project you were working on?

    ReplyDelete
    Replies
    1. There was a contest on Kaggle for taxi trajectory prediction. https://www.kaggle.com/c/pkdd-15-predict-taxi-service-trajectory-i

      Delete
  3. Interleaving latitude & longitude digits is also known as Peano coding. Named after Guiseppe Peano the mathematician.

    ReplyDelete
  4. One interesting solution to compare is Google's S2 library http://blog.christianperone.com/2015/08/googles-s2-geometry-on-the-sphere-cells-and-hilbert-curve/

    ReplyDelete
  5. Why not try multidimensional scaling? You can compute the distance matrix between points, since you have the lat/lon. If you have noisy lat/lon and/or incomplete information, you could use something like stochastic triplet embeddings.

    ReplyDelete

Post a Comment

Popular posts from this blog

Integration of MuPDF Project as a Library into an Android Studio Project

I have needed to use MuPDF library in my android project. After some research, I have seen that there are many integration tutorials but, but integrated projects are developed on Eclipse. For projects on AndroidStudio+Gradle, there is no example. I mean there is no specific example which exactly refers to this issue. So, after achieving my goal, I want to share the steps publicly so that it can be reused by others.

Migration from Proxmox to Openstack

I needed to migrate virtual machines in proxmox to openstack. VMs are in raw format. I needed to take some actions for a succesfull migration. I have perform all actions on Ubuntu 12.04 with virt-manager. qemu-kvm is installed. Here is the list of actions that I took: First, close the machine and copy the image file into your Ubuntu. Convert raw image to qcow2 format: qemu-img convert -O qcow2 image1.raw image1.qcow2 You need the image in qcow2 format for compatibility with openstack platform.  Open the converted image in virt-manager. Before opening, edit disk options. Under ' advanced options ' section, select ' qcow2 ' as ' storage forma t '. Start the virtual machine. You should see the login screen soon. (If you don't set storage format, vm will not find a bootable device. )   If everything is ok so far, close the vm. Take qcow2 image and upload it into glance. It may take time depending on size of it. After this process is completed, open a...

How to avoid API-level warning of Android Studio

Before giving the solution, let's start with a scenario. setSelectionFromTop() is a new method in Android Lollipop API. This method is basically beneficial to precisely keep scroll state of a ListView. By keeping that info, a developer can go back to old scroll state after doing some operation like data set change. You are aware of API level and you do your control before you call this function: if (currentapiVersion >= Build.VERSION_CODES.LOLLIPOP) { srlistview.setSelectionFromTop(index, top); } But if project minSdk is set to a lower level (in this case it is 15), this warning will still be displayed in Android Studio: