Maheshakya Wijewardena

2014-03-04 16:09:48 UTC

Hi,

Currently, unsupervised neighbor search is implemented in

sklearn.neighbors. There are three algorithms used to perform neighbor

search.

- kd tree

- ball tree

- brute force method

kd tree and ball tree are implemented separately as cython implementations.

Both of them use binary tree(binary_tree.pxi). In the NeighborBass class,

above implementations are used.

As Locality Sensitivity Hashing will also be used in approximating nearest

neighbors, its' implementation should also be a part of sklearn.neighbors.

But Locality sensitivity hashing algorithms do not involve binary tree

implementation. Therefore it has to be implemented separately and be used

in sklearn.neighbors as another algorithm.Is this approach correct?

I'm trying to implement a base LSH class and the hashing algorithms as

cython implementations. To approximate neighbor search using those

algorithms, multiple hash tables will be created with those hash functions

so, for that an efficient way to store is required.

These are some of the issues I found disturbing while I was trying to

implement LSH to approximate neighbor search. Is this an appropriate

approach to implement this or are there any other better ways?

Regards,

Maheshakya,

Currently, unsupervised neighbor search is implemented in

sklearn.neighbors. There are three algorithms used to perform neighbor

search.

- kd tree

- ball tree

- brute force method

kd tree and ball tree are implemented separately as cython implementations.

Both of them use binary tree(binary_tree.pxi). In the NeighborBass class,

above implementations are used.

As Locality Sensitivity Hashing will also be used in approximating nearest

neighbors, its' implementation should also be a part of sklearn.neighbors.

But Locality sensitivity hashing algorithms do not involve binary tree

implementation. Therefore it has to be implemented separately and be used

in sklearn.neighbors as another algorithm.Is this approach correct?

I'm trying to implement a base LSH class and the hashing algorithms as

cython implementations. To approximate neighbor search using those

algorithms, multiple hash tables will be created with those hash functions

so, for that an efficient way to store is required.

These are some of the issues I found disturbing while I was trying to

implement LSH to approximate neighbor search. Is this an appropriate

approach to implement this or are there any other better ways?

Regards,

Maheshakya,

--

Undergraduate,

Department of Computer Science and Engineering,

Faculty of Engineering.

University of Moratuwa,

Sri Lanka

Undergraduate,

Department of Computer Science and Engineering,

Faculty of Engineering.

University of Moratuwa,

Sri Lanka