David Marek

2012-05-14 22:12:34 UTC

Hi,

I have been working on the multilayer perceptron and have a basic
implementation working. You can see it at
https://github.com/davidmarek/scikit-learn/tree/gsoc_mlp

The most important part is the SGD implementation, which can be found here:
https://github.com/davidmarek/scikit-learn/blob/gsoc_mlp/sklearn/mlp/mlp_fast.pyx

I have encountered a few problems and I would like to know your opinion.

1) There are classes like SequentialDataset and WeightVector which are
used in the SGD code for linear_model, but I am not sure whether I should
use them here as well. I have to do more with samples and weights than
just multiply and add them together. With those classes I wouldn't be
able to use numpy functions like tanh or do batch updates, would I? What
do you think? Am I missing something that would let me do everything I
need with SequentialDataset? I implemented my own LossFunction because I
need a vectorized version; I think that is the same problem.

2) I used Andreas' implementation as inspiration, and I am not sure
I understand some parts of it:

* Shouldn't the bias vector be initialized with ones instead of
  zeros? I guess there is no difference.

* I am not sure why the bias is updated with:

      bias_output += lr * np.mean(delta_o, axis=0)

  Shouldn't it instead be the following?

      bias_output += lr / batch_size * np.mean(delta_o, axis=0)

* Shouldn't the backward step for computing delta_h be:

      delta_h[:] = np.dot(delta_o, weights_output.T) * hidden.doutput(x_hidden)

  where hidden.doutput is the derivative of the activation function for
  the hidden layer?
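
The batch update discussed in points 1 and 2 can be sketched with plain
NumPy for a network with a single tanh hidden layer. This is only an
illustration under stated assumptions, not the code from either
implementation: the squared loss, the linear output layer, the sign
convention delta_o = y - output, the helper names tanh_doutput,
squared_loss and batch_step, and all sizes and the learning rate are
hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and learning rate, chosen only for illustration.
batch_size, n_features, n_hidden, n_outputs = 4, 3, 5, 2
lr = 0.1

X = rng.normal(size=(batch_size, n_features))
y = rng.normal(size=(batch_size, n_outputs))

weights_hidden = rng.normal(scale=0.1, size=(n_features, n_hidden))
bias_hidden = np.zeros(n_hidden)
weights_output = rng.normal(scale=0.1, size=(n_hidden, n_outputs))
bias_output = np.zeros(n_outputs)


def tanh_doutput(x_hidden):
    """Derivative of tanh, evaluated at the hidden pre-activations."""
    return 1.0 - np.tanh(x_hidden) ** 2


def squared_loss(output, target):
    """Vectorized squared loss: one value per sample in the batch."""
    return 0.5 * np.sum((output - target) ** 2, axis=1)


def batch_step(X, y, weights_hidden, bias_hidden,
               weights_output, bias_output, lr):
    """One batch gradient step; the parameter arrays are updated in place."""
    n_samples = X.shape[0]

    # Forward pass over the whole batch at once.
    x_hidden = np.dot(X, weights_hidden) + bias_hidden
    a_hidden = np.tanh(x_hidden)
    output = np.dot(a_hidden, weights_output) + bias_output  # linear output

    loss = squared_loss(output, y).mean()

    # Assumed sign convention: deltas are negative gradients, hence "+=" below.
    delta_o = y - output
    # Backward step: propagate through the output weights, then multiply
    # by the derivative of the hidden activation.
    delta_h = np.dot(delta_o, weights_output.T) * tanh_doutput(x_hidden)

    # np.dot sums over the batch, so the weight updates divide by n_samples;
    # np.mean already averages over the batch, so the bias updates do not.
    weights_output += lr * np.dot(a_hidden.T, delta_o) / n_samples
    bias_output += lr * np.mean(delta_o, axis=0)
    weights_hidden += lr * np.dot(X.T, delta_h) / n_samples
    bias_hidden += lr * np.mean(delta_h, axis=0)
    return loss, delta_o, delta_h


losses = []
for _ in range(20):
    loss, delta_o, delta_h = batch_step(
        X, y, weights_hidden, bias_hidden, weights_output, bias_output, lr)
    losses.append(loss)

print("first loss:", losses[0], "last loss:", losses[-1])
```

In this sketch np.mean(delta_o, axis=0) is the same as
np.sum(delta_o, axis=0) / batch_size, so the bias update is already
averaged over the batch without a separate division by batch_size.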

I hope my questions are not too stupid. Thank you.

David
