Ryan R. Rosario
2015-12-17 07:26:16 UTC
Hi,
I have a very large dense numpy matrix. To avoid running out of RAM, I use np.float32 as the dtype instead of the default np.float64 on my system.
When I do an L1 normalization of the rows (axis=1) in my matrix in-place (copy=False), I frequently get rows that do not sum to 1. Since these are probability distributions that I pass to np.random.choice, these must sum to exactly 1.0.
import numpy as np
from sklearn import preprocessing as pp

# `term` is the large dense float32 matrix
pp.normalize(term, norm='l1', axis=1, copy=False)
sums = term.sum(axis=1)
sums[np.where(sums != 1)]
array([ 0.99999994,  0.99999994,  1.00000012, ...,  0.99999994,
        0.99999994,  0.99999994], dtype=float32)
I wrote some code to manually add or subtract the small difference from 1 in each row, which fixes some rows, but others still do not sum to 1.
Is there a way to avoid this problem?
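For reference, the effect reproduces without my data. Here is a minimal sketch with a random matrix standing in for my real one (the plain-NumPy division mimics what pp.normalize does for norm='l1'; the shape and seed are arbitrary):

```python
import numpy as np

# Hypothetical stand-in for my real matrix: random float32 rows
rows = np.random.RandomState(0).rand(1000, 50).astype(np.float32)

# Equivalent of pp.normalize(rows, norm='l1', axis=1) for non-negative data:
# divide each row by its sum, all in float32
probs = rows / rows.sum(axis=1, keepdims=True)

sums = probs.sum(axis=1)
print(sums.dtype)                   # float32
print(np.count_nonzero(sums != 1))  # typically nonzero: many rows miss 1.0
```

The sums are all within float32 rounding of 1.0, but rarely exactly 1.0, which is (as far as I can tell) why np.random.choice rejects them.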
— Ryan
------------------------------------------------------------------------------