Co-occurrence Approach to an Item Based Recommender Update
In a previous post I talked about a Co-occurrence Approach to an Item Based Recommender, which used the Math.NET Numerics library. The library has since been updated to version 2.3.0. This version let me update the code to read the sparse matrix entries more efficiently, so I have updated the code to reflect these library changes:
https://code.msdn.microsoft.com/Co-occurrence-Approach-to-57027db7
The changes in the new Math.NET Numerics library concern how Vector and Matrix elements are stored. I can now access this storage directly and use the Compressed Sparse Row (CSR) matrix format to access the sparse matrix elements more efficiently.
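The following minimal sketch illustrates the CSR layout by compressing a small dense matrix into the three arrays. The `toCsr` helper is hypothetical and uses ordinary F# arrays rather than Math.NET's SparseCompressedRowMatrixStorage. It also assumes the common convention of storing one more row pointer than there are rows, which the library's own layout may not follow in every version.

```fsharp
// Hypothetical helper (plain arrays, not Math.NET's storage class):
// compress a dense 2D array into CSR form as (values, columnIndices, rowPointers).
// rowPointers has rows + 1 entries; the extra sentinel marks the end of the last row.
let toCsr (dense: float[,]) =
    let rows = Array2D.length1 dense
    let cols = Array2D.length2 dense
    let values = ResizeArray<float>()
    let columnIndices = ResizeArray<int>()
    let rowPointers = Array.zeroCreate<int> (rows + 1)
    for r in 0 .. rows - 1 do
        // Row r starts at the number of values stored so far
        rowPointers.[r] <- values.Count
        for c in 0 .. cols - 1 do
            // Only non-zero entries are stored, together with their column index
            if dense.[r, c] <> 0.0 then
                values.Add(dense.[r, c])
                columnIndices.Add(c)
    // Sentinel: one past the last stored value
    rowPointers.[rows] <- values.Count
    values.ToArray(), columnIndices.ToArray(), rowPointers

// A 3x3 example: row 0 = [0; 3; 0], row 1 = [1; 0; 2], row 2 is empty
let dense = array2D [ [ 0.0; 3.0; 0.0 ]; [ 1.0; 0.0; 2.0 ]; [ 0.0; 0.0; 0.0 ] ]
let values, columnIndices, rowPointers = toCsr dense
printfn "values = %A" values
printfn "columnIndices = %A" columnIndices
printfn "rowPointers = %A" rowPointers
```

For this matrix the arrays are values = [|3.0; 1.0; 2.0|], columnIndices = [|1; 0; 2|] and rowPointers = [|0; 1; 3; 3|]. The repeated 3 shows that the empty last row takes up no space in the values array.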
The original code accessed the elements of the sparse matrix with a simple traversal, reading every column of the item's row:
```fsharp
let getQueue (products:int array) =
    // Define the priority queue and lookup table
    let queue = PriorityQueue(coMatrix.ColumnCount)
    let lookup = HashSet(products)
    // Add the items into a priority queue
    products
    |> Array.iter (fun item ->
        let itemIdx = item - offset
        if itemIdx >= 0 && itemIdx < coMatrix.ColumnCount then
            seq {
                for idx = 0 to (coMatrix.ColumnCount - 1) do
                    let productIdx = idx + offset
                    let item = coMatrix.[itemIdx, idx]
                    if (not (lookup.Contains(productIdx))) && (item > 0.0) then
                        yield KeyValuePair(item, productIdx)
            }
            |> queue.Merge)
    // Return the queue
    queue
```
Now that the storage is accessible, I can read just the stored (non-zero) sparse values, which is more efficient:
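```fsharp
open System.Collections.Generic
open MathNet.Numerics.LinearAlgebra.Storage

// coMatrix, offset and PriorityQueue are defined elsewhere in the recommender
let getQueue (products:int array) =
    // Define the priority queue and lookup table
    let queue = PriorityQueue(coMatrix.ColumnCount)
    let lookup = HashSet(products)
    // Access the underlying Compressed Sparse Row storage directly
    let sparse = coMatrix.Storage :?> SparseCompressedRowMatrixStorage<double>
    // Add the items into a priority queue
    products
    |> Array.iter (fun item ->
        let itemIdx = item - offset
        if itemIdx >= 0 && itemIdx < coMatrix.RowCount then
            // A row's values run from its row pointer up to the next row's pointer;
            // with no following pointer, the last row ends at ValueCount
            let startI = sparse.RowPointers.[itemIdx]
            let endI =
                if itemIdx + 1 < sparse.RowPointers.Length then sparse.RowPointers.[itemIdx + 1] - 1
                else sparse.ValueCount - 1
            seq {
                for idx = startI to endI do
                    let productIdx = sparse.ColumnIndices.[idx] + offset
                    let item = sparse.Values.[idx]
                    if (not (lookup.Contains(productIdx))) && (item > 0.0) then
                        yield KeyValuePair(item, productIdx)
            }
            |> queue.Merge)
    // Return the queue
    queue
```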
In the new version of the code, the Values array provides access to the underlying non-empty values. The RowPointers array gives the index in Values at which each row starts, so a row's entries run up to the start of the next row (for the final row, up to the total number of stored values). Finally, the ColumnIndices array holds the column index corresponding to each value.
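The row lookup described above can be illustrated with plain arrays. This is a minimal sketch, not the recommender's actual code: the `rowCandidates` helper, the sample arrays and the offset of 100 are hypothetical, and it assumes a row-pointer array with one extra trailing entry so that every row ends where the next one begins. It mirrors the filtering in getQueue (skip products already in the basket, keep positive scores) but returns a sequence of tuples instead of merging into a priority queue.

```fsharp
open System.Collections.Generic

// Hypothetical helper: yield (score, productId) pairs from one CSR row,
// skipping products already in the basket. Assumes rowPointers has rows + 1 entries.
let rowCandidates (values: float[]) (columnIndices: int[]) (rowPointers: int[]) (offset: int) (basket: HashSet<int>) (row: int) =
    seq {
        // The row's entries lie between its pointer and the next row's pointer
        for idx in rowPointers.[row] .. rowPointers.[row + 1] - 1 do
            let productId = columnIndices.[idx] + offset
            let score = values.[idx]
            if score > 0.0 && not (basket.Contains(productId)) then
                yield (score, productId)
    }

// A 3x3 matrix in CSR form: row 0 = [0; 3; 0], row 1 = [1; 0; 2], row 2 is empty
let values = [| 3.0; 1.0; 2.0 |]
let columnIndices = [| 1; 0; 2 |]
let rowPointers = [| 0; 1; 3; 3 |]
// Product IDs start at 100 (the offset), and product 100 is already in the basket
let basket = HashSet<int>([ 100 ])
let candidates = rowCandidates values columnIndices rowPointers 100 basket 1 |> List.ofSeq
printfn "%A" candidates
```

Here only product 102 (score 2.0) survives for row 1, because product 100 is already in the basket. If the row-pointer array instead holds exactly one entry per row, as the special handling of the last row in the code above suggests, the final row has to end at the total count of stored values rather than at a following pointer.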
Apart from this change, the library's usage is effectively unchanged, including the MapReduce code (postings can be found here), as it uses a collection of Vector types. I did, however, update the job submission scripts.