I'd like to build a recommendation system

Status
Not open for further replies.

marfle

Registered Member
Hi folks, I'm a newbie to these forums (an acquisition brought me over), and it seems like a pretty swell place.

For several different projects I'm working on, I need a recommendation system similar to Amazon's item-to-item similarity mappings. I've been reading some technical articles on how to do it, and I have a printed copy of Amazon's patent (6266649) that I'm trying to read. As I understand it, for every item in their catalog there is a row, with a number of columns equal to the number of items in the system.

So for 1,000 objects in a set, you'd have 1,000 rows in a table, each with 1,000 columns relating that item to every other item, with a percentage noting how similar the two are.

First of all, I'm certainly no database genius, so at what point would the system be too big to work with, or would that not be an issue?

Secondly, I need a little guidance on how to develop the algorithm that would take a list of objects from individual users and map that to the percentage similarity table.

I stumbled upon movielens.umn.edu, and it's pretty nifty, but it gives no insight into how they accomplish what they're doing.

Any pointers or suggestions would be greatly appreciated. Obviously I'm not looking for anyone to write any code for me; I just need some ideas.

Thanks a lot,
Andrew Triboletti
 
Well, if you go with what you described and keep information for every pairing, you'll get pretty big pretty quick. Say you use 8 bytes per pairing to rate how similar two items are. With 1,000 items, that's roughly 8 MB (1000*1000*8 = 8,000,000 bytes). Go up to 10,000 items and that's on the order of 800 MB (10000*10000*8 = 800,000,000 bytes). While I can't speak for the point at which a database becomes 'too big', I can say that you're going to need a pretty decent database management system to handle something on the order of a few hundred MB or more. If I had to guess (and it's purely a guess), I would say that Amazon probably doesn't keep pairings for every single item, but rather puts them into groups somehow. Of course, they still have to figure out some similarity measure to put them into groups, so perhaps not.
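To make the arithmetic above easy to repeat for other catalog sizes, here is a small sketch. The function names are made up for illustration, and the 8 bytes per pairing is the same assumption as in the estimate above. The second function assumes the similarity score is symmetric (a-to-b equals b-to-a), which may or may not hold for your measure; if it does, you only need to store each unordered pair once.

```python
def dense_table_bytes(n_items: int, bytes_per_pair: int = 8) -> int:
    """Bytes for a full n x n table: one score per ordered pair of items."""
    return n_items * n_items * bytes_per_pair


def symmetric_table_bytes(n_items: int, bytes_per_pair: int = 8) -> int:
    """Bytes if each unordered pair is stored once (assumes symmetric scores)."""
    return n_items * (n_items - 1) // 2 * bytes_per_pair


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        full_mb = dense_table_bytes(n) / 1e6
        half_mb = symmetric_table_bytes(n) / 1e6
        print(f"{n:>7} items: {full_mb:>10,.0f} MB full, {half_mb:>10,.0f} MB symmetric")
```

Either way the size grows with the square of the item count, so going from 10,000 to 100,000 items multiplies the storage by 100.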
The way I understand it is that when you look at item a on Amazon, it sees that 90% of the people who bought that item also bought item b, but only 10% bought item c. I think it also looks at demographics like location and age group, and makes recommendations with a similar method. I haven't read the patent, so I'm not entirely sure.
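The "people who bought a also bought b" percentage can be computed directly from purchase histories. This is only a sketch of that one idea, not a description of what Amazon actually does; the function name and the sample data are hypothetical.

```python
def also_bought_fraction(baskets, item_a, item_b):
    """Fraction of customers who bought item_a that also bought item_b."""
    buyers_of_a = [basket for basket in baskets if item_a in basket]
    if not buyers_of_a:
        return 0.0  # nobody bought item_a, so there is nothing to compare
    both = sum(1 for basket in buyers_of_a if item_b in basket)
    return both / len(buyers_of_a)


# Hypothetical purchase histories, one set of items per customer.
baskets = [
    {"a", "b"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b"},
]

if __name__ == "__main__":
    print(also_bought_fraction(baskets, "a", "b"))  # 3 of the 4 buyers of a
    print(also_bought_fraction(baskets, "a", "c"))  # 1 of the 4 buyers of a
```

Note that this measure is not symmetric: the fraction of a-buyers who bought b is generally different from the fraction of b-buyers who bought a.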
A naive approach might go something like this: whenever someone buys an item (or says they liked this movie, or whatever), you bump up the similarity score between the purchased item a and every item X that the same user purchased previously. You'll also keep track of the purchase for that user so you can bump up the score for this item when they buy something else in the future. You'll also need to normalize the scores, so that items lots of people buy don't look similar to everything.
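The naive approach above could be sketched roughly like this. Everything here is an assumption for illustration: the class and method names are invented, the data lives in in-memory dictionaries rather than a database, and the normalization is cosine-style (co-purchase count divided by the square root of the product of the two items' buyer counts), which is just one common choice. The `recommend` method is one simple way to map a user's list of items onto the similarity scores.

```python
from collections import defaultdict
from math import sqrt


class CoPurchaseModel:
    """Naive item-to-item similarity built from co-purchase counts."""

    def __init__(self):
        self.history = defaultdict(set)     # user -> items bought so far
        self.pair_count = defaultdict(int)  # (item, item) -> co-purchase count
        self.item_count = defaultdict(int)  # item -> number of distinct buyers

    def record_purchase(self, user, item):
        if item in self.history[user]:
            return  # count each user at most once per item
        for other in self.history[user]:
            # Bump the score between the new item and each earlier purchase,
            # in both directions so lookups work either way round.
            self.pair_count[(item, other)] += 1
            self.pair_count[(other, item)] += 1
        self.history[user].add(item)
        self.item_count[item] += 1

    def similarity(self, a, b):
        """Normalized score between 0 and 1 (cosine over the sets of buyers)."""
        co = self.pair_count.get((a, b), 0)
        if co == 0:
            return 0.0
        return co / sqrt(self.item_count[a] * self.item_count[b])

    def recommend(self, items, top_n=5):
        """Rank items the user doesn't have by summed similarity to `items`."""
        owned = set(items)
        scores = defaultdict(float)
        for a, b in self.pair_count:
            if a in owned and b not in owned:
                scores[b] += self.similarity(a, b)
        return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical purchase log: (user, item) in the order the purchases happened.
model = CoPurchaseModel()
for user, item in [("u1", "a"), ("u1", "b"), ("u2", "a"),
                   ("u2", "b"), ("u3", "a"), ("u3", "c")]:
    model.record_purchase(user, item)

if __name__ == "__main__":
    print(model.recommend(["a"]))
```

Because only pairs that have actually been bought together get an entry, this stores far fewer scores than a full table with a column for every item.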
I'm sure much more sophisticated approaches are possible. Maybe that can get you started.
 