Collaborative filtering

Collaborative filtering (CF) is a technique used by recommender systems.^[1] Collaborative filtering has two senses, a narrow one and a more general one.^[2]

In the newer, narrower sense, collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a different issue than that of a randomly chosen person. For example, a collaborative filtering recommendation system for preferences in television programming could make predictions about which television show a user should like given a partial list of that user's tastes (likes or dislikes).^[3] These predictions are specific to the user, but use information gleaned from many users. This differs from the simpler approach of giving an average (non-specific) score for each item of interest, for example based on its number of votes.

In the more general sense, collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc.^[2] Applications of collaborative filtering typically involve very large data sets. Collaborative filtering methods have been applied to many different kinds of data including: sensing and monitoring data, such as in mineral exploration, environmental sensing over large areas or multiple sensors; financial data, such as financial service institutions that integrate many financial sources; or in electronic commerce and web applications where the focus is on user data, etc. The remainder of this discussion focuses on collaborative filtering for user data, although some of the methods and approaches may apply to the other major applications as well.

Overview

The growth of the Internet has made it much more difficult to effectively extract useful information from all the available online information.^{[according to whom?]} The overwhelming amount of data necessitates mechanisms for efficient information filtering.^{[according to whom?]} Collaborative filtering is one of the techniques used for dealing with this problem.

The motivation for collaborative filtering comes from the idea that people often get the best recommendations from someone with tastes similar to themselves.^{[citation needed]} Collaborative filtering encompasses techniques for matching people with similar interests and making recommendations on this basis.

Collaborative filtering algorithms often require (1) users' active participation, (2) an easy way to represent users' interests, and (3) algorithms that are able to match people with similar interests.

Typically, the workflow of a collaborative filtering system is:

A user expresses his or her preferences by rating items (e.g. books, movies, or music recordings) of the system. These ratings can be viewed as an approximate representation of the user's interest in the corresponding domain.
The system matches this user's ratings against other users' and finds the people with most "similar" tastes.
With similar users, the system recommends items that the similar users have rated highly but not yet being rated by this user (presumably the absence of rating is often considered as the unfamiliarity of an item)

A key problem of collaborative filtering is how to combine and weight the preferences of user neighbors. Sometimes, users can immediately rate the recommended items. As a result, the system gains an increasingly accurate representation of user preferences over time.

Methodology

Collaborative filtering systems have many forms, but many common systems can be reduced to two steps:

Look for users who share the same rating patterns with the active user (the user whom the prediction is for).
Use the ratings from those like-minded users found in step 1 to calculate a prediction for the active user

This falls under the category of user-based collaborative filtering. A specific application of this is the user-based Nearest Neighbor algorithm.

Alternatively, item-based collaborative filtering (users who bought x also bought y), proceeds in an item-centric manner:

Build an item-item matrix determining relationships between pairs of items
Infer the tastes of the current user by examining the matrix and matching that user's data

See, for example, the Slope One item-based collaborative filtering family.

Another form of collaborative filtering can be based on implicit observations of normal user behavior (as opposed to the artificial behavior imposed by a rating task). These systems observe what a user has done together with what all users have done (what music they have listened to, what items they have bought) and use that data to predict the user's behavior in the future, or to predict how a user might like to behave given the chance. These predictions then have to be filtered through business logic to determine how they might affect the actions of a business system. For example, it is not useful to offer to sell somebody a particular album of music if they already have demonstrated that they own that music.

Relying on a scoring or rating system which is averaged across all users ignores specific demands of a user, and is particularly poor in tasks where there is large variation in interest (as in the recommendation of music). However, there are other methods to combat information explosion, such as web search and data clustering.

Types

Memory-based

The memory-based approach uses user rating data to compute the similarity between users or items. Typical examples of this approach are neighbourhood-based CF and item-based/user-based top-N recommendations. For example, in user based approaches, the value of ratings user u gives to item i is calculated as an aggregation of some similar users' rating of the item:

r_{u,i}=\operatorname {aggr} _{u^{\prime }\in U}r_{u^{\prime },i}

where U denotes the set of top N users that are most similar to user u who rated item i. Some examples of the aggregation function include:

r_{u,i}={\frac {1}{N}}\sum \limits _{u^{\prime }\in U}r_{u^{\prime },i}

r_{u,i}=k\sum \limits _{u^{\prime }\in U}\operatorname {simil} (u,u^{\prime })r_{u^{\prime },i}

where k is a normalizing factor defined as $k=1/\sum _{u^{\prime }\in U}|\operatorname {simil} (u,u^{\prime })|$ , and

r_{u,i}={\bar {r_{u}}}+k\sum \limits _{u^{\prime }\in U}\operatorname {simil} (u,u^{\prime })(r_{u^{\prime },i}-{\bar {r_{u^{\prime }}}})

where ${\bar {r_{u}}}$ is the average rating of user u for all the items rated by u.

The neighborhood-based algorithm calculates the similarity between two users or items, and produces a prediction for the user by taking the weighted average of all the ratings. Similarity computation between items or users is an important part of this approach. Multiple measures, such as Pearson correlation and vector cosine based similarity are used for this.

The Pearson correlation similarity of two users x, y is defined as

\operatorname {simil} (x,y)={\frac {\sum \limits _{i\in I_{xy}}(r_{x,i}-{\bar {r_{x}}})(r_{y,i}-{\bar {r_{y}}})}{{\sqrt {\sum \limits _{i\in I_{xy}}(r_{x,i}-{\bar {r_{x}}})^{2}}}{\sqrt {\sum \limits _{i\in I_{xy}}(r_{y,i}-{\bar {r_{y}}})^{2}}}}}

where I_xy is the set of items rated by both user x and user y.

The cosine-based approach defines the cosine-similarity between two users x and y as:^[4]

\operatorname {simil} (x,y)=\cos({\vec {x}},{\vec {y}})={\frac {{\vec {x}}\cdot {\vec {y}}}{||{\vec {x}}||\times ||{\vec {y}}||}}={\frac {\sum \limits _{i\in I_{xy}}r_{x,i}r_{y,i}}{{\sqrt {\sum \limits _{i\in I_{x}}r_{x,i}^{2}}}{\sqrt {\sum \limits _{i\in I_{y}}r_{y,i}^{2}}}}}

Navigácia: Veda >

Analytika
Antropológia
Aplikované vedy
Bibliometria
Dejiny vedy
Encyklopédie
Filozofia vedy
Forenzné vedy
Humanitné vedy
Knižničná veda
Kryogenika
Kryptológia
Kulturológia
Literárna veda
Medzidisciplinárne oblasti
Metódy kvantitatívnej analýzy
Metavedy
Metodika

Metodológia vedy
Náboženstvo a veda
Náučná literatúra
Podvody vo vede
Popularizácia vedy
Potravinárstvo
Prírodné vedy
Pseudoveda
Scientometria
Spoločenské vedy
Teórie
Teatrológia
Technické vedy
Technika
Terminológia
Umenie
Výskum

Veda
Veda a technika podľa štátu
Veda a technika podľa kontinentu
Veda a technika podľa roka
Veda v kozme
Vedci
Vedecká literatúra
Vedecké databázy
Vedecké experimenty
Vedecké konferencie
Vedecké metódy
Vedecké ocenenia
Vedecké organizácie
Vedecké parky
Vedeckí spisovatelia
Vzdelávanie
Záhady

Príbuzné výrazy:

Text je dostupný za podmienok Creative Commons Attribution/Share-Alike License 3.0 Unported; prípadne za ďalších podmienok.
Podrobnejšie informácie nájdete na stránke Podmienky použitia.

[1]

[2]

[3]

[4]