[Review] Real-world Noisy Image Denoising: A New Benchmark

Today I came across a very interesting paper that tackles the problem of building a real-world noisy image dataset. As with many papers from Lei Zhang’s group, I really enjoyed the introduction, the problem description, and their approach. The following are my notes from reading this paper.


A real-world noisy image dataset is required because

  • Major sources of real noise (to be covered in more detail in another post)
    • Photon shot noise: inevitable, caused by the stochastic arrival of photons at the sensor; modeled as a Poisson process whose variance is proportional to the mean intensity of each pixel, so it is not stationary across the whole image.
    • (Fixed-pattern) photo response non-uniformity (PRNU): each pixel has a slightly different output level or response for a fixed light level, caused by sensor light loss and color mixture between neighboring pixels.
    • (Fixed-pattern) dark current non-uniformity (DCNU): the sensor chip is not perfect, so some electrons are generated even under no-light conditions.
    • Readout noise: generated by inaccurate charge-to-voltage conversion.
    • Quantization noise: introduced by the analog-to-digital conversion (ADC) process.
    • Other noise: CCD-specific sources such as charge transfer efficiency, and CMOS-specific noise such as column noise, etc.
  • The Additive White Gaussian Noise (AWGN) model is too simple and does not hold for real-world noisy images; real-world noise is signal dependent (see the sketch after this list).
  • Evaluating the quality of a denoised image is difficult because there is no “ground truth”
    • Subjective quality assessment is time-consuming and takes much effort.
    • Blind quality assessment methods were developed on datasets that are not real-world noisy images.
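To make the signal-dependent nature of real noise concrete, here is a minimal NumPy sketch contrasting AWGN with a simple Poisson-Gaussian model (shot noise plus readout noise). The full-well count and sigma values are illustrative assumptions of mine, not parameters from the paper.

import numpy as np

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(64, 64))     # toy "clean" image in [0, 1]

# AWGN: the noise variance is the same everywhere, independent of the signal.
sigma_awgn = 0.05
awgn_noisy = clean + rng.normal(0.0, sigma_awgn, size=clean.shape)

# Poisson-Gaussian model: shot noise scales with intensity, plus readout noise.
full_well = 1000.0                               # assumed photon count at full scale
shot = rng.poisson(clean * full_well) / full_well
read_sigma = 0.01
pg_noisy = shot + rng.normal(0.0, read_sigma, size=clean.shape)

# The variance of the Poisson term is clean / full_well per pixel, i.e.
# proportional to the signal, unlike the constant AWGN variance.
print("AWGN noise std:", (awgn_noisy - clean).std())
print("Poisson-Gaussian noise std:", (pg_noisy - clean).std())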

The motivations for this work are

  • Previous real-world datasets are limited in
    • The process of capturing and handling real-world images (especially regarding ISO settings and reducing the post-processing that introduces distortion),
    • How the ‘ground truth’ image is estimated,
    • The number of camera brands.

The contributions of this paper are

  • Propose a new dataset of real-world noisy images with ground truth, covering more ISO settings, more images, and more camera brands.
  • Define the process of capturing and handling the images.
  • Evaluate existing denoising methods on real-world denoising.

The proposed dataset contains 40 scenes with various contents and objects:

  • 5 cameras across 3 brands: Canon (5D Mark, 80D, 600D), Nikon, and Sony
  • More camera settings: 6 ISO levels (800, 1600, 3200, 6400, 12800, and 25600)
  • Various lighting: normal/dark indoor and normal outdoor lighting conditions
  • Captured scenes: buildings, classrooms, cafe rooms, outdoor scenes
  • Objects: books, pens, bottles, boxes, toys, etc.

Denoising benchmark:

  • Methods designed for AWGN achieve lower PSNR and SSIM (even the deep-learning ones) than methods developed for real-world noisy images.
  • Grayscale techniques process the color channels independently --> they create more artifacts and cannot handle the different noise characteristics of different channels and different local patches.
  • Training-based methods depend on the training dataset as well as its resolution; using a real-world dataset for training might be useful. (A small PSNR evaluation sketch follows this list.)
  • The best methods are conventional ones (not deep learning): Guided, MC-WNNM, TWSC (see their paper for more details).
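As a side note, the PSNR used in the benchmark is easy to compute yourself; below is a minimal NumPy helper of my own (not code from the paper). For SSIM, a library implementation such as scikit-image's is normally used.

import numpy as np

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio between a ground-truth and a denoised image."""
    reference = reference.astype(np.float64)
    estimate = estimate.astype(np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")              # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Toy usage: an 8-bit noisy image versus its ground truth.
rng = np.random.default_rng(0)
gt = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)
noisy = np.clip(gt + rng.normal(0, 10, size=gt.shape), 0, 255).astype(np.uint8)
print("PSNR: %.2f dB" % psnr(gt, noisy))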

My comments

  • The dataset covers mostly indoor scenes and contains no humans/persons --> an outdoor dataset and data with people are desired.
  • A low-light dataset would be very interesting, since the most advanced cameras claim to handle low-light conditions better.
  • I wonder about a similar dataset for smartphone cameras. However, it would be quite difficult to build, since images taken by smartphone cameras are heavily processed.
  • As more and more smartphones adopt multiple cameras, it would be interesting to study the problem of jointly denoising images from the same or different camera types.

 

Paper:

J. Xu, H. Li, Z. Liang, D. Zhang, and L. Zhang, “Real-world noisy image denoising: A new benchmark,” available on arXiv.


Quadtree plus Binary Tree (QTBT)

This is the first post in a series on key technologies for future video coding that are implemented in the Joint Exploration Model (JEM).

As discussed in our previous review of quadtree partitioning, H.265/HEVC uses multiple partition types (CU, PU, TU) [1]. JEM unifies all partition types with a combination of a quadtree and a binary tree, hence the name QTBT [2, 3]. The detailed differences are summarized in the table below.

(Table: comparison of HEVC and QTBT partitioning)

One implementation detail: JEM 7.1 supports separate QTBT trees for luma and chroma in intra frames, while using a single tree for inter frames.

In general, QTBT outperforms HEVC thanks to its simplified structure that unifies CU, PU, and TU, enabled by non-square transforms. However, several issues of QTBT could be further improved, as follows (a toy partitioning sketch is given after the list):

  • QTBT is limited by the depths of the quadtree and the binary tree. Currently, the QT depth is 3. This fixed structure limits the minimum and maximum CU sizes within a CTU.
  • Currently, 2 bits are required to distinguish the partitioning mode (quadtree or binary) and the splitting direction (horizontal or vertical), so one of the possible codewords is wasted.
  • The QT-first-then-BT structure somewhat reduces the cost of the coding tree representation, but it might also create some inefficiency in partitioning.
  • QTBT mimics asymmetric PU partitioning with binary partitioning, but not exactly. For instance, instead of two asymmetric blocks of NxN/4 and Nx3N/4 as in an HEVC PU, JEM splits into three sub-blocks of NxN/4, NxN/4, and NxN/2. Each sub-block performs prediction independently, which might cause some redundancy in the prediction mode or the later merge mode information.
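To make the QT-then-BT structure concrete, here is a toy Python sketch of how a QTBT leaf decomposition could be enumerated. The size limits and the always-split rule are my own illustrative assumptions, not the JEM rate-distortion decision.

# Toy QTBT decomposition: quadtree splits come first; once a binary split is
# used, only further binary splits are allowed (no return to the quadtree).
def qtbt_partition(x, y, w, h, bt_started=False, min_size=8, max_bt_root=64):
    """Return the leaf blocks (x, y, w, h) of a toy QTBT decomposition."""
    # Quadtree stage: only square blocks, and only before any binary split.
    if not bt_started and w == h and w > max_bt_root:
        half = w // 2
        leaves = []
        for dx in (0, half):
            for dy in (0, half):
                leaves += qtbt_partition(x + dx, y + dy, half, half,
                                         False, min_size, max_bt_root)
        return leaves
    # Binary-tree stage: split the longer side until the minimum size is reached.
    if max(w, h) > 2 * min_size:
        if w >= h:   # vertical split into two halves
            return (qtbt_partition(x, y, w // 2, h, True, min_size, max_bt_root) +
                    qtbt_partition(x + w // 2, y, w // 2, h, True, min_size, max_bt_root))
        else:        # horizontal split into two halves
            return (qtbt_partition(x, y, w, h // 2, True, min_size, max_bt_root) +
                    qtbt_partition(x, y + h // 2, w, h // 2, True, min_size, max_bt_root))
    return [(x, y, w, h)]

# Partition a 128x128 CTU and count the resulting leaf blocks.
blocks = qtbt_partition(0, 0, 128, 128)
print(len(blocks), "leaf blocks, e.g.", blocks[:3])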

These problems of QTBT motivated further research in partitioning for JEM which will be addressed in the future posts.

Reference

  1. V. Sze, M. Budagavi, and G. J. Sullivan, High Efficiency Video Coding (HEVC): Algorithms and Architectures, Integrated Circuits and Systems series, 2014.
  2. J. Chen et al., “Algorithm Description of Joint Exploration Test Model 7 (JEM 7),” JVET-G1001-v1, 7th Meeting: Torino, IT, 13-21 July 2017.
  3. Z. Wang et al., “Local-constrained quadtree plus binary tree block partition structure for enhanced video coding,” Visual Communications and Image Processing (VCIP), IEEE, 2016.

Quadtree Partition in HEVC

This post briefly reviews quadtree partitioning technique in HEVC.

Firstly, I would like to recall the hybrid coding scheme that has driven video coding standards since the 1980s. A picture is first partitioned into blocks, and then each block is predicted using either intra-picture or inter-picture prediction. In either case, the resulting prediction error (the difference between the original block and its prediction) is transmitted using transform coding.

(Figure: hybrid video coding scheme)

Before diving into the details, let us clarify some basic terms in HEVC. Firstly, the Coding Tree Unit (CTU) is the basic processing unit in HEVC, corresponding to the macroblock concept in H.264/AVC. One CTU consists of 1 luma Coding Tree Block (CTB) and 2 chroma CTBs, together with the associated syntax elements.

(Figure: CTU concept)

The CTU size is configurable in HEVC and is signaled in the bitstream; HEVC supports CTU sizes from 16×16 up to 64×64. In the first partitioning step, HEVC divides a frame into independent CTUs, which can be further partitioned into smaller blocks called Coding Units (CUs); the corresponding sample-level concept is the Coding Block (CB). This process is necessary for video coding for several reasons

  • Spatial, color, and temporal correlations are location dependent, so a whole CTU might be too big to decide whether intra or inter prediction is better.
  • Small blocks are required by the assumption behind motion prediction: every pixel inside a block moves in the same direction and thus shares the same motion vector.
  • The most efficient transform differs for each block size.
  • Encoding whole frames at once would require significant computing resources, both computational complexity and memory access, which could lead to significant delay or low throughput.

Therefore, partitioning is a critically important process in video coding standards. In fact, many other coding tools, including intra coding, inter coding, transforms, deblocking filters, etc., have to be designed to suit the partitioning technique.

(Figure: quadtree partitioning example)

The CUs inside a CTU are coded in a depth-first order (Z-scan order). This ensures that for each CU (except at the top or left boundary of a slice), all samples above the CU and to its left have already been coded, so the corresponding samples can be used for intra prediction. HEVC encodes the splitting structure from depth 0 to the maximum depth, with 1/0 representing split/no split. In addition, HEVC only supports frame sizes divisible by the minimum CU size, which is 8×8 by default.
For each CU, there can be multiple Prediction Units (PUs), which indicate whether the CU is coded using intra-picture prediction or motion-compensated prediction. HEVC supports 2 PU partition sizes for intra prediction and 8 PU partition sizes for inter prediction.
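To illustrate the split-flag signaling described above, here is a toy Python sketch that decodes a depth-first sequence of split flags into CU sizes. The flag sequence and the 64×64 CTU size are illustrative assumptions, not a real HEVC bitstream.

def decode_quadtree(flags, ctu_size=64, min_cu=8):
    """Decode depth-first split flags (1 = split, 0 = leaf) into a list of CU sizes."""
    flags = iter(flags)

    def recurse(size):
        # A CU at the minimum size cannot split further, so no flag is sent for it.
        if size > min_cu and next(flags) == 1:
            cus = []
            for _ in range(4):          # four quadrants, visited in Z-scan order
                cus += recurse(size // 2)
            return cus
        return [size]

    return recurse(ctu_size)

# Example: split the 64x64 CTU once, then split only its first 32x32 quadrant.
flags = [1, 1, 0, 0, 0, 0, 0, 0, 0]
print(decode_quadtree(flags))           # [16, 16, 16, 16, 32, 32, 32]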

(Figures: PU partition modes for intra and inter prediction)

Besides, the relationship between PU and TU also differs between intra and inter prediction. In intra prediction, the residual quadtree for the TU takes the PU as its root size, and intra prediction is performed with respect to the TU. In inter prediction, the TU residual quadtree has no relationship with the PU partitioning. These differences make the quadtree structure quite cumbersome to understand. In addition, maintaining three separate trees creates syntax overhead.

Therefore, for the future video compression standard, the Quadtree plus Binary Tree (QTBT) structure was proposed to address these problems of the quadtree structure.

Reference.

  1. Sze, Vivienne, Madhukar Budagavi, and Gary J. Sullivan. “High-efficiency video coding (HEVC).” Integrated Circuit and Systems, Algorithms and Architectures (2014)

Install Keras in Ubuntu 14.04, CUDA 7.5.18


I really went through a difficult time installing Keras on Ubuntu 14.04 Trusty Tahr. The following are my installation steps, based on the guides [1], [2] and my own experience.

Firstly, my system information is as follows

  • Ubuntu 14.04 Trusty Tahr
  • GPU: GTX 980ti
  • Miniconda 2
  • Python 2.7
  • CUDA: 7.5.18

1. Install required packages for Python 2.7

Those packages are numpy, scipy, pyyaml, h5py, and HDF5, installed through pip:

$ sudo apt-get install python-pip
$ pip install numpy

If you encounter a failure with error code 1, then you should update pip and the Python development package:

$ sudo apt-get install python-dev
$ sudo -H pip install --upgrade pip

If you still have problems installing packages with pip, then you should use

$ sudo -H pip install numpy
$ sudo -H pip install scipy
$ sudo -H pip install pyyaml
$ sudo -H pip install HDF5
$ sudo -H pip install h5py

 

2. Install Miniconda

We will install Miniconda 2, which supports Python 2.7. If you already downloaded the Miniconda installer, go to the folder that contains it; otherwise the commands below download it for you.

$ wget https://repo.continuum.io/miniconda/Miniconda-latest-Linux-`uname -p`.sh
$ bash Miniconda-latest-Linux-`uname -p`.sh -b
$ rm Miniconda-latest-Linux-`uname -p`.sh

The -b option makes the script run with default settings. Now we need to set up $PATH to use Miniconda:

$ export PATH=/root/miniconda2/bin:$PATH

3. Install Keras

$ sudo -H pip install --upgrade keras

4. Install Tensorflow

This is a workaround for a bug when installing TensorFlow. See this issue for more information. Here I select the TensorFlow GPU build for Python 2.7:

$ pip install --upgrade -I setuptools
$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.10.0rc0-cp27-none-linux_x86_64.whl

A note here: at the time I installed TensorFlow (2016.10.12), the latest TensorFlow broke Keras with an error like

keras tensorflow attributeerror module object has no attribute 'control_flow_ops'

Then you have to check the TensorFlow version. In my case, I had to stick with TensorFlow version 0.10.0rc0.
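A quick way to confirm which versions ended up installed is a small Python check of my own (assuming both packages import cleanly):

import tensorflow as tf
import keras                              # importing keras also prints the active backend

print("TensorFlow: " + tf.__version__)    # expect 0.10.0rc0 in this setup
print("Keras: " + keras.__version__)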

 

 

5. Install NVIDIA CUDA 7.5.18

Perhaps installing the NVIDIA driver on Ubuntu is the most frustrating step. I faced a black Ubuntu screen, a corrupted GRUB, etc., and had to reinstall my PC several times. I hope the following steps will help you out and save your time.

Step 1. Check your graphics card model

$ lspci | grep -i nvidia
$ uname -m
$ gcc --version

At this step, you need to make sure that your GPU is on the CUDA support list, that your machine is x86_64, and that gcc is installed.

Step 2. Download the CUDA runfile. It’s about 1.1 GB, so take your time.

$ wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda_7.5.18_linux.run

It will be stored in your current directory.

Step 3: Disable the Nouveau driver.

By creating a file

$ sudo nano /etc/modprobe.d/blacklist-nouveau.conf

with the following contents:

blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off

 

Step 4: Enter text-only mode.

In some guides, they suggest Ctrl-Alt-F1. However, this did not work for me: my screen turned dark and nothing happened. You can come back to the graphical session with Ctrl-Alt-F6 (or another Fx, x = 2, 3, ..., 9). I followed the approach here.

$ sudo sed -i -e 's/#GRUB_TERMINAL/GRUB_TERMINAL/g' /etc/default/grub
$ sudo update-grub

Then reboot your computer; it should enter command-line mode. Log in and type

$ sudo init 3

which switches to a text-only runlevel. Then shut down the X server (the Linux GUI) desktop manager, which is named “lightdm” in Ubuntu:

$ sudo service lightdm stop

Step 5. Install CUDA.

Go to the folder where you downloaded the runfile and install it

$ sudo sh cuda_7.5.18_linux.run

Then you follow all the default settings. In my experience, Ubuntu collapsed once: my computer no longer booted and showed only the GRUB screen. The likely reason was agreeing to install the accelerated graphics driver (the first option after “Accept”) while I had already installed the NVIDIA driver before. In fact, the first time I did not agree to install the accelerated driver, and that turned out to be an error too. Then, I followed their suggestion with

$ sudo sh cuda_7.5.18_linux.run -silent -driver

Step 6. Restart the X server (Linux GUI) desktop manager,

which is named “lightdm” in Ubuntu

$ sudo init 5
$ sudo service lightdm start

 

Step 7. Configure the CUDA environment paths.

Edit your bashrc file

$ sudo nano ~/.bashrc

and add the following lines:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Then it’s time to reboot your computer.

 

Step 8. Check the CUDA installation.

To check this, we should first compile the CUDA samples, which are usually located in /usr/local/cuda/samples. Go there and compile with

$ sudo make

Depending on your PC configuration, you can speed up compilation by using multiple threads:

$ sudo make -j8

Then you can test your driver with

$ cd 1_Utilities/deviceQuery
$ ./deviceQuery

or

$ lspci | grep -i nvidia

 

6. Set up the Keras backend

$ mkdir -p ~/.keras
$ echo '{"epsilon":1e-07,"floatx":"float32","backend":"tensorflow"}' > ~/.keras/keras.json

 

7. Test Keras with the TensorFlow backend

Make sure that Keras runs with the TensorFlow backend:

$ curl -sSL https://github.com/fchollet/keras/raw/master/examples/mnist_mlp.py | python

If it runs and returns the results: congrats! You are done with this installation mess.

360-degree Video Compression – Challenges, Related Work and Motivations

In this post, we are only concerned with the challenges of delivering 360-degree video content [7], as well as some prior information that we can use to improve the compression ratio. Firstly, the challenges of compression are listed as follows [7]
1. High frame rate and resolution: since the video is shown very close to your eyes, it requires a very high frame rate (60 fps) at high resolution. Acceptable viewing quality usually needs about 30 Mbps [3], far greater than a typical Wi-Fi speed (5 Mbps). (A back-of-the-envelope bitrate sketch follows this list.)
2. Huge computation for encoding and decoding: there is a significant amount of data to compress, and huge computational complexity is required on both the encoder and decoder sides. You will “burn” your Samsung smartphone with a Gear VR within 30 minutes. A lot of computing power will be required.
3. Low delay: your VR headset has to respond to your movements very fast. In the IPTV industry, a delay of 1-2 seconds is acceptable before the user switches the channel, but for VR content the delay should be much smaller.
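To get a sense of the numbers in challenge 1, here is a back-of-the-envelope Python sketch. The 4K resolution, 8-bit 4:2:0 sampling, and 60 fps are my own illustrative assumptions, not figures from [7].

# Rough raw-bitrate estimate for a 4K, 60 fps, 8-bit 4:2:0 equirectangular video.
width, height = 3840, 2160
fps = 60
bits_per_pixel = 12               # 8 (luma) + 2 + 2 (subsampled chroma) bits per pixel

raw_bps = width * height * bits_per_pixel * fps
target_bps = 30e6                 # the ~30 Mbps figure mentioned above

print("Raw video: %.1f Gbps" % (raw_bps / 1e9))
print("Required compression ratio: about %.0f:1" % (raw_bps / target_bps))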

So we need a very efficient video compression technique for 360 video content.

Since prior information (something you already know in advance) helps improve compression, we list some of it here

1. We only view a small portion of the scene at any given time, so there is no need to process the full 360-degree view. To utilize this prior, we need to know the current view, as well as to predict the next view.

2. The scene quality should be proportional to the probability that you will look at it in the next few seconds. For instance, we do not need to process the view opposite to the current one at full quality (e.g., resolution, frame rate). (A toy viewport-weighting sketch is given after this list.)

3. We often focus on the main view (the center of the scene), which is why some work blurs the frame boundary to save some bit-rate.

4. Physical limitations: the human head cannot move in any direction at any speed. For instance, many people find that they move faster in the horizontal than in the vertical direction. Therefore, there are head-movement patterns, velocities, and limitations for people in general and for each person in particular. Besides, we always move our head a little bit.

5. Scenes/views are not equally important. There is often a “main view” that contains the most interesting content, so most people will watch it; for example, when a player scores a goal, the main scene of a movie, etc.
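As a toy illustration of prior 2, here is a sketch that assigns a quality weight to each tile based on its angular distance from the predicted viewing direction. The cosine falloff and the 30-degree viewport half-width are my own illustrative assumptions, not a scheme from the papers cited here.

import math

def tile_weight(tile_yaw_deg, view_yaw_deg, viewport_half_width=30.0):
    """Quality weight in [0, 1] for a tile, given the predicted viewing yaw."""
    # Smallest angular difference between the tile center and the view center.
    diff = abs((tile_yaw_deg - view_yaw_deg + 180.0) % 360.0 - 180.0)
    if diff <= viewport_half_width:
        return 1.0                     # inside the viewport: full quality
    # Smooth falloff outside the viewport, reaching 0 at the opposite direction.
    return max(0.0, math.cos(math.radians(diff) / 2.0))

# Eight tiles around the equator, viewer predicted to look at yaw = 0 degrees.
for yaw in range(0, 360, 45):
    print("tile at %3d deg -> weight %.2f" % (yaw, tile_weight(yaw, 0.0)))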

 

Previous work: we only review the most recent work on this topic, including Facebook’s approach [4] and other research [2, 3, 8].

Facebook [4] has done a great job by introducing (1) a cube model and (2) pyramid resolution scaling, using the H.264 compression standard. Instead of projecting onto a sphere, they project the different views onto a cube. In this way, the unfolded picture is not distorted, which benefits the motion compensation of conventional video coding (standard video coding often assumes translational motion). They avoid black regions by re-arranging the cube faces into a rectangular frame.
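To give a feel for the cube model, here is a tiny sketch of how a viewing direction can be assigned to one of the six cube faces (my own illustration; Facebook's actual transform additionally handles the per-face projection and the frame packing).

def cube_face(x, y, z):
    """Assign a viewing direction (x, y, z) to a cube face by its dominant axis."""
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return "+X" if x > 0 else "-X"
    if ay >= ax and ay >= az:
        return "+Y" if y > 0 else "-Y"
    return "+Z" if z > 0 else "-Z"

print(cube_face(1.0, 0.2, -0.3))   # -> +X
print(cube_face(0.1, -0.9, 0.2))   # -> -Y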

They pointed out that blurring helps improve coding efficiency, but only up to some threshold. Meanwhile, downscaling gives the same compression performance as blurring but better visual quality. Their pyramid-like model follows priors 2 and 3 above.

Moreover, they pre-render the video into more than 30 different views on the server. For more details, the reader should refer to [4].

Certainly, their method is very straightforward and efficient. In my opinion, however, their simple cube model would introduce more distortion into the image --> a slightly more sophisticated polyhedron (more than six faces) would help. Moreover, their solution is limited to streaming applications, which is not necessarily the solution for viewing stored content.

 

On the other hand, H.264 is popular now, but it is already old; H.265 would be a better choice for the coming years, especially when working with new smartphones and devices. One of the fascinating features of HEVC is tile-based processing [1], which outperforms slice-based processing in terms of parallelization. Instead of processing whole frames, you can process each tile (a large image block) independently. (See the elegant review paper [1] for more detail on tile-based processing.) Tile-based processing for 360-degree video was first proposed in [2] and extended to SHVC (scalable HEVC) in [3].

Why do we need tile-based processing? Since we are interested in a particular view, we do not need to decode whole frames, and a tile-based decoder realizes this in a much more efficient way. Also, SHVC is more suitable for both streaming and stored video content (you need more “scales” in SHVC than usual).

However, this second approach does not address the problem of motion compensation. Therefore, I expect an excellent performance from combining the two approaches, i.e., “tile-based SHVC for 360-degree video coding with cube projection.”

Moreover, both approaches do not utilize the prior information about human limitations, which would be useful for predicting view changes. A machine-learning approach would also be helpful for priors 4 and 5, i.e., for predicting the view (Facebook did mention this, but it is still under development).

 

In the next post, I will present the idea of a machine-learning based approach for view prediction.

 

Reference

[1] K. Misra, A. Segall, M. Horowitz, S. Xu, A. Fuldseth, and M. Zhou, “An Overview of Tiles in HEVC,” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 6, pp. 969–977, Dec. 2013.

[2] Y. Sanchez, R. Skupin, and T. Schierl, “Compressed domain video processing for tile based panoramic streaming using HEVC,” in Image Processing (ICIP), 2015 IEEE International Conference on, 2015, pp. 2244–2248.

[3] Y. Sanchez de la Fuente, R. Skupin, and T. Schierl, “Compressed Domain Video Processing for Tile-Based Panoramic Streaming using SHVC,” 2015, pp. 13–18.

[4] http://venturebeat.com/2016/01/21/facebook-open-sources-transform-a-tool-that-cuts-360-degree-video-file-size-by-25/

[5] S. Perera and N. Barnes, “1-Point Rigid Motion Estimation and Segmentation with a RGB-D Camera,” Digital Image Computing: Techniques and Applications (DICTA), 2013 International Conference on, Hobart, TAS, 2013, pp. 1-8.

[6] http://bertolami.com/index.php?engine=blog&content=posts&detail=virtual-reality-video-codec

[7]  http://blog.beamr.com/blog/2015/08/06/360-degree-videos-bring-a-new-cinematic-experience-but-create-delivery-viewing-challenges/

[8] M. Budagavi, J. Furton, G. Jin, A. Saxena, J. Wilkinson, and A. Dickerson, “360 degrees video coding using region adaptive smoothing,” in Image Processing (ICIP), 2015 IEEE International Conference on, 2015, pp. 750–754.

Challenges of 360 degree video

360-degree videos (or 360 videos, immersive videos) create an immersive viewing experience in which you can select the view you are interested in. You control your view through interactions (head movement with a head-mounted display, the relative position of your smartphone, interactive buttons, etc.). Currently (2016), you can record such immersive video with the Samsung Gear 360, LG 360, a Nokia camera, a GoPro rig, and much more. You can enjoy your video on your smartphone with a Samsung Gear VR or LG VR, on a PC with an Oculus Rift, or with many other VR headsets out there.
The current processing pipeline for 360-degree content is given as follows [8]

Fig. 1. 360-degree video processing chain.
Fig. 2. 360-degree video processing chain: results at each step.
After capturing multiple images from different views, the images are aligned and stitched together to form a sphere (from a to c). The equirectangular projection maps the 360-degree sphere to a 2D image (as in Fig. b) before we encode it. Why do we have to do that? Because conventional video coding only processes rectangular image blocks. The decoder side has to handle the mapping back onto a sphere and render the corresponding output. As you may have observed, there is always some distortion at the boundary of a view. When working with 360-degree videos, we have to address the problems of capturing, compression/delivery, and viewing its content; here I am only concerned with the processing/compression part.
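As a small illustration of the equirectangular projection mentioned above, here is a sketch that maps a viewing direction (longitude, latitude) to pixel coordinates in the 2D frame. The 2:1 frame size is an illustrative assumption.

def equirect_pixel(lon_deg, lat_deg, width=3840, height=1920):
    """Map longitude in [-180, 180] and latitude in [-90, 90] to (x, y) pixel coordinates."""
    x = (lon_deg + 180.0) / 360.0 * width      # longitude spans the full image width
    y = (90.0 - lat_deg) / 180.0 * height      # latitude +90 (north pole) maps to row 0
    return int(x) % width, min(int(y), height - 1)

# The equator and the poles get the same number of pixels per degree of longitude,
# which is why the polar regions look stretched in the equirectangular frame.
print(equirect_pixel(0.0, 0.0))     # image center
print(equirect_pixel(0.0, 89.0))    # near the north pole: top rows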
There are several problems that we have to solve to improve 360-degree video content, including (many of these have been mentioned in [6])

  1. Distorted images: due to hardware limitations (to save money, manufacturers often reduce the number of cameras in their 360-degree rigs, e.g., the Samsung Gear 360 with only two cameras) and the use of wide-angle lenses (like fisheye lenses), the captured images are more distorted.
  2. Stitching problems: unlike the common stitching problem in panorama imaging, where you have many images to work with, stitching in 360-degree video is very challenging with just a few input images whose overlapping regions are very limited. Therefore, more distortion, missing objects, ghosting, and flicker will appear.
  3. Rendering problems: projecting images onto a sphere introduces another distortion. If the sphere diameter is large, you experience less distortion, but that is often not the case. You may see distorted objects at the border of the current view; if a person is near the border, one half of his/her body will look bigger than the other half.
  4. High resolution and high frame rate: anything displayed very close to your eyes has to be viewed at very high resolution and frame rate. One thing I hate most is that providers tend to blur the region near the boundary; I can stay in the current view, look toward the edge without moving my head, and see a blurry area, which is very inconvenient. --> They should find a better way to compress it, instead of blurring.
  5. Limited interaction: even though it is called “immersive video”, it is not that immersive; you can only turn your head around but cannot move around. --> Light-field 360 videos would partly solve this problem, as we could then move around the scene a little bit.
  6. HDR content: unlike conventional HDR content, where we can see both very bright and very dark regions at the same time, 360-degree HDR content requires adaptively changing the dynamic range of the views, which is closer to how the human eye works.
  7. Too much effort to enjoy: to enjoy a movie, you have to move a lot, which is not necessarily a good idea. So many movements; you would get tired soon. --> Content creators should take this problem into account.
  8. High chance of missing content: again, there are many views, but which one should I look at before it disappears? Not every scene is equally important and exciting. This problem becomes more severe when watching content like movies (each person sees a different version of the content, so the main story is easy to miss) or sports (some player scores a goal and you miss it; who will you blame, yourself?). --> There should be content/view suggestions while playing the content.
  9. There is a blind spot that you cannot see. --> We could overcome this by using inpainting techniques (fill something in, not just blurring).

 

In the next post, we will focus on the problem of compressing 360-degree video content.

Reference

[1] K. Misra, A. Segall, M. Horowitz, S. Xu, A. Fuldseth, and M. Zhou, “An Overview of Tiles in HEVC,” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 6, pp. 969–977, Dec. 2013.
[2] Y. Sanchez, R. Skupin, and T. Schierl, “Compressed domain video processing for tile based panoramic streaming using HEVC,” in Image Processing (ICIP), 2015 IEEE International Conference on, 2015, pp. 2244–2248.
[3] Y. Sanchez de la Fuente, R. Skupin, and T. Schierl, “Compressed Domain Video Processing for Tile-Based Panoramic Streaming using SHVC,” 2015, pp. 13–18.
[4] http://venturebeat.com/2016/01/21/facebook-open-sources-transform-a-tool-that-cuts-360-degree-video-file-size-by-25/
[5] S. Perera and N. Barnes, “1-Point Rigid Motion Estimation and Segmentation with an RGB-D Camera,” Digital Image Computing: Techniques and Applications (DICTA), 2013 International Conference on, Hobart, TAS, 2013, pp. 1-8.
[6] http://bertolami.com/index.php?engine=blog&content=posts&detail=virtual-reality-video-codec
[7] http://blog.beamr.com/blog/2015/08/06/360-degree-videos-bring-a-new-cinematic-experience-but-create-delivery-viewing-challenges/
[8] M. Budagavi, J. Furton, G. Jin, A. Saxena, J. Wilkinson, and A. Dickerson, “360 degrees video coding using region adaptive smoothing,” in Image Processing (ICIP), 2015 IEEE International Conference on, 2015, pp. 750–754.

Recommendation System 1

While reading about the recommendation system (RS) topic, I realized that it has many potential applications. RS is everywhere: e-commerce websites (Amazon, Alibaba), news (Google News, any digital newspaper), social networks (Facebook, Twitter, Instagram), application stores (Apple App Store, Google Play, Microsoft Store, Steam, GOG, etc.), music (iTunes, Spotify, Pandora), movies (Netflix, IMDb, Jinni), education (Coursera, edX, Udemy), transportation (Uber), job suggestions (LinkedIn), dating (Tinder, OkCupid), and more. We can see it on many websites and devices. An RS can suggest what/where/when/who/which to do/eat/listen/watch/go/learn, etc. Personal assistants also provide us with suggestions.


Since RS is so important, I decided to study it a little bit. I can even link patch-based image processing to recommendation, as both share a machine-learning root. So let us start our journey with RS from scratch. I will follow the book “Recommendation System in R”, as it provides source code in R (a very nice, free language). In this post, we give a brief definition of RS and a list of interesting references.

What is Recommendation System?

A recommendation system is a system that helps match users (viewers, readers, buyers, listeners, organizers, students) with items (movies, news, shopping items, jobs, candidates, classes, partners),

or

A recommendation system is an information filtering technique that suggests content you are likely to be interested in.


 

Why do we need RS?

(1) To ease information overload: the Internet contains an ocean of information, tons of movies, music, news, etc. Are you interested in all of it? Of course not. We only like and consume particular movies and music (much less than 1% of the available content). You only want to see something you are interested in, something you like (most of the time).

(2) Everyone has their own taste: you love romance; I love action movies.

(3) Sellers want to sell things, and you want to buy stuff: a recommendation that satisfies the user’s needs also satisfies the content provider/seller and the website (business goal). Of course, there is another side of the coin, but we are not going to discuss it this time.

How does RS work?

The RS guesses items related to the user from the given input data. A minimal collaborative-filtering sketch is given below.
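As a toy illustration of this guessing (my own example in Python, even though the book uses R), here is a user-based collaborative-filtering sketch on a tiny rating matrix: find the most similar user by cosine similarity and recommend an item they liked that the target user has not rated yet.

import numpy as np

# Toy rating matrix: rows = users, columns = items, 0 = not rated.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

target = 0                                    # recommend for user 0
neighbors = [u for u in range(len(ratings)) if u != target]
similarities = [cosine(ratings[target], ratings[u]) for u in neighbors]
best_neighbor = neighbors[int(np.argmax(similarities))]

# Recommend the neighbor's highest-rated item that the target has not rated yet.
unrated = np.where(ratings[target] == 0)[0]
recommended = unrated[np.argmax(ratings[best_neighbor][unrated])]
print("Most similar user:", best_neighbor)
print("Recommended item index:", recommended)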


Another explanation can be found here.

 Reference

[1] http://www.slideshare.net/VikrantArya/recommendation-system-33379953?qid=ded2904b-d569-4a26-bb2c-a61092256eb7&v=&b=&from_search=2